How to implement brand identification logic using (mostly) Azure services to detect potential phishing campaigns targeting your organisation.

Introduction

In 2020 I participated in the XSOAR hackathon to develop automations/playbooks for Palo Alto’s SOAR platform. The playbook I developed ingested Certificate Transparency logs to find potential phishing campaigns, creating a screenshot for each potential target. Each screenshot was sent to Google’s Vision API to identify potential brands, and the alert’s risk score was increased if a detected logo belonged to a specific company.

I’ve always wanted to re-create this brand identification logic using mostly Azure services in order to integrate the functionality with Microsoft Sentinel. This post describes how to deploy these components.

Since the Azure logo detection API (part of Cognitive Services) rarely works on local brands I’ll add two examples on how to use both the Azure API and the Google API with Azure Functions.

Components

A minimal deployment consists of three main components:

  • Azure Function containing Python code to create a screenshot and call the Azure Cognitive Services API or Google Vision API;
  • Azure Key Vault storing our secrets;
  • Either an Azure Cognitive Vision service or Google Cloud project with the Vision API enabled to identify brands on images.

The sections below describe how to configure the components manually/interactively, but in practice you would typically use Terraform/Bicep.

Dependencies

  • Visual Studio Code (with the Azure Functions and Azurite extensions)
  • Python 3.10 (optionally via pyenv)
  • An active Azure subscription
  • Optional, but most accurate for brand detection: a Google Cloud subscription

How to

Step 1: Create/register a Vision API

Option 1: Azure Cognitive Services

Log in to the Azure portal and create a new Computer Vision service from the Cognitive Services overview. After filling in all the details you’ll be presented with the newly created resource. If networking details were filled in, the service can be accessed over the internet as long as you have the necessary secrets/keys. You can find the keys to access the newly created resource under the Keys and Endpoint section. We’ll store these credentials in Azure Key Vault later.
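With the key and endpoint in hand, calling the service boils down to a single REST request against the Analyze endpoint with the Brands visual feature. A minimal sketch (the confidence threshold and helper names are my own, not from the example repository):

```python
def detect_brands(image_bytes: bytes, endpoint: str, key: str) -> dict:
    """Send an image to the Computer Vision v3.2 Analyze API (Brands feature)."""
    import requests  # third-party package, listed in requirements.txt

    resp = requests.post(
        f"{endpoint}/vision/v3.2/analyze",
        params={"visualFeatures": "Brands"},
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def parse_brands(analysis: dict, min_confidence: float = 0.5) -> list:
    """Return brand names detected above a confidence threshold."""
    return [
        brand["name"]
        for brand in analysis.get("brands", [])
        if brand["confidence"] >= min_confidence
    ]
```

The API returns a `brands` array with a `name` and `confidence` per detection, so filtering on a threshold keeps the noisier low-confidence hits out of your alerts.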

Option 2: Google Vision (most accurate)

To use Google Vision’s API navigate to the Google marketplace and enable the API for your project.

After enabling the API you will be redirected to the management page for the API service. It should prompt you to create a new set of credentials. Follow the steps to create a new service account and download the private key file (json) at the last page. The individual values stored inside the json file will be used with Azure Key Vault later.

Note: You can also choose to use a static Google Cloud API key but that won’t be covered as part of this post.
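For reference, calling Google’s logo detection from Python with such a service account file could be sketched as follows (the file path, score threshold, and helper names are placeholders for illustration):

```python
def detect_logos(image_bytes: bytes, credentials_path: str) -> list:
    """Return (description, score) pairs for logos Google Vision finds."""
    # Third-party packages: google-cloud-vision, google-auth (requirements.txt)
    from google.cloud import vision
    from google.oauth2 import service_account

    creds = service_account.Credentials.from_service_account_file(credentials_path)
    client = vision.ImageAnnotatorClient(credentials=creds)
    response = client.logo_detection(image=vision.Image(content=image_bytes))
    return [(logo.description, logo.score) for logo in response.logo_annotations]


def pick_brand(logos: list, min_score: float = 0.6):
    """Pick the highest-scoring logo above the threshold, or None."""
    hits = [logo for logo in logos if logo[1] >= min_score]
    return max(hits, key=lambda logo: logo[1])[0] if hits else None
```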

Step 2: Create Azure Function

Register a new Azure Function (Python 3.10 + consumption-based) and enable authentication via Azure AD by following the steps described here.

Note: The previously linked page mentions an Azure App Service but the authentication configuration is identical for Azure Functions.

Following the previously described steps will enforce authentication and redirect all unauthorized requests back to the Microsoft login page for our Azure Function. We’ll also need to enable and assign a managed identity to our Azure Function via the Identity page in order to access the required API secrets from a key vault.

Next up is adding a post-build configuration to install our headless browser after deploying the app (described later on this page). Open the Configuration settings on the Azure Function’s page and create two new application settings:

POST_BUILD_COMMAND

PYTHONPATH=/tmp/zipdeploy/extracted/.python_packages/lib/site-packages /tmp/oryx/platforms/python/3.10.4/bin/python3.10 -m playwright install --with-deps chromium

PLAYWRIGHT_BROWSERS_PATH

/home/site/wwwroot

Step 3: Create a new KeyVault

Create a new Azure Key Vault and add the required secrets depending on which service you will be using. For Google Cloud you’ll have to open the previously created service account credential file (json) and copy/paste the values accordingly. Your Key Vault should contain the following values for the example function to work:

Option 1: Azure Secrets

  • az-key: Either key 1 or key 2 of the vision workspace
  • az-endpoint: The endpoint of the vision workspace

Option 2: GCloud Secrets

  • g-project-id: value of project_id (gcloud json)
  • g-pkey: value of private_key (gcloud json)
  • g-private-key-id: value of private_key_id (gcloud json)
  • g-client-email: value of client_email (gcloud json)
  • g-client-id: value of client_id (gcloud json)
  • g-cert-url: value of client_x509_cert_url (gcloud json)

Now we’ll have to grant our Azure Function access. From the Access Control (IAM) menu, select Add Role Assignment. For this example we’ll use the built-in Key Vault Secrets User role. In the Members section, select the role and then choose the previously created/assigned managed identity of our Azure Function.
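With the role assignment in place, the function can read these secrets at runtime through its managed identity. A sketch (the vault URL is a placeholder; note the common gotcha that private keys pasted as single-line secrets end up with literal `\n` sequences that need to be converted back):

```python
def fetch_secrets(vault_url: str, names: list) -> dict:
    """Read the listed secrets from Key Vault using the managed identity."""
    # Third-party packages: azure-identity, azure-keyvault-secrets
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
    return {name: client.get_secret(name).value for name in names}


def gcloud_credentials_info(secrets: dict) -> dict:
    """Rebuild a service-account info dict from the individual secret values."""
    return {
        "type": "service_account",
        "project_id": secrets["g-project-id"],
        "private_key": secrets["g-pkey"].replace("\\n", "\n"),
        "private_key_id": secrets["g-private-key-id"],
        "client_email": secrets["g-client-email"],
        "client_id": secrets["g-client-id"],
        "client_x509_cert_url": secrets["g-cert-url"],
        "token_uri": "https://oauth2.googleapis.com/token",
    }
```

The rebuilt dict can be passed to `service_account.Credentials.from_service_account_info` so the full json file never has to be stored as a single secret.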

Step 4: Deploy Azure Function

If you’re already familiar with developing Azure Functions, you can safely skip the first section. We’ll be setting up a development environment for VSCode + Python (assuming you have a Mac).

Setting up the dev environment

Install the Azure Functions Core Tools as described here.

Azure Functions supports Python 3.10. The easiest way to manage different versions of Python is pyenv, which is what I typically use when starting a project.

mkdir logodetection # Create a new directory for your project
cd logodetection # Enter newly created directory
pyenv install 3.10 # Install Python version 3.10
pyenv local 3.10 # Set the default version to 3.10 for the current directory

Note for Apple Silicon users: Unfortunately the function tools are not natively supported on ARM-based devices. Microsoft documented a workaround that emulates your terminal using Rosetta, but a much better patch is mentioned on GitHub. Save the contents from the post to a file called Makefile and execute make install_func_arm64_worker.

Create a new function in VSCode

Start up Visual Studio Code and open the previously created project directory. Select the Azure icon in the Visual Studio Code menu and press the + icon to create a new Azure HTTP function (see screenshot below).

Be sure to select the Programming Model V2 for Python when prompted. When asked for Authentication methods, select Anonymous. Authentication is handled using Azure Active Directory as configured in Step 2. VSCode should have created a directory structure with the necessary files required to deploy your app.

Replace the api/function_app.py and api/requirements.txt files with the contents from my repository. The function_app.py file contains all the logic to create a screenshot using Playwright and call either the Azure or Google API to fetch potential logos/brands identified in the image. The code also sets some custom headers to hide the fact that we’re using a headless browser. You should now be ready to deploy your code to Azure.
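The screenshot part of that logic can be sketched roughly as below; the user agent string, viewport, and timeout here are illustrative assumptions, not the exact values from the repository:

```python
def take_screenshot(url: str) -> bytes:
    """Render the page headlessly and return a PNG screenshot."""
    # playwright comes from requirements.txt; the chromium binary itself is
    # installed by the POST_BUILD_COMMAND configured in Step 2.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(
            # A realistic desktop user agent so the page doesn't see "HeadlessChrome"
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1280, "height": 800},
        )
        page.goto(url, wait_until="networkidle", timeout=15_000)
        screenshot = page.screenshot()
        browser.close()
        return screenshot
```

Waiting for `networkidle` gives late-loading logos a chance to render before the screenshot is taken, at the cost of a slower function invocation.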

Open the Azure plugin in VSCode. The previously created Azure Function should be visible from the Resources pane. Select the name of your Azure Function, in my case it is called LogoDetect, and select Deploy to Function App.

When the deployment is finished the output displays the HTTP Trigger URLs for both the Azure Cognitive Services and Google’s Vision API integration.

If we navigate to one of the URLs (in my case I’m using Google Cloud) and specify the uri parameter, we should receive a list of recognised logos/brands. This only works when successfully authenticated, of course!
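Invoking the trigger programmatically could look like the sketch below; the function URL, the `uri` query parameter, and how you acquire the Azure AD token are assumptions based on the example function:

```python
def call_logo_function(function_url: str, target_uri: str, access_token: str) -> dict:
    """Invoke the deployed HTTP trigger with an Azure AD bearer token."""
    import requests  # third-party package

    resp = requests.get(
        function_url,  # e.g. the HTTP trigger URL shown after deployment
        params={"uri": target_uri},
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```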

Example of ING Group

Example of belastingdienst

And that’s it! You can now start using the Azure Functions in your workflow to determine if specific URLs are targeting a specific organisation 🎉🎉

If you use Microsoft Sentinel, you can enrich your incidents by running the Function App via Logic Apps to check if reported phishing URLs are targeting a specific brand, or combine this with monitoring public sources (like Certificate Transparency) to prioritise alerts.

Final note

The example does not make use of any caching. Ideally each unique URL’s response is cached for a fixed amount of time to prevent a sudden increase in costs for using Azure Cognitive Services API or Google’s Vision API.
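A minimal in-memory sketch of such a cache is shown below. It only caches per function instance; a production setup would use something shared (e.g. Azure Cache for Redis) so results survive across instances and cold starts:

```python
import time


class TTLCache:
    """Minimal in-memory cache that expires entries after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (inserted_at, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        self._store.pop(key, None)  # drop expired or missing entries
        return None

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic(), value)
```

Checking the cache before taking a screenshot means repeated lookups of the same URL only cost one Vision API call per TTL window.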

Furthermore, in theory phishing pages can check if specific IP ranges belong to Microsoft and thus block requests from our Azure Function. However, I’ve been building other tools to track phishing campaigns and luckily haven’t seen my requests being blocked so far 😉