Detecting Phishing Pages with Template Matching

Published on: 23.11.2017

Phishing (password fishing) has become the number one vector for cybersecurity incidents and data breaches. According to a recent SANS Institute report, 95% of all incidents start with Phishing attacks, which target companies of any size, from small family-owned businesses to large multinational corporations.

Generating Phishing pages is easy and can be readily automated; detecting the attack is the difficult part. Security professionals therefore have to rely on advanced sandbox solutions to detect the attack, but many dynamic analysis products do not even offer Phishing detection.

A standard Phishing attack starts with an email designed to trick users into believing that it is legitimate and contains very urgent or good news:

Or that something is wrong and their action is needed:

When the victim clicks on the embedded link, the web browser is redirected to a page that imitates Google, Amazon, etc. On that page, the victim is asked to divulge sensitive personal information such as usernames, passwords and credit card numbers:

In v19 of Joe Sandbox we added a powerful new feature: Phishing detection. When you submit a URL to Joe Sandbox, the analysis engine automatically opens Internet Explorer and navigates to the link. In addition, all links on the page are clicked in order to get the full behavioral picture. On each page, Joe Sandbox checks for various artifacts:

Is there a password login and no HTTPS?
Are there non-working links?
Does the page header match the URL?
How many links are on the page?
Where is the password sent to?
etc...
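
The checks above can be sketched as a small rule set. This is only an illustration: the PageFacts structure, the check names and the thresholds are assumptions made for this example and do not describe how Joe Sandbox implements its checks.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class PageFacts:
    """Hypothetical summary of one visited page (not a Joe Sandbox structure)."""
    url: str
    title: str
    has_password_field: bool
    form_action: str = ""  # URL the login form posts to, if known
    links: list = field(default_factory=list)
    broken_links: int = 0


def suspicious_artifacts(page: PageFacts) -> list:
    """Return the names of the checks that fired for this page."""
    hits = []
    parsed = urlparse(page.url)
    host = (parsed.hostname or "").lower()

    # Is there a password login and no HTTPS?
    if page.has_password_field and parsed.scheme != "https":
        hits.append("password_without_https")
    # Are there non-working links?
    if page.broken_links > 0:
        hits.append("non_working_links")
    # Crude title check: does any longer word of the title occur in the host?
    words = [w.lower() for w in page.title.split() if len(w) > 3]
    if words and not any(w in host for w in words):
        hits.append("title_does_not_match_url")
    # How many links are on the page? (threshold of 3 is an assumption)
    if page.has_password_field and len(page.links) < 3:
        hits.append("very_few_links")
    # Where is the password sent to?
    action_host = (urlparse(page.form_action).hostname or host).lower()
    if page.has_password_field and action_host != host:
        hits.append("password_sent_to_other_host")
    return hits
```

A page that triggers several of these checks is a candidate for closer inspection; a single hit on its own is weak evidence.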

With this methodology, Joe Sandbox is able to make a good first classification. However, it is increasingly common for Phishing pages to show no bad or suspicious characteristics at all: they use HTTPS, have working links, and everything looks legitimate. To detect those tough cases we have developed a new technology called template matching.

Template matching is a well-known image processing problem in which the challenge is to find a smaller image inside a bigger one. A good example is detecting a person in an image when we only have the passport photo of that person:


The same problem and solution can be applied to Phishing detection:

Given a web page which has a password login
Check if a known logo is present on the page
If a match is found, check if the URL is correct
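
The three steps above can be sketched as follows. The brand table, the function names and the returned labels are hypothetical and chosen for this example only; the logo match itself is assumed to come from a separate image matching step.

```python
from urllib.parse import urlparse

# Hypothetical brand table: logo name -> domains allowed to show that logo.
KNOWN_BRANDS = {
    "yahoo": {"yahoo.com"},
    "amazon": {"amazon.com"},
}


def host_belongs_to(host, domains):
    """True if host equals one of the domains or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def classify(url, has_password_login, matched_logos):
    """matched_logos: brand names whose logo was found on the page."""
    # Step 1: only pages with a password login are of interest.
    if not has_password_login:
        return "no_login"
    host = urlparse(url).hostname or ""
    # Step 2 and 3: for every known logo found, check the URL.
    for brand in matched_logos:
        allowed = KNOWN_BRANDS.get(brand)
        if allowed is not None and not host_belongs_to(host, allowed):
            return "phishing_suspected:" + brand
    return "no_mismatch"
```

Comparing against the end of the host name rather than searching for the brand anywhere in the URL matters here: a host such as yahoo.com.evil.test contains the brand domain but does not belong to it.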

Let's take the Yahoo Phishing sample above as an example. In this case, we start by checking whether the Yahoo logo, or one of various other logos, is present on the webpage:


If there is a match, we can then check whether the URL belongs to Yahoo or not. In the simple case, this works very well. But what if the scale, color, and size of the image vary? In that case, we can convert the picture to grayscale, detect edges, and compare the images at various scales.
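
The grayscale, edge and multi-scale idea can be illustrated with a minimal sketch in plain Python. Images are assumed to be lists of rows of (r, g, b) tuples; the function names, the simple difference-based edge filter and the scale set are assumptions for this example, not the Joe Sandbox engine. A production implementation would more likely rely on an image library (OpenCV, for instance, offers matchTemplate and Canny edge detection) for speed and robustness.

```python
import math


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def edges(img):
    """Crude edge map: sum of absolute horizontal and vertical differences."""
    h, w = len(img), len(img[0])
    return [[abs(img[y][x + 1] - img[y][x]) + abs(img[y + 1][x] - img[y][x])
             for x in range(w - 1)] for y in range(h - 1)]


def resize(img, scale):
    """Nearest-neighbour resize by a scale factor."""
    h, w = len(img), len(img[0])
    nh, nw = max(2, round(h * scale)), max(2, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def ncc(a, b):
    """Normalized cross-correlation of two equally sized flat lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0


def best_match(page_rgb, logo_rgb, scales=(0.5, 1.0, 2.0)):
    """Return (score, x, y, scale) of the best logo match on the page."""
    page = edges(to_gray(page_rgb))
    gray_logo = to_gray(logo_rgb)
    best = (-1.0, 0, 0, 0.0)
    for s in scales:
        # Rescale the logo, then compare edge maps instead of raw pixels.
        tmpl = edges(resize(gray_logo, s))
        th, tw = len(tmpl), len(tmpl[0])
        flat = [v for row in tmpl for v in row]
        for y in range(len(page) - th + 1):
            for x in range(len(page[0]) - tw + 1):
                patch = [page[y + j][x + i] for j in range(th) for i in range(tw)]
                score = ncc(patch, flat)
                if score > best[0]:
                    best = (score, x, y, s)
    return best
```

Because the comparison runs on edge maps and the correlation is normalized, a logo drawn in a different color or at a different size can still score close to 1.0, while a page without any edges scores 0.0. A threshold on the score then decides whether the logo counts as present.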

In Joe Sandbox v21, we implemented a full-blown logo template matching engine to detect Phishing:

The full analysis report is available here:

Interested in Joe Sandbox? Register for free at Joe Sandbox Cloud Basic or contact us for an in-depth technical demo!
