Nearly all malware today uses the Internet for communication, often to download second-stage malware, to register with its command and control server, or to spread and propagate. By capturing and analyzing suspicious Internet traffic during execution, a malware analysis system can detect various interesting artifacts, such as domains or IPs used to host malicious content. However, allowing Internet access for a malware analysis sandbox also has a big drawback: fingerprinting.
How does fingerprinting work? Check out the chart below:
The bad guys start by submitting an info collector program to the target sandboxes. The info collector enumerates all system settings, hardware ids, serial numbers, tokens, etc., and reports all of the data back via the Internet.
A new malware sample is then created which embeds the previously collected unique hardware ids and checks for them early during its execution. If some of the ids match, the sample will not exhibit any malicious behavior. This approach is very simple to implement and does not require many resources. Even if the bad guys do not have direct access to a particular sandbox, the fingerprinting can still succeed because malware samples are shared globally and the info collector eventually reaches that sandbox as well.
Today fingerprinting is used a lot and poses a big problem for malware analysis systems. There was even a public online tracking page (http://blog.kleissner.org/?p=588) showing IP addresses, user names, etc., of online sandboxes:
We also found various info collectors (e.g. 26b79a7370720b0822bb786043b86448) over the last months:
Adaptive Internet Simulation
To solve the problem of fingerprinting, sandbox vendors have recently introduced randomization, which generates random values for all ids and serial numbers. However, randomization has several shortcomings. First, not all serials and ids can be randomized: many ids are used by the license verification of the operating system, and changing them will trigger the verification check. Second, the number of unique ids, names and settings is enormous, and various tokens influence the system. Finally, randomization today is done during the installation of the sandbox. This means that a system is randomized once and then stays the same for months: more than enough time to do fingerprinting.
We have come up with a different idea to solve the problem, which we call Adaptive Internet Simulation (AIS). AIS is a full network proxy which sits between the sandbox and the Internet:
The network proxy has two main goals:
- Prevent leaks such as hardware ids and serials.
- Simulate where appropriate.
To achieve both goals, we developed a port-independent protocol identification engine, a flexible configuration syntax to define what traffic is considered a leak, and a generic simulation framework.
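To make the idea of port-independent protocol identification more concrete, here is a minimal Python sketch. It is not the AIS engine; the function name `identify_protocol` and the small set of signatures are assumptions chosen for illustration. The point is only that the protocol is guessed from the first bytes a client sends rather than from the destination port.

```python
# Hypothetical sketch: classify a stream by its first bytes, ignoring the port.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ", b"CONNECT ")


def identify_protocol(payload: bytes) -> str:
    """Guess the application protocol from the start of a client stream."""
    # HTTP requests start with a method name followed by a space.
    if payload.startswith(HTTP_METHODS):
        return "http"
    # A TLS handshake record starts with content type 0x16 and major version 0x03.
    if len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03:
        return "tls"
    # SSH peers exchange an identification string that starts with "SSH-".
    if payload.startswith(b"SSH-"):
        return "ssh"
    # SMTP clients greet the server with EHLO or HELO.
    if payload[:5].upper() in (b"EHLO ", b"HELO "):
        return "smtp"
    return "unknown"
```

With a classifier like this, an HTTP request sent to port 8443 or 53 is still treated as HTTP, so the leak rules for that protocol apply regardless of where the malware connects.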
An info collector running on Joe Sandbox with AIS can no longer leak the collected sensitive data; however, it still runs as if it were connected directly to the Internet. So AIS is nearly transparent.
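The leak prevention described above can be illustrated with a small Python sketch. This is an assumption about how such a filter could look, not the actual AIS configuration syntax or implementation: the `sensitive` dictionary, the function name `redact_leaks`, and the choice to check only plain and hex-encoded values are all hypothetical.

```python
def redact_leaks(payload: bytes, sensitive: dict[str, str]) -> tuple[bytes, list[str]]:
    """Mask known sandbox identifiers in outbound traffic.

    Returns the cleaned payload and the names of the identifiers that were found.
    """
    leaked: list[str] = []
    for name, value in sensitive.items():
        raw = value.encode("utf-8")
        # Check the plain value and its hex encoding (illustrative, not exhaustive).
        for needle in (raw, raw.hex().encode("ascii")):
            if needle in payload:
                # Replace with a same-length mask so length fields stay valid.
                payload = payload.replace(needle, b"X" * len(needle))
                if name not in leaked:
                    leaked.append(name)
    return payload, leaked


# Usage with a made-up identifier and request.
sandbox_ids = {"disk_serial": "WD-1234"}
request = b"POST /c HTTP/1.1\r\n\r\nserial=WD-1234&user=bob"
cleaned, found = redact_leaks(request, sandbox_ids)
```

Masking with a string of the same length keeps headers such as Content-Length correct, so the request still reaches the server and the info collector behaves as if nothing had changed.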
Extensive tests have shown that AIS works very well without any impact on the behavior of the malware.
Besides preventing fingerprinting, AIS has a number of additional nice features:
- Block "noise traffic" from the OS, such as updates or notifications.
- Simulation in cases where an IP or a domain is not available anymore.
- Simulation in cases where resources (e.g. files) are no longer available.
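The two simulation cases in the list above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the AIS simulation framework: `SIMULATED_IP`, `resolve_or_simulate`, and `simulate_http_response` are hypothetical names, and the fallback address is taken from the documentation range 192.0.2.0/24.

```python
import socket
from typing import Callable

# Hypothetical address of a local simulator (TEST-NET-1 documentation range).
SIMULATED_IP = "192.0.2.10"


def resolve_or_simulate(
    domain: str, resolver: Callable[[str], str] = socket.gethostbyname
) -> tuple[str, bool]:
    """Return (ip, simulated). Fall back to the simulator if resolution fails."""
    try:
        return resolver(domain), False
    except OSError:  # socket.gaierror is a subclass of OSError
        return SIMULATED_IP, True


def simulate_http_response(body: bytes = b"") -> bytes:
    """Build a generic 200 response for a resource that no longer exists."""
    headers = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: application/octet-stream\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"Connection: close\r\n\r\n"
    )
    return headers + body
```

The resolver is passed in as a parameter so the fallback can be exercised without real DNS traffic. What body to return for a missing file is a policy decision that this sketch leaves open.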
AIS is currently available as a plugin for various Joe Sandbox products.
Example reports (sample source Threatwave