Research Seminar

Identifying and monitoring phishing campaigns through analysis and de-obfuscation of malicious code.

A seminar by Professor Ettore Merlo, presented by IMC2

Summary:

Phishing is a growing problem today. Attackers (“phishers”) typically deploy source code on a host website to trick a user into providing certain personal information of interest to the attackers (identity, credit card number, etc.).

In this talk, we present the techniques and experimental results obtained from the analysis of source code belonging to phishing kits collected during attacks. The results show that up to 90% of the analyzed kits share 90% or more of their code with at least one other kit in the database. A plausible genealogy of the evolution of phishing kits can be obtained based on source code similarity.

Obfuscation and dynamic code generation are often used by malicious code authors to attempt to evade detection. In this talk, we also present methods for analyzing obfuscation structures and mechanisms to identify and reveal hidden malicious code.

We also present the experimental results of risk reduction based on the temporal analysis of the similarity between phishing kits. This temporal analysis can reduce exposure to risks from new phishing schemes and can also identify and monitor the evolution of phishing campaigns.

Detailed description:

In this talk, we present the techniques and experimental results obtained from analyzing the source code of phishing kits that were collected during phishing attacks and recovered by forensics teams. Phishing kits are ready-to-deploy sets of files that can simply be copied onto a web server and used almost as they are.

Reported experimental results show that as many as 90% of the analyzed kits share 90% or more of their source code with at least one other kit in the database. Differences are small: in 90% of cases they amount to fewer than about 1,000 programming words, such as identifiers, constants, strings, or other values.
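As a rough illustration of what "sharing source code" and "differing programming words" can mean, the following minimal Python sketch compares two kits as multisets of tokens. The tokenizer, the function names, and the choice of a multiset comparison are assumptions made for this example; the similarity measure actually used in the reported experiments is not specified here.

```python
import re
from collections import Counter

# Hypothetical, very coarse tokenizer: identifiers (including PHP-style
# $variables), integers, quoted strings, and any other single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\"[^\"]*\"|'[^']*'|\S")


def tokenize(source: str) -> Counter:
    """Return the multiset of tokens found in a piece of source code."""
    return Counter(TOKEN_RE.findall(source))


def shared_fraction(a: Counter, b: Counter) -> float:
    """Fraction of the tokens of kit a that also occur in kit b."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    return sum((a & b).values()) / total


def differing_tokens(a: Counter, b: Counter) -> int:
    """Number of tokens that appear in one kit but not in the other."""
    return sum(((a - b) + (b - a)).values())
```

Under these assumptions, two kits that differ only in the attacker's drop e-mail address would show a shared fraction close to 1 and only a couple of differing tokens.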

A plausible genealogy of phishing kit modifications and evolution can be obtained by identifying and linking together the kits with the highest degree of similarity. The reconstructed genealogies show that new kits are often produced as identical or near-identical copies of existing ones, with low-cost changes.
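The linking step can be sketched as follows, assuming the kits are ordered by collection date and that each kit is attached to the most similar earlier kit. The Jaccard similarity on whitespace-separated tokens, the threshold value, and the function names are placeholders chosen for this example, not the measure or procedure used in the talk.

```python
from typing import Dict, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the sets of whitespace-separated tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def plausible_parents(
    kits: List[Tuple[str, str]], threshold: float = 0.5
) -> Dict[str, Optional[str]]:
    """Map each kit name to its most similar earlier kit, or None.

    `kits` is a list of (name, source) pairs, assumed sorted oldest first.
    A kit with no earlier kit at or above `threshold` starts a new lineage.
    """
    parents: Dict[str, Optional[str]] = {}
    for i, (name, source) in enumerate(kits):
        best_name, best_score = None, threshold
        for prev_name, prev_source in kits[:i]:
            score = jaccard(source, prev_source)
            if score >= best_score:
                best_name, best_score = prev_name, score
        parents[name] = best_name
    return parents
```

The resulting parent links form a forest: each tree is one plausible lineage of kits derived from a common ancestor.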

Obfuscation and dynamic code generation are often used by malicious code authors in an attempt to evade detection: the hidden payload is deobfuscated and executed only at runtime. In this talk, we also present a method for analyzing malicious code in the presence of dynamic code obfuscation.

When code is generated and executed dynamically, analysis becomes extremely difficult to perform without executing the source code. Our method therefore combines static analysis with dynamic interpretation of the code to decode obfuscated fragments and reveal their hidden intent. Several of the identified obfuscation methods are themselves an indication of maliciousness and can therefore help classify the revealed source code as malicious.
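To give a flavor of decoding without executing, the sketch below peels nested layers of one common PHP idiom, eval(base64_decode("...")), by decoding the string literal and substituting the result back into the code. It is a deliberately simplified illustration under that single assumption; the method presented in the talk handles far more general obfuscation, and the names LAYER_RE and peel_layers are hypothetical.

```python
import base64
import binascii
import re
from typing import Tuple

# Matches eval(base64_decode("...")) with a literal Base64 string argument.
LAYER_RE = re.compile(
    r"""eval\s*\(\s*base64_decode\s*\(\s*(['"])([A-Za-z0-9+/=]+)\1\s*\)\s*\)\s*;?"""
)


def peel_layers(code: str, max_layers: int = 10) -> Tuple[str, int]:
    """Replace eval(base64_decode(...)) calls by their decoded argument.

    Nothing is executed: the payload is only decoded and spliced in.
    Returns the rewritten code and the number of layers removed.
    """
    layers = 0
    while layers < max_layers:
        match = LAYER_RE.search(code)
        if match is None:
            break
        try:
            raw = base64.b64decode(match.group(2), validate=True)
        except (binascii.Error, ValueError):
            break  # not valid Base64: stop and keep what was revealed so far
        decoded = raw.decode("utf-8", errors="replace")
        code = code[: match.start()] + decoded + code[match.end():]
        layers += 1
    return code, layers
```

The number of layers removed is itself informative: legitimate code rarely wraps its logic in several nested eval and decode calls.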

Obfuscation techniques can be highly intricate. However, the very effort invested by attackers in obfuscation and the structures they have designed and reused across attacks can also serve as a distinctive signature of the attack designer. In this talk, we present how to analyze the structures of these obfuscation mechanisms to identify distinctive designer signatures of malicious software.

Experiments were performed on two extensive datasets comprising over 30,000 phishing kits written in PHP, JavaScript, and HTML, totaling several hundred million lines of code.

In the experimental datasets, we identified approximately 18,000 instances of dynamically generated code, resulting in 569 unique designer signatures.

One notable advantage of our method over state-of-the-art approaches is that it can extract a useful partial signature even if the deobfuscation process remains incomplete. Other methods rely heavily on payload extraction, which makes them useless when deobfuscation fails.
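One way to picture a signature that survives incomplete deobfuscation is to describe the structure of the obfuscation rather than its payload. The sketch below records the chain of nested wrapper calls at the start of an expression and hashes it. The signature format, the hashing step, and the function names are purely hypothetical choices for this example; the actual signature definition used in the talk is not described here.

```python
import hashlib
import re
from typing import List

# A function name followed by an opening parenthesis, e.g. "gzinflate(".
CALL_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def wrapper_chain(expr: str) -> List[str]:
    """Return the nested call names at the start of an expression.

    The payload itself is never decoded, so the chain is still available
    when the innermost argument is missing, truncated, or undecodable.
    """
    chain: List[str] = []
    pos = 0
    while True:
        match = CALL_RE.match(expr, pos)
        if match is None:
            break
        chain.append(match.group(1).lower())
        pos = match.end()
    return chain


def structural_signature(chain: List[str]) -> str:
    """Short hash of the wrapper chain, usable as a grouping key."""
    return hashlib.sha256(">".join(chain).encode("utf-8")).hexdigest()[:16]
```

With this representation, two samples that use the same wrapper chain around different payloads receive the same signature, and a sample whose payload cannot be recovered still yields the chain.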

Defense tools based on phishing kit similarity can anticipate new and unknown future attacks and are more robust to the changes and variations made in an attempt to avoid detection. In this talk, we present experimental results of risk reduction based on temporal analysis of the similarity among phishing kits, among distinct signature collections over time, and among payloads. Similarity-based temporal analysis can reduce risk exposure to new phishing schemes and can also identify and monitor the evolution of phishing kit campaigns, obfuscation signatures, and payload deployment strategies.