How to Detect and Classify Metamorphic Malware with Sequential Pattern Mining (MalSPM)

Malware are malicious programs that can harm computers and networks by stealing data, encrypting files, or damaging devices. They are a serious threat to cybersecurity, especially when they can change their appearance to evade detection by antivirus software. Malware with this ability are called metamorphic malware, and they are a challenging problem for malware analysis and classification.

In this blog post, I will describe a new method called MalSPM (Metamorphic Malware Behavior Analysis and Classification using Sequential Pattern Mining) that can detect and classify metamorphic malware based on their behavior during execution. The MalSPM method was presented in a research paper that you can read for more details:


Nawaz, M. S., Fournier-Viger, P., Nawaz, M. Z., Chen, G., Wu, Y. (2022). MalSPM: Metamorphic Malware Behavior Analysis and Classification using Sequential Pattern Mining. Computers & Security, Elsevier, to appear.

I will now explain the main features of metamorphic malware, how MalSPM analyzes them using sequential pattern mining (SPM), and the advantages of using MalSPM.

What are metamorphic malware?

Metamorphic malware are malware that can modify their code or structure without changing their functionality. This means that they can produce different variants of themselves that look different but behave the same. For example, a metamorphic virus can change its encryption algorithm or insert junk code into its body to avoid being recognized by signature-based antivirus software.

Metamorphic malware pose a serious challenge for malware detection and classification because they can bypass static analysis techniques that rely on code similarity or predefined patterns. Therefore, dynamic analysis techniques that monitor the behavior of malware during execution are more suitable for dealing with metamorphic malware.

How does MalSPM analyze metamorphic malware?

MalSPM is a method that uses sequential pattern mining to analyze and classify metamorphic malware based on their behavior during execution. SPM is a data mining task that consists of finding frequent subsequences in a dataset of sequences. In the case of MalSPM, SPM was applied to a dataset that contains sequences of API calls made by different malware on the Windows operating system (OS). This makes it possible to extract patterns that represent the characteristics of different families of malware. API calls are functions provided by the OS that allow applications to perform various tasks such as accessing files, creating processes, or sending network packets. They are an attractive and distinguishing feature for malware analysis and detection because they reflect the actions that an executable file actually performs.
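To give an intuition of what "finding frequent subsequences" means, here is a minimal Python sketch of a PrefixSpan-style miner applied to a few toy API call sequences. This is only an illustration and not the code used in the paper: the function name `prefixspan`, the toy sequences, and the threshold `min_support=2` are my own assumptions, and each sequence is simplified to a list of single API calls.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern (tuple of API calls): support} for all frequent patterns.

    The support of a pattern is the number of sequences that contain it
    as a subsequence (gaps between the calls are allowed).
    """
    results = {}

    def mine(prefix, projected):
        # projected: list of (sequence index, start position) pairs, i.e. the
        # part of each sequence that remains after the current prefix.
        counts = defaultdict(set)
        for seq_id, start in projected:
            for item in set(sequences[seq_id][start:]):
                counts[item].add(seq_id)
        for item, seq_ids in sorted(counts.items()):
            if len(seq_ids) < min_support:
                continue  # pruning: an infrequent pattern cannot be extended
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)
            # Project each sequence just after the first occurrence of item.
            new_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_id, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

# Toy data (hypothetical, not taken from the MalSPM dataset).
api_sequences = [
    ["NtOpenKey", "NtQueryValueKey", "NtClose"],
    ["NtOpenKey", "NtClose", "NtCreateFile"],
    ["NtCreateFile", "NtOpenKey", "NtQueryValueKey", "NtClose"],
]

patterns = prefixspan(api_sequences, min_support=2)
for pattern, sup in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" -> ".join(pattern), "| support:", sup)
```

On real data, you would rather use an optimized implementation, since the number of patterns can grow very quickly when the support threshold is low.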

MalSPM first applies SPM algorithms to find frequent patterns of API calls in the dataset. These patterns can be of different types, such as sequential rules between API calls, as well as maximal and closed sequences of API calls. These patterns represent common behaviors of different types of malware such as ransomware, trojans, and worms. For example, here are a few sequential patterns that were extracted by MalSPM from the dataset of malware API calls:

Each line in this table is a pattern. For example, the first line indicates that a frequent pattern for malware is to call the API NtClose, followed by NtQueryValueKey, and then followed by NtClose, and that this pattern appears in 919 malware sequences.
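To illustrate how the number of sequences containing a pattern (its support) can be counted, here is a small Python sketch. I assume the usual SPM definition, where the calls of a pattern must appear in the same order in a sequence but not necessarily consecutively. The helper names and the three toy sequences are hypothetical and are not taken from the dataset.

```python
def contains_pattern(sequence, pattern):
    """True if the calls of pattern appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "call in remaining" advances the iterator until call is found, so each
    # call of the pattern must occur after the previously matched one.
    return all(call in remaining for call in pattern)

def count_support(sequences, pattern):
    """Number of sequences that contain the pattern."""
    return sum(contains_pattern(seq, pattern) for seq in sequences)

# Toy sequences (hypothetical).
toy_sequences = [
    ["NtClose", "NtOpenKey", "NtQueryValueKey", "NtClose"],     # match, with a gap
    ["NtQueryValueKey", "NtClose", "NtClose"],                  # no NtClose before the query
    ["NtClose", "NtQueryValueKey", "NtClose", "NtCreateFile"],  # match
]

pattern = ["NtClose", "NtQueryValueKey", "NtClose"]
pattern_support = count_support(toy_sequences, pattern)
print(pattern_support)
```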

After extracting the patterns, MalSPM uses them to classify malware. This is done by using the discovered patterns as features to train classifiers. In the paper, the performance of seven classifiers was compared using various metrics. Moreover, the performance of MalSPM was compared with state-of-the-art malware detection methods, and it was found that MalSPM outperformed these methods.
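One simple way to turn patterns into features is to build, for each malware sample, a binary vector that indicates which patterns occur in its API call sequence. The following Python sketch shows this idea. Note that this encoding is my assumption for illustration purposes (the paper describes the exact feature construction), and the patterns and samples below are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    return all(call in remaining for call in pattern)

def to_feature_vector(sequence, patterns):
    """One binary feature per pattern: 1 if the sequence contains it, else 0."""
    return [1 if is_subsequence(p, sequence) else 0 for p in patterns]

# Hypothetical patterns, standing in for the output of the mining step.
mined_patterns = [
    ("NtOpenKey", "NtQueryValueKey"),
    ("NtCreateFile", "NtWriteFile"),
    ("NtClose", "NtQueryValueKey", "NtClose"),
]

# Hypothetical API call sequences of two samples.
samples = [
    ["NtOpenKey", "NtQueryValueKey", "NtClose"],
    ["NtCreateFile", "NtWriteFile", "NtClose"],
]

X = [to_feature_vector(seq, mined_patterns) for seq in samples]
print(X)
```

The resulting vectors, together with the malware family of each sample as the label, can then be given to a standard classifier.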

Here is a picture that illustrates the overall process of malware detection using MalSPM.

What are the benefits of using MalSPM?

MalSPM has several benefits for malware detection and classification.

First, it can handle metamorphic malware that can change their appearance by focusing on their behavior rather than their code.

Second, it can discover common and specific behaviors of different types of malware by using SPM techniques.

Third, it can achieve high accuracy and efficiency by using effective pruning strategies and database projection.

Fourth, it can provide interpretable results by using sequential rules and patterns that can explain the logic behind the classification.

Code and datasets

For more details, please see the research paper of MalSPM. The datasets can be found here. And if you want to try the algorithms for extracting sequential patterns from that paper, please see the SPMF data mining software, which offers very fast implementations of those algorithms.
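If you want to try SPMF on your own API call traces, the sequences must first be written in the SPMF sequence database format: one sequence per line, where items are positive integers, each itemset is followed by -1, and each sequence ends with -2. Here is a small Python sketch of such a conversion. The function name and the way API names are mapped to integers are my own assumptions, and the datasets of the paper may have been prepared differently.

```python
def to_spmf_lines(sequences):
    """Convert lists of API call names to lines in the SPMF sequence format.

    Returns the lines and the mapping from API name to integer identifier,
    which is needed to translate the mined patterns back to API names.
    """
    ids = {}
    lines = []
    for seq in sequences:
        tokens = []
        for call in seq:
            item = ids.setdefault(call, len(ids) + 1)  # identifiers start at 1
            tokens.append(f"{item} -1")                # one call per itemset
        lines.append(" ".join(tokens + ["-2"]))        # -2 ends the sequence
    return lines, ids

# Toy traces (hypothetical).
traces = [
    ["NtOpenKey", "NtQueryValueKey", "NtClose"],
    ["NtOpenKey", "NtClose"],
]

spmf_lines, api_ids = to_spmf_lines(traces)
print("\n".join(spmf_lines))
```

The lines can then be saved to a text file and given as input to a sequential pattern mining algorithm of SPMF.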

Conclusion

In conclusion, MalSPM is a novel method that uses sequential pattern mining to analyze and classify metamorphic malware based on their behavior during execution. It can deal with the challenges posed by metamorphic malware and provide useful insights for cybersecurity researchers and practitioners. For more details, please see the research paper.

==
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.
