Scalable, behavior-based malware clustering

Bayer, Ulrich;Comparetti, Paolo Milani;Hlauschek, Clemens;Kruegel, Christopher;Kirda, Engin
NDSS 2009, 16th Annual Network and Distributed System Security Symposium, February 8-11, 2009, San Diego, USA

 

 

 

Anti-malware companies receive thousands of malware samples every day. To process this large quantity, a number of automated analysis tools were developed. These tools execute a malicious program in a controlled environment and produce reports that summarize the program's actions. Of course, the problem of analyzing the reports still remains. Recently, researchers have started to explore automated clustering techniques that help to identify samples that exhibit similar behavior. This allows an analyst to discard reports of samples that have been seen before, while focusing on novel, interesting threats. Unfortunately, previous techniques do not scale well and frequently fail to generalize the observed activity well enough to recognize related malware. In this paper, we propose a scalable clustering approach  to identify and group malware samples that exhibit similar behavior. For this, we first perform dynamic analysis to obtain the execution traces of malware programs. These execution traces are then generalized into behavioral profiles, which characterize the activity of a program in more abstract terms. The profiles serve as input to an efficient clustering algorithm that allows us to handle sample sets that are an order of magnitude larger than previous approaches. We have applied our system to real-world malware collections. The results demonstrate that our technique is able to recognize and group malware programs that behave similarly, achieving a better precision than previous approaches. To underline the scalability of the system, we clustered a set of more than 75 thousand samples in less than three hours.


Type:
Conférence
City:
San Diego
Date:
2009-02-08
Department:
Sécurité numérique
Eurecom Ref:
2783
Copyright:
© ISOC. Personal use of this material is permitted. The definitive version of this paper was published in NDSS 2009, 16th Annual Network and Distributed System Security Symposium, February 8-11, 2009, San Diego, USA and is available at :
See also:

PERMALINK : https://www.eurecom.fr/publication/2783