COZZI Emanuele

Person has left EURECOM

Thesis

Binary Analysis for Linux and IoT Malware

Objective
Security companies collect millions of malware samples every day. This big-data aspect is a
new concept in malware analysis but it is certainly here to stay. On top of traditional samples,
the upcoming Internet of Things (IoT) revolution will inevitably increase both the amount and
the diversity of the collected artifacts.
However, despite its promises, big data collection has so far brought to our field more
challenges than advantages – mainly resulting in a burden for researchers and malware analysts.
In fact, on the one hand more samples mean less time to analyze them and larger infrastructures
required to store the files and execute them in dynamic analysis sandboxes. On the other hand,
security companies are clearly struggling to sift through this increasing amount of data in the
attempt to extract some actionable intelligence to better protect their customers and improve
their services. As a result, while there is a clear global trend towards collecting more and more
data, most of this data just sits on some server, taking up terabytes of storage space
without being used, exploited, or often even properly understood by the company
that collects it.
On top of this poor understanding of big malware datasets, new advanced techniques are
making the analysis of individual samples more complex and more time-consuming. For instance,
ROP-only malware, disk-less samples, and advanced obfuscations are reducing our
ability to automatically process and understand new malicious files.
The goal of this thesis is to harness the information stored in large malware datasets to
improve sample analysis, provide intelligence information, detect correlations, or simply
study the trends and evolution of different techniques used by malware writers. In this challenging
context, this dissertation will also explore new techniques to extend current static and dynamic
analysis approaches to the analysis of novel and sophisticated malware samples. This can
involve heavily obfuscated and packed binaries or new forms of malicious code and will rely
on existing large-scale malware collection systems to provide the required data to conduct
experiments.
Research Overview
The first objective of this thesis is to advance the state of the art in binary and malware analysis.
For example, recent efforts have been made to better understand packed samples [7], better analyze
their behavior [3,8], or reverse new forms of advanced rootkits [5]. However, these works
have only scratched the surface of the techniques we need to analyze complex malware samples,
both in a fully automated fashion and as tools to support manual reverse engineering.
A second objective of this thesis is the investigation of new forms of malware, starting from
malware running on other operating systems or platforms. Only recently have researchers started
looking at more “exotic” forms of malware [4], but there is still a lot to explore in this area. For
example, as a starting point we plan to develop an open source infrastructure to analyze Linux-based
malware samples. Internet routers and IoT devices are rapidly becoming prime targets
for malicious code, ranging from simple botnets to more sophisticated targeted attacks. Unfortunately,
the security industry is still largely unprepared for this threat. Most of the tools and
the knowledge about the behavior and the characteristics of malware derive from a decade
of research on Windows binaries. However, Linux malware has its own unique set of characteristics,
including the widespread use of static linking, a broad set of CPU architectures, its own
packing ecosystem, and completely different techniques to achieve persistence and process infection.
This task includes the development of dedicated tools, as well as their application
to tens of thousands of Linux malware samples, with the goal of extracting and measuring the
prevalence of different techniques and the characteristics of this rapidly growing form of malware.
As a result, this part of the project would produce not only a usable platform, but also a
valuable knowledge base about the behavior and key indicators of Linux malware, which can
be extremely useful to malware analysts, help improve the detection of these samples, and guide
incident response on infected devices.
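Two of the Linux-specific traits mentioned above, the CPU architecture and the use of static
linking, can be read directly from the header of an ELF file. The following is a minimal Python
sketch of such a first triage step, not the infrastructure planned in this thesis. The function
name triage_elf, the EM_NAMES table, and the returned field names are hypothetical. Treating a
missing PT_INTERP segment as a sign of static linking is only a heuristic, since shared objects
can also lack one.

```python
import struct

# Small subset of e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86-64", 183: "AArch64"}
PT_INTERP = 3  # program header type that names the dynamic loader


def triage_elf(data: bytes) -> dict:
    """Summarize architecture, word size, endianness and likely linking style."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if data[4] not in (1, 2) or data[5] not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    is64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    if len(data) < (64 if is64 else 52):
        raise ValueError("truncated ELF header")

    (machine,) = struct.unpack_from(end + "H", data, 18)
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)

    # p_type is the first 4-byte field of a program header in both ELF classes.
    has_interp = False
    for i in range(phnum):
        off = phoff + i * phentsize
        if off + 4 > len(data):          # tolerate truncated or malformed samples
            break
        (p_type,) = struct.unpack_from(end + "I", data, off)
        if p_type == PT_INTERP:
            has_interp = True
            break

    return {
        "arch": EM_NAMES.get(machine, f"unknown({machine})"),
        "bits": 64 if is64 else 32,
        "endian": "little" if end == "<" else "big",
        "likely_static": not has_interp,  # heuristic, see the note above
    }
```

A sample on disk could be summarized with triage_elf(open(path, "rb").read()), where path is
a placeholder for the sample location. Parsing only the fixed-offset header fields keeps the
check cheap enough to run over a large collection before any heavier static or dynamic analysis.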
Finally, part of the research in this area will also focus on the problem of cyber-attribution [1, 2],
proposing new techniques to identify reused components and detect malware samples
likely developed by the same group. As pointed out by Graziano et al. [6], the current
malware collection infrastructure is very efficient, but the vertiginous number of samples analyzed
every day in dynamic analysis sandboxes makes it impossible to tell apart the interesting
malware from the surrounding noise of less relevant samples.