[panda-users] Presenting my thesis work, SysTaint, which employs PANDA for malware analysis

Gabriele Viglianisi vigliag at gmail.com
Tue May 8 05:23:31 EDT 2018


Thanks!

I'd be more than happy to submit pull requests for the individual
plugins, or a subset of those. For the majority of the changes and
plugins that should be easy, although I would probably need some
guidance on the build system.

It could be more difficult to upstream the "main" SysTaint plugin,
which currently requires my TCGTaint plugin as its taint tracking
implementation, because it needs some registers (such as ESP and EBP)
to be excluded from tainting. TCGTaint in turn currently requires some
additional hooks in QEMU's TCG code; this can probably be improved so
that it only requires access to the buffer of intermediate TCG ops,
but that may require some work.
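
To make "excluded from tainting" concrete, here is a toy model of a
shadow register file in which writes to excluded registers always drop
their taint labels. It is only a sketch of the idea: it is not
TCGTaint's code, and the class, method and label names are all
hypothetical.

```python
# Toy model only: NOT TCGTaint's code; every name here is hypothetical.
EXCLUDED_REGISTERS = {"ESP", "EBP"}  # registers that must never carry taint


class ShadowRegisters:
    """Maps each register name to the set of taint labels it holds."""

    def __init__(self, excluded=EXCLUDED_REGISTERS):
        self.excluded = set(excluded)
        self.labels = {}

    def get(self, reg):
        # Untracked registers are treated as untainted.
        return self.labels.get(reg, set())

    def assign(self, reg, labels):
        # Writes to an excluded register drop the taint instead of storing it.
        if reg in self.excluded or not labels:
            self.labels.pop(reg, None)
        else:
            self.labels[reg] = set(labels)

    def propagate(self, dst, *srcs):
        """Model `dst = op(srcs...)`: dst gets the union of the source labels."""
        merged = set()
        for src in srcs:
            merged |= self.get(src)
        self.assign(dst, merged)


shadow = ShadowRegisters()
shadow.assign("EAX", {"recv#1"})
shadow.propagate("EBX", "EAX")         # EBX = f(EAX): taint flows to EBX
shadow.propagate("ESP", "ESP", "EAX")  # ESP = ESP + EAX: taint is dropped
```

In this model EBX ends up carrying the label from EAX, while ESP stays
untainted even though a tainted value was added to it.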

Having a PANDA2 dataset similar to malrec would surely be great, and
my analysis process can indeed be applied to similar PANDA2
recordings. This would also allow researchers to analyze the behavior
of malware samples that were able to contact their C2 server when the
recording was taken but are no longer able to.

I should note, however, that my analysis process is not currently meant
to be run unattended on a dataset, since it requires the user to
manually inspect the intermediate and final results and to provide
input during some phases (e.g. telling it which processes to analyze).
It can surely be improved and built upon, though, and of course it can
be used in an attended manner, as I did in my experiments.
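
As a rough illustration of the per-buffer statistics that FnMemLogger
collects (size, entropy and number of ASCII characters, as described in
the original message quoted below), the following sketch computes them
for a byte buffer and applies a simple check on top. It is not the
actual FnMemLogger or script code: the function names, the thresholds
and the heuristic itself are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for an empty buffer)."""
    if not buf:
        return 0.0
    total = len(buf)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(buf).values())


def buffer_stats(buf: bytes) -> dict:
    """Size, entropy and printable-ASCII count of one buffer."""
    printable = sum(1 for b in buf if 0x20 <= b <= 0x7E)
    return {"size": len(buf),
            "entropy": shannon_entropy(buf),
            "ascii": printable}


def looks_like_encryption(read_buf: bytes, written_buf: bytes,
                          arith_ratio: float, min_size: int = 16,
                          entropy_gain: float = 2.0,
                          min_arith: float = 0.3) -> bool:
    """Hypothetical heuristic with made-up thresholds: flag a call whose
    output is large enough, much more random than its input, and produced
    by code with a high share of arithmetic operations."""
    r, w = buffer_stats(read_buf), buffer_stats(written_buf)
    return (w["size"] >= min_size
            and w["entropy"] - r["entropy"] >= entropy_gain
            and arith_ratio >= min_arith)
```

A call that reads `b"A" * 64` and writes `bytes(range(64))` with an
arithmetic ratio of 0.5 would be flagged by this check, while a call
that merely copies its input would not.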


2018-05-07 15:13 GMT+02:00 Brendan Dolan-Gavitt <brendandg at nyu.edu>:
> This is very cool! I am wondering if it might make sense to try to upstream
> this and enable your analyses for processing the samples we receive through
> malrec [1]. I have been meaning to switch malrec over to PANDA 2.0; and
> having these analyses available would be very nice...
>
> [1] https://giantpanda.gtisc.gatech.edu/malrec/dataset/
>
> On Sun, May 6, 2018 at 5:23 AM Gabriele Viglianisi <vigliag at gmail.com>
> wrote:
>>
>> Dear PANDA users,
>> I've finally finished my master's thesis and I'm happy to share it with you.
>>
>> I've applied PANDA to the study of malware, with the goals of
>> providing a replacement for debugging and making it easier to study
>> malware that communicates with external servers. Some of the techniques
>> I've used are similar to the ones described in the Dispatcher paper by
>> Caballero et al., but instead of performing protocol reverse
>> engineering, my focus was on building an easy-to-employ tool to study
>> a sample by inspecting its data-flow.
>>
>> My approach consists of:
>>
>> - Executing the malware sample in a virtual machine (manually or
>> through Cuckoo Sandbox) and obtaining a PANDA recording
>> - Collecting information on all processes in the recording via
>> asidstory and rekall
>> - Using network logs and "stringsearch" to find the processes of interest
>> - Collecting statistics on the functions the malware processes called,
>> and detecting encryption functions
>> - Tracing system calls and applying taint analysis to find
>> data-dependencies between them, as well as the function calls using
>> the tracked data
>> - Logging the collected data to disk, so that it can be interactively
>> queried by an analyst to quickly locate relevant data and code,
>> complementing both Cuckoo Sandbox's analyses and the usual reverse
>> engineering tools.
>>
>> I've made changes to some existing PANDA plugins, and developed some new
>> ones:
>>
>> - Callstack_instr was refactored and expanded, so that it assigns an
>> identifier to each call, allowing per-call information to be
>> collected.
>> - A new "ProcInfoDump" plugin exposes the guest's memory to a
>> python+rekall script embedded in the same process via PyBind11, so
>> that it can be used to quickly inspect memory at various points in
>> time.
>> - "StringSearch2" is an easier-to-use version of StringSearch
>> - "FnMemLogger" collects statistics about the functions the malware
>> uses, by monitoring the first 5 calls to each function. For each call,
>> it obtains the size, entropy and number of ASCII characters of each
>> buffer the function reads or writes, together with the number of basic
>> blocks and instructions executed, and the ratio of arithmetic
>> operations over the total. This data is then analyzed by scripts to
>> automatically detect encryption functions via heuristics.
>> - "TCGTaint" is a TCG-based taint tracking implementation, adapted
>> from Qtrace. The way it hooks into QEMU's TCG is not ideal, but it's
>> fast, flexible and gets the job done
>> - "SysTaint" is the main analysis plugin: it collects information on
>> selected system and function calls, monitoring memory accesses and
>> employing taint tracking.
>>
>> Additionally, I patched Cuckoo Monitor so that it emits hypercalls
>> when it intercepts a call to a known system library. When the
>> sample's execution is recorded while it is being analyzed by Cuckoo
>> Sandbox, this makes it possible to quickly jump from the entries in
>> Cuckoo's behavioral log to the in-depth data collected by SysTaint.
>>
>> I tested my work by analyzing the execution of Zeus, Citadel, Dridex
>> and Emotet, locating the data sent through the network, finding its
>> provenance, and the code that transformed and encrypted the original
>> data. More details are provided in the thesis. There are still many
>> things that can be improved, but the tool already works as a
>> prototype and can provide analysts with plenty of information
>> without having to debug the malware or execute it more than once.
>>
>> You can find my work here:
>>
>> - Thesis:
>> https://www.gabrieleviglianisi.com/files/GabrieleViglianisi-SysTaint-Thesis.pdf
>> - Thesis defense slides:
>> https://www.gabrieleviglianisi.com/files/GabrieleViglianisi-SysTaint-Thesis-Defense.pdf
>> - PANDA fork with the added plugins: https://github.com/vigliag/panda
>> - Fork of Cuckoo Monitor with the added hypercalls:
>> https://github.com/vigliag/cuckoo_monitor_panda
>> - I can also share my Python scripts and Jupyter notebooks. They still
>> need some cleanup, but feel free to ask if interested
>>
>> Please feel free to contact me if you have any questions. I'd also be
>> happy to upstream my changes and plugins to PANDA.
>>
>> Best regards,
>> Gabriele
>
> --
> Brendan Dolan-Gavitt
> Assistant Professor, Department of Computer Science and Engineering
> NYU Tandon School of Engineering

