[panda-users] A TCG-based taint plugin for Panda

Mon Nov 20 17:46:04 EST 2017

Performance-wise, I haven't profiled the code, so I don't know whether the
shadow memory is a bottleneck. It's probably worth investigating, but
performance has been good in general.
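
For illustration, here is a minimal sketch of the kind of shadow memory
described in the quoted message below: an unordered_map from physical address
to a std::set of labels. All names (ShadowMemory, taint, copy, labels) are
made up for the example and are not tcgtaint's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>

using Label = std::uint32_t;
using LabelSet = std::set<Label>;

// Hypothetical sketch: ShadowMemory and its methods are invented names,
// not the classes used by tcgtaint or taint2.
class ShadowMemory {
public:
    // Attach `label` to `len` bytes starting at physical address `addr`.
    void taint(std::uint64_t addr, std::size_t len, Label label) {
        assert(len > 0);
        for (std::size_t i = 0; i < len; ++i)
            bytes_[addr + i].insert(label);
    }

    // Propagate labels byte by byte, as a mov or memcpy would need.
    // The two ranges are assumed not to overlap.
    void copy(std::uint64_t dst, std::uint64_t src, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            auto it = bytes_.find(src + i);
            if (it == bytes_.end()) {
                bytes_.erase(dst + i);  // untainted source clears the destination
                continue;
            }
            LabelSet labels = it->second;  // copy first: operator[] may rehash
            bytes_[dst + i] = labels;
        }
    }

    // Labels of one byte; empty if the byte is untainted.
    LabelSet labels(std::uint64_t addr) const {
        auto it = bytes_.find(addr);
        return it == bytes_.end() ? LabelSet{} : it->second;
    }

    // Only tainted bytes have an entry, so this is simply the map size.
    std::size_t tainted_bytes() const { return bytes_.size(); }

private:
    std::unordered_map<std::uint64_t, LabelSet> bytes_;
};
```

Memory use in a layout like this grows with the number of tainted bytes
rather than with the size of guest memory, which fits the "light unless you
actually taint a lot" behaviour, but the per-byte set copies are where a
profiler would be worth pointing first.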

I still haven't been able to track down the problems I was having with
taint2 (I have to repeat the tests now that I fixed the problem with
launching panda from path, maybe it was linking the helpers wrong), but in
my tests tcgtaint was more accurate and ended up tainting far fewer
locations with much smaller labelsets, so the performance improvements may
also be due to the smaller amount of data to handle.

I'm also not tainting in exactly the same way: with taint2 I was delaying
tainting until the end of the block (which may not have been a good idea to
begin with), because it wouldn't let me taint after a memory write. So there
are a bunch of variables involved. I can maybe run a benchmark without
tainting anything and report back the results.

Tainting more instructions may require some effort. Most of the SSE
instructions (/target/i386/ops_sse.h), for example, are implemented via
helpers, and tainting those would, as far as I can tell, require manually
adding taint-propagating code to each helper. That is probably just a
couple of lines of C each, but in too many different places.
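
As a rough idea of what that per-helper addition could look like, here is a
self-contained sketch. It does not use QEMU's real helper signatures or
register types: ShadowedXmm, propagate_bytewise and toy_packed_add_bytes are
hypothetical names, and the byte-wise rule is an assumption that would only
suit element-wise operations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

using LabelSet = std::set<std::uint32_t>;

// Stand-in for an XMM register: 16 data bytes plus one label set per byte.
// Hypothetical type, unrelated to QEMU's real register representation.
struct ShadowedXmm {
    std::array<std::uint8_t, 16> bytes{};
    std::array<LabelSet, 16> taint{};
};

// Hypothetical propagation routine: byte i of dst also receives the labels
// of byte i of src, for the first `width` bytes.
inline void propagate_bytewise(ShadowedXmm& dst, const ShadowedXmm& src,
                               std::size_t width) {
    assert(width <= dst.taint.size());
    if (&dst == &src) return;  // same register: the labels are already there
    for (std::size_t i = 0; i < width; ++i)
        dst.taint[i].insert(src.taint[i].begin(), src.taint[i].end());
}

// Toy helper shaped like a packed byte add; not an actual ops_sse.h helper.
inline void toy_packed_add_bytes(ShadowedXmm& dst, const ShadowedXmm& src) {
    for (std::size_t i = 0; i < dst.bytes.size(); ++i)
        dst.bytes[i] = static_cast<std::uint8_t>(dst.bytes[i] + src.bytes[i]);
    propagate_bytewise(dst, src, dst.bytes.size());  // the added line
}
```

Helpers that shuffle or combine bytes across lanes would each need their own
rule, which is where the "too many different places" cost comes from.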

Leaving aside the instructions relying on helper calls, there are probably
still some low-hanging instructions worth adding to taint tracking, such as
"tcg_gen_add2_i32"; those should be trivial to handle. In my use case, I
only cared about mov, memcpy, strcpy and the like. Considering how difficult
it would be to achieve 100% instruction coverage, I think tcgtaint is mainly
suited for coarse-grained data-provenance tasks, and not as much for
locating attacker-controlled variables.
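
To show why an op like add2 looks easy, here is one possible taint rule for
a 64-bit add split into 32-bit halves. It is only a sketch: ShadowTemps,
join and taint_add2 are hypothetical names, temporaries are modelled as
plain indices, and the conservative "carry taints the high half" rule is an
assumption, not what tcgtaint currently does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <set>
#include <utility>
#include <vector>

using LabelSet = std::set<std::uint32_t>;

// Hypothetical shadow state: one label set per temporary, indexed by a
// made-up temporary id.
struct ShadowTemps {
    std::vector<LabelSet> temps;
    explicit ShadowTemps(std::size_t n) : temps(n) {}
};

// Union of two label sets.
inline LabelSet join(const LabelSet& a, const LabelSet& b) {
    LabelSet out = a;
    out.insert(b.begin(), b.end());
    return out;
}

// (rh:rl) = (ah:al) + (bh:bl). The low result depends on al and bl; the high
// result depends on ah and bh and, through the carry, on al and bl as well.
inline void taint_add2(ShadowTemps& s, std::size_t rl, std::size_t rh,
                       std::size_t al, std::size_t ah,
                       std::size_t bl, std::size_t bh) {
    for (std::size_t id : {rl, rh, al, ah, bl, bh})
        assert(id < s.temps.size());
    // Compute both results before writing: outputs may alias inputs.
    LabelSet low = join(s.temps[al], s.temps[bl]);
    LabelSet high = join(join(s.temps[ah], s.temps[bh]), low);
    s.temps[rl] = std::move(low);
    s.temps[rh] = std::move(high);
}
```

A less conservative rule could skip the carry dependency, at the cost of
occasionally losing a label on the high half.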

As an aside, there could be some non-taint-related helpers worth adding to
Panda. Generating helpers for 'call' and 'ret', for example, could make
callstack_instr simpler and faster (no need for capstone). I'm also
wondering how much of that could be implemented on top of panda's callbacks
(without touching qemu's code), if it had a callback like insn_translate
that also exposed the generated tcg intermediate instructions and the tcg
context. It sounds elegant, but I'm not sure having access to the
generated tcg ops would be enough for callstack_instr, for instance.
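
As a sketch of the bookkeeping such 'call' and 'ret' helpers could drive
(this is not how callstack_instr is implemented, and on_call/on_ret are
hypothetical names rather than PANDA or QEMU functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shadow call stack fed by generated 'call' and 'ret' helpers.
class ShadowCallStack {
public:
    // A 'call' helper would pass the address of the instruction after the call.
    void on_call(std::uint64_t return_addr) { frames_.push_back(return_addr); }

    // A 'ret' helper would pass the address being returned to. Pops up to and
    // including the newest frame that returns to `target`, so skipped frames
    // are discarded too. Returns the number of frames popped; 0 if no frame
    // matches, in which case the stack is left alone.
    std::size_t on_ret(std::uint64_t target) {
        for (std::size_t i = frames_.size(); i-- > 0;) {
            if (frames_[i] == target) {
                std::size_t popped = frames_.size() - i;
                frames_.resize(i);
                return popped;
            }
        }
        return 0;
    }

    std::size_t depth() const { return frames_.size(); }

    // Return address of the innermost active call.
    std::uint64_t innermost_return() const {
        assert(!frames_.empty());
        return frames_.back();
    }

private:
    std::vector<std::uint64_t> frames_;
};
```

With the helper emitted at translation time, nothing has to disassemble the
block at run time to find out whether it ended in a call or a ret.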

Marginally related: there's an ongoing initiative by a senior colleague at
Politecnico di Milano for a better way of exposing tcg to external
applications:
https://kvmforum2017.sched.com/event/Bnoh/libtcg-exposing-qemus-tcg-frontend-to-external-tools-alessandro-di-federico-politecnico-di-milano
(he's using tcg and llvm lifting for static binary analysis and translation
in rev.ng).

2017-11-20 20:41 GMT+01:00 Brendan Dolan-Gavitt <brendandg at nyu.edu>:

> Really interesting, I will have to take a look! I saw that the basic
> shadow memory is essentially an unordered_map from the physical address in
> memory to an std::set of labels. I'm a little surprised this is performant,
> because we had tried this a few years back and found it was pretty slow
> (hence our current sparse virtual memory approach).
>
> How hard would it be to add additional helpers? (Not saying I would want
> you to do it, just how hard it would be if someone later wanted to make the
> taint support more complete)
>
> On Mon, Nov 20, 2017 at 9:58 PM, Gabriele Viglianisi <vigliag at gmail.com>
> wrote:
>
>> Hi everybody,
>>
>> I had some difficulties with taint2, which I was unable to debug with my
>> current qemu and llvm knowledge, so I decided to make a quick attempt to
>> port the tcg-based taint instrumentation from Qtrace (
>> https://github.com/rpaleari/qtrace), a different qemu port by @rpaleari.
>> I've surprisingly managed to get something working well enough for my
>> purposes (or so it appears from a few tests), and I thought I'd share it with
>> you. Most of the credit goes to Roberto Paleari for publishing the original
>> Qtrace code.
>>
>> The plugin is not a full replacement for taint2, as it's not as general
>> or powerful, but it seems lighter on resources. It only supports the i386
>> target, doesn't instrument helpers, and it is composed of two parts: the
>> instrumentation (additional helper calls generated inside of tcg's
>> frontend), and the "tcgtaint" plugin, managing the data structures, and
>> exposing an interface similar to the one of taint2.
>>
>> The approach is probably not the cleanest, as it requires some insertions
>> to qemu's tcg code, and is maybe not general enough to be included in panda.
>> That said, if the Panda developers are interested in including it (in the
>> main or in a separate branch), I'd be happy to do some cleanup and prepare
>> a PR.
>>
>> All the relevant code is in ifdefs (you can grep for
>> "CONFIG_QTRACE_TAINT"), and is only built when the "--enable-tcgtaint"
>> switch is present. The instrumentation helper calls are only generated when
>> requested via the plugin api, and shouldn't affect performance when
>> disabled.
>>
>> Differences with respect to taint2:
>> - comparatively very light on memory (unless you actually taint a lot)
>> - a little faster
>> - can be enabled and disabled via the plugin api, can also be told to
>> only instrument user-level code
>> - doesn't instrument helpers (the main use case is being able to tell the
>> provenance of some piece of data, so the focus was on getting movs, memcpys
>> and similar working correctly)
>> - the instrumentation is added before stores, so you can safely taint on
>> a "virt_mem_after_write" callback
>> - doesn't support tainting through pointer dereference
>> - doesn't support tainted branches
>> - only partial support for xmm registers (movs only)
>>
>> Differences with respect to the original instrumentation in qtrace:
>> - it uses panda's memory functions and callbacks instead of qtrace's
>> - I moved load instrumentation deeper down in tcg's frontend, so as to
>> support `rep mov` and similar
>> - added partial support for xmm registers, in order to support `movdqa`
>> - some bug fixes
>>
>> The code still lacks some comments, copyright notices, and documentation,
>> but it should work. Any comment or feedback is welcome!
>>
>> You can find the code at the tcgtaint branch in my repo
>> https://github.com/vigliag/panda/tree/tcgtaint
>>
>> changes wrt panda's master are here:
>> https://github.com/panda-re/panda/compare/master...vigliag:tcgtaint
>>
>> Best regards,
>> Gabriele
>>
>
>
> --
> Brendan Dolan-Gavitt
> Assistant Professor, Department of Computer Science and Engineering
> NYU Tandon School of Engineering
>