[panda-users] Obtaining LLVM bitcode from PANDA plugin

Wed Feb 18 11:13:15 EST 2015

The translation to LLVM is in fact a bit messy.  If you're running x86,
you have to consider that each x86 instruction is decomposed into
multiple TCG ops.  Those TCG ops are then likely to be decomposed into
even more LLVM instructions.  This is required in order to generate LLVM
code that is semantically correct enough to execute.

It could potentially be optimized further, especially for user mode (see
the optimizellvm branch).  This is quite a bit more difficult with
whole-system mode as a mechanism is needed to maintain consistency with
QEMU's CPUState in the event of exceptions/interrupts.  We haven't made an
effort to do this.

The easiest way to see the differences in code translation is to pass QEMU
a '-d in_asm,op,llvm_ir' argument while running with LLVM translation
enabled, then check the /tmp/qemu.log file.
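
To get a rough feel for how much each guest instruction expands, the log
can be split by section and the lines counted.  What follows is only a
sketch and not part of PANDA: split_sections, summarize and SAMPLE are
made-up names, and it assumes sections start with "IN:" and "OP:" header
lines as stock QEMU prints for in_asm and op.  The header used for the
llvm_ir output is not assumed here; check the actual log and add it to
the headers tuple.

```python
from collections import defaultdict

# Made-up sample shaped like a QEMU log; not real PANDA output.
SAMPLE = """IN:
0x00001000:  add eax, ebx

OP:
 mov_i32 tmp0,eax
 add_i32 tmp0,tmp0,ebx
 mov_i32 eax,tmp0
"""


def split_sections(log_text, headers=("IN:", "OP:")):
    """Group non-empty log lines under the most recent header line.

    The header prefixes are an assumption about the log layout; pass a
    different tuple if the log uses other markers.
    """
    sections = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        stripped = line.strip()
        matched = next((h for h in headers if stripped.startswith(h)), None)
        if matched is not None:
            # A header line starts a new section and is not counted itself.
            current = matched
            continue
        if current is not None and stripped:
            sections[current].append(stripped)
    return dict(sections)


def summarize(log_text, headers=("IN:", "OP:")):
    """Return the number of content lines seen under each header."""
    return {h: len(lines) for h, lines in split_sections(log_text, headers).items()}


print(summarize(SAMPLE))
```

On a real run, read the text with open("/tmp/qemu.log").read() and pass it
to summarize() in place of SAMPLE.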

Hopefully this helps,
-Ryan

On 2/18/15, 10:55 AM, "Federico "fox" Scrinzi" <fox91 at anche.no> wrote:

>On 18/02/2015 16:40, Whelan, Ryan - 0559 - MITLL wrote:
>> You can also check out the llvm_trace plugin, which also includes all
>> important QEMU helper functions.  There's a chance it's in a state of
>> disrepair since I haven't used it in a while, but definitely worth a
>> try.
>> 
>> If you're interested in translation, that occurs (thanks to S2E) in
>> qemu/tcg/tcg-llvm.cpp.
>
>Thank you Ryan,
>I had a look at the llvm_trace plugin but I cannot fully understand it;
>I lack quite a bit of knowledge about QEMU internals and LLVM.
>
>
>> From: frank adkins <brisngrfreak at hotmail.com>
>> As far as I know, if you're just looking for a string representation
>> then I've used the following:
>> 
>> 1. to dump the IR straight to stdout:
>> 2. to do something more precise with it:
>
>Thanks Frank! That was really helpful :)
>
>
>Another question (maybe silly): I see some PANDA/QEMU-related code in
>the LLVM translation (e.g. calls to helper_panda_insn_exec). I thought
>the translation was at a lower level than the PANDA stuff. Why is
>that? I guess it is not possible to get "cleaner" LLVM code, is it? My
>final goal would be to create less-obfuscated code/CFG from the recording
>of an obfuscated program, so any way to get more human-friendly LLVM
>code would be appreciated.
>
>
>Cheers,
>Federico
>
>-- 
>f.
>
>https://github.com/volpino
>