[panda-users] New Taint System & Network Tainting

Tue Apr 28 13:34:06 EDT 2015

Hi Eike,

I'd advise you to use the new taint system (taint2), although some work 
is definitely needed to bring the old page directory-based system in 
line with the current code. Concretely, someone needs to write a new 
abstract superclass that both FastShad and ShadDir** implement, and then 
hide the concrete shadow type from the taint ops behind that superclass. 
A small amount of work is also needed in two other files: taint_ops.cpp 
needs to handle IO reads and writes correctly (it currently throws them 
away), and llvm_taint_lib.cpp needs to produce the correct code when it 
sees an IO write.
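
To make the superclass idea concrete, here is a rough, self-contained 
sketch. It is not PANDA code: Label, Shad, FlatShad, SparseShad, 
taint_copy and demo_io_to_ram are all hypothetical names made up for 
illustration, and the real FastShad and directory-based classes have 
different interfaces. The point is only that once the taint ops talk to 
an abstract interface, a sparse shadow can cover IO addresses that are 
too large for a flat allocation, and copying between the two memories 
becomes an ordinary op.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical label type; 0 means "untainted". Real taint2 labels are richer.
using Label = uint32_t;

// Hypothetical abstract superclass that every shadow memory would implement.
class Shad {
public:
    virtual ~Shad() = default;
    virtual Label get(uint64_t addr) const = 0;
    virtual void set(uint64_t addr, Label label) = 0;
};

// Flat array in the spirit of FastShad: fast lookups, but the whole
// address range must be allocated up front.
class FlatShad : public Shad {
public:
    explicit FlatShad(uint64_t size) : labels_(size, 0) {}
    Label get(uint64_t addr) const override {
        return addr < labels_.size() ? labels_[addr] : 0;
    }
    void set(uint64_t addr, Label label) override {
        if (addr < labels_.size()) labels_[addr] = label;
    }
private:
    std::vector<Label> labels_;
};

// Sparse map standing in for a directory-based shadow: only tainted
// addresses take up space, so very large IO addresses are not a problem.
class SparseShad : public Shad {
public:
    Label get(uint64_t addr) const override {
        auto it = labels_.find(addr);
        return it == labels_.end() ? 0 : it->second;
    }
    void set(uint64_t addr, Label label) override {
        if (label == 0) labels_.erase(addr);  // keep the map sparse
        else labels_[addr] = label;
    }
private:
    std::unordered_map<uint64_t, Label> labels_;
};

// A taint op written against the interface only, so source and
// destination may be different kinds of shadow memory.
void taint_copy(Shad &dst, uint64_t dst_addr,
                const Shad &src, uint64_t src_addr, uint64_t len) {
    for (uint64_t i = 0; i < len; i++) {
        dst.set(dst_addr + i, src.get(src_addr + i));
    }
}

// Taints one byte at an IO address above 32 bits, copies it into "RAM",
// and returns the label that ends up in the RAM shadow.
Label demo_io_to_ram() {
    FlatShad ram(4096);
    SparseShad io;
    const uint64_t io_addr = 0x100000040ULL;  // does not fit in 32 bits
    io.set(io_addr, 7);
    assert(io.get(io_addr) == 7);
    taint_copy(ram, 0x100, io, io_addr, 1);
    return ram.get(0x100);
}
```

In this sketch the virtual calls cost something on every access, so a 
real implementation would probably want to keep the flat fast path for 
RAM and only pay for the indirection where it is needed.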

I have been meaning to port over network/HD taint for a while but just 
haven't had time, since it hasn't really been a high priority. Obviously 
if you implement that and submit a pull req I'd be happy to merge.

The reason we're trying to move away from taint1 is that it's very slow 
(we've seen 2-3x speedups from taint2) and frequently runs out of RAM.

On 04/28/2015 05:03 AM, Eike Siewertsen wrote:
> Hi, again,
>
> so after some more research I realised that implementing IO memory
> with the new FastShad is not trivial. From what I can see, that is
> mainly because the IO memory addresses are larger than 32 bits, or too
> large to be allocatable with mmap. Using the old hierarchical shadow
> memory would require implementing copying between the two memories.
>
> Which means I think I will stick with the old taint system. Now I have
> a further question: I can get the instruction number of tainted
> instructions, but for symbolic execution, what I am looking for is the
> /actual/ LLVM instructions referenced, in addition to the values of
> any parameters. Does anyone have any hints for how I can achieve that?
> I was looking at llvm_trace for a while now, but it dumps the entire
> basic blocks. I can relate the output instructions from
> tainted_instructions in qemu's "-d in_asm", but can I access the
> recompiled LLVM instruction directly, too?
>
> Thank you for your time,
>
> Eike
>
> On 26 April 2015 at 18:43, Eike Siewertsen <eikes at student.chalmers.se> wrote:
>> Hi,
>>
>> first of all, thanks a lot for all the work you put into Panda & co.,
>> it is an incredible platform.
>>
>> For my master thesis I am looking to taint bytes received over network
>> with the new taint system (taint2) and later perform symbolic
>> execution on the executed instructions and collect constraints on the
>> network input on branches - but that is for later. Right now I read up
>> on taint2 and played around with it. I discovered that the buffer
>> pointed to by handle_packet is actually special IO memory, something
>> which taint2 apparently doesn't support yet.
>>
>> I am wondering what the best next step would be:
>>
>> Do you think it's feasible for me to try to implement that
>> functionality in taint2 (based on how the old taint plugin does it),
>> or are there significant difficulties involved? From what I gather the
>> only difference is that there is a separate IO shadow memory - but at
>> what point do the received tainted bytes propagate into the RAM of the
>> receiving application? Is this happening in recv()?
>>
>> Could I just have a callback on the recv syscall and taint the buffer
>> directly in memory?
>>
>> Or would it be best if I just use the old taint plugin for this for now?
>>
>> Thank you very much for your time and any help,
>>
>> Eike