[panda-users] Tracing instructions from kernel space

Brendan Dolan-Gavitt brendandg at nyu.edu
Sat Nov 10 00:25:25 EST 2018


Hmm, is this being done on a live VM rather than a replay? It's
possible you need to flush the translation cache with
panda_do_flush_tb() in your init_plugin. Enabling memory callbacks
changes the way guest code is translated (specifically, it ensures
that every memory access goes through the instrumented "slow path"
rather than an uninstrumented TLB lookup) – so code translated before
your plugin was loaded may still be cached without instrumentation.
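
For reference, the kind of init_plugin I have in mind looks roughly
like this (untested sketch; the exact virt_mem_after_write signature
varies a bit between PANDA versions):

    #include "panda/plugin.h"

    int on_virt_write(CPUState *env, target_ulong pc, target_ulong addr,
                      target_ulong size, void *buf);

    bool init_plugin(void *self) {
        panda_cb pcb;

        /* Force every guest load/store through the instrumented slow path */
        panda_enable_memcb();

        pcb.virt_mem_after_write = on_virt_write;
        panda_register_callback(self, PANDA_CB_VIRT_MEM_AFTER_WRITE, pcb);

        /* Drop anything translated before the plugin was loaded, so it
           gets re-translated with the memory instrumentation in place */
        panda_do_flush_tb();

        return true;
    }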

-Brendan

On Sat, Nov 10, 2018 at 12:09 AM, Jayashree Mohan
<jayashree2912 at gmail.com> wrote:
> To reproduce the problem I am facing, try this:
>
> I've enabled tracking of the entire memory region in the plugin here:
> https://github.com/williewillus/panda_scratchpad/blob/jaya_dev/personal_plugins/panda/plugins/writetracker/writetracker.cpp
>
> and the test workload file I am using is here:
> https://github.com/williewillus/panda_scratchpad/blob/jaya_dev/communication/write-cacheline.c
>
> It simply writes 64 B of data into the file.
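>
> The gist of that file is something like the following (simplified
> sketch; the path below is just an example, see the linked file for the
> real code):
>
>     #include <fcntl.h>
>     #include <string.h>
>     #include <unistd.h>
>
>     int main(void) {
>         char buf[64];
>         memset(buf, 'R', sizeof(buf));      /* 64 bytes of 'R' */
>
>         int fd = open("/mnt/pmem0/testfile",
>                       O_CREAT | O_WRONLY | O_TRUNC, 0644);
>         if (fd < 0)
>             return 1;
>
>         write(fd, buf, sizeof(buf));        /* single 64B write ... */
>         fsync(fd);                          /* ... followed by an fsync */
>         close(fd);
>         return 0;
>     }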
>
> What I am trying to do is:
> 1. Create a pmem device / ramdisk and mount it on ext4-dax at /mnt/pmem0
> 2. Start up the VM on QEMU and load the writetracker plugin above
> 3. On the VM, execute the workload above, which writes 64B followed by an
> fsync
> 4. Unload the writetracker plugin. By this point it will have traced the
> memory writes, along with their addresses and data, into the wt.out file
> on the host.
> 5. Parse that file to see the addresses and data using src/reader.cpp in
> the repo.
>
> On following the above steps, I see the behaviour I described: when I
> write 64B, I don't see the data (a series of RRRR...) in the parsed
> wt.out file. But if I just change this to write 63B instead, I can see it.
>
> Sample snippet from the parsed file when I write 63B:
>
> [pc 0xffffffff81456121] write to VA 41411000, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411008, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411010, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411018, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411020, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411028, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456121] write to VA 41411030, size 8, Data : RRRRRRRR
> [pc 0xffffffff81456149] write to VA 41411038, size 4, Data : RRRR
> [pc 0xffffffff8145615d] write to VA 4141103c, size 1, Data : R
> [pc 0xffffffff8145615d] write to VA 4141103d, size 1, Data : R
> [pc 0xffffffff8145615d] write to VA 4141103e, size 1, Data : R
>
> Though the output says VA, it is actually the physical address that is
> being printed, which is somewhere around 1GB. My emulated pmem device
> occupies 1GB to 1GB+128MB of physical memory - so this makes sense.
>
> Irrespective of the address, even if I run the workload on a ramdisk, I
> would expect to see 'RRRR...' in the trace file, which I don't whenever
> I write in multiples of aligned 64B.
>
> Hope this helps reproduce the issue. Let me know if you need more details
> or if I am making a mistake here.
>
> Thanks,
> Jayashree Mohan
>
> On Fri, Nov 9, 2018 at 10:19 PM Brendan Dolan-Gavitt <brendandg at nyu.edu>
> wrote:
>>
>> Hmm, something definitely seems odd – with this test program:
>>
>> https://gist.github.com/moyix/ed0d6dde9bc8164ff5e58030282d72af
>>
>> and then testing with -panda stringsearch:str="averylongstring" I can
>> see writes in the kernel in __copy_from_user_ll_nozero – but only if I
>> do a memcpy to a different userspace buffer first!
>>
>> Could you share the userland program you're using to test so I can
>> compare?
>>
>> -Brendan
>>
>> On Fri, Nov 9, 2018 at 9:06 PM, Jayashree Mohan <jayashree2912 at gmail.com>
>> wrote:
>> > Hi Brendan,
>> >
>> > We verified this by enabling tracing of the entire memory region rather
>> > than confining it to 1-2GB. However, the writes still cannot be traced.
>> > The behaviour we see is rather interesting. When we write something less
>> > than a cacheline (64B) using a write system call followed by fsync, it
>> > gets traced by PANDA via the PANDA_CB_VIRT_MEM_AFTER_WRITE callback.
>> > However, when we write anything in multiples of aligned cachelines, we
>> > don't see any memory write traces. For example, if I write 258B into a
>> > file, I can see only the last two bytes of data. This seems weird, as
>> > PANDA is not tracing full aligned cachelines. Do you have any idea why
>> > this could be happening?
>> >
>> > Thanks,
>> > Jayashree Mohan
>> >
>> > On Fri, Nov 9, 2018 at 11:41 AM Brendan Dolan-Gavitt <brendandg at nyu.edu>
>> > wrote:
>> >>
>> >> The only thing I can think of from looking at your code briefly is
>> >> your use of the physical address check to restrict logging to writes
>> >> in the 1-2GB range. Could it be that the kernel does copy_from_user at
>> >> the start and copies the data into someplace outside that range, then
>> >> writes it back with copy_to_user at the end?
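>> >>
>> >> That is, if the callback is filtering roughly like this (sketch; I'm
>> >> assuming a panda_virt_to_phys-style translation and the 1-2GB window):
>> >>
>> >>     int on_virt_write(CPUState *env, target_ulong pc, target_ulong addr,
>> >>                       target_ulong size, void *buf) {
>> >>         target_ulong pa = panda_virt_to_phys(env, addr);
>> >>         /* Writes whose physical target falls outside the window, e.g.
>> >>            an intermediate kernel buffer filled by copy_from_user, are
>> >>            silently dropped */
>> >>         if (pa < 0x40000000 || pa >= 0x80000000)
>> >>             return 0;
>> >>         /* ... log the write ... */
>> >>         return 0;
>> >>     }
>> >>
>> >> then any intermediate copy the kernel makes outside 1-2GB would never
>> >> show up in your log.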
>> >>
>> >> On Fri, Nov 9, 2018 at 11:01 AM, Jayashree Mohan
>> >> <jayashree2912 at gmail.com> wrote:
>> >> > Hi Brendan,
>> >> >
>> >> > Thanks for the reply.
>> >> >
>> >> > Take a look at the plugin here :
>> >> >
>> >> > https://github.com/williewillus/panda_scratchpad/blob/master/personal_plugins/panda/plugins/writetracker/writetracker.cpp
>> >> >
>> >> > We load this plugin, and in the VM, run a simple program that writes
>> >> > to the pmem device mounted within the memory region being tracked. I
>> >> > see memcpy writes being traced, but not the ones due to the write
>> >> > system call.
>> >> >
>> >> > I'll try checking if any of my writes originate in the kernel.
>> >> >
>> >> > Thanks,
>> >> > Jayashree Mohan
>> >> >
>> >> > On Fri, Nov 9, 2018 at 9:56 AM Brendan Dolan-Gavitt
>> >> > <brendandg at nyu.edu>
>> >> > wrote:
>> >> >>
>> >> >> Yes, it should definitely be tracing memory accesses in the kernel
>> >> >> (it traces all memory accesses on the system) – could you post your
>> >> >> plugin code?
>> >> >>
>> >> >> To debug, you may also want to simply log all memory accesses, along
>> >> >> with the current program counter and (optionally) whether or not
>> >> >> they originate in the kernel (using the panda_in_kernel API).
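>> >> >>
>> >> >> Something along these lines (untested sketch of such a callback):
>> >> >>
>> >> >>     int debug_write(CPUState *env, target_ulong pc, target_ulong addr,
>> >> >>                     target_ulong size, void *buf) {
>> >> >>         /* Print every write with its PC and whether it came from
>> >> >>            kernel mode */
>> >> >>         printf("pc=0x%lx addr=0x%lx size=%lu kernel=%d\n",
>> >> >>                (unsigned long)pc, (unsigned long)addr,
>> >> >>                (unsigned long)size, (int)panda_in_kernel(env));
>> >> >>         return 0;
>> >> >>     }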
>> >> >>
>> >> >> On Fri, Nov 9, 2018 at 10:35 AM, Jayashree Mohan
>> >> >> <jayashree2912 at gmail.com> wrote:
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I am using PANDA to trace all store instructions to an emulated
>> >> >> > pmem device. I do this by writing a plugin that registers callbacks
>> >> >> > on "PANDA_CB_VIRT_MEM_AFTER_WRITE" events. If I run a simple
>> >> >> > workload that does write() system calls followed by mmap and
>> >> >> > memcpy, I can see the callbacks being triggered for the user-space
>> >> >> > memcpy writes to the file, but not at any point during the write
>> >> >> > system call. Does PANDA allow tracing instructions from kernel
>> >> >> > space?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jayashree Mohan
>> >> >> > _______________________________________________
>> >> >> > panda-users mailing list
>> >> >> > panda-users at mit.edu
>> >> >> >
>> >> >> >
>> >> >> > http://mailman.mit.edu/mailman/listinfo/panda-users
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Brendan Dolan-Gavitt
>> >> >> Assistant Professor, Department of Computer Science and Engineering
>> >> >> NYU Tandon School of Engineering
>> >>
>> >>
>> >>
>> >> --
>> >> Brendan Dolan-Gavitt
>> >> Assistant Professor, Department of Computer Science and Engineering
>> >> NYU Tandon School of Engineering
>>
>>
>>
>> --
>> Brendan Dolan-Gavitt
>> Assistant Professor, Department of Computer Science and Engineering
>> NYU Tandon School of Engineering



-- 
Brendan Dolan-Gavitt
Assistant Professor, Department of Computer Science and Engineering
NYU Tandon School of Engineering


