[panda-users] How can I get the original assembly code(opcode)?

InGap Jeong laughfool at gmail.com
Mon Aug 17 23:08:50 EDT 2015


Thanks for the replies.

My goal is to get an execution trace log of mybin.exe, similar to the trace
log that the PIN tool produces.

ex) 0x804826 mov eax, ebx   (eax = 0x10101010, ebx = 0x20202020)
    0x804828 mov ebx, ecx   (eax = 0x20202020, ebx = 0x20202020, ecx = 0x30303030)
    ...
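
To make the target format concrete, here is a minimal sketch of how one such
trace line could be formatted. The struct reg_snapshot and the function
format_trace_line are hypothetical names used only for illustration; they are
not part of PANDA or PIN. In the example above the register values appear to
be the ones seen before the instruction executes, so the sketch assumes the
snapshot is taken at that point.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register snapshot; the field set is illustrative only. */
struct reg_snapshot {
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
};

/* Format one trace line: guest address, disassembly text, register values.
 * Returns what snprintf returns (characters needed, excluding the NUL). */
int format_trace_line(char *out, size_t out_size, uint32_t pc,
                      const char *disasm, const struct reg_snapshot *regs)
{
    return snprintf(out, out_size,
                    "0x%" PRIx32 " %s   (eax = 0x%08" PRIx32
                    ", ebx = 0x%08" PRIx32 ", ecx = 0x%08" PRIx32 ")",
                    pc, disasm, regs->eax, regs->ebx, regs->ecx);
}
```

The disassembly text is passed in as a string, so the sketch does not depend
on any particular disassembler.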


First, I need the execution flow (opcodes) of mybin.exe.

1) When I read the opcode at env->eip in PANDA_CB_INSN_EXEC, it seems to
have been translated by PANDA.
2) So I read the opcode at pc in PANDA_CB_INSN_TRANSLATE instead, as Igor
suggested, but the results are the same.

I wrote a test plugin with the following code.

// Only trace the target process (pid == target_pid).
if (get_pid(env, eproc) == 0x73c) {
    unsigned char buf[20] = {0};
    if (types == 1) {
        // Called from PANDA_CB_INSN_TRANSLATE: read guest memory at pc.
        fprintf(plugin_log, "[T] ");
        panda_virtual_memory_rw(env, pc, buf, 20, 0);
    } else {
        // Called from PANDA_CB_INSN_EXEC: read guest memory at env->eip.
        fprintf(plugin_log, "[E] ");
        panda_virtual_memory_rw(env, env->eip, buf, 20, 0);
    }

    // Compute the instruction length, then disassemble and dump the bytes.
    int inst_len = InstructionLength((BYTE *) buf);
    panda_disas(plugin_log, buf, inst_len);
    if (inst_len > 0) {
        int k;
        fprintf(plugin_log, "opcode = 0x");
        for (k = 0; k < inst_len; k++) {
            fprintf(plugin_log, "%02x", buf[k]);
        }
        fprintf(plugin_log, "\n");
    }
    fprintf(plugin_log, "PC = " TARGET_FMT_lx " EIP = " TARGET_FMT_lx
            " EAX = " TARGET_FMT_lx "... \n", pc, env->eip, env->regs[R_EAX]...);
}
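
The byte-dumping part of the plugin can also be tested on its own, outside
PANDA. The following is a minimal sketch under that assumption; format_opcode
is a hypothetical helper name, not a PANDA function. It writes the bytes into
a caller-supplied string instead of a FILE, and rejects a non-positive length
or a destination that is too small.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PANDA): write the first len bytes of buf
 * into out as "0x" followed by two lowercase hex digits per byte.
 * Returns the number of characters written, or -1 if len is not positive
 * or out cannot hold the result plus the terminating NUL. */
int format_opcode(const unsigned char *buf, int len, char *out, size_t out_size)
{
    if (len <= 0 || out_size < (size_t)len * 2 + 3) {
        return -1;
    }
    int pos = snprintf(out, out_size, "0x");
    for (int k = 0; k < len; k++) {
        pos += snprintf(out + pos, out_size - (size_t)pos, "%02x", buf[k]);
    }
    return pos;
}
```

Called with the six bytes 8b 86 a8 00 00 00, it yields the string
0x8b86a8000000.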



The result is:

[T] 0x7f9383cc0640:  mov    0xa8(%rsi),%eax
opcode = 0x8b86a8000000
PC = 77d075f3 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
[T] 0x7f9383cc0640:  cmp    %ebx,%eax
opcode = 0x3bc3
PC = 77d075f9 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
[T] 0x7f9383cc0640:  je     0x7f9383cc06a0
opcode = 0x745e
PC = 77d075fb EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
[E] 0x7f9383cc0670:  mov    0xa8(%rsi),%eax
opcode = 0x8b86a8000000
PC = 77d075f3 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
[E] 0x7f9383cc0670:  cmp    %ebx,%eax
opcode = 0x3bc3
PC = 77d075f9 EIP = 77d075f9 EAX = 00144d6c EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
[E] 0x7f9383cc0670:  je     0x7f9383cc06d0
opcode = 0x745e
PC = 77d075fb EIP = 77d075fb EAX = 00144d6c EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c

As you can see, the opcodes from PANDA_CB_INSN_TRANSLATE and
PANDA_CB_INSN_EXEC are the same.

I assumed this output is the binary's execution flow on the guest machine, so
I compared it with the original binary code (mybin.exe), but the two do not
match. I don't know why. Is something wrong? How can I get the
original execution flow?
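
One thing that may be worth checking before comparing against mybin.exe is
whether a logged PC lies inside the executable's own image at all, since a
process also runs code from the libraries it loads. This is only a possible
explanation and has not been confirmed here. A minimal sketch of such a check,
assuming the image's load base and size are known; struct image_range and
pc_in_image are hypothetical names, not PANDA APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where a module is mapped in the guest. */
struct image_range {
    uint32_t base;  /* load address of the image (assumed to be known) */
    uint32_t size;  /* size of the mapped image in bytes */
};

/* Return true if pc lies inside [base, base + size). Comparing the offset
 * against size avoids overflow when base + size would wrap around 32 bits. */
bool pc_in_image(const struct image_range *img, uint32_t pc)
{
    return pc >= img->base && (pc - img->base) < img->size;
}
```

For example, with a placeholder range of base 0x00400000 and size 0x10000
(illustrative values, not taken from mybin.exe), the PC 77d075f3 from the log
above would be reported as outside the image.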

2015-08-18 2:03 GMT+09:00 Brendan Dolan-Gavitt <brendandg at gatech.edu>:

> "pc" and env->eip can be different! QEMU typically only updates
> env->eip every basic block. The insn_exec callback will provide the
> precise program counter value as its argument though (it stores it at
> translation time so it can be passed in).
>
> Manolis is right that this won't give you the original binary back.
> One thing you can do is take a memory snapshot during replay and then
> use Volatility to extract the binary image from memory. This will
> preserve the headers, data sections, etc. However, depending on the
> amount of RAM available, some pages might be swapped out.
>
> If what you're looking to do is just disassemble something, you can
> use the recently added panda_disas function:
>
> void panda_disas(FILE *out, void *code, unsigned long size)
>
> Alternatively, if you want to have some machine-parseable description
> of the disassembled instruction, you can use distorm; an example of
> that can be found in the callstack_instr plugin.
>
> -Brendan
>
> On Mon, Aug 17, 2015 at 9:08 AM, Manolis Stamatogiannakis
> <mstamat at gmail.com> wrote:
> > Igor, are you sure that the "pc" argument and "env->eip" will contain
> > different values? I'd guess that "pc" is provided as a convenience so that
> > you can avoid architecture-specific #ifdef macros in your plugin code
> > ("env->eip" is x86 specific).
> >
> > InGap, could you elaborate on what you attempt to achieve?
> >
> > Reconstructing mybin.exe from an execution trace is a non-trivial task. Even
> > in the (unlikely) case you have full coverage of mybin.exe in the execution
> > trace (i.e. every instruction in mybin.exe was executed at least once), the
> > order of the instructions as executed still may be different than the order
> > they appear in the binary. Moreover, executables are not plain instruction
> > dumps. They contain a lot of structured information (see
> > https://en.wikipedia.org/wiki/Portable_Executable) that you will not be able
> > to recapture just by observing the execution.
> >
> > M.
> >
> >
> >
> > 2015-08-17 8:33 GMT+02:00 Igor R <boost.lists at gmail.com>:
> >>
> >> > I am trying to get the original assembly code (opcodes) of mybin.exe
> >> > in a PANDA plugin
> >> > (for tracing the binary's opcodes, registers, memory, ...).
> >> >
> >> > Host OS : ubuntu x64
> >> > Guest OS : windows xp x86
> >> > Test binary : mybin.exe
> >> >
> >> > I read the opcode using the panda_virtual_memory_rw function in
> >> > PANDA_CB_INSN_TRANSLATE.
> >> > ex) panda_virtual_memory_rw(env, env->eip, buf, 20, 0);
> >> >
> >> > But it is not the same as the original assembly code of mybin.exe.
> >> > It seems to have been translated by PANDA.
> >>
> >>
> >>
> >> Quoting from the documentation:
> >> <<
> >> insn_translate: called before the translation of each instruction
> >>
> >> Callback ID: PANDA_CB_INSN_TRANSLATE
> >>
> >> Arguments:
> >>
> >> CPUState *env: the current CPU state
> >> target_ulong pc: the guest PC we are about to translate
> >> >>
> >>
> >> So, if you need the opcode of the instruction being translated, you
> >> should read the memory from the "pc" address (rather than env->eip).
> >> _______________________________________________
> >> panda-users mailing list
> >> panda-users at mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/panda-users