[panda-users] How can I get the original assembly code(opcode)?

Brendan Dolan-Gavitt brendandg at gatech.edu
Tue Aug 18 08:49:36 EDT 2015


Just making a guess based on the PC addresses you're posting, I
*think* the part you're tracing is probably in library code (the main
executable usually gets a lower load address; the default is 0x10000).
If you want to trace only instructions in mybin.exe you'd need to find
out where it's loaded into memory and then restrict the logging so
that it only prints when the PC is in those ranges.
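
As a rough illustration of that filtering step, here is a minimal sketch in
plain C. It is not PANDA code: module_range_t, pc_in_module and
log_if_in_module are hypothetical names, and the base and size of the image
are assumed to have been found by some other means. In a plugin, the same
check would go at the top of the instruction callback, before any logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of where a module is mapped in the guest. */
typedef struct {
    uint32_t base;  /* load address of the image (assumed to be known) */
    uint32_t size;  /* size of the mapped image in bytes */
} module_range_t;

/* Return true if pc falls inside [base, base + size). */
static bool pc_in_module(const module_range_t *m, uint32_t pc)
{
    assert(m != NULL);
    /* Subtracting first avoids overflow if base + size would wrap around. */
    return pc >= m->base && (pc - m->base) < m->size;
}

/* Write one trace line only when pc belongs to the module of interest.
 * Returns the number of lines written (0 or 1). */
static int log_if_in_module(FILE *out, const module_range_t *m, uint32_t pc)
{
    if (!pc_in_module(m, pc)) {
        return 0;  /* library or kernel code: skip it */
    }
    fprintf(out, "PC = %08x\n", (unsigned)pc);
    return 1;
}
```

With a placeholder range such as { 0x00010000, 0x00020000 }, a PC like
77d075f3 from the log below would be skipped, and only addresses inside the
image would be printed.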

-Brendan

On Mon, Aug 17, 2015 at 11:08 PM, InGap Jeong <laughfool at gmail.com> wrote:
> Thanks for the replies.
>
> My goal is to get an execution trace log for mybin.exe, similar to the trace
> log the PIN tool produces.
>
> ex) 0x804826 mov eax, ebx   (eax = 0x10101010 , ebx = 0x20202020)
>     0x804828 mov ebx, ecx   (eax = 0x20202020 , ebx = 0x20202020, ecx = 0x30303030)
>     ...
>
>
> First, I need mybin.exe's execution flow (its opcodes).
>
> 1) When I read the opcode at env->eip (in PANDA_CB_INSN_EXEC), it seems to
> have been translated by PANDA.
> 2) So I tried reading the opcode at pc (in PANDA_CB_INSN_TRANSLATE), as Igor
> suggested, but the results are the same.
>
> I wrote a test plugin with the following code.
>
> // Only trace the target process (pid == target_pid).
> if (get_pid(env, eproc) == 0x73c) {
>     unsigned char buf[20] = {0};
>     if (types == 1) {
>         // Called from PANDA_CB_INSN_TRANSLATE: read at the pc argument.
>         fprintf(plugin_log, "[T] ");
>         panda_virtual_memory_rw(env, pc, buf, 20, 0);
>     } else {
>         // Called from PANDA_CB_INSN_EXEC: read at env->eip.
>         fprintf(plugin_log, "[E] ");
>         panda_virtual_memory_rw(env, env->eip, buf, 20, 0);
>     }
>
>     // Compute the instruction length, then disassemble that many bytes.
>     int inst_len = InstructionLength((BYTE *) buf);
>     panda_disas(plugin_log, buf, inst_len);
>     if (inst_len > 0) {
>         // Print the raw instruction bytes as one hex string.
>         fprintf(plugin_log, "opcode = 0x");
>         int k;
>         for (k = 0; k < inst_len; k++) {
>             fprintf(plugin_log, "%02x", buf[k]);
>         }
>         fprintf(plugin_log, "\n");
>     }
>     fprintf(plugin_log, "PC = " TARGET_FMT_lx " EIP = " TARGET_FMT_lx
>             " EAX = " TARGET_FMT_lx "... \n", pc, env->eip, env->regs[R_EAX]...);
> }
>
>
>
> and the result is:
>
> [T] 0x7f9383cc0640:  mov    0xa8(%rsi),%eax
> opcode = 0x8b86a8000000
> PC = 77d075f3 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
> [T] 0x7f9383cc0640:  cmp    %ebx,%eax
> opcode = 0x3bc3
> PC = 77d075f9 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
> [T] 0x7f9383cc0640:  je     0x7f9383cc06a0
> opcode = 0x745e
> PC = 77d075fb EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
> [E] 0x7f9383cc0670:  mov    0xa8(%rsi),%eax
> opcode = 0x8b86a8000000
> PC = 77d075f3 EIP = 77d075f3 EAX = 005fc8b8 EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
> [E] 0x7f9383cc0670:  cmp    %ebx,%eax
> opcode = 0x3bc3
> PC = 77d075f9 EIP = 77d075f9 EAX = 00144d6c EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
> [E] 0x7f9383cc0670:  je     0x7f9383cc06d0
> opcode = 0x745e
> PC = 77d075fb EIP = 77d075fb EAX = 00144d6c EDI = 00000001 EBP = 0012fea8 ESP = 0012fe7c
>
> As you can see, the opcodes from PANDA_CB_INSN_TRANSLATE and
> PANDA_CB_INSN_EXEC are the same.
>
> I assumed this output is the binary's execution flow on the guest machine,
> so I compared it with the original binary code (mybin.exe), but the two do
> not match. I don't know why. Is something wrong? How can I get the original
> execution flow?
>
> 2015-08-18 2:03 GMT+09:00 Brendan Dolan-Gavitt <brendandg at gatech.edu>:
>>
>> "pc" and env->eip can be different! QEMU typically only updates
>> env->eip every basic block. The insn_exec callback will provide the
>> precise program counter value as its argument though (it stores it at
>> translation time so it can be passed in).
>>
>> Manolis is right that this won't give you the original binary back.
>> One thing you can do is take a memory snapshot during replay and then
>> use Volatility to extract the binary image from memory. This will
>> preserve the headers, data sections, etc. However, depending on the
>> amount of RAM available, some pages might be swapped out.
>>
>> If what you're looking to do is just disassemble something, you can
>> use the recently added panda_disas function:
>>
>> void panda_disas(FILE *out, void *code, unsigned long size)
>>
>> Alternatively, if you want to have some machine-parseable description
>> of the disassembled instruction, you can use distorm; an example of
>> that can be found in the callstack_instr plugin.
>>
>> -Brendan
>>
>> On Mon, Aug 17, 2015 at 9:08 AM, Manolis Stamatogiannakis
>> <mstamat at gmail.com> wrote:
>> > Igor, are you sure that the "pc" argument and "env->eip" will contain
>> > different values? I'd guess that "pc" is provided as a convenience so that
>> > you can avoid architecture-specific #ifdef macros in your plugin code
>> > ("env->eip" is x86 specific).
>> >
>> > InGap, could you elaborate on what you are trying to achieve?
>> >
>> > Reconstructing mybin.exe from an execution trace is a non-trivial task.
>> > Even in the (unlikely) case you have full coverage of mybin.exe in the
>> > execution trace (i.e. every instruction in mybin.exe was executed at least
>> > once), the order of the instructions as executed may still differ from the
>> > order in which they appear in the binary. Moreover, executables are not
>> > plain instruction dumps. They contain a lot of structured information (see
>> > https://en.wikipedia.org/wiki/Portable_Executable) that you will not be
>> > able to recapture just by observing the execution.
>> >
>> > M.
>> >
>> >
>> >
>> > 2015-08-17 8:33 GMT+02:00 Igor R <boost.lists at gmail.com>:
>> >>
>> >> > I am trying to get the original assembly code (opcodes) of "mybin.exe"
>> >> > in a PANDA plugin
>> >> > (for tracing the binary's opcodes, registers, memory ...).
>> >> >
>> >> > Host OS : ubuntu x64
>> >> > Guest OS : windows xp x86
>> >> > Test binary : mybin.exe
>> >> >
>> >> > I read the opcode with the panda_virtual_memory_rw function in
>> >> > PANDA_CB_INSN_TRANSLATE.
>> >> > ex) panda_virtual_memory_rw(env, env->eip, buf, 20, 0);
>> >> >
>> >> > But it is not the same as the original assembly code of mybin.exe.
>> >> > It seems to have been translated by PANDA.
>> >>
>> >>
>> >>
>> >> Quoting from the documentation:
>> >> <<
>> >> insn_translate: called before the translation of each instruction
>> >>
>> >> Callback ID: PANDA_CB_INSN_TRANSLATE
>> >>
>> >> Arguments:
>> >>
>> >> CPUState *env: the current CPU state
>> >> target_ulong pc: the guest PC we are about to translate
>> >> >>
>> >>
>> >> So, if you need the opcode of the instruction being translated, you
>> >> should read the memory at the "pc" address (rather than env->eip).
>> >> _______________________________________________
>> >> panda-users mailing list
>> >> panda-users at mit.edu
>> >> http://mailman.mit.edu/mailman/listinfo/panda-users