2012-07-18

Notes on x86 Opcodes

Here are some notes I made while trying to understand the opcodes in this fun article on generating small ELF binaries. Hopefully they will be of use.

  00000000 B801000000        mov        eax, 1
  00000005 BB2A000000        mov        ebx, 42
  0000000A CD80              int        0x80

In the above snippet:
  • B8 is the mov instruction with an immediate operand, in which the destination register is encoded into the opcode itself. The opcode is B8+r where r = 0..7, 0 being ax/eax depending on the operand size (the 8-bit form, al, uses a different opcode, B0+r).
    • One might naively expect then that the next opcode to move 42 into ebx would be B9, and one would be wrong. The registers are not numbered alphabetically. The next 2 registers in sequence are cl/cx/ecx for r=1 and dl/dx/edx for r=2, and only then do we get to bl/bx/ebx for r=3. This explains why the second mov instruction has opcode BB.
      • Note also the little-endian encoding of bytes in the operand: the 32-bit value 42 (0x2A) is stored as 2A 00 00 00.
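
The B8+r encoding described above can be sketched in a few lines of Python. This is only an illustration, not a real assembler; the names REG32 and encode_mov_r32_imm32 are made up for this example.

```python
import struct

# Register numbers used by the +r opcode encodings (32-bit names).
# Note the order: eax, ecx, edx, ebx, not alphabetical.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def encode_mov_r32_imm32(reg, imm):
    """Encode `mov reg, imm` as opcode B8+r plus a 4-byte little-endian immediate."""
    r = REG32.index(reg)
    return bytes([0xB8 + r]) + struct.pack("<I", imm & 0xFFFFFFFF)

print(encode_mov_r32_imm32("eax", 1).hex().upper())   # B801000000
print(encode_mov_r32_imm32("ebx", 42).hex().upper())  # BB2A000000
```

The two printed strings match the byte columns of the first two lines of the listing.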

  00000000 31C0              xor        eax, eax
  00000002 40                inc        eax
  00000003 B32A              mov        bl, 42
  00000005 CD80              int        0x80

In the above snippet:
  • 31 encodes xor
    • C0 is the MOD-REG-R/M byte, which has the format:
      • 0..2: R/M (register or memory, 0..7)
      • 3..5: REG (register, 0..7)
      • 6..7: MOD (addressing mode, 0..3)
    • In this case, we have:
      • MOD=11 (binary, i.e. 3)
      • REG=0
      • R/M=0
      • Which says: addressing mode is register addressing mode, and for opcode 31 the destination is eax (R/M=0) and the source is eax (REG=0).
  • 40 is inc, encoded as 40+r. Like B8+r, the register to modify is encoded in the opcode itself, so r=0 gives inc eax.
  • B3 is B0+r with r=3. This is like B8+r except B8+r deals with 16/32-bit data while B0+r deals with 8-bit data.
    • Recall again that bl/bx/ebx is the 4th register (r=3), not the second.
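
As a rough sketch of the decoding steps above, the following Python splits a MOD-REG-R/M byte into its three fields and looks up the register numbers hidden in the 40+r and B0+r opcodes. The helper name split_modrm and the two register tables are assumptions made for this example.

```python
# Register names indexed by their 3-bit register number.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def split_modrm(byte):
    """Split a MOD-REG-R/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # bits 6..7
    reg = (byte >> 3) & 0b111  # bits 3..5
    rm = byte & 0b111          # bits 0..2
    return mod, reg, rm

mod, reg, rm = split_modrm(0xC0)
print(mod, reg, rm)          # 3 0 0
print(REG32[rm], REG32[reg]) # eax eax

# The +r opcodes carry the register number in their low 3 bits.
print(REG32[0x40 & 0b111])   # eax (40 = inc eax)
print(REG8[0xB3 & 0b111])    # bl  (B3 = mov bl, imm8)
```
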
Instruction information was sourced from the very useful X86 Opcode and Instruction Reference.

Cheers,
Steve