Assignment: RE

Changelog:

Your Task

  1. Download the executable version of a “Go Fish” game written in C (based on the version distributed with OpenBSD, which is based on a version written by Muffy Brockey before 1990). Also download the source code here.

  2. Answer the questions about the executable fish.exe linked above on the answer sheet. (Do not answer the questions about a recompiled version of the executable.)

  3. Download the mystery executable here.

  4. Answer the questions about the executable mystery.exe linked above on the answer sheet.

Hints for examining the executables

  1. The objdump command is my recommended way of decoding an executable. A command like

     objdump -sRrd --file-offset something.exe  > output.txt
    

    will provide a fairly complete dump of information about something.exe and write it to output.txt.

    Each of the options s, R, r, and d enables a different kind of data to be included (so if you omit one of them, you’ll get output with less information). See objdump --help or man objdump for information about these options and other options we do not use.

    See “Interpreting objdump output” below for information on what you should expect from this “objdump” output.

  2. Some other useful objdump options include:

    • objdump -x to include all “program header” information

  3. The way programs are typically built on Linux, execution of the program does not actually start in main. It starts in a function called _start that is provided by the compiler; the address of _start is the start address specified in the executable's header. This function calls a special function in the C standard library called __libc_start_main. It is this function that actually calls main and takes care of exiting when main returns.
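
The start address mentioned in item 3 can also be read directly from the file. As a minimal sketch, assuming a 64-bit little-endian ELF file like the elf64-x86-64 executables in this assignment, the following Python reads the e_entry field, which sits at byte offset 24 of the file. The function name elf64_entry_point is hypothetical, chosen only for illustration.

```python
import struct

def elf64_entry_point(header: bytes) -> int:
    """Return e_entry from the first 64 bytes of a 64-bit little-endian ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:
        raise ValueError("expected 64-bit little-endian ELF")
    # Layout: e_ident (16 bytes), e_type (2), e_machine (2), e_version (4),
    # then e_entry (8 bytes) at offset 24.
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry

# Usage (assumes fish.exe is in the current directory):
# with open("fish.exe", "rb") as f:
#     print(hex(elf64_entry_point(f.read(64))))
```

The value printed should be the address of _start in the disassembly, not the address of main. objdump -f reports the same value as the “start address”.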

Interpreting objdump output

When run using the command we suggest above, the objdump output includes the following parts in this order:

File format information

Information like “elf64-x86-64” about the format of the executable.

Section contents

Objdump will dump the contents of “sections” in a format like:

      Contents of section .text:
 4011d0 f30f1efa 41544c8d 257b1000 005589fd  ....ATL.%{...U..
 4011e0 31ff5348 89f3e865 ffffff48 89c7e88d  1.SH...e...H....
 4011f0 ffffff4c 89e24889 de89efe8 a0ffffff  ...L..H.........

    

In this example:

  1. “.text” is the name of the section. Generally, the section name will indicate its purpose. “text” usually means the section containing machine code.

  2. The leftmost column indicates the address (in hexadecimal) where this data will be loaded in memory.

  3. The next four columns are the hexadecimal values actually placed in memory. These values are written in the order the bytes appear in memory, so the value 0x12345678 in little endian will appear as 78563412. The final columns are the same values represented as characters, except a period (.) is used to represent bytes which do not correspond to a printable ASCII character.
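
To make the layout of a dump line concrete, here is a small sketch in Python that splits the first line of the example above into its address and raw bytes, then shows how a 4-byte little-endian value is read back from bytes in memory order. The helper name parse_dump_line is hypothetical, and the sketch assumes a full 16-byte line in the format shown (address, four 8-digit hex groups, then the character columns).

```python
def parse_dump_line(line):
    """Split an 'objdump -s' line into (load address, bytes in memory order)."""
    fields = line.split()
    address = int(fields[0], 16)
    # Assumes a full line: the four fields after the address are hex groups.
    data = bytes.fromhex("".join(fields[1:5]))
    return address, data

address, data = parse_dump_line(
    " 4011d0 f30f1efa 41544c8d 257b1000 005589fd  ....ATL.%{...U.."
)
print(hex(address), data[:4].hex())  # 0x4011d0 f30f1efa

# Bytes are listed in memory order, so a little-endian integer reads "backwards":
print(hex(int.from_bytes(bytes.fromhex("78563412"), "little")))  # 0x12345678
```

The first four bytes, f3 0f 1e fa, are the same endbr64 encoding that appears in the disassembly later in the output.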

Disassembly

For each section that is marked as containing machine code, objdump will attempt to turn the machine code into assembly.

The assembly will be split into what objdump guesses are the functions (but it may not always correctly identify where a function starts and ends).

You may notice some name@plt functions in the disassembly that are unusual and deserve special discussion:

      00000000004011b0 <exit@plt> (File Offset: 0x11b0):
  4011b0:       f3 0f 1e fa             endbr64 
  4011b4:       f2 ff 25 b5 2e 00 00    bnd jmpq *0x2eb5(%rip)        # 404070 <exit@GLIBC_2.2.5>
  4011bb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

    
  1. These are an artifact of dynamic linking. The plt stands for “procedure linkage table”, and this function’s purpose is to stand in for a standard library function. In this case, that function is “exit”.

  2. The instruction bnd jmpq *0x2eb5(%rip) reads a pointer from memory at 0x2eb5 + %rip, then jumps to the location that pointer refers to. The comment inserted by objdump, # 404070 <exit@GLIBC_2.2.5>, indicates that 0x2eb5 + %rip will be the address 0x404070, and that this address should contain a pointer to the symbol exit@GLIBC_2.2.5, which will be loaded from another file.

    Since the code for exit is not included in the executable, that pointer will be filled in as part of running the executable (either when loading the executable or sometime later):

    1. Running objdump -R exec.exe will give a list of the “dynamic relocation records” in the executable that will be fixed when the executable is loaded. For example:

      fish.exe:     file format elf64-x86-64
      
      DYNAMIC RELOCATION RECORDS
      OFFSET           TYPE              VALUE 
      ...
      0000000000404070 R_X86_64_JUMP_SLOT  exit@GLIBC_2.2.5
      ...
      

      includes an indication that the value stored at address 0x404070 should be replaced with the address of exit@GLIBC_2.2.5, and the TYPE field R_X86_64_JUMP_SLOT indicates how that address should be formatted.

    2. Running objdump -p exec.exe will show general executable headers, which include a “DYNAMIC” section. This section lists where external functions like exit@GLIBC_2.2.5 can be found. For example:

      fish.exe:     file format elf64-x86-64
      
      ...
              
      Dynamic Section:
        NEEDED               libc.so.6
        INIT                 0x0000000000401000
        FINI                 0x0000000000401e28
      ...
      

      indicates that libc.so.6 is the only file this executable expects to find extra functions in.
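
The address in objdump's comment can be checked by hand from the instruction bytes. The sketch below, in Python, decodes the 4-byte little-endian displacement at the end of the bnd jmpq instruction and adds it to the address of the following instruction, which is the value %rip holds while the jump executes. The function name rip_relative_target is hypothetical, and the sketch assumes the displacement is the last four bytes of the instruction, as it is in the exit@plt example.

```python
def rip_relative_target(insn_address: int, insn_bytes: bytes) -> int:
    """Compute the address referenced by a %rip-relative operand.

    Assumes the signed 32-bit displacement is the final 4 bytes of the
    instruction, which holds for the jmpq in the exit@plt example.
    """
    displacement = int.from_bytes(insn_bytes[-4:], "little", signed=True)
    next_insn = insn_address + len(insn_bytes)  # value of %rip during execution
    return next_insn + displacement

# 4011b4: f2 ff 25 b5 2e 00 00    bnd jmpq *0x2eb5(%rip)
slot = rip_relative_target(0x4011b4, bytes.fromhex("f2ff25b52e0000"))
print(hex(slot))  # 0x404070
```

The result, 0x404070, is the same address that appears in the OFFSET column of the R_X86_64_JUMP_SLOT record for exit@GLIBC_2.2.5.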

“normal functions”

An example of disassembly you might see for a function _init would look like:

      0000000000401000 <_init>  (File Offset: 0x1000):
  401000:       f3 0f 1e fa             endbr64 
  401004:       48 83 ec 08             sub    $0x8,%rsp
  401008:       48 8b 05 d9 2f 00 00    mov    0x2fd9(%rip),%rax        # 403fe8 <__gmon_start__> (File Offset: 0x3fe8)
  40100f:       48 85 c0                test   %rax,%rax
  401012:       74 02                   je     401016 <_init+0x16>
  401014:       ff d0                   callq  *%rax
  401016:       48 83 c4 08             add    $0x8,%rsp
  40101a:       c3                      retq   

    

In this example:

  1. The first line indicates that there is a label called _init which has the address 0x401000 when the executable is loaded.

    (If the executable could be loaded at multiple addresses (called a “position independent executable”), then this address will make sense relative to other addresses in the objdump output, but most likely the operating system will choose another address when the executable is run.)

    The “File Offset” is the number of bytes into the executable at which the machine code is located. The machine code is stored contiguously, so if 0x401000 is at offset 0x1000, then 0x401008 is at offset 0x1008.

  2. Each following line is an instruction. The value before the colon indicates the memory address in hexadecimal of the first byte of the instruction. The hexadecimal values after the colon are the bytes of the instruction in hexadecimal. Following this is the disassembled instruction itself. (Some long instructions may require multiple lines for their hexadecimal values.)

    Within the disassembled instructions, objdump attempts to provide information about addresses in addition to showing the addresses encoded in the instruction. In cases where the label is exactly equal to the address, like for the label __gmon_start__ in the example above, the format is address <LABEL> with the address in hexadecimal. In cases where the address does not correspond to a label, the format is address <LABEL+offset>. For example 401016 <_init+0x16> indicates the address 0x401016, which is 0x16 bytes after the label _init.

    On 64-bit x86, some instructions specify an address relative to %rip. %rip represents the “instruction pointer”, which in 2150 and 3330 we have called the “program counter”. While an instruction executes, %rip holds the address of the next instruction, so 0x2fd9(%rip) means memory 0x2fd9 bytes after the end of the current instruction. objdump’s disassembly includes a comment indicating what address is computed. In the case of the example above, the mov at 0x401008 is 7 bytes long, so the address is 0x40100f + 0x2fd9 = 0x403fe8, which is the address of __gmon_start__.

  3. No dynamic linking is involved in this example. When it is involved, for instance when you see symbols whose names contain @plt, see the discussion of name@plt functions above.
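
The address arithmetic described in this list can be reproduced in a few lines. The following Python sketch uses only numbers from the _init example; the constant and function names are hypothetical. It assumes, as stated above, that the machine code is stored contiguously, so one address and file offset pair fixes the mapping for nearby addresses.

```python
INIT_ADDRESS = 0x401000      # address of the _init label
INIT_FILE_OFFSET = 0x1000    # "File Offset" objdump printed for _init

def file_offset(address: int) -> int:
    """Map a memory address near _init to its byte offset in the executable."""
    return address - INIT_ADDRESS + INIT_FILE_OFFSET

def label_plus_offset(address: int, label: str, label_address: int) -> str:
    """Format an address the way objdump annotates it."""
    delta = address - label_address
    return f"<{label}>" if delta == 0 else f"<{label}+{delta:#x}>"

print(hex(file_offset(0x401008)))                          # 0x1008
print(label_plus_offset(0x401016, "_init", INIT_ADDRESS))  # <_init+0x16>
```

The same subtraction works for any function whose header line shows both an address and a file offset. It may not hold across different sections, which can be placed at different offsets in the file.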