re-tools

running example

$ file mystery
mystery: ELF 64-bit LSB pie executable, x86-64,
version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=9819a3cfb39d01ad2a376c54318f104139422a8f,
for GNU/Linux 3.2.0, stripped
  • LSB = little endian
  • pie = position-independent executable
  • interpreter = dynamic loader that loads this program and its shared libraries

aside: file(1)

$ man file
FILE(1)                      General Commands Manual                    FILE(1)

NAME
       file — determine file type

....

  • looks for “magic numbers” near beginning of file data
  • hand-managed database of common patterns

from file’s source

0   name        elf-le
>16 leshort     0       no file type,
!:mime  application/octet-stream
>16 leshort     1       relocatable,
!:mime  application/x-object
>16 leshort     2       executable,
!:mime  application/x-executable
>16 leshort     3       ${x?pie executable:shared object},

...
0   string      \177ELF     ELF
!:strength *2
>4  byte        0       invalid class
>4  byte        1       32-bit
>4  byte        2       64-bit
>5  byte        0       invalid byte order
>5  byte        1       LSB
>>0 use     elf-le
>5  byte        2       MSB
>>0 use     \^elf-le
>7  byte        0       (SYSV)
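
The rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not file's implementation: the function name describe_elf and the wording of its output are made up for this example, and HEADER is simply the first 32 bytes of the mystery hexdump shown on the next slide.

```python
import struct

# First 32 bytes of the mystery binary, copied from the hexdump output.
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e0001000000c060000000000000"
)

def describe_elf(header: bytes) -> str:
    """Hypothetical helper mimicking a few of file's ELF magic rules."""
    # Offset 0: the magic string \177ELF.
    if header[:4] != b"\x7fELF":
        return "not ELF"
    # Offset 4: class byte (1 = 32-bit, 2 = 64-bit).
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "invalid class")
    # Offset 5: byte order (1 = little endian, 2 = big endian).
    if header[5] == 1:
        order, fmt = "LSB", "<H"
    elif header[5] == 2:
        order, fmt = "MSB", ">H"
    else:
        return f"ELF {ei_class} invalid byte order"
    # Offset 16: 16-bit object file type, read in the file's own byte order.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    kind = {
        0: "no file type",
        1: "relocatable",
        2: "executable",
        3: "pie executable or shared object",
    }.get(e_type, "unknown type")
    return f"ELF {ei_class} {order} {kind}"

print(describe_elf(HEADER))
```

Unlike file, this sketch does not try to tell a PIE executable from a shared object; both have type 3.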

finding strings

$ hexdump -C mystery
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  c0 60 00 00 00 00 00 00  |..>......`......|
00000020  40 00 00 00 00 00 00 00  08 5e 03 00 00 00 00 00  |@........^......|
00000030  00 00 00 00 40 00 38 00  0d 00 40 00 1e 00 1d 00  |....@.8...@.....|
[... many more lines ...]
00000e60  00 5f 49 54 4d 5f 64 65  72 65 67 69 73 74 65 72  |._ITM_deregister|
00000e70  54 4d 43 6c 6f 6e 65 54  61 62 6c 65 00 5f 5f 67  |TMCloneTable.__g|
00000e80  6d 6f 6e 5f 73 74 61 72  74 5f 5f 00 5f 49 54 4d  |mon_start__._ITM|
00000e90  5f 72 65 67 69 73 74 65  72 54 4d 43 6c 6f 6e 65  |_registerTMClone|
00000ea0  54 61 62 6c 65 00 77 61  64 64 63 68 00 63 6c 65  |Table.waddch.cle|
00000eb0  61 72 6f 6b 00 6e 6f 65  63 68 6f 00 6d 76 70 72  |arok.noecho.mvpr|
[... many more lines ...]

exercise: heuristic?

  • could scan through pages of hexdump for something interesting…
  • good heuristic for automating this process?
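
One possible heuristic, and roughly the one the strings utility uses: report every run of at least N printable ASCII bytes. Below is a minimal sketch assuming N = 4; the name find_strings and the sample bytes are made up for illustration.

```python
import re

def find_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes (or tabs)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Illustrative sample: an ELF-like header, some symbol names, some junk.
sample = (
    b"\x7fELF\x02\x01\x01\x00"
    b"\x00waddch\x00clearok\x00noecho\x00"
    b"\x01\x02ab\x03"
)
print(find_strings(sample))
```

Here "ELF" and "ab" are dropped because they are shorter than four characters; raising min_len (as strings --bytes=40 does) trades completeness for less noise.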

strings utility (1)

$ strings mystery
/lib64/ld-linux-x86-64.so.2
*7lT1
A9B*
m8m7
_ITM_deregisterTMCloneTable
__gmon_start__
_ITM_registerTMCloneTable
waddch
clearok

        prints help
        identify object
        left
        down
        right

        save game
        quit

strings utility (2)

$ strings --bytes=40 mystery
character you want help for (* for all):
you feel a wrenching sensation in your gut
your armor appears to be weaker now. Oh my!
you feel a sting in your arm and now feel weaker
Level: %d  Gold: %-5d  Hp: %*d(%*d)  Str: %2d(%d)  Ac: %-2d  Exp: %d/%ld  %s
Ok, if you want to exit that badly, I'll have to allow it
Hello %s, just a moment while I dig the dungeon...
Sorry, but your terminal window has too few columns.
Sorry, but your terminal window has too few lines.
please specifiy a letter between 'A' and 'Z'

dedicated reverse engineering tools

  • toolkits specialized for reverse engineering

  • more complex analyses than objdump/strings


  • primary example I’ll look at: Ghidra

    • open source, developed by the US National Security Agency (NSA)
  • has some commercial competitors

in Ghidra

(after making new project, loading mystery file, Window > Defined Strings)

libraries

$ objdump --all-headers mystery

Dynamic Section:
  NEEDED               libncurses.so.6
  NEEDED               libtinfo.so.6
  NEEDED               libc.so.6

ncurses?

tinfo? (1)

tinfo? (2)

library calls

$ objdump --dynamic-syms mystery

mystery:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.3)  __ctype_toupper_loc
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) getenv
0000000000000000      DF *UND*  0000000000000000 (NCURSES6_5.0.19991023) wattrset
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) free
0000000000000000      DF *UND*  0000000000000000 (NCURSES6_TINFO_5.0.19991023) flushinp
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) localtime
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.34) __libc_start_main

0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) setuid

library calls (Ghidra)

finding library call uses

objdump --disassemble --dynamic-reloc:

0000000000005b00 <setuid@plt>:
    5b00:       f3 0f 1e fa             endbr64
    5b04:       f2 ff 25 fd d3 02 00    bnd jmp *0x2d3fd(%rip)
            # 32f08 <setuid@GLIBC_2.2.5>
    5b0b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

   2764f:       e8 ec e3 fd ff          call   5a40 <open@plt>
   27654:       89 05 fe 48 01 00       mov    %eax,0x148fe(%rip)        # 3bf58 <LINES@NCURSES6_TINFO_5.0.19991023+0x5244>
   2765a:       31 c0                   xor    %eax,%eax
   2765c:       e8 2f e1 fd ff          call   5790 <getuid@plt>
   27661:       89 c7                   mov    %eax,%edi
   27663:       31 c0                   xor    %eax,%eax
   27665:       e8 96 e4 fd ff          call   5b00 <setuid@plt>
   2766a:       31 c0                   xor    %eax,%eax
   2766c:       e8 cf e2 fd ff          call   5940 <getgid@plt>
   27671:       48 83 c4 08             add    $0x8,%rsp

disassembly issues (1)

.global main
main:
    call print_hello
    xorl %eax, %eax
    ret
.Lstr:
    .asciz "Hello."
print_hello:
    leaq .Lstr(%rip), %rdi  // RDI <- .Lstr address
    jmp puts

0000000000001139 <main>:
    1139:   e8 0a 00 00 00          call   1148 <print_hello>
    113e:   31 c0                   xor    %eax,%eax
    1140:   c3                      ret    
    1141:   48                      rex.W
    1142:   65 6c                   gs insb (%dx),%es:(%rdi)
    1144:   6c                      insb   (%dx),%es:(%rdi)
    1145:   6f                      outsl  %ds:(%rsi),(%dx)
    1146:   2e                      cs
    ...
0000000000001148 <print_hello>:
    1148:   48 8d 3d f2 ff ff ff    lea    -0xe(%rip),%rdi        # 1141 <main+0x8>
    114f:   e9 dc fe ff ff          jmp    1030 <puts@plt>

disassembly issues (2)

0000000000001139 <main>:
    1139:   e8 0a 00 00 00          call   1148 <print_hello>
    113e:   31 c0                   xor    %eax,%eax
    1140:   c3                      ret    
    1141:   48                      rex.W
    1142:   65 6c                   gs insb (%dx),%es:(%rdi)
    1144:   6c                      insb   (%dx),%es:(%rdi)
    1145:   6f                      outsl  %ds:(%rsi),(%dx)
    1146:   2e                      cs
        ...
0000000000001148 <print_hello>:
    1148:   48 8d 3d f2 ff ff ff    lea    -0xe(%rip),%rdi        # 1141 <main+0x8>
    114f:   e9 dc fe ff ff          jmp    1030 <puts@plt>

    1139:    e8 0a 00 00 00          call   1148 <__cxa_finalize@plt+0x108>
    113e:   31 c0                   xor    %eax,%eax
    1140:   c3                      ret    
    1141:   48                      rex.W
    1142:   65 6c                   gs insb (%dx),%es:(%rdi)
    1144:   6c                      insb   (%dx),%es:(%rdi)
    1145:   6f                      outsl  %ds:(%rsi),(%dx)
    1146:   2e 00 48 8d            cs add %cl,-0x73(%rax)
    114a:   3d f2 ff ff ff          cmp    $0xfffffff2,%eax
    114f:   e9 dc fe ff ff          jmp    1030 <puts@plt>

finding assembly heuristics

  • objdump strategy, apparently:
    • disassemble instructions starting at each symbol
    • skip over strings of zero-bytes just before symbol
  • problem: can misidentify jumped-to instructions
    • especially if symbols stripped to save space/hinder reverse engineering
  • exercise: algorithm to fix?
    • (Ghidra does this)
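
One common answer to the exercise is "recursive descent": start at a known entry point and follow control flow, so bytes that are never reached (like the string after ret) are never decoded as instructions. The sketch below is a toy, not Ghidra's implementation: decode only understands the five instruction encodings in the example above, and BASE, CODE, decode, and recursive_descent are names made up for this illustration.

```python
import struct

BASE = 0x1139  # address of main in the example
CODE = bytes.fromhex(
    "e80a000000"      # 1139: call 1148
    "31c0"            # 113e: xor %eax,%eax
    "c3"              # 1140: ret
    "48656c6c6f2e00"  # 1141: string bytes, not code
    "488d3df2ffffff"  # 1148: lea -0xe(%rip),%rdi
    "e9dcfeffff"      # 114f: jmp 1030
)

def decode(addr):
    """Toy decoder: return (length, kind, target) for the opcodes used here."""
    off = addr - BASE
    op = CODE[off]
    if op in (0xE8, 0xE9):  # call/jmp with a 32-bit relative offset
        (rel,) = struct.unpack_from("<i", CODE, off + 1)
        return 5, ("call" if op == 0xE8 else "jmp"), addr + 5 + rel
    if op == 0xC3:
        return 1, "ret", None
    if CODE[off:off + 2] == b"\x31\xc0":
        return 2, "other", None
    if CODE[off:off + 3] == b"\x48\x8d\x3d":
        return 7, "other", None
    raise ValueError(f"toy decoder does not know the bytes at {addr:#x}")

def recursive_descent(entry):
    """Return the set of addresses reached by following control flow."""
    found = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        # Decode straight-line code until it leaves the buffer, reaches
        # an already-visited address, or ends in ret/jmp.
        while BASE <= addr < BASE + len(CODE) and addr not in found:
            length, kind, target = decode(addr)
            found.add(addr)
            if target is not None:
                worklist.append(target)  # visit the call/jump target later
            if kind in ("ret", "jmp"):
                break                    # no fall-through after these
            addr += length
    return found

print(sorted(hex(a) for a in recursive_descent(BASE)))
```

The string at 0x1141 is never visited, and 0x1148 is found through the call even with no symbol there. The tricky cases on the following slides (computed jumps, function pointers) are exactly where this simple traversal runs out of information.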

some tricky cases (1)

_start:
    ...
    movq $main, %rdi
    ...
    call __libc_start_main
    ...

struct DeviceTypeFuncs {
    void (*Send)(struct DeviceInfo*, char *);
    void (*Recv)(struct DeviceInfo*, char *, size_t);
};
void SendToDevice(struct DeviceInfo* info, char *data) {
    /* call target is only known at run time */
    (info->funcs->Send)(info, data);
}

some tricky cases (2)

table:
    .int case1 - table
    .int case2 - table
...

    lea table(%rip), %rax
    movslq (%rax, %rdi, 4), %rdx  // sign-extend 32-bit table entry
    addq %rdx, %rax
    jmp *%rax

    movq $function + 0x12340, %rax
    movq $0x1234, %r9
    shlq $4, %r9
    addq %r9, %rax
    call *%rax
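
For the jump table in the first snippet, an analysis tool that recognizes the pattern can recover the possible targets: each entry is a signed 32-bit offset relative to the table's own address. A minimal sketch of that arithmetic follows; the addresses 0x2000, 0x1010, and 0x1040 are made up, and jump_table_targets is a name chosen for this illustration.

```python
import struct

def jump_table_targets(table_addr: int, table_bytes: bytes) -> list[int]:
    """Decode little-endian 32-bit offsets stored relative to table_addr."""
    count = len(table_bytes) // 4
    offsets = struct.unpack(f"<{count}i", table_bytes[:count * 4])
    return [table_addr + off for off in offsets]

# Hypothetical layout: table at 0x2000, case1 at 0x1010, case2 at 0x1040.
table_addr = 0x2000
table_bytes = struct.pack("<2i", 0x1010 - table_addr, 0x1040 - table_addr)
targets = jump_table_targets(table_addr, table_bytes)
print([hex(t) for t in targets])
```

The hard part in practice is not this arithmetic but knowing where the table is and how many entries it has, which usually comes from a bounds check before the indirect jump.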

some tricky cases (3)

    call complex_func_returning_three
    lea next2-3(%rax), %rax
    jmp *%rax
    .byte 0x39, 0x59, 0x60, 0x89, 0xFF
next2:
    addq ...

Ghidra screenshot showing CALL FUN_00101148; XOR EAX, EAX; RET; s_Hello._00101141: ds 'Hello'; FUN_00101148: LEA RDI,[s_Hello._00101141]...

cross-references (1)

Ghidra screenshot showing 'XREF's on the right: the hello string has an XREF labelling the instruction that loads its address into RDI, and each function has an XREF for each place where the function is called or a pointer to the function is used.

cross-references idea

  • really useful to know where something is used
  • doable ‘by hand’ with objdump and friends, but…
    • lots of bookkeeping, searching in text files, etc.
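
To give a feel for the bookkeeping, here is a minimal sketch that builds a "who calls what" table from objdump-style text. It assumes lines shaped like the earlier setuid example (address, hex bytes, then call or jmp with a <name> annotation); LISTING, CALL_RE, and build_xrefs are names made up for this illustration, and it only handles direct calls and jumps.

```python
import re
from collections import defaultdict

# A few lines in the shape of the earlier objdump --disassemble output.
LISTING = """\
   2764f: e8 ec e3 fd ff        call   5a40 <open@plt>
   27654: 89 05 fe 48 01 00     mov    %eax,0x148fe(%rip)
   2765a: 31 c0                 xor    %eax,%eax
   2765c: e8 2f e1 fd ff        call   5790 <getuid@plt>
   27661: 89 c7                 mov    %eax,%edi
   27665: e8 96 e4 fd ff        call   5b00 <setuid@plt>
   2766c: e8 cf e2 fd ff        call   5940 <getgid@plt>
"""

# address, hex bytes, then call/jmp, a target address, and a <name>
CALL_RE = re.compile(
    r"^\s*([0-9a-f]+):\s+(?:[0-9a-f]{2}\s)+\s*(?:call|jmp)\s+"
    r"[0-9a-f]+\s+<([^>]+)>"
)

def build_xrefs(listing: str) -> dict[str, list[int]]:
    """Map each call/jump target name to the addresses that reference it."""
    xrefs = defaultdict(list)
    for line in listing.splitlines():
        m = CALL_RE.match(line)
        if m:
            xrefs[m.group(2)].append(int(m.group(1), 16))
    return dict(xrefs)

xrefs = build_xrefs(LISTING)
print({name: [hex(a) for a in addrs] for name, addrs in xrefs.items()})
```

Data references (like the lea of a string address) and indirect calls need more work than a regular expression, which is part of why a dedicated tool is worth having.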

more cross-references

Ghidra screenshot showing a large number of cross-references at the beginning of a function. The first few are for a stack-based local variable (marked W for write or R for read). Then there is a large collection representing calls to this function (marked C).

more cross-references (stack)

more cross-references (global)

function callers?

FUN_12345678

  • Ghidra names functions without symbols based on address
  • we can adjust that…

decompiler



refining decompile (1)

refining decompile (2)

  • can set up names, types for functions
  • types can include marking arrays, structs
    • Ghidra doesn’t seem great at inferring this itself
  • also for local/global variables
    • for globals, can right-click in listing view too
    • right-click “Create Array…” to say a global variable is an array

interlude: editing disassembly format

PCode

Intermediate Representations

  • Ghidra converts instructions to this PCode language

    • describes effects of each instruction for other parts of Ghidra
    • allows ‘easy’ support for ARM, MIPS, …
  • the function graph we saw probably uses PCode information

  • decompiler is basically a PCode to C compiler

    • does the same kind of optimizations/etc. normal compiler does
    • different output language
  • Ghidra has ‘find similar functions’ tool that probably uses this

patch instruction?

patch instruction?

why is this useful?

  • can export modified version of binary to test

  • Ghidra has support for debugging or emulating a running program

    • emulation is another application of PCode representation
    • debugging requires some work to configure

debuggers / emulators

  • major way to analyze software: run it!
    • I believe most of you did this in CSO1
  • possibly using debugger to analyze memory/registers/etc.
  • possibly in restricted environment
    • either limit access to system calls, or
    • run on virtual (okay-to-lose) hardware

selected debugger features (1)

  • watchpoint
    • GDB/LLDB watch
    • breakpoint triggered by variable/expression changing
  • breakpoints on system calls
    • GDB catch syscall …
  • searching memory for strings
    • GDB find, LLDB memory find

selected debugger features (2)

  • saving ‘core’ files
    • full copy of program’s memory, can reload in debugger later
    • GDB generate-core-file NAME
  • copying memory to/from file
    • GDB dump/append/restore; LLDB memory read/write
  • attaching to programs / remote debugging
  • forcing jump to address/return from function
    • GDB jump/return

Ghidra debugger integration

aside: Ghidra debugger installation

  • relies on GDB's Python support and some installed Python packages
  • see installation docs

Ghidra traces

  • in Ghidra, a debugging session creates a ‘trace’

    • can be saved to look at later
  • creates a list of ‘snapshots’, one for each time the debugger stopped

    • snapshots are incomplete
    • need to force read of memory/etc. to have info included in snapshot
  • can switch between ‘Control Target’ and ‘Control Trace’ modes

    • Control Trace — go back to old snapshots, examine state
    • Control Target — control live program in debugger

Ghidra dynamic view

Ghidra snapshots/saved traces:

  • automatic partial snapshots whenever pausing debugger
  • can force read of range of memory to make snapshot contain memory image

reverse debugging?

  • old idea: ‘reverse debugging’

  • in addition to step/continue,
    debugger could have reverse-step/reverse-continue

  • typically implemented by recording ‘trace’ of execution


  • some implementations (with varying, generally middling, performance):

    • https://rr-project.org for x86-64 Linux (needs a sysadmin to change some kernel settings, such as perf_event_paranoid)
    • QEMU for full virtual machines (not just one program)
    • built-in to GDB, but not maintained/possibly broken with modern systems

unicorn as tool

unicorn example (1)

$ cat test.s
    mov $10000, %edi
    imul $2, %rdi, %rdi
$ gcc -c test.s; objcopy -j .text test.o -O binary test.bin



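from pathlib import Path
from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
from unicorn.x86_const import UC_X86_REG_RDI

code = Path('test.bin').read_bytes()
uc = Uc(UC_ARCH_X86, UC_MODE_64)
uc.mem_map(0x10000, 1024 * 1024)           # map 1 MiB at 0x10000
uc.mem_write(0x10000, code)                # copy machine code in
uc.emu_start(0x10000, 0x10000 + len(code)) # run until end of code
print("RDI",uc.reg_read(UC_X86_REG_RDI))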



RDI 20000

unicorn example (2)

...
def hook_code_func(uc, addr, size, user_data):
    # called before each instruction is emulated
    insn_bytes = uc.mem_read(addr, size)
    print(f"{addr:x} ({size} byte instruction): {insn_bytes.hex()}")
# register the hook only after the function is defined
uc.hook_add(UC_HOOK_CODE, hook_code_func)
uc.emu_start(0x10000, 0x10000 + len(code))

10000 (5 byte instruction): bf10270000
10005 (4 byte instruction): 486bff02

example tool: qiling

  • https://qiling.io

  • uses Unicorn emulator but adds…

  • emulation for a lot of system calls

    • including (hopefully) limiting file accesses to a specific “virtual root” directory
  • loaders for common executable/bootloader formats


  • idea: get log of malware activity / add custom behaviors

PANDA.re

  • fork of emulator QEMU

  • supports whole-system record+replay


  • idea: run virtual machine with malware

  • replay run with analyses that can look at all instructions run

  • examples:

    • identify where data from a specific file was used
    • search memory for string throughout execution
    • function call history