mitigate-stack

compiler generated code

    pushq %rbx
    sub $0x20,%rsp
/* copy value from thread-local storage */
    mov %fs:40,%rax
/* onto the stack */
    mov %rax,0x18(%rsp)
/* clear register holding value */
    xor %eax, %eax
    ...
    ...
/* copy value back from stack */
    mov 0x18(%rsp),%rax
/* xor to compare */
    xor %fs:40,%rax
/* if result non-zero, do not return */
    jne call_stack_chk_fail
/* canary intact: undo sub and pushq, then return */
    add $0x20,%rsp
    popq %rbx
    ret
call_stack_chk_fail:
    call __stack_chk_fail

stack layout (from higher to lower addresses): return address, saved %rbx, stack canary, then the function's arrays and other temporaries

%fs:40 (thread-local storage) loaded with the “canary” value,
set up at program start

value copied to stack just below the saved %rbx and the return address

trying to avoid info disclosure:
get canary value out of %rax
as soon as possible
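The prologue/epilogue check above can be modeled in plain C. This is a simulation, not real stack manipulation: the 32-byte frame layout, the `master_canary` constant standing in for %fs:40, and the helper names are all hypothetical, and the saved %rbx is left out for brevity. It shows that a contiguous overflow long enough to reach the return address must first pass through the canary, so the xor comparison fails.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame model (raw bytes, lowest address first); offsets are
   illustrative, not what any particular compiler emits, saved %rbx omitted:
     [0,16)  char buf[16]      the function's array
     [16,24) canary copy       what mov %rax,0x18(%rsp) stores
     [24,32) return address    pushed by call */
enum { CANARY_OFF = 16, RA_OFF = 24, FRAME_LEN = 32 };

/* Stands in for the value kept at %fs:40 (made-up constant). */
static const uint64_t master_canary = 0x5a3c9e17b2d4f600ULL;

static void prologue(unsigned char frame[FRAME_LEN], uint64_t ra) {
    memcpy(frame + RA_OFF, &ra, sizeof ra);
    memcpy(frame + CANARY_OFF, &master_canary, sizeof master_canary);
}

/* Unchecked copy into buf, like strcpy() or gets() with a long input. */
static void unchecked_copy(unsigned char frame[FRAME_LEN],
                           const unsigned char *src, size_t n) {
    assert(n <= FRAME_LEN);          /* keeps the model itself in bounds */
    memcpy(frame, src, n);
}

/* 1 if the epilogue would reach ret, 0 if it would call __stack_chk_fail. */
static int epilogue_ok(const unsigned char frame[FRAME_LEN]) {
    uint64_t copy;
    memcpy(&copy, frame + CANARY_OFF, sizeof copy);
    return (copy ^ master_canary) == 0;   /* same xor test as the assembly */
}
```

For example, copying 8 bytes leaves the canary intact and `epilogue_ok` returns 1, while copying all 32 bytes of 'A' replaces the canary with 0x4141414141414141 and `epilogue_ok` returns 0.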

stack canary

stack canary hopes

  • overwrite return address \(\implies\) overwrite canary
  • canary is secret

good choices of canary

  • random: guessing should not be practical
    • not always achieved: sometimes static, or only \(2^{15}\) possibilities
  • GNU libc: canary contains random data and:
    • leading \0 (string terminator)
      • printf with %s won’t print it
      • copying a C-style string won’t write it
    • a newline
      • read line functions can’t input it
    • \xFF
      • hard to input?
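The “leading \0” property can be checked with a short C sketch. It assumes a little-endian machine (as on x86-64) and takes the random bits as a parameter instead of reading them from the OS; `make_canary` and `canary_bytes_as_string` are illustrative names, not glibc functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian machine such as x86-64. */
static int little_endian(void) {
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Sketch of a "random data + leading \0" canary. On little-endian machines
   the least significant byte is stored first, so clearing it puts the \0
   at the lowest address: the first canary byte a string function reaches
   when it runs off the end of a buffer below the canary.
   Where the random bits come from is left to the caller (assumption). */
static uint64_t make_canary(uint64_t random_bits) {
    assert(little_endian());
    return random_bits & ~(uint64_t)0xff;
}

/* Number of canary bytes that strlen/strcpy/printf("%s") would treat as
   part of a string before hitting a terminator. */
static size_t canary_bytes_as_string(uint64_t canary) {
    char bytes[sizeof canary + 1];
    memcpy(bytes, &canary, sizeof canary);
    bytes[sizeof canary] = '\0';     /* bound the scan for this demo */
    return strlen(bytes);
}
```

A canary with no zero bytes, such as 0x1122334455667788, is seen as 8 string bytes; after `make_canary` clears the low byte, a string read or copy running upward from a buffer stops at the canary's first byte.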

stack canaries implementation

  • “StackGuard”: 1998 paper proposing the strategy

  • GCC: command-line options

    • -fstack-protector
    • -fstack-protector-strong
    • -fstack-protector-all
    • one of these is often enabled by default
    • three differ in how many functions are ‘protected’
  • Microsoft C/C++ compiler: /GS

    • on by default

stack canary overheads

  • less than 1% runtime if added to “risky” functions

    • functions with character arrays, etc.
  • large overhead if added to all functions

    • StackGuard paper: 5–20%?
  • similar space overheads


  • (for typical applications)

    • could be much worse: tons of ‘risky’ function calls

stack canaries pro/con

  • pro: no change to calling convention
  • pro: recompile only — no extra work
  • con: can’t protect existing executable/library files (without recompile)
  • con: doesn’t protect against many ways of exploiting buffer overflows
  • con: vulnerable to information leaks

stack canary summary

  • stack canary — simplest of many mitigations

  • key idea: detect corruption of return address

  • assumption: if the return address is changed, so is the adjacent token (the canary)

  • assumption: attacker can’t learn the true value of the token

    • but learning it is often possible with a memory bug

  • later: workarounds to break these assumptions

non-contiguous overwrites

void vulnerable() {
  int scores[8]; bool done = false;
  while (!done) {
    printf("Edit which score? (0 to 7) ");
    int i;
    scanf("%d", &i);
    /* Oops!
       sizeof(scores) is 8 * sizeof(int) == 32, not 8 */
    if (i < 0 || i >= sizeof(scores))
      continue;
    printf("Set to what value? ");
    scanf("%d", &scores[i]);
    ...
  }
  ...
}

exercise: non-contiguous overwrites

to set the return address to 0x123456789, which scores should be set to what values?

exercise solution

0x123456789 =
0x0000 0001 2345 6789 as little-endian bytes =
89 67 45 23 01 00 00 00
[89 67 45 23] [01 00 00 00]
0x23456789    0x1
  • set score 8 to 591751049 (0x23456789), score 9 to 1
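The same byte-splitting can be done mechanically. This C sketch (hypothetical helper `split_address`) reinterprets the 8 bytes of the address as two `int32_t` values, so it gives the right answer only when the host is little-endian, matching the x86-64 target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a 64-bit value into the two 32-bit ints that occupy the same
   8 bytes in memory. memcpy uses the host's byte order, so this assumes
   a little-endian host like x86-64. */
static void split_address(uint64_t addr, int32_t *lo, int32_t *hi) {
    int32_t parts[2];
    assert(sizeof parts == sizeof addr);
    memcpy(parts, &addr, sizeof addr);
    *lo = parts[0];   /* lower address: score 8 in the answer above */
    *hi = parts[1];   /* next int: score 9 */
}
```

`split_address(0x123456789, &lo, &hi)` yields lo = 0x23456789 (591751049) and hi = 1.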

information disclosure (1a)

void vulnerable() {
    int value;
    for (;;) {
        command = ReadInput();
        if (command == "set") {
            value = ReadIntInput();
        } else if (command == "get") {
            printf("%d\n", value);
        } else if ...
    }
}
  • “get” command: can read an uninitialized value
  • example: when I compiled this, value was stored on the stack

information disclosure (1b)

void vulnerable() {
    int value;
    ...
        } else if (command == "get") {
            printf("%d\n", value);
        }
    ...
}
void leak() {
    int secrets[] = { 
        12345678, 23456789, 34567890,
        45678901, 56789012, 67890123,
    };  
    do_something_with(secrets);
}
int main() {leak(); vulnerable();}

running this program (first line is input, second is output):

get
67890123

information disclosure (2)

void process() {
    char buffer[8] = "\0\0\0\0\0\0\0\0";
    char c = ' ';
    for (int i = 0; c != '\n' && i < 8; ++i) {
        c = getchar();
        buffer[i] = c;
    }
    printf("You input %s\n", buffer);
}
  • input: aaaaaaaa
  • output: You input aaaaaaaa followed by whatever was on the stack after buffer (no terminator was written)
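One way to fix `process()` is to stop reading one byte early and always write the terminator, so `printf("%s")` cannot run past `buffer`. A sketch with a hypothetical helper `read_into` that takes its input from a string instead of `getchar()` so the behavior is easy to check; unlike the original loop, it also leaves the newline out of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Bounded read: copy until newline, end of input, or until only one byte
   of space is left, then always terminate. Returns the string length. */
static size_t read_into(char *buffer, size_t size, const char *input) {
    size_t i = 0;
    assert(size > 0);                 /* need room for at least the \0 */
    while (i + 1 < size && input[i] != '\0' && input[i] != '\n') {
        buffer[i] = input[i];
        ++i;
    }
    buffer[i] = '\0';                 /* printf("%s") now stops here */
    return i;
}
```

Given the input aaaaaaaa and an 8-byte buffer, it stores seven a characters plus the terminator and returns 7.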

information disclosure (3)

struct foo {
    char buffer[8];
    long *numbers;
};

void process(struct foo* thing) {
    ...
    scanf("%s", thing->buffer);
    ...
    printf("first number: %ld\n", thing->numbers[0]);
}
  • input: aaaaaaaa followed by the address of the canary (as raw bytes), overwriting numbers

    • either the canary's address on the stack, or where the canary is read from in thread-local storage

recall: compiler register clearing

  • compiler clears the stack canary out of registers
    • prevents leaks from reuse of the register for a variable
    • or from the register being pushed to the stack or similar
  • master copy of the canary stored in a separate memory region (thread-local storage)
    • won’t find it next to a global variable or similar
    • … but the on-stack copy is still next to stack arrays

exercise (1)

struct point {
    int x, y, z;
};
struct point p;
...
    if (command == "get") { 
        /* 'p' could be uninitialized */
        printf("%d,%d,%d\n", p.x, p.y, p.z);
    } ...
...
  • Suppose p is stored at the same address as a ‘leftover’ copy of the 8-byte stack canary (left over from a prior use of a register, etc.). If 999999,44444,333333 is output, how do we compute the stack canary value?
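Reconstructing the canary is the reverse of splitting an address into ints. A sketch, assuming a little-endian machine and that `p.x` and `p.y` line up exactly with the canary's 8 bytes (`p.z` lies past them and is ignored); `rebuild_canary` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild an 8-byte value from two 32-bit ints read out of the same
   memory: x at the lower address, y right after it. Assumes a
   little-endian host, like the x86-64 target. */
static uint64_t rebuild_canary(int32_t x, int32_t y) {
    int32_t parts[2] = { x, y };
    uint64_t canary;
    assert(sizeof canary == sizeof parts);
    memcpy(&canary, parts, sizeof canary);   /* same bytes, read as one 64-bit value */
    return canary;
}
```

With the output above, `rebuild_canary(999999, 44444)` gives 0x0000AD9C000F423F: 999999 = 0xF423F fills the low half and 44444 = 0xAD9C the high half.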

some early stack canary benchmarks

from Chiueh and Hsu, “RAD: A Compile-Time Solution to Buffer Overflow Attacks” (2001)

intuition: shadow stacks

  • problem with stack: easy to leak address/values because used for lots of data


  • goal: keep sensitive data in separate region

    • easier to keep address secret?

  • can use this for (stronger?) alternative to stack canaries

shadow stacks

implementing shadow stacks

  • bigger changes to compiler than canaries
  • more overhead to call/return from function
  • most commonly: store return address twice
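The “store return address twice” idea can be sketched in C against the non-contiguous overwrite from the scores example. Everything here is a hypothetical model: the frame is a byte array with made-up offsets (canary where scores[8] and scores[9] would be, return address at scores[10] and scores[11]), and the shadow stack is an ordinary global array rather than protected memory. Writing scores 10 and 11 skips the canary entirely, so a canary check would pass, but the shadow copy no longer matches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame for vulnerable(), as raw bytes (offsets made up):
     [0,32)  int scores[8]
     [32,40) canary            (scores[8], scores[9] if indexed out of bounds)
     [40,48) return address    (scores[10], scores[11]) */
enum { CANARY_OFF = 32, RA_OFF = 40, FRAME_LEN = 48 };

/* Software shadow stack: a separate array with a second copy of each
   return address (a sketch, not a real compiler's scheme). */
#define SHADOW_MAX 64
static uint64_t shadow[SHADOW_MAX];
static size_t shadow_top;

static void on_call(unsigned char frame[FRAME_LEN], uint64_t ra, uint64_t canary) {
    assert(shadow_top < SHADOW_MAX);
    memcpy(frame + RA_OFF, &ra, sizeof ra);              /* normal stack copy */
    memcpy(frame + CANARY_OFF, &canary, sizeof canary);
    shadow[shadow_top++] = ra;                           /* second copy */
}

/* Like scanf("%d", &scores[i]) after the broken bounds check. */
static void write_score(unsigned char frame[FRAME_LEN], size_t i, int32_t v) {
    assert((i + 1) * sizeof v <= FRAME_LEN);             /* stay inside the model */
    memcpy(frame + i * sizeof v, &v, sizeof v);
}

/* 1 if both copies of the return address still agree. */
static int on_return(const unsigned char frame[FRAME_LEN]) {
    uint64_t ra;
    assert(shadow_top > 0);
    memcpy(&ra, frame + RA_OFF, sizeof ra);
    return shadow[--shadow_top] == ra;
}
```

After `write_score(frame, 10, 0x23456789)` and `write_score(frame, 11, 1)`, the canary bytes are unchanged but `on_return` reports a mismatch; an in-bounds write such as `write_score(frame, 3, 42)` leaves both copies equal.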

shadow stacks on x86-64 (1)

  • idea 1: dedicate %r15 as shadow stack pointer,
    copy RA to the shadow stack in the function prologue, compare before ret
function:
    movq (%rsp), %rax    // RAX <- return address
    addq $-8, %r15       // R15 <- R15 - 8
    movq %rax, (%r15)    // M[R15] <- RAX
    ...
    movq (%rsp), %rdx     // RDX <- return address
    cmpq %rdx, (%r15)    
    jne CRASH_THE_PROGRAM // if RDX != M[R15] goto CRASH_THE_PROGRAM
    add $8, %r15          // R15 <- R15 + 8
    ret

shadow stacks on x86-64 (2)

  • idea 2: dedicate %r15 as shadow stack pointer,
    avoid normal call/return instruction
    addq $-8, %r15
    leaq after_call(%rip), %rax
    movq %rax, (%r15)
    jmp function
after_call:

function:
    ...
    addq $8, %r15        // R15 <- R15 + 8
    jmp *-8(%r15)        // jmp M[R15-8]

Android/AArch64 shadow stacks (1)

with shadow call stack (x18 holds the shadow stack pointer):

str     x30, [x18], #8
stp     x29, x30, [sp, #-16]!
mov     x29, sp
bl      bar
add     w0, w0, #1
ldp     x29, x30, [sp], #16
ldr     x30, [x18, #-8]!
ret

without shadow call stack:

stp     x29, x30, [sp, #-16]!
mov     x29, sp
bl      bar
add     w0, w0, #1
ldp     x29, x30, [sp], #16
ret

Android/AArch64 shadow stacks (2)

  • -fsanitize=shadow-call-stack
  • supported on 64-bit ARM and RISC-V only
  • “An x86_64 implementation was evaluated using Chromium and was found to have critical performance and security deficiencies”

Intel CET shadow stacks

  • recent Intel processor extension adds shadow stacks

    • “Control-flow Enforcement Technology”

  • new shadow stack pointer

  • CALL pushes the return address onto BOTH stacks; RET pops both and faults if they differ

  • shadow stack also protected from writes by hardware + OS

    • cannot be written through normal instructions
    • modification to page table structures

automatic shadow stacks?

  • if we change how CALL/RET works…

  • … maybe we can add shadow stack support to existing programs?

    • either with hardware support, or
    • in software with emulation techniques?

  • well, there’s a problem…

the problem in C++

void Foo() {
    try {
        ... Bar() ...
    } catch (std::runtime_error &error) {
        ...
    }
}

void Bar() {
    ... Quux() ...
}
void Quux() {
    ...
    throw std::runtime_error("...");
    ...
}

the problem in C

jmp_buf env;
const char *error;
void Foo() {
    if (0 == setjmp(env)) {
        Bar();
    } else {
        ...
    }
}

void Bar() {
    ... Quux() ...
}
void Quux() {
    ...
    error = "...";
    longjmp(env, 1);
    ...
}

shadow stacks and non-local returns

  • need to modify these functions to support shadow stacks, it seems?
  • conflicts with the idea of a hardware extension that only changes how CALL/RET operate

one way: dealing with non-local returns

  • exceptions and setjmp/longjmp deliberately skip the normal returns

  • one solution: ‘‘direct’’ shadow stack

  • fixed (possibly secret) offset from normal stack

  • shadow stack only stores return addresses

    • space in between return addresses left as nulls
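A fixed-offset (“direct”) shadow stack needs no shadow stack pointer, which is why it tolerates non-local returns. In this hypothetical C model, one array stands in for memory, the shadow region sits `SHADOW_OFFSET` slots above the stack, and slot indices play the role of addresses; a real implementation would use actual stack addresses and a (possibly secret) byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "direct" shadow stack: slots [0,SLOTS) are the normal
   stack, the shadow region sits a fixed SHADOW_OFFSET slots above it.
   The shadow copy of the return address at slot sp lives at slot
   sp + SHADOW_OFFSET; shadow slots that mirror non-RA data stay 0. */
#define SLOTS 32
#define SHADOW_OFFSET SLOTS
static uint64_t mem[SLOTS + SHADOW_OFFSET];

/* call: return address written at slot sp, mirrored at the fixed offset. */
static void direct_call(size_t sp, uint64_t ra) {
    assert(sp < SLOTS);
    mem[sp] = ra;
    mem[sp + SHADOW_OFFSET] = ra;
}

/* ret from slot sp: no shadow stack pointer, just the same offset. */
static int direct_ret_ok(size_t sp) {
    assert(sp < SLOTS);
    return mem[sp + SHADOW_OFFSET] == mem[sp];
}
```

After three nested calls, jumping straight back to the outermost frame (as `longjmp` would) still finds that frame's shadow copy at the same offset, with no bookkeeping to undo; an overwrite of that frame's return address is still caught.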

CET and shadow-stack manipulation

  • Intel CET has instructions to manipulate shadow stack pointer

  • RDSSP (read shadow stack pointer)

    • used by glibc setjmp
  • INCSSP (increment shadow stack pointer)

    • apparently used by glibc longjmp in common case

  • also some functionality for switching shadow stacks
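Why a pointer-based shadow stack needs help from `setjmp`/`longjmp` can be shown with a small C model of the Foo/Bar/Quux example. The shadow stack here is a hypothetical software array with an explicit pointer `ssp`; recording `ssp` before `setjmp` and restoring it after `longjmp` stands in for what the slides describe glibc doing with RDSSP and INCSSP, not for glibc's actual code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical software shadow stack with an explicit pointer: instrumented
   functions push on entry and pop on normal return. Entries are names. */
#define SHADOW_MAX 16
static const char *shadow[SHADOW_MAX];
static size_t ssp;                 /* number of live entries */

static void shadow_push(const char *fn) { assert(ssp < SHADOW_MAX); shadow[ssp++] = fn; }
static void shadow_pop(void) { assert(ssp > 0); ssp--; }

static jmp_buf env;
static size_t ssp_at_setjmp;       /* analogous to reading RDSSP in setjmp */

static void Quux(void) {
    shadow_push("Quux");
    longjmp(env, 1);               /* skips the pops in Quux and Bar */
}

static void Bar(void) {
    shadow_push("Bar");
    Quux();
    shadow_pop();                  /* never reached */
}

/* Returns the shadow stack depth right after longjmp lands in Foo. */
static size_t run_foo(int restore_ssp) {
    ssp = 0;
    shadow_push("Foo");
    ssp_at_setjmp = ssp;
    if (setjmp(env) == 0) {
        Bar();
    } else if (restore_ssp) {
        ssp = ssp_at_setjmp;       /* analogous to INCSSP unwinding in longjmp */
    }
    size_t depth = ssp;
    shadow_pop();                  /* Foo's own return */
    return depth;
}
```

`run_foo(0)` returns 3: entries for Bar and Quux are still on the shadow stack, so Foo's own return would be compared against Quux's entry. `run_foo(1)` returns 1, back in sync.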

Backup slides

preventing shadow stack writes?

  • ARM64 scheme: prevent writes if

    • shadow stack pointer is never leaked (dedicated register)
    • shadow stack random location can’t be guessed (or queried otherwise)
  • Intel CET: prevent writes unless

    • OS (privileged/kernel mode) instructions for setting up the shadow stack are used

  • can we prevent writes without relying on avoiding info leaks…
    and without special hardware support?

    • well, yes, but …

what do shadow stacks stop?

  • combined with an information leak that can dump arbitrary bytes of memory,
    which of these exploits would shadow stacks stop…

    • A. using format string exploit to point stack return address to the ‘system’ function
    • B. using format string exploit to point VTable to the ‘system’ function
    • C. using an unchecked string copy that goes over the end of a stack buffer into the return address and pointing the return address to the ‘system’ function
    • D. using a buffer overflow that overwrites a saved stack pointer value to cause return to use a different address
    • E. using pointer subterfuge to overwrite the GOT entry for ‘printf’ to point to the ‘system’ function