| guess | measured time |
| aaaa | \(100\pm5\) |
| baaa | \(103\pm4\) |
| caaa | \(102\pm6\) |
| daaa | \(111\pm5\) |
| eaaa | \(99 \pm6\) |
| faaa | \(101\pm7\) |
| gaaa | \(104\pm4\) |
| … | … |
| guess | measured time |
| daaa | \(102\pm5\) |
| dbaa | \(99\pm4\) |
| dcaa | \(104\pm4\) |
| ddaa | \(100\pm6\) |
| deaa | \(102\pm4\) |
| dfaa | \(109\pm7\) |
| dgaa | \(103\pm4\) |
| … | … |
From Ross Anderson, Security Engineering, Third Edition
From Kuhn, ‘‘Electromagnetic Eavesdropping Risks of Flat-Panel Displays’’ (PET 2004)
say we have two 64-bit integers \(x\), \(y\)
one way to multiply:
divide \(x\), \(y\) into 32-bit parts: \(x=x_1\cdot2^{32}+x_0\) and \(y=y_1\cdot2^{32}+y_0\)
then \(xy = x_1y_1\cdot2^{64}+x_1y_0\cdot2^{32}+x_0y_1\cdot2^{32}+x_0y_0\)
can extend this idea to arbitrarily large numbers
number of smaller multiplies depends on size of numbers!
naive multiplication idea:
problem: sometimes the value of the number is a secret
oops! revealed through timing
early versions of OpenSSL (TLS implementation) had timing attack
attacker could figure out bits of private key from timing
why? variable-time multiplication and modulus operations
Figure 3a from Brumley and Boneh, ‘‘Remote Timing Attacks are Practical’’
<script>
var the_color = window.getComputedStyle(
document.querySelector('a[href*="foo.com"]')
).color
if (the_color == ...) { ... }
</script>
turns out, other webpages create distinct ‘‘signatures’’
Figure from Cook et al, ‘‘There’s Always a Bigger Fish: Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack’’ (ISCA ’22)
observing machine operation, indirectly
common types of indirect observations
side channels can be security and privacy issues
suppose I time accesses to array of chars:
what could cause this difference?
some pseudocode:
array[128] is slower to access
*other_address loaded into cache + evicted it
what do we know about other_address? (select all that apply)
| A. same cache tag | B. same cache index | C. same cache offset |
| D. diff. cache tag | E. diff. cache index | F. diff. cache offset |
caches often use physical, not virtual addresses
storing/processing timings evicts things in the cache
processor ‘‘pre-fetching’’ may load things into cache before access is timed
some L3 caches use a simple hash function to select index instead of index bits
pointer is 0x1000188
0x1000188: … 0000 0001 1000 1000
array[0] starts at multiple of cache size — index 0, offset 0
array[0b1 1000 0000] = array[0x180]
… 0000 1000 0000 0000 → cache index = 0x20
what X makes 0x200400 + X have cache index 0x20?

| 0x200400 | …0000 0100 0000 0000 |
| \(+\) X | |
| 0x200400 + X | …?000 1000 0000 0000 |
previous idea: learn bits of mystery that correspond to index bits
array[0x6*BLOCK_SIZE] is slow to access
from Osvik, Shamir, and Tromer, ‘‘Cache Attacks and Countermeasures: the Case of AES’’ (2004)
early AES implementation used lookup tables
goal: detect index into lookup table
tricks they did to make this work
from Osvik, Shamir, and Tromer, ‘‘Cache Attacks and Countermeasures: the Case of AES’’ (2004)
if (... /*something false*/) {
access *pointer;
}
mystery despite if
could cache access for *pointer still happen?
yes, if:
*pointer starts before segfault detected
operations in virtual memory lookup:
Intel processors: looks like these were separate steps, so…
Prime();
if (something false) {
int value = ReadMemoryMarkedNonReadableInPageTable();
access other_array[value * ...];
}
Probe();
from Lipp et al, ‘‘Meltdown: Reading Kernel Memory from User Space’’
// %rcx = kernel address
// %rbx = array to load from to cause eviction
xor %rax, %rax // rax <- 0
retry:
// rax <- memory[kernel address] (segfaults)
// but check for segfault done out-of-order on Intel at time
movb (%rcx), %al
// rax <- memory[kernel address] * 4096 [speculated]
shl $0xC, %rax
jz retry // not-taken branch
// access array[memory[kernel address] * 4096]
mov (%rbx, %rax), %rbx
space out accesses by 4096
ensure separate cache sets and
avoid triggering prefetcher
repeat access if zero
apparently value of zero speculatively read
when real value not yet available
access cache to allow measurement later
in paper with FLUSH+RELOAD instead of PRIME+PROBE
segfault actually happens eventually
option 1: okay, just start a new process each time
option 2: suppress segfault
(paper used (obscure) transactional memory support,
conceptually, could have used mispredicted branch instead)
HW: permissions check done with/before physical address lookup
SW: separate page tables for kernel and user space
mystery despite if
previous idea: learn bits of mystery that correspond to index bits
array[0x6*BLOCK_SIZE] is slow to access
array[0x100*BLOCK_SIZE] is slow to access
suppose this C code is run with extra privileges
assume x chosen by attacker
(example from original Spectre paper)
0x1000000 and 0x103F000
array1[x] accesses first byte of secret?
array2[0] is char stored in cache set 0
what can we learn about array1[x]?
array2[array1[x] * 4096] accesses set with index =
(array1[x] * 4096 / BLOCK_SIZE) mod 8192
know that set number is 128 from probing
array1[x] * 64 = 128 (mod 8192)
\(\rightarrow\) array1[x] * 64 = 128 + 8192 * K
array1[x] = 2 + 128 * K
/* exploiting pseudocode */
/* step 1: mistrain branch predictor */
for (a lot) {
systemCallHandler(0 /* less than array1_size */);
}
/* step 2: evict from cache using misprediction */
Prime();
systemCallHandler((targetAddress - array1Address) / A1ElemSize);
int evictedSet = ProbeAndFindEviction();
int targetValue = (evictedSet - array2StartSet) / setsPer4KA2Elem;
times 4096 shifts so we can get lower bits of target value
void SomeSystemCallHandler(int index) {
if (index >= some_table_size)
return ERROR;
int kind = table[index];
switch (other_table[kind].foo) {
...
}
}
if (x < array1_size) {
y = array2[array1[x]];
}
limited in what address we can learn about based on how big entries in tables are
need to adjust calculations to actual addresses / array element sizes / etc.
jmp *%rax — predict target with cache:
| bottom 12 bits of jmp address | last seen target |
| 0x0–0x7 | 0x200000 |
| 0x8–0xF | 0x440004 |
| 0x10–0x17 | 0x4CD894 |
| 0x18–0x1F | 0x510194 |
| 0x20–0x27 | 0x4FF194 |
| … | … |
| 0xFF8–0xFFF | 0x3F8403 |
Intel Haswell CPU did something similar to this
1: find some kernel function with jmp *%rax
2: mistrain branch target predictor for it to jump to chosen code
3: have chosen code be attack code (e.g. array access)
4: run the kernel function
showed Spectre variant 1 (array bounds), 2 (indirect jump)
other possible variations:
could cause other things to be mispredicted
could use side-channel other than data cache changes
replace array[x] with array[x & ComputeMask(x, size)]
ComputeMask(x, size) = 0xFFFF..F if x \(\le\) size, else 0
for indirect branches:
with hardware help:
without hardware help:
jmp *(%rax), etc. into code that does jmp *%rax
char *array, *other_array;
// PRIME
posix_memalign((void **)&array, CACHE_SIZE, CACHE_SIZE);
AccessAllOf(array);
// (some code we don't control)
other_array[mystery * N] += 1; // previously: * BLOCK_SIZE
// PROBE
for (int i = 0; i < CACHE_SIZE; i += BLOCK_SIZE) {
if (CheckIfSlowToAccess(&array[i])) {
...
}
}
other_array at 0x4000000 (index 0, offset 0)
mystery if N = 1? N = 32 * 64?
\[\begin{align*}
\left\lfloor\text{mystery} * N / \text{BLOCK_SIZE}\right\rfloor~\text{mod}~1024 & = & 32 \\
\left\lfloor\text{mystery} * N / \text{BLOCK_SIZE}\right\rfloor & = & 32 + 1024K \\
\end{align*}\]
let offset be some number in [0,BLOCK_SIZE):
\[\begin{align*}
\text{mystery} * N & = & \text{BLOCK_SIZE}\times(32+1024Z) + \text{offset}\\
\text{mystery} & = & \left(\text{BLOCK_SIZE}\times(32+1024Z) + \text{offset}\right) / N \\
\text{mystery} & = & \left(64\times(32+1024Z)+\text{offset}\right) / N \\
\end{align*}\]
N=1: mystery = \(2048\), \(2049\), \(2050\), …, \(2048+63\), \(64\cdot1024+2048\), \(64\cdot1024+2048+1\), …
N = 32 * 64:
\[\begin{align*}
\text{mystery}\cdot 32\cdot 64 & = & 64(32+1024Z) + \text{offset} \\
& = & 64\cdot32 + 65536Z + \text{offset}\\
\text{mystery} & = & 1 + \frac{65536}{64\cdot32}Z + \frac{\text{offset}}{64\cdot32} = 1+32Z \\
\end{align*}\]
other_array at 0x4000000 — mystery?
NUM_SETS = 64KB/64B = 1K (1024) sets
array[0x800] has cache index 0x800/BLOCK_SIZE mod NUM_SETS
know other_array[mystery * BLOCK_SIZE] had same index
other_array[0] at cache index 0
recall have found:
other_array[0] at index 0
other_array[mystery*BLOCK_SIZE] has index 32 (same as array[0x800])
other_array[X] at cache index (0 + X/BLOCK_SIZE mod NUM_SETS)
other_array starts at 0x4001440
then other_array[mystery * BLOCK_SIZE] at cache index
(0x51 + mystery * BLOCK_SIZE / BLOCK_SIZE) mod NUM_SETS = 32
mystery = -49 or 975 or 1999 or …
char *array, *other_array;
// PRIME
posix_memalign((void **)&array, CACHE_SIZE, CACHE_SIZE);
AccessAllOf(array);
// (some code we don't control)
other_array[mystery * BLOCK_SIZE] += 1;
// PROBE
for (int i = 0; i < CACHE_SIZE; i += BLOCK_SIZE) {
if (CheckIfSlowToAccess(&array[i])) { ... }
}
array[0x8800] is slow
if 4-way 64KB cache w/64B blocks and something from cache set 32 evicted,
then where could slow access be?
A. i=0x400, i=0x800, i=0x8400, i=0x8800
B. i=0x800, i=0x8800, i=0x10800, i=0x18800
C. i=0x800, i=0x4800, i=0x8800, i=0xc800
D. i=0x800, i=0x4800, i=0x8800, i=0x10800
E. something else
unsigned char *probe_array;
posix_memalign(&probe_array, CACHE_SIZE, CACHE_SIZE);
access OTHER things to evict all of probe_array
if (something false) {
read probe_array[mystery * BLOCK_SIZE];
}
check which value from probe_array is faster
uint8_t* probe_array = new uint8_t[256 * 4096];
// ... Make sure probe_array is not cached
uint8_t kernel_memory_val = *(uint8_t*)(kernel_address);
uint64_t final_kernel_memory = kernel_memory_val * 4096;
uint8_t dummy = probe_array[final_kernel_memory];
// ... catch page fault
// ... in signal handler, determine which of 256 slots in probe_array is cached