- S2018 E1 Q8 ~~ LEA
    int array[4];
    int *y = array;   <-- &array[0]
    y += 2;           <-- advance 2 elements in the array -- &array[2]

    long **x;     <-- "pointer to pointer to long"
    x += **x;     <-- add (8 times **x) to the address in x
                      where the 8 is sizeof(long *): pretend we're advancing through an array of long *s that x points to
    x is stored in %r8:
    mov (%r8), %rax        # %rax = *x
    mov (%rax), %rbx       # %rbx = **x  <-- how many array elements (that are ptrs) to advance %r8 by
    lea (%r8,%rbx,8), %r8  # %r8 = %r8 + %rbx * 8
    this is basically what option A does

    some wrong options:
    option B:
    leaq (,%r8,8), %rax       # %rax = x * 8  <-- doesn't make sense
                              # multiplying the wrong thing by 8, then carried through....
    option C:
    mov (%r8), %rax
    mov (%rax), %rax          # **x extracted --- good!
    lea (,%rax,8), %r8        # x = **x * 8  <-- but no +=; the old value of x is dropped
    option D:
    mov (%r8), %rax           # *x
    lea (%r8, %rax, 8), %rax  # temp <- x + *x*8  <-- wrong thing (*x, not **x) multiplied by 8

- F2022 Week 2 Q4/5 ~~ relocation tables
    upperstr.c: void upperstr(char *str) { ... }
    printbig.c: void printbig(char *str) { upperstr(str); puts(str); }
    in the object files, we need:
    SYMBOL TABLE ENTRIES
    ~ a way to find upperstr in upperstr.o
    ~ a way to find printbig in printbig.o
    RELOCATION TABLE ENTRIES
    ~ a way to find where to put the address of upperstr in printbig's machine code
    ~ a way to find where to put the address of puts in printbig's machine code

- F2022 Week 3 Q1 ~~ bitwise
    (((x & 0xFF0F) >> 2) & 0xFF0) == (((x & 0xFF0) >> 2) & 0xFF0F)
    x = abcd efgh ijkl mnop
    left-hand side:
        & 0xFF0F: abcd efgh 0000 mnop
        >> 2:     00ab cdef gh00 00mn
        & 0xFF0:  0000 cdef gh00 0000
    right-hand side:
        & 0xFF0:  0000 efgh ijkl 0000
        >> 2:     0000 00ef ghij kl00
        & 0xFF0F: 0000 00ef 0000 kl00
        VERSUS:   0000 cdef gh00 0000   <-- the left-hand side's result
    the two sides are equal exactly when cd, gh, and kl are all zero (the ef bits always match)
    to test if cd, gh, kl are zero we can use (x & Y) == 0 with
        x = abcd efgh ijkl mnop
        Y = 0011 0011 0011 0000 = 0x3330
    which is what the question claims: the equality is the same as (x & Y) == 0 for some Y
    (a quick brute-force check of this is sketched below, after the Y86 notes)

- Y86 machine code fields
    general format:
    first byte:
        upper bits: primary opcode ~~ "row of the table"
        lower bits: if needed, a secondary opcode to distinguish similar instructions
            like: the variants of jmp/jle/etc., rrmov/cmovle/etc., the arithmetic ops
    second byte (if relevant to the instruction): register indices
        up to two register indices, taken from the registers named in the assembly
        using the table mapping register names to indices
    later bytes (if relevant to the instruction):
        [starts at the third byte if there is a register byte, at the second byte otherwise]
        8-byte constant:
            target address of a jump or call (where to go to)
            displacement for an rmmov or mrmov
            immediate value for an irmov
        constant stored in little endian, meaning the byte w/ the lowest address is the least significant part

    for example:  jmp 0x123  --->  70 23 01 00 00 00 00 00 00   as a sequence of bytes
                                   ^^ \---------------------/
                                   ||       8-byte constant, written w/ lowest address on left
                                   |secondary opcode, if needed (here 0 for plain jmp)
                                   primary opcode (here 7)
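    as a sanity check of the byte order, here's a minimal C sketch (mine, not part of the
    original notes) that builds the 9-byte encoding of jmp 0x123 by hand; it assumes it runs
    on a little-endian host like x86, so memcpy of a uint64_t writes the least significant byte first:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        int main(void) {
            uint8_t encoding[9];
            uint64_t dest = 0x123;           /* the jump target */
            encoding[0] = 0x70;              /* primary opcode 7 (jump), secondary opcode 0 */
            memcpy(&encoding[1], &dest, 8);  /* little-endian host => lowest address gets least significant byte */
            for (int i = 0; i < 9; i++)
                printf("%02x ", encoding[i]);
            printf("\n");                    /* should print: 70 23 01 00 00 00 00 00 00 */
            return 0;
        }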
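    and, going back to the F2022 Week 3 bitwise question above, a quick brute-force C check
    (again mine, not part of the question) that the two sides match exactly when
    (x & 0x3330) == 0, i.e. Y = 0x3330:

        #include <assert.h>
        #include <stdio.h>

        int main(void) {
            /* both sides only depend on the low 16 bits of x,
               so checking every 16-bit pattern covers all cases */
            for (unsigned x = 0; x <= 0xFFFF; x++) {
                unsigned lhs = ((x & 0xFF0F) >> 2) & 0xFF0;
                unsigned rhs = ((x & 0xFF0) >> 2) & 0xFF0F;
                assert((lhs == rhs) == ((x & 0x3330) == 0));
            }
            printf("LHS == RHS exactly when (x & 0x3330) == 0\n");
            return 0;
        }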
- pipelining ~~ hazard handling
    ~ hazards
        ~ we make our pipeline and find out it doesn't run instructions correctly,
          because it doesn't handle dependencies automatically when we run parts of
          instructions in different orders
          (for example:
           ~ might write a register after it would be read in the pipeline -- "data hazard"
           ~ might compute the address of the next instruction after we need to fetch it -- "control hazard"
          )
    ~ strategies we learned
        ~ stalling --- get the operations back in order by inserting nops (no-operations) between them
            delay the thing that pipelining made happen too early relative to the other instruction's work
            e.g. wait to read a register until after it's written
            e.g. wait to run the instruction after a conditional jump until after the jump is computed
        ~ forwarding --- for data hazards only
            ~~ get the value after it's computed but before it's stored
            solves a majority of data hazards
            BUT requires that the value be computed before we need it in the later instruction
            (which won't always be true)
            ~ can combine with stalling:
                instead of stalling to wait for a register to be written,
                stall to wait for a value to be computed so we can forward it
            ~ most commonly: replaces the "wrong" value we just read from the register file
              (but we could do forwarding at other times, too)
        ~ what forwarding is possible
            ~ we need to have the value computed already
            ~ we don't want adding the forwarding to extend our critical path much
                TYPICALLY we forward from the "end" of a cycle to the "end" of a cycle,
                OR from the "beginning" of a cycle to the "beginning" of a cycle,
                b/c this won't add much to what needs to happen in a cycle
                let's say we try to forward from the "end" of Memory to the "beginning" of Execute:
                this would not work well, because now one cycle needs to do
                    beginning of memory, middle of memory, end of memory,
                    then beginning of execute, middle of execute, end of execute
                --> much longer cycle time needed to accommodate this
                --> kinda defeats having separate pipeline stages for execute + memory

- F2022 8a ~~ data cache misses
    data cache misses for A+B with a 3-way cache w/ 16 sets, 8-byte blocks
    (blocks fit two elements of A or B; A and B are arrays of 4-byte ints)
    for (int ii = 0; ii < 1024; ii += 16) {
        for (int j = 0; j < 1024; j += 1) {        <-- 1 miss for B[j] on every other iteration of this loop
            for (int i = ii; i < ii + 16; i += 1) {
                result += A[i] * B[j];             <-- 1 miss for every other A[i] when j = 0
            }
        }
    }
        vvvvvvvvvvvvvvvv---- can store all of these in one of the ways of 8 sets of the cache
        A[0]A[1]...A[15]  }
        A[0]A[1]...A[15]  }  1024 times (once per value of j)
    ~~ for the j = 0 pass of each j loop: 16 * .5 of a miss for A[i] --> 8 misses
    ~~ for each iteration of the j loop: .5 of a miss for B[j]
    1024 / 16 = 64 iterations where j = 0 (one per value of ii)
    1024 / 16 * 8 = 512 misses for the A[i]s
    1024 / 16 * 1024 iterations of the j loop in total
    1024 / 16 * 1024 * .5 = 32768 misses for the B[j]s

- dirty bits
    ~ for writes to caches
        if we are caching a value, we have two options:
            write-through: always update the next level
            write-back: defer the update until later
                ~ hope: maybe we'll change it again and avoid sending extra updates to the next level
                        maybe we'll never actually need to access it other than through this cache
    with write-back, we need to actually do the update eventually so it doesn't get lost
        one strategy we could try: whenever we evict something, update the next level
            but this would be really slow b/c we evict lots of things that haven't changed
            --> would do lots of extra writes
        INSTEAD: we store a _dirty bit_ that tells us if something needs to be updated in the next level
            ~ we set it when we modify a block in our cache w/o updating the next level

- aliasing
    ~ "two names" for the same value
        most commonly something like A[x] and B[y]
    means that the compiler can't do some optimizations that might be "obvious"
        A[x] += B[0]
        A[x] += B[1]
        A[x] += B[2]
        A[x] += B[3]
    you might like the compiler to put A[x] in a register
    BUT the compiler needs to handle the case where A[x] is another name for B[2],
    and we should get the same result as:
        B[2] += B[0]
        B[2] += B[1]
        B[2] += B[2]
        B[2] += B[3]
    there are also other optimizations (e.g. vectorization, cache blocking, etc.)
    where this would be a concern
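    a small C sketch (my illustration, not from the exam; the function names and the element
    type long are made up) of two ways to get A[x] into a register anyway:

        /* promising "no aliasing" with restrict: the compiler may now assume A[x]
           is not another name for any B[i], and keep it in a register across the adds */
        void sum_no_alias(long *restrict A, long *restrict B, int x) {
            A[x] += B[0];
            A[x] += B[1];
            A[x] += B[2];
            A[x] += B[3];
        }

        /* doing the "register promotion" by hand, without any promise: */
        void sum_by_hand(long *A, long *B, int x) {
            long t = A[x];
            t += B[0];
            t += B[1];
            t += B[2];
            t += B[3];
            A[x] = t;   /* careful: this gives a different answer than the original
                           code if A[x] really is another name for one of the B[i] */
        }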
- system calls
    ~ the processor reserves some operations for the OS
        ~ generally: things involving shared resources
            I/O devices
            OS-managed data for things shared between users
            (e.g. I shouldn't be able to access your program's data on portal and vice-versa)
    ~ but there are many legitimate things programs want to do with shared resources
        ~ print to the screen
        ~ stop another program
        ~ ...
    ~ the interface to run the OS via its exception handlers is called "system calls" ---
      programs trigger them when they want the OS to help them
        ~ the handler for exceptions in general runs in kernel mode,
          so it has access to reserved operations/resources
    ~ related to but different from: exceptions NOT triggered by the program deliberately
        ~ example: the OS might switch away from your program even though your program didn't request it
            not a system call, since your program didn't ask for this behavior
            ~ e.g. triggered by a timer or by I/O coming in that the OS got to handle
              despite whatever your program was doing
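    for a concrete example, a tiny C program (a sketch; Linux/Unix assumed) that prints to
    the screen by asking the OS --- write() is the C library's thin wrapper around the
    write system call:

        #include <unistd.h>

        int main(void) {
            const char msg[] = "hello via a system call\n";
            /* file descriptor 1 = standard output; the wrapper traps into the kernel,
               whose handler runs in kernel mode and talks to the I/O device for us */
            write(STDOUT_FILENO, msg, sizeof(msg) - 1);
            return 0;
        }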