- pointers and assembly Q1 on S2018 long *x --- %r8 long y --- %r9 x[y+1] += y ptr-vvv vvvv ---- integer *(x + (y+1)) += y ^^^^^^^^^ memory location --- *( R8 + 8 x (R9 + 1) ) = *( R8 + 8 x R9 + 8 ) = *(8 + R8 + R9 x 8 ) offset(base,index,scale) --> *(offset + base + index x scale) 8(%r8,%r9,8) optoin A: add %r9, 8(%r8,%r9,8) address in x and add sizeof(*x) * (y+1) - pointers and array Q4 on F2018 Exam 1G 0x10 0x18 0x20 0x28 0x030 char *arr[5] = {"We", "live", "in", "a", "society"}; ^^^^ ^^^^^ +2 +3 +4 arr+0 arr+1 NB: arr+0 is a char** char *temp1 = "We"; char *temp2 = "live"; .... char *array[5] = {temp1, temp2, temp3, ...}; which were true: A: *(*(arr+4)+3) ?= arr[2][0] *(*(&"society")+3) ?= (arr[2])[0] = ("in")[0] = 'i' ^^^^^^^^^^----- 0x30 *("society"+3) *("iety") 'i' B: *((*arr)+1) ?= *(arr[1] + 3) *arr --> *(arr+0) --> arr[0] *(("We")+1) *("live" + 3) *("e") *("e") 'e' 'e' - AMAT average memory access time = (portion of time we have a hit) * (amount of time for a hit) + (portion of time we have a miss) * (amount of time for a miss) = (hit rate) * (hit time) + (miss rate = 1 - hit rate) * (miss time) = (hit rate) * (hit time) + vvvvvvvvvvvvvv--------- additional time b/c no hit (1 - hit rate) * (hit time + miss penalty) ^^^^^^^^--- try to get a hit = hit time + (1 - hit rate) * miss penalty = hit time + miss rate * miss penalty - cache effect on stalling if cache is the only cause of stalling, then hits are "free" AMAT = hit time --- no stalling average cycles of stalling = (AMAT - hit time) = miss rate * miss penalty ^^^^^^^^^^^ - cache blocking question from last exam for (i = 0, 2, 4, 6) for (j = 0, 2, 4, 6) { **** do something with M[row j to j+1 column i to i+1], K[row i to i+1 column j to j+1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2 cache blocks 2 cache blocks } } given: we had a fully-associative cache which can store 8 array elements with 2 byte blocks, each array element is 1 byte, and pairs of odd+even --> same cache block 4 * 4 * 4 = 64 maximum = to minimum because when we access M[x] once for a matrix, we don't access that block again - exceptions NOT try/catch (in this class) way for the processor to run the OS when the OS needs to run [fault] because a user program did something "bad" [trap] because a user program requested the OS to run ("system call") [interrupt] because some I/O device had something happen (e.g. keypress, timer) [abort] because something is terribly wrong with the hardware and the OS should try to recover - context switching when the OS gets to run, when the OS finsihes running, it can be running something else example: I'm running an infinte loop on portal.cs.virginia.edu the OS decides from a timer going off and trigger an interrupt that I don't deserve to run anymore to implement this: OS needs to save what was in CPU registers, page table base register so it can keep running my infinite loop later needs to restore the CPu registers, etc. of the program that it's running instead - user versus kernel mode we want to share machines between different users can't trust user programs to not do things they aren't supposed to access the disk to read files for othre users not have bugs that corrupt other program's memory hardware support: 1 bit --- user or kernel mode in kernel mode: more things work special instructions like: "change the exception table" "chnage the page table base register" "access I/o device" ... work page table entries marked kernel-mode only work in user mode: all of things cause an exception ("run the OS and hvae the OS figure it out") instead with exceptions: exception handlers always run in kernel mode returning from exception handler typically swiched back to user mode if want to do somethig that needs a priv'd instruction e.g. changing the page table base register or accessing I/o device need to cause an exception to do this - address splitting for caches [ physical addresses ] [ tag ][index ][ offset ] tag: different things stored in the same set, which one do we have? index: which set do I use to lookup/store this value? offset: where in a multi-byte block is this value offset bits = enough bits to idenitfy the byte K-byte blocks --> log_2(K) bits index bits = enough bits to identify the set S sets --> log_2(S) bits tag = everything left over after offset + index for TLB [ virtual page number ] [ tag ][ index ] tag: different things stored in the same set, which one do we have? index: which set do I use to lookup/store this value? no offset --- because I only store one thing (a page table entry) per block index bits = enough bits to identify the set S sets --> log_2(S) bits for virtual memory [ physical address ] [ virtual address ] [ physical page # ][ page offset ] [ virtual page #][ page offset ] virtual page #: which index of a page-sized chunk into the process's memory (the memory a user program thinks it has) page offset: where in that chunk physiical page #: which index of page-sized chunks in the hardware's memory page offset = enough bits to identify a byte in page N-byte pages --> log_2(N) bits virt/phys page # = everything left over single-level page table --- one entry for each virtual page # E page table entries in a single-level table, virtual page # is log_2(E) bits multi-level page table [ virtual address ] [ virt page # ][ page offset ] [ part 1 ][part 2][... ] ^^^^^^^^^^ index in first level-table ^^^^^^^ index in sceond-level table E_1 entries in the first-level table, then part 1 is log_2(E_1) bits page table entry [valid bit][physical page #][other metadata bits] ^^^^^^^^^^^^^^^^^ 0x123 PTE size = 4 bytes/PTE next VPN part = 0x2 PTEs into the table start of table + PTE size * next VPN part page # 0x123 (pages after beginning of memory) + 4 bytes/PTE * 2 PTEs into the table page # 0x123 * N bytes/page + 4 bytes/PTE * 2 PTEs from beg of table 0x123 * N bytes + 8 bytes - page table sizes relationship with number of VPN bits we use determines number of entries also have page table entry size --- usually we're told minimum PTE size = enough to hold valid bit + physical page # - virtual addresses something Q3 on S2018 256-byte pages, two-entry DM TLB, [ virtual page # ][page offset] --- 8 ---- ^------ 256 = 2^8 bytes [ TLB ][TLB ] [ tag [[idx ] -- 1-- ^---- which of two entries? index 0 index 1 VPN being accessed v --- --- M 0x0FF 0x0 0x0 M 0xFFF 0x0 0xF 0xF M 0x100 0x0 0x1 0x1 H 0x101 0x0 0x1 0x1 M 0xFFE 0x0 0xF 0xF H 0xFF6 0x0 0xF H 0x0FF 0x0 0xF - profilers - time your code to find out where it is slow i.e. where does it spends it time - don't spend a lot of time optimizing something which takes 5% of your time - two designs: - sampling-based: stop the program periodically, record where it is tool reports distribution of where it was stopped - instrumentaiton-based design: modify code to have counters to count/time how often and long each function/instr runs - either case: - what code was running - what code called the code that was running (important if, say, the code that was running was a library functoin) - cache misses from previous exam cache 512-byte direct mapped, 16-byte blocks array elements 1 byte for (j <- 0 to 256) for (i <- 0 to 3) vvvvvvvvvvvvvvvvvv --- same issue for misses array[i * 256] += array[i * 256 + j] array[16] and array[512 + 16] ^^^^^^^^^^^^^^ miss every time array[0] <--> array[0 + 512] array[0 * 256] <--> array[2 * 256] evicts array[0], array[1], array[2], ... array[1 * 256] <--> array[256 + 512] array[256 + 2 * 256] <--> array[3 * 256] BUT array[i * 256] and array[i * 256 + j] are in th same block if j < 16