- what's difficult? what's important?
  - i went through the schedule page and compared # of lectures on a topic to the number of points on that topic on drafts of the final
- system calls and exception handling --- how does OS code get to run?
  - the OS sets up "exception handlers" which are run by the processor
  - there's a different handler for each event that can trigger an exception (so the OS code can use what triggered it to decide what to do)
  - hardware runs exception handlers in response to:
    - I/O devices asking for attention
    - instructions programs are running that the hardware can't handle itself
    - something being wrong in the hardware (e.g. RAM is broken)
  - if the OS code likes, it can do a context switch
    meaning, it saves the current program/thread and loads a different program/thread
    - can infer a context switch must have happened if we're running a different program on a core than before
  - timers
    - OSs want to prevent infinite loops from hogging a core
    - so hardware exposes a timer "device" (a device like a keyboard device in how it talks to the CPU)
    - the OS can set an "alarm" on this timer that will trigger an exception that runs the OS
  - are signal handlers context switches?
    - signal handlers run in an existing program/thread (kinda like a function call that's done by the OS instead of by the program) --- see the sketch after this list
    - if we're already running that thread, no context switch, just the work for the "funny" function call
    - kill might not run the signal handler immediately (so kill does not imply a context switch, even if it targets another process)
    - a context switch includes any type of switching between threads (regardless of whether they're in the same process)
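  - to make the "funny function call" idea concrete, here's a minimal sketch
    (mine, not from lecture) where a timer makes the OS deliver SIGALRM and
    run a handler inside the already-running thread --- so no context switch
    is needed if that thread was already on the core:

        #include <signal.h>
        #include <unistd.h>

        void on_alarm(int signum) {
            (void)signum;
            write(1, "timer fired\n", 12);   // write() is async-signal-safe
        }

        int main(void) {
            struct sigaction sa = {0};
            sa.sa_handler = on_alarm;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGALRM, &sa, NULL);   // register our handler for SIGALRM
            alarm(1);                        // ask the OS for SIGALRM in ~1 second
            pause();                         // sleep until a signal arrives
            return 0;                        // handler ran in THIS thread, like a call
        }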
- counting processes with fork, waitpid, etc.
  my recommendation: draw a diagram

      pid_t p1 = fork();
      if (p1 == 0) {              // child
          pid_t p2 = fork();
          if (p2 == 0) {          // grandchild
              ...
          } else {
              exit(0);            // child exits right away
          }
      } else {
          waitpid(p1, NULL, 0);   // parent waits for the child (not the grandchild)
      }

      |
      |fork\
      |     \
      |      \
      |      |fork\
      |      |     \
      |      |      \
      |      |      |
      |      |      |
      |wait  /      |
      |             |

- caches
  - cache lookup [on read]:
    - split addr: [ tag ][index][offset]
    - we use the index to access a "set" (row) in the cache
    - that set contains K "blocks"
      each block is either:
        invalid
        valid, and we need to know where it came from
          we know where it came from using the *tag*
          if it's valid, it contains values from
            [tag][index]0000000 ---> [tag][index]1111111
          (all possible addresses with that tag/index)
      each block needs to have metadata:
        - a valid bit [to say whether it's valid or not]
        - a tag [to say where it came from]
        - sometimes:
          - a dirty bit [if we need to remember whether to write this back later]
          - replacement policy information
    - if we find a valid block whose tag matches, we have a hit
      we extract the part of the block corresponding to the offset
    - otherwise:
      - we add a new block to the cache, replacing something if needed
        - always add FULL blocks
      - return the correct part of that block
  - 32KB, 2-way means that we have 16KB of blocks per way
    but we need more info to know the size of each block
  - overall organization
    typically
      processor --- L1 data cache ------ shared L2 cache --- shared L3 cache [--- other cores]
               \--- L1 instruction cache ---/
    with multiple levels, accesses that miss in L1 go to L2,
    and misses in L2 go to L3
    goal: lower the time to bring values into the lower caches
  - multi-way
    - multiple blocks in a set
    - we need to check all of them to see if something's in the cache
    - we need to choose which one to replace when adding something
- why are write policies useful
  - cache lookup [on write]
    - split addr: [ tag ][index][offset]
    - we use the index to access a "set" (row) in the cache
    - that set contains K "blocks"
    - if we find a valid block whose tag matches, we have a hit
      - need to modify the data in the cache
      - OPTION****:
        - "write-back" --- mark the data as "dirty" and update the next level/memory later
          - fewer next level/memory accesses, especially if a value is modified a lot before it's done being used
        - "write-through" --- update the next level/memory now
          - less bookkeeping: don't have to write back the value later on, when it might be less convenient
    - if we have a miss:
      - OPTION:
        - "write-allocate" --- add the new block to the cache, replacing something if needed
          - always add FULL blocks
          - if adding, update the value in the cache
            - and go to OPTION**** to decide what to do about the next level
          - good idea if we're going to modify/read this value soon ("locality")
        - "write-no-allocate" --- do not add a new block, just update the next level/memory
          - good idea if we're not going to modify/read this value soon
- quiz for week 9, Q3 [cache]
  - 16-byte cache blocks, 256 sets
    4 offset bits, 8 index bits
  - array of 4-byte integers @ 0x10000000
    array[100] accessed, then array[X] accessed, and the access to array[X] evicted array[100]
    so array[100] and array[X] need to have the same index bits
  - solve: index bits of (array + 100 * sizeof(int)) = index bits of (array + X * sizeof(int))
    say the index is 0x44 (0100 0100):
      array[100] = ???????? 0100 0100 ????
      array[X]   = ???????? 0100 0100 ????
    choose some values for the ?s in array[X], and then solve
      array + X * sizeof(int) = value with the ?s filled in
    (with the numbers above it's fully determined: array + 100*4 = 0x10000190,
     so the real index bits are 0001 1001 = 0x19 --- the second sketch below
     checks this in code)
- TLBs and relationship to page tables
  - normal K-level page table lookup:
    1) take the virtual address and divide it into:
         a virtual page number, which is further divided into K parts
         an offset
    2) look up part 1 of the virtual page number in the page table @ ptbr
    3) look up part 2 of the virtual page number in a page table we find using the previous lookup result
    4) look up part 3 of the virtual page number in a page table we find using the previous lookup result
    ...
    K+1) look up part K of the virtual page number in a page table we find using the previous lookup result
    K+2) compute the final physical address using the physical page number from the previous lookup result and the offset from the original address
  - TLBs: on a hit, skip steps 2 through K+1
    1) take the virtual address and divide it
    1a) look up the virtual page number (the whole thing) in the TLB
        [on a hit --- skip a lot of steps]
    K+2) compute the final physical address using the physical page number from the [TLB] lookup result and the offset from the original address
  - TLBs: on a miss, add a step to fill in the TLB
    K+1a) add the virtual page number, physical page number + other page table entry info to the TLB
- week 6 quiz [page tables] --- Q6
  - 2^16-byte pages
  - multi-level page tables w/ 8192 (2^13) entries per level
    13 bits to look up an index in a table
  - 8-byte page table entries
  - 55-bit virtual address
    55 bits = 16-bit offset + K levels * 13 VPN bits per level
    K = 3
  - 0x00123456789ABC
    offset: 0x9ABC
    VPN: 0x12345678
    [0 0000 0000 0100][1 0001 1010 0010][1 0110 0111 1000] = VPN parts
  - want to find the address used for the second-level lookup
    - it was based on the first-level result of physical page 0x54
      physical page 0x54's addresses are 0x540000 through 0x54ffff
      (all possible page offsets)
      --> location of the second-level page table
    - the index into the second-level page table is the second VPN part,
      [1 0001 1010 0010] = 0x11A2
    - lookup address is 0x540000 + index * sizeof(entry)
      = 0x540000 + 0x11A2 * 8 = 0x548D10
      (the first sketch below re-does this in code)
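  - sanity-checking that page-table arithmetic (my sketch, plugging in the
    quiz's parameters):

        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            uint64_t va     = 0x00123456789ABCULL; // 55-bit virtual address
            uint64_t offset = va & 0xFFFF;         // low 16 bits (2^16-byte pages)
            uint64_t vpn    = va >> 16;            // remaining 39 bits
            // three 13-bit VPN parts, most significant (first-level) first
            uint64_t part1 = (vpn >> 26) & 0x1FFF;
            uint64_t part2 = (vpn >> 13) & 0x1FFF;
            uint64_t part3 =  vpn        & 0x1FFF;
            // first-level result: second-level table is in physical page 0x54;
            // 8-byte entries, indexed by the second VPN part
            uint64_t lookup = 0x540000 + part2 * 8;
            printf("offset=%#llx parts=%#llx/%#llx/%#llx lookup=%#llx\n",
                   (unsigned long long)offset, (unsigned long long)part1,
                   (unsigned long long)part2, (unsigned long long)part3,
                   (unsigned long long)lookup);
            // prints: offset=0x9abc parts=0x4/0x11a2/0x1678 lookup=0x548d10
            return 0;
        }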
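  - the week 9 cache quiz above is the same kind of bit-slicing; a sketch
    (assuming a direct-mapped cache for simplicity --- X = 1124 is one value
    I picked that works, not necessarily the quiz's official answer):

        #include <stdint.h>
        #include <stdio.h>

        // 16-byte blocks, 256 sets => 4 offset bits, 8 index bits
        static unsigned set_index(uint64_t addr) { return (addr >> 4) & 0xFF; }
        static uint64_t tag_bits(uint64_t addr)  { return addr >> 12; }

        int main(void) {
            uint64_t base = 0x10000000;       // int array[] @ 0x10000000
            uint64_t a100 = base + 100 * 4;   // &array[100] = 0x10000190
            uint64_t aX   = base + 1124 * 4;  // 1024 ints = 4096 bytes later:
                                              // same index bits, different tag
            printf("index %#x vs %#x, tag %#llx vs %#llx\n",
                   set_index(a100), set_index(aX),
                   (unsigned long long)tag_bits(a100),
                   (unsigned long long)tag_bits(aX));
            // same index (0x19), different tag => array[X] maps to the same
            // set as array[100] and can evict it
            return 0;
        }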
- threads versus processes
  - threads share memory --- processes don't
- pthread_cond_broadcast
  - pthread_cond_wait adds the thread to a list of waiters
  - other threads can detect that some thread needs to stop waiting
    example: if we know threads are waiting for a queue to be non-empty,
    we want to stop threads waiting when the queue becomes non-empty
  - cond_broadcast lets all waiting threads stop waiting
    - (there's also signal, which only wakes one, but that's not always appropriate)
    - (always "safe" to use broadcast, because waking up additional threads shouldn't matter, since they always double-check whether they're ready when they run again --- see the sketch below)
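  - a minimal sketch of that queue pattern (my code, not from lecture); the
    while loop is the "double-check" that makes broadcast always safe:

        #include <pthread.h>

        pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
        int queue_size = 0;   // stand-in for a real queue

        void enqueue_item(void) {
            pthread_mutex_lock(&lock);
            queue_size += 1;
            pthread_cond_broadcast(&nonempty);  // wake ALL waiters; extras re-check
            pthread_mutex_unlock(&lock);
        }

        void dequeue_item(void) {
            pthread_mutex_lock(&lock);
            while (queue_size == 0)                   // re-check after every wakeup
                pthread_cond_wait(&nonempty, &lock);  // releases lock while waiting
            queue_size -= 1;
            pthread_mutex_unlock(&lock);
        }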
- networking protocols/layers
  - which protocols map to which layers
    [link] --- local network only
    [network] --- bridging between networks BUT not handling reliability, identifying specific programs, ...
    [transport] --- handles reliability, streams of data, sending to specific programs, not just a machine
  - structure of packets
    a link-layer frame
      contains a network-layer packet
        which contains a transport-layer segment/datagram
          which contains application-layer data
    the network-layer "routers" will copy a packet out of one frame and into a new frame
    when receiving on one network and sending on another
- why D-H versus "normal" asymmetric
  - we can use "normal" asymmetric crypto to do what Diffie-Hellman accomplishes
  - the main practical advantage is that it's faster to use temporary private values with D-H than with asymmetric schemes
- digital signatures and verification
  - a signature verifies that someone with the private key chose to run the Sign operation
  - run the Verify operation to check it
- certificates
  - a way of distributing public keys
  - if we trust X, we want to let X give us other public keys
  - certificates are a way of getting public keys "from" X but without communicating with X directly
    - instead, X generates a message with a signature we can verify that tells us about a public key
    - whoever wants us to learn that public key (usually whoever it's for) sends us that message from X, which they got earlier
- what can an attacker do with asymmetric v. symmetric
  - asymmetric: we assume everyone can do the operations with the public key
    - need to beware that anyone can encrypt
    - anyone can verify signatures
    - no inherent problem with this, as long as we don't assume that because something is encrypted it came from a specific place
- S2023, Q6 [encryption]
- pipeline latency and throughput
    <------------------ N cycles ------------------>
    FDEMW
     FDEMW
      FDEMW
       ...
    latency    = end-to-end time for one instruction
                 <--->
                 FDEMW
    throughput = # instructions / time = (N-4) / N ~~ 1 instruction/cycle
                 [ideal case w/o stalling, etc.]
- time for pipeline stages
    cycle time = slowest stage + anything we added for pipelining (usually: new registers)
    and when we're dividing a single-cycle processor into K pieces,
    those pieces take *at least* 1/Kth the time,
    unless we made our single-cycle CPU poorly
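  - putting numbers on this (my sketch; the stage times are made up):

        #include <stdio.h>

        int main(void) {
            // made-up stage times (ps) from splitting a single-cycle CPU 5 ways
            int stage[5] = {150, 200, 180, 190, 160};
            int reg_overhead = 20;              // pipeline registers we added

            int cycle = 0;
            for (int i = 0; i < 5; i++)
                if (stage[i] > cycle) cycle = stage[i];
            cycle += reg_overhead;              // cycle time = slowest stage + overhead

            long long n = 1000;                 // instructions
            long long cycles = n + 4;           // 5-stage pipeline: n + (5-1) cycles
            printf("cycle time = %d ps\n", cycle);          // 220 ps
            printf("latency    = %d ps\n", 5 * cycle);      // 1100 ps per instruction
            printf("throughput = %.3f insns/cycle\n",
                   (double)n / cycles);         // 0.996, approaches 1 as n grows
            return 0;
        }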