- reference sheet
    > there will be one; the contents aren't finalized yet
    > will have a diagram of cache organization + VM lookup + pipeline at least
- processes and threads
    ~ process: a "virtual computer" that your program runs in:
        ~ its own memory ("address space") -- typically implemented via virtual memory
        ~ its own cores ("threads") -- implemented by context switching + assigning to one or more "real" cores
        ~ its own I/O devices ("files")
    ~ threads: usually we're talking about multiple threads in a single process, and if so, they share:
        ~ address space + files
      since we need a call stack to run things, each thread has its own stack,
      and if there are multiple threads sharing the address space, each stack needs to be at a different virtual address
      (implies that one thread can use a pointer to access things on another thread's stack)
- saving/restoring state in context switches, CPU v OS jobs
    > a context switch can be:
        between two threads in one process
        between two threads in different processes
    ~ if we have one core and want to run multiple threads on it, we can't load the values for each thread simultaneously
        ~ need to have the threads take turns
    ~ the code that does the context switch needs to save:
        ~ register values
        ~ condition codes
        ~ program counter
      for the old thread + restore them for the new thread
    ~ AND if we're switching from one process to another, also:
        need to change the page table base register
    ~ the OS can do all this in software anytime it's running, BUT
      when we run the OS from an exception, something needs to save what the program counter
      was before the OS started running (or else the OS won't know it)
      e.g.
      a timer goes off in the middle of a loop --- need to know where in the loop we stopped
      the processor's logic for exception handling is definitely going to save the previous PC,
      and then the OS will probably copy that somewhere if it does a context switch
      the processor could also help the OS by saving additional stuff somewhere to make writing
      exception handlers more convenient, but this isn't necessary for correctness
    ~ generally, if the OS runs, it can decide to context switch if it wants to
      (usually based on policies about who gets to use the CPU)
    ~ definitely need the OS's help to switch from one process to another
      (b/c user-mode code can't change the page table base register)
- exceptions v signals
    ~ exceptions: a hardware feature that runs a handler in the OS for certain events
        types of exceptions: interrupts, traps, system calls, ...
        ~ certain events:
            the program doing something (deliberately [system calls] / accidentally)
            I/O devices doing things
            timers
            ...
    ~ handler is in some OS-managed memory, usually at addresses marked not accessible in user mode
    ~ exception handler runs in the middle of whatever's happening
    ~ signals: an OS feature that runs a handler in a program for certain events
        ~ certain events:
            the program doing something (deliberately/accidentally)
            another program deliberately triggering a signal
            ...
    ~ often signals will be triggered by an exception
      (meaning the OS's exception handler will deal with the event by causing a signal)
    ~ handler is a "normal" function in the program
    ~ signal handler runs in the middle of one of the program's threads
    ~ signal handler won't run until we switch to (or are otherwise running) the program receiving the signal
- pipes
    ~ open files (on Unix-like systems): let us read and/or write
    ~ regular files: read data stored somewhere and/or write data to store somewhere
    ~ pipes:
        ~ we get a pair of connected open files, a "read end" and a "write end"
        ~ we can write to the "write end" open file
        ~ then later on read that data from the "read end" open file
    ~ application: we can start a program with an open file that is, e.g., the "write end" of a pipe;
      then if we read from the "read end" open file for that pipe, we'll get whatever the program wrote
    ~ since pipes are open files, we can mostly use them interchangeably with open files
      that represent a file on disk, a terminal, etc. (they have the same interface)
    ~ writing to the "read end" or reading from the "write end" will return an error,
      like reading from a file open for writing or vice-versa
- virtual v physical addresses
    ~ virtual addresses: when we have a page table active, all the addresses we use in assembly
        ~ code addresses
        ~ data addresses
    ~ physical addresses: the "real" hardware addresses that the memory/cache understands
        ~ produced from virtual addresses using page tables
        ~ the page table base register contains a physical address for finding the page tables
    ~ the mapping from virtual to physical addresses is chosen by the OS
      (b/c the OS chooses what's in the page table(s) it tells the processor about)
- mapping kernel memory to physical addresses
    ~ most OSes reserve some virtual addresses for OS use
        ~ so they don't need to change page tables when running exception handlers
        ~ avoiding this was a mitigation for Meltdown if you didn't have help from the HW
    ~ but we can't allow programs to edit the OS's memory, so we'll
      mark them in the page table as not accessible in user mode
    ~ and the processor will enforce this by triggering an exception if user-mode code tries to access them
- splitting addresses into pieces for VM lookup
    ~ we divide all the possible {virtual, physical} addresses into fixed-size pages,
      where the page size is a power of two = 2^K bytes
    ~ this means that if we take an address X, then
        X / 2^K   = "page number" = all but the lower K bits of X
        X mod 2^K = "page offset" = the lower K bits of X
      virtual addresses:  [ virtual page number  ][page offset]
        (possibly pointers have more bits than virtual addresses)
      physical addresses: [ physical page number ][page offset]
    ~ page tables let us look up a virtual page number and get a physical page number + some metadata
    ~ most simply, we'd just use the virtual page number as an array index into a single page table
      BUT this doesn't work when we have a lot of possible virtual page numbers
      SO instead, we use multi-level page tables:
        virtual addresses: [ virtual page number              ][page offset]
                           [ VPN pt 1 ][ VPN pt 2 ][ VPN pt 3 ][page offset]
        divide the VPN into pieces
        use the first piece as an array index into the "first-level" page table
            (pointed to by the page table base register)
        use the second piece as an array index into the second-level page table
            (pointed to by what we found from the first (first-level) array lookup)
            [only get this far if the first-level entry is marked as valid]
        use the third piece as an array index into the third-level page table
            (pointed to by what we found from the second (second-level) array lookup)
            [only get this far if the second-level entry is marked as valid]
        use the result of the third array lookup to get the final physical page number + metadata
            [only get this far if the third-level entry is marked as valid]
        gives us a "fat" tree data structure: one root (the first-level page table),
        which branches to potentially many second levels, each of which branches to
        potentially many third levels, and so on
        save space by not storing second-and-later-level page
      tables that would be all invalid
- splitting addresses into pieces for cache lookup
    ~ caches usually only deal with physical addresses [we've already done all the page table stuff]
        [ tag ][ index ][ offset ]
    index = "which set (row) of the cache"
        ~ we'll always have a power-of-two number of sets 2^I --> I index bits
    offset = "which byte of a block that's stored in one of these sets"
        ~ we'll always have a power-of-two block size 2^B --> B offset bits
    tag = everything left over, so we can tell where a value stored in a cache block came from:
        is the data I found in a set actually for the address I care about
        (and not some other address with the same index)?

                      way 0                                     way 1
        set index 0   {valid,tag,data,?info for write policy}   {valid,tag,data,?info for write policy} | {repl. info?}
                  1   {valid,tag,data,?info for write policy}   {valid,tag,data,?info for write policy} | {repl. info?}
                  2   {valid,tag,data,?info for write policy}   {valid,tag,data,?info for write policy} | {repl. info?}
                  3   {valid,tag,data,?info for write policy}   {valid,tag,data,?info for write policy} | {repl. info?}
                ...   ^^^^^^^^^^^^^^^^^^^^^^^^^ what's actually stored ^^^^^^^^^^^^^^^^^^^^^^^^^
        repl. info? = info to implement the replacement policy, if needed
        ?info for write policy = info to implement write policies, if needed
- write policies for caches
    ~ two decisions:
        if I look up something in the cache for a write and it's a miss, what do I do?
            A. Add it to the cache ---> write-allocate
            B. Don't               ---> write-no-allocate
        if I modify something in the cache for a write, what do I do for the next level?
            A. Write it there immediately --> write-through
            B.
               Write it there when I evict --> write-back
    ~ for write-back, we need to track a "dirty" bit to remember that we have something to write
      (bad alternative: we could always write everything that's evicted just in case, but that would be very slow)
- cryptography: when to use public v private keys
    ~ private keys are the ones you shouldn't share ---> more powerful
        ~ encryption: the private key lets you decrypt
            more dangerous for random people to be able to decrypt than encrypt
        ~ signatures: the private key lets you sign
            more dangerous for random people to be able to sign than verify
    ~ something is probably wrong if you're sharing private keys widely,
      but we expect to share public keys widely
- TLS handshake
    ~ TLS design:
        1. client and server agree on temporary symmetric keys
            ~ use a key agreement protocol --- client + server each generate a secret value + a key share for that value
            ~ combine their secret with the other's key share (which is public)
              to produce a common shared value that others cannot know
            ~ use that value to make keys for symmetric encryption and message authentication codes
        2. client verifies that it's talking to the right server
            ~ server signs its key share (what the client used to get symmetric keys)
              with a private key that corresponds to a public key the client will get from a trusted source
            ~ what trusted source?
                ~ the server presents a certificate showing that some trusted third party
                  ("certificate authority") attests that this public key really comes from the server
        3.
           client and server communicate the rest of the time with those symmetric keys
            ~ b/c symmetric cryptography is much faster + more flexible than asymmetric
- routers in networking
    ~ a router is a machine connected to multiple networks that forwards data from one to another
    ~ typically it has a "routing table", which tells it what range of addresses should be sent to what network
        "addresses" = network-layer addresses (on the Internet = IP addresses)
        [the messages will also have a link-layer address (typically a MAC address), but that's only used within each local network]
        [the messages will often also have a port number, but that's only for the sending/receiving machine]
- different network protocols
    ~ layers:
        application: HTTP, TLS, ...
        transport: TCP -- streams of reliable bytes; UDP -- unreliable datagrams
        network: IP (Internet Protocol) "inter-network" -- sending messages between networks (includes routing)
        link: Wi-Fi, Ethernet, Bluetooth, ... -- sending messages within a local network
    ~ DHCP, IP, TLS, UDP, TCP --- what are these?
        ~ TLS: provides "secure channels", typically on top of TCP ("application" layer)
        ~ DHCP: protocol for getting IP addresses on a local network
          (which has to do some special things b/c it has to work when you don't have an IP address yet)
- forwarding/stalling
    ~ forwarding: when we can't read the right value from a register because it's not written yet in a pipelined/etc.
      processor BUT the value is computed, we can add a new wire to the processor to "forward" the value to where it's needed
        need: the value is computed before we actually need it; we just need to add some path to send it
        (since it's not going to be sent via the register file, because we write too late and/or read too early)
    ~ stalling: when we don't have a value ready, we can delay processing instructions until it's ready
        generally, in our pipelined processor, most commonly: repeat the decode stage until we can
        forward a value that is not computed yet (typically from the memory stage)
- Spectre/Meltdown: what bits are exposed through the side channel?
    if we're using the cache PRIME+PROBE technique, we learn what cache set was evicted
    how does the cache choose what set is evicted?
        using the index bits of the physical address that was accessed
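- worked example: splitting an address for cache lookup
    the tag/index/offset split from the cache-lookup section above (and the "index bits" that
    PRIME+PROBE learns about), sketched in Python; the cache parameters used here --- 4 sets,
    64-byte blocks --- are made up for illustration, not taken from any particular processor:

```python
def split_cache_address(addr, num_sets, block_size):
    """Split a physical address into (tag, index, offset).

    num_sets and block_size must be powers of two, as in the notes:
    a 2^B-byte block -> B offset bits; 2^I sets -> I index bits.
    """
    offset_bits = block_size.bit_length() - 1       # B
    index_bits = num_sets.bit_length() - 1          # I
    offset = addr & (block_size - 1)                # lower B bits
    index = (addr >> offset_bits) & (num_sets - 1)  # next I bits up
    tag = addr >> (offset_bits + index_bits)        # everything left over
    return tag, index, offset

# example: 64-byte blocks (B=6), 4 sets (I=2)
tag, index, offset = split_cache_address(0x12345, num_sets=4, block_size=64)
# -> tag 0x123, index 1, offset 5
```

    with those parameters, address 0x12345 lands in set 1, so a PRIME+PROBE attacker who
    sees set 1 evicted learns those two index bits of the physical address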
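- worked example: splitting a virtual address for VM lookup
    the VPN-pieces + page-offset split from the VM-lookup section above, sketched in Python;
    the 4096-byte pages (K=12) and three 9-bit VPN pieces are assumed values chosen for
    illustration, not the parameters of any particular architecture:

```python
def split_virtual_address(addr, page_bits=12, vpn_piece_bits=(9, 9, 9)):
    """Split a virtual address into ([VPN pt 1, VPN pt 2, VPN pt 3], page offset).

    page_bits = K, for 2^K-byte pages; vpn_piece_bits gives the width of
    each VPN piece, most-significant piece first.
    """
    offset = addr & ((1 << page_bits) - 1)  # lower K bits = page offset
    vpn = addr >> page_bits                 # all remaining bits = VPN
    pieces = []
    for bits in reversed(vpn_piece_bits):   # peel pieces off low-to-high
        pieces.append(vpn & ((1 << bits) - 1))
        vpn >>= bits
    pieces.reverse()                        # pt 1 is the most-significant piece
    return pieces, offset

# VPN pt 1 indexes the first-level table, pt 2 the second-level, pt 3 the third-level
pieces, offset = split_virtual_address((1 << 30) | (2 << 21) | (3 << 12) | 0x123)
```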
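- worked example: pipes
    the pipe behavior from the pipes section earlier in these notes (write to the "write end",
    read the same bytes back from the "read end", get an error for reading from the "write end"),
    sketched in Python using the POSIX-style os.pipe interface:

```python
import os

# create a pipe: a pair of connected open files (file descriptors)
read_end, write_end = os.pipe()

# write to the "write end"...
os.write(write_end, b"hello through the pipe")

# ...then later read that same data back from the "read end"
data = os.read(read_end, 1024)  # data == b"hello through the pipe"

# reading from the "write end" is an error, just like
# reading from a regular file opened only for writing
got_error = False
try:
    os.read(write_end, 1)
except OSError:
    got_error = True

os.close(read_end)
os.close(write_end)
```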