- what's difficult? what's important?
  - i went through the schedule page and compared # of lectures on a topic to the number of points on that topic on drafts of the final
- system calls and exception handling --- how does OS code get to run?
  - the OS sets up "exception handlers" which are run by the processor
  - there's a different handler for each event that can trigger an exception (so the OS code can use what triggered it to decide what to do)
  - hardware runs exception handlers in response to:
    - I/O devices asking for attention
    - instructions programs are running that the hardware can't handle itself
    - something being wrong in the hardware (e.g. RAM is broken)
  - if the OS code likes, it can do a context switch
    meaning, it saves the current program/thread and loads a different program/thread
    - can infer a context switch must have happened if we're running a different program on a core than before
  - timers
    - OSs want to prevent infinite loops from hogging a core
    - so hardware exposes a timer "device" (a device like a keyboard device in how it talks to the CPU)
    - the OS can set an "alarm" on this timer that will trigger an exception that runs the OS
  - are signal handlers context switches?
    - signal handlers run in an existing program/thread (kinda like a function call that's done by the OS instead of by the program) --- see the sketch after this list
    - if we're already running that thread, no context switch, just the work for the "funny" function call
    - kill might not run the signal handler immediately (so kill does not imply a context switch, even if it targets another process)
    - a context switch includes any type of switching between threads (regardless of whether they're in the same process)
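  - to make the "funny function call" idea concrete, here's a minimal sketch
    (mine, not from lecture) where a timer makes the OS deliver SIGALRM and
    run a handler inside the already-running thread --- so no context switch
    is needed if that thread was already on the core:

        #include <signal.h>
        #include <unistd.h>

        void on_alarm(int signum) {
            (void)signum;
            write(1, "timer fired\n", 12);   // write() is async-signal-safe
        }

        int main(void) {
            struct sigaction sa = {0};
            sa.sa_handler = on_alarm;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGALRM, &sa, NULL);   // register our handler for SIGALRM
            alarm(1);                        // ask the OS for SIGALRM in ~1 second
            pause();                         // sleep until a signal arrives
            return 0;                        // handler ran in THIS thread, like a call
        }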
- counting processes with fork, waitpid, etc.
  my recommendation: draw a diagram

      pid_t p1 = fork();
      if (p1 == 0) {              // child
          pid_t p2 = fork();
          if (p2 == 0) {          // grandchild
              ...
          } else {
              exit(0);            // child exits right away
          }
      } else {
          waitpid(p1, NULL, 0);   // parent waits for the child (not the grandchild)
      }

      |
      |fork\
      |     \
      |      \
      |      |fork\
      |      |     \
      |      |      \
      |      |      |
      |      |      |
      |wait  /      |
      |             |

- caches
  - cache lookup [on read]:
    - split addr: [ tag ][index][offset]
    - we use the index to access a "set" (row) in the cache
    - that set contains K "blocks"
      each block is either:
        invalid
        valid, and we need to know where it came from
          we know where it came from using the *tag*
          if it's valid, it contains values from
            [tag][index]0000000 ---> [tag][index]1111111
          (all possible addresses with that tag/index)
      each block needs to have metadata:
        - a valid bit [to say whether it's valid or not]
        - a tag [to say where it came from]
        - sometimes:
          - a dirty bit [if we need to remember whether to write this back later]
          - replacement policy information
    - if we find a valid block whose tag matches, we have a hit
      we extract the part of the block corresponding to the offset
    - otherwise:
      - we add a new block to the cache, replacing something if needed
        - always add FULL blocks
      - return the correct part of that block
  - 32KB, 2-way means that we have 16KB of blocks per way
    but we need more info to know the size of each block
  - overall organization
    typically
      processor --- L1 data cache ------ shared L2 cache --- shared L3 cache [--- other cores]
               \--- L1 instruction cache ---/
    with multiple levels, accesses that miss in L1 go to L2,
    and misses in L2 go to L3
    goal: lower the time to bring values into the lower caches
  - multi-way
    - multiple blocks in a set
    - we need to check all of them to see if something's in the cache
    - we need to choose which one to replace when adding something
- why are write policies useful
  - cache lookup [on write]
    - split addr: [ tag ][index][offset]
    - we use the index to access a "set" (row) in the cache
    - that set contains K "blocks"
    - if we find a valid block whose tag matches, we have a hit
      - need to modify the data in the cache
      - OPTION****:
        - "write-back" --- mark the data as "dirty" and update the next level/memory later
          - fewer next level/memory accesses, especially if a value is modified a lot before it's done being used
        - "write-through" --- update the next level/memory now
          - less bookkeeping: don't have to write back the value later on, when it might be less convenient
    - if we have a miss:
      - OPTION:
        - "write-allocate" --- add the new block to the cache, replacing something if needed
          - always add FULL blocks
          - if adding, update the value in the cache
            - and go to OPTION**** to decide what to do about the next level
          - good idea if we're going to modify/read this value soon ("locality")
        - "write-no-allocate" --- do not add a new block, just update the next level/memory
          - good idea if we're not going to modify/read this value soon
- quiz for week 9, Q3 [cache]
  - 16-byte cache blocks, 256 sets
    4 offset bits, 8 index bits
  - array of 4-byte integers @ 0x10000000
    array[100] accessed, then array[X] accessed, and the access to array[X] evicted array[100]
    so array[100] and array[X] need to have the same index bits
  - solve: index bits of (array + 100 * sizeof(int)) = index bits of (array + X * sizeof(int))
    say the index is 0x44 (0100 0100):
      array[100] = ???????? 0100 0100 ????
      array[X]   = ???????? 0100 0100 ????
    choose some values for the ?s in array[X], and then solve
      array + X * sizeof(int) = value with the ?s filled in
    (with the numbers above it's fully determined: array + 100*4 = 0x10000190,
     so the real index bits are 0001 1001 = 0x19 --- the second sketch below
     checks this in code)
- TLBs and relationship to page tables
  - normal K-level page table lookup:
    1) take the virtual address and divide it into:
         a virtual page number, which is further divided into K parts
         an offset
    2) look up part 1 of the virtual page number in the page table @ ptbr
    3) look up part 2 of the virtual page number in a page table we find using the previous lookup result
    4) look up part 3 of the virtual page number in a page table we find using the previous lookup result
    ...
    K+1) look up part K of the virtual page number in a page table we find using the previous lookup result
    K+2) compute the final physical address using the physical page number from the previous lookup result and the offset from the original address
  - TLBs: on a hit, skip steps 2 through K+1
    1) take the virtual address and divide it
    1a) look up the virtual page number (the whole thing) in the TLB
        [on a hit --- skip a lot of steps]
    K+2) compute the final physical address using the physical page number from the [TLB] lookup result and the offset from the original address
  - TLBs: on a miss, add a step to fill in the TLB
    K+1a) add the virtual page number, physical page number + other page table entry info to the TLB
- week 6 quiz [page tables] --- Q6
  - 2^16-byte pages
  - multi-level page tables w/ 8192 (2^13) entries per level
    13 bits to look up an index in a table
  - 8-byte page table entries
  - 55-bit virtual address
    55 bits = 16-bit offset + K levels * 13 VPN bits per level
    K = 3
  - 0x00123456789ABC
    offset: 0x9ABC
    VPN: 0x12345678
    [0 0000 0000 0100][1 0001 1010 0010][1 0110 0111 1000] = VPN parts
  - want to find the address used for the second-level lookup
    - it was based on the first-level result of physical page 0x54
      physical page 0x54's addresses are 0x540000 through 0x54ffff
      (all possible page offsets)
      --> location of the second-level page table
    - the index into the second-level page table is the second VPN part,
      [1 0001 1010 0010] = 0x11A2
    - lookup address is 0x540000 + index * sizeof(entry)
      = 0x540000 + 0x11A2 * 8 = 0x548D10
      (the first sketch below re-does this in code)
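  - sanity-checking that page-table arithmetic (my sketch, plugging in the
    quiz's parameters):

        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            uint64_t va     = 0x00123456789ABCULL; // 55-bit virtual address
            uint64_t offset = va & 0xFFFF;         // low 16 bits (2^16-byte pages)
            uint64_t vpn    = va >> 16;            // remaining 39 bits
            // three 13-bit VPN parts, most significant (first-level) first
            uint64_t part1 = (vpn >> 26) & 0x1FFF;
            uint64_t part2 = (vpn >> 13) & 0x1FFF;
            uint64_t part3 =  vpn        & 0x1FFF;
            // first-level result: second-level table is in physical page 0x54;
            // 8-byte entries, indexed by the second VPN part
            uint64_t lookup = 0x540000 + part2 * 8;
            printf("offset=%#llx parts=%#llx/%#llx/%#llx lookup=%#llx\n",
                   (unsigned long long)offset, (unsigned long long)part1,
                   (unsigned long long)part2, (unsigned long long)part3,
                   (unsigned long long)lookup);
            // prints: offset=0x9abc parts=0x4/0x11a2/0x1678 lookup=0x548d10
            return 0;
        }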
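  - the week 9 cache quiz above is the same kind of bit-slicing; a sketch
    (assuming a direct-mapped cache for simplicity --- X = 1124 is one value
    I picked that works, not necessarily the quiz's official answer):

        #include <stdint.h>
        #include <stdio.h>

        // 16-byte blocks, 256 sets => 4 offset bits, 8 index bits
        static unsigned set_index(uint64_t addr) { return (addr >> 4) & 0xFF; }
        static uint64_t tag_bits(uint64_t addr)  { return addr >> 12; }

        int main(void) {
            uint64_t base = 0x10000000;       // int array[] @ 0x10000000
            uint64_t a100 = base + 100 * 4;   // &array[100] = 0x10000190
            uint64_t aX   = base + 1124 * 4;  // 1024 ints = 4096 bytes later:
                                              // same index bits, different tag
            printf("index %#x vs %#x, tag %#llx vs %#llx\n",
                   set_index(a100), set_index(aX),
                   (unsigned long long)tag_bits(a100),
                   (unsigned long long)tag_bits(aX));
            // same index (0x19), different tag => array[X] maps to the same
            // set as array[100] and can evict it
            return 0;
        }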
- threads versus processes
  - threads share memory --- processes don't
- pthread_cond_broadcast
  - pthread_cond_wait adds the thread to a list of waiters
  - other threads can detect that some thread needs to stop waiting
    example: if we know threads are waiting for a queue to be non-empty,
    we want to stop threads waiting when the queue becomes non-empty
  - cond_broadcast lets all waiting threads stop waiting
    - (there's also signal, which only wakes one, but that's not always appropriate)
    - (always "safe" to use broadcast, because waking up additional threads shouldn't matter, since they always double-check whether they're ready when they run again --- see the sketch below)
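  - a minimal sketch of that queue pattern (my code, not from lecture); the
    while loop is the "double-check" that makes broadcast always safe:

        #include <pthread.h>

        pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
        int queue_size = 0;   // stand-in for a real queue

        void enqueue_item(void) {
            pthread_mutex_lock(&lock);
            queue_size += 1;
            pthread_cond_broadcast(&nonempty);  // wake ALL waiters; extras re-check
            pthread_mutex_unlock(&lock);
        }

        void dequeue_item(void) {
            pthread_mutex_lock(&lock);
            while (queue_size == 0)                   // re-check after every wakeup
                pthread_cond_wait(&nonempty, &lock);  // releases lock while waiting
            queue_size -= 1;
            pthread_mutex_unlock(&lock);
        }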
- networking protocols/layers
  - which protocols map to which layers
    [link] --- local network only
    [network] --- bridging between networks BUT not handling reliability, identifying specific programs, ...
    [transport] --- handles reliability, streams of data, sending to specific programs, not just a machine
  - structure of packets
    a link-layer frame
      contains a network-layer packet
        which contains a transport-layer segment/datagram
          which contains application-layer data
    the network-layer "routers" will copy a packet out of one frame and into a new frame
    when receiving on one network and sending on another
- why D-H versus "normal" asymmetric
  - we can use "normal" asymmetric crypto to do what Diffie-Hellman accomplishes
  - the main practical advantage is that it's faster to use temporary private values with D-H than with asymmetric schemes
- digital signatures and verification
  - a signature verifies that someone with the private key chose to run the Sign operation
  - run the Verify operation to check it
- certificates
  - a way of distributing public keys
  - if we trust X, we want to let X give us other public keys
  - certificates are a way of getting public keys "from" X but without communicating with X directly
    - instead, X generates a message with a signature we can verify that tells us about a public key
    - whoever wants us to learn that public key (usually whoever it's for) sends us that message from X, which they got earlier
- what can an attacker do with asymmetric v. symmetric
  - asymmetric: we assume everyone can do the operations with the public key
    - need to beware that anyone can encrypt
    - anyone can verify signatures
    - no inherent problem with this, as long as we don't assume that because something is encrypted it came from a specific place
- S2023, Q6 [encryption]
- pipeline latency and throughput
    <------------------ N cycles ------------------>
    FDEMW
     FDEMW
      FDEMW
       ...
    latency    = end-to-end time for one instruction
                 <--->
                 FDEMW
    throughput = # instructions / time = (N-4) / N ~~ 1 instruction/cycle
                 [ideal case w/o stalling, etc.]
- time for pipeline stages
    cycle time = slowest stage + anything we added for pipelining (usually: new registers)
    and when we're dividing a single-cycle processor into K pieces,
    those pieces take *at least* 1/Kth the time,
    unless we made our single-cycle CPU poorly
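  - putting numbers on this (my sketch; the stage times are made up):

        #include <stdio.h>

        int main(void) {
            // made-up stage times (ps) from splitting a single-cycle CPU 5 ways
            int stage[5] = {150, 200, 180, 190, 160};
            int reg_overhead = 20;              // pipeline registers we added

            int cycle = 0;
            for (int i = 0; i < 5; i++)
                if (stage[i] > cycle) cycle = stage[i];
            cycle += reg_overhead;              // cycle time = slowest stage + overhead

            long long n = 1000;                 // instructions
            long long cycles = n + 4;           // 5-stage pipeline: n + (5-1) cycles
            printf("cycle time = %d ps\n", cycle);          // 220 ps
            printf("latency    = %d ps\n", 5 * cycle);      // 1100 ps per instruction
            printf("throughput = %.3f insns/cycle\n",
                   (double)n / cycles);         // 0.996, approaches 1 as n grows
            return 0;
        }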