- system calls versus context switches
    in xv6 and similar OS designs, context switches only occur within kernel mode
        needs to change kernel data structures
        needs to access other programs' data
            (which can't be done by user code, so user code can't mess things up)
    before a context switch: always enter kernel mode
        could be via a system call
        could be via something else (timer interrupt, I/O, etc.)
    after a context switch: usually leave kernel mode
        (and go back to application/user code)
    some system calls might trigger a context switch
        but that's just part of what service the system call is requesting
        e.g. "read" --> might wait for input
            wait for input --> switch to another program until there is more input
- user and kernel stack during context switch
    in xv6 and similar OS designs:
        we don't decide whether we're doing a context switch until we enter the kernel
        have GENERIC trap (exception/interrupt/system call/etc.) handling code
          that saves user registers
            xv6: user registers saved to the kernel stack,
                 and one of those registers points to the user stack
            used if we just want to run some kernel code using the same registers
                e.g. the system call handler will use eax, esp, etc.
                need to remember what their old values were,
                just like for a function call
        saving registers happens to save user registers before we decide to do
          a context switch
            so no need to have extra code to handle them for the context switch
    in the kernel, still need to save the kernel registers
        xv6: the swtch() function does this to the kernel stack
             (and reads the new ones from a different kernel stack)
             (see the sketch at the end of this section)
        typical usage of swtch():
            called from the scheduler
            called from a function that just switches to the scheduler
                xv6 "context switch" --> probably switch to the scheduler,
                then from the scheduler to the other thread
    kernel stack: also where temporary variables used by code running in the kernel live
        (and saved registers, return addresses, etc. --- everything you learned
         about being on the stack for a normal program, but just for the code
         running in the kernel)
- scheduler types
    CPU throughput (= work per unit time): FCFS
        simplest scheduler
        doesn't context switch unless needed
    minimum average (per-thread) turnaround/waiting time: SRTF
        (under the assumption that context switch overhead isn't significant)
        * can't actually implement SRTF: so use approximations like MLFQ
    maximizing I/O throughput: something like SRTF
        make sure programs get to do their I/O operations quickly
        intuition about SRTF: "race to sleep"
    fairness -- medium-term: CFS ("Completely Fair Scheduler")
        on the order of seconds to tenths of a second?? (depends on parameters)
        not like milliseconds/microseconds --- doesn't try to context switch
            that much, and tries to take into account more history
        not like minutes/hours --- doesn't want to penalize programs based on
            what they were doing that long ago
        took into account recent-ish CPU usage ("virtual time")
        tried to run the program most deprived of recent CPU usage
    fairness -- time until you run: round-robin or similar
        everyone runs periodically with roughly the same interval
        simplest preemptive scheduler
    real-world deadlines: earliest deadline first or ????
    favoring certain users: priority
    other goals: ???
    new scheduler --- probably built from priority or lottery or similar
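A rough C sketch of the swtch() pattern above, loosely following xv6's x86
code (swtch.S, proc.c); the struct layout and names mirror xv6, but this is
a simplified illustration, and the real swtch() is written in assembly:

    typedef unsigned int uint;

    /* callee-saved x86 registers, saved/restored on a kernel context switch;
       everything else is already saved: caller-saved registers by the C
       calling convention, user registers by the generic trap entry code */
    struct context {
        uint edi, esi, ebx, ebp;
        uint eip;   /* saved return address: where this kernel thread resumes */
    };

    struct proc { struct context *context;   /* ... */ };
    struct cpu  { struct context *scheduler; /* ... */ };

    /* swtch(old, new): push the callee-saved registers on the current kernel
       stack, store the resulting stack pointer through *old, switch to the
       stack recorded in new, pop that thread's registers, and return "as"
       that thread -- assembly in xv6; prototype only here */
    void swtch(struct context **old, struct context *new);

    /* sched()-style usage: hand the CPU to the per-CPU scheduler context;
       control returns here when a later scheduler pass switches back to p */
    void give_up_cpu(struct proc *p, struct cpu *c) {
        swtch(&p->context, c->scheduler);
        /* resumes here, possibly much later, still on p's kernel stack */
    }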
- reader/writer locks
    many data structures: can safely read the data structure from many threads
        at once, but only if no thread is trying to modify it
    how can we guarantee the "but only if" part?
        (1) could use one lock for reading OR writing
            but if we do this, then we have to take turns reading
            this is correct (won't violate the safety rule) but slow
    reader/writer locks try to solve this taking-turns problem
        by extending a lock to have two ways of being locked:
        ReadLock: registers a thread as a reader
            wait until no writers
            [versus a normal lock: wait until no other threads]
        WriteLock: registers a thread as a writer
            wait until no readers+writers [same as a normal lock]
        by having less restrictive conditions for waiting, can do more things at once
    when implementing reader/writer locks, need to make a decision about
        what to do if a reader and a writer are both waiting --- who goes first?
- monitors
    way of building data structures that wait for things to happen sometimes
    = lock + shared data + condition variables
    lock: always take turns accessing the shared data
        don't have to worry about being confused about the state of the data
        (example: if the shared data is for an rwlock: not confused about
         whether there are 0 or 1 threads waiting to write)
    shared data:
        can include whatever data we're manipulating (a queue of work to do)
        can include information about who's waiting,
            so we can enforce conditions about when to wait
    condition variables:
        usage: one condition variable for everything we wait for

            while (we still need to wait)   <--- based on the shared data;
                cond_wait(some_cv, ...);         means the shared data has to
            ...                                  include the bookkeeping info
                                                 needed to know when we wait
            if (some thread might no longer need to wait)
                cond_signal or cond_broadcast(some_cv)

        signal: guarantee that only one thread cares
            e.g. if I signal for each item I add to a queue, at most one
                 thread cares about each item, so it's safe
            e.g. for a reader/writer lock, only one writer is allowed at a
                 time, so I can signal a writer rather than broadcasting
        broadcast: otherwise
        when considering: remember that what you signal might not run next
            e.g. typically wrong to signal when the queue reaches a particular size:

            thread A           thread B           thread C       thread D
                                                  wait           wait
            add to queue
            queue size = K
              --> signal C
            unlock
                               add to queue
                               queue size != K
                               (so no signal)
                                                  run
                                                                 still waiting

            not what you'd expect if you thought "while queue size is K and
            someone is waiting, have them handle the item right away"
    spurious wakeups = cond_wait returns when nothing happened
        POSIX allows spurious wakeups just to make implementing CVs easier
        probably happen mostly when one CV is manipulated in parallel from two threads???
        (see the consume() sketch below for the while loop this forces)
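A minimal sketch of the monitor pattern above using POSIX threads: a lock,
shared data (a counter standing in for the work queue), and one condition
variable, with the required while loop; the names are made up for illustration:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t items_cv = PTHREAD_COND_INITIALIZER;
    static int num_items = 0;   /* shared data, protected by lock */

    void produce(void) {
        pthread_mutex_lock(&lock);
        num_items += 1;
        /* one item added, at most one waiter cares --> signal is safe */
        pthread_cond_signal(&items_cv);
        pthread_mutex_unlock(&lock);
    }

    void consume(void) {
        pthread_mutex_lock(&lock);
        /* while, not if: another thread may take the item between our wakeup
           and our reacquiring the lock, and POSIX also allows spurious wakeups */
        while (num_items == 0)
            pthread_cond_wait(&items_cv, &lock);  /* releases lock while asleep */
        num_items -= 1;
        pthread_mutex_unlock(&lock);
    }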
    justification why this (spurious wakeups) is not a big problem with POSIX's design:
        usually you have to worry (even w/o spurious wakeups) about some thread
        acquiring the lock in between when you wake from waiting and when you
        get the lock again
        example:
            thread A waiting in consume() for another item on the queue
            thread A woken up via signal from thread B
            thread C gets the lock
            thread C dequeues
            thread C unlocks
            thread A gets the lock
            thread A shouldn't dequeue (even though it was woken up)
        not every piece of monitor-using code has this sort of scenario,
            but it's usually a reason why you can't avoid the while loop anyways
        this scenario would not be possible with Hoare scheduling
            ("hand off mutex" on signal)
            "hand off mutex" --> guarantee that whatever is signalled gets the
            lock next (but essentially nothing implements that)
- VA -> PA
    step one: split the virtual address into pieces

        [             virtual address             ]
        [ virtual page number     ][ page offset  ]
                                    ^^^^^^^^^^^^^---- enough bits to identify
                                        which byte of a page; so if page size
                                        = 2^12 bytes --> 12 bits

        [ VPN pt 1 ][ VPN pt 2    ][ page offset  ]
         ^^^^^^^^^^---- enough bits to index into the first-level page table,
                        which is an array of page table entries; so if 1024
                        (2^10) entries in the first-level page table --> 10 bits

    step two: lookup a page table entry in the first-level table
        use the page table base pointer = beginning of the array of 1st-level PTEs
        access that_array[VPN pt 1]
            = that_array + VPN pt 1 * sizeof(1st-level PTE)
        the 1st-level PTE will contain:
            * a valid/present bit --- if not set, then stop!
            * metadata bits (readable, writeable, etc.)
              (stop if the program isn't allowed to do this access)
            * a physical page number
    step three: use the first-level page table entry to lookup in the second level
        same as step two, but:
            use the physical page number we found in step two as the page
            table base pointer, converted to the byte address of the
            BEGINNING of that page:
                concatenate with zeroes to fill in the page offset,
                or equivalently multiply by the page size
                (unit conversion: pages --> bytes)
            use VPN pt 2 instead of VPN pt 1
    step four: use the second-level page table entry to get the PA
        find the physical page number from the second-level entry
        concatenate it with the page offset from the VA
        result --> final physical address
        (see the translate() sketch right after this section)
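A small C simulation of the two-level lookup above, using the 10/10/12 split
from the example (1024-entry tables, 2^12-byte pages). The PTE layout assumed
here (valid bit in bit 0, physical page number in the upper 20 bits) is
x86/xv6-style but simplified; physmem stands in for physical memory so the
walk can run as ordinary code:

    #include <stdint.h>

    #define PAGE_SHIFT 12      /* 2^12-byte pages --> 12 offset bits    */
    #define INDEX_BITS 10      /* 1024 PTEs per table --> 10 index bits */
    #define PTE_VALID  0x1     /* valid/present bit (bit 0)             */

    typedef uint32_t pte_t;

    /* steps one through four from the notes; returns the physical address,
       or (uint32_t)-1 if a valid bit is clear (checks of the metadata bits
       like readable/writeable are omitted for brevity) */
    uint32_t translate(const uint8_t *physmem, uint32_t base_pa, uint32_t va) {
        uint32_t vpn1   = (va >> (PAGE_SHIFT + INDEX_BITS)) & 0x3FF; /* VPN pt 1 */
        uint32_t vpn2   = (va >> PAGE_SHIFT) & 0x3FF;                /* VPN pt 2 */
        uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);             /* page offset */

        /* step two: index the first-level table at the base pointer */
        const pte_t *first = (const pte_t *)(physmem + base_pa);
        pte_t pte1 = first[vpn1];
        if (!(pte1 & PTE_VALID)) return (uint32_t)-1;

        /* step three: PPN --> byte address of the start of that page
           (concatenate with zeroes == multiply by the page size),
           then index with VPN pt 2 */
        uint32_t second_pa = (pte1 >> PAGE_SHIFT) << PAGE_SHIFT;
        const pte_t *second = (const pte_t *)(physmem + second_pa);
        pte_t pte2 = second[vpn2];
        if (!(pte2 & PTE_VALID)) return (uint32_t)-1;

        /* step four: PPN concatenated with the page offset */
        return ((pte2 >> PAGE_SHIFT) << PAGE_SHIFT) | offset;
    }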
- setting up page tables for exec()
    what xv6 does; other OSes might differ
    need to setup:
        kernel part of the page tables
            every process has the kernel's memory as part of its page table
                (set with metadata so it will only work in kernel mode)
            why? avoids changing page tables to let the kernel handle
                exceptions/system calls/etc.
            * hard-coded list of page table entry ranges to setup
                generate the page table entries with for loops
        allocate memory for the program's code+data
            read from the executable information about:
                where the program should be loaded (and how much space it needs)
            setup that much space (basically with a for loop)
        copy from disk into that allocated memory
            read from the executable information about:
                where to load the data from on disk
    for setting individual page table entries, xv6:
        uses the same kind of procedure we did for VA -> PA,
        but stops once it figures out where the second-level PTE is,
        and then changes that second-level PTE
        (and if it gets stopped at an invalid first-level PTE, it fixes that first)
    allocating physical pages and coordinating with the hardware
        xv6's strategy:
            learns on boot how much memory there is
                (hard-coded in xv6; a real OS reads some table from the hardware)
            has a data structure that tracks which parts of that memory xv6 is not using
            uses that data structure to find memory it can use for page tables + programs
        telling the hardware about the memory:
            xv6 sets the page table base pointer
- shared bus / cache coherency (invalid/shared/modified)
    shared bus: simplest way of connecting processors + memory together
        one common communication line
        advantage: every processor gets to see all accesses to memory
    to keep caches in sync, want to maintain some invariants:
        at most one cache has a modified copy of some data
            * to ensure this:
                if we have the modified copy: write it back before anyone
                    tries to read/write it
                    (we know if someone tries to read/write, because they will
                     have to ask on the bus for a copy)
                if we want to modify something: tell every cache to get rid of any copy
        before a cache forgets about a modified value, it writes it to memory
    problem: we don't want to tell everyone every time we write
        usually they won't have a copy anyways, so this is a waste
        solution: remember that we've already told everyone to get rid of their
            copy, so we know we don't have to tell them proactively
            --> the "Modified" state
- false sharing
    cache coherency: guarantees one cache at a time has modified data
    problem: data is kept in caches in larger blocks than what your program accesses
        caches will take turns when accessing any part of the block
        --> lower performance if two threads manipulate different parts of
            the same cache block
        for example: two adjacent array elements are often in the same cache block
    typical solution: keep the data further apart (see the sketch below)
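A sketch of "keep the data further apart" in C, assuming 64-byte cache blocks
(a common size, but check your hardware); the struct and names are illustrative:

    /* without padding, counts[0] and counts[1] would likely share one cache
       block, so two threads incrementing "their own" counter still make the
       caches take turns owning that block (false sharing):
           long counts[2];                                              */

    /* with padding, each counter sits in its own 64-byte block */
    struct padded_counter {
        long count;
        char pad[64 - sizeof(long)];
    };
    static struct padded_counter counts[2];

    /* each thread touches only its own element; with the padded layout
       the two caches no longer contend for a shared block */
    void *worker(void *arg) {
        int me = *(int *)arg;              /* 0 or 1 */
        for (long i = 0; i < 100000000L; ++i)
            counts[me].count++;
        return 0;
    }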
- path of a system call in xv6
    [user function to invoke the system call]
    ["system call wrapper"] [xv6: usys.S]
        makes it so we can use a C-like (or other PL) interface,
            even though a system call uses a special instruction
        also handles setting up for the calling convention our OS has chosen
            xv6 system call calling convention:
                arguments are on the stack (like a normal function call)
                eax says which system call to do
    --- kernel mode starts here ---
    [generic TRAP handling] [xv6: trapasm.S, traps.c]
        xv6's choice: same code for all exceptions to handle saving registers, etc.
        saves the user registers,
            so we don't have to worry about the kernel overwriting those values
        saves what caused the trap for later
    [system call dispatcher] [xv6: syscall.c]
        based on the trap being a system call, figures out which system call
            implementation to run
        xv6: uses the saved eax value to lookup the actual system call implementation
    [system call implementation] [xv6: sysproc.c, sysfile.c]
        actual code that implements the particular service requested
            (reading, writing, killing a process, ....)
        needs to retrieve arguments
            xv6: reads them from the user stack,
                 b/c the convention says the arguments are there
        does the system-call-specific operation
    then return, going backwards from bottom to top:
    [system call dispatcher]
        stores the result from the system call implementation
    [generic TRAP handling]
        restores the user registers that were saved
    --- kernel mode ends here ---
    [wrapper function]
        returns to the function call the application made
- dup2(from, to)
    PCB->files[to] = PCB->files[from]
    now fd "to" is the same as fd "from", but:
        close(from) does PCB->files[from] = NULL,
            so that won't change fd "to"
        dup2(something_else, from) does PCB->files[from] = ...,
            so that won't change fd "to" either
    (see the sketch below)
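A sketch of the pointer-copy view of dup2() above, with an xv6-style
per-process open file table. The function name and NOFILE limit are
illustrative; a real kernel also releases whatever files[to] pointed at
beforehand, and the shared struct file objects are reference-counted
(in xv6, filedup() does the ref++ part for its dup() system call):

    #define NOFILE 16

    struct file { int ref; /* ... offset, inode, etc. ... */ };
    struct proc { struct file *files[NOFILE]; /* ... */ };

    /* make fd 'to' refer to the same open-file object as fd 'from' */
    int dup2_sketch(struct proc *p, int from, int to) {
        if (from < 0 || from >= NOFILE || to < 0 || to >= NOFILE
            || p->files[from] == 0)
            return -1;
        if (from == to)
            return to;
        /* a real kernel would close/release the old p->files[to] here */
        p->files[to] = p->files[from];  /* both fds now share one struct file */
        p->files[to]->ref++;            /* shared object is reference-counted */
        return to;
    }

    /* afterwards: close(from) clears p->files[from] (dropping one reference),
       but p->files[to] still points at the same struct file --- unaffected,
       as described above */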