- logistics: what's on the exam
    - everything covered up until now in lecture
    - some questions on virtual memory
    - definitely nothing on page cache
- partial read()s and write()s
    - POSIX API: read(file descriptor, buffer, amount) reads UP TO amount bytes
      sometimes we don't have amount bytes available:
        ~ just make the read fail
        ~ just return what you have   <---- POSIX's choice
        ~ wait until you have everything
      POSIX strategy (for file descriptors like pipes):
        - wait until I have at least one byte
        - then return whatever I have
    - POSIX API: write(file descriptor, buffer, amount) writes UP TO amount bytes
      sometimes we can't actually send amount bytes right away:
      POSIX strategy (for file descriptors like pipes):
        ~ wait until I can write at least one byte
        ~ write what I can now
        ~ then return how much I've written
- kernel versus user mode (what happens in each)
    - kernel mode is a processor feature
    - the processor only allows certain things in kernel mode:
        ~ most I/O (e.g. x86's in/out instructions; special memory locations; etc.)
        ~ setting up exception handling
        ~ changing the page table base register
        ~ accessing virtual pages marked as kernel-mode-only in the page tables
    - the processor supports switching to kernel mode ONLY via exception/trap/interrupt/etc.
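Since read() may return fewer bytes than requested, callers that need a full buffer typically loop until they have it. A minimal sketch of that pattern (the helper name read_fully is made up for illustration, not from lecture):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* read_fully: keep calling read() until `want` bytes arrive, EOF, or error.
 * Returns the number of bytes actually read (may be < want at EOF),
 * or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t want) {
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted early: just retry */
            return -1;                     /* real error */
        }
        if (n == 0) break;                 /* EOF: no more bytes are coming */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The same shape works for write(): loop, advancing the buffer pointer by whatever the partial write returned.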
- convention in Unix-like OSes:
    ~ system call handling, the scheduler, and I/O run in kernel mode
    ~ ***all*** user code runs in user mode
- context switching
    - context: the stuff we need to run a thread on the CPU
        ~ minimally: registers + PC
    - context switching operation:
        save registers (including stack pointer) + PC of old thread
        restore registers + PC of new thread
    - xv6: swtch() function --- only deals with kernel registers
        save registers on the old kernel stack
        restore registers from the new kernel stack
    - xv6 only switches between programs in kernel mode
        - this means that user registers are saved because of an exception
          and NOT because of a context switch
    - context switching between processes also needs to save/restore the address space
        save old page table base register  [xv6: it's already stored in the struct proc]
        restore new page table base register  [xv6: load from struct proc into the register]
        - xv6: separate step from the kernel context switch
          (first swtch() to change kernel stacks, then a different function
          switchuvm() to change page tables)
- differences between scheduling algorithms
    - preemptive versus non-preemptive
        preemptive means when a program becomes ready, we run it right away
        non-preemptive means we only stop a program when it gives up the CPU
    - scheduling goals:
        - "fair" sharing
            CFS ("Completely Fair Scheduler") --- medium-term equal CPU usage
                medium-term ~~ "credit" for not needing the CPU (or proportional to weights)
            RR --- everyone gets a turn (if runnable)
                short-term ~~ if you can't use the CPU, you lose that time
            lottery --- proportional to weights, random choice
                short-term ~~ if you can't use the CPU, you lose that time
        - throughput --- minimal overhead (context switch overhead + scheduling decision time)
            FCFS (first-come first-served) / FIFO
                just have a list, run the next thing until completion
        - mean turnaround/wait time optimizing --- interactivity-optimizing ---
          try to get I/O to happen sooner
            SJF/SRTF --- run the shortest CPU burst (to do I/O faster)
            MLFQ --- approximate SJF/SRTF by measuring CPU bursts,
                because we can't actually know CPU bursts in advance
        - meeting deadlines
            Earliest Deadline First
        - other goals
            Priority
- process versus thread resources
    - thread ~~ what we need to run on a CPU
        - program counter
        - CPU registers (x86: EAX, EBX, ...)
        - stack (x86: whatever ESP points to)
    - process
        - process ID
        - current directory
        - one or more threads
        - file descriptors
            shared between all threads
        - address space (heap, code, global variables, etc.)
            shared between all threads
            contains multiple stacks (threads can get pointers to other threads' stacks)
- order of operations with multiple threads -- Q5 on last semester's exam
    int x = 50; /* <---- shared between the threads! */
    void child(void *ignored_argument) {
        x = x + 10;
        printf("[child] %d ", x);
        x = x + 10;
    }
    int main(void) {
        pthread_t child_thread;
        x = 100;
        pthread_create(&child_thread, NULL, child, NULL);
        x = x - 1;
        pthread_join(child_thread, NULL);
        printf("[main] %d", x);
        return 0;
    }
    assuming no reordering, etc.
    --- child is joined before main outputs --- eliminates:
        a. [main] 110 [child] 119
    *a. [child] 110 [main] 109
        child: x = x (100) + 10
        main: load x for x - 1
        child: print
        child: x = x (110) + 10
        main: computes x - 1 = 109
        main: store x
        main: print
    --- no way of adding 10 or subtracting 1 to get 120 --- eliminates:
        a. [child] 109 [main] 120
        we know the value is 109 when child prints,
        and only +10 and -1 can run after that
    --- x starts at 100 before child runs --- eliminates:
        a. [child] 60 [main] 99
    *a. [child] 110 [main] 119
        child: x = x (100) + 10
        child: prints
        child: x = x (110) + 10
        main: x = x (120) - 1
        main: prints
    *a. [child] 110 [main] 99
        main: read x for x = x - 1
            question says loads/stores aren't reordered,
            but we still have to load, then subtract, then store
        child: x = x (100) + 10
        child: prints
        child: x = x (110) + 10
        main: compute x - 1 from the previously read value
        main: store x - 1 in x
        main: print
    --- child is joined before main outputs --- eliminates:
        a. [main] 110 [child] 109
    *a. [child] 109 [main] 119
        main: x = x - 1 --> 99
        child: x = x + 10 --> 109
        child: print
        child: x = x + 10 --> 119
        main: print
- cache coherency ~~ quiz question
    start: A: Invalid / B: Invalid (b/c question said so --- not cached)
        invalid = not cached
    Q: how many updates/invalidations for A (which only reads)?
    Processor A reads X   --> A: Shared / B: Invalid
    Processor B reads X   --> A: Shared / B: Shared
    Processor B WRITEs X  --> A: Invalid / B: Modified
        (cached copy that's not up-to-date with memory)
    Processor B WRITEs X  --> A: Invalid / B: Modified (already Modified, no work --- only B has a copy)
    Processor B writes X  --> A: Invalid / B: Modified (already Modified, no work --- only B has a copy)
    Processor B writes X  --> A: Invalid / B: Modified (already Modified, no work --- only B has a copy)
    Processor A reads X   --> A: Shared / B: Shared or Invalid
- virtual to physical address translation
    take our virtual address: split it into Virtual Page Number and Page Offset
        how big is the page offset? # bits = log_2(size of page in bytes)
    split the Virtual Page Number into parts, one for each level
        most significant bits  --> first-level
        least significant bits --> last-level
    do an array lookup based on the split Virtual Page Number at each level
        each page table is an array of page table entries
        like arrays in C: if the array is at address 0x50000,
            then element 0 is at address 0x50000
            and element 1 is at address 0x50000 + sizeof(array element)
            and element 2 is at address 0x50000 + sizeof(array element) * 2
            ...
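As a concrete instance of the splitting and array-indexing described above, here is a sketch assuming x86-style 32-bit addresses with 4 KiB pages (12-bit offset, 10-bit index at each of two levels, 4-byte entries); the function names are illustrative, not from lecture:

```c
#include <stdint.h>

/* 12 = log2(4096), the page size; 10 index bits per level on 32-bit x86 */
#define PAGE_OFFSET_BITS 12
#define INDEX_BITS       10
#define PTE_SIZE          4

/* low 12 bits: byte offset within the page */
uint32_t offset(uint32_t va)     { return va & ((1u << PAGE_OFFSET_BITS) - 1); }
/* next 10 bits: index into the second-level (last-level) table */
uint32_t level2_idx(uint32_t va) { return (va >> PAGE_OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
/* top 10 bits: index into the first-level table */
uint32_t level1_idx(uint32_t va) { return va >> (PAGE_OFFSET_BITS + INDEX_BITS); }

/* the C-array rule from the notes: entry address = base + index * sizeof(entry) */
uint32_t pte_addr(uint32_t table_base, uint32_t index) {
    return table_base + index * PTE_SIZE;
}
```

For example, virtual address 0x12345678 splits into first-level index 0x48, second-level index 0x345, and page offset 0x678.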
    the first-level page table entry contains the PHYSICAL page number of the second-level table
        physical --- because we don't want to do recursive virtual-to-physical lookups
        converting a physical page number to the physical address of a particular byte:
            add a page offset of 0 (same as multiplying by the page size)
    once we find the last-level page table entry,
        it contains the PHYSICAL page number of the data
        combine that physical page number with the page offset from the original virtual address
- race conditions: avoiding them
    - one way which works (but might be more conservative than needed):
        if you have shared data, always lock before making changes,
        and never assume anything about what happened while you didn't have it locked
- avoiding busy-waits
    - scenario: waiting for an event X to happen
        naive (slow busywait): check if it happened, check if it happened, check if it happened...
    - two primitives to help:
    - condition variables:
        ~ the thing we need to wait for MUST be protected by a lock
          (hopefully you're doing that anyway to avoid race conditions; see rule above)
        ~ check if it happened; if not, release the lock + start waiting
        * extra task: whichever other thread makes X happen needs to broadcast (or signal) us
        ~ recheck the condition after waking up (if nothing else, b/c of spurious wakeups)
    - semaphores: (analogy: "opening a gate")
        if you know someone's always waiting:
            ~ semaphore that's initially 0
            ~ wait (down) on the semaphore
            * extra task: up on the semaphore (always)
        if you don't know someone's always waiting:
            ~ semaphore that's initially 0
            ~ check if we need to wait
            ~ mark that we've started waiting
            ~ release lock
            ~ wait (down) on the semaphore
            * extra task: if someone's started waiting, then post (up) on the semaphore
- implementing a spinlock
    - simplest strategy: an atomic operation that reads a value and then writes LOCKED
        if we read UNLOCKED, we know that THIS THREAD set it to LOCKED
        "test and set"
- signal versus broadcast
    - signal means at most one thread goes per time this event happens
        tricky part: need to prove the "at most one"
            UP on a semaphore: yes, at most one thread
            0 -> 1 on a semaphore: not necessarily, because 1 -> 2 isn't accounted for
                (might be waking up fewer things than need to go)
    - question: when will the second thing that's waiting when you signal be woken up?
        is that really guaranteed?
- copy on write
    - copy one page or multiple pages?
        - the OS only needs to copy one page
        - it has the option of copying more, maybe
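Looping back to the busy-wait section: the condition-variable recipe (protect the state with a lock; check; wait; recheck in a loop after waking) can be sketched with pthreads. The names (happened, wait_for_x, make_x_happen) are illustrative, not from lecture:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static bool happened = false;   /* the waited-for state, protected by lock */

void wait_for_x(void) {
    pthread_mutex_lock(&lock);
    while (!happened)                    /* recheck after every wakeup
                                            (spurious wakeups are allowed) */
        pthread_cond_wait(&cv, &lock);   /* atomically releases lock while
                                            waiting, reacquires on return */
    pthread_mutex_unlock(&lock);
}

void make_x_happen(void) {
    pthread_mutex_lock(&lock);
    happened = true;
    pthread_cond_broadcast(&cv);         /* wake everyone waiting on cv */
    pthread_mutex_unlock(&lock);
}
```

Note the while loop rather than an if: even without spurious wakeups, another thread could change the state between the signal and the waiter reacquiring the lock, so rechecking is the safe default.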