- last time: virtual machines
    guest / physical / machine
    trap-and-emulate
        hopefully instructions that behave differently in user vs. kernel mode cause an exception
        exception handler emulates instruction
            do priv op if guest OS in pretend kernel mode
            forward exception if guest OS in pretend user mode
    shadow page tables
        synthesize page table from guest OS page table + hypervisor physical -> machine mapping
        shadow page table = virtual -> machine mapping, needed by hardware
            (when guest OS is running in user mode)
    modern VMs --- extra HW support

---

- inodes
    - for comparison: FAT splits the data for a file across different structures
        size, modification time, etc. is in the directory entry
        the location of the first block of the file is in the directory entry
        the locations of later blocks are in the FAT
    - traditional layout of disk with inodes and no block groups and no journalling:
        [superblock = header (approx. the FAT BPB)]
        [free block map]
        [inode array (fixed # of inodes, allocated when the filesystem is created)]
        [actual data blocks]

    - traditional Unix design: most data for the file is in this "inode" 
        one per file
        directory entries ONLY contain name + pointer to inode
    - can have multiple directory entries referring to the same inode
            unlike FAT, no worries about keeping them in sync
            link() system call sets this up
    - traditional way inodes specify the blocks of the file: w/ an array of block pointers
        (rough C struct sketch at the end of this list)
        (block pointer = # of a data block on the disk)
        some number of direct pointers --- in the inode itself
            (for sufficiently small files --- all we need)
        K direct pointers = first K blocks of the file
        if the direct pointers aren't enough to cover the file's data, then
            an indirect pointer: points to a block holding another array of block pointers
            that array is used only for this file
        if the indirect pointer + direct pointers still aren't enough, then
            a double-indirect pointer: points to a block of pointers to indirect blocks
        and sometimes a triple-indirect pointer, etc.

    - (alternatives to block pointers we talked about are sometimes used instead:
        extents (block pointer + # of blocks, instead of single block pointers)
        fragments (replace a block pointer w/ a pointer to part of a block)
    )
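    - rough C sketch of the block-pointer part of an inode (field names and counts are
        illustrative, not any particular filesystem's):

        #include <stdint.h>

        #define NUM_DIRECT 10   /* the "K" direct pointers */

        struct inode {
            uint32_t mode;                 /* file type + permission bits             */
            uint32_t size;                 /* file size in bytes                      */
            uint32_t mtime;                /* modification time                       */
            uint16_t link_count;           /* # of directory entries pointing here    */
            uint64_t direct[NUM_DIRECT];   /* block #s of the first K data blocks     */
            uint64_t indirect;             /* block # of a block full of block #s     */
            uint64_t double_indirect;      /* block # of a block of indirect block #s */
            uint64_t triple_indirect;      /* block # of a block of double-indirect block #s */
        };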
    
- direct/indirect/... counting data
    - example: 4K blocks, 8B block pointers
    - 10 direct blocks, 1 indirect, 1 double-indirect, 1 triple-indirect
    - 10 direct blocks: point to 4K each --> 40K
    - 1 indirect block: points to (4KB / 8B =) 512 pointers to 4K each --> 4*512K
    - 1 double-indirect block: points to (4KB / 8B =) 512 pointers to
            indirect blocks --> 512 pointers to 4*512K each --> 4*(512^2)K
    - 1 triple-indirect block: points to (4KB / 8B =) 512 pointers to
            double-indirect blocks --> 512 pointers to 4*(512^2)K each --> 4*(512^3)K
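    - the same arithmetic as a tiny C check (assumes the 4K blocks / 8B pointers above):

        #include <stdio.h>

        int main(void) {
            unsigned long long block = 4096, ptrs = block / 8;       /* 512 pointers per block */
            unsigned long long direct = 10 * block;                  /* 40K                    */
            unsigned long long ind    = ptrs * block;                /* 4*512K     = 2 MB      */
            unsigned long long dbl    = ptrs * ptrs * block;         /* 4*(512^2)K = 1 GB      */
            unsigned long long tpl    = ptrs * ptrs * ptrs * block;  /* 4*(512^3)K = 512 GB    */
            printf("max file size = %llu bytes\n", direct + ind + dbl + tpl);
            return 0;
        }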

- redo-logging
    normal operation:
    - write our intention to log
                typically: the intention is a set of operations that we want to happen
                all together
                    example: removing a file changes the direntry + the file's inode + the free block map
            <-- failures just mean we don't do what we intended
    - write a "commit" to our intention to log
            <-- failures mean we STILL do what we intended, because we read our log
                and might end up doing the thing twice
    - do the actual thing we intended
        because we might do this twice on some types of failures, the operation
        MUST BE SAFE TO DO TWICE
            OKAY: operations like "set count to 14"
            NOT OKAY: operations like "increment count"
    - after the actual thing is done, can clear the log

    recovery (coming back after crash):
    - if the commit is written to the log, redo everything that was committed to:
        "the transaction"
    - if the commit message is NOT written, then ignore that part of the log
        (and can delete that part of the log)
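    - rough C sketch of the recovery pass (the log format and names are made up for
        illustration; the real point is that each logged op is idempotent):

        #include <stdio.h>

        #define MAX_OPS 4

        struct op        { long block_no; long value; };   /* idempotent: "set block X to V" */
        struct log_entry {
            int       committed;                            /* was a commit record written?   */
            int       num_ops;
            struct op ops[MAX_OPS];
        };

        static void apply_op(const struct op *o) {          /* stand-in for the real disk write */
            printf("set block %ld to %ld\n", o->block_no, o->value);
        }

        void recover(const struct log_entry *log, int n) {
            for (int i = 0; i < n; i++) {
                if (!log[i].committed)
                    continue;                    /* no commit record: ignore / can delete    */
                for (int j = 0; j < log[i].num_ops; j++)
                    apply_op(&log[i].ops[j]);    /* redo the transaction; harmless if it     */
            }                                    /* already ran, since each op is safe twice */
        }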

- RAID -- reading and writing
    data split up into sets of sectors

    disk 1     disk 2      disk 3    disk 4
    sector A / sector B / sector C / parity = sector A XOR B XOR C

    can recover sector A given B, C, (A XOR B XOR C)
    "    "       "     B  "    A, C, (A XOR B XOR C)
    "    "       "     C  "    A, B, (A XOR B XOR C)
    "    "       "     (A XOR B OR C) given A, B, C

    normal operation:
        to update sector A, we need to also update parity
        write to sector A becomes:
            read sector B 
            read sector C
            compute new parity = (new A XOR B XOR C)
            write new A to sector A
            write new parity to the parity sector

    on failure:
        lose sector A
        to read the data for sector A:
            read B 
            read C
            read (A XOR B XOR C)
            compute (B XOR C) XOR (A XOR B XOR C) = A
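    - the XOR math in C (sector size and names are illustrative):

        #include <stdint.h>

        #define SECTOR 512

        /* parity = A XOR B XOR C, byte by byte */
        void compute_parity(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                            uint8_t *parity) {
            for (int i = 0; i < SECTOR; i++)
                parity[i] = a[i] ^ b[i] ^ c[i];
        }

        /* the disk holding A died: rebuild A = B XOR C XOR (A XOR B XOR C) */
        void recover_a(const uint8_t *b, const uint8_t *c, const uint8_t *parity,
                       uint8_t *a_out) {
            for (int i = 0; i < SECTOR; i++)
                a_out[i] = b[i] ^ c[i] ^ parity[i];
        }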

- RAID 5 versus RAID 4
    
    RAID 4:
    disk 1     disk 2      disk 3    disk 4
    sector A / sector B / sector C / parity ABC = sector A XOR B XOR C
    sector D / sector E / sector F / parity DEF

    write A + E --> write to disk 1 once, disk 2 once, disk 4 twice
    write A, then B, then C, then D, then E, then F one at a time -->
        write to disk 1 twice, disk 2 twice, disk 3 twice, disk 4 six times
        [if we were writing these all in one batch, we could do better by
         only writing the parity once]


    RAID 5:
    disk 1     disk 2      disk 3    disk 4
    sector A / sector B / sector C / parity ABC = sector A XOR B XOR C
    sector D / sector E / parity DEF / sector F

    write A + E --> write to disk 1 once, disk 2 once, disk 3 once, disk 4 once
    write A, then B, then C, then D, then E, then F one at a time -->
        write to disk 1 twice, disk 2 twice, disk 3 four times, disk 4 four times
        [if we were writing these all in one batch, we could do better by
         only writing the parity once]
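    - one way RAID 5's rotation could be computed (matches the layout above for stripes 0
        and 1; real arrays pick their own convention):

        /* which disk (0-indexed) holds the parity for a given stripe */
        int parity_disk(int stripe, int num_disks) {
            return num_disks - 1 - (stripe % num_disks);
        }
        /* stripe 0, 4 disks -> index 3 (disk 4); stripe 1 -> index 2 (disk 3) */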

- RPC and marshalling
    - marshalling AKA serialization is translating data into bytes
        - what's wrong with just copying the data from memory?
            (e.g. what you did to load structs from the FAT FS)
            * pointers --- pointers aren't meaningful on the other machine, but we
                want to send things that contain pointers
                e.g. vectors, variable-length strings
            * portability --- want to transfer data between different architectures
                e.g. endianness
                e.g. different integer sizes
                e.g. different padding
        - the typical procedure for RPC libraries is that we have an
            "interface description language" that compiles into a
            class/struct definition for our data + code to convert it to/from bytes
            (a hand-rolled version of that conversion is sketched at the end of this section)
          (alternative: parse existing class/structs)
    - RPC: high-level --- make sending messages to server look like calling a function
        function turns into a connect to server + send command + receive response
            command contains arguments + identifier for function
            response turns into return value
    - RPC can't quite look like a normal function call
        ~ argument/return value types need to be convertible to bytes
        ~ failures can happen (e.g. lost connection)
        ~ need some way to identify the server
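    - hand-rolled C sketch of that to/from-bytes conversion for one made-up argument struct
        (fixed-width big-endian ints + a length-prefixed string, so the bytes mean the same
        thing on any architecture):

        #include <arpa/inet.h>   /* htonl: host -> big-endian ("network") byte order */
        #include <stdint.h>
        #include <string.h>

        struct request {          /* illustrative RPC argument struct */
            uint32_t    user_id;
            const char *name;     /* can't just memcpy this pointer!  */
        };

        /* writes a portable encoding into buf, returns # of bytes used */
        size_t marshal_request(const struct request *r, uint8_t *buf) {
            uint32_t id_be  = htonl(r->user_id);
            uint32_t len    = (uint32_t)strlen(r->name);
            uint32_t len_be = htonl(len);
            size_t   off    = 0;
            memcpy(buf + off, &id_be,  sizeof id_be);  off += sizeof id_be;
            memcpy(buf + off, &len_be, sizeof len_be); off += sizeof len_be;
            memcpy(buf + off, r->name, len);           off += len;
            return off;
        }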

- what is two-phase commit for
    multiple machines want to do an operation "together"
    and each machine has some part they must do locally
        e.g. update machine A's student record and machine B's course record

- two-phase commit failure cases
    - PREPARE
         coordinator -> worker: "this is what I want to do" [PREPARE]
         worker -> coordinator: "I agree to do this" [AGREE TO COMMIT]
                OR              "I agree to NEVER do this" [AGREE TO ABORT]
    - COMMIT
        all workers agreed to do it:
                doesn't matter if worker goes away after agreeing
            coordinator -> worker: "Everyon'e committed, do it" [COMMIT]
    - ABORT
        at least one worker agreed to NEVER do it
            coordinator -> worker: "It's not going to happen" [ABORT]

    FAILURES:
        - coordinator fails before recording responses from workers:
            resend the PREPARE message, get same response that workers might
            have already sent
        - coordinator fails after recording response from workers:
            coordinator's log indicates whether to COMMIT or ABORT
            resend COMMIT/ABORT messages
        - worker fails after deciding whether to AGREE TO COMMIT/ABORT
            worker's log indicates the decision made
                [guarantees that the worker doesn't change its mind]
            resend AGREE-TO-COMMIT/ABORT
        - messages lost in network
            - can resend on a timer if we have reason to believe the other end
              didn't get the message
            - coordinator should resend COMMIT/ABORT if workers resend their vote
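    - the coordinator's decision rule as a C sketch (names are illustrative): COMMIT only if
        every worker voted agree-to-commit; the coordinator logs this decision before sending
        it, so it can resend the same answer after a crash:

        enum vote     { AGREE_TO_COMMIT, AGREE_TO_ABORT };
        enum decision { COMMIT, ABORT };

        enum decision decide(const enum vote *votes, int num_workers) {
            for (int i = 0; i < num_workers; i++)
                if (votes[i] == AGREE_TO_ABORT)
                    return ABORT;   /* one "agree to NEVER do it" is enough to abort      */
            return COMMIT;          /* everyone agreed to commit: tell them all to do it  */
        }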
    
- capabilities
    - in a true-capability-based OS, we don't have global names
        instead, you have some handles (like file descriptors)
        to access resources or get more handles for resources
            (e.g. directory handle gives you handles for files in directory)

    rather than your access being determined by user+group, it's determined by what
        handles you have

    on this system, the login/program launching utility probably has handles that give it handles
    for anything, and it gives out a subset that makes sense for you

    the hope with these systems is that each program has only the handles that make sense for it
        e.g. the web browser doesn't get to read all my tax documents

    - Capsicum --- FreeBSD capability extension, not a "pure" capability system
        because it was built on top of a Unix
            (couldn't ignore that the underlying system still had access control lists)

- last post-quiz on permissions
    - want to give a user write permission but only to write ASCII data
        - NOT A FEATURE POSIX permissions support:
            "read" "write" "execute" -->
            "read" and "execute" are irrelevant
            "write" gives too many permissions --- can write ANY data
        - also not a feature file descriptors support:
            a file descriptor is "writeable" / "readable" --->
            "writeable" --> can write ANY data
    - setuid program can enforce this by doing the write on behalf of the user
        procedure:
            check which user ran me,
            use my set-user-ID permissions to open the file,
            copy the data the user sends to my stdin into the file
                BUT ONLY if it's ASCII data
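    - sketch of that setuid helper in C (the target path and the 64K cap are made up; a real
        version would also check which user ran it and handle larger inputs):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void) {
            char    buf[65536];
            ssize_t n, total = 0;

            /* uid_t caller = getuid();  -- could check which user ran us here */

            /* read what the (unprivileged) user sends on stdin (sketch: capped at 64K) */
            while ((n = read(STDIN_FILENO, buf + total, sizeof buf - total)) > 0)
                total += n;

            /* enforce the policy: refuse anything that isn't ASCII */
            for (ssize_t i = 0; i < total; i++)
                if ((unsigned char)buf[i] > 127) {
                    fprintf(stderr, "refusing non-ASCII data\n");
                    return 1;
                }

            /* our set-user-ID identity gives us write access the caller doesn't have */
            int fd = open("/srv/protected/notes.txt", O_WRONLY | O_APPEND);
            if (fd < 0) { perror("open"); return 1; }
            if (write(fd, buf, total) != total) { perror("write"); return 1; }
            close(fd);
            return 0;
        }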

- working set model
    - intuition: programs have a "working set" that they are accessing
    - data outside this working set doesn't need to be in memory for the program to run

    - consequence: can keep only the working sets of programs in RAM
        and still have our system run well
    - consequence: if we canNOT keep the working sets of programs in RAM
        (don't have enough space, e.g.) --- our system will run horribly ("thrashing")
    - consequence: least-recently used does okay for deciding what to put in RAM
            (assuming RAM fits all programs' working sets)
        things in working set will usually be recently used
            (exception: switching between working sets)
- mmap
    - interface: rather than using read/write, I call mmap() and get a
        memory region which reflects the contents of the file
    - if I use MAP_SHARED ---> updating this memory region updates the file
                        and updates to file update this memory region
    - if I use MAP_PRIVATE --> updating this memory region gives me a private copy
                (and it's unspecified whether I see later updates to the file ---
                    a compromise to make mmap easier to implement)

    - implementation: the OS caches file data anyways;
        we not only keep the cache in the kernel's memory, but also
        add it to the user's page table

        some extra work:
            copy-on-write for MAP_PRIVATE mappings
            checking dirty bits to identify if file is modified
            making sure we update page tables if we replace file cache pages
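    - small MAP_SHARED example (the file name is illustrative; assumes the file already
        exists and is non-empty):

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        int main(void) {
            int fd = open("data.bin", O_RDWR);
            if (fd < 0) { perror("open"); return 1; }

            struct stat st;
            fstat(fd, &st);

            char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            p[0] = 'X';                     /* this store shows up in the file itself */
            msync(p, st.st_size, MS_SYNC);  /* optionally force it out to disk now    */

            munmap(p, st.st_size);
            close(fd);
            return 0;
        }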

- readahead
    - programs often read files sequentially
    - would prefer not to wait for program to get to byte X of the file to read byte X
        from disk

    - solution: detect when program is reading sequentially and try to stay ahead
        of the program in reading the disk

    - typical heuristic: have a window

                extra stuff we've read
             vvvvvvvvvvv
        [       *       ]
        ^^^^^
        program has read

        have some mark *: when the program's reads cross it, we say "yeah, I think the
        program is still reading from the file, we should read even more ahead"
    
    - compromise:
        reading too much ahead --- evicts normal data from memory
            especially wasteful if the program doesn't end up reading the rest of the file
        reading too little ahead --- fall behind where the program is reading
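    - toy version of that heuristic in C (the bookkeeping struct is invented;
        posix_fadvise(POSIX_FADV_WILLNEED) is a real call that asks the kernel to start
        reading a range in the background):

        #include <fcntl.h>
        #include <unistd.h>

        #define CHUNK (256 * 1024)          /* how far ahead we prefetch at a time   */

        struct ra_state {
            off_t next_expected;            /* where a sequential reader reads next  */
            off_t prefetched_to;            /* how far we've already asked for data  */
        };

        void on_read(int fd, off_t offset, size_t len, struct ra_state *ra) {
            if (offset != ra->next_expected) {          /* not sequential: stay quiet */
                ra->next_expected = offset + len;
                return;
            }
            ra->next_expected = offset + len;

            /* the program is getting close to the end of what we've prefetched
               (it crossed the "mark"): ask for the next chunk now */
            if (ra->prefetched_to - ra->next_expected < CHUNK / 2) {
                posix_fadvise(fd, ra->prefetched_to, CHUNK, POSIX_FADV_WILLNEED);
                ra->prefetched_to += CHUNK;
            }
        }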