- mmap() --- why use and how it works
  ~ it's a convenient way to load files into memory and have the OS handle the details
    typically, loading a program is effectively a bunch of mmap() calls
      load code read-only
      load initial values of globals copy-on-write
  ~ how mmap() works
    the process control block contains a list of mmap()'d regions
      e.g. 0x1000-0x5000 come from foo.exe, bytes 4096 and on
    on a page fault, the OS can look up the mmap() region and use that
      if the page fault is at 0x1400, then "load" the page that contains bytes 4096-... of foo.exe
    pages that are mmap()'d can be shared with the page cache
      the OS is already caching file data
      don't copy when loading for mmap(); just make the PTE point to the cached copy
        (and make sure the cached copy doesn't go away)
      also means mmap() can be used to share memory between programs
- how much inodes point to directly and indirectly
  big idea of an inode: all the data about a file in one place
    but since the inode is fixed-size, we need a way to store both large and small files
    want to allocate variable amounts of space to a file
  part of the data about where a file is lives in the inode
    most notably: "direct pointers"
  part of the data about where a file is --- if the file is too big --- needs to go elsewhere
    the inode contains pointer(s) to this extra metadata
    "(1x/2x/3x/...)-indirect pointers"
  goal: support 1-100KB files well, but also handle >> 1GB files well
  first blocks: "direct pointers" --> what blocks is the file data in
    a fixed number of these is stored in the inode, covering the first several blocks of the file
    most files are small, so this is all most files need
  if the direct pointers in the inode aren't enough, we need more of them
    solution: allocate more blocks of direct pointers
      (why a whole block? b/c we don't allocate smaller units on disk)
  next set of blocks: "indirect pointer" --> points to a block of direct pointers
  following that: "double-indirect pointer"
    we couldn't fit enough indirect pointers, so allocate a whole block of them
    and each of those indirect pointers can point to a whole block of direct pointers
  following that: "triple-indirect pointer"
  result: tree structure
    (e.g., with ext2-like parameters --- 4KB blocks and 4-byte block pointers --- one block holds
    1024 pointers, so an indirect pointer covers 1024 * 4KB = 4MB of file data, a double-indirect
    pointer 4GB, and a triple-indirect pointer 4TB)
- how much space?
  direct pointers: each points to 1 block
    space that can be pointed to: (# direct pointers) * (block size)
    overhead: ~0 --- we have to allocate the whole inode anyway
  indirect pointers: each points to 1 block of direct pointers
    space that can be pointed to: [(# indirect pointers) * (# pointers per block)] * (block size)
      (the bracketed factor is how many direct pointers the indirect pointers are equivalent to)
    overhead (extra space for metadata): (# indirect pointers) * 1 block
  double-indirect pointers:
    space that can be pointed to: [(# double-indirect pointers) * (# pointers per block)] * (space per indirect pointer)
      (the bracketed factor is how many indirect pointers the double-indirect pointers are equivalent to)
    overhead: (# double-indirect pointers) * 1 block                                <-- space for the blocks of indirect pointers
              + [(# double-indirect pointers) * (# pointers per block)] * 1 block   <-- overhead per indirect pointer
  in a file: we don't have to have all the pointers be non-NULL
    save space by not storing the extra pointers (and, of course, not storing data that isn't in the file)
  another overhead: we store whole blocks (or maybe fragments) even if the file is not a whole
  number of blocks/fragments large
- FS snapshots
  ~ user point of view: two (or more) copies of the filesystem which represent different versions
    of the filesystem
    but they save space when they have files in common
    backups take the form of "make one of these copies"
  ~ implementation: copy-on-write
    if the two copies are
      exactly the same: they have exactly the same data
    if the two copies have one file that's different:
      they'll have pointers to different inode arrays [*]
      and only one inode will be different between the two
      and the two inodes will have blocks in common where they share data (e.g. if one file was appended)
      [*] the inode array is going to be stored in pieces
        we'll have an array of pointers to those pieces
        and only the parts which are different between the two copies will have different pointers
    details we need to handle:
      (a) normal file updates need to make the new copies
      (b) we need to track reference counts [to do deletion, etc. safely]
- quiz today Q1 and Q2
      guest OS program       <<<<
      ----------------
      guest OS               <<<<
      ----------------
      hypervisor / host OS
      ----------------
      hardware
  Q1: "Suppose a guest operating system is running in a virtual machine implemented using
  trap-and-emulate. A user program running in the guest operating system tries to access memory
  that is not allocated but will be allocated on demand by the guest operating system. What will
  happen?"
  WRONG ANSWER: "the user program's memory access will trigger a page fault in the host operating
  system [this first part is correct], then that page fault handler [the host OS page fault
  handler] will update the guest operating system's page table, then resume running the user
  program"
    not correct b/c the hypervisor doesn't update the guest page table itself
      the hypervisor doesn't know that the memory is considered allocate-on-demand by the guest OS
      that's guest OS policy, NOT hypervisor policy
  WRONG ANSWER: "the user program's memory access will trigger a page fault in the host operating
  system, then that page fault handler will cause the guest operating system's page fault handler
  to run in kernel mode"
    problem: the hypervisor NEVER runs the guest OS in kernel mode with normal trap-and-emulate
    (except maybe with HW virtualization support)
  CORRECT ANSWER [not an option]: "the user program's memory access will trigger a page fault in
  the host operating system, then that page fault handler will cause the guest operating system's
  page fault handler to run"
    the host OS page fault handler ends up delegating to the guest OS page fault handler
    ("reflecting the exception")
  Q2: virtual -> physical -> machine
    the last-level guest page table entry has PPN 0x40
    what does the corresponding shadow page table entry look like?
      (the shadow PTE substitutes for the guest PTE in the hardware page table)
    physical page number 0x40 is what machine page number?
    guest physical memory starts at machine address 0x1000000
      = machine page number 0x1000, machine page offset 0x0 (12-bit page offset b/c the question said 4K pages)
    physical page 0x0 is machine page 0x1000
    physical page 0x1 is machine page 0x1001
    physical addr 0x1000 is machine addr 0x1001000
    physical page number 0x40 is physical address 0x40000 to 0x40FFF
      so physical page 0x40 is machine page 0x1000 + 0x40 = 0x1040 --- that's the machine page
      number that goes in the shadow PTE
- device controller versus driver
  device controller: the hardware that's attached to the processor (via some bus)
    can trigger interrupts
  device driver: the part of the OS that knows how to communicate with the device controller
    responds to interrupts ("bottom half")
- communicating with devices: programmed I/O v. DMA
  how does the device controller get fed data or send data to the OS?
  programmed I/O
    simplest option: the processor sends the data on the bus to the device
    this means the OS needs to have code that explicitly copies to or from the device
    means the device controller probably has a buffer with a physical memory address
    examples:
      keyboard controller: read from the "current keypress" buffer
      disk controllers: copy to/from the disk controller buffer into our page cache
    requires a buffer on the device controller
  direct memory access (DMA)
    doesn't make sense for keyboards, etc., which don't transfer much data
    another option: the device is on the memory bus and stores/loads data from memory at a
    location the OS specifies
      disk controller: OS says "my cache will be at this address, load into it"
    more efficient: the device controller can use memory as its buffer, so no extra copy needs to
    happen --- data is put where the OS wants it as it's received from the device
    more efficient: the processor can be running other code while the copy happens
      and as long as that code doesn't use memory "too much", it will run just as fast
    requires the device controller to take memory addresses from the OS
    the device driver needs to allocate memory for the device
    the driver still communicates with the device controller:
      set up memory addresses
      figure out whether the read/write happened yet
      figure out if there's an error
- page replacement policies
  - main family: least-recently used and approximations ("not recently used")
    usually can't do exact least-recently used b/c you'd have to update a data structure (like a
    linked list) on every read/write to memory
    approximations based on "was it accessed since I last checked?" or "mark it as invalid briefly
    and see if it gets accessed"
    second-chance: keep an ordered list of pages
      check if a page was accessed since it got on the list
        yes --> put it back on the list
        no --> this page hasn't been accessed in a while; okay to evict
    SEQ: keep two ordered lists of pages
      active --> assume probably used
      inactive --> guess possibly unused
      move pages from active to inactive
      check if a page was accessed while on the inactive list (perhaps check before it gets to the bottom)
        if accessed --> move it back to the active list
        if not --> okay to reuse; it hasn't been accessed recently
      hope: only scanning a smaller list of pages for accesses
        don't assume a page accessed a long time ago isn't safe to evict
    CLOCK: the general idea of periodically scanning+clearing "was it accessed" information and
    doing something with the recent history for each page
  - secondary family: heuristics for cases where least-recently used is wrong
    scanning files: Linux's heuristic was "allow things to be evicted if they aren't accessed
    twice (and they make it from top to bottom of the inactive list)"
      "accessed twice" --> either with some interval in between (I don't know if Linux does that?)
      or because it's done via read() calls
- pros and cons of FSs
  - FAT filesystem pros
    - simple to implement
    - okay with very constrained memory?
  - FAT filesystem cons
    - seeking is very slow
    - reliability????
    - doesn't help keep file and directory data close to each other
      - the FAT pointers are always at the beginning of the disk
      - no FS support for keeping directory entries close to the files they refer to, etc.
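The "seeking is very slow" con can be made concrete: to find byte N of a file, a FAT filesystem must walk the chain of next-cluster links one at a time. A minimal sketch, assuming a hypothetical in-memory FAT table (names and sizes are illustrative, not any specific FAT variant):

```c
/* Hypothetical in-memory FAT: fat[i] holds the number of the cluster that
 * follows cluster i in its file, or FAT_EOF if cluster i is the last one. */
#define FAT_EOF (-1)

/* Find the cluster holding byte `offset` of a file that starts at cluster
 * `first`, with `cluster_size`-byte clusters. We must follow the chain one
 * link at a time: O(offset / cluster_size) steps, so seeking near the end
 * of a large file reads many FAT entries just to find one data cluster. */
int fat_seek(const int *fat, int first, long offset, long cluster_size) {
    int cluster = first;
    for (long skip = offset / cluster_size; skip > 0; skip--) {
        if (cluster == FAT_EOF)
            return FAT_EOF;  /* offset is past the end of the file */
        cluster = fat[cluster];
    }
    return cluster;
}
```

For example, with a chain 2 -> 5 -> 3 and 512-byte clusters, reaching offset 1024 takes two chain-following steps; an inode's direct pointers (or an extent map) would answer the same question without walking anything.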
  - FFS-like filesystem
    - avoiding allocating whole blocks for small files (fragments)
    - block groups (keep file + directory data closer to each other)
    - have all the free-block information in one place, in a more compact format?
  - other FS features that are helpful
    - extents
      - pro: more efficient for large files
      - con: more complicated to allocate effectively
             more complicated to seek through
    - trees for directories
      - pro: more efficient to find a particular filename in a directory
      - con: more complicated to update/scan/etc.
    - journalling/logging
      - pro: recovery from failures
      - pro: faster writes
      - con: more complicated to implement
      - con: doesn't eliminate data loss entirely
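To illustrate the extent tradeoff above: with extents, mapping a file block to a disk block becomes a search over (logical start, physical start, length) runs instead of an array lookup --- cheap for large contiguous files, but more logic than following a direct pointer. A sketch assuming a sorted in-memory extent list (ext4-style in spirit; the struct and function names are made up for illustration):

```c
/* Hypothetical extent map entry: "logical file blocks
 * [logical_start, logical_start + length) live on disk starting at
 * physical block physical_start". Entries are sorted by logical_start. */
struct extent {
    long logical_start;   /* first file block this extent covers */
    long physical_start;  /* where that run begins on disk */
    long length;          /* number of contiguous blocks in the run */
};

/* Map a logical file block to its physical block by binary search:
 * O(log #extents) per lookup, and one entry can describe a huge file,
 * versus one stored pointer per block with direct/indirect pointers.
 * Returns -1 if the block falls in a hole (no extent covers it). */
long extent_lookup(const struct extent *map, int count, long block) {
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (block < map[mid].logical_start) {
            hi = mid - 1;
        } else if (block >= map[mid].logical_start + map[mid].length) {
            lo = mid + 1;
        } else {
            /* block lands inside this run: offset into it */
            return map[mid].physical_start + (block - map[mid].logical_start);
        }
    }
    return -1;
}
```

A file whose blocks are badly fragmented needs many extents, which is why allocating contiguous runs effectively (the "more complicated to allocate" con) matters for this scheme to pay off.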