- two-phase reordering tests: unfortunately not as complete as intended
- office hours next week: should be on the calendar
- I need TAs next semester: if you'd like to, either apply or email me,
  but I won't look until after I figure out final grades
----
- location of exam: should be Nau 101
- F2018 Quiz 6 #1: page tables (how many pages to store)
    Question (options in the same order):
        Before it starts any programs, xv6 sets up its page table so that 224MB starting
        at virtual address `0x80000000` corresponds to physical byte addresses starting
        with 0. (MB = 1024 * 1024 bytes.) It does this using 4K pages. How much space is
        required for page tables where only these pages are valid? (Assume it uses 4K
        pages and the page table format used by xv6.)
        a. 28 pages (112KB)
        a. 29 pages (116KB)
        a. 32 pages (128KB)
        a. 56 pages (224KB)
        *a. 57 pages (228KB)
        a. 224 pages (896KB)
        a. 225 pages (900KB)
        a. 256 pages (1024KB)
    what do we need for this:
        we definitely need a first-level page table, which will take 1 page
        the best we could possibly do is 1 second-level page table for each 1024 valid pages
        224MB / 4KB = 57344 valid pages = 56 * 1024 valid pages
        so the best possible is 56 second-level page tables --> 56 pages to store them
        best possible total: 57 pages
        it would be worse than the best if we couldn't fill up all of each second-level
        table due to the virtual addresses, but 0x80000000 maps to the start of a
        second-level table, so this won't be a problem
- mmap and how a process's memory layout can be stored
    the Linux and common Unix way of storing what a process's memory looks like is as a
    list of regions that could have been created with mmap
    on a page fault --> look up which region the address is in to decide what to do
        (e.g. copy-on-write? load from a file? allocate new zero pages? etc.)
    also need to look up where a physical page is, to update page tables on eviction/etc.
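The page-count arithmetic above can be double-checked with a short script (just a sketch that redoes the math, assuming xv6's two-level tables with 1024 four-byte entries per 4KB table):

```python
# Check the F2018 Quiz 6 #1 arithmetic: pages needed for page tables
# mapping 224MB of 4K pages starting at virtual 0x80000000, using
# xv6-style two-level tables (1024 entries per table, one table per page).

MB = 1024 * 1024
KB = 1024

valid_pages = (224 * MB) // (4 * KB)          # number of valid 4K pages
assert valid_pages == 57344 == 56 * 1024

# 0x80000000 is a multiple of 1024 pages * 4KB = 4MB, so the mapping
# starts at the beginning of a second-level table and every
# second-level table is completely filled.
assert 0x80000000 % (1024 * 4 * KB) == 0

second_level = (valid_pages + 1023) // 1024   # round up; here it's exact
first_level = 1
total = first_level + second_level
print(total, "pages =", total * 4, "KB")      # 57 pages = 228 KB
```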
- page replacement algorithms: pros/cons/types
    goal: trying to guess what won't be accessed soon / what will be accessed soon
    optimal for # of replacements (Belady's MIN): replace the thing which is going to be
    accessed furthest in the future
        but this requires knowing the future
    typically, we assume that we want something close to LRU
        with most programs, the least recently accessed thing is similar to the thing
        which is least likely to be accessed soon
    true LRU: not practical to implement, b/c we'd have to do a bunch of work for each
    memory access
        work = update the ordering of pages so we can always tell which is least recent
    approximations: look for accesses during some period of time
        "second chance" --> period of time = since we last examined the page for replacement
        Linux's SEQ policy --> period of time = since we put it on a list of "inactive" pages
    also some variation in whether you try to react to the access immediately or wait
    until you need to look for an unaccessed page
        "second chance" --> just recorded that the page was accessed, but didn't use this
        until we needed a new page (and it made it to the bottom of the list of pages)
        SEQ --> reacted soon after the access, while the page was on the inactive list,
        by removing it from that list
    we also mentioned some policies that go "beyond" LRU
        ~ try to heuristically detect patterns that LRU is a bad idea for
            e.g. Linux trying to detect files that are accessed only once
        ~ eager replacement
        ~ readahead --- guess future accesses before they happen
- optimal layout of data on hard drives
    general rule: want to keep related data (data that you will access together) in
    contiguous sector numbers
    hard drive hardware will make sure adjacent sectors are physically close
        --> lower seek times and rotational latency
- optimal layout of data on SSDs
    block remapping happens, so layout mostly doesn't matter
    also nothing has to move physically, so adjacency doesn't matter much
    how many writes you do is going to affect performance a bunch
- inodes --- what and why
    inodes have all the information about a file on disk except for the name
        the name is stored in the directory
        when we're looking up a file, we want all the names we need to check together
    alternative to the more ad hoc FAT approach of splitting the information between the
    directory entry and the FAT table
    challenge: what about variable-sized files?
        solution: we have pointers to the extra info (locations of additional data blocks)
- F2018 Q14a --- space required to store files
    storing a 1.5KB file in an inode-based filesystem
        256-byte inodes: 5 direct ptrs, 1 indirect, 1 double-indirect; 4KB blocks,
        4B block pointers
        space = data + inodes + other disk space allocated
        allocate one block to store the data: 4KB
        allocate one inode: 256B
            in that inode we will set a direct pointer (but the pointer is there
            regardless of whether we use it)
        total: 4KB + 256B
    storing a 1.5KB file in an inode-based filesystem, but with 10 extents in the inode
    (each storing (starting block number, length in blocks)) instead of block pointers
        total: 4KB + 256B
        we still need to allocate a full block, because the length is in blocks and
        because we have a free block map (mentioned in the question) with a bit per block
    storing a 1.5KB file in FAT32 with 4KB clusters
        space = FAT entries + file data (but not directory entries)
        total: 4KB + 4B (for the FAT entry that said "no next cluster")
- F2018 Q14b --- same question, but a 16MB file
    inode-based: 5 direct, 1 indirect, 1 double-indirect, 4KB blocks, 4B block pointers
        16MB for the data
        16MB (2^24 B) / 4KB (2^12 B) = 2^12 data block pointers needed
        5 direct pointers are in the inode
        the other 2^12 - 5 pointers go in indirect blocks
            each block can hold 2^10 pointers, so 4 blocks can hold 2^12 pointers
            --> 4 indirect blocks
        1 double-indirect block to point to the last 3 indirect blocks
        (the inode's indirect pointer covers the first one)
        4 + 1 blocks of extra data
        total: 16MB + 4 * 4KB (indirect blocks) + 4KB (double-indirect block)
        + 256B (inode, b/c the question said to include that)
    why a double-indirect pointer and not two indirect pointers?
        heuristic: we're going to have some much bigger files to deal with
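The Q14b arithmetic above can be checked with a short script (a sketch that only redoes the counting; block and pointer sizes are taken from the question):

```python
# Check the F2018 Q14b arithmetic: extra blocks needed for a 16MB file
# with 4KB blocks, 4B block pointers, and an inode holding 5 direct
# pointers, 1 indirect pointer, and 1 double-indirect pointer.

KB, MB = 1024, 1024 * 1024
block = 4 * KB
ptrs_per_block = block // 4                 # 2^10 = 1024 pointers per block

data_ptrs = (16 * MB) // block              # 2^12 data block pointers needed
assert data_ptrs == 2 ** 12

after_direct = data_ptrs - 5                # pointers not covered by direct ptrs
indirect_blocks = -(-after_direct // ptrs_per_block)   # ceiling division
assert indirect_blocks == 4

# the inode's indirect pointer covers 1 of those 4 blocks; the other 3
# hang off a single double-indirect block
double_indirect_blocks = 1
extra = indirect_blocks + double_indirect_blocks
print(extra, "extra blocks")                # 4 indirect + 1 double-indirect
```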
- also, the most we'd save with two indirect pointers instead is one block, and that's
  not much for these big files
- symbolic links
    either directory entries or special files that just contain the name of a different file
    when used, the operating system goes to that different file, looking it up BY NAME
    unlike hard links: if the file is removed and created again, the symlink still works
    unlike hard links: doesn't keep the file around if the original is removed
        (with hard links: each link adds to a reference count in the inode)
    unlike hard links: works across different mounts
        with hard links: need to store an inode number --- can't work across two filesystems
        (e.g. a USB stick could have a symbolic link to /bin/ls even though /bin/ls varies
        between systems you can plug the USB stick into)
- trees on disk
    if we want to store some ordered data structure (e.g. directory entries) on disk, we
    can use a search tree
    BUT we want to be efficient for HDDs/SSDs, which read/write big sectors/blocks at a time
    solution: a tree where each node has a bunch of children and a bunch of items
        otherwise works similarly to a normal search tree: if a node has 4 children, you
        know a value in between each pair of adjacent children and can compare against
        those values to determine which child to go to when searching for/adding a value
- journalling order
    redo logging:
        [normal operation]
        (1) log what we are about to do
        (2) log that we really are going to do it ("commit")
        (3) actually do it
        [recovery]
        (1) check the log and see if we were really going to do it
        (2) if so, actually do it (again, maybe)
- hypervisor: handling a system call in the guest OS
    assumption: trap-and-emulate, and no special HW support for page tables
    (1) guest OS application makes a system call
    (2) hypervisor [= host OS]'s system call handler runs [b/c hardware ran it]
    *(3) hypervisor's system call handler returns to the guest OS's system call handler
    *(4) guest OS's system call handler does whatever the system call requires
    *(5) guest OS's system call handler tries to return from the system call
    *(6) hypervisor's protection fault handler runs [b/c hardware detected that the
        privileged "return from exception/interrupt" instruction was run in user mode]
    (7) hypervisor's protection fault handler returns to the guest OS's application
        (by looking up where the guest OS is trying to return to)
    * = running the guest OS in kernel mode --> different page table than when running
        it in user mode
- ACLs v. capabilities
    access control list: list of users/etc. ("protection domains") that can do operations
    on an object (like a file)
    capabilities: token to access a thing = permission to access the thing
        example: file descriptors act like this, and we can pass them between programs
        (instead of using filenames, etc.)
    pros of capabilities:
        - harder to forget to check that a capability is valid than it is to forget to
          check an access control list
        - easier to delegate
            e.g. if you want to give a setuid program a file, you just pass the file
            descriptor
            e.g. if a user is running a server and you want them to read your file, you
            pass a file descriptor rather than fiddling with the ACL
            (the act of doing the operation might handle the permissions "for free")
    cons of capabilities:
        - revocation: it's very hard to make someone with a capability stop having
          access to the file/etc. if you change your mind
        - makes naming/etc. of resources more difficult
            can't have the user pass a capability by reading out a filename
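The redo-logging order above can be sketched as a toy in-memory example (the `log` and `disk` structures are hypothetical stand-ins, not any real filesystem's layout):

```python
# Toy redo log: (1) log intended writes, (2) log a commit record,
# (3) apply. Recovery re-applies only committed transactions; redoing
# a write is idempotent, so applying it "again, maybe" is safe.

log = []    # stands in for the on-disk journal
disk = {}   # stands in for the real on-disk data

def do_write(txid, key, value):
    log.append(("write", txid, key, value))   # (1) log what we're about to do

def commit(txid):
    log.append(("commit", txid))              # (2) log that we really will do it
    apply_tx(txid)                            # (3) actually do it

def apply_tx(txid):
    for op in log:
        if op[0] == "write" and op[1] == txid:
            disk[op[2]] = op[3]

def recover():
    committed = {op[1] for op in log if op[0] == "commit"}
    for txid in committed:
        apply_tx(txid)       # maybe a second time; redo is safe to repeat

do_write(1, "inode 5", "new contents")
commit(1)
recover()                    # re-applying changes nothing
print(disk)                  # {'inode 5': 'new contents'}
```

If the crash happens before the commit record is logged, recovery ignores the transaction's writes entirely, which is why the commit must be logged before any real update.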