- last time:
  - capabilities: token to access = ability to access
    - example: file descriptors
    - to make useful: provide way to pass between processes
    - permissions check done when acquiring capability (hopefully)
  - virtual machines: hardware / host+hypervisor / guest OS / application in guest OS
    - hypervisor tracks similar information to a process control block:
        current registers; current memory; current I/O devices
    - hypervisor tracks extra information to emulate HW's kernel mode:
        does guest OS believe it's in kernel mode? [guest OS is always in user mode on the real hardware]
        guest OS page table pointer
        guest OS exception table pointer
        ...
  - trap-and-emulate
      hope/make sure special operations in guest OS trigger exceptions:
        I/O
        updating things that support kernel mode on the real HW
        system calls
        ...
      in hypervisor/host OS exception handler: do what hardware would do
        if guest OS thinks it's in kernel mode, maybe a privileged operation:
          emulate talking to I/O device
          update emulated page table pointer
          switch from pretend kernel to pretend user mode
        reflect exceptions back to guest OS
          system call while guest OS is running --> triggers GUEST system call handler
  - shadow page tables
      build "shadow" page table for real hardware by combining guest + hypervisor PT
      add shadow page table entries on demand
      clear shadow page table in response to TLB flushes
      separate tables for pretend kernel versus not
-----
- page cache
  - want to keep program data and file data mostly in memory
  - *mostly* --- have a place on disk for the data
    - can store on disk --- when not loaded yet
    - can store on disk --- if not enough space in memory
  - program data: everything that's in a program's memory
  - file data: everything in a regular file
  - we need some way to look up:
      given something I want to access, where is it in memory? ["forward mapping" (cache hit)]
        if it's part of a process's address space: page table
        if it's part of a file being read with system calls: some OS data structure
      given something I want to access, where is it on disk? ["forward mapping" (cache miss)]
        if it's part of a process's address space: some OS data structure attached to the PCB
        if it's part of a file being read with system calls: based on how the FS works
  - more subtly, also need a way to look up:
      given something I know is in memory, where are there pointers to it? ["reverse mapping"]
        we need some sort of OS data structure for everything kept in memory in the cache
    - needed this information because when memory is full, we need to unset pointers
      to free up more memory for something else - "page replacement"
- page replacement algorithms = a way to choose what thing to replace
  when memory is full and we need to bring in something new
    theoretical best: replace whatever is going to be accessed never again or furthest in the future
      ("Belady's MIN")
    practical ideas:
      assume things that haven't been accessed recently won't be accessed for a long time
        --> least recently used policy
      if this assumption is true, it's enough to track things which have been accessed recently
        without being precise about it
          example: second-chance policy: look for something which hasn't been accessed
            since we added it to a list of pages (see the sketch below)
          various choices about what time-spans we look for accesses in
      sometimes the assumption above is false ---> we can have heuristics ["guesses"] for those cases
        example: things we're only going to access once
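  [aside] a minimal C sketch of the second-chance ("clock") policy described above; this is an
  illustration only -- the fixed frame array, struct frame, and choose_victim() are made-up names,
  and in a real OS the referenced bit would be set by the hardware / copied out of the page table:

      #include <stdbool.h>
      #include <stddef.h>

      struct frame {
          int  page;        /* which page currently occupies this frame */
          bool referenced;  /* accessed since we last looked at it? */
      };

      #define NUM_FRAMES 8
      static struct frame frames[NUM_FRAMES];
      static size_t clock_hand = 0;   /* position in the circular list of frames */

      /* sweep forward; a referenced frame gets its bit cleared and a second chance;
         the first unreferenced frame we find is the one to replace */
      size_t choose_victim(void) {
          for (;;) {
              struct frame *f = &frames[clock_hand];
              size_t idx = clock_hand;
              clock_hand = (clock_hand + 1) % NUM_FRAMES;
              if (!f->referenced)
                  return idx;          /* not accessed since last sweep: evict this one */
              f->referenced = false;   /* second chance; check again next sweep */
          }
      }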
- device driver flow
  two components to the device driver:
    "top half": the part run from system calls/etc.
      sets up the I/O operation
    "bottom half": the part run from the interrupt handler triggered by the device, finishes the I/O operation
      fills in buffer
  general pattern:
    process makes a read/write call
    top half executes and checks if it already has the data for read/write
      [typically: buffer containing data from/to device]
    if it doesn't, it puts the process to sleep
    while the process is sleeping, other things can happen
    device triggers interrupt when read/write is finished
    bottom half completes I/O operation [typically: updates buffer, etc.]
    bottom half wakes up process doing read/write call
    (sometimes: top half might need to run multiple times to do a very big read/write)
- SSD versus HDD performance
  HDD: big performance issue is that the head needs to move to the right location on the disk
    so it's useful/important to try to keep data you want to access together, close physically on the disk
    to help OSes, etc. do this: HDDs assign sector numbers such that nearby sector numbers are close on the disk
  SSD: no moving parts, so physical location doesn't matter very much
    in fact: really doesn't matter, because the SSD doesn't place data in physical locations the OS knows about
    fast reads/writes regardless of location
    big performance problem comes from needing to clear up space
      has to be done in "erasure blocks"
      data has to be copied to make sure a whole erasure block can be erased
      happens in the device (OS doesn't generally get to know about it except for slowness)
- inode-based filesystems without block groups
  layout of the disk:
    [ header ][ array of inodes ][ array of bools: is data block free ][ data blocks ]
              ^^^^^^^^^^^^^^^^^^
              information about files (size, owner, permissions, where the first several data blocks are located)
- single/double/triple-indirect blocks
  inode directly contains the block #s (index into data blocks part of the disk) of the first few blocks of data
    "direct block pointers"
    example: inode has 10 direct block pointers and blocks are 2K,
      then the first 20K of the file can be found using those 10 pointers
  after that, more and more indirection, depending on how big the file is (lookup sketch below):
    example: inode has one indirect pointer (after 10 direct)
      then bytes 20K through (1024K+20K) = 1044K are found using that indirect pointer
      indirect pointer points to a block (2K)
      the block contains an array of block pointers -- how big?
        if block pointers are 4 bytes, then 2K / 4B = 512 elements
      each of those elements points to a block [that will contain data for the file]
      512 * 2K = 1024K in those blocks
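  [aside] a C sketch of the offset-to-block lookup just described, assuming 2K blocks, 10 direct
  pointers, and 4-byte block numbers; struct inode, read_block(), and block_for_offset() are
  made-up names for illustration, and double/triple indirection and error handling are omitted:

      #include <stdint.h>

      #define BLOCK_SIZE     2048u                             /* 2K blocks */
      #define NUM_DIRECT     10u                               /* direct pointers in the inode */
      #define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))   /* 512 pointers per block */

      struct inode {
          uint32_t direct[NUM_DIRECT];  /* block #s of the first 10 data blocks */
          uint32_t indirect;            /* block # of a block full of block #s */
          uint32_t double_indirect;     /* not handled in this sketch */
      };

      /* hypothetical helper: read disk block 'block_num' into 'buf' */
      void read_block(uint32_t block_num, void *buf);

      /* return the data block number holding byte 'offset' of the file */
      uint32_t block_for_offset(const struct inode *ino, uint64_t offset) {
          uint64_t block_index = offset / BLOCK_SIZE;   /* which data block of the file? */
          if (block_index < NUM_DIRECT)
              return ino->direct[block_index];          /* bytes 0 through 20K: direct pointers */

          /* bytes 20K through 1044K: one level of indirection */
          block_index -= NUM_DIRECT;
          uint32_t pointers[PTRS_PER_BLOCK];
          read_block(ino->indirect, pointers);          /* fetch the block of 512 block #s */
          return pointers[block_index];                 /* assumes block_index < 512 */
      }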
    what if 1044K isn't big enough?
    example: inode has one double-indirect pointer (after 10 direct + 1 indirect)
      then bytes 1044K through (1044K+524288K) are found using that double-indirect pointer
      double-indirect pointer points to a block, which contains 512 pointers ["indirect pointers"]
      those pointers each point to a block, which contains 512 pointers ["direct pointers"]
      and those pointers point to blocks that will contain data for the file
      512 * 512 * 2K of data for the file in those blocks --> 524288K of data
- # of blocks needed / file size calculation
  if we use the indirect block, that doesn't mean that all the pointers in the indirect block are needed
    example: if the file is 24K, we need 2 pointers in the indirect block (20-22K, 22-24K)
    example: if the file is 1048K, we need 1 pointer in the double-indirect block (covering 1044-2068K)
      and 2 pointers in the block it points to (1044-1046K, 1046-1048K)
      the first 20K use the direct pointers, the next 1024K use the indirect block and all its 512 pointers
      then we have 4K left over, but we need to use the double-indirect pointer next
        even though we only want to point to two blocks of data
      to do this: we have the double-indirect pointer point to a block with one useful indirect pointer
        then that useful indirect pointer points to a block with two useful direct pointers
  total number of blocks (see the sketch below):
    1048K / 2K = 524 blocks of data
    plus 1 block of direct pointers pointed to by the indirect pointer in the inode
      [points to where bytes 20K through 1044K of the file's data are]
    plus 1 block of indirect pointers pointed to by the double-indirect pointer in the inode
    plus 1 block of direct pointers pointed to by the first entry in that double-indirect block
      [points to where bytes 1044K through 1048K of the file's data are]
    = 527
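  [aside] a small C sketch of that block-count calculation, under the same assumptions (2K blocks,
  10 direct pointers, 512 pointers per block, no triple-indirect pointer); blocks_needed() is a
  made-up helper name:

      #include <stdio.h>

      #define BLOCK_SIZE     (2 * 1024)           /* 2K blocks */
      #define NUM_DIRECT     10                   /* direct pointers in the inode */
      #define PTRS_PER_BLOCK (BLOCK_SIZE / 4)     /* 4-byte block pointers -> 512 */

      static long div_round_up(long a, long b) { return (a + b - 1) / b; }

      /* total blocks (data blocks + pointer blocks) for a file of file_size bytes,
         not counting the inode itself */
      long blocks_needed(long file_size) {
          long data  = div_round_up(file_size, BLOCK_SIZE);
          long total = data;
          if (data > NUM_DIRECT) {
              total += 1;                               /* the indirect block itself */
              long past_indirect = data - NUM_DIRECT - PTRS_PER_BLOCK;
              if (past_indirect > 0) {
                  total += 1;                           /* the double-indirect block itself */
                  total += div_round_up(past_indirect, PTRS_PER_BLOCK);  /* indirect blocks under it */
              }
          }
          return total;
      }

      int main(void) {
          printf("%ld\n", blocks_needed(1048L * 1024));   /* prints 527, matching the example */
          return 0;
      }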
- fragments
  - for very small files, we might not want to allocate a whole block of data
  - divide blocks into smaller pieces, but only use those smaller pieces for small files
  - fragments: these smaller pieces
  - why only for small files?
    - if we used small block sizes for large files, then we'd add a lot more block pointers
- sockets
  - when set up: act like two-way pipes (with file descriptors for each end)
    can write on one "end" of the socket, read on the other "end", and vice-versa
  - setup process: we need to specify (server-side sketch below)
    [client] the address + port number ("socket address") to 'connect' to
      socket()      --> create file descriptor
      getaddrinfo() --> translate human-readable hostname (virginia.edu) + service name to socket address
      connect()     --> make file descriptor access a server using that socket address
    [server] the address + port number to 'listen' for connections on
      socket()      --> create "server socket" file descriptor
      getaddrinfo() --> translate human-readable hostname (virginia.edu) + service name to socket address
      bind()        --> set the address + port number using a socket address
      listen()      --> tell the OS to expect connections
      accept()      --> get a new file descriptor for the next client that connects
        can call accept() in a loop for multiple clients
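  [aside] a minimal C sketch of the server-side sequence above; the port "8000" and the hello
  message are made up, and error handling is omitted to keep the shape of the calls visible:

      #include <netdb.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void) {
          /* getaddrinfo(): turn "any local address" + a service name into a socket address */
          struct addrinfo hints, *server_addr;
          memset(&hints, 0, sizeof hints);
          hints.ai_family   = AF_UNSPEC;      /* IPv4 or IPv6 */
          hints.ai_socktype = SOCK_STREAM;    /* TCP */
          hints.ai_flags    = AI_PASSIVE;     /* address suitable for bind() */
          getaddrinfo(NULL, "8000", &hints, &server_addr);

          /* socket() + bind() + listen(): create the server socket and attach the address */
          int server_fd = socket(server_addr->ai_family, server_addr->ai_socktype,
                                 server_addr->ai_protocol);
          bind(server_fd, server_addr->ai_addr, server_addr->ai_addrlen);
          listen(server_fd, 10);              /* tell the OS to expect connections */
          freeaddrinfo(server_addr);

          for (;;) {
              /* accept(): new file descriptor for the next client that connects */
              int client_fd = accept(server_fd, NULL, NULL);
              const char msg[] = "hello\n";
              write(client_fd, msg, sizeof msg - 1);   /* the fd acts like one end of a pipe */
              close(client_fd);
          }
      }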
- distributed system failure cases
  - our main failure model assumed for *machines*: "fail-stop"
      either the machine works or it's *as if* powered off
      can remember things in spite of failure by writing to disk carefully
  - our main failure model assumed for *networks*: we'd have lost messages + reordering
      lost messages: you send a message and it doesn't get there
        sometimes includes: you send a message, and it does get there, but the reply message doesn't get to you
      reordered messages: you send a message A, then a message B, and message B arrives first, then message A
        includes: you send message A and wait long enough to assume it's not getting there
- guest OS versus host OS exceptions
  - when a guest OS is running, the only exception handler on the real hardware is the host OS's
    [the real hardware doesn't know it's running a virtual machine (b/c we assumed no HW support for VMs)]
  - causes of host OS exceptions when a guest OS is running:
    * the guest OS did something that would've triggered an exception if the guest OS wasn't in a VM:
      - (?) caused an I/O device to request something to happen
      - making a system call
      - accessing memory that is invalid in a page table
      - ...
      --> if it would've happened outside a VM, our VM should run the guest OS's exception handler
          from the host OS exception handler (b/c the guest OS shouldn't need to know it's in a VM)
    * the guest OS did something that would've done something other than triggering an exception
      if the guest OS wasn't in a VM:
      - talking to an I/O device via memory-mapped I/O
      - asking to change the page table base pointer
      - asking to disable interrupts
      - asking to change the exception table address
      - asking to switch from kernel to user mode
      ---> the VM should pretend to do ("emulate") the operation the guest OS would've done outside a VM
- which thread scheduler policy makes sense when
  - metrics:
      turnaround time ~~ time from when a program becomes runnable until when it stops being runnable
        if the program is a loop of "wait for input; process input", then this reflects how "interactive" the program is
        --> optimal algorithm: SRTF (shortest remaining time first)
        --> practical algorithm: approximations of SRTF that estimate the remaining time somehow
            usual estimate: based on past amounts of time used
      fairness ~~ whether multiple programs/users get to use the same amount of CPU
        or have the same ability to access the CPU
        --> round-robin: when you're runnable, same ability to access the CPU
            but we don't care about when you're not runnable
        --> Linux CFS (virtual time idea): same total amount of CPU usage over time
            with some configurable options for "over how much time?"
            [b/c we don't want to worry about what happened minutes/hours/days ago]
      throughput ~~ amount of compute time used for actual work,
        not figuring out what to do or starting/stopping programs ("overhead")
        --> FCFS: simplest scheduling decision + minimal context switches
      ...
- POSIX access control
  - two mechanisms for specifying file access control
    ~ historic mechanism where files have: --> usually stored in inodes
        an owner (user ID)
        a group (group ID)
        "mode" bits specifying:
          can owner read the file? can owner write the file? can owner execute the file?
          can group read the file? can group write the file? can group execute the file?
          can anyone else read the file? can anyone else write the file? can anyone else execute the file?
        --> can view this as an access control list, but with very limited features
    ~ less supported, much more flexible mechanism called "POSIX ACLs" ("real access control lists")
        --> probably requires special filesystem support
        have a list of any number of
          user ID: can that user ID read / write / execute?
          group ID: can that group ID read / write / execute?
          default: can non-specified users/groups read / write / execute?
          ...
    ~ for either of these:
        checked on open(), rename(), remove(), etc.
        based on the user and group ID stored in the process control block of whoever makes a file-related system call
- TOCTTOU: time of check to time of use
  a security problem that occurs when we separate the check of an access control list (or similar)
    from actually accessing an item
  example: print file program that runs as root
    timeline:
      [attacker] calls the print file program on example.txt
      [print file program] checks that the attacker is allowed to access example.txt   <--- check
      [attacker] moves (links, makes a symbolic link) secret.txt to example.txt
      [print file program] accesses and prints "example.txt", which is now a different file   <--- use
    one solution: print file program opens the file, then checks using the file descriptor
      --> a file descriptor won't change what file it refers to because of a rename/link/etc. operation
    another solution: print file program opens the file in a way which checks whether the attacker can access it
    --> both solutions: combine check and use in some way --- ensure it's always the same file
        (see the sketch below)
  can occur with non-files:
    [attacker] kill(pid 10000)
    [kernel system call implementation] can attacker kill pid 10000? --> yes
    [attacker] causes pid 10000 to exit
    [attacker] triggers important program to start as pid 10000
    [kernel system call implementation] actually kills pid 10000, but it might be different from the one checked
    solution: make sure what pid 10000 refers to doesn't change between the check and the use (the kill)
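  [aside] a C sketch of the "open first, check the file descriptor" idea; the function name and the
  specific check (only allow files owned by the requesting user) are made up for illustration -- the
  point is that fstat() on the fd examines the same file that will later be read, so a rename/link
  between check and use can't swap in a different file:

      #include <fcntl.h>
      #include <sys/stat.h>
      #include <unistd.h>

      /* open, then check the file we actually opened */
      int open_if_owned_by(const char *path, uid_t requesting_uid) {
          int fd = open(path, O_RDONLY);
          if (fd < 0)
              return -1;

          struct stat st;
          if (fstat(fd, &st) < 0 || st.st_uid != requesting_uid) {   /* check on the fd, not the path */
              close(fd);
              return -1;      /* not a file this user owns: refuse */
          }
          return fd;          /* check and use now refer to the same file */
      }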
- final logistics
  ~ I'm going to release a document of questions and an answer template document
  ~ answer template will be a text file with fill-in-the-blank areas
  ~ you'll upload that text file after filling it in to kytos
  ~ worth 15% -- same as midterm
  ~ focus on material from after the midterm
    there probably will be some coverage of topics before the midterm