- mmap() --- why use and how it works
  ~ it's a convenient way to load files into memory and have the OS handle the details
    typically, loading a program is effectively a bunch of mmap() calls
      load code read-only
      load initial values of globals copy-on-write
  ~ how mmap() works
    the process control block contains a list of mmap()'d regions
      e.g. 0x1000-0x5000 come from foo.exe, bytes 4096 and on
    on a page fault, the OS can look up the mmap() region and use that
      if the page fault is at 0x1400, then "load" the page that contains bytes 4096-... of foo.exe
    pages that are mmap()'d can be shared with the page cache
      the OS is already caching file data
      don't copy when loading for mmap(); just make the PTE point to the cached copy
        (and make sure the cached copy doesn't go away)
      also means mmap() can be used to share memory between programs
- how much inodes point to directly and indirectly
  big idea of an inode: all the data about a file in one place
    but since the inode is fixed-size, we need a way to store both large and small files
    want to allocate variable amounts of space to a file
  part of the data about where a file is lives in the inode
    most notably: "direct pointers"
  part of the data about where a file is --- if the file is too big --- needs to go elsewhere
    the inode contains pointer(s) to this extra metadata
    "(1x/2x/3x/...)-indirect pointers"
  goal: support 1-100KB files well, but also handle >> 1GB files well
  first blocks: "direct pointers" --> what blocks is the file data in
    a fixed number of these is stored in the inode, covering the first several blocks of the file
    most files are small, so this is all most files need
  if the direct pointers in the inode aren't enough, we need more of them
    solution: allocate more blocks of direct pointers
      (why a whole block? b/c we don't allocate smaller units on disk)
  next set of blocks: "indirect pointer" --> points to a block of direct pointers
  following that: "double-indirect pointer"
    we couldn't fit enough indirect pointers, so allocate a whole block of them
    and each of those indirect pointers can point to a whole block of direct pointers
  following that: "triple-indirect pointer"
  result: tree structure
    (e.g., with ext2-like parameters --- 4KB blocks and 4-byte block pointers --- one block holds
    1024 pointers, so an indirect pointer covers 1024 * 4KB = 4MB of file data, a double-indirect
    pointer 4GB, and a triple-indirect pointer 4TB)
- how much space?
  direct pointers: each points to 1 block
    space that can be pointed to: (# direct pointers) * (block size)
    overhead: ~0 --- we have to allocate the whole inode anyway
  indirect pointers: each points to 1 block of direct pointers
    space that can be pointed to: [(# indirect pointers) * (# pointers per block)] * (block size)
      (the bracketed factor is how many direct pointers the indirect pointers are equivalent to)
    overhead (extra space for metadata): (# indirect pointers) * 1 block
  double-indirect pointers:
    space that can be pointed to: [(# double-indirect pointers) * (# pointers per block)] * (space per indirect pointer)
      (the bracketed factor is how many indirect pointers the double-indirect pointers are equivalent to)
    overhead: (# double-indirect pointers) * 1 block                                <-- space for the blocks of indirect pointers
              + [(# double-indirect pointers) * (# pointers per block)] * 1 block   <-- overhead per indirect pointer
  in a file: we don't have to have all the pointers be non-NULL
    save space by not storing the extra pointers (and, of course, not storing data that isn't in the file)
  another overhead: we store whole blocks (or maybe fragments) even if the file is not a whole
  number of blocks/fragments large
- FS snapshots
  ~ user point of view: two (or more) copies of the filesystem which represent different versions
    of the filesystem
    but they save space when they have files in common
    backups take the form of "make one of these copies"
  ~ implementation: copy-on-write
    if the two copies are
      exactly the same: they have exactly the same data
    if the two copies have one file that's different:
      they'll have pointers to different inode arrays [*]
      and only one inode will be different between the two
      and the two inodes will have blocks in common where they share data (e.g. if one file was appended)
      [*] the inode array is going to be stored in pieces
        we'll have an array of pointers to those pieces
        and only the parts which are different between the two copies will have different pointers
    details we need to handle:
      (a) normal file updates need to make the new copies
      (b) we need to track reference counts [to do deletion, etc. safely]
- quiz today Q1 and Q2
      guest OS program       <<<<
      ----------------
      guest OS               <<<<
      ----------------
      hypervisor / host OS
      ----------------
      hardware
  Q1: "Suppose a guest operating system is running in a virtual machine implemented using
  trap-and-emulate. A user program running in the guest operating system tries to access memory
  that is not allocated but will be allocated on demand by the guest operating system. What will
  happen?"
  WRONG ANSWER: "the user program's memory access will trigger a page fault in the host operating
  system [this first part is correct], then that page fault handler [the host OS page fault
  handler] will update the guest operating system's page table, then resume running the user
  program"
    not correct b/c the hypervisor doesn't update the guest page table itself
      the hypervisor doesn't know that the memory is considered allocate-on-demand by the guest OS
      that's guest OS policy, NOT hypervisor policy
  WRONG ANSWER: "the user program's memory access will trigger a page fault in the host operating
  system, then that page fault handler will cause the guest operating system's page fault handler
  to run in kernel mode"
    problem: the hypervisor NEVER runs the guest OS in kernel mode with normal trap-and-emulate
    (except maybe with HW virtualization support)
  CORRECT ANSWER [not an option]: "the user program's memory access will trigger a page fault in
  the host operating system, then that page fault handler will cause the guest operating system's
  page fault handler to run"
    the host OS page fault handler ends up delegating to the guest OS page fault handler
    ("reflecting the exception")
  Q2: virtual -> physical -> machine
    the last-level guest page table entry has PPN 0x40
    what does the corresponding shadow page table entry look like?
      (the shadow PTE substitutes for the guest PTE in the hardware page table)
    physical page number 0x40 is what machine page number?
    guest physical memory starts at machine address 0x1000000
      = machine page number 0x1000, machine page offset 0x0 (12-bit page offset b/c the question said 4K pages)
    physical page 0x0 is machine page 0x1000
    physical page 0x1 is machine page 0x1001
    physical addr 0x1000 is machine addr 0x1001000
    physical page number 0x40 is physical address 0x40000 to 0x40FFF
      so physical page 0x40 is machine page 0x1000 + 0x40 = 0x1040 --- that's the machine page
      number that goes in the shadow PTE
- device controller versus driver
  device controller: the hardware that's attached to the processor (via some bus)
    can trigger interrupts
  device driver: the part of the OS that knows how to communicate with the device controller
    responds to interrupts ("bottom half")
- communicating with devices: programmed I/O v. DMA
  how does the device controller get fed data or send data to the OS?
  programmed I/O
    simplest option: the processor sends the data on the bus to the device
    this means the OS needs to have code that explicitly copies to or from the device
    means the device controller probably has a buffer with a physical memory address
    examples:
      keyboard controller: read from the "current keypress" buffer
      disk controllers: copy to/from the disk controller buffer into our page cache
    requires a buffer on the device controller
  direct memory access (DMA)
    doesn't make sense for keyboards, etc., which don't transfer much data
    another option: the device is on the memory bus and stores/loads data from memory at a
    location the OS specifies
      disk controller: OS says "my cache will be at this address, load into it"
    more efficient: the device controller can use memory as its buffer, so no extra copy needs to
    happen --- data is put where the OS wants it as it's received from the device
    more efficient: the processor can be running other code while the copy happens
      and as long as that code doesn't use memory "too much", it will run just as fast
    requires the device controller to take memory addresses from the OS
    the device driver needs to allocate memory for the device
    the driver still communicates with the device controller:
      set up memory addresses
      figure out whether the read/write happened yet
      figure out if there's an error
- page replacement policies
  - main family: least-recently used and approximations ("not recently used")
    usually can't do exact least-recently used b/c you'd have to update a data structure (like a
    linked list) on every read/write to memory
    approximations based on "was it accessed since I last checked?" or "mark it as invalid briefly
    and see if it gets accessed"
    second-chance: keep an ordered list of pages
      check if a page was accessed since it got on the list
        yes --> put it back on the list
        no --> this page hasn't been accessed in a while; okay to evict
    SEQ: keep two ordered lists of pages
      active --> assume probably used
      inactive --> guess possibly unused
      move pages from active to inactive
      check if a page was accessed while on the inactive list (perhaps check before it gets to the bottom)
        if accessed --> move it back to the active list
        if not --> okay to reuse; it hasn't been accessed recently
      hope: only scanning a smaller list of pages for accesses
        don't assume a page accessed a long time ago isn't safe to evict
    CLOCK: the general idea of periodically scanning+clearing "was it accessed" information and
    doing something with the recent history for each page
  - secondary family: heuristics for cases where least-recently used is wrong
    scanning files: Linux's heuristic was "allow things to be evicted if they aren't accessed
    twice (and they make it from top to bottom of the inactive list)"
      "accessed twice" --> either with some interval in between (I don't know if Linux does that?)
      or because it's done via read() calls
- pros and cons of FSs
  - FAT filesystem pros
    - simple to implement
    - okay with very constrained memory?
  - FAT filesystem cons
    - seeking is very slow
    - reliability????
    - doesn't help keep file and directory data close to each other
      - the FAT pointers are always at the beginning of the disk
      - no FS support for keeping directory entries close to the files they refer to, etc.
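The "seeking is very slow" con can be made concrete: to find byte N of a file, a FAT filesystem must walk the chain of next-cluster links one at a time. A minimal sketch, assuming a hypothetical in-memory FAT table (names and sizes are illustrative, not any specific FAT variant):

```c
/* Hypothetical in-memory FAT: fat[i] holds the number of the cluster that
 * follows cluster i in its file, or FAT_EOF if cluster i is the last one. */
#define FAT_EOF (-1)

/* Find the cluster holding byte `offset` of a file that starts at cluster
 * `first`, with `cluster_size`-byte clusters. We must follow the chain one
 * link at a time: O(offset / cluster_size) steps, so seeking near the end
 * of a large file reads many FAT entries just to find one data cluster. */
int fat_seek(const int *fat, int first, long offset, long cluster_size) {
    int cluster = first;
    for (long skip = offset / cluster_size; skip > 0; skip--) {
        if (cluster == FAT_EOF)
            return FAT_EOF;  /* offset is past the end of the file */
        cluster = fat[cluster];
    }
    return cluster;
}
```

For example, with a chain 2 -> 5 -> 3 and 512-byte clusters, reaching offset 1024 takes two chain-following steps; an inode's direct pointers (or an extent map) would answer the same question without walking anything.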
  - FFS-like filesystem
    - avoiding allocating whole blocks for small files (fragments)
    - block groups (keep file + directory data closer to each other)
    - have all the free-block information in one place, in a more compact format?
  - other FS features that are helpful
    - extents
      - pro: more efficient for large files
      - con: more complicated to allocate effectively
             more complicated to seek through
    - trees for directories
      - pro: more efficient to find a particular filename in a directory
      - con: more complicated to update/scan/etc.
    - journalling/logging
      - pro: recovery from failures
      - pro: faster writes
      - con: more complicated to implement
      - con: doesn't eliminate data loss entirely
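To illustrate the extent tradeoff above: with extents, mapping a file block to a disk block becomes a search over (logical start, physical start, length) runs instead of an array lookup --- cheap for large contiguous files, but more logic than following a direct pointer. A sketch assuming a sorted in-memory extent list (ext4-style in spirit; the struct and function names are made up for illustration):

```c
/* Hypothetical extent map entry: "logical file blocks
 * [logical_start, logical_start + length) live on disk starting at
 * physical block physical_start". Entries are sorted by logical_start. */
struct extent {
    long logical_start;   /* first file block this extent covers */
    long physical_start;  /* where that run begins on disk */
    long length;          /* number of contiguous blocks in the run */
};

/* Map a logical file block to its physical block by binary search:
 * O(log #extents) per lookup, and one entry can describe a huge file,
 * versus one stored pointer per block with direct/indirect pointers.
 * Returns -1 if the block falls in a hole (no extent covers it). */
long extent_lookup(const struct extent *map, int count, long block) {
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (block < map[mid].logical_start) {
            hi = mid - 1;
        } else if (block >= map[mid].logical_start + map[mid].length) {
            lo = mid + 1;
        } else {
            /* block lands inside this run: offset into it */
            return map[mid].physical_start + (block - map[mid].logical_start);
        }
    }
    return -1;
}
```

A file whose blocks are badly fragmented needs many extents, which is why allocating contiguous runs effectively (the "more complicated to allocate" con) matters for this scheme to pay off.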