- system calls versus context switches
    in xv6 and similar OS designs, context switches only occur within kernel mode
        needs to change kernel data structures
        needs to access other programs' data
            (which can't be done by user code, so user code can't mess things up)
    before a context switch: always enter kernel mode
        could be via a system call
        could be via something else (timer interrupt, I/O, etc.)
    after a context switch: usually leave kernel mode
        (and go back to application/user code)
    some system calls might trigger a context switch
        but that's just part of what service the system call is requesting
        e.g. "read" --> might wait for input
            wait for input --> switch to another program until there is more input
- user and kernel stack during context switch
    in xv6 and similar OS designs:
        we don't decide whether we're doing a context switch until we enter the kernel
        have GENERIC trap (exception/interrupt/system call/etc.) handling code
          that saves user registers
            xv6: user registers saved to the kernel stack,
                 and one of those registers points to the user stack
            used if we just want to run some kernel code using the same registers
                e.g. the system call handler will use eax, esp, etc.
                need to remember what their old values were,
                just like for a function call
        saving registers happens to save user registers before we decide to do
          a context switch
            so no need to have extra code to handle them for the context switch
    in the kernel, still need to save the kernel registers
        xv6: the swtch() function does this to the kernel stack
             (and reads the new ones from a different kernel stack)
             (see the sketch at the end of this section)
        typical usage of swtch():
            called from the scheduler
            called from a function that just switches to the scheduler
                xv6 "context switch" --> probably switch to the scheduler,
                then from the scheduler to the other thread
    kernel stack: also where temporary variables used by code running in the kernel live
        (and saved registers, return addresses, etc. --- everything you learned
         about being on the stack for a normal program, but just for the code
         running in the kernel)
- scheduler types
    CPU throughput (= work per unit time): FCFS
        simplest scheduler
        doesn't context switch unless needed
    minimum average (per-thread) turnaround/waiting time: SRTF
        (under the assumption that context switch overhead isn't significant)
        * can't actually implement SRTF: so use approximations like MLFQ
    maximizing I/O throughput: something like SRTF
        make sure programs get to do their I/O operations quickly
        intuition about SRTF: "race to sleep"
    fairness -- medium-term: CFS ("Completely Fair Scheduler")
        on the order of seconds to tenths of a second?? (depends on parameters)
        not like milliseconds/microseconds --- doesn't try to context switch
            that much, and tries to take into account more history
        not like minutes/hours --- doesn't want to penalize programs based on
            what they were doing that long ago
        took into account recent-ish CPU usage ("virtual time")
        tried to run the program most deprived of recent CPU usage
    fairness -- time until you run: round-robin or similar
        everyone runs periodically with roughly the same interval
        simplest preemptive scheduler
    real-world deadlines: earliest deadline first or ????
    favoring certain users: priority
    other goals: ???
    new scheduler --- probably built from priority or lottery or similar
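A rough C sketch of the swtch() pattern above, loosely following xv6's x86
code (swtch.S, proc.c); the struct layout and names mirror xv6, but this is
a simplified illustration, and the real swtch() is written in assembly:

    typedef unsigned int uint;

    /* callee-saved x86 registers, saved/restored on a kernel context switch;
       everything else is already saved: caller-saved registers by the C
       calling convention, user registers by the generic trap entry code */
    struct context {
        uint edi, esi, ebx, ebp;
        uint eip;   /* saved return address: where this kernel thread resumes */
    };

    struct proc { struct context *context;   /* ... */ };
    struct cpu  { struct context *scheduler; /* ... */ };

    /* swtch(old, new): push the callee-saved registers on the current kernel
       stack, store the resulting stack pointer through *old, switch to the
       stack recorded in new, pop that thread's registers, and return "as"
       that thread -- assembly in xv6; prototype only here */
    void swtch(struct context **old, struct context *new);

    /* sched()-style usage: hand the CPU to the per-CPU scheduler context;
       control returns here when a later scheduler pass switches back to p */
    void give_up_cpu(struct proc *p, struct cpu *c) {
        swtch(&p->context, c->scheduler);
        /* resumes here, possibly much later, still on p's kernel stack */
    }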
- reader/writer locks
    many data structures: can safely read the data structure from many threads
        at once, but only if no thread is trying to modify it
    how can we guarantee the "but only if" part?
        (1) could use one lock for reading OR writing
            but if we do this, then we have to take turns reading
            this is correct (won't violate the safety rule) but slow
    reader/writer locks try to solve this taking-turns problem
        by extending a lock to have two ways of being locked:
        ReadLock: registers a thread as a reader
            wait until no writers
            [versus a normal lock: wait until no other threads]
        WriteLock: registers a thread as a writer
            wait until no readers+writers [same as a normal lock]
        by having less restrictive conditions for waiting, can do more things at once
    when implementing reader/writer locks, need to make a decision about
        what to do if a reader and a writer are both waiting --- who goes first?
- monitors
    way of building data structures that wait for things to happen sometimes
    = lock + shared data + condition variables
    lock: always take turns accessing the shared data
        don't have to worry about being confused about the state of the data
        (example: if the shared data is for an rwlock: not confused about
         whether there are 0 or 1 threads waiting to write)
    shared data:
        can include whatever data we're manipulating (a queue of work to do)
        can include information about who's waiting,
            so we can enforce conditions about when to wait
    condition variables:
        usage: one condition variable for everything we wait for

            while (we still need to wait)   <--- based on the shared data;
                cond_wait(some_cv, ...);         means the shared data has to
            ...                                  include the bookkeeping info
                                                 needed to know when we wait
            if (some thread might no longer need to wait)
                cond_signal or cond_broadcast(some_cv)

        signal: guarantee that only one thread cares
            e.g. if I signal for each item I add to a queue, at most one
                 thread cares about each item, so it's safe
            e.g. for a reader/writer lock, only one writer is allowed at a
                 time, so I can signal a writer rather than broadcasting
        broadcast: otherwise
        when considering: remember that what you signal might not run next
            e.g. typically wrong to signal when the queue reaches a particular size:

            thread A           thread B           thread C       thread D
                                                  wait           wait
            add to queue
            queue size = K
              --> signal C
            unlock
                               add to queue
                               queue size != K
                               (so no signal)
                                                  run
                                                                 still waiting

            not what you'd expect if you thought "while queue size is K and
            someone is waiting, have them handle the item right away"
    spurious wakeups = cond_wait returns when nothing happened
        POSIX allows spurious wakeups just to make implementing CVs easier
        probably happen mostly when one CV is manipulated in parallel from two threads???
        (see the consume() sketch below for the while loop this forces)
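A minimal sketch of the monitor pattern above using POSIX threads: a lock,
shared data (a counter standing in for the work queue), and one condition
variable, with the required while loop; the names are made up for illustration:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t items_cv = PTHREAD_COND_INITIALIZER;
    static int num_items = 0;   /* shared data, protected by lock */

    void produce(void) {
        pthread_mutex_lock(&lock);
        num_items += 1;
        /* one item added, at most one waiter cares --> signal is safe */
        pthread_cond_signal(&items_cv);
        pthread_mutex_unlock(&lock);
    }

    void consume(void) {
        pthread_mutex_lock(&lock);
        /* while, not if: another thread may take the item between our wakeup
           and our reacquiring the lock, and POSIX also allows spurious wakeups */
        while (num_items == 0)
            pthread_cond_wait(&items_cv, &lock);  /* releases lock while asleep */
        num_items -= 1;
        pthread_mutex_unlock(&lock);
    }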
    justification why this (spurious wakeups) is not a big problem with POSIX's design:
        usually you have to worry (even w/o spurious wakeups) about some thread
        acquiring the lock in between when you wake from waiting and when you
        get the lock again
        example:
            thread A waiting in consume() for another item on the queue
            thread A woken up via signal from thread B
            thread C gets the lock
            thread C dequeues
            thread C unlocks
            thread A gets the lock
            thread A shouldn't dequeue (even though it was woken up)
        not every piece of monitor-using code has this sort of scenario,
            but it's usually a reason why you can't avoid the while loop anyways
        this scenario would not be possible with Hoare scheduling
            ("hand off mutex" on signal)
            "hand off mutex" --> guarantee that whatever is signalled gets the
            lock next (but essentially nothing implements that)
- VA -> PA
    step one: split the virtual address into pieces

        [             virtual address             ]
        [ virtual page number     ][ page offset  ]
                                    ^^^^^^^^^^^^^---- enough bits to identify
                                        which byte of a page; so if page size
                                        = 2^12 bytes --> 12 bits

        [ VPN pt 1 ][ VPN pt 2    ][ page offset  ]
         ^^^^^^^^^^---- enough bits to index into the first-level page table,
                        which is an array of page table entries; so if 1024
                        (2^10) entries in the first-level page table --> 10 bits

    step two: lookup a page table entry in the first-level table
        use the page table base pointer = beginning of the array of 1st-level PTEs
        access that_array[VPN pt 1]
            = that_array + VPN pt 1 * sizeof(1st-level PTE)
        the 1st-level PTE will contain:
            * a valid/present bit --- if not set, then stop!
            * metadata bits (readable, writeable, etc.)
              (stop if the program isn't allowed to do this access)
            * a physical page number
    step three: use the first-level page table entry to lookup in the second level
        same as step two, but:
            use the physical page number we found in step two as the page
            table base pointer, converted to the byte address of the
            BEGINNING of that page:
                concatenate with zeroes to fill in the page offset,
                or equivalently multiply by the page size
                (unit conversion: pages --> bytes)
            use VPN pt 2 instead of VPN pt 1
    step four: use the second-level page table entry to get the PA
        find the physical page number from the second-level entry
        concatenate it with the page offset from the VA
        result --> final physical address
        (see the translate() sketch right after this section)
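A small C simulation of the two-level lookup above, using the 10/10/12 split
from the example (1024-entry tables, 2^12-byte pages). The PTE layout assumed
here (valid bit in bit 0, physical page number in the upper 20 bits) is
x86/xv6-style but simplified; physmem stands in for physical memory so the
walk can run as ordinary code:

    #include <stdint.h>

    #define PAGE_SHIFT 12      /* 2^12-byte pages --> 12 offset bits    */
    #define INDEX_BITS 10      /* 1024 PTEs per table --> 10 index bits */
    #define PTE_VALID  0x1     /* valid/present bit (bit 0)             */

    typedef uint32_t pte_t;

    /* steps one through four from the notes; returns the physical address,
       or (uint32_t)-1 if a valid bit is clear (checks of the metadata bits
       like readable/writeable are omitted for brevity) */
    uint32_t translate(const uint8_t *physmem, uint32_t base_pa, uint32_t va) {
        uint32_t vpn1   = (va >> (PAGE_SHIFT + INDEX_BITS)) & 0x3FF; /* VPN pt 1 */
        uint32_t vpn2   = (va >> PAGE_SHIFT) & 0x3FF;                /* VPN pt 2 */
        uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);             /* page offset */

        /* step two: index the first-level table at the base pointer */
        const pte_t *first = (const pte_t *)(physmem + base_pa);
        pte_t pte1 = first[vpn1];
        if (!(pte1 & PTE_VALID)) return (uint32_t)-1;

        /* step three: PPN --> byte address of the start of that page
           (concatenate with zeroes == multiply by the page size),
           then index with VPN pt 2 */
        uint32_t second_pa = (pte1 >> PAGE_SHIFT) << PAGE_SHIFT;
        const pte_t *second = (const pte_t *)(physmem + second_pa);
        pte_t pte2 = second[vpn2];
        if (!(pte2 & PTE_VALID)) return (uint32_t)-1;

        /* step four: PPN concatenated with the page offset */
        return ((pte2 >> PAGE_SHIFT) << PAGE_SHIFT) | offset;
    }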
- setting up page tables for exec()
    what xv6 does; other OSes might differ
    need to setup:
        kernel part of the page tables
            every process has the kernel's memory as part of its page table
                (set with metadata so it will only work in kernel mode)
            why? avoids changing page tables to let the kernel handle
                exceptions/system calls/etc.
            * hard-coded list of page table entry ranges to setup
                generate the page table entries with for loops
        allocate memory for the program's code+data
            read from the executable information about:
                where the program should be loaded (and how much space it needs)
            setup that much space (basically with a for loop)
        copy from disk into that allocated memory
            read from the executable information about:
                where to load the data from on disk
    for setting individual page table entries, xv6:
        uses the same kind of procedure we did for VA -> PA,
        but stops once it figures out where the second-level PTE is,
        and then changes that second-level PTE
        (and if it gets stopped at an invalid first-level PTE, it fixes that first)
    allocating physical pages and coordinating with the hardware
        xv6's strategy:
            learns on boot how much memory there is
                (hard-coded in xv6; a real OS reads some table from the hardware)
            has a data structure that tracks which parts of that memory xv6 is not using
            uses that data structure to find memory it can use for page tables + programs
        telling the hardware about the memory:
            xv6 sets the page table base pointer
- shared bus / cache coherency (invalid/shared/modified)
    shared bus: simplest way of connecting processors + memory together
        one common communication line
        advantage: every processor gets to see all accesses to memory
    to keep caches in sync, want to maintain some invariants:
        at most one cache has a modified copy of some data
            * to ensure this:
                if we have the modified copy: write it back before anyone
                    tries to read/write it
                    (we know if someone tries to read/write, because they will
                     have to ask on the bus for a copy)
                if we want to modify something: tell every cache to get rid of any copy
        before a cache forgets about a modified value, it writes it to memory
    problem: we don't want to tell everyone every time we write
        usually they won't have a copy anyways, so this is a waste
        solution: remember that we've already told everyone to get rid of their
            copy, so we know we don't have to tell them proactively
            --> the "Modified" state
- false sharing
    cache coherency: guarantees one cache at a time has modified data
    problem: data is kept in caches in larger blocks than what your program accesses
        caches will take turns when accessing any part of the block
        --> lower performance if two threads manipulate different parts of
            the same cache block
        for example: two adjacent array elements are often in the same cache block
    typical solution: keep the data further apart (see the sketch below)
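A sketch of "keep the data further apart" in C, assuming 64-byte cache blocks
(a common size, but check your hardware); the struct and names are illustrative:

    /* without padding, counts[0] and counts[1] would likely share one cache
       block, so two threads incrementing "their own" counter still make the
       caches take turns owning that block (false sharing):
           long counts[2];                                              */

    /* with padding, each counter sits in its own 64-byte block */
    struct padded_counter {
        long count;
        char pad[64 - sizeof(long)];
    };
    static struct padded_counter counts[2];

    /* each thread touches only its own element; with the padded layout
       the two caches no longer contend for a shared block */
    void *worker(void *arg) {
        int me = *(int *)arg;              /* 0 or 1 */
        for (long i = 0; i < 100000000L; ++i)
            counts[me].count++;
        return 0;
    }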
- path of a system call in xv6
    [user function to invoke the system call]
    ["system call wrapper"] [xv6: usys.S]
        makes it so we can use a C-like (or other PL) interface,
            even though a system call uses a special instruction
        also handles setting up for the calling convention our OS has chosen
            xv6 system call calling convention:
                arguments are on the stack (like a normal function call)
                eax says which system call to do
    --- kernel mode starts here ---
    [generic TRAP handling] [xv6: trapasm.S, traps.c]
        xv6's choice: same code for all exceptions to handle saving registers, etc.
        saves the user registers,
            so we don't have to worry about the kernel overwriting those values
        saves what caused the trap for later
    [system call dispatcher] [xv6: syscall.c]
        based on the trap being a system call, figures out which system call
            implementation to run
        xv6: uses the saved eax value to lookup the actual system call implementation
    [system call implementation] [xv6: sysproc.c, sysfile.c]
        actual code that implements the particular service requested
            (reading, writing, killing a process, ....)
        needs to retrieve arguments
            xv6: reads them from the user stack,
                 b/c the convention says the arguments are there
        does the system-call-specific operation
    then return, going backwards from bottom to top:
    [system call dispatcher]
        stores the result from the system call implementation
    [generic TRAP handling]
        restores the user registers that were saved
    --- kernel mode ends here ---
    [wrapper function]
        returns to the function call the application made
- dup2(from, to)
    PCB->files[to] = PCB->files[from]
    now fd "to" is the same as fd "from", but:
        close(from) does PCB->files[from] = NULL,
            so that won't change fd "to"
        dup2(something_else, from) does PCB->files[from] = ...,
            so that won't change fd "to" either
    (see the sketch below)
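A sketch of the pointer-copy view of dup2() above, with an xv6-style
per-process open file table. The function name and NOFILE limit are
illustrative; a real kernel also releases whatever files[to] pointed at
beforehand, and the shared struct file objects are reference-counted
(in xv6, filedup() does the ref++ part for its dup() system call):

    #define NOFILE 16

    struct file { int ref; /* ... offset, inode, etc. ... */ };
    struct proc { struct file *files[NOFILE]; /* ... */ };

    /* make fd 'to' refer to the same open-file object as fd 'from' */
    int dup2_sketch(struct proc *p, int from, int to) {
        if (from < 0 || from >= NOFILE || to < 0 || to >= NOFILE
            || p->files[from] == 0)
            return -1;
        if (from == to)
            return to;
        /* a real kernel would close/release the old p->files[to] here */
        p->files[to] = p->files[from];  /* both fds now share one struct file */
        p->files[to]->ref++;            /* shared object is reference-counted */
        return to;
    }

    /* afterwards: close(from) clears p->files[from] (dropping one reference),
       but p->files[to] still points at the same struct file --- unaffected,
       as described above */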