- static v dynamic libraries ~~ how to build / general diff
    ~ when we make an executable we combine a bunch of .o files into a single loadable file
    ~ and with STATIC LIBRARIES:
        ~ we also put the static libraries in that loadable file
    ~ and with DYNAMIC LIBRARIES:
        ~ we put a reference to the dynamic library in that loadable file
          and when it runs, it loads the dynamic library from another file

    - building static libraries:
        on Linux, static libraries are an archive (like a zip file, but different format)
        of .o files created with "ar"
    - building dynamic libraries:
        on Linux, dynamic libraries need to have their assembly written to handle
        being loaded at different addresses
        (normal executables can hard-code addresses in their machine code)

        to make the compiler generate compatible assembly we need to pass something
        like -fPIC when *compiling* [= producing the .o files]

- important compiler flags
    ~ different modes the compiler has:
        compiling [converting C into .o files (binary version of assembly)]
            requested with "-c"
        linking [combining .o files into an executable or dynamic library]
            default mode (need "-shared" to produce dynamic library)
            if linking command starts with .c files, it will compile those first
    ~ example:
        clang -c foo.c          --- compiling
        clang -o exec foo.c     --- compiling + linking
        clang -o exec foo.o     --- linking only
    ~ with gcc/clang, you specify the output file with "-o SOMETHING"
        clang -o something.o -c foo.c
            compiles foo.c into something.o
    ~ -fPIC --- produce assembly suitable for a dynamic library (added when compiling)
    ~ library flags:
        -LSOMEPATHA -LSOMEPATHB -lname
            looks for libname.a or libname.so (Linux names) in SOMEPATHA and SOMEPATHB
            (in addition to the default system library directories)

- C function returns
    most C standard library functions return a value to indicate an error
    if they don't have anything else to return

    sometimes C functions "want" to return multiple values:
        * waitpid() wants to return the PID it waited for (or that it failed)
          _and_ the status it got from that process
            ~ it does this by having you pass
              in a pointer to where to put the status and returning the other value
        * posix_memalign() wants to return the address allocated and what kind of
          failures occurred
            ~ it does this by having you pass in a pointer to where to put the
              allocated address and returning an error code (0 on success)

- set-user-ID, sudo
    - set-user-ID is a bit we can set on executable files with chmod
      [stored alongside the "normal" file permissions]
        chmod u+s
    - if set, when we exec*() this program, the system changes the effective user ID
      to the user ID that owns the file
      (after making sure you can execute the file)
    - example: root can have a program that prints to a printer that normal users
      don't have access to
        if it's marked setuid, any user can run this program and successfully print
        (provided the permissions let them execute it)
        if it wasn't setuid, they could run the program, but it would get permission
        errors accessing the printer, since it would still run with their user ID
    - sudo is a very common set-user-ID program that is used to let people run things
      as administrator if configuration files say it's okay
        ~ example of implementing a policy about who can do things without the kernel
          knowing about it
        ~ kernel just runs the sudo program with effective user ID of root
        ~ sudo program has its own logic that decides what things to do after that
    - set-user-ID bit is usually stored in the chmod bits before the user bits
      [but we're not expecting you to know that]

- I/O handling and how the OS runs generally:
    the OS needs to be running to talk to an I/O device
    common scenarios:
    ~ input we don't need to wait for:
        [earlier] the hardware device tells the processor "I have input"
            --> triggers an exception
            --> OS runs and collects that input while some unrelated program is running
        ... [ time passes ] ...
        program requests input
            --> triggers an exception = system call
            --> OS runs and figures out it collected that input earlier
            --> OS sends that input to the program and resumes running it
    ~ input we do need to wait for:
        program requests input
            --> triggers an exception = system call
            --> OS runs and figures out it doesn't have the input the program wants right now
            --> OS will run something else
        ... [ time passes ] ...
        [later] the hardware device tells the processor "I have input"
            --> triggers an exception
            --> OS runs and collects that input
            --> (OS *might* decide to switch to the program that was waiting for it)
        ... [after that] the OS resumes the program that requested input with its input
    ---
    ~ output we don't need to wait for:
        program requests output
            --> triggers an exception = system call
            --> OS sends the output to the device
            --> OS resumes the program
    ~ output we do need to wait for:
        program requests output
            --> triggers an exception = system call
            --> OS realizes the device can't accept the output now
            --> OS keeps track of the output for later
                and possibly returns to the program/another program
        ... [ time passes ] ...
        [later] the hardware tells the processor "I can accept more output now"
            --> triggers an exception
            --> OS runs and sends the output it stashed earlier
            --> OS goes back to running some program

- when context switches and exceptions happen
    - context switch = switching from one thread/process to another
        does not include system calls/function calls
        --- it's the same thread, even though we have different register values
            while running the function/syscall
    - exception = OS runs to handle something
    - a lot of somethings:
        ~ system calls = request from a program
        ~ instructions in a program that hardware can't handle itself:
            ~ privileged instructions
                ~ accessing I/O devices (typically)
                ~ changing the page table base register
                ~ changing how exceptions are handled
                ~ ...
            ~ divide-by-zero
            ~ page fault (accessing a virtual page that is invalid)
            ~ ...
        ~ events from an I/O device (or something pretending to be one):
            ~ keyboard, network
            ~ timers
          "events" = anything where the OS needs to do something and might not
          already be running (input, output)

- kernel slide 11 ~~ reason why we can't directly allow calls to kernel stuff
    ~ system call mechanism passes a system call number to identify the operation to do
        example: read, write
    ~ why can't this just be like a special function call, where we just specify
      the function to run?
        ~ there are a bunch of functions in the kernel that programs should not be
          able to run directly
        ~ we deliberately choose our system call interface to have a very limited
          set of functions
        ~ example in slide:
            ~ it's okay for the user program to run the function that reads from
              a file after checking permissions, but it's not okay to run the
              function that reads without checking them
              (even though that second function is probably in the kernel)

- fork / exec / waitpid ~~ what they return
    ~ fork() returns twice [because we copied the process]
        in the parent: pid of the new child process
        in the child: 0 to let you know it's the child
    ~ exec*() doesn't return unless it fails
        if it succeeds, we are in main() in a freshly loaded program
    ~ waitpid()
        returns the pid it waited for -- or -1 if it didn't wait for anything
            [it returns the pid because you can pass a pid argument that matches
             multiple pids]
        pseudo-returns the "status" of the process waited for by filling in an int
        you pass a pointer to
            that status can be decoded with W*() macros like WIFEXITED(...)
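
The three return conventions above can be seen together in one small C sketch. This is only an illustration, not code from the course: the helper name run_and_wait is made up, and having the child exit with 127 when exec fails is an assumption borrowed from what shells conventionally do.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] with arguments argv,
 * wait for it, and return its exit code.
 * Returns -1 if fork/waitpid fail or the child did not exit normally. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {              /* fork failed: no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: fork() returned 0 */
        execvp(argv[0], argv);
        /* only reached if exec failed; on success we'd be in the new main() */
        perror("execvp");
        _exit(127);             /* assumed convention, same as shells use */
    }
    /* parent: fork() returned the child's pid */
    int status;
    pid_t waited = waitpid(pid, &status, 0);   /* status is "pseudo-returned" */
    if (waited == -1) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                  /* e.g. killed by a signal */
}
```

A caller would build a NULL-terminated argument array such as {"ls", "-l", NULL} and pass it in; the value returned by waitpid() is compared against -1, while the exit code comes from decoding the int that waitpid() filled in.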

- flags for opening/reading files
    open(path, flags, chmod-style-octal-permissions)
        --- the chmod-style-octal-permissions only matter when O_CREAT creates the
            file, and are and'd with the complement of the current umask value
            on portal the default umask is 022
    open() takes flags which are OR'd together (bitwise OR operator | )
        ~ read/write/both: O_RDONLY/O_WRONLY/O_RDWR
        ~ create if not existing: O_CREAT
        ~ truncate (set length to 0) if exists: O_TRUNC
        ~ (and more, see manpage)
    if you're used to fopen() or open() in Python, these achieve the same
    functionality as the "r", "w", "r+", "a", ... argument to those functions,
    but with different values you need to specify
        O_WRONLY | O_CREAT | O_TRUNC ~~ "w"

- dup2 + close interaction
    - each process has its file descriptor table
        table is an array of pointers
        file descriptors are indices into the table
    - dup2(X, Y)
        table[Y] = table[X];
        pointer assignment
        so like assigning references in Java, these both refer to the same thing now
    - close(Z)
        table[Z] = null;
        --- and there's correct/fast garbage collection
        (so if there are no references to an open file description, it will go away)

- /proc/PID/maps and the page table
    - /proc/PID/maps is the "logical" view of what a program's memory is
        ~ list of regions of memory the program has
    - the page table is the "physical" view
        ~ AKA what the processor itself needs to know
    - when these disagree, the program will try to access its memory using
      the page table, and this will trigger an exception (usu.
      page fault)
    - Linux uses the /proc/PID/maps information to fill in the page table
    - result: the effective memory the program sees is /proc/PID/maps
      (but it might need to have the OS run for that to work)

- copy-on-write process
    - if we didn't have copy-on-write and we wanted to copy a bunch of memory,
      we'd make two copies, located in separate pages
    - with copy-on-write, we make the virtual pages of both copies point to the
      same physical pages, but be marked as not writeable
        then, if any program tries to write to those virtual pages, we'll get an exception
        we'll handle that exception by making a copy of that page at that time

- pagetable readability/writeability
    - permission bits in the page table are checked by the processor looking at what
      type of operation triggered the access, and comparing to the permission bits
    - writeable bit = the processor will check this if the operation is
      something like a move to that memory location
    - executable bit = the processor will check this if the operation is
      fetching machine code from that location
    - ...
    - (different systems have different sets of permission bits, depending on cpu)

- Q4 S2025 midterm ~~ pagetable lookup
    1-level page table, page table entries are 4 bytes
    2^16 byte pages --> 16-bit page offsets (~ 4 hexadecimal digits)
    PTBR 0x40000, VA 0x89ABC
        -- VPN 0x8, PO 0x9ABC (b/c of 16-bit page offset)
    access index 0x8 in the page table (array of 4-byte values)
        base + 4 [element size] x index = 0x40000 + 4 x 0x8 = 0x40020
    page table entry: PPN 0x3
    final physical address
        PPN 0x3, PO 0x9ABC [same as VA] --> 0x39ABC

- multi-level page table lookup, why the x page size in our diagrams
    two-level lookup:
        VA [ VPNpt1 ][ VPNpt2 ][ PO ]
    we'll look up a 1st-level page table entry using base = PTBR, index = VPNpt1
        base + index * PTE size
    then we'll get: PPN of the second-level table, + valid bit + permissions
    we want to look up the 2nd-level page table entry using
    base = PPN of second-level table, index = VPNpt2

               vvvvvvvvvvvvvvvv--- bytes
        base + index * PTE size
        ^^^^ pages

    to do this math, we need the units to be the same:
        PPN * page size + VPNpt2 * PTE size

-----

- Q1 S2025 1b - permissions question:
    can we imitate an access control list with chmod-style permissions?
    for this to work, we need to specify one user + one group + a default for
    everyone else that matches what the ACL does
    in this example, directory A:
        user:foo:r-x
        user:bar:rwx
        group:quux:r-x
        other:--x
    if programs running as foo always are in group quux, we could have it
    owned by bar, group quux, and chmod with u=rwx,g=rx,o=x permissions
    if programs running as foo don't always have group quux (or some other group
    that all programs running with group quux also have), then we can't do this

- ACL conflicting entries
    - in POSIX ACLs
        - user takes precedence over group takes precedence over other
    - may be different on other systems (e.g.
      Windows)

---

- mmap
    with MAP_SHARED: gives you the illusion that the file is in your memory
        this is implemented by making sure everyone else who accesses the file uses
        the pages that will be mapped in your page table (if they're loaded)
    with MAP_PRIVATE: copy-on-write
    ~ generalizing: mmap is how Linux tracks memory regions
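
To make the MAP_SHARED / MAP_PRIVATE difference concrete, here is a rough C sketch (not from the course materials). The helper name first_byte_after_mapped_write and the /tmp path are made up for illustration, and it assumes a Linux-like system where /tmp is a normal filesystem. It stores a byte through the mapping and then reads the file itself with pread() to see whether the store reached the file.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a temp file containing "aaaa", map it with
 * `flags` (MAP_SHARED or MAP_PRIVATE), store 'X' through the mapping, then
 * return the file's first byte as seen by pread(). Returns -1 on error. */
int first_byte_after_mapped_write(int flags) {
    char path[] = "/tmp/mmap-demo-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file stays alive while fd is open */

    int result = -1;
    if (write(fd, "aaaa", 4) == 4) {
        char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'X';           /* ordinary store to "our memory" */
            if (flags == MAP_SHARED)
                msync(p, 4, MS_SYNC);   /* be explicit about flushing to the file */
            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
            munmap(p, 4);
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the store goes to the same pages everyone else uses for the file, so pread() sees 'X'; with MAP_PRIVATE the store triggers copy-on-write, so only the private copy changes and the file still starts with 'a'.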