- assembly -> object file -> executable
  ~ assembly:
    * human-readable names for instructions, registers, constants, etc.
    * not usable at all by the processor
  ~ object file:
    * assembler has done everything it can to prepare values for actually
      being used by the processor
      ~ instructions, register IDs, etc. ---> machine code form
      ~ constants are ready to load into memory, etc.
    * but the assembler can't prepare everything b/c we don't know where this
      bit of assembly will be used
      ~ need to remember where code/variables/etc. that might be used
        elsewhere are
        (corresponds to "name:" in assembly)
        > symbol table entry: something else might refer to this location
      ~ still need to fill in references to code/variables/etc. that are
        elsewhere
        (corresponds to using a "name" that is defined elsewhere,
         e.g. "jmp name", "call name", "mov $name, %rax",
              "jmp 0x10 + name(%rax, %rcx, 8)")
        > relocation: indicates we need to fill in an address or offset later on
      ~ even within the same file, we may need to fill in the addresses of
        things when we don't know where they are placed yet
  ~ executable:
    * should generally be ready to load and jump to
    * the _linker_ takes a bunch of object files and processes every
      relocation (using the symbol table entries) to fill in the missing
      addresses/offsets

- doing multiplication using lea
  - in general on x86-64:
      0x1234(%rax, %rbx, 4) ---> the value in memory at 0x1234 + %rax + %rbx * 4
  - lea SOMETHING-IN-MEMORY, DESTINATION sets DESTINATION to the memory
    address of SOMETHING-IN-MEMORY
    --> computes a memory address and puts it in the destination rather than
        actually using the memory
    --> so it doesn't actually matter if SOMETHING-IN-MEMORY is at a usable
        address
  - lea (%rax), %rbx --> same as mov %rax, %rbx
    X lea %rax, %rbx <--- WILL NOT ASSEMBLE b/c lea needs something in memory
      as its first operand
  - lea 0x4(%rax), %rbx --> same as { mov %rax, %rbx; add $4, %rbx }
  - lea (%rax, %rax, 1), %rbx --> same as { mov %rax, %rbx; add %rax, %rbx }
                                  or { mov %rax, %rbx; imul $2, %rbx }
  - lea (%rax, %rax, 2), %rbx --> same as { mov %rax, %rbx; imul $3, %rbx }
    [ rbx <- rax + rax * 2 ]

- Quiz 5 Q1 (SEQ PC register)
  executing mrmovq 10(%rax), %rbx on the single-cycle processor
  "While the address 10(%rax) is being computed for the above instruction,
  the program counter register will output ____."
  answer: the address of the mrmovq instruction
  single-cycle processor:
    at the rising edge of the clock, the PC register changes to output the
    current instruction's address (and like any register, it keeps outputting
    that until the next rising edge)
    then during the clock cycle (before the next rising edge) we read all the
    values for the instruction and do all its calculations
      <<< this is when the 10(%rax) address is being computed
    and have the results ready on the inputs to
    { PC register, register file, data memory, ... }

- throughput and pipelining
  with an N-stage pipeline, ideally we start stage 1 of an instruction every
  cycle and finish stage N of an instruction every cycle (and the same for
  every stage in between)
  --> ideal throughput = 1 instruction / cycle
  problem: sometimes we can't actually do this
    usually b/c of dependencies between instructions (that we don't solve
    with a "fast" mechanism like forwarding or correct prediction)
  so instead we'll either
    * not start a new instruction some cycles
    * have to discard something we started but shouldn't have
      (mispredicted instruction)
  e.g.
  if we can't start a new instruction 10% of the time (and there are no
  mispredictions), then we'll get 0.9 instructions/cycle

- dirty bits
  - write-allocate policy: if we don't have a value in our cache and want to
    write it, this policy says: add it to our cache
  - write-no-allocate policy: if we don't have a value in our cache and want
    to write it, this policy says: don't add it to our cache
    > BUT: we still MUST not forget the value, so we'll send it to the next
      level instead
  - write-back policy: if we have a value in our cache and want to write to
    it, this policy says: we don't update the next level, just the cache
    > BUT: we still MUST make sure we don't forget the updated value
      how could we forget?
      ~ we could store a value from another location in the same block
        (discarding the updated value)
      solution:
      ~ we'll write a note to ourselves: "please write this to the next level
        IF we'd discard it"
        --> the note is a dirty bit (1 if "we need to write it", 0 otherwise)
  - write-through policy: if we have a value in our cache and want to write
    to it, this policy says: we update both the next level and the cache

- OOO and jumps
  - out-of-order processors typically do a lot of branch prediction
    ~ means that they can find more instructions to try to run
  - if everything is predicted right and there are no exceptions/etc., then
    an out-of-order processor should start several instructions every cycle
  - details you won't be tested on:
    - if something is predicted wrong, a "simple" strategy to deal with it:
      the commit stage tracks the mapping from architectural -> physical
      registers as of the first instruction that has completed AND has all
      the instructions before it completed
      ~ when this instruction was mispredicted, we just copy the register
        mapping back to the rename stage and reset the PC
      > optimizations: can we do this before this instruction is completed?
  - conclusion of this:
    - recovering from a misprediction is a lot more complicated than on a
      pipelined processor
    - means that it may take on the order of 10 cycles
    - and in order 10 cycles we could have run order 40 instructions
    - versus a pipelined processor, where we'd miss out on only about 2 or 3

- Quiz 15 Q1 (allocate on demand)
  "In order to configure a memory region to be allocated on demand, the
  operating system will set page table entries ____."
  > to configure allocation in the future, we need to get a page fault when
    there's "demand"
    to make sure a page fault happens, we need to have an invalid page table
    entry for the address the program will "demand"

- page table permissions and multiple levels
  - most processors check the permissions at each level
    what about TLBs?
    we can either require the OS to make the permissions consistent at the
    lower levels (e.g. you aren't allowed to have a writeable second-level
    entry pointed to by a read-only first-level entry)
    OR we can edit the page table entry before we store it in the TLB

- bit-masking
  bitwise operators like &, |, ^ take two numbers and do an operation
  between every pair of bits in the same place (1's, 2's, 4's, ...)
  can think of & 000011100 as saying "clear bits 0, 1, and 5 and up, and
  keep bits 2-4"
    b/c bits 2, 3, and 4 are set to 1, and all the others are 0
    AND with 1 --> keep bit the same; AND with 0 --> clear bit
  000011100 is a mask specifying to do something to bits 2-4
  we can apply the same idea to | --> the "something" = "set the bits (to 1)"
  we can apply the same idea to ^ --> the "something" = "flip the bits"

- squashing
  discarding instructions that were run incorrectly
  (we guessed that these were the right ones to run, but they weren't)
  in pipehw2: we figured out that the instruction wasn't the right one to run
  before it did anything besides having temporary values in pipeline registers
  --> change the pipeline registers to have new values (values for a nop)
  in many real processors, we might have to undo some other operations
  --> change the value of registers that have been written
  --> stop values from being written to cache/memory
  --> change back condition codes to their previous value
  ...
- Quiz 14 Q1 (exceptions)
  where in memory can we put the exception handler?
  > asking for a place to put CODE that the OS needs to run
    (exception handler = code that gets run when an exception happens)
  > so this needs to be executable space
  > but we don't want normal programs to edit the exception handler
    (maliciously or by accident)
  > want it to be kernel-mode only and/or read-only

- multi-level paging and VPN part lengths
  - length of a VPN part for multi-level lookup
      = log_2(# entries in the table at that level)
  - usually processors try to make page table sizes one page, but that's not
    always possible
  - when that's not possible, one of the levels is going to have a
    different-sized table (usually the higher level has the smaller table,
    from what I've seen)
  - when we figure out the base address for level K+1, we typically get a
    *physical page number* from the page table entry at level K
    BUT we want a physical address:
      [physical page number][page offset]
       ^-- from the page table entry at level K
                            ^-- 0s for "beginning of that page"

- Quiz 15 Q5-6 (TLB)
  - virtual address 0x52345 and we have 16384-byte (2^14) pages
    0x52345 >> 14 (to remove the page offset) == 0x14 --> virtual page number
  - accessing index 0x0 in the first-level page table (VPN part 1 = 0x0)
  - accessing index 0x14 in the second-level page table (VPN part 2 = 0x14)
    --> VPN 0x14 --> that's what the TLB sees as input to lookup/store the
        entry found
  - 32-entry, 2-way TLB: 32/2 = 16 sets --> 4 index bits to identify a set
    VPN 0x14 used to look up a set in the TLB and find the tag
      0x14 = 0001 0100
                  ^^^^-- set index = 4
             ^^^^------- tag = 1