- physical memory versus disk versus virtual memory, etc.
    virtual --- what programs see
    physical --- what RAM and caches see
    page # --- top bits of the address
    page offset --- location within the page (same idea as the block offset in caches)
- multi-level page tables
    divide the VPN into pieces
    use the first piece to look up a page table entry, BUT that page table entry gives the location of the *next* page table
    (the next piece indexes that table, and so on; the last-level entry gives the PPN)
    if all entries of a second-level (or third-, etc.) table would be INVALID, don't allocate that table; instead just mark the higher-level table entry as INVALID
    if everything is used, this takes more space than a single-level table:
        still need a last-level entry for every allocated page
        AND need entries for the higher-level page tables
- which cycle are we at, given a pipeline?
    - exam 2Z # 8
- which hazards require stalls?
    - in our 5-stage processor:
        - RET (the next PC is the return address, which is read from memory in M)
        - load/use (value produced in M; used in E by the next instruction)
    - general procedure:
        - draw the timeline
        - mark where the value is produced
        - mark where the value is used
        - if the line from the "produced" mark to the "used" mark is not going forwards in time --- add stalls until it is
          (the line also tells you where forwarding can happen)
- cache blocking
    - innermost loops work over a small "block" of data
        - that block SHOULD fit in the cache
    - the "natural" loop order SOMETIMES has less locality
        - the variable of the previous outermost loop would have bad locality
        - now we have an inner loop that varies that same variable only a little bit --- more spatial and/or temporal locality
    - applicable when no loop order is strictly better than the others, e.g.
      multiple arrays with different indices (like in ROTATE --- the best order for dst != the best order for src)
- signals
    - software simulation of exceptions
        - instead of HW noticing SOMETHING --- OS notices SOMETHING
        - and has the program run a handler for it
    - versus exceptions: HW notices SOMETHING
        - and has the OS run a handler for it
    - HW doesn't know about signals
    ----
    - typical API for signals:
        register a signal handler: signal(<signal number>, <function>)
        OS calls <function> ("signal handler") when our process gets that signal
        the signal handler is very limited:
            - it INTERRUPTS a random part of the program
            - therefore, it can't change state the program might be in the middle of changing (unless you're really careful)
            - functions like printf() and malloc() change program state --> can't use these functions in the handler if we might've interrupted them
    - additional APIs (which can help with the problems above):
        - BLOCKING signals: "please, OS, don't send me this now!"
          while the signal is blocked, we won't receive it, BUT the OS tracks it as "pending"
        (- also APIs for "polling" for signals: "tell me what signals are pending now and make them not pending")
    - pending signals
        (typically) the OS has one bit for each *type* of waiting signal
        multiple signals of one type while it's pending --- only one instance may be delivered
- loop unrolling
    - repeat the loop body X times, but with only one instance of the "loop overhead"
    - "loop overhead":
        - index variable maintenance
        - "are we done?" condition check
    - works BEST with loops with a fixed number of iterations
    - works WELL with loops with a large but arbitrary # of iterations:
        BUT we need extra code to handle the "left over" iterations
        N iterations, unrolled X times --> N % X left-over iterations
- fork() --- called once, returns twice
    - fork() copies the program context:
        Program Parent: [%rax = 42, %rsp = 0x455554, ..., PTBR=...]
        Program Child:  [%rax = 42, %rsp = 0x455554, ..., PTBR=*...]
        (* memory + page tables are copied, so the child's PTBR points to its own page tables)
    - after it copies the context, it modifies the saved context (e.g.
      change the stored value of %rax) so that fork() returns:
        in Parent: the ID of the child
        in Child: 0
    - then Parent and Child run like normal processes: each is context switched to and returns from fork()
- the synonym problem and solutions
    - for performance, we'd like caches to store Virtual Addrs
        - then USUALLY no address translation is needed
    - but trouble w/correctness:
        - multiple virtual addresses can have the same physical addr
        - modifying one virtual address in the cache SHOULD modify the other
          (otherwise, behavior changes depending on cache hits)
    - simplest solution: DO ADDRESS TRANSLATION FIRST
    - second-simplest solution:
        - virtually-indexed, physically-tagged (VIPT)
            - Virt. Addr: [ VPN ][ Page Off ]
            - Phys. Addr: [ PPN ][ Page Off ]
            - NB: exactly the same Page Off
            - Phys. Addr: [ C. Tag ][ C. Index ][ Off ]
            - IF C. Index only overlaps with the Page Offset, then we don't need to do the full translation to access the correct SET
                - that part of the Virt. Addr is the same as in the physical one
            - TRICK:
                - [index] access the cache set using the page offset (from the *virtual addr*, because it's convenient) (same result as using the phys addr)
                - and IN PARALLEL find the PPN
                - [tag] then check the cache tags against the Phys. Addr.
            - Condition: C. Index must be w/in Page Off
                # sets * block size <= page size
                # set index bits + # block offset bits <= # page offset bits
        - ALMOST all current processors do this
            - this is why Intel has high associativity in its L1 caches (more ways --> fewer sets for the same size)
    - harder solutions:
        - don't map multiple VAs to the same PA (the OS could do this)
        - the OS makes sure more bits of the VA and PA match by controlling which page numbers it uses
        - scan for synonyms explicitly in the cache, even though this means accessing multiple sets (on writes)
    - F2016 Final #2
- Cache versus TLB
    - Cache: (usu. Phys) Byte Addr -> Data at that Address
    - TLB: Virtual Page #s -> Page Table Entries (NOT DATA)
        - typically NO block offset bits (each block is one PTE); the VPN splits into just tag + index
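A minimal C sketch of the multi-level page table lookup described in the notes, as a two-level walk. The layout (32-bit virtual addresses, 4 KiB pages, a 20-bit VPN split into two 10-bit pieces) and the names `pte_t`, `pde_t`, and `translate` are hypothetical, chosen for illustration rather than taken from any real ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layout (an assumption, not a real ISA):
   32-bit VA = [ VPN piece 1: 10 bits ][ VPN piece 2: 10 bits ][ page offset: 12 bits ] */
#define PAGE_OFFSET_BITS 12
#define LEVEL_BITS 10
#define ENTRIES (1u << LEVEL_BITS)

typedef struct {          /* last-level entry: gives the physical page number */
    int valid;
    uint32_t ppn;
} pte_t;

typedef struct {          /* first-level entry: gives the location of the NEXT table */
    int valid;
    const pte_t *next;
} pde_t;

/* Walk the two-level table rooted at `top`. Returns 0 and stores the physical
   address in *pa on success; returns -1 (page fault) if an entry is invalid. */
int translate(const pde_t *top, uint32_t va, uint32_t *pa) {
    assert(top != NULL && pa != NULL);
    uint32_t offset = va & ((1u << PAGE_OFFSET_BITS) - 1);
    uint32_t vpn = va >> PAGE_OFFSET_BITS;
    uint32_t vpn1 = vpn >> LEVEL_BITS;      /* first piece: index into top level */
    uint32_t vpn2 = vpn & (ENTRIES - 1);    /* second piece: index into next level */

    const pde_t *e1 = &top[vpn1];
    if (!e1->valid)
        return -1;   /* the whole second-level table was never allocated */
    const pte_t *e2 = &e1->next[vpn2];
    if (!e2->valid)
        return -1;
    *pa = (e2->ppn << PAGE_OFFSET_BITS) | offset;   /* same page offset as the VA */
    return 0;
}
```

Only the 1024-entry top-level table must always exist; a second-level table is needed only for regions that contain valid pages, which is where the space savings for sparse address spaces come from.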
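The "draw the timeline, mark produced/used" procedure for stalls can be written as a little arithmetic. This sketch assumes an idealized 5-stage pipeline where, without stalls, instruction k starts F in cycle k, and where a value produced in one cycle can be forwarded to any stage in a later cycle; the `stage` enum and `stalls_needed` are hypothetical names.

```c
#include <assert.h>

/* Stage numbers in a hypothetical 5-stage pipeline: F=0, D=1, E=2, M=3, W=4. */
enum stage { STAGE_F, STAGE_D, STAGE_E, STAGE_M, STAGE_W };

/* Stall cycles needed when the consumer is `distance` instructions after the
   producer, the producer makes the value in stage `produced`, and the consumer
   needs it in stage `used`. Cycles are counted from the producer's F (cycle 0). */
int stalls_needed(int distance, enum stage produced, enum stage used) {
    assert(distance >= 1);
    int produced_cycle = (int)produced;      /* mark where the value is produced */
    int used_cycle = distance + (int)used;   /* mark where the value is used */
    int gap = used_cycle - produced_cycle;   /* line goes forwards only if gap >= 1 */
    return gap >= 1 ? 0 : 1 - gap;           /* delay the consumer until it does */
}
```

Under these assumptions, load/use (`stalls_needed(1, STAGE_M, STAGE_E)`) gives 1 stall, RET (return address read in M, needed by the next instruction's F) gives 3, and an ALU result produced in E and used in E by the next instruction gives 0, since forwarding suffices.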
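A C sketch of cache blocking, using a matrix transpose because it has the same conflict as ROTATE: the best loop order for dst is not the best for src. `BLOCK = 8` is an assumed tuning value, not a measured one, and the function names are hypothetical; both functions compute the same result and differ only in access order.

```c
#include <assert.h>
#include <stddef.h>

/* BLOCK is a hypothetical tuning parameter: a BLOCK x BLOCK tile of src plus
   one of dst should fit in the cache at the same time. */
#define BLOCK 8

/* Natural order: src is read row by row (good spatial locality), but dst is
   written column by column, so consecutive writes land in different cache blocks. */
void transpose_naive(int n, const int *src, int *dst) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked order: the two inner loops vary i and j only within one tile, so each
   row of dst touched by the tile is reused for BLOCK consecutive values of i
   before it is likely to be evicted. Edge tiles are clipped when n % BLOCK != 0. */
void transpose_blocked(int n, const int *src, int *dst) {
    assert(n >= 0 && src != NULL && dst != NULL);
    for (int ii = 0; ii < n; ii += BLOCK)
        for (int jj = 0; jj < n; jj += BLOCK)
            for (int i = ii; i < ii + BLOCK && i < n; i++)
                for (int j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```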
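A minimal POSIX sketch of the signal notes (Linux-style behavior assumed): a handler that only updates a `volatile sig_atomic_t` counter, blocking SIGUSR1 with sigprocmask(), and the single pending bit absorbing repeated signals. It registers the handler with sigaction() rather than the signal() call from the notes, because signal()'s behavior (e.g., whether the handler is reset after each delivery) differs between systems. `on_sigusr1` and `count_deliveries_of_blocked_signal` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The handler can interrupt the program anywhere, so it only does
   async-signal-safe work: update a volatile sig_atomic_t. No printf()/malloc(). */
static volatile sig_atomic_t handler_calls = 0;

static void on_sigusr1(int signum) {
    (void)signum;
    handler_calls = handler_calls + 1;
}

/* Registers the handler, blocks SIGUSR1, sends it to ourselves three times,
   then unblocks it. Returns how many times the handler actually ran. */
int count_deliveries_of_blocked_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigset_t block, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)   /* "please, OS, don't send me this now!" */
        return -1;

    handler_calls = 0;
    for (int i = 0; i < 3; i++)
        raise(SIGUSR1);               /* blocked: each one just marks SIGUSR1 pending */

    sigpending(&pending);
    assert(sigismember(&pending, SIGUSR1) == 1);
    assert(handler_calls == 0);       /* nothing delivered while blocked */

    sigprocmask(SIG_UNBLOCK, &block, NULL);   /* unblock: the pending signal is delivered */
    return handler_calls;             /* one pending bit per type, so expect 1 */
}
```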
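A C sketch of loop unrolling with X = 4, including the extra loop for the N % X left-over iterations; `sum_unrolled4` is a hypothetical example function.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of a[0..n-1], unrolled X = 4 times: the index update and the
   "are we done?" check run once per 4 elements instead of once per element. */
long sum_unrolled4(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    /* unrolled loop: runs while at least 4 iterations remain */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    /* extra code for the N % 4 "left over" iterations */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

With n = 10, the unrolled loop runs twice (8 elements) and the left-over loop handles the remaining 10 % 4 = 2. Writing the condition as `i + 4 <= n` avoids unsigned underflow when n < 4.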
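A POSIX C sketch of fork() being called once and returning twice. The child changes its own copy of `x` and exits with it; the parent waits and finds its `x` unchanged, since the child got a copy of memory. `fork_returns_twice` is a hypothetical name, and the child uses `_exit()` so that stdio buffers copied from the parent are not flushed a second time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status as seen by the parent, or -1 on error. */
int fork_returns_twice(void) {
    int x = 42;                 /* the child gets its own copy of this */
    pid_t pid = fork();         /* called once ... */
    if (pid < 0)
        return -1;              /* fork failed: no child was created */
    if (pid == 0) {             /* ... returns 0 in the child */
        x = x + 1;              /* changes only the child's copy */
        _exit(x);               /* child exits with status 43 */
    }
    /* ... and returns the child's process ID (> 0) in the parent */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    assert(x == 42);            /* the parent's copy is unchanged */
    return WEXITSTATUS(status);
}
```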
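For the synonym problem: a small POSIX sketch (a Unix-like system with mmap() assumed) that creates a synonym on purpose by mapping the same page of a temporary file at two virtual addresses. Whatever the cache design, a write through one address must be visible through the other; that is the correctness requirement that VIPT (or translating first) preserves. `synonym_demo` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the same page of a temporary file at two different virtual addresses
   (two VAs, one PA). Returns 1 if a write through one VA is visible through
   the other, 0 if not, -1 on error. */
int synonym_demo(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    int result = -1;
    if (ftruncate(fd, page) == 0) {
        char *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            assert(a != b);             /* two different virtual addresses ... */
            a[100] = 'x';               /* ... a write through one ... */
            result = (b[100] == 'x');   /* ... must be visible through the other */
        }
        if (a != MAP_FAILED)
            munmap(a, (size_t)page);
        if (b != MAP_FAILED)
            munmap(b, (size_t)page);
    }
    fclose(f);
    return result;
}
```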
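The VIPT condition as a quick C check. For example, a 32 KiB, 8-way cache with 64-byte blocks has 32768 / (8 * 64) = 64 sets, and 64 sets * 64 B = 4096 B fits a 4 KiB page exactly; the same cache at 4 ways would need 128 sets (8192 B) and would not fit, which is why higher associativity helps. The function names are hypothetical, and all sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>

/* log2 of a power of two */
static unsigned log2_pow2(unsigned long x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

/* True if the set index and block offset bits all fall inside the page
   offset, so the set can be chosen from the virtual address (VIPT). */
bool index_within_page_offset(unsigned long cache_bytes, unsigned long ways,
                              unsigned long block_bytes, unsigned long page_bytes) {
    unsigned long sets = cache_bytes / (ways * block_bytes);
    unsigned index_bits = log2_pow2(sets);
    unsigned offset_bits = log2_pow2(block_bytes);
    unsigned page_offset_bits = log2_pow2(page_bytes);
    /* same test as: sets * block_bytes <= page_bytes */
    return index_bits + offset_bits <= page_offset_bits;
}
```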
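A C sketch contrasting how a cache and a TLB split an address. The sizes (4 KiB pages, a 16-set TLB, a cache with 64 sets of 64-byte blocks) and the names `addr_split`, `cache_split`, and `tlb_split` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters (assumptions for illustration):
   4 KiB pages, a 16-set TLB, and a cache with 64 sets of 64-byte blocks. */
#define PAGE_OFF_BITS 12
#define TLB_SET_BITS 4
#define CACHE_SET_BITS 6
#define CACHE_OFF_BITS 6

typedef struct {
    uint64_t tag, index, offset;
} addr_split;

/* Cache: (usually physical) byte address -> tag | set index | block offset. */
addr_split cache_split(uint64_t pa) {
    addr_split s;
    s.offset = pa & ((UINT64_C(1) << CACHE_OFF_BITS) - 1);
    s.index = (pa >> CACHE_OFF_BITS) & ((UINT64_C(1) << CACHE_SET_BITS) - 1);
    s.tag = pa >> (CACHE_OFF_BITS + CACHE_SET_BITS);
    return s;
}

/* TLB: only the virtual page number is looked up, and each "block" is one PTE,
   so there are no block offset bits; the page offset is not used at all. */
addr_split tlb_split(uint64_t va) {
    uint64_t vpn = va >> PAGE_OFF_BITS;
    addr_split s;
    s.offset = 0;   /* no block offset */
    s.index = vpn & ((UINT64_C(1) << TLB_SET_BITS) - 1);
    s.tag = vpn >> TLB_SET_BITS;
    return s;
}
```

For 0x12345678, the TLB looks up VPN 0x12345 (index 0x5, tag 0x1234), while the cache uses block offset 0x38, set index 0x19, and tag 0x12345.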