# Pipelining

- stalling for slow execution units
- identifying when you need to forward
- load/use hazard (see the C sketch at the end of these notes):
  - (computed in) memory -> (used in) execute: only solved with a stall + forwarding
  - memory -> memory: solved with forwarding only

# Cache / Memory

- direct-mapped versus associative
  - direct-mapped = associative with associativity 1 (one block/set)
  - fully associative = associative with associativity = total # of blocks (one set)
- tag/index/offset and same set/block/etc. (sketch below)
  - 11/001/1 -- tag/index/offset
  - know sizes by # bytes/block (offset), # sets (index), left over (tag)
    - (unique bit pattern for each byte, set)
  - same index: same set
  - same index + same tag: same block
  - quiz question
- parallel vs serial cache accesses to a SINGLE CACHE
  - serial option: check tag for match before looking at data
    - slower but uses less power (no data lookup on a miss)
    - maybe don't fetch all blocks in the set
  - parallel option: check tag for match while fetching data
    - faster but uses more power (start the data read earlier)
- multiple levels of cache/memory
  - access level 1; if that fails, then access level 2
  - almost always serially
- caches and the processor
  - memory stage latency assumes a cache hit
  - if cache miss: pipelined processor will stall
    - wait for "done?" signal
- write-back versus write-through (sketch below)
  - write-through: always send writes to memory IMMEDIATELY
  - write-back: send writes to memory when you'd otherwise forget the value
    - (i.e., on replacement)
    - extra state: "dirty bit" -- do we need to write this back eventually?
      - one bit per block
- DRAM vs SRAM: speed, physical difference
  - DRAM: denser (more bytes per unit area), slower, cheaper --- usually main memory
  - SRAM: less dense, faster, more expensive --- usually caches
- tech trends: CPU speed has improved much faster than memory speed
  - we didn't cover it
- lines versus sets
  - book: line = block + metadata
  - other sources sometimes: line = block
  - set: holds multiple blocks/lines (in a set-associative cache)
- AMAT -- average memory access time (overall cache performance metric; worked example below)
  - hit-latency * hit-rate + miss-latency * miss-rate
  - = hit-latency + miss-PENALTY * miss-rate
    - miss-penalty = miss-latency - hit-latency (the extra time for a miss)
  - different optimizations improve different parts

# Cache Performance/Locality

- cache blocking (access order in for loops; sketch below)
- does the loop have temporal locality? --- accesses something repeatedly
- does the loop have spatial locality? --- accesses (nearly) adjacent things consecutively

# OOO

- precise exceptions and reorder buffers
- data flow model
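A minimal C sketch of the two load-hazard cases from the Pipelining notes, assuming a classic 5-stage pipeline and that the compiler emits the obvious load/add/store sequence; function names are hypothetical:

```c
/* Case 1: memory -> execute (load/use hazard).  The load's value is
 * ready at the end of MEM, but the add wants it at the start of EX
 * one cycle earlier: one stall cycle PLUS MEM->EX forwarding. */
int load_then_add(const int *a) {
    int x = a[0];   /* load */
    return x + 1;   /* add consumes x in the very next instruction */
}

/* Case 2: memory -> memory (load feeding a store).  The store needs
 * the value only when IT reaches MEM, one cycle after the load's MEM
 * stage, so MEM->MEM forwarding alone suffices -- no stall. */
void load_then_store(int *dst, const int *src) {
    *dst = *src;    /* a load immediately followed by a store */
}
```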
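A small C sketch of the tag/index/offset split, instantiating the hypothetical sizes behind the 11/001/1 example above (1 offset bit = 2 bytes/block, 3 index bits = 8 sets, remaining bits = tag):

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 1   /* 2 bytes/block */
#define INDEX_BITS  3   /* 8 sets        */

uint32_t get_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t get_index(uint32_t addr)  { return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
uint32_t get_tag(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }

int main(void) {
    uint32_t addr = 0x33;  /* 0b110011: tag=0b11, index=0b001, offset=0b1 */
    printf("tag=%x index=%x offset=%x\n",
           (unsigned)get_tag(addr), (unsigned)get_index(addr),
           (unsigned)get_offset(addr));
    /* two addresses with the same index map to the same set;
     * same index AND same tag means the same block */
    return 0;
}
```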
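A sketch of per-block state for a write-back cache, modeling the dirty bit in C; the struct layout and the write_back_to_memory stub are hypothetical:

```c
#include <stdint.h>

#define BLOCK_SIZE 64

struct cache_line {
    uint32_t tag;
    uint8_t  valid;
    uint8_t  dirty;            /* set on a write hit; one bit per block */
    uint8_t  data[BLOCK_SIZE];
};

/* stand-in for a real memory write (hypothetical) */
static void write_back_to_memory(const struct cache_line *line) { (void)line; }

/* On replacement: a dirty block must be written to memory before it
 * is forgotten; a clean block can simply be dropped.  A write-through
 * cache needs no dirty bit, since every write already went to memory. */
void evict(struct cache_line *line) {
    if (line->valid && line->dirty)
        write_back_to_memory(line);
    line->valid = 0;
}
```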
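A worked AMAT example with hypothetical numbers: hit latency = 2 cycles, miss rate = 10%, miss penalty = 100 cycles, so miss latency = 2 + 100 = 102 cycles. Both forms of the formula agree: 2 * 0.9 + 102 * 0.1 = 1.8 + 10.2 = 12 cycles, and 2 + 100 * 0.1 = 12 cycles.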
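A C sketch for the locality/blocking section, assuming a row-major int matrix (names and sizes hypothetical): the two sums contrast loop orders, and the tiled transpose shows cache blocking itself:

```c
#define N 1024
#define B 32   /* tile size, chosen so a BxB tile fits in the cache */

/* Good spatial locality: the inner loop walks adjacent elements,
 * so each cache block is fully used before moving on. */
long sum_row_major(const int m[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Poor spatial locality: consecutive accesses are N*sizeof(int)
 * bytes apart, so each access may touch a different cache block. */
long sum_col_major(const int m[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

/* Cache blocking: transpose in BxB tiles so the rows of dst touched
 * by the column-wise writes stay within a cache-sized band and get
 * reused (temporal locality) before being evicted. */
void transpose_blocked(int dst[N][N], const int src[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    dst[j][i] = src[i][j];
}
```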