# Caches - cache sets - tag/index/offset - offset: which byte in block - 2^k bytes in a block? k bits to know - index: which set in cache - 2^k sets in a cache? k bits to know - tag: everything else - know that address used to store matches addresses being looked up - last post-quiz 3-way sets/blocks - 1.5 MB 3-way cache with 64 byte blocks - addresses: (tag) (index) (offset) - 64 byte blocks = 2^6 byte blocks --> 6 bits for offset - 1.5 MB 3-way --> .5 MB/way --> .5 MB / 64 B of blocks per way - 2^19 / 2^6 blocks = 2^13 blocks --> 13 bits for the index - tag: everything else - 0x12345 <0><001 0010 0011 01> <00 0101> - same set? same index - same block? same index and tag - conflict misses versus other kinds of misses - 3Cs: compulsory, capacity, conflict - compulsory/cold: first time something is accessed - capacity: the cache isn't big enough (would it be a miss in a fully-assoiative cache? fully-associative: associativity = # blocks) - conflict: two or more things map to same place, one of them had to go (but there would've been enough room if we could've stored them elsewhere in the cache) - implementing LRU - least recently used --- what we want evict - ordering for blocks in set - needs to be stored (in tag store) - perfect LRU: need to store exact order - block versus line - line: block + metadata (valid + tag bit) - some sources: block = line - write-back versus write-through - write-through: when you get write, do it immediately to memory - write-back: when you get write, store locally - do it to memory only when you would otherwise lose the value (on replacement) - extra state: "dirty" bit -- is value not set in memory yet - cache block size increasing effect? - larger blocks: more adv. from spatial locality, but more conflict misses (more conflict misses since less sets to choose from (or blocks/set)) (more conflict misses because bringing in more data with block you don't use) - larger blocks: slower to read/write from next level - larger blcoks: slower to read/write from data array? - which cache has largest blocks? - can you find out the # offset bits? (knowing # tag or index bits probably helps) - N-bit addresses: fully-assoc. 6-bit tags, 128 lines (= blocks) 1 set --- 0 index bits N-bit address: (6-bit tag) (0-bit index) ((N-6)-bit block) = 2^(N-6) byte blocks - N-bit addresses: direct-mappedc. 6-bit tags, 128 lines (= blocks) 128 sets -- 2^7 sets -- 7 index bits N-bit address: (6-bit tag) (7-bit index) ((N-7-6)-bit block) # Misc. - matrix squaring -- what is it? B = A^2 (A, B matrices) B_{i,j} = sum(for all k: A_{i,k} * A_{k,j}) A_{row,col} for (int i = 0; i < N; ++i) { for (int j = 0; j < N; ++j) { for (int k = 0; k < N; ++k) { B_{i,j} += A_{i,k} * A_{k,j} // good temporal locality B_{i,j} // good spatial locality in A_{i,k} (if rows stored together) } } } # Pipelining - ret stalling and other hazards at the same time jle F D E M W ret F D E M W ??? F F - counting stalls quiz question(quiz 13) # OOO - out-of-order -- what is it? - run independent instructions while waiting for slow instruction (e.g. cache miss) - reorder buffers