CS 3330: Lecture 14: Understanding Caches

This page does not represent the most current semester of this course; it is present merely as an archive.

B = number of bytes per block
b = lg(B) -- that is, B = 1 << b
E = number of lines per set
m = bits per address
S = number of sets in the cache
s = lg(S) -- that is, S = 1 << s

class Line {
    boolean              valid;
    tbits                tag;
    Word[B/sizeof(Word)] block;
}
cache = new Set<Line, E>[S];

A block has B = 1 << b bytes, but is addressable in words which may be some power-of-two bytes long. There is not a standard symbol for the size of a word.

If E = 1, we call the cache a Direct-Mapped Cache

If S = 1, we call the cache a Fully-Associative Cache

Otherwise we call the cache a Set-Associative Cache

Read Example

Let's explore a cache with E = 3, s = 3, and b = 5. Assume that addresses are 16-bits long (m = 16) and that words are 4-bytes long.

Given an address like 1101001011111001 we can break it down as follows:

11010010     111       110  01  
  tag     set-index  blockOffset

The block offset is the last b bits, and is made of two parts:

the bottom 2 bits are the byte within out 4-byte words
and the top (b − 2 = 3) are the index of the word in the block.

The set index is the s bits above the block offset.

The tag is any bits above the set index.

Our cache would look like

           valid    tag            block (as int[] literal)
         /   0   00010011  { 748, 561, 983, 192, 379, 834, 620, 434}
set 000 <    1   01100000  {  57, 404, 777, -45,  72, 186, 685, 291}
         \   0   10000111  { 877, 640, 613, 112, 542, 356, -70, 644}

         /   0   01001010  { 950,  83, 385, 622,  36, 186, 935, 431}
set 001 <    1   10111000  { 816,  78, 583, 443, 346, 385, 281, -20}
         \   0   01110110  { 456, 796,  70,  56, 699, -83, 118, 727}

         /   0   10111100  { 331, 482, 411, 947, -69, 813, 695, 383}
set 010 <    0   00110001  {  47, 484, 750, 613, 786, 202, 718, 485}
         \   1   01010001  { 784, 538, 805, 595, 332, 539, 334, 689}

         /   0   01100100  { 537, 511, 954, 682, 325, 777, 304, 186}
set 011 <    1   10000111  { 982, 484, 516, 753,  55, 303, -85, 499}
         \   0   00011101  { 326, 238, 720, 567, 973, 824, 552, 992}

         /   1   00000010  { 476, 849, 306, 909, 600, 942, 446, 102}
set 100 <    1   00010100  { 151, 727, 546, 837, 663, 501, 299, 938}
         \   1   11010110  { 188, 114, 609, 134, 876, 594, 241, 331}

         /   1   00001001  { 990, 193, -28,  94, 887, 605, 573, 982}
set 101 <    0   01100111  { 321,  29, 295, 692, -84, 818, 964, 633}
         \   0   10001110  { 577, 872, 150, 487, -91, 331, 139, 852}

         /   1   01011010  { 608, 490, 368, -47, 347,  44, -28, 425}
set 110 <    1   00100011  { -62, 907, 297, 826,  31, 506, 522, 696}
         \   0   00110100  {  82, 770,   8, 345, 619, 578, 493, 167}

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   11010010  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   0   11000000  { 721, 980, 964, 912, -40, 353, 704, 118}

Consider now our example address of 11010010 111 110 01. The last two bits don't mean anything to our cache, since they are smaller than a word, so we'd get only 11010010 111 110 xx from the lower-level cache (often expressed by saying that m = 14 and b = 3). That is tag 11010010, set address 111, and block offset 110. W first go to set 111

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   0   11000000  { 721, 980, 964, 912, -40, 353, 704, 118}

and check if any of the tags match 11010010. They don't so the cache fetches it from the next level of the cache hierarchy, replacing one line already in the set

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457,4970,4980}

Because it was just fetched it is marked as valid. The cache then returns word 110 = 6, which is 4970, toward the CPU.

If we find a line that matches the tag but has a 0 valid bit, that means the line does not reflect the current state of memory at that address (perhaps that memory has been updated by a direct-memory access disk read or that address has never been used to the line contains random garbage). We only have a cache hit if the tag matches and the valid bit is 1.

Write Example

If the CPU then wanted to set the same address to 0x03, the lower-level cache would bundle this up into a word (4970 = 0x0000136a; assuming little endian, byte 01 = the 0x13; replacing it with 0x03 gives 0x0000036a = 874). We thus have coming in Write 874 to address (11010010 111 110 xx). We again go to set 111

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457,4970,4980}

and check if any of the tags match 11010010. The third one does, so we check if it is a valid cache line. It is, so we change word 110 = 6 to be 874:

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457, 874,4980}

Depending on the cache implementation details, that change might immediately propagate up to higher-level caches (known as write-through) or it might wait until that line needs to be replaced by another block (known as write-back). Write-back has some advantages, but includes the use of a dirty bit in the cache that is set on a write operation so that the cache knows which lines need to be written back.

If we wanted to write but did not have the right line in cache, many possibilities follow. One big decision: do we fetch the new block (known as write-allocate) or do we simply forward the write up to the next cache (known as no-write-allocate)? There are many other considerations here, which we will ignore in this course.