CS 3330: Lecture 14: Understanding Caches

This page does not represent the most current semester of this course; it is present merely as an archive.

Of Bytes and Words

It is common for caches to be accessed in sizes larger than a single byte. A single accessed value is called a word. Because the textbook assumes single-byte words, we will also assume single-byte words in what follows. Note, though, that most real hardware assumes 4-byte words and that addresses that are not a multiple of 4 will generally run more slowly as a consequence.

Terms

The textbook uses the following symbols:

B = number of bytes (or words) per block
b = lg(B) -- that is, B = 1 << b
E = number of lines per set
m = bits per address
S = number of sets in the cache
s = lg(S) -- that is, S = 1 << s

These are conceptually organized as

class Line {
    bit   valid;
    bit   tag[m-s-b];
    word  block[B];
}
cache = new Set<Line, E>[S];

A block has B = 1 << b bytes.

If E = 1, we call the cache a Direct-Mapped Cache

If S = 1, we call the cache a Fully-Associative Cache

Otherwise we call the cache a Set-Associative Cache

Read Example

Let's explore a cache with E = 3, s = 3, and b = 3. Assume that addresses are 14-bits long (m = 14).

Given an address like 11010010111110 we can break it down as follows:

11010010     111         110 
  tag     set-index  blockOffset

The block offset is the last b bits.

The set index is the s bits above the block offset.

The tag is any bits above the set index.

Our cache would look like

           valid    tag            block (as int[] literal)
         /   0   00010011  { 748, 561, 983, 192, 379, 834, 620, 434}
set 000 <    1   01100000  {  57, 404, 777, -45,  72, 186, 685, 291}
         \   0   10000111  { 877, 640, 613, 112, 542, 356, -70, 644}

         /   0   01001010  { 950,  83, 385, 622,  36, 186, 935, 431}
set 001 <    1   10111000  { 816,  78, 583, 443, 346, 385, 281, -20}
         \   0   01110110  { 456, 796,  70,  56, 699, -83, 118, 727}

         /   0   10111100  { 331, 482, 411, 947, -69, 813, 695, 383}
set 010 <    0   00110001  {  47, 484, 750, 613, 786, 202, 718, 485}
         \   1   01010001  { 784, 538, 805, 595, 332, 539, 334, 689}

         /   0   01100100  { 537, 511, 954, 682, 325, 777, 304, 186}
set 011 <    1   10000111  { 982, 484, 516, 753,  55, 303, -85, 499}
         \   0   00011101  { 326, 238, 720, 567, 973, 824, 552, 992}

         /   1   00000010  { 476, 849, 306, 909, 600, 942, 446, 102}
set 100 <    1   00010100  { 151, 727, 546, 837, 663, 501, 299, 938}
         \   1   11010110  { 188, 114, 609, 134, 876, 594, 241, 331}

         /   1   00001001  { 990, 193, -28,  94, 887, 605, 573, 982}
set 101 <    0   01100111  { 321,  29, 295, 692, -84, 818, 964, 633}
         \   0   10001110  { 577, 872, 150, 487, -91, 331, 139, 852}

         /   1   01011010  { 608, 490, 368, -47, 347,  44, -28, 425}
set 110 <    1   00100011  { -62, 907, 297, 826,  31, 506, 522, 696}
         \   0   00110100  {  82, 770,   8, 345, 619, 578, 493, 167}

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   11010010  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   0   11000000  { 721, 980, 964, 912, -40, 353, 704, 118}

Consider now our example address of 11010010 111 110. That is tag 11010010, set address 111, and block offset 110. W first go to set 111

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   0   11000000  { 721, 980, 964, 912, -40, 353, 704, 118}

and check if any of the tags match 11010010. They don't so the cache fetches it from the next level of the cache hierarchy, replacing one line already in the set

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457,4970,4980}

Because it was just fetched it is marked as valid. The cache then returns word 110 = 6, which is 4970, toward the CPU.

If we find a line that matches the tag but has a 0 valid bit, that means the line does not reflect the current state of memory at that address (perhaps that memory has been updated by a direct-memory access disk read or that address has never been used to the line contains random garbage). We only have a cache hit if the tag matches and the valid bit is 1.

Write Example

If the CPU then wanted to set the same address to 0x03, the lower-level cache would bundle this up into a word (4970 = 0x0000136a; assuming little endian, byte 01 = the 0x13; replacing it with 0x03 gives 0x0000036a = 874). We thus have coming in Write 874 to address (11010010 111 110). We again go to set 111

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457,4970,4980}

and check if any of the tags match 11010010. The third one does, so we check if it is a valid cache line. It is, so we change word 110 = 6 to be 874:

         /   0   01011110  { 166, 482, 733, 373,  60, -27, 622, 207}
set 111 <    0   10111111  { 823, 472, 523, 905, 849, -18,  99, 987}
         \   1   11010010  {1110,2110,2150,3330,4414,4457, 874,4980}

Depending on the cache implementation details, that change might immediately propagate up to higher-level caches (known as write-through) or it might wait until that line needs to be replaced by another block (known as write-back). Write-back has some advantages, but includes the use of a dirty bit in the cache that is set on a write operation so that the cache knows which lines need to be written back.

If we wanted to write but did not have the right line in cache, many possibilities follow. One big decision: do we fetch the new block (known as write-allocate) or do we simply forward the write up to the next cache (known as no-write-allocate)? There are many other considerations here, which we will ignore in this course.

Copyright © 2015 by Luther Tychonievich. All rights reserved.
Last updated 2015-03-16 12:14 -0400