







#### cache coherency states

extra information for each cache block

overlaps with valid, dirty bits

stored in each cache

different caches may have different states for same block

# scheme 1: MSI








| State    | hear read | hear write | read      | write       |
|----------|-----------|------------|-----------|-------------|
| Invalid  | -         | -          | to Shared | to Modified |
| Shared   | -         | to Invalid | -         | to Modified |
| Modified | to Shared | to Invalid | -         | -           |

blue (on the original slide): transition sends a bus signal
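The MSI transitions can also be written as a lookup table. This is a minimal sketch, assuming a Shared block stays Shared on a local read and moves to Modified on a local write; the names `MSI_NEXT` and `next_state` and the event labels are illustrative, not from the slides.

```python
# Hypothetical encoding of the MSI table: (current state, event) -> next state.
# "read"/"write" are local accesses; "hear_read"/"hear_write" are requests
# from other caches observed (snooped) on the bus.
MSI_NEXT = {
    ("Invalid", "read"): "Shared",
    ("Invalid", "write"): "Modified",
    ("Shared", "write"): "Modified",
    ("Shared", "hear_write"): "Invalid",
    ("Modified", "hear_read"): "Shared",
    ("Modified", "hear_write"): "Invalid",
}


def next_state(state, event):
    """Return the next MSI state; pairs not listed leave the state unchanged."""
    return MSI_NEXT.get((state, event), state)
```

For example, `next_state("Shared", "hear_write")` gives `"Invalid"`, and `next_state("Modified", "write")` leaves the block in `"Modified"`.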

**MSI** example

| CPU1 address | CPU1 value | CPU1 state | CPU2 address | CPU2 value | CPU2 state |
|--------------|------------|------------|--------------|------------|------------|
| 0xA300       | 100        | Shared     | 0x9300       | 172        | Shared     |
| 0xC400       | 200        | Shared     | 0xA300       | 100        | Shared     |
| 0xE500       | 300        | Shared     | 0xC500       | 200        | Shared     |

MEM1: memory (contents not shown)




# **MSI** example

CPU1 writes 102 to 0xA300

| CPU1 address | CPU1 value         | CPU1 state | CPU2 address | CPU2 value | CPU2 state |
|--------------|--------------------|------------|--------------|------------|------------|
| 0xA300       | <del>101</del> 102 | Modified   | 0x9300       | 172        | Shared     |
| 0xC400       | 200                | Shared     | 0xA300       | 100        | Invalid    |

already in Modified state: nothing communicated!


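#### **MSI** example

CPU2 reads 0xA300

bus message: "Write 102 into 0xA300"

| CPU1 address | CPU1 value | CPU1 state | CPU2 address | CPU2 value | CPU2 state |
|--------------|------------|------------|--------------|------------|------------|
| 0xA300       | 102        | Shared     | 0x9300       | 172        | Shared     |
| 0xC400       | 200        | Shared     | 0xA300       | 100        | Invalid    |

value written back to memory early; CPU1's copy becomes Shared (could also become Invalid at CPU1)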

#### update memory


to write value (enter modified state), only need to invalidate others

more efficient: shorter bus message

#### on cache replacement/writeback

still happens — e.g. want to store something else

changes state to invalid

requires writeback if modified (= dirty bit)
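The replacement rule can be sketched as a small function. This is an illustrative sketch only; `evict` is a hypothetical name, and the state names follow the MSI scheme.

```python
def evict(state):
    """Evict a block from the cache: return (new_state, needs_writeback).

    The Modified state plays the role of the dirty bit: only a Modified
    block holds a value that memory does not have yet.
    """
    needs_writeback = (state == "Modified")
    return "Invalid", needs_writeback
```

Evicting a Shared block is silent, because memory already holds the same value.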

#### scheme 1: MSI

| Modified | value is <mark>different than memory</mark> and<br>I am the only one who has it |
|----------|---------------------------------------------------------------------------------|
| Shared   | value is the same as memory                                                     |
| Invalid  | I don't have the value; I will need<br>to ask for it                            |

# **MSI** complaints

**modifying** a value (read, then write, then write) often takes three messages:

initial read from memory

invalidate other caches (and maybe write to memory) on initial write

final writeback
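The three messages can be counted with a small model of one cache and one block. This is a sketch under an assumed cost model (one bus message per state-changing transition, silent eviction of Shared blocks); `msi_bus_messages` and the message labels are hypothetical.

```python
def msi_bus_messages(ops):
    """List the bus messages one cache sends for ops on a single block.

    ops: sequence of "read", "write", "evict".
    Simplification: a write from Invalid is labeled only "invalidate others",
    although a real write miss would also fetch the block.
    """
    state = "Invalid"
    messages = []
    for op in ops:
        if op == "read" and state == "Invalid":
            messages.append("read from memory")
            state = "Shared"
        elif op == "write" and state != "Modified":
            messages.append("invalidate others")
            state = "Modified"
        elif op == "evict":
            if state == "Modified":
                messages.append("writeback")
            state = "Invalid"
    return messages
```

With `["read", "write", "write", "evict"]` the model reports three messages: the initial read, the invalidate on the first write, and the final writeback. The second write is silent because the block is already Modified.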

# scheme 2: MESI

| Modified  | value is <mark>different than memory</mark> and<br>I am the only one who has it |
|-----------|---------------------------------------------------------------------------------|
| Exclusive | value is same as memory <i>and</i> I am<br>the only one who has it              |
| Shared    | value is the same as memory                                                     |
| Invalid   | I don't have the value; I will need<br>to ask for it                            |


#### read for ownership

reading to modify a value soon?

read into Exclusive state even if reading from cache

invalidate and read

second way to enter exclusive state
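The two ways into the Exclusive state, and the benefit of being there, can be sketched as follows. This is a simplified model, not a full protocol: `mesi_read_miss`, `mesi_write`, the `others_have_copy` flag, and the message labels are assumptions made for illustration.

```python
def mesi_read_miss(others_have_copy, for_ownership=False):
    """State entered after a read miss: returns (new_state, bus_message)."""
    if for_ownership:
        # read for ownership: invalidate other copies while reading
        return "Exclusive", "invalidate and read"
    if others_have_copy:
        return "Shared", "read"
    # nobody else has the block: first way to enter Exclusive
    return "Exclusive", "read"


def mesi_write(state):
    """Local write: returns (new_state, bus_message or None)."""
    if state in ("Exclusive", "Modified"):
        return "Modified", None  # silent: no other cache has a copy
    if state == "Shared":
        return "Modified", "invalidate"
    return "Modified", "invalidate and read"  # Invalid: write miss
```

A read for ownership followed by a write costs one bus message in this model, where an ordinary read into Shared followed by a write costs two.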

## **MESI complaints**

have to update memory to share a modified value ... even though caches read from other caches

read from which cache?


#### scheme 3: MOESI

| Modified  | value is different than memory <i>and</i><br>I am the only one who has it |
|-----------|---------------------------------------------------------------------------|
| Owned     | value is different than memory <i>and</i><br>I must update memory         |
| Exclusive | value is same as memory <i>and</i> I am<br>the only one who has it        |
| Shared    | value is same as memory or cache<br>in Owned state                        |
| Invalid   | I don't have the value                                                    |
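The Owned state changes what happens when another cache reads a block that this cache holds in Modified. A minimal sketch of that one case, with the hypothetical name `remote_read_of_modified`:

```python
def remote_read_of_modified(protocol):
    """Another cache reads a block this cache holds in Modified.

    Returns (this cache's new state, whether memory must be updated now).
    """
    if protocol in ("MSI", "MESI"):
        # Shared means "same as memory", so memory has to be updated
        return "Shared", True
    if protocol == "MOESI":
        # supply the value cache-to-cache; memory is updated later
        return "Owned", False
    raise ValueError(f"unknown protocol: {protocol}")
```

Under MOESI the writeback is deferred until the Owned block is evicted, which is why the Owned cache "must update memory".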







# **MOESI** example



- CPU1: read 0xA300
- CPU1: write 0xA300
- CPU1: read 0xA300
- CPU2: read 0xA300
- CPU2: write 0xA300







after CPU2's read, CPU1: "0xA300 = 101"

| CPU1 address | CPU1 value | CPU1 state | CPU2 address | CPU2 value | CPU2 state |
|--------------|------------|------------|--------------|------------|------------|
| 0xA300       | 101        | Owned      | 0xA300       | 101        | Shared     |

on CPU2's write, CPU2: "I'm changing 0xA300"

| CPU1 address | CPU1 value     | CPU1 state | CPU2 address | CPU2 value         | CPU2 state |
|--------------|----------------|------------|--------------|--------------------|------------|
| 0xA300       | <del>101</del> | Invalid    | 0xA300       | <del>101</del> 102 | Modified   |

#### **MSI versus MESI versus MOESI**

- CPU1: read 0xA300
- CPU1: write 0xA300 (MSI: invalidate)
- CPU1: read 0xA300
- CPU2: read 0xA300 (MSI/MESI: memory write)
- CPU2: write 0xA300 (MSI: invalidate)

#### Other cache coherency options

can invalidate instead of updating other caches on write

invalidation message faster to send than new value

tradeoff: faster if other cache won't use value

#### Dropping states from MOESI

| Modified  | value is different than memory <i>and</i><br>I am the only one who has it |
|-----------|---------------------------------------------------------------------------|
| Owned     | value is different than memory <i>and</i><br>I must update memory         |
| Exclusive | value is same as memory <i>and</i> I am<br>the only one who has it        |
| Shared    | value is same as memory or cache<br>in Owned state                        |
| Invalid   | I don't have the value                                                    |

#### Mapping to the paper

- MSI + reread to get in Modified: Synapse
- MESI + full-write-to-invalidate: write-once
- MOSI + forward-on-write: Berkeley
- MESI + forward-on-write: Illinois
- MESI + invalidate-on-write: Firefly
- MOESI + forward-on-write: Dragon


# "System Power"

sum of processor utilizations

how much time are CPUs spending waiting for bus

what about overlapping cache accesses and computation??



#### simulation caveats

workloads?

variation in hardware?

#### false sharing

cache blocks are shared even if you are accessing different parts

huge performance problem with writes
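Whether two variables collide can be checked from their addresses. This sketch assumes a 64-byte cache block, a common size that the slides do not specify; `block_number` and `falsely_shared` are hypothetical helper names.

```python
BLOCK_SIZE = 64  # bytes; an assumed, typical cache block size


def block_number(address, block_size=BLOCK_SIZE):
    """Index of the cache block that contains this byte address."""
    return address // block_size


def falsely_shared(addr_a, addr_b, block_size=BLOCK_SIZE):
    """True if two different addresses fall in the same cache block."""
    return (addr_a != addr_b
            and block_number(addr_a, block_size) == block_number(addr_b, block_size))
```

For example, `falsely_shared(0xA300, 0xA308)` is true: a write by one CPU to 0xA300 invalidates the whole block, so another CPU using only 0xA308 must fetch it again. Placing the second variable at 0xA340 or later avoids the collision at this block size.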

# Present-day snooping cache coherency

AMD processors use MOESI

Intel uses something called MESIF

plus some techniques we'll talk about next time

#### **MESIF** states

| Modified   | value is different than memory <i>and</i><br>I am the only one who has it |
|------------|---------------------------------------------------------------------------|
| Exclusive  | value is same as memory <i>and</i> I am<br>the only one who has it        |
| Shared     | value is same as memory                                                   |
| Invalid    | I don't have the value                                                    |
| Forwarding | value is same as memory <i>and</i> I should provide it if requested       |



#### **Non-bus topologies**


necessary to connect large numbers of caches

higher bandwidth — if you don't broadcast everything

next time: avoiding broadcast



#### Implementing compare-and-swap

get block into Exclusive or Modified state

- read from memory/cache if necessary
- invalidate other caches if necessary

compare, if value matches, do write (Modified state)
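The steps above can be sketched for a single cache line. This is a software model for illustration, not how hardware is written: the `line` dictionary, the `bus_value` parameter (the up-to-date value supplied by memory or another cache), and the message labels are all assumptions.

```python
def compare_and_swap(line, expected, new_value, bus_value=None):
    """Model compare-and-swap on one cache line.

    line: dict with "state" and "value" for this cache's copy.
    Returns (success, bus_messages).
    """
    messages = []
    if line["state"] == "Invalid":
        # fetch the block and invalidate every other copy
        messages.append("invalidate and read")
        line["value"] = bus_value
        line["state"] = "Exclusive"
    elif line["state"] == "Shared":
        messages.append("invalidate")
        line["state"] = "Exclusive"
    # The line is now Exclusive or Modified: no other cache holds a copy,
    # so nothing can change the value between the compare and the write.
    if line["value"] != expected:
        return False, messages
    line["value"] = new_value
    line["state"] = "Modified"
    return True, messages
```

Starting from `{"state": "Shared", "value": 100}`, `compare_and_swap(line, 100, 101)` succeeds after one invalidate and leaves the line Modified with value 101. A second call that still expects 100 fails without sending any bus message.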

#### Coherency

common property: single 'responsible' cache for possibly changed values

Owned, Exclusive, Modified states

responsible cache must reply to reads of address

#### variation:

when is responsibility acquired? (only on write?)

when is it relinquished? (only on other's write?)