Changelog:

  • 9 Oct 2023: remove section on strict consistency; add section on memory barriers; add brief discussion of typical memory models for processors/compilers
  • 10 Oct 2023: clean up description of compiler requirements re: synchronization guarantees; fix a bunch of grammatical issues in memory barriers section; add mention of partial barriers + the term fence

In practice, true shared-memory concurrency (where there is only one piece of hardware that stores the bits of a particular memory location) is not very scalable. L1 caches routinely create separate copies for each core; systems with multiple CPU chips may replicate across all cache layers; supercomputers may have multiple copies of the entirety of RAM; and distributed systems, where separate computers are communicating over a network, virtually always replicate data at each computer.

Thus, we need to worry about the consistency of multiple copies. Roughly speaking, by consistent we mean

when one processor writes to an address, other processors see the new value when they read that address.

There are many more formal definitions of consistency1. This writeup mentions just a few of the most common.

For memory accesses with multiple cores, most processors provide something in between the strong and weak consistency models discussed below. This means that not all processors will have a fully consistent view of the values in memory: reads and writes may appear to occur in different orders on different cores.

Memory consistency is also a concern for compilers. For example, when there’s a loop accessing a global variable, an optimizing compiler may want to store the global variable temporarily in a register. But this optimization prevents the value from being visible to other threads while it is being changed. In contrast to most processors, compilers typically provide only relatively minimal consistency guarantees. Essentially, most compilers require code to use synchronization primitives (like locks, thread joining functions, etc.) to get any guarantees about modifications to shared values being visible to other threads.
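As an illustration (a hypothetical sketch in C; the flag name and the loop are made up, not from any particular codebase), consider a busy-wait loop on a plain global variable:

int done = 0;   /* plain global flag, set to 1 by some other thread */

void wait_for_done(void) {
    /* Without a synchronization primitive (or an atomic), an optimizing
       compiler may load `done` into a register once and never re-read it,
       so this loop can spin forever even after another thread stores 1. */
    while (!done) {
        /* spin */
    }
}

Wrapping the accesses in a lock (or making the flag an atomic) forces the compiler to actually re-read the shared value.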

1 Memory Barriers

Most processors provide special instructions called memory barriers that allow a program to ensure values are made consistent between processors. A memory barrier guarantees that all reads and writes before the barrier appear, to all other processors, to occur before all operations after the barrier.

Implementations of synchronization primitives often use memory barriers. For example, a common way to implement an unlock method might look something like:

MemoryBarrier();          /* make all earlier reads and writes visible first */
the_lock->locked = FALSE; /* then mark the lock as released */

The memory barrier here ensures that, even under weaker consistency models, any processor that sees the lock as unlocked will also see the results of all operations performed while the lock was locked.

Memory barriers are sometimes called memory fences.

Many processors provide partial memory barriers, which, rather than guaranteeing the apparent order of all operations as observed on other processors, guarantee ordering only for a subset of operations (e.g., only stores).
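As a sketch of how a partial barrier is typically used (written here with C11 atomic_thread_fence and made-up variable names, rather than any particular processor’s instructions), a producer can order its data write before a flag write, and a consumer can order its flag read before the data read:

#include <stdatomic.h>

int data;               /* ordinary shared data */
atomic_int ready = 0;   /* flag the consumer polls */

void producer(void) {
    data = 42;
    /* partial barrier: order the write to data before the write to ready */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int consumer(void) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        /* spin until the producer's flag write becomes visible */
    }
    /* partial barrier: order the read of ready before the read of data */
    atomic_thread_fence(memory_order_acquire);
    return data;        /* sees 42 once ready was observed as 1 */
}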

On most architectures, in addition to dedicated memory barrier instructions, the atomic read/modify/write operations used to implement most synchronization primitives also act as a memory barrier or partial memory barrier.
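For example, here is a sketch of the acquire side that matches the unlock snippet above, using a C11 atomic exchange (the lock/unlock names and the locked flag are illustrative, not a real library’s API):

#include <stdatomic.h>
#include <stdbool.h>

atomic_bool locked = false;

void lock(void) {
    /* atomic_exchange is a read/modify/write; with the default sequentially
       consistent ordering it also acts as a full barrier, so operations done
       while holding the lock cannot appear to other processors to start
       before the lock is acquired */
    while (atomic_exchange(&locked, true)) {
        /* spin until the previous holder stores false */
    }
}

void unlock(void) {
    /* plays the role of the MemoryBarrier(); the_lock->locked = FALSE; pair above */
    atomic_store(&locked, false);
}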

2 Some Consistency Models

2.1 Strong Consistency / Sequential Consistency

All accesses (read and write) are seen by all parallel processors in the same sequential order.

If you programmed only with memory barriers between every memory access, you’d have strong consistency.
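As an illustration (the classic “store buffering” example; the variable names are arbitrary), suppose x and y both start at 0 and two processors execute:

processor 1:  x = 1;  r1 = y;
processor 2:  y = 1;  r2 = x;

Under strong consistency, all four accesses happen in a single order seen by everyone, so at least one processor’s read must observe the other’s write; ending with both r1 == 0 and r2 == 0 is impossible. On many real processors, without a memory barrier between each write and the following read, both reads can return 0.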

2.2 Weak consistency

All accesses (read and write) to synchronization variables are seen by all parallel processors in the same sequential order.

This is a relaxation of strong consistency to apply to only a few special memory locations. It is what is provided by correct use of things like pthread_mutex_lock(&m), where m is one such synchronization variable.
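For example, a minimal pthreads sketch (the counter and function name are made up for illustration):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int shared_counter = 0;    /* ordinary shared data, not a synchronization variable */

void increment(void) {
    pthread_mutex_lock(&m);    /* m is the synchronization variable */
    shared_counter += 1;       /* updates made while holding m ...  */
    pthread_mutex_unlock(&m);  /* ... are visible to the next thread that locks m */
}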

2.3 Sequential consistency

The result of execution is the same as if the operations of each process were interleaved in some order on a single sequential processor.

This definition is similar to strong consistency, but is based on result, not event. Some people even use the terms interchangeably.

Sequential consistency, being about the final result (instead of each event), requires eventual consistency.

Suppose process 1 performs operations A; B; C; in parallel with process 2 performing x; y;. If the parallel outcome is the same as the outcome when a single processor executes any one of the following interleavings:

  • A; B; C; x; y;
  • A; B; x; C; y;
  • A; B; x; y; C;
  • A; x; B; C; y;
  • A; x; B; y; C;
  • A; x; y; B; C;
  • x; A; B; C; y;
  • x; A; B; y; C;
  • x; A; y; B; C;
  • x; y; A; B; C;

then the execution is sequentially consistent. If the parallel outcome differs from the outcome of every one of these interleavings, it is not.

2.4 Eventual consistency

There exists a time after the last write to each address when all processors will receive the same value when reading from that address.

Note that this does not make any guarantees about what value will be seen at that address, only that every processor will eventually agree.

Suppose one processor executes *x = 0x1234 and another executes *x = 0x5678. Eventual consistency can be satisfied even if

  • there is a time when the two disagree about *x’s value
  • they eventually converge on the value being 0x1278 or some other value neither of them tried to write.

Even providing eventual consistency requires some effort, but it is achievable enough that many distributed systems guarantee it.


  1. See, e.g., the Wikipedia category and summary article.