In shared memory, a memory location holds an arbitrary binary value. We can distinguish between single/multiple readers and writers, but this is not enough: we also need atomic RMW (read-modify-write) instructions. An RMW can be thought of as a function that:
Returns the previous value r of the memory location.
Atomically (in a mutually exclusive way) updates the memory location to a new value mumble(r).
A weak RMW only supports 2 threads. For example:
```scala
class WeakRMWLocation(private val init: Int) {
  private var value: Int = init

  def read(): Int = this.synchronized {
    val prior = value
    value = value // Identity function
    prior
  }

  def exchange(v: Int): Int = this.synchronized {
    val prior = value
    value = v // A write, essentially
    prior
  }

  def fetchAndAdd(v: Int): Int = this.synchronized {
    val prior = value
    value = prior + v // Add v to the current value
    prior
  }
}
```
A strong RMW (CAS) enables synchronisation between any number of threads:
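On the JVM, a strong RMW is exposed by `java.util.concurrent.atomic`. A minimal sketch mirroring the weak class above (the class name `StrongRMWLocation` is a hypothetical choice, not from the notes):

```scala
import java.util.concurrent.atomic.AtomicInteger

// A strong RMW location built on the JVM's CAS primitive: compareAndSet
// atomically checks the current value against `expected` and, only on a
// match, installs the new value `v`.
class StrongRMWLocation(init: Int) {
  private val value = new AtomicInteger(init)

  def read(): Int = value.get()

  // Returns true iff the update took place (a failed CAS changes nothing).
  def compareAndSet(expected: Int, v: Int): Boolean =
    value.compareAndSet(expected, v)

  // fetchAndAdd implemented as a CAS retry loop: the strong RMW can
  // express the weak RMWs, which is the sense in which it is "strong".
  def fetchAndAdd(v: Int): Int = {
    var prior = value.get()
    while (!value.compareAndSet(prior, prior + v)) prior = value.get()
    prior
  }
}
```

The retry loop is lock-free: any number of threads may race on `compareAndSet`, and at least one always succeeds.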
The sequential consistency model (SC, aka interleaving semantics) is inefficient and expensive to implement, but a good abstraction of concurrency for high-level reasoning. Everything else is a weak memory model, such as hardware models (e.g. x86) and software models (e.g. the Java Memory Model, the C11 standard). These models allow for more optimisations, but are harder to reason about. In SC:
Instructions of each thread are executed in program order (not necessarily true in high-level languages like C).
Instructions of different threads are interleaved arbitrarily.
x, y ∈ Loc denote shared memory locations.
a, b ∈ Reg denote private registers.
E denotes expressions over integers and registers.
a := x denotes a read from memory location x into register a.
x := a denotes a write of register a to memory location x.
a := E denotes an assignment of the value of expression E to register a.
2.1 ConWhile
We assume a language of Boolean and Program expressions that describe thread-local computations. ConWhile has sequential commands:
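A plausible sketch of the grammar, assuming the standard while-language constructs plus the memory accesses introduced above (the exact command set is an assumption):

```latex
C ::= \mathtt{skip}
   \mid a := E
   \mid a := x                               % read
   \mid x := a                               % write
   \mid a := \mathtt{CAS}(x, E_1, E_2)       % compare-and-swap
   \mid a := \mathtt{FAA}(x, E)              % fetch-and-add
   \mid C_1 ; C_2
   \mid \mathtt{if}\ B\ \mathtt{then}\ C_1\ \mathtt{else}\ C_2
   \mid \mathtt{while}\ B\ \mathtt{do}\ C
```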
A concurrent program is a map from thread identifiers to sequential commands: P : Tid → Com. We can use the shorthand notation P = C_1 || ... || C_n for concurrent programs, where C_1 || ... || C_n denotes an n-threaded program with dom(P) = {1, ..., n} and P(i) = C_i.
Example 2.2 Math Syntax
We model the shared memory as a map M : Loc → Val from locations to values, where Val denotes the set of all values, including integers and Booleans. Each thread also has a store of private registers s : Reg → Val. We also define a store map S : Tid → Store, associating each thread τ with a private store S(τ).
An SC configuration is a triple (P, S, M), where P is the program to be executed, S is the store map, and M is the shared memory.
2.2 Operational Semantics
We describe SC operational semantics by separating program transitions from storage / memory transitions. This achieves modularity.
An SC transition label λ may be:
An empty label ε, denoting a silent transition.
A read label R(x, v), denoting reading value v from memory location x.
A write label W(x, v), denoting writing value v to memory location x.
A successful RMW label U(x, v_o, v_n), denoting updating the old value v_o of location x to a new value v_n (e.g. a successful CAS or FAA).
A failed RMW label U×(x, v), denoting a failed CAS instruction, where the old value v in memory does not match the expected value.
2.2.1 Sequential Transitions
These are of the form C, s −λ→ C′, s′, evaluating command C against store s and resulting in a simpler command C′ and a new store s′, with transition label λ.
We assume total functions ⟦B⟧_s and ⟦E⟧_s that evaluate Boolean expressions B and program expressions E in store s.
We add new, special cases for memory accesses. Sequential transitions do not mention the memory, so no memory update takes place here.
The transition for a read a := x assumes an arbitrary old value v in memory location x:
The transition for a := CAS(x, E_1, E_2) may arbitrarily succeed or fail, by assuming that the value of x matches the expected value ⟦E_1⟧_s or not:
Reads and writes do not perform any memory updates; this is done in their respective storage transitions.
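Writing R(x, v) and W(x, v) for read/write labels, U(x, v_o, v_n) and U×(x, v) for successful/failed RMW labels, and s[a ↦ v] for a store update, the memory-access rules can be sketched as follows (a reconstruction, not necessarily the notes' exact notation):

```latex
\frac{}{a := x,\; s \;\xrightarrow{R(x,\,v)}\; \mathtt{skip},\; s[a \mapsto v]}
\qquad
\frac{v = \llbracket a \rrbracket_s}{x := a,\; s \;\xrightarrow{W(x,\,v)}\; \mathtt{skip},\; s}

\frac{v_o = \llbracket E_1 \rrbracket_s \quad v_n = \llbracket E_2 \rrbracket_s}
     {a := \mathtt{CAS}(x, E_1, E_2),\; s \;\xrightarrow{U(x,\,v_o,\,v_n)}\; \mathtt{skip},\; s[a \mapsto 1]}
\qquad
\frac{v \neq \llbracket E_1 \rrbracket_s}
     {a := \mathtt{CAS}(x, E_1, E_2),\; s \;\xrightarrow{U_{\times}(x,\,v)}\; \mathtt{skip},\; s[a \mapsto 0]}
```

The read value v and the CAS outcome are chosen nondeterministically here; the storage transitions later validate them against the actual memory.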
2.2.2 Program Transitions
Sequential transitions are per thread, so we can simply lift them into a program transition: when a thread τ of program P takes a sequential transition with label λ, the whole program takes a transition with label (τ, λ). Program transitions are thus of the form P, S −(τ,λ)→ P′, S′:
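Under this reading, the lifting rule can be sketched as (reconstructed notation, with P[τ ↦ C′] for pointwise map update):

```latex
\frac{P(\tau),\; S(\tau) \;\xrightarrow{\lambda}\; C',\; s'}
     {P,\; S \;\xrightarrow{(\tau,\,\lambda)}\; P[\tau \mapsto C'],\; S[\tau \mapsto s']}
```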
2.2.3 Storage Transitions
Storage transitions are of the form M −(τ,λ)→ M′, updating memory M to M′ when a thread τ executes a step with label λ. Here the choice of τ is arbitrary (arbitrary interleaving!).
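With read/write/RMW labels R(x, v), W(x, v), U(x, v_o, v_n) and U×(x, v), the storage rules can be sketched as (a reconstruction):

```latex
\frac{M(x) = v}{M \;\xrightarrow{(\tau,\,R(x,\,v))}\; M}
\qquad
\frac{}{M \;\xrightarrow{(\tau,\,W(x,\,v))}\; M[x \mapsto v]}

\frac{M(x) = v_o}{M \;\xrightarrow{(\tau,\,U(x,\,v_o,\,v_n))}\; M[x \mapsto v_n]}
\qquad
\frac{M(x) = v}{M \;\xrightarrow{(\tau,\,U_{\times}(x,\,v))}\; M}
```

Note that reads and failed RMWs leave the memory unchanged, while writes and successful RMWs update it; this is where the nondeterministic choices of the sequential transitions are validated.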
2.2.4 Combining Transitions
We can combine program transitions and storage transitions into operational semantics transitions of the form P, S, M → P′, S′, M′:
Here we make sure that the same thread τ updates both the program and the memory with the same label λ.
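One way to sketch the combination (reconstructed notation; ε is the silent label, which does not touch the memory):

```latex
\frac{P,\, S \;\xrightarrow{(\tau,\,\varepsilon)}\; P',\, S'}
     {P,\, S,\, M \;\longrightarrow\; P',\, S',\, M}
\qquad
\frac{P,\, S \;\xrightarrow{(\tau,\,\lambda)}\; P',\, S'
      \quad M \;\xrightarrow{(\tau,\,\lambda)}\; M'
      \quad \lambda \neq \varepsilon}
     {P,\, S,\, M \;\longrightarrow\; P',\, S',\, M'}
```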
2.3 Traces
We can define →* as the reflexive transitive closure of →. Formally, →* ≜ ⋃_{n∈ℕ} →ⁿ, where →⁰ is the identity relation and →ⁿ⁺¹ ≜ →ⁿ ∘ →.
Now we can define:
The initial memory M₀, where every memory location holds 0.
Similarly, the initial store s₀, where every register holds 0.
Similarly, the initial store map S₀, where every thread has the initial store s₀.
The terminated program P_skip, where every thread has terminated.
Given a program P, the initial SC configuration is (P, S₀, M₀).
Given a program P, an SC trace of P is an evaluation path from the initial configuration to the terminated program: (P, S₀, M₀) →* (P_skip, S, M) for some store map S and memory M. Then M is the SC outcome of P.
2.4 Determinacy & Confluence
A program is deterministic if it has exactly one SC trace. A program is confluent if all its SC traces lead to the same outcome. Determinacy implies confluence, but not vice versa.
In general, a program is non-deterministic and not confluent.
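A hypothetical two-thread example (in the notation above) separating the two notions:

```latex
x := 1 \;\|\; x := 1
\qquad\text{vs.}\qquad
x := 1 \;\|\; x := 2
```

The left program is confluent but not deterministic: its two interleavings are distinct traces, yet both yield the outcome M(x) = 1. The right program is not even confluent: depending on which write executes last, the outcome has M(x) = 1 or M(x) = 2.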
3. Total Store Ordering (TSO)
Sequential consistency (interleaving semantics) means instructions of each thread are executed in order, but different threads can be interleaved arbitrarily. Unlike in SC, in real life instructions may be reordered, so we can observe weak behaviours (those not observable under SC).
x86 follows a weak memory model called TSO.
3.1 Write - Read Reordering
Only adjacent instructions may be reordered. TSO allows the following reorderings:
Write-read reordering on different locations: a later read on y can be reordered before an earlier write on x when x ≠ y.
Write-read reordering on the same location: a later read on x can be reordered before an earlier write on x, provided that the value of the write is forwarded to (inlined in) the read.
No other reordering is allowed. Only different locations cause weak behaviours. Essentially, TSO is SC with these read-write reorderings and store buffering.
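The classic store-buffering litmus test illustrates the weak behaviour (a standard example, written in the notation above):

```latex
\begin{array}{l|l}
x := 1; & y := 1; \\
a := y  & b := x
\end{array}
```

Under SC, at least one of the reads must see 1 in every trace. Under TSO, both writes may still be sitting in their threads' store buffers when the reads execute, so the weak outcome a = b = 0 is observable.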
3.2 Store Buffers
Every thread is given a private store buffer that temporarily holds writes before they become visible to other threads. This is a FIFO queue, used to improve performance and reduce load on the data bus. The processor decides how to unbuffer.
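A minimal Scala sketch of this buffering behaviour (a simulation under assumed names like `StoreBuffer` and `flushOne`, not a processor model):

```scala
import scala.collection.mutable

object StoreBufferDemo {
  // Shared memory: location -> value, every location initially 0.
  val memory = mutable.Map[String, Int]().withDefaultValue(0)

  // A per-thread TSO store buffer: a FIFO queue of delayed writes.
  class StoreBuffer {
    private val pending = mutable.Queue[(String, Int)]()

    // A write is buffered, not yet visible to other threads.
    def write(loc: String, v: Int): Unit = pending.enqueue((loc, v))

    // A read first checks the buffer for the latest write to loc,
    // then falls back to shared memory.
    def read(loc: String): Int =
      pending.reverseIterator.collectFirst { case (`loc`, v) => v }
        .getOrElse(memory(loc))

    // Unbuffer the oldest delayed write into memory (FIFO order).
    def flushOne(): Unit =
      if (pending.nonEmpty) { val (loc, v) = pending.dequeue(); memory(loc) = v }

    def isEmpty: Boolean = pending.isEmpty
  }

  def main(args: Array[String]): Unit = {
    val buf = new StoreBuffer
    buf.write("x", 1)
    println(memory("x"))   // other threads still see 0
    println(buf.read("x")) // this thread sees its own buffered write: 1
    buf.flushOne()
    println(memory("x"))   // now globally visible: 1
  }
}
```

The `read` lookup shows why a thread always sees its own latest write even before it reaches memory, which is exactly the same-location write-read forwarding described above.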
3.3 Memory Fencing
To enforce ordering, we can use a special instruction called a memory fence (or memory barrier). This ensures that all preceding memory operations are completed before any subsequent ones begin. It prevents reordering by stalling the thread (bad!) until the store buffer is empty.
Alternatively, we could use an RMW instruction, as RMWs also flush the store buffer and write to memory directly.
3.4 TSO Semantics
We extend ConWhile with the mfence instruction:
Once again, shared memory, store maps and stores are modelled as before:
Each thread now has a private buffer, modelled as a FIFO sequence of delayed write labels: b ∈ Buff ≜ (Loc × Val)*. A buffer entry (x, v) denotes a delayed write on x with value v.
Then we define a buffer map associating each thread with a private buffer: B : Tid → Buff. That is, B(τ) is the buffer of thread τ.
A TSO configuration is a quadruple (P, S, B, M), where P is the program to be executed, S is the store map, B is the buffer map, and M is the shared memory.
As with SC, we separate program and storage transitions. The program transitions are the same as SC, except for the mfence instruction. Storage transitions are different as we have to take buffers into account.
3.4.1 TSO Transition Labels
A TSO transition label λ may be:
An empty label ε, denoting a silent transition.
A read label R(x, v), denoting reading value v from memory location x.
A write label W(x, v), denoting writing value v to memory location x.
A successful RMW label U(x, v_o, v_n), denoting updating the old value v_o of location x to a new value v_n (e.g. a successful CAS or FAA).
A failed RMW label U×(x, v), denoting a failed CAS instruction, where the old value v in memory does not match the expected value.
A memory fence label MF, denoting a memory fence operation.
3.4.2 TSO Sequential Transitions
Sequential transitions are the same as in SC, with the addition of a rule for the mfence instruction:
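The new rule can be sketched as (writing MF for the fence label; reconstructed notation):

```latex
\frac{}{\mathtt{mfence},\; s \;\xrightarrow{MF}\; \mathtt{skip},\; s}
```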
3.4.3 TSO Program Transitions
3.4.4 TSO Storage Transitions
TSO storage transitions are of the form M, B −(τ,λ)→ M′, B′, updating memory M to M′ and buffer map B to B′ when a thread τ executes a step with label λ. Here the choice of τ is arbitrary (arbitrary interleaving!).
A memory write on x is buffered, and so is not immediately visible to other threads. Here · denotes sequence concatenation:
A memory read on x first checks the thread's own buffer for writes on x, returning the value of the latest such write. Otherwise, we read from memory M. We capture the lookup chain with a lookup function:
A memory fence stops reordering by ensuring the buffer is empty:
A successful RMW on x updates the old value v_o to a new value v_n in memory M. The thread's buffer must be empty before the update:
A failed RMW on x does not update memory, but the thread's buffer must still be empty:
At any point in time, the storage system may take a silent step to propagate a delayed write into memory. It does so in FIFO order:
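These storage transitions can be sketched as follows (reconstructed notation: (x, v) is a buffer entry, · is concatenation, ε the empty sequence/silent label, and lookup returns the latest buffered write on x, falling back to M(x)):

```latex
\frac{}{M,\, B \;\xrightarrow{(\tau,\,W(x,\,v))}\; M,\; B[\tau \mapsto B(\tau) \cdot (x,v)]}
\qquad
\frac{\mathit{lookup}(B(\tau), M, x) = v}{M,\, B \;\xrightarrow{(\tau,\,R(x,\,v))}\; M,\, B}

\frac{B(\tau) = \epsilon}{M,\, B \;\xrightarrow{(\tau,\,MF)}\; M,\, B}
\qquad
\frac{B(\tau) = \epsilon \quad M(x) = v_o}{M,\, B \;\xrightarrow{(\tau,\,U(x,\,v_o,\,v_n))}\; M[x \mapsto v_n],\, B}

\frac{B(\tau) = (x,v) \cdot b}{M,\, B \;\xrightarrow{(\tau,\,\varepsilon)}\; M[x \mapsto v],\; B[\tau \mapsto b]}
```

The last rule is the unbuffering step: it fires at an arbitrary moment, which is exactly what makes delayed writes observable as weak behaviour.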
3.4.5 Combining TSO Transitions
As before, when the program takes a silent step, the storage system remains unchanged. Dually, when the storage system takes a silent (propagation) step, the program and store maps remain the same:
Finally, if both the program and storage systems take the same transition, we can combine them into a transition over a TSO configuration:
3.4.6 TSO Traces
As before, we define →* as the reflexive transitive closure of →. Then we define:
Let the initial memory M₀, the initial store map S₀, and the terminated program P_skip be as before. Then:
The initial buffer map B₀, where every thread has an empty buffer.
Given a program P, the initial TSO configuration of P is (P, S₀, B₀, M₀).
Given a program P, a TSO trace of P is an evaluation path from the initial configuration to the terminated program: there exist S and M such that (P, S₀, B₀, M₀) →* (P_skip, S, B₀, M).
Given a program P, a TSO outcome of P is a pair (S, M) such that (P, S₀, B₀, M₀) →* (P_skip, S, B₀, M).
Once again, a TSO program is neither deterministic nor confluent.