Concurrent Semantics

1. Read Modify Write (RMW)

In shared memory, a memory location holds an arbitrary binary value. We can decide between single/multiple readers/writers, but this is not enough: we need atomic read-modify-write (RMW) instructions. An RMW can be thought of as a function that:
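A minimal sketch of this idea (the class and method names are ours, not from the notes): an RMW atomically reads the current value, applies a function to it, stores the result, and returns the prior value.

```scala
// Hypothetical sketch: RMW as an atomic "apply f to the location" step.
class RMWLocation(init: Int) {
  private var value: Int = init

  def rmw(f: Int => Int): Int = this.synchronized {
    val prior = value
    value = f(prior) // Atomically replace the value with f(value)
    prior            // Return the value that was read
  }
}

val loc = new RMWLocation(10)
assert(loc.rmw(_ + 5) == 10)    // fetch-and-add: returns the old value, stores 15
assert(loc.rmw(identity) == 15) // a read is just RMW with the identity function
```

The operations below (read, exchange, fetch-and-add, CAS) are all instances of this pattern for particular choices of the function.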

A weak RMW supports synchronisation between only 2 threads. For example:

```scala
class WeakRMWLocation(private val init: Int) {
  private var value: Int = init

  def read(): Int = this.synchronized {
    val prior = value
    value = value // Identity function
    prior
  }

  def exchange(v: Int): Int = this.synchronized {
    val prior = value
    value = v // A write, essentially
    prior
  }

  def fetchAndAdd(v: Int): Int = this.synchronized {
    val prior = value
    value = prior + v // Add v to the current value
    prior
  }
}
```

A strong RMW (CAS) enables synchronisation between any number of threads:

```scala
class StrongRMWLocation(private val init: Int) {
  private var value: Int = init

  def compareAndSet(expected: Int, newValue: Int): Boolean = this.synchronized {
    if (value == expected) {
      value = newValue
      true
    } else {
      false
    }
  }
}
```
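To illustrate why CAS synchronises any number of threads, here is a sketch of a spinlock built on `java.util.concurrent.atomic.AtomicInteger` (whose `compareAndSet` plays the role of the class above); the `SpinLock` name and the 0/1 encoding are ours.

```scala
import java.util.concurrent.atomic.AtomicInteger

// A spinlock built from CAS: 0 = unlocked, 1 = locked.
class SpinLock {
  private val state = new AtomicInteger(0)
  def lock(): Unit = while (!state.compareAndSet(0, 1)) {} // spin until our CAS wins
  def unlock(): Unit = state.set(0)
}

// Usage: guard a plain counter across several threads.
val spin = new SpinLock
var counter = 0
val threads = (1 to 4).map { _ =>
  new Thread(() => {
    for (_ <- 1 to 1000) {
      spin.lock()
      counter += 1
      spin.unlock()
    }
  })
}
threads.foreach(_.start())
threads.foreach(_.join())
assert(counter == 4000) // no increments are lost
```

Any number of threads can contend on the same location: exactly one CAS succeeds at a time, which is what makes the strong RMW a universal synchronisation primitive here.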

2. Sequential Consistency

The sequential consistency model (SC, aka interleaving semantics) is inefficient and expensive to implement, but it is a good abstraction of concurrency for high-level reasoning. Everything else is a weak memory model, such as hardware models (e.g. x86) and software models (e.g. the Java Memory Model, the C11 standard). These models allow for more optimisations but are harder to reason about. In SC:

2.1 ConWhile

We assume a language of Boolean and Program expressions that describe thread-local computations. ConWhile has sequential commands:

A concurrent program is a map from thread identifiers to sequential commands: . We can use the shorthand notation for concurrent programs where: denotes an -threaded program with and .

Example 2.2 Math Syntax

We model the shared memory as a map from locations to values where denotes a set of all values including ints and bools. Each thread also has a store of private registers . We also define a store map associating each thread with a private store .

An SC configuration is a triple where is the program to be executed, is the store map, and is the shared memory.

2.2 Operational Semantics

We describe SC operational semantics by separating program transitions from storage / memory transitions. This achieves modularity.

An SC transition label may be:

2.2.1 Sequential Transitions

Of the form which evaluates against store , resulting in simpler command and new store with transition label .

We assume total functions and that evaluate Boolean expressions and program expressions in store .

We add special cases for memory accesses. Program transitions do not involve the memory, so no update takes place at this stage.

Transition for assumes an arbitrary old value in memory location :

Transition for may arbitrarily succeed or fail, by assuming that matches the expected value or not:

and do not perform any updates; this is done in their respective storage transitions.

2.2.2 Program Transitions

Sequential transitions are per thread, so we can just lift them into a program transition. When a thread of program takes a sequential transition with label , the whole program takes a transition with label . So, program transitions are of the form :

2.2.3 Storage Transitions

Storage transitions are of the form , updating memory to when a thread executes with step . Here the choice of is arbitrary (arbitrary interleaving!).

2.2.4 Combining Transitions

We can combine program transitions and storage transitions into operational semantic transitions of the form :

Here we make sure the same thread is updating both the program and the memory with the same label .

2.3 Traces

We can define as the reflexive transitive closure of , formally: , where:

Now we can define:

2.4 Determinacy & Confluence

A program is deterministic if it has exactly one SC trace. A program is confluent if all its SC traces lead to the same outcome. Determinacy implies confluence, but not vice versa.

In general, a program is non-deterministic and not confluent.
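As a concrete illustration (our own example, not from the notes): take the program x := x + 1 || x := x + 2, where each assignment is a single atomic fetch-and-add step. It has two SC traces, so it is not deterministic, yet every trace ends with x = 3, so it is confluent.

```scala
// Each thread contributes one atomic step, modelled as a function on x.
val inc1: Int => Int = _ + 1 // thread 1: x := x + 1
val inc2: Int => Int = _ + 2 // thread 2: x := x + 2

// The two possible interleavings of the two atomic steps.
val traces = List(List(inc1, inc2), List(inc2, inc1))
val outcomes = traces.map(_.foldLeft(0)((x, f) => f(x)))

assert(traces.length == 2)     // more than one trace: not deterministic
assert(outcomes == List(3, 3)) // same outcome on every trace: confluent
```

Had each increment been split into separate read and write steps, interleavings could lose an update and confluence would fail as well.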

3. Total Store Ordering (TSO)

Sequential consistency (interleaving semantics) means the instructions of each thread are executed in order, but different threads can be interleaved arbitrarily. Unlike SC, in real life instructions may be reordered, so we can observe weak behaviours (those not observable in SC).
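The classic store-buffering litmus test makes this concrete: with x = y = 0 initially, run x := 1; r1 := y in one thread and y := 1; r2 := x in another. Under SC the outcome r1 = r2 = 0 is unreachable, which the sketch below (our own names and encoding) checks by enumerating every interleaving:

```scala
// Instructions: a write to a location, or a read into a register.
sealed trait Op
case class Write(loc: String, v: Int) extends Op
case class Read(loc: String, reg: String) extends Op

// All SC interleavings of two per-thread instruction sequences.
def interleavings[A](t1: List[A], t2: List[A]): List[List[A]] = (t1, t2) match {
  case (Nil, ys) => List(ys)
  case (xs, Nil) => List(xs)
  case (x :: xs, y :: ys) =>
    interleavings(xs, y :: ys).map(x :: _) ++ interleavings(x :: xs, ys).map(y :: _)
}

// Run one trace over a single map holding both memory and registers.
def run(trace: List[Op]): Map[String, Int] =
  trace.foldLeft(Map("x" -> 0, "y" -> 0, "r1" -> 0, "r2" -> 0)) {
    case (st, Write(l, v)) => st.updated(l, v)
    case (st, Read(l, r))  => st.updated(r, st(l))
  }

val outcomes = interleavings(
  List(Write("x", 1), Read("y", "r1")),
  List(Write("y", 1), Read("x", "r2"))).map(run)

assert(outcomes.length == 6) // C(4, 2) interleavings of two 2-instruction threads
assert(!outcomes.exists(o => o("r1") == 0 && o("r2") == 0)) // 0/0 unreachable in SC
```

On real x86 hardware this program can end with r1 = r2 = 0, which is exactly the weak behaviour TSO's store buffers introduce.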

x86 follows a weak memory model called TSO.

3.1 Write - Read Reordering

Only adjacent instructions may be reordered. TSO allows the following reordering:

No other reordering is allowed, and only accesses to different locations give rise to weak behaviours. Essentially, TSO is SC plus these write-read reorderings and store buffering.

3.2 Store Buffers

Every thread is given a private store buffer that temporarily holds its writes before they become visible to other threads. This is a FIFO queue that improves performance and reduces load on the data bus. The processor decides when to unbuffer.
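A sketch of this storage system in Scala (the class and method names are ours): a write enqueues a delayed write in the thread's buffer, a read prefers the newest buffered write to that location before falling back to shared memory, and an unbuffer step propagates the oldest delayed write into memory.

```scala
import scala.collection.mutable

// Per-thread FIFO store buffers over a shared memory.
class TsoStorage(locations: String*) {
  val memory: mutable.Map[String, Int] = mutable.Map(locations.map(_ -> 0): _*)
  private val buffers = mutable.Map.empty[Int, mutable.Queue[(String, Int)]]
  private def buf(t: Int) = buffers.getOrElseUpdate(t, mutable.Queue.empty)

  def write(t: Int, loc: String, v: Int): Unit = buf(t).enqueue((loc, v))
  def read(t: Int, loc: String): Int = // newest buffered write wins, else memory
    buf(t).toList.reverse.collectFirst { case (`loc`, v) => v }.getOrElse(memory(loc))
  def unbuffer(t: Int): Unit = { val (l, v) = buf(t).dequeue(); memory(l) = v }
}

// Store buffering: both threads buffer their write, then read the *other*
// location from memory before any write has propagated, so r1 = r2 = 0 is
// observable under TSO (impossible under SC).
val s = new TsoStorage("x", "y")
s.write(1, "x", 1); s.write(2, "y", 1)
val r1 = s.read(1, "y")
val r2 = s.read(2, "x")
assert(r1 == 0 && r2 == 0)
s.unbuffer(1); s.unbuffer(2) // silent steps propagate the delayed writes
assert(s.memory("x") == 1 && s.memory("y") == 1)
```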

3.3 Memory Fencing

To enforce ordering, we can use a special instruction called a memory fence (or memory barrier). It ensures that all preceding memory operations complete before any subsequent ones begin. It prevents reordering by blocking the thread (bad!) until the store buffer is empty.

Alternatively, we can use an RMW instruction, as RMWs also flush the store buffer and write to memory directly.
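Operationally, a fence simply drains the thread's FIFO store buffer into shared memory before the thread may continue. A minimal sketch (names are ours, not from the notes):

```scala
import scala.collection.mutable

// One thread's view: a shared memory and its private FIFO store buffer.
val memory = mutable.Map("x" -> 0, "y" -> 0)
val buffer = mutable.Queue(("x", 1), ("y", 2)) // delayed writes, oldest first

// mfence: propagate every delayed write, in FIFO order, then return.
def mfence(): Unit =
  while (buffer.nonEmpty) { val (loc, v) = buffer.dequeue(); memory(loc) = v }

mfence()
assert(buffer.isEmpty)                       // the buffer is empty after a fence
assert(memory("x") == 1 && memory("y") == 2) // writes landed in FIFO order
```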

3.4 TSO Semantics

We extend ConWhile with the mfence instruction:

Once again, shared memory, store maps and stores are modelled as before:

Each thread now has a private buffer, modelled as a FIFO sequence of delayed write labels. A buffer entry denotes a delayed write on with value :

Then we define a buffer map associating each thread with a private buffer: . That is, is the buffer of thread .

A TSO configuration is a quadruple where is the program to be executed, is the store map, is the buffer map, and is the shared memory.

As with SC, we separate program and storage transitions. The program transitions are the same as SC, except for the mfence instruction. Storage transitions are different as we have to take buffers into account.

3.4.1 TSO Transition Labels

A TSO transition label may be:

3.4.2 TSO Sequential Transitions

Sequential transitions are the same as SC with the addition of the mfence instruction:

3.4.3 TSO Program Transitions

3.4.4 TSO Storage Transitions

TSO storage transitions are of the form , updating memory to and buffer map to when a thread executes with step . Here the choice of is arbitrary (arbitrary interleaving!).

A memory write is buffered, not immediately visible to other threads. Here denotes sequence concatenation:

A memory read on first checks the buffer for writes on , returning the latest write. Otherwise, we read from memory . We capture the lookup chain with :

A memory fence stops reordering by ensuring the buffer is empty:

A successful RMW on updates the old value to a new value in memory . The buffer must be empty before the update:

A failed RMW on does not update memory, but the buffer must be empty:

At any point in time, the storage system may take a silent step to propagate a delayed write into memory. It does so in FIFO order:

3.4.5 Combining TSO Transitions

As before, when the program takes a silent step, the storage system remains unchanged; conversely, when the storage system takes a silent (unbuffering) step, the program and store maps remain the same:

Finally, if both the program and storage systems take the same transition, we can combine them into a transition over a TSO configuration:

3.4.6 TSO Traces

As before, we define as the reflexive transitive closure of . Then we define:

Let the initial memory , initial store map , and the terminated program be as before:

Then:

Once again, a TSO program is neither deterministic nor confluent.
