Dynamic Datarace Detection
1. Dynamic Datarace Detection
How do tools like ThreadSanitizer work? Consider, for example, some buggy code:
```cpp
static void ThreadBody(int &x, int &y, int &z, std::mutex &m) {
    m.lock();
    x = x + 1;
    m.unlock();

    m.lock();
    y = y + 1;
    m.unlock();

    z = z + 1; // SHOULD BE PROTECTED
}
```
Although this program does have a datarace, there is no room for the compiler to exploit undefined behavior, and the chance of concurrent threads stepping on each other's toes is low. This means some dataraces are very difficult to find without a thread sanitizer.
To sanitize, the compiler has to instrument every read and write that may target shared memory (local variables whose address never escapes can be skipped). It will also record:
- Thread launch / join.
- Mutex acquire / release.
- Atomic operations.
Each library function updates the dynamic race detection state. This state allows conditions for data races to be checked as a program runs. The library functions must be mutually exclusive to avoid data races in the race detection state itself. This leads to significant runtime overhead.
2. Vector Clock Algorithm
Suppose:

- There are $n$ threads with IDs $t \in Tid = \{1, \dots, n\}$.
- Synchronisation is done with a set of locks, $m \in Lock$.
- $Loc$ is a set of possibly shared memory locations $x$ where races can occur.
- Each thread keeps track of a logical clock: a non-negative integer incremented each time the thread releases a mutex.
- A vector clock is a mapping $V : Tid \to \mathbb{N}$, one entry per thread $t$. $V(t)$ is the logical clock for thread $t$. $VC$ is the set of all vector clocks.
- The bottom vector clock is $\bot_V = \lambda t.\ 0$.
- There is a partial order on vector clocks: $V_1 \sqsubseteq V_2 \iff \forall t.\ V_1(t) \le V_2(t)$.
- To join vector clocks: $V_1 \sqcup V_2 = \lambda t.\ \max(V_1(t), V_2(t))$.
- An increment function: $inc_t(V) = \lambda u.\ \textrm{if}\ u = t\ \textrm{then}\ V(u) + 1\ \textrm{else}\ V(u)$.
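As a concrete illustration, these operations can be sketched in C++. All names here (`VectorClock`, `bottom`, `leq`, `join`, `inc`) are illustrative, not taken from any real sanitizer's source:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using VectorClock = std::vector<unsigned>;  // index = thread ID

// The bottom vector clock: every entry is 0.
VectorClock bottom(std::size_t num_threads) {
    return VectorClock(num_threads, 0);
}

// Partial order: v1 is below v2 iff every entry of v1 is <= v2's.
bool leq(const VectorClock &v1, const VectorClock &v2) {
    for (std::size_t t = 0; t < v1.size(); ++t)
        if (v1[t] > v2[t]) return false;
    return true;
}

// Join: entry-wise maximum of the two clocks.
VectorClock join(const VectorClock &v1, const VectorClock &v2) {
    VectorClock out(v1.size());
    for (std::size_t t = 0; t < v1.size(); ++t)
        out[t] = std::max(v1[t], v2[t]);
    return out;
}

// inc_t: bump only thread t's own entry.
void inc(VectorClock &v, std::size_t t) { v[t] += 1; }
```

Note that `leq` is only a partial order: two clocks like $\{1, 0\}$ and $\{0, 1\}$ are incomparable, which is exactly the situation where neither access is ordered before the other.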
2.1 Algorithm State
The algorithm state is given by $\sigma = (C, L, R, W)$ where:

- $C : Tid \to VC$ is a mapping of threads to their vector clocks.
- $L : Lock \to VC$ is a mapping of locks to their vector clocks.
- $R : Loc \to VC$ is a mapping of memory locations to the vector clock of the last read.
- $W : Loc \to VC$ is a mapping of memory locations to the vector clock of the last write.
2.1.1 Per-Thread Clocks
A thread's clock $C_t$ describes what thread $t$ knows about every thread's logical clock:

- $C_t(t)$ is $t$'s own logical clock, which is always positive.
- For $u \ne t$, $C_t(u) = c$ means $t$ knows $u$'s logical clock is at least $c$.

When thread $t$ acquires a mutex, $C_t$ is updated with the knowledge stored in that mutex's clock (section 2.4.2).
2.1.2 Per-Lock Clocks
A lock's clock $L_m$ is a snapshot of the clock of the thread that last released $m$, taken at the moment of the release.
2.1.3 Per-Location Clocks
A location's read clock $R_x$ records, for each thread $t$, the logical clock $R_x(t)$ that $t$ had when it last read from $x$.

A location's write clock $W_x$ records, for each thread $t$, the logical clock $W_x(t)$ that $t$ had when it last wrote to $x$.
Example
- Suppose a thread $t$ reads from $x$: $rd(t, x)$.
- Recall the partial order on vector clocks: $V_1 \sqsubseteq V_2 \iff \forall u.\ V_1(u) \le V_2(u)$.
- If $W_x \not\sqsubseteq C_t$, then a datarace is detected: some thread $u$ wrote to $x$ without $t$ knowing about it, i.e. $W_x(u) > C_t(u)$. This only happens when $t$ and $u$ have not synchronised and a write-read race occurs.
2.2 Intercepted Operations
Every read/write to/from possibly shared memory is intercepted:
- $rd(t, x)$: thread $t$ reads from location $x$.
- $wr(t, x)$: thread $t$ writes to location $x$.
Every lock acquire/release is intercepted:
- $acq(t, m)$: thread $t$ acquires mutex $m$.
- $rel(t, m)$: thread $t$ releases mutex $m$.
2.3 Initial State
Initially, we set:
- $C_t = inc_t(\bot_V)$: each thread knows its own logical clock is $1$ and thinks all other threads' logical clocks are $0$.
- $L_m = \bot_V$: no mutex has been released.
- $R_x = \bot_V$: no location has been read.
- $W_x = \bot_V$: no location has been written.
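A minimal sketch of constructing this initial state, assuming the fixed-size `std::vector<unsigned>` representation from earlier (the function name is illustrative):

```cpp
#include <cstddef>
#include <vector>

using VectorClock = std::vector<unsigned>;

// Each thread's initial clock is inc_t(bottom): 1 in its own slot
// and 0 everywhere else. Lock, read, and write clocks all start at
// bottom, i.e. all-zero vectors.
std::vector<VectorClock> initial_thread_clocks(std::size_t n) {
    std::vector<VectorClock> C(n, VectorClock(n, 0));  // all bottom
    for (std::size_t t = 0; t < n; ++t)
        C[t][t] = 1;  // each thread starts its own logical clock at 1
    return C;
}
```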
2.4 Operation Semantics
2.4.1 Shared Memory Rules
The $rd(t, x)$ operation records the read of $x$ at thread $t$'s current logical clock: $R_x := R_x[t \mapsto C_t(t)]$.

The $wr(t, x)$ operation records the write to $x$ at thread $t$'s current logical clock: $W_x := W_x[t \mapsto C_t(t)]$.
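These rules, combined with the race checks of section 2.4.3, can be sketched as hypothetical read/write handlers. Thread and location IDs are plain `int`s and all names are illustrative:

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

using VectorClock = std::vector<unsigned>;

static bool leq(const VectorClock &a, const VectorClock &b) {
    for (std::size_t t = 0; t < a.size(); ++t)
        if (a[t] > b[t]) return false;
    return true;
}

struct State {
    std::map<int, VectorClock> C;  // per-thread clocks
    std::map<int, VectorClock> R;  // last-read clocks per location
    std::map<int, VectorClock> W;  // last-write clocks per location
};

// rd(t, x): a race if some write to x is not known to t; otherwise
// record the read at t's current logical clock.
void on_read(State &s, int t, int x) {
    if (!leq(s.W[x], s.C[t]))
        throw std::runtime_error("write-read race");
    s.R[x][t] = s.C[t][t];
}

// wr(t, x): a race if some earlier write or read of x is not known
// to t; otherwise record the write at t's current logical clock.
void on_write(State &s, int t, int x) {
    if (!leq(s.W[x], s.C[t]) || !leq(s.R[x], s.C[t]))
        throw std::runtime_error("write-write or read-write race");
    s.W[x][t] = s.C[t][t];
}
```

A real detector reports the race rather than throwing, but an exception keeps the sketch short.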
2.4.2 Lock Rules
The thread $t$ performing the operation updates its own clock and the lock's clock as follows:

- $acq(t, m)$: $C_t := C_t \sqcup L_m$.
- $rel(t, m)$: $L_m := C_t$, then $C_t := inc_t(C_t)$.

With $acq(t, m)$, thread $t$ learns everything the last releaser of $m$ knew. For the release:

- With $L_m := C_t$, thread $t$ makes the world aware of its logical clock and the latest values $t$ knows about other threads. When another thread acquires the lock, it will learn this information.
- With $C_t := inc_t(C_t)$, thread $t$ increments its own logical clock since it is performing a release operation.
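The lock rules can be sketched as hypothetical acquire/release handlers, again with plain `int` IDs and illustrative names:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

using VectorClock = std::vector<unsigned>;

struct SyncState {
    std::map<int, VectorClock> C;  // per-thread clocks
    std::map<int, VectorClock> L;  // per-lock clocks
};

// acq(t, m): C_t := join(C_t, L_m). Thread t learns everything the
// last releaser of m knew.
void on_acquire(SyncState &s, int t, int m) {
    VectorClock &ct = s.C[t];
    const VectorClock &lm = s.L[m];
    for (std::size_t u = 0; u < ct.size() && u < lm.size(); ++u)
        ct[u] = std::max(ct[u], lm[u]);
}

// rel(t, m): L_m := C_t, then C_t := inc_t(C_t). Publish t's
// knowledge into the lock, then advance t's own logical clock so
// work after the release is distinguishable from work before it.
void on_release(SyncState &s, int t, int m) {
    s.L[m] = s.C[t];
    s.C[t][t] += 1;
}
```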
2.4.3 Datarace Rules
If a thread $t$ reads from $x$ while $W_x \not\sqsubseteq C_t$, a write-read race is reported: some thread wrote to $x$ without synchronising with $t$.

If a thread $t$ writes to $x$ while $W_x \not\sqsubseteq C_t$ or $R_x \not\sqsubseteq C_t$, a write-write or read-write race (respectively) is reported.
2.5 Correctness
Only $t$ ever knows its own current logical clock: for all $u \ne t$, $C_u(t) < C_t(t)$.

- Initially, $C_t = inc_t(\bot_V)$ so $C_t(t) = 1$ and $C_u(t) = 0$.
- Subsequently, $t$ only reveals its logical clock when it releases a lock, but immediately increments it afterwards. Thus, no other thread can ever know $t$'s current logical clock.
Suppose the algorithm reports no race when thread $t$ reads from $x$ after a write by thread $u$, i.e. the check $W_x \sqsubseteq C_t$ passes. This can only happen in two cases:

- The accesses are by the same thread. If $u = t$, then $W_x(t) > C_t(t)$ is impossible since $t$ always knows its own logical clock.
- The accesses are ordered by synchronisation. Assume wlog that there are no interleaving operations. Then:
| Operation | $C_u$ | $C_t$ | $L_m$ | $W_x$ |
|---|---|---|---|---|
| Initial | $V_u$ | $V_t$ | $\bot_V$ | $\bot_V$ |
| $wr(u, x)$ | $V_u$ | $V_t$ | $\bot_V$ | $\bot_V[u \mapsto V_u(u)]$ |
| $rel(u, m)$ | $inc_u(V_u)$ | $V_t$ | $V_u$ | $\bot_V[u \mapsto V_u(u)]$ |
| $acq(t, m)$ | $inc_u(V_u)$ | $V_t \sqcup V_u$ | $V_u$ | $\bot_V[u \mapsto V_u(u)]$ |
| $rd(t, x)$ | $inc_u(V_u)$ | $V_t \sqcup V_u$ | $V_u$ | $\bot_V[u \mapsto V_u(u)]$ |
Since $C_t = V_t \sqcup V_u$ by the time the read is checked, we know that $W_x(u) = V_u(u) \le (V_t \sqcup V_u)(u) = C_t(u)$, and $W_x$ is $0$ in every other entry, so $W_x \sqsubseteq C_t$ holds and no race is reported.
2.6 Problems
Space Inefficient - requires a vector clock, with one entry per thread, for every possible memory location. We can optimise this by only tracking who last updated a memory location, and their logical clock at the time - this is enough.
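A sketch of that "last access only" optimisation, the idea behind FastTrack-style epochs: instead of a full vector clock per location, store only the (thread, clock) pair of the most recent write. The names here are illustrative:

```cpp
#include <vector>

using VectorClock = std::vector<unsigned>;

// An epoch is a single component of a vector clock: which thread
// performed the access, and its logical clock at the time.
struct Epoch {
    unsigned thread;
    unsigned clock;
};

// The recorded access happened before the current one iff the
// current thread's clock already covers the recorded epoch.
bool epoch_leq(const Epoch &e, const VectorClock &v) {
    return e.clock <= v[e.thread];
}
```

This shrinks per-location state from $O(n)$ to $O(1)$, at the cost of a fallback to full vector clocks when, for example, several threads read a location concurrently.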