If we lack hardware, code or data to perform experiments, we can use modelling. We need to make some assumptions:
Input assumptions (e.g. data follows some known distribution).
System assumptions (e.g. we do not model system noise).
Code assumptions (e.g. code is deterministic).
We can model using two approaches:
Numerical / Experimental Model: a series of datapoints describing the observed behaviour of the system are collected, and a model is fitted to the data. Predictive power depends on interpretation.
Analytical Model is a formal characterization of the relationship between parameters and performance metrics. Difficult to understand by humans. Prediction performed by evaluating the model.
2. Numerical Modelling
Gather data - for each parameter setting, run a microbenchmark (small, specially designed program to test a specific portion of a system) multiple times to get an average performance metric.
Interpret - fit a model to the data (e.g. interpolation).
Example: Memory Subsystem Microbenchmark
1externint* input;2extern size_t N;// Large const3extern size_t stride;// Experiment parameter4int sum =0;5for(size_t i =0; i < N; i += stride)6 sum += input[i];// We benchmark memory accesses
Numerical models are easy to get (if system is available), based on groudn truth & easy to interpret. However, it generalizes poorly (cannot be applied to new environments), requires massive amounts of experimental data, has limited accuracy in prediction (data may be missing/inaccurate), limited interpretability & limited insight.
3. Analytical Modelling
Analytical models require detailed understanding of the system & requires extensive validation (results are always questionable). Often end up very complex to deal with edge cases.
The line between the two is blurry, but analytical models can model behaviours that numerical models cannot. An analytical model has:
A Characteristic Equation (with parameters) that describes the behaviour of the darget metric of the experiment or a system in dependence of a varied parameter.
Values for system parameters (e.g. access latency, granularity, cache sizes).
4. Modelling Memory
Assume known system parameters:
: Size of a General Purpose Register of the CPU
: Access latency of level 1 cache
: Capacity of a General Purpose Register of the CPU
: Size of a cache line in level 1 cache
: Access latency of level 2 cache
: Capacity of level 1 cache
: Size of a cache line in level 2 cache
: Access latency of level 3 cache
: Capacity of level 2 cache
: Size of a memory page
: Lookup time in the page table
: Number of memory pages in the TLB times page size
In general, is the access granularity, is the miss latency and is the capacity of a memory level.
4.1 Access Time
With this we can make a characteristic non linear equation for memory access time. Let be the stride, then:
Alternatively, based on the size of the data accessed, :
Some parameters can be read from documentation, but some require experimentation to find. Self tuning systems require less work, are more resilient, scale forward (work on future architectures) and can be more accurate.
4.2 Access Patterns
We now need to apply these equations to real code. A more complex example:
1externint* input1;// length = 1024, uniform random data2externint* input2;// length = 64, random data3int sum =0;4for(size_t i =0; i < N; i += stride)5 sum += input2[input1[i]];
We can use parameters to capture behaviour of a memory region. We have:
the length (i.e. number of stored tuples).
the width (i.e. size of each tuple in processor words).
the size (i.e. ).
We also have as the number of words read in each access. We can now model different access patterns.
4.2.1 Sequential Access (Non Examinable)
The number of cache misses can be modelled as:
Where denotes sequential access, and is the memory level, and is the sequential traversal access pattern. Essentially:
If the gap between two cache lines is greater than the block size, we have 1 cache miss for every access.
If the gap is less than the block size, we have 1 cache miss for every block.
We can model complex access patterns with two operators:
is the sequential execution of access patterns and .
is the concurrent (interleaved) execution of access patterns and .
For the code above, we can use access patterns and to model the access pattern as:
5. Modelling Stateful Systems
Some effects / componants have dynamic state that influences behaviour & performance.
We could use stochastic modelling as opposed to analytical modelling.
5.1 Markov Chains
A Markov Chain is essentially a finite state machine with transition probabilities. The Markov Property states that the next state depends only on the current state & a random variable, not the history of previous states.
Let's try to model branch prediction:
1externint* input;// uniform rand data2int sum =0;3for(size_t i =0; i < N; i++)4if(input[i]>20)5 sum += input[i];
We can model a saturated counter with 4 states:
: Strongly Not Taken
: Weakly Not Taken
: Weakly Taken
: Strongly Taken
The probability of it being in any state is a stationary distribution. Here, the branch misprediction rate is .
6. Performance Modelling Recipe
Modelling is usually infeasible due to scale & noise. We must:
Identify parts of the code that matter for performance using a profiler (a *bottleneck).
Re-create relevant behaviour in a controlled environment (e.g. microbenchmark).