Introduction
Performance is how well a machine does its work. In SPE we assume the software is functionally correct (bug-free) and care only about how it performs. The quality perceived by the user is often associated with execution time.
HPC focuses on a single, simple problem of high value, supported by special-purpose hardware. SPE focuses on systems built from interacting components. These systems are widely applicable (generic): a system is domain-agnostic (e.g. a DBMS, an OS, Grep, TensorFlow).
A challenge in SPE is building a system that balances maintainability, speed & wide applicability.
1. Systems Performance Engineering
SPE is about fulfilling non-functional requirements. To define an appropriate optimisation level, we pick a target metric (e.g. throughput, latency, scalability); the target is often a trade-off between multiple factors (e.g. speed vs. memory usage). We then either set an optimisation budget (e.g. developer time) or an optimisation target/threshold, also called a quality of service (QoS) objective: the statistical properties of a metric that must hold for the system.
A Service Level Agreement (SLA) is a formal, legal contract specifying QoS objectives as well as penalties for violations.
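A QoS objective is statistical, so checking it means computing a statistic over measurements. A minimal sketch, assuming a hypothetical objective of "99% of requests complete within 200 ms" (the function name, threshold, and sample data are all illustrative):

```python
def p99_within(latencies_ms, threshold_ms=200.0):
    """Return True if the 99th-percentile latency meets the threshold
    (nearest-rank percentile over the observed samples)."""
    ordered = sorted(latencies_ms)
    idx = max(0, int(0.99 * len(ordered)) - 1)  # index of the p99 sample
    return ordered[idx] <= threshold_ms

# Pretend measurements: 100 samples, worst values around 181 ms.
samples = [12.0, 15.3, 180.9, 22.1, 19.8] * 20
print(p99_within(samples))  # True: p99 is 180.9 ms, under the 200 ms objective
```

An SLA would wrap such a check in a contract: if the check fails over the agreed reporting window, a penalty applies.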
1.1 SMART
When defining requirements, be SMART:
- Specific: State exactly what is acceptable in numeric terms.
- Measurable: Make sure it can be quantified and tracked.
- Acceptable: Rigorous enough to guarantee success.
- Realisable: Lenient enough to allow implementation.
- Thorough: Ensure all aspects of a system are specified.
2. Measuring Performance
To measure performance, we have to measure on an actual (possibly prototype) system, which is often costly & based on instrumentation. We can either:
- Monitor: constant monitoring is required to enforce SLAs. Observe system performance, collect statistics, analyse the data & report SLA violations. Monitoring incurs overhead and is often not truly continuous.
- Benchmark: a two-step process: first get the system into a predefined steady state, then perform a series of operations (the workload) while measuring performance. Can happen outside production. A workload can be batch (e.g. a query set), for measuring throughput, or interactive (using a driver that generates requests), for measuring latency.
A single datapoint is too noisy: we need to aggregate multiple runs and report a measure of variance.
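The two-step benchmark plus aggregation can be sketched as follows (the warm-up and run counts are arbitrary choices, not prescribed values):

```python
import statistics
import time

def benchmark(workload, warmup_runs=3, measured_runs=10):
    """Two-step benchmark: warm up into a steady state, then time the
    workload repeatedly and report mean latency plus its variance."""
    for _ in range(warmup_runs):      # step 1: reach a steady state
        workload()
    samples = []
    for _ in range(measured_runs):    # step 2: measure the workload
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

mean_s, stdev_s = benchmark(lambda: sum(range(100_000)))
print(f"mean={mean_s:.6f}s stdev={stdev_s:.6f}s")
```

Reporting the standard deviation alongside the mean is one way to satisfy the "measure of variance" requirement; percentiles are another.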
Alternatively, we can model performance using analytical models or simulations. We could also use a hybrid approach.
3. Identifying Optimisation Opportunities
Parameters (aka resources) are the system and workload characteristics that affect performance. They can be:
- System parameters: do not change while the system runs (e.g. CPU, instruction costs).
- Workload parameters: change while the system runs (e.g. number of users, available memory, clock rate).
Parameters can be numeric or nominal (e.g. runs on a battery, has a GPU).
3.1 Utilization & Bottlenecks
A service has a certain amount of resources available to perform its work. The total budget of available units of a resource is a parameter; utilization is the percentage of that budget used to perform the service.
A bottleneck is the resource with the highest utilisation; the system is said to be bound by it (e.g. CPU-bound).
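Identifying the bottleneck is then a matter of computing utilisation per resource and taking the maximum. A small sketch with hypothetical budgets and usage figures:

```python
# Resource -> (units currently used, total budget). Numbers are made up.
resources = {
    "cpu":     (85, 100),
    "memory":  (40, 64),
    "disk_io": (120, 500),
}

# Utilization = used / budget, per resource.
utils = {name: used / budget for name, (used, budget) in resources.items()}

# The bottleneck is the resource with the highest utilization.
bottleneck = max(utils, key=utils.get)
print(bottleneck, utils[bottleneck])  # cpu 0.85 -> the system is CPU-bound
```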
3.2 Code Paths
Many complex systems have few performance-dominating code paths. By restricting optimisation efforts to these code paths, we can get the most performance gain for the least effort. A critical path is the longest chain of sequentially dependent operations (it cannot be shortened by parallelism). A hot path is the path where the most execution time is spent.
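Hot paths are typically found by profiling. A minimal sketch using Python's built-in cProfile; the functions `handle_request`, `parse`, and `log` are hypothetical stand-ins, with `parse` deliberately doing most of the work:

```python
import cProfile
import io
import pstats

def parse(data):
    # Hypothetical expensive step: dominates the request's runtime.
    return sum(ord(c) for c in data * 200)

def log(data):
    # Hypothetical cheap step.
    return len(data)

def handle_request(data):
    parse(data)
    log(data)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(500):
    handle_request("example request payload")
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # the top entries by cumulative time reveal the hot path
```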
4. Optimisation
The goal is to quickly compare design alternatives to select optimal parameters; this is called parameter tuning. Usually we only change system parameters, minimising resource consumption or maximising a performance metric. Measuring every configuration is very expensive, which can be mitigated with analytical models.
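In its simplest form, parameter tuning is an exhaustive search over system-parameter combinations. A sketch, where the parameter names are hypothetical and `measure` stands in for an expensive benchmark run (here replaced by a toy cost model):

```python
import itertools

def measure(buffer_kb, threads):
    """Stand-in for benchmarking one configuration.
    Toy cost model returning a hypothetical latency in ms (lower is better)."""
    return 100 / threads + buffer_kb * 0.01 + 64 / buffer_kb

# Candidate values per system parameter (the grid).
grid = {"buffer_kb": [64, 256, 1024], "threads": [1, 4, 8]}

# Evaluate every combination and keep the best-performing one.
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: measure(**params),
)
print(best)  # {'buffer_kb': 64, 'threads': 8} under this toy model
```

This is exactly why analytical models help: replacing a real benchmark with a cheap model makes the search affordable.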
An analytical performance model is a formal characterisation of the relationship between system parameters and performance metrics. We must capture a dynamic system in a small static model, which is complicated. Models can be stateless (e.g. characterising equations) or stateful (e.g. Markov chains). They are fast to evaluate, simplifying tuning, but are often inaccurate and hard to build.
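As an example of a stateless characterising equation (not from the text above, but a standard queueing-theory result): the mean response time of an M/M/1 queue is R = S / (1 - U), where S is the service time and U = arrival rate x S is the utilisation.

```python
def mm1_response_time(service_time_s, arrival_rate_per_s):
    """Mean response time R = S / (1 - U) of an M/M/1 queue,
    where U = arrival_rate * service_time. Only valid while U < 1."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("system is unstable: utilization >= 1")
    return service_time_s / (1.0 - utilization)

# 10 ms service time at 50 requests/s -> U = 0.5, so R = 20 ms.
print(mm1_response_time(0.010, 50))  # 0.02
```

Evaluating this formula is instantaneous, whereas benchmarking the same question could take minutes per configuration.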
A simulation is a single observed run of a stateful model. It can be extremely expensive to compute.
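A minimal sketch of such a run, simulating a single-server queue (random exponential arrivals, fixed service time) rather than solving it analytically; all parameter values are illustrative:

```python
import random

def simulate_queue(service_time_s, arrival_rate_per_s, n_requests, seed=42):
    """One observed run of a stateful queue model: returns the mean
    response time over n_requests simulated arrivals."""
    rng = random.Random(seed)
    clock = 0.0           # time of the current arrival
    server_free_at = 0.0  # when the server next becomes idle
    total_response = 0.0
    for _ in range(n_requests):
        clock += rng.expovariate(arrival_rate_per_s)  # next arrival time
        start = max(clock, server_free_at)            # wait if server is busy
        server_free_at = start + service_time_s
        total_response += server_free_at - clock      # waiting + service
    return total_response / n_requests

# One run; repeating with different seeds gives an estimate of variance.
print(simulate_queue(0.010, 50, n_requests=10_000))
```

Note the cost trade-off: each data point requires simulating every request, whereas a characterising equation answers in one evaluation.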