Dynamic Scheduling

1. Instruction Parallelism

We want to allow instructions behind a stall to proceed. This enables out-of-order execution and allows out-of-order completion. We need to distinguish between when an instruction is issued, begins exeuction and completes execution. In a dynamically scheduled pipeline, all instructions are issued in-order.

Execution order is constrained by data dependencies, name dependencies and control dependencies. Instruction is dependent on instruction if:

RAW HazardWAR HazardWAW Hazard

2. Tomasulo

To enable dynamic scheduling, we need to:

In Tomasulo's design, registers each have a tag that indicates which unit will produce the value. If the tag is empty, the register contains the most recent value. If not, it is waiting for a value to be produced. Each unit is given a reservation station which waits for correct operands to be available and then executes the instruction. When an instruction is issued:

  1. ISSUE: An idle reservation station is selected. If the operands are stored at valid registers (no tag), set them. Set the destination register's tag to the reservation station's ID. Overwrite if necessary.
  2. EXECUTE: If an operand becomes available on the control bus with a matching tag, set it. When all the operands are available, execute the instruction.
  3. WRITE RESULT: When the result is available, broadcast it and the reservation station's ID on the control bus. This will update the registers and waiting reservation stations.

We have two databuses:

This design is quite complicated, having many associative stores at high speed. Performance is limited by the common DB, due to its high wiring density. Additionally, we can only handle one completion per cycle. Additionally, it has non precise interrupts.

Power of Tomasulo

Tomasulo's scheme can even execute loop iterations in parallel. It effectively implements register renaming with the CDB! It is building the data flow dependency graph on the fly.

2.1 Reorder Buffer

However, Tomasulo's scheme doesn't handle exceptions such as page faults & divide by zero (we don't know when the program errored). We also can't rollback after a branch misprediction, as instructions may have already written their results. To fix these, we can add a commit stage to the Tomasulo pipeline, and maintain a reorder buffer:

During commit, if the prediction does not match the actual result in ROB, we flush the ROB, reset issue-side registers to committed registers, and restart fetching from the correct address. However:

2.2 Speculative Stores & Loads

We need to make sure stores aren't sent to memory until the instruction is committed. But, we need to make sure loads on the same location still work.

Store to Load Forwarding is when we have an uncommitted store followed by a load to the same address. The load can get the value from the store buffer instead of memory. The problem is that real computers use computed addresses which means we don't know an address until execution. We could speculate that the load and store addresses are different, and if they turn out to be the same, we flush and restart.

2.3 Register Update Unit

Register Update Units (RUUs) combine instruction tracking and register updates in one structure for out-of-order execution. Reorder Buffers (ROBs), by contrast, store results separately to ensure in-order retirement and precise exceptions. RUUs integrate more functions, while ROBs simplify commit handling.

In ROB, registers and ROB entries have a tag, so every register, ROB entry and reservation station needs a comparator to monitor the databus. In RUU, the tags are the ROB entry numbers, so the ROB is indexed by the tag on the databus.

Back to Home