## Lecture 4: Advanced Pipelines

Data hazards, control hazards, multi-cycle in-order pipelines (Appendix C.4-C.8)

# A 5-Stage Pipeline



Source: H&P textbook



- Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazards: fetch cannot continue because it does not know the outcome of an earlier branch. This is a special case of a data hazard, but it gets its own category because it is handled in different ways

## **Data Hazards**



# Bypassing



Some data hazard stalls can be eliminated: bypassing



# **Pipeline Implementation**

- Control signals for the muxes have to be generated; some of this can happen during ID
- Lookup tables are needed to identify the situations that merit bypassing or stalling, and the number of inputs to the muxes goes up



© 2007 Elsevier, Inc. All rights reserved.

# **Detecting Control Signals**

| Situation | Example code | Action |
|-----------|--------------|--------|
| No dependence | LD **R1**, 45(R2)<br>DADD R5, R6, R7<br>DSUB R8, R6, R7<br>OR R9, R6, R7 | No hazards |
| Dependence requiring stall | LD **R1**, 45(R2)<br>DADD R5, **R1**, R7<br>DSUB R8, R6, R7<br>OR R9, R6, R7 | Detect use of R1 during ID of DADD and stall |
| Dependence overcome by forwarding | LD **R1**, 45(R2)<br>DADD R5, R6, R7<br>DSUB R8, **R1**, R7<br>OR R9, R6, R7 | Detect use of R1 during ID of DSUB and set the mux control signal that accepts the result from the bypass path |
| Dependence with accesses in order | LD **R1**, 45(R2)<br>DADD R5, R6, R7<br>DSUB R8, R6, R7<br>OR R9, **R1**, R7 | No action required |
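The detection rules in this table can be sketched as a small Python model. It is a minimal sketch, assuming a classic 5-stage pipeline with full bypassing and a register file that is written in the first half of a cycle and read in the second half; the `Instr` class, the `id_action` function and the `LOADS` set are hypothetical names introduced for illustration, and distances are measured in program order without modeling the bubble a stall inserts. ALU results can always be forwarded, but a load's value exists only after MEM, so only the immediately following consumer has to stall.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    op: str                # mnemonic, e.g. "LD" or "DADD"
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


LOADS = {"LD"}  # assumption: only LD delivers its result at the end of MEM


def id_action(prog: List[Instr], i: int) -> str:
    """Return 'stall', 'forward', or 'none' for prog[i] while it is in ID."""
    action = "none"
    for src in prog[i].srcs:
        # Only the two previous instructions can still be ahead of the register
        # file write; the one 3 slots back is in WB during this ID, and the
        # split-cycle register file (write first half, read second) covers it.
        for dist in (1, 2):
            j = i - dist
            if j < 0:
                break
            if prog[j].dest == src:       # nearest earlier writer of src
                if prog[j].op in LOADS and dist == 1:
                    return "stall"        # load-use: value not ready until after MEM
                action = "forward"        # set mux control to take the bypass path
                break
    return action


LD = Instr("LD", "R1", ("R2",))
PROGRAMS = {
    "no dependence": [LD, Instr("DADD", "R5", ("R6", "R7")),
                      Instr("DSUB", "R8", ("R6", "R7")), Instr("OR", "R9", ("R6", "R7"))],
    "requires stall": [LD, Instr("DADD", "R5", ("R1", "R7")),
                       Instr("DSUB", "R8", ("R6", "R7")), Instr("OR", "R9", ("R6", "R7"))],
    "forwarding": [LD, Instr("DADD", "R5", ("R6", "R7")),
                   Instr("DSUB", "R8", ("R1", "R7")), Instr("OR", "R9", ("R6", "R7"))],
    "in-order access": [LD, Instr("DADD", "R5", ("R6", "R7")),
                        Instr("DSUB", "R8", ("R6", "R7")), Instr("OR", "R9", ("R1", "R7"))],
}

for name, prog in PROGRAMS.items():
    print(name, [id_action(prog, i) for i in range(1, len(prog))])
```

Each row of the table maps to one entry in `PROGRAMS`; the printed lists give the ID-stage action for DADD, DSUB and OR in turn.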















For the 5-stage pipeline, bypassing can eliminate delays between the following example pairs of instructions:

| Producer | Consumer |
|----------|----------|
| add/sub R1, R2, R3 | add/sub/lw/sw R4, R1, R5 |
| lw R1, 8(R2) | sw R1, 4(R3) |

The following pairs of instructions will have intermediate stalls, even with bypassing:

| Producer | Consumer |
|----------|----------|
| lw R1, 8(R2) | add/sub/lw R3, R1, … or sw R3, 8(R1) |
| fmul F1, F2, F3 | fadd …, F1, F4 |
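One way to see why some integer pairs need no stall and others do is to compare the cycle right after the producer's result is ready with the cycle in which the consumer needs it. This sketch assumes full bypassing, no other stalls, and stage numbers IF=1 through WB=5; `stall_cycles` is a hypothetical helper. It covers only the integer pairs, since the fmul/fadd stall depends on the FP multiply latency.

```python
# Stage numbers in the classic 5-stage pipeline (IF=1 ... WB=5).
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def stall_cycles(ready_after: int, needed_at: int, distance: int = 1) -> int:
    """Bubbles needed between a producer and a consumer `distance` slots later.

    Assumptions (sketch): full bypassing, so a value can be used in the cycle
    right after the producer finishes stage `ready_after`; the consumer needs
    it at the start of stage `needed_at`.
    """
    # With no stalls, the producer (slot 0) is in stage s during cycle s,
    # and the consumer (slot `distance`) is in stage s during cycle distance + s.
    earliest_use = ready_after + 1
    scheduled_use = distance + needed_at
    return max(0, earliest_use - scheduled_use)


examples = [
    ("add/sub R1,R2,R3 -> add/sub R4,R1,R5", stall_cycles(EX, EX)),
    ("lw R1,8(R2) -> sw R1,4(R3) (store data)", stall_cycles(MEM, MEM)),
    ("lw R1,8(R2) -> add/sub reading R1", stall_cycles(MEM, EX)),
    ("lw R1,8(R2) -> sw R3,8(R1) (address)", stall_cycles(MEM, EX)),
]
for pair, bubbles in examples:
    print(f"{pair}: {bubbles} stall cycle(s)")
```

The store-data case avoids a stall only because the stored value is not needed until MEM; when R1 is the store's address base, it is needed in EX and the one-cycle load-use stall returns.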

- Simple techniques to handle control hazard stalls:
  - for every branch, introduce a stall cycle (note: roughly every 6th instruction is a branch!)
  - assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware is needed to cancel the effect of the wrong-path instruction
  - fetch the next instruction (the branch delay slot) and execute it anyway: if the instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
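The cost of these three schemes can be compared with a back-of-the-envelope CPI calculation. This is a minimal sketch, assuming an otherwise ideal pipeline (CPI = 1), a one-cycle branch penalty, and the one-branch-in-six frequency from the slide; the taken fraction (`TAKEN`) and the fraction of delay slots that end up holding only a NOP (`EMPTY_SLOTS`) are hypothetical values, not measurements, and `cpi_with_branch_stalls` is a hypothetical helper.

```python
def cpi_with_branch_stalls(branch_freq: float, penalty: int,
                           penalized_fraction: float) -> float:
    """CPI of an otherwise ideal (CPI = 1) pipeline that loses `penalty`
    cycles on the given fraction of branches."""
    return 1.0 + branch_freq * penalty * penalized_fraction


BRANCH_FREQ = 1 / 6   # from the slide: about every 6th instruction is a branch
PENALTY = 1           # assumption: one bubble per penalized branch
TAKEN = 0.6           # hypothetical fraction of branches that are taken
EMPTY_SLOTS = 0.5     # hypothetical fraction of delay slots holding only a NOP

schemes = {
    # Every branch pays the bubble.
    "stall on every branch": cpi_with_branch_stalls(BRANCH_FREQ, PENALTY, 1.0),
    # Only taken branches squash the wrong-path instruction.
    "predict not taken": cpi_with_branch_stalls(BRANCH_FREQ, PENALTY, TAKEN),
    # Only delay slots the compiler could not fill with useful work are wasted.
    "branch delay slot": cpi_with_branch_stalls(BRANCH_FREQ, PENALTY, EMPTY_SLOTS),
}
for name, cpi in schemes.items():
    print(f"{name}: CPI = {cpi:.3f}")
```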

### **Branch Delay Slots**




- Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ≈ number of instructions)
  → speedup = increase in clock speed = number of pipeline stages
- With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes
- Total cycles = number of instructions + stall cycles
- Slowdown because of stalls: performance drops to 1/(1 + stall cycles per instruction) of the stall-free pipeline
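These relations can be turned into a small calculator. This sketch assumes the clock speeds up by exactly the number of stages (no latch overhead), an ideal CPI of 1, and ignores the cycles needed to fill the pipeline; the function names and the 1000-instruction, 250-stall-cycle program are hypothetical.

```python
def total_cycles(num_instructions: int, stall_cycles: int) -> int:
    """Cycles to run a program, ignoring the few cycles needed to fill the pipeline."""
    return num_instructions + stall_cycles


def speedup_over_unpipelined(depth: int, stall_cycles_per_instr: float) -> float:
    """Ideal speedup is `depth` (clock is `depth` times faster, one instruction
    per cycle); stalls scale it by 1 / (1 + stall cycles per instruction)."""
    return depth / (1.0 + stall_cycles_per_instr)


n, stalls = 1000, 250              # hypothetical program
cycles = total_cycles(n, stalls)
print(cycles, speedup_over_unpipelined(5, stalls / n))
```

With 0.25 stall cycles per instruction, a 5-stage pipeline keeps only 1/1.25 of its ideal 5x speedup.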

# **Pipelining Limits**



Pipelining logic with total delay $T$, where every stage adds a fixed overhead $T_{ovh}$ (e.g., latch delay):

| Pipeline stages | Gap between indep instrs | Gap between dep instrs |
|-----------------|--------------------------|------------------------|
| 1 (unpipelined) | $T + T_{ovh}$ | $T + T_{ovh}$ |
| 3 | $T/3 + T_{ovh}$ | $T + 3T_{ovh}$ |
| 6 | $T/6 + T_{ovh}$ | $T + 6T_{ovh}$ |

Assume that there is a dependence where the final result of the first instruction is required before starting the second instruction.
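The pattern in these rows extends to $n$ stages: the gap between independent instructions is $T/n + T_{ovh}$, while a dependent instruction must wait for the full latency $T + nT_{ovh}$. A sketch with hypothetical delays ($T = 6$, $T_{ovh} = 1$, arbitrary time units) makes the trade-off concrete; `issue_gaps` is a hypothetical helper.

```python
from typing import Tuple


def issue_gaps(T: float, T_ovh: float, stages: int) -> Tuple[float, float]:
    """(gap between independent instrs, gap between dependent instrs) for
    logic of total delay T split into `stages` equal stages, each paying T_ovh."""
    independent = T / stages + T_ovh   # one cycle of the pipelined clock
    dependent = T + stages * T_ovh     # full latency through every stage
    return independent, dependent


T, T_OVH = 6.0, 1.0   # hypothetical delays, arbitrary units
for n in (1, 3, 6):
    indep, dep = issue_gaps(T, T_OVH, n)
    print(f"{n} stage(s): independent gap {indep}, dependent gap {dep}")
```

Deeper pipelines shrink the gap for independent instructions (7 → 3 → 2 here) but stretch it for dependent ones (7 → 9 → 12), which is why dependences limit how far pipelining pays off.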

# **Multicycle Instructions**




| Functional unit | Latency (cycles) | Initiation interval (cycles) |
|-----------------|------------------|------------------------------|
| Integer ALU     | 1                | 1                            |
| Data memory     | 2                | 1                            |
| FP add          | 4                | 1                            |
| FP multiply     | 7                | 1                            |
| FP divide       | 25               | 25                           |
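A rough way to read this table: assuming full forwarding and that "latency" counts the cycles an operation spends in its unit, a dependent instruction issued right after the producer stalls for latency - 1 cycles, and an independent instruction that needs the same non-pipelined unit stalls for initiation interval - 1 cycles. Under this reading the integer ALU gives no stall and a load gives the one-cycle load-use stall of the 5-stage pipeline. (The H&P version of this table lists latency as the number of intervening cycles, i.e., one less than the values here.) The `UNITS` dict and helper names are hypothetical.

```python
# (latency, initiation interval) in cycles, copied from the table above.
UNITS = {
    "int_alu": (1, 1),
    "data_mem": (2, 1),
    "fp_add": (4, 1),
    "fp_mul": (7, 1),
    "fp_div": (25, 25),
}


def raw_stalls(producer_unit: str) -> int:
    """Stalls for a dependent instruction issued right after the producer.

    Assumes full forwarding and that latency counts the cycles spent in the
    unit, so the consumer's first execute cycle waits latency - 1 extra cycles.
    """
    latency, _ = UNITS[producer_unit]
    return latency - 1


def structural_stalls(unit: str) -> int:
    """Stalls for an independent instruction issued right after another one
    that uses the same unit (non-zero only if the unit is not fully pipelined)."""
    _, interval = UNITS[unit]
    return interval - 1


print("fmul -> dependent fadd:", raw_stalls("fp_mul"))
print("lw -> dependent add:", raw_stalls("data_mem"))
print("fdiv -> independent fdiv:", structural_stalls("fp_div"))
```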

# Effects of Multicycle Instructions

- Structural hazards if the unit is not fully pipelined (divider)
- More frequent RAW hazard stalls, because results take longer to produce
- Potentially multiple writes to the register file in a cycle
- WAW hazards because of out-of-order instruction completion
- Imprecise exceptions because of out-of-order instruction completion

Note: we can also increase the "width" of the processor so that it handles multiple instructions at the same time, for example: fetch two instructions, read registers for both, execute both, etc.

### On an exception:

- must save PC of instruction where program must resume
- All instructions after that PC that might be in the pipeline must be converted to NOPs (other instructions continue to execute and may raise exceptions of their own)
- temporary program state not in memory (in other words, registers) has to be stored in memory
- potential problems if a later instruction has already modified memory or registers
- A processor that fulfils all the above conditions is said to provide precise exceptions (useful for debugging and of course, correctness)

- Multiple writes to the register file: increase the number of write ports, stall one of the writers during ID, or stall one of the writers during WB (the stall will propagate)
- WAW hazards: detect the hazard during ID and stall the later instruction
- Imprecise exceptions: buffer the results if they complete early, or save more pipeline state so that execution can resume from exactly the state it left
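The two ID-stage checks above can be sketched as a small scheduler. Before an instruction leaves ID, it checks whether the instruction's write-back cycle collides with an already-scheduled register-file write (structural hazard) and whether an earlier instruction will write the same destination in the same cycle or later (WAW hazard). This is a minimal model, assuming each instruction's write-back cycle is known when it issues (`cycles_to_wb` is a hypothetical parameter); in hardware this bookkeeping could be a shift register, and a dict keyed by cycle stands in for it here. `WriteScheduler` and `try_issue` are hypothetical names.

```python
from typing import Dict


class WriteScheduler:
    """Sketch of ID-stage checks for a pipeline whose units have different
    latencies: write-port conflicts and WAW hazards."""

    def __init__(self, write_ports: int = 1):
        self.write_ports = write_ports
        self.port_use: Dict[int, int] = {}    # cycle -> writes scheduled in that cycle
        self.last_write: Dict[str, int] = {}  # register -> latest scheduled write cycle

    def try_issue(self, now: int, dest: str, cycles_to_wb: int) -> bool:
        """Reserve resources and return True if the instruction can leave ID in
        cycle `now`; return False if it must stall in ID this cycle."""
        wb = now + cycles_to_wb
        if self.port_use.get(wb, 0) >= self.write_ports:
            return False  # structural hazard: write port already taken that cycle
        if self.last_write.get(dest, -1) >= wb:
            return False  # WAW: an earlier instruction writes dest at or after wb
        self.port_use[wb] = self.port_use.get(wb, 0) + 1
        self.last_write[dest] = wb
        return True


sched = WriteScheduler(write_ports=1)
results = [
    sched.try_issue(0, "F0", 8),  # long-latency op writes F0 in cycle 8
    sched.try_issue(1, "F0", 3),  # would write F0 in cycle 4, before it: WAW, stall
    sched.try_issue(5, "F2", 3),  # would write in cycle 8: port busy, stall
    sched.try_issue(6, "F2", 3),  # retried one cycle later: writes in cycle 9
]
print(results)
```

Resolving both hazards in ID keeps the logic in one place; stalling in WB instead would require the stall to propagate back through the pipeline.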


