# Lecture: Pipelining Hazards

- Topics: Basic pipelining implementation, hazards, bypassing
- HW2 posted, due Wednesday

- For the following code sequence, show how the instructions flow through the pipeline:
  - ADD R1, R2 → R3
  - BEZ R4, [R5]
  - LD [R6] → R7
  - ST [R8] ← R9







 Convert this C code into equivalent RISC assembly instructions

```
a[i] = b[i] + c[i];
```

| Instruction | Comment |
|---|---|
| `LD [R1], R2` | R1 has the address for variable i |
| `MUL R2, 8, R3` | The offset from the start of the array |
| `ADD R4, R3, R7` | R4 has the address of a[0] |
| `ADD R5, R3, R8` | R5 has the address of b[0] |
| `ADD R6, R3, R9` | R6 has the address of c[0] |
| `LD [R8], R10` | Bringing b[i] |
| `LD [R9], R11` | Bringing c[i] |
| `ADD R10, R11, R12` | Sum is in R12 |
| `ST [R7], R12` | Putting result in a[i] |
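One way to sanity-check the sequence is to interpret it against a toy register file and memory. The sketch below is only illustrative: the interpreter, the base addresses, and the helper name `compute_a_i` are assumptions, and the 8-byte element size is taken from the `MUL R2, 8, R3` step.

```python
# Toy interpreter for the nine-instruction sequence (illustrative sketch).
# Operand order follows the slide: the destination is the last operand,
# and [Rx] means "memory at the address held in Rx".
PROGRAM = [
    ("LD", "R1", "R2"),
    ("MUL", "R2", 8, "R3"),
    ("ADD", "R4", "R3", "R7"),
    ("ADD", "R5", "R3", "R8"),
    ("ADD", "R6", "R3", "R9"),
    ("LD", "R8", "R10"),
    ("LD", "R9", "R11"),
    ("ADD", "R10", "R11", "R12"),
    ("ST", "R7", "R12"),
]


def run(program, regs, mem):
    """Execute the program in order, updating regs and mem in place."""
    for op, *args in program:
        if op == "LD":
            addr_reg, dest = args
            regs[dest] = mem[regs[addr_reg]]
        elif op == "MUL":
            src, imm, dest = args
            regs[dest] = regs[src] * imm
        elif op == "ADD":
            src1, src2, dest = args
            regs[dest] = regs[src1] + regs[src2]
        elif op == "ST":
            addr_reg, src = args
            mem[regs[addr_reg]] = regs[src]
        else:
            raise ValueError(f"unknown opcode {op}")


def compute_a_i(i, b_i, c_i):
    """Set up a hypothetical memory layout, run the program, return a[i]."""
    i_addr, a_base, b_base, c_base = 0x100, 0x1000, 0x2000, 0x3000  # assumed addresses
    regs = {"R1": i_addr, "R4": a_base, "R5": b_base, "R6": c_base}
    mem = {i_addr: i, b_base + 8 * i: b_i, c_base + 8 * i: c_i}
    run(PROGRAM, regs, mem)
    return mem[a_base + 8 * i]
```

Calling `compute_a_i(2, 7, 35)` stores the sum of the two loaded values at the address of `a[2]` and returns it.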

## A 5-Stage Pipeline



Source: H&P textbook



- Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazards: fetch cannot continue because it does not know the outcome of an earlier branch. This is a special case of a data hazard, but it is kept as a separate category because the two are treated in different ways

- Example: a unified instruction and data cache → stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles
- Structural hazards are easy to eliminate: increase the number of resources (for example, implement a separate instruction and data cache)
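The unified-cache example can be sketched as a small fetch scheduler. This is a simplified model, not the textbook's: it assumes a 5-stage pipeline where MEM is three cycles after IF, that the memory access always wins the conflict, and that there are no other hazards. The function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(is_mem_op, mem_offset=3):
    """Return the cycle in which each instruction is fetched.

    is_mem_op: one boolean per instruction (True for loads/stores).
    mem_offset: cycles between IF and MEM (3 in the assumed 5-stage pipeline).
    With a single unified cache, no fetch may happen in a cycle in which
    an earlier instruction occupies MEM.
    """
    busy = set()    # cycles in which the cache is used by a MEM stage
    cycles = []
    cycle = 0
    for mem_op in is_mem_op:
        cycle += 1
        while cycle in busy:    # pipeline bubble: wait for the cache to be free
            cycle += 1
        cycles.append(cycle)
        if mem_op:
            busy.add(cycle + mem_offset)
    return cycles
```

For a load followed by four non-memory instructions, the fourth instruction's fetch is pushed back by one cycle because the load is in MEM in cycle 4, and every later instruction inherits that delay.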

Show the instruction occupying each stage in each cycle, CYC-1 through CYC-8 (no bypassing), if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R7+R8→R9.
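A minimal sketch of how the stage occupancy could be worked out mechanically. It assumes the stage names IF, D/R, ALU, DM, RW, ALU-only instructions, and a register file that is written in the first half of RW and read in the second half of D/R (so a dependent instruction can complete D/R in the same cycle its producer is in RW). The function names `schedule` and `occupancy` are hypothetical.

```python
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]


def schedule(instrs, bypassing=False):
    """instrs: list of (dest, sources) for ALU-type instructions.

    Returns one dict per instruction mapping stage name to the list of
    cycles the instruction spends in that stage.
    """
    rw_cycle = {}    # register -> cycle in which its latest producer is in RW
    timeline = []
    for dest, sources in instrs:
        if timeline:
            prev = timeline[-1]
            if_start = prev["D/R"][0]                      # IF frees when predecessor enters D/R
            dr_start = max(if_start + 1, prev["ALU"][0])   # D/R frees when predecessor enters ALU
        else:
            if_start, dr_start = 1, 2
        dr_end = dr_start
        if not bypassing:
            # Assumption: write in first half of RW, read in second half of D/R.
            for src in sources:
                dr_end = max(dr_end, rw_cycle.get(src, 0))
        timeline.append({
            "IF": list(range(if_start, dr_start)),
            "D/R": list(range(dr_start, dr_end + 1)),
            "ALU": [dr_end + 1],
            "DM": [dr_end + 2],
            "RW": [dr_end + 3],
        })
        rw_cycle[dest] = dr_end + 3
    return timeline


def occupancy(timeline, n_cycles=8):
    """Return {stage: [instruction label or '' for each cycle]}."""
    table = {stage: [""] * n_cycles for stage in STAGES}
    for idx, instr in enumerate(timeline, start=1):
        for stage in STAGES:
            for cycle in instr[stage]:
                if cycle <= n_cycles:
                    table[stage][cycle - 1] = f"I{idx}"
    return table
```

Under these assumptions, `schedule([("R3", ("R1", "R2")), ("R5", ("R3", "R4")), ("R9", ("R7", "R8"))])` keeps I2 in D/R for cycles 3 to 5 while I3 waits in IF for the same cycles. Passing `bypassing=True` models forwarding between ALU instructions, which removes those stalls.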



Show the instruction occupying each stage in each cycle, CYC-1 through CYC-8 (with bypassing), if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R3+R8→R9.
Identify the input latch for each input operand.



Stage occupancy with bypassing (each ALU entry also lists the input latch for its two operands):

| Stage | CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| IF | I1 | I2 | I3 | I4 | I5 | | | |
| D/R | | I1 | I2 | I3 | I4 | | | |
| ALU | | | I1 (L3, L3) | I2 (L4, L3) | I3 (L5, L3) | | | |
| DM | | | | I1 | I2 | I3 | | |
| RW | | | | | I1 | I2 | I3 | |
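The latch-identification step can be sketched as a lookup on the distance to the producing instruction. The latch names L3, L4, and L5 are taken from the exercise. Their meaning here is an assumption: L3 is the latch feeding the ALU from D/R, L4 holds the previous ALU result, and L5 holds the result from two instructions back. The sketch covers ALU instructions only, and `alu_input_latches` is a hypothetical name.

```python
def alu_input_latches(instrs):
    """instrs: list of (dest, (src1, src2)) for ALU-type instructions.

    Returns, per instruction, the latch each source operand is read from
    when it enters the ALU stage.
    """
    result = []
    for i, (_, sources) in enumerate(instrs):
        chosen = []
        for src in sources:
            latch = "L3"    # default: value came from the register file via D/R
            # Check the nearest producer first so the newest value wins.
            for distance, name in ((1, "L4"), (2, "L5")):
                if i - distance >= 0 and instrs[i - distance][0] == src:
                    latch = name
                    break
            chosen.append(latch)
        result.append(tuple(chosen))
    return result
```

For the three instructions in the exercise, this picks the register-file latch for both operands of I1, the previous ALU result for R3 in I2, and the older forwarded value for R3 in I3.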

## Pipeline Implementation

- Signals for the muxes have to be generated – some of this can happen during ID
- Need look-up tables to identify situations that merit bypassing/stalling – the number of inputs to the muxes goes up










For the 5-stage pipeline, bypassing can eliminate delays between the following example pairs of instructions:

| First instruction | Second instruction |
|---|---|
| add/sub R1, R2, R3 | add/sub/lw/sw R4, R1, R5 |
| lw R1, 8(R2) | sw R1, 4(R3) |

The following pairs of instructions will have intermediate stalls:

| First instruction | Second instruction |
|---|---|
| lw R1, 8(R2) | add/sub/lw R3, R1, R4 or sw R3, 8(R1) |
| F1, F2, F3 | F5, F1, F4 |
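The integer cases above can be captured in a small look-up table keyed by what the first instruction is and how the second one uses the value. This is a sketch for back-to-back instructions in the 5-stage pipeline with full bypassing. The category names (`alu`, `load`, `alu_operand`, `address`, `store_data`) and the function name are made up for illustration, and the floating-point pair is left out because its latency is not given here.

```python
# Stall cycles for a dependent instruction that immediately follows its producer,
# assuming the 5-stage pipeline with full bypassing.
STALLS_WITH_BYPASSING = {
    ("alu", "alu_operand"): 0,   # add/sub -> add/sub
    ("alu", "address"): 0,       # add/sub -> lw/sw using the value as a base address
    ("alu", "store_data"): 0,    # add/sub -> sw storing the value
    ("load", "store_data"): 0,   # lw R1, 8(R2) -> sw R1, 4(R3)
    ("load", "alu_operand"): 1,  # lw R1, 8(R2) -> add/sub R3, R1, R4
    ("load", "address"): 1,      # lw R1, 8(R2) -> sw R3, 8(R1)
}


def stalls_back_to_back(producer, consumer_use):
    """Look up the stall count for a producer/consumer pair."""
    return STALLS_WITH_BYPASSING[(producer, consumer_use)]
```

The only non-zero entries are the ones where a loaded value is needed at the start of the ALU stage, since the load produces it at the end of the memory stage.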

Consider this 8-stage pipeline (RR and RW take a full cycle)

IF DE RR AL AL DM DM RW

- For the following pairs of instructions, how many stalls will the 2<sup>nd</sup> instruction experience (with and without bypassing)?
  1. ADD R1+R2→R3, then ADD R3+R4→R5
  2. LD [R1]→R2, then ADD R2+R3→R4
  3. LD [R1]→R2, then SD [R2]←R3
  4. LD [R1]→R2, then SD [R3]←R2

- Stalls for each pair, in the same order:
  1. without: 5, with: 1
  2. without: 5, with: 3
  3. without: 5, with: 3
  4. without: 5, with: 1
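These counts can be reproduced with a short calculation over stage positions. The sketch below makes several assumptions: the two AL and two DM stages are labelled AL1/AL2 and DM1/DM2 (names not in the slide), an ADD result is ready at the end of AL2, a load result at the end of DM2, ALU operands and addresses are needed at the start of AL1, and store data at the start of DM1. Without bypassing, every value is produced at the end of RW and consumed at the start of RR.

```python
STAGES = ["IF", "DE", "RR", "AL1", "AL2", "DM1", "DM2", "RW"]
POS = {name: index + 1 for index, name in enumerate(STAGES)}


def stalls(ready_at_end_of, needed_at_start_of, distance=1):
    """Stall cycles seen by a consumer issued `distance` instructions after its producer.

    The consumer may enter `needed_at_start_of` no earlier than the cycle
    after the producer leaves `ready_at_end_of`.
    """
    return max(0, POS[ready_at_end_of] - POS[needed_at_start_of] + 1 - distance)


def pair_stalls(producer, consumer_use):
    """Return (without_bypassing, with_bypassing) for a back-to-back pair."""
    ready = {"add": "AL2", "load": "DM2"}[producer]
    need = {"alu_operand": "AL1", "address": "AL1", "store_data": "DM1"}[consumer_use]
    return stalls("RW", "RR"), stalls(ready, need)
```

Here `pair_stalls("load", "address")` corresponds to the third pair and `pair_stalls("load", "store_data")` to the fourth. The difference between them comes from store data being needed two stages later than the address.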


