CSCI 370 Computer Architecture: Homework 5 Solutions

Due date (firm): On or before Thursday, May 2, 2024
Absolutely no copying of others’ work
Name: _____Professor Hu______


This homework is intended to help you understand the relationship between forwarding, hazard detection, and ISA (instruction set architecture) design. Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a 5-stage pipelined datapath:
             I1: add  $5, $2, $1
             I2: lw   $3, 4($5)
             I3: lw   $2, 0($2)
             I4: or   $3, $5, $3
             I5: sw   $3, 0($5)
  1. (20%) Indicate dependences and their types (i.e., RAR, RAW, WAR, or WAW).
    Each dependence includes a type, a register, and two different instructions; e.g., RAW on $5 for I1 and I2.
    Ans>
            I1: add  $5, $2, $1          # $5 = $2 + $1
            I2: lw   $3, 4($5)           # $3 = mem[[$5]+4]
            I3: lw   $2, 0($2)           # $2 = mem[[$2]+0]
            I4: or   $3, $5, $3          # $3 = $5 ∨ $3
            I5: sw   $3, 0($5)           # mem[[$5]+0] = $3
    1. RAW on $5 for I1 and I2
    2. RAW on $5 for I1 and I4
    3. RAW on $5 for I1 and I5
    4. RAW on $3 for I2 and I4
    5. RAW on $3 for I2 and I5
    6. RAW on $3 for I4 and I5
    7. WAR on $2 for I1 and I3
    8. RAR on $2 for I1 and I3
    9. RAR on $5 for I2 and I4
    10. RAR on $5 for I2 and I5
    11. RAR on $3 for I4 and I5
    12. RAR on $5 for I4 and I5
    13. WAW on $3 for I2 and I4
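
    For your information: the list above can be cross-checked mechanically. The sketch below is only an illustration in Python and is not part of the assignment. The PROGRAM table and the find_dependences name are our own, and each instruction's destination and source registers were transcribed by hand from the sequence. The check compares every pair of instructions directly and ignores intervening writes, which is the same convention the list uses (for example, it reports RAW on $3 for I2 and I5 even though I4 rewrites $3 in between).

```python
from itertools import combinations

# (label, destination register or None, source registers), in program order.
# Hand-transcribed from the instruction sequence; the names are illustrative.
PROGRAM = [
    ("I1", "$5", ("$2", "$1")),   # add $5, $2, $1
    ("I2", "$3", ("$5",)),        # lw  $3, 4($5)
    ("I3", "$2", ("$2",)),        # lw  $2, 0($2)
    ("I4", "$3", ("$5", "$3")),   # or  $3, $5, $3
    ("I5", None, ("$3", "$5")),   # sw  $3, 0($5)
]

def find_dependences(program):
    """Return a set of (type, register, earlier, later) tuples."""
    deps = set()
    # combinations() yields each pair once, earlier instruction first.
    for (a, a_dst, a_src), (b, b_dst, b_src) in combinations(program, 2):
        if a_dst is not None and a_dst in b_src:
            deps.add(("RAW", a_dst, a, b))   # later reads what earlier writes
        if b_dst is not None and b_dst in a_src:
            deps.add(("WAR", b_dst, a, b))   # later writes what earlier reads
        if a_dst is not None and a_dst == b_dst:
            deps.add(("WAW", a_dst, a, b))   # both write the same register
        for reg in set(a_src) & set(b_src):
            deps.add(("RAR", reg, a, b))     # both read the same register
    return deps

deps = find_dependences(PROGRAM)
for kind, reg, first, second in sorted(deps):
    print(f"{kind} on {reg} for {first} and {second}")
```

    On the five instructions this reports 13 dependences (6 RAW, 1 WAR, 1 WAW, and 5 RAR), the same set as the answer above.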

  2. (10%) If there is no forwarding or hazard detection, insert nops to ensure correct execution.
    Ans>
         I1: add  $5, $2, $1
             nop
             nop
         I2: lw   $3, 4($5)
         I3: lw   $2, 0($2)
             nop
         I4: or   $3, $5, $3
             nop
             nop
         I5: sw   $3, 0($5)
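
    For your information: the nop placement above follows from one rule, which the sketch below applies mechanically. It assumes (our assumption, consistent with the two-nop gaps in the answer) that the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer. The PROGRAM table and the insert_nops name are illustrative only.

```python
# (label, destination register or None, source registers), in program order.
PROGRAM = [
    ("I1", "$5", ("$2", "$1")),   # add $5, $2, $1
    ("I2", "$3", ("$5",)),        # lw  $3, 4($5)
    ("I3", "$2", ("$2",)),        # lw  $2, 0($2)
    ("I4", "$3", ("$5", "$3")),   # or  $3, $5, $3
    ("I5", None, ("$3", "$5")),   # sw  $3, 0($5)
]

def insert_nops(program, gap=2):
    """Pad the program so every RAW pair has at least `gap` slots between them."""
    schedule = []
    last_write = {}   # register -> slot index of its most recent writer
    for label, dst, srcs in program:
        slot = len(schedule)
        # Earliest slot at which every source register is safely readable.
        ready = [last_write[r] + gap + 1 for r in srcs if r in last_write]
        earliest = max(ready + [slot])
        schedule.extend(["nop"] * (earliest - slot))
        if dst is not None:
            last_write[dst] = len(schedule)
        schedule.append(label)
    return schedule

schedule = insert_nops(PROGRAM)
print(schedule)
# -> ['I1', 'nop', 'nop', 'I2', 'I3', 'nop', 'I4', 'nop', 'nop', 'I5']
print(schedule.count("nop"))   # -> 5
```

    The result has the same five nops in the same places as the answer above.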
  3. (10%) Now, change and/or rearrange the code to minimize the number of nops needed. You can assume register $t0 can be used to hold temporary values in your modified code.
    Ans>
      It is NOT possible to reduce the number of nops.

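        For your information: We can move an instruction up by swapping it with a neighboring instruction on which it has no dependence, and in this way try to fill some nop slots with useful instructions. Here, however, doing so does not reduce the number of nops: I3 is the only instruction that can move, and wherever it fills one nop slot another nop has to be added elsewhere.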
           I1: add  $5, $2, $1
           I3: lw   $2, 0($2)            # Moved up to fill nop slot
               nop
           I2: lw   $3, 4($5)
               nop                       # Had to add another nop here.
               nop                       # So there is no performance gain.
           I4: or   $3, $5, $3
               nop
               nop
           I5: sw   $3, 0($5)
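
    For your information: the claim can be checked by brute force. The sketch below is an illustration only; it tries every reordering that preserves the RAW, WAR, and WAW dependences and counts the nops each one needs, under the same assumption as before that a consumer must sit at least three slots after its producer. It does not try register renaming with $t0. The names PROGRAM, must_precede, count_nops, and legal_orders are our own.

```python
from itertools import permutations

# label -> (destination register or None, source registers), in program order.
PROGRAM = {
    "I1": ("$5", ("$2", "$1")),   # add $5, $2, $1
    "I2": ("$3", ("$5",)),        # lw  $3, 4($5)
    "I3": ("$2", ("$2",)),        # lw  $2, 0($2)
    "I4": ("$3", ("$5", "$3")),   # or  $3, $5, $3
    "I5": (None, ("$3", "$5")),   # sw  $3, 0($5)
}

def must_precede(earlier, later):
    """True if a RAW, WAR, or WAW dependence fixes the order of the pair."""
    e_dst, e_src = earlier
    l_dst, l_src = later
    raw_or_waw = e_dst is not None and (e_dst in l_src or e_dst == l_dst)
    war = l_dst is not None and l_dst in e_src
    return raw_or_waw or war

def count_nops(order, gap=2):
    """Number of nops needed to run `order` with no forwarding."""
    next_slot, nops, last_write = 0, 0, {}
    for label in order:
        dst, srcs = PROGRAM[label]
        ready = [last_write[r] + gap + 1 for r in srcs if r in last_write]
        slot = max(ready + [next_slot])
        nops += slot - next_slot
        if dst is not None:
            last_write[dst] = slot
        next_slot = slot + 1
    return nops

def legal_orders():
    """Yield every permutation that keeps all dependent pairs in order."""
    labels = list(PROGRAM)
    fixed = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]
             if must_precede(PROGRAM[a], PROGRAM[b])]
    for order in permutations(labels):
        pos = {label: i for i, label in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in fixed):
            yield order

results = {order: count_nops(order) for order in legal_orders()}
for order, nops in results.items():
    print(" ".join(order), "->", nops, "nops")
print("minimum:", min(results.values()))   # -> minimum: 5
```

    Only four orderings are legal (I1 first, then I2, I4, I5 in that order, with I3 anywhere after I1), and none of them needs fewer than five nops.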
  4. (10%) If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when the original code executes?
    Ans>
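      The code executes correctly. We need hazard detection only to insert a stall when the instruction immediately following a load uses the result of that load. That does not happen here: I3 immediately follows I2 but does not read $3, and I4 immediately follows I3 but does not read $2. I4 does read $3, but it comes two instructions after I2, so forwarding alone delivers the value in time.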
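
      For your information: the load-use condition described above can be written as a short check. This is a sketch for illustration only; the tuple layout and the load_use_hazards name are our own, and it considers only the case of a load immediately followed by a consumer.

```python
# (label, opcode, destination register or None, source registers)
PROGRAM = [
    ("I1", "add", "$5", ("$2", "$1")),   # add $5, $2, $1
    ("I2", "lw",  "$3", ("$5",)),        # lw  $3, 4($5)
    ("I3", "lw",  "$2", ("$2",)),        # lw  $2, 0($2)
    ("I4", "or",  "$3", ("$5", "$3")),   # or  $3, $5, $3
    ("I5", "sw",  None, ("$3", "$5")),   # sw  $3, 0($5)
]

def load_use_hazards(program):
    """Return (load, consumer, register) for each lw whose result is read
    by the very next instruction."""
    hazards = []
    for (a, a_op, a_dst, _), (b, _, _, b_src) in zip(program, program[1:]):
        if a_op == "lw" and a_dst in b_src:
            hazards.append((a, b, a_dst))
    return hazards

print(load_use_hazards(PROGRAM))   # -> []

# Counter-example: with I3 removed, I4 would directly follow I2 and read $3.
without_i3 = [ins for ins in PROGRAM if ins[0] != "I3"]
print(load_use_hazards(without_i3))   # -> [('I2', 'I4', '$3')]
```

      The empty result for the original code is why the missing hazard detection unit does no harm here; the counter-example shows the kind of sequence that would need a stall.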

  5. (20%) If there is forwarding, for the first seven cycles during the execution of this code, complete the table below and specify the following control signal values of the original code:

    • PCWrite from the hazard detection unit,
    • IF/IDWrite from the hazard detection unit,
    • ForwardA from the forwarding unit for the upper multiplexor of an ALU input, and
    • ForwardB from the forwarding unit for the lower multiplexor of an ALU input

    in each cycle by hazard detection and forwarding units in the figure in Slide 14.12. The control values for the forwarding multiplexors are given in the table in Slide 14.10.
    Use the symbols “***,” “---,” and “X” to represent a stall, a flush, and “Don’t care,” respectively.
    Ans>
            I1: add  $5, $2, $1          # $5 = $2 + $1
            I2: lw   $3, 4($5)           # $3 = mem[[$5]+4]
            I3: lw   $2, 0($2)           # $2 = mem[[$2]+0]
            I4: or   $3, $5, $3          # $3 = $5 ∨ $3
            I5: sw   $3, 0($5)           # mem[[$5]+0] = $3
    Instruction Sequence    First Seven Cycles
    Cycle             1     2     3     4     5     6     7     8     9
    add $5, $2, $1    IF    ID    EX    MEM   WB
    lw  $3, 4($5)           IF    ID    EX    MEM   WB
    lw  $2, 0($2)                 IF    ID    EX    MEM   WB
    or  $3, $5, $3                      IF    ID    EX    MEM   WB
    sw  $3, 0($5)                             IF    ID    EX    MEM   WB

    Signal            1     2     3     4     5     6     7
    PCWrite           1     1     1     1     1     1     1
    IF/IDWrite        1     1     1     1     1     1     1
    ForwardA          X     X     00    01    00    00    00
    ForwardB          X     X     00    00    00    10    00

    1. ForwardA = X; ForwardB = X (no instruction in EX stage yet)
    2. ForwardA = X; ForwardB = X (no instruction in EX stage yet)
    3. ForwardA = 00; ForwardB = 00 (add in EX; no forwarding; values taken from the register file)
    4. ForwardA = 01; ForwardB = 00 (first lw in EX; base register $5 taken from the result of the previous instruction, add)
    5. ForwardA = 00; ForwardB = 00 (second lw in EX; base register $2 taken from the register file)
    6. ForwardA = 00; ForwardB = 10 (or in EX; rs = $5 taken from the register file; rt = $3 taken from the result of the first lw, two instructions earlier)
    7. ForwardA = 00; ForwardB = 00 (sw in EX; base register $5 taken from the register file)

  6. (10%) If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit in the figure in Slide 14.12? Using this instruction sequence as an example, explain why each signal is needed.
    Ans>
      Additional inputs:
      The instruction that is currently in the ID stage needs to be stalled if it depends on a value produced by the instruction in the EX stage or by the instruction in the MEM stage. So we need to check the destination registers of these two instructions.
      • For the instruction in the EX stage, we need to check the value of Rw from the EX/MEM register.
      • For the instruction in the MEM stage, the destination register is already selected, so we need to check that register number (this is the bottommost output of the MEM/WB pipeline register).
      The hazard unit already has the value of Rw from the EX/MEM register as an input, so we need only add the value of Rw from the MEM/WB register.
      In this instruction sequence, the value of Rw from EX/MEM is needed to detect the data hazard on $5 between the add and the following lw. The value of Rw from MEM/WB is needed to detect the data hazard on $3 between the first lw instruction and the or instruction.

      Additional outputs:
      No additional outputs are needed. We can stall the pipeline using the three output signals that we already have.
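
      For your information: the stall condition described above can be sketched as a small function. This is an illustration only and not the circuit in the slides. The names must_stall and hazard_outputs and the parameters rw_ex_stage and rw_mem_stage are our own; they stand for the destination registers of the instructions currently in the EX and MEM stages (the two Rw values the answer refers to). As a simplification, the sketch does not check RegWrite or treat register $0 specially, and it uses None for "no destination".

```python
def must_stall(id_sources, rw_ex_stage, rw_mem_stage):
    """True if the instruction in ID reads a register that the instruction
    in EX or in MEM has yet to write back (no forwarding available)."""
    pending = (rw_ex_stage, rw_mem_stage)
    return any(rw is not None and rw in id_sources for rw in pending)

def hazard_outputs(id_sources, rw_ex_stage, rw_mem_stage):
    """The three existing outputs: the two write enables drop together
    and the bubble signal is their opposite."""
    stall = must_stall(id_sources, rw_ex_stage, rw_mem_stage)
    return {
        "PCWrite": int(not stall),
        "IF/IDWrite": int(not stall),
        "bubble": int(stall),
    }

# lw $3, 4($5) in ID while add $5, $2, $1 is in EX: hazard on $5.
print(hazard_outputs({"$5"}, "$5", None))
# -> {'PCWrite': 0, 'IF/IDWrite': 0, 'bubble': 1}

# or $3, $5, $3 in ID while lw $2 is in EX and lw $3 is in MEM: hazard on $3.
print(hazard_outputs({"$5", "$3"}, "$2", "$3"))
# -> {'PCWrite': 0, 'IF/IDWrite': 0, 'bubble': 1}

# lw $2, 0($2) in ID while lw $3 is in EX and MEM holds a bubble: no hazard.
print(hazard_outputs({"$2"}, "$3", None))
# -> {'PCWrite': 1, 'IF/IDWrite': 1, 'bubble': 0}
```

      The first example needs only the EX-stage destination and the second needs the MEM-stage destination, which is why both inputs are required while the outputs stay the same.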

  7. (20%) For the new hazard detection unit (without forwarding) from the previous problem, complete the table below and specify which output signals it asserts in each of the first six cycles during the execution of this code using the figure in Slide 14.12.
    Use the symbol “***” to represent a stall and the symbol “---” for a flush.
    Ans>
      As explained in the previous problem, we only need to specify the value of the PCWrite signal, because IF/IDWrite is equal to PCWrite and the bubble (ID/EXzero) signal is its opposite. We have:

      Instruction Sequence    First Six Cycles
      Cycle             1     2     3     4     5     6
      add $5, $2, $1    IF    ID    EX    MEM   WB
      lw  $3, 4($5)           IF    ID    ***   ***   EX
      lw  $2, 0($2)                 IF    ***   ***   ID
      or  $3, $5, $3                      ***   ***   IF
      sw  $3, 0($5)

      Signal            1     2     3     4     5     6
      PCWrite           1     1     1     0     0     1
      IF/IDWrite        1     1     1     0     0     1
      bubble            0     0     0     1     1     0