CS 3330: HCL2D part 5: PIPE Lab 1

This page is for a prior offering of CS 3330. It is not up-to-date.

In this lab we’ll add some basic pipelining to a subset of the Y86-64 instruction set: nop, halt, irmovq, and rrmovq. We’ll add just one pipeline register, between decode and writeback (there is no execute or memory phase for these instructions).

Download pipelab1_base.hcl to get a copy of the sequential simulator with only those instructions implemented.

1 Approach

To add pipelining,

  1. Identify where in your code the pipeline register should go
  2. Identify which wires cross that point and put them in a pipeline register
  3. Replace wires with register inputs and outputs
  4. Look for hazards and solve them using stalls and/or forwards

We’ll explore this idea by adding a pipeline register between decode and writeback. Following the textbook’s tradition, we’ll call the input side of the register d for decode and the output side W for writeback:

register dW {
    # todo: fill in the details here
}

2 What wires cross the pipeline register bank?

Look through pipelab1_base.hcl; each value that is used as an input in writeback but is not computed in that stage must be stored in the pipeline register. For example, the reg_inputE mux uses icode, reg_outputA, and valC as inputs, so we’ll need all three of those in our new register, along with fields for any other signals that writeback uses as inputs.

Always pick the default values in the pipeline register to be the values you’d expect for nop: NOP in the icode, REG_NONE in any register spots, etc.
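
As a rough sketch, the three signals named above might be declared like this. The field names and the 64-bit widths are assumptions (rvalA is the name this page uses later for the saved copy of reg_outputA); check them against the wire declarations in your own file.

```hcl
register dW {
    icode:4 = NOP;    # a nop is the "do nothing" default
    valC:64 = 0;      # immediate; nop never uses it, so 0 is harmless
    rvalA:64 = 0;     # saved copy of reg_outputA
}
```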

3 Replace wires with register inputs/outputs

Recall that if we name our register bank dW then whatever signal we put into d_thing will come out of W_thing on the next cycle.

Go through each signal and, if it crosses the register bank, replace every use before the register bank with d_... and every use after the register bank with W_....

For example, consider icode:
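
One way this might look, assuming the base file computes icode from i10bytes in fetch, uses it (with a wire named rA) to choose reg_srcA in decode, and that dW already declares an icode:4 field. The exact expressions in your file may differ.

```hcl
# before the register bank: the old "wire icode:4" and "icode = ..." become
d_icode = i10bytes[4..8];

# decode still runs before the register bank, so it reads d_icode
reg_srcA = [
    d_icode in {RRMOVQ} : rA;
    1 : REG_NONE;
];

# writeback runs after the register bank, so every test there
# (for example "icode == RRMOVQ") becomes a test on W_icode
```
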

Do the same thing with valC.

The signals reg_outputA, reg_dstE, and Stat have to be treated specially because they interact with fixed functionality. Thus, reg_outputA (an output created during decode) will need to be saved into d_... during decode and used as W_... afterwards, as in

# in decode:
d_rvalA = reg_outputA;
# in execute and later phases, use W_rvalA instead of reg_outputA

Similarly, reg_dstE will need to be originally computed as d_dstE during Decode and then reg_dstE = W_dstE placed in writeback to get that value back out. Stat is an output like reg_dstE and will need the same treatment (set d_Stat before the pipeline register and Stat = W_Stat afterward).
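
A minimal sketch of that treatment follows. It assumes a 3-bit Stat, a wire named rB holding the destination register number, a d_icode signal from the previous step, and the same conditions the sequential file used; all of these are illustrative and should be checked against your own file.

```hcl
register dW {
    # alongside the fields for icode, valC, and rvalA:
    dstE:4 = REG_NONE;    # nop writes no register
    Stat:3 = STAT_AOK;    # nop leaves the status as "all OK"
}

# in decode: compute the values on the d_ side
d_dstE = [
    d_icode in {IRMOVQ, RRMOVQ} : rB;
    1 : REG_NONE;
];
d_Stat = [
    d_icode == HALT : STAT_HLT;
    d_icode > 0xb : STAT_INS;
    1 : STAT_AOK;
];

# in writeback: hand the W_ side to the fixed-functionality outputs
reg_dstE = W_dstE;
Stat = W_Stat;
```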

At this point, the rrmovq.yo we made in lab2

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should take 6 (not 5) cycles to set three registers:

| RAX:               22   RCX:               22   RDX:             162e |

and should leave the pc at address 0x18. It should also take a few fewer cycles overall than the 690 used by pipelab1_base.hcl as a result of increased throughput, though if it does not, don’t be worried; we aren’t focusing on speed right now.

4 Look for hazards

Consider the following two instructions:


irmovq $1, %rax
rrmovq %rax, %rbx

In a pipeline diagram (given that we have no execute or memory phases), these will look like

Instr    cycle 1  cycle 2  cycle 3
irmovq   FD       W
rrmovq            FD       W

Note that the immediate value won’t be written to the register file until the end of cycle 2, but the next instruction will try to read it during cycle 2. This is an example of a data dependency that exercises a hazard in our hardware design so far.

We can bypass this hazard in two ways. We can either stall, or we can forward data. Forwarding is always preferred to stalling if both are possible, so we’ll forward.

4.1 Forward

We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, d_rvalA will be reg_outputA unless reg_dstE is both (1) not REG_NONE and (2) the same as the decode phase’s reg_srcA; in that case, we’ll forward reg_inputE into d_rvalA instead.
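
The rule just described could be written as a mux along these lines. This is a sketch, assuming reg_srcA, reg_dstE, and reg_inputE are already set as in the earlier sections:

```hcl
d_rvalA = [
    # writeback is about to write the register that decode is reading:
    # take the value straight from the register file's input
    reg_dstE != REG_NONE && reg_dstE == reg_srcA : reg_inputE;
    # otherwise the register file already holds the right value
    1 : reg_outputA;
];
```

The REG_NONE check matters because an instruction that reads no register also has reg_srcA set to REG_NONE, and we don’t want that to count as a match.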

If correctly implemented, y86/irrr7.yo

irmovq $1, %rax
rrmovq %rax, %rbx

should take 4 cycles to put a 1 in both %rax and %rbx, while y86/rrmovq.yo

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should still take 6 cycles and result in

| RAX:               22   RCX:               22   RDX:             162e |

like it did before.

I mention the number of cycles because the other solution (stalling) would increase them.

4.2 Handling halt

You can think of halt and invalid instructions as a special kind of control hazard, since the instructions after a halt (or an invalid instruction) are not supposed to run.

We recommend stalling the register that feeds the fetch stage when you encounter a halt or invalid instruction to avoid starting to execute instructions that aren’t part of the program. However, in the two-stage pipeline in this lab, instructions do not change any state (memory, program registers, condition codes) until the last stage, so this is not strictly necessary.
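
If you do want to add the stall, it might look something like the line below. This assumes the PC lives in a register bank declared as register fF (so its stall input is stall_F) and that you compute a d_Stat signal in decode; both names are assumptions about your file.

```hcl
# keep the PC register from advancing once a halt or invalid
# instruction has been fetched
stall_F = d_Stat != STAT_AOK;
```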

5 Understanding HCL Errors

5.1 Use debug mode

If your hcl compiles, you can run it in debug mode: mysimulator.exe -i -d somefile.yo

5.2 Initialize what you declare

The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say

wire baz:4;
register qB { xyxxy:32 = 0; }

then you must also say

baz = something;
q_xyxxy = something_else;

or else you will get

ERROR: failed to initialize baz, q_xyxxy

5.3 Known bug with error messages

If you put complicated expressions inside a mux, you might get nonsensical error messages. In particular, do not put a wire slice operator or a mux inside a mux.

If you encounter another bug in HCL2D, email Prof Reiss your .hcl so he can diagnose and fix hcl2d.

5.4 make often

We’ve been telling you this for years now, but make pipelab1.exe often! Particularly when working with a language you don’t know well, frequent feedback is useful.

6 Submit

Submit pipelab1.hcl on the submission page.

If you didn’t have time to finish everything, still submit the file (it’s OK if it is incomplete; we are looking for effort more than correctness).

7 For your edification

If you want to understand pipelines more, I’d encourage you to add another pipeline register between Fetch and Decode. Don’t submit that three-stage-pipeline file, though.

Copyright © 2016–2017 by Samira Khan, Luther Tychonievich, and Charles Reiss.
Last updated 2017-03-27 13:55:21