CS 3330: HCL part 5: PIPE Lab 1

This page is for a prior offering of CS 3330. It is not up-to-date.

In this lab we’ll add some basic pipelining to a subset of the Y86-64 instruction set: nop, halt, irmovq, and rrmovq. We’ll add just one pipeline register, between decode and writeback (there is no execute or memory phase for these instructions).

Download pipelab1_base.hcl to get a copy of the sequential simulator with only those instructions implemented.

1 Approach

To add pipelining,

  1. Identify where in your code the pipeline register should go
  2. Identify which wires cross that point and put them in a pipeline register
  3. Replace wires with register inputs and outputs
  4. Look for hazards and solve them using stalls and/or forwards

We’ll explore this idea by adding a pipeline register between decode and writeback. Following the textbook’s tradition, we’ll call the input side of the register d for decode and the output side W for writeback:

register dW {
    # todo: fill in the details here
}

2 What wires cross the pipeline register bank?

Look through pipelab1_base.hcl; each value that is used as an input in writeback but is not computed in that stage will need to be stored in the pipeline register. For example, the reg_inputE mux uses icode, reg_outputA, and valC as inputs, so we’ll need all three of those in our new register, along with any other signals that writeback uses as inputs.

Always pick the default values in the pipeline register to be the values you’d expect for nop: NOP in the icode, REG_NONE in any register spots, etc.
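As a rough sketch of what nop-like defaults might look like, the fragment below declares a dW register bank. The field names rvalA, dstE, and Stat follow the d_.../W_... names used later on this page; the exact field list and the bit widths (4-bit icode and register numbers, 64-bit values, 3-bit status) are assumptions, so check them against your copy of pipelab1_base.hcl.

```hcl
# Sketch only: field list and widths are assumptions, not the reference solution.
register dW {
    icode : 4  = NOP;        # an empty pipeline slot behaves like nop
    valC  : 64 = 0;          # immediate; nop ignores it, so 0 is a harmless default
    rvalA : 64 = 0;          # value read from the register file
    dstE  : 4  = REG_NONE;   # nop writes no register
    Stat  : 3  = STAT_AOK;   # nop neither halts nor faults
}
```

With defaults like these, the first cycle (before any real instruction has reached writeback) writes nothing to the register file and reports a normal status.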

3 Replace wires with register inputs/outputs

Recall that if we name our register bank dW then whatever signal we put into d_thing will come out of W_thing on the next cycle.

Go through each signal and, if it crosses the register bank, replace every use before the register bank with d_... and every use after the register bank with W_....

For example, consider icode:
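A minimal sketch of that replacement for icode, assuming the base file extracts icode from i10bytes and uses it in the reg_inputE mux (the exact expressions in pipelab1_base.hcl may differ). It also assumes that dW has icode, valC, and rvalA fields, so that W_valC and W_rvalA exist:

```hcl
# Sketch only; the expressions in your base file may differ.

# Before the register bank (fetch/decode): assign the register input
# where the plain icode wire used to be assigned.
d_icode = i10bytes[4..8];

# After the register bank (writeback): read the register output instead.
reg_inputE = [
    W_icode == IRMOVQ : W_valC;
    W_icode == RRMOVQ : W_rvalA;
    1 : 0;
];
```

Any use of icode that happens before the register bank (for example, when choosing reg_srcA) would read d_icode rather than W_icode.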

Do the same thing with valC.

The signals reg_outputA, reg_dstE, and Stat have to be treated specially because they interact with fixed functionality. Thus, reg_outputA (an output created during decode) will need to be saved into d_... during decode and used as W_... afterwards, as in

# in decode:
d_rvalA = reg_outputA;
# in later phases (here, writeback), use W_rvalA instead of reg_outputA

Similarly, reg_dstE will need to be originally computed as d_dstE during Decode and then reg_dstE = W_dstE placed in writeback to get that value back out. Stat is an output like reg_dstE and will need the same treatment (set d_Stat before the pipeline register and Stat = W_Stat afterward).
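One way this could look, as a hedged sketch: rB is a hypothetical wire holding the destination register field of the instruction, and the status logic is only a guess at what the base file does for this four-instruction subset.

```hcl
# Sketch only; rB is a hypothetical wire, and the Stat logic is an assumption.

# In decode: compute the values as register inputs.
d_dstE = [
    (d_icode == IRMOVQ) || (d_icode == RRMOVQ) : rB;
    1 : REG_NONE;
];
d_Stat = [
    d_icode == HALT : STAT_HLT;
    (d_icode == NOP) || (d_icode == IRMOVQ) || (d_icode == RRMOVQ) : STAT_AOK;
    1 : STAT_INS;
];

# In writeback: drive the built-in outputs from the register outputs.
reg_dstE = W_dstE;
Stat = W_Stat;
```

Because Stat now comes from W_Stat, a halt is reported one cycle after it is decoded, which is consistent with the pipelined version taking one more cycle than the single-cycle processor.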

At this point, the rrmovq.yo we used for irrr.hcl

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should take 6 (not 5) cycles to set three registers:

| RAX:               22   RCX:               22   RDX:             162e |

Once you handle halt according to the instructions below, it should leave the PC at address 0x18, like the single-cycle processor.

4 Look for hazards

Consider

irmovq $1, %rax
rrmovq %rax, %rbx

In a pipeline diagram (given that we have no execute or memory phases), these will look like

Instr    cycle 1  cycle 2  cycle 3
irmovq   FD       W
rrmovq            FD       W

Note that the immediate value won’t be written to the register file until the end of cycle 2, but the next instruction will try to read it during cycle 2. This is an example of a data dependency that exposes a hazard in our hardware design so far.

We can resolve this hazard in two ways: we can stall, or we can forward data. Forwarding is always preferred to stalling if both are possible, so we’ll forward.

4.1 Forward

We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, d_rvalA will be reg_outputA unless reg_dstE is both (1) not REG_NONE and (2) the same as the decode phase’s reg_srcA; in that case, we’ll forward reg_inputE into d_rvalA instead.
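The rule above could be written as a mux along these lines. This is a sketch that uses only the signal names mentioned in this section; it assumes reg_srcA is already set from the decode-stage instruction.

```hcl
# Sketch of the forwarding mux described above.
# First case: the instruction in writeback is about to write the register
# that decode is reading, so take the value headed for the register file.
# Default case: the register file already holds the right value.
d_rvalA = [
    (reg_dstE != REG_NONE) && (reg_dstE == reg_srcA) : reg_inputE;
    1 : reg_outputA;
];
```

The REG_NONE check matters because an instruction that reads no register has reg_srcA set to REG_NONE, and a writeback-stage instruction that writes nothing has reg_dstE set to REG_NONE too; without the check, those two would spuriously match.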

If correctly implemented, y86/irrr7.yo

irmovq $1, %rax
rrmovq %rax, %rbx

should take 4 cycles to put a 1 in both %rax and %rbx, while y86/rrmovq.yo

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should still take 6 cycles and result in

| RAX:               22   RCX:               22   RDX:             162e |

like it did before.

I mention the number of cycles because the other solution (stalling) would increase them.

4.2 Handling halt

You can think of halt and invalid instructions as a special kind of control hazard, since the instructions after a halt (or an invalid instruction) are not supposed to run.

We recommend stalling the register that feeds the fetch stage when you encounter a halt or invalid instruction to avoid starting to execute instructions that aren’t part of the program. However, in the two-stage pipeline in this lab, instructions do not change any state (memory, program registers, condition codes) until the last stage, so this is not strictly necessary.
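A sketch of such a stall, under two assumptions: that the PC register in your file is declared as a bank named fF (a hypothetical name; use whatever pipelab1_base.hcl actually declares), and that d_Stat is anything other than STAT_AOK exactly when the decoded instruction is halt or invalid.

```hcl
# Sketch only: assumes the PC register is declared as
#   register fF { pc : 64 = 0; }
# (hypothetical name) and that d_Stat is set in decode.
# While stall_F is 1, F_pc keeps its current value instead of taking f_pc,
# so no instruction past the halt or invalid instruction is fetched.
stall_F = (d_Stat != STAT_AOK);
```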

5 Understanding HCL Errors

5.1 Use debug mode

If your hcl compiles, you can run it in debug mode: mysimulator.exe -i -d somefile.yo

5.2 Initialize what you declare

The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say

wire baz:4;
register qB { xyxxy:32 = 0; }

then you must also say

baz = something;
q_xyxxy = something_else;

or else you will get an error.

5.3 Test your code often

We’ve been telling you this for years now: test your code often! At least check that it compiles with ./hclrs --check pipelab1.hcl. Particularly when working with a language you don’t know well, frequent feedback is useful.

6 Testing your Code

You can run make test-pipelab1 to test your code on the list of testcases in testdata/pipelab1-tests.txt, comparing its output to reference outputs we have included in testdata/pipelab1-reference.

See here for an explanation of the output format.

7 Submit

Submit pipelab1.hcl on the submission page.

If you didn’t have time to finish everything, still submit the file (it’s OK if it is incomplete; we are looking for effort more than correctness).

8 For your edification

If you want to understand pipelines more, I’d encourage you to add another pipeline register between Fetch and Decode. Don’t submit that three-stage-pipeline file, though.

Copyright © 2016–2017 by Samira Khan, Luther Tychonievich, and Charles Reiss.
Last updated 2017-10-16 21:00:48