CS 3330: HCL2D part 5: PIPE Lab 1

This page does not represent the most current semester of this course; it is present merely as an archive.

In this lab we’ll add some basic pipelining to a subset of the Y86-64 instruction set: nop, halt, irmovq, and rrmovq. We’ll add just one pipeline register, between decode and writeback (these instructions need no execute or memory phase).

Download pipelab1_base.hcl to get a copy of the sequential simulator with only those instructions implemented.

1 Approach

To add pipelining:

1. Identify where in your code the pipeline register should go.
2. Identify which wires cross that point and put them in a pipeline register.
3. Replace wires with register inputs and outputs.
4. Look for hazards and solve them using stalls and/or forwards.

We’ll explore this idea by adding a pipeline register between decode and writeback. Following the textbook’s tradition, we’ll call the input side of the register d for decode and the output side W for writeback:

register dW {
    # todo: fill in the details here
}

2 What wires cross the pipeline register bank?

Look through pipelab1_base.hcl; each value used as an input in writeback that is not also computed in that stage will need to be stored in a pipeline register. For example, the reg_inputE mux uses icode, reg_outputA, and valC as inputs, so we’ll need all three of those in our new register, along with fields for the other signals used as inputs in writeback.

Always pick the default values in the pipeline register to be the values you’d expect for nop: NOP in the icode, REG_NONE in any register spots, etc.
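As a sketch of what the declaration might look like once filled in: the field names rvalA, dstE, and Stat follow the names used later on this page, while the bit widths are assumptions based on Y86-64 (4-bit icode and register ids, 64-bit values) and a 3-bit status code. Check them against the wires declared in pipelab1_base.hcl before relying on them.

```
register dW {
    # name:width = default; each default is what a nop would carry
    icode:4 = NOP;
    valC:64 = 0;
    rvalA:64 = 0;
    dstE:4 = REG_NONE;
    Stat:3 = STAT_AOK;
}
```

With nop-like defaults, the first cycle (before any real instruction has reached writeback) writes no register and reports a normal status.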

3 Replace wires with register inputs/outputs

Recall that if we name our register bank dW then whatever signal we put into d_thing will come out of W_thing on the next cycle.

Go through each signal and, if it crosses the register bank, replace every use before the register bank with d_... and every use after the register bank with W_....

For example, consider icode:

Remove the wire icode:4 declaration since we have it in dW.
In fetch and decode, replace all occurrences of icode with d_icode
In writeback replace all occurrences of icode with W_icode

Do the same thing with valC.
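For icode, the change might look like the following sketch. The slice expression and the wire name rA are assumptions about how pipelab1_base.hcl extracts fields from the instruction bytes; keep whatever expressions the base file already uses and change only the signal names.

```
# before: one wire shared by every stage
#   wire icode:4;
#   icode = i10bytes[4..8];

# after: no wire declaration; fetch assigns the register input instead
d_icode = i10bytes[4..8];

# decode keeps using the d_ side (rA is assumed to be the base file's
# wire holding the instruction's first register field)
reg_srcA = [
    d_icode == RRMOVQ : rA;
    1 : REG_NONE;
];
```

Any use of icode in writeback would read W_icode instead, which holds the value that d_icode had one cycle earlier.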

The signals reg_outputA, reg_dstE, and Stat have to be treated specially because they interact with fixed functionality. Thus, reg_outputA (an output created during decode) will need to be saved into d_... during decode and used as W_... afterwards, as in

# in decode:
d_rvalA = reg_outputA;
# in writeback, use W_rvalA instead of reg_outputA

Similarly, reg_dstE will need to be originally computed as d_dstE during Decode and then reg_dstE = W_dstE placed in writeback to get that value back out. Stat is an output like reg_dstE and will need the same treatment (set d_Stat before the pipeline register and Stat = W_Stat afterward).
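Putting these pieces together, the writeback side might end up looking like the sketch below. It assumes the register fields are named icode, valC, rvalA, dstE, and Stat; the default value in the mux is an arbitrary placeholder.

```
# writeback: every input comes from the W_ side of the pipeline register
reg_inputE = [
    W_icode == RRMOVQ : W_rvalA;   # value read from the register file in decode
    W_icode == IRMOVQ : W_valC;    # immediate fetched with the instruction
    1 : 0;                         # unused when the destination is REG_NONE
];
reg_dstE = W_dstE;   # hand the saved destination to the register file
Stat = W_Stat;       # report the status of the instruction now in writeback
```

Nothing in this stage reads icode, valC, or reg_outputA directly any more, which is a quick way to check that no signal skips the pipeline register.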

At this point, the rrmovq.yo we made in lab2

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should take 6 (not 5) cycles to set three registers:

| RAX:               22   RCX:               22   RDX:             162e |

and should leave the pc at address 0x18. It should also take a few fewer cycles overall than the 690 used by pipelab1_base.hcl as a result of increased throughput; if it does not, don’t worry, since we aren’t focusing on speed right now.

4 Look for hazards

Consider

irmovq $1, %rax
rrmovq %rax, %rbx

In a pipeline diagram (given that we have no execute or memory phases), these will look like

| Instr    | cycle 1 | cycle 2 | cycle 3 |
|----------|---------|---------|---------|
| `irmovq` | FD      | W       |         |
| `rrmovq` |         | FD      | W       |

Note that the immediate value won’t be written to the register file until the end of cycle 2, but the next instruction will try to read it during cycle 2. This is an example of a data dependency that creates a hazard in our hardware design so far.

We can resolve this hazard in two ways: we can either stall or forward data. Forwarding is always preferred to stalling if both are possible, so we’ll forward.

4.1 Forward

We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, d_rvalA will be reg_outputA unless reg_dstE is both (1) not REG_NONE and (2) the same as the decode phase’s reg_srcA; in that case, we’ll forward reg_inputE into d_rvalA instead.
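Written as an HCL mux, that rule could look like the sketch below. It assumes the pipeline register field is named rvalA and that reg_dstE and reg_inputE are already driven from the writeback side of the register.

```
# first case: the instruction in writeback is about to write the register
#             that decode is reading, so take the value straight from it
# second case: the register file already holds the right value
d_rvalA = [
    reg_dstE != REG_NONE && reg_dstE == reg_srcA : reg_inputE;
    1 : reg_outputA;
];
```

The REG_NONE check matters because an instruction that reads no register has reg_srcA equal to REG_NONE, and without the check it would appear to match a writeback instruction that writes nothing.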

If correctly implemented, y86/irrr7.yo

irmovq $1, %rax
rrmovq %rax, %rbx

should take 4 cycles to put a 1 in both %rax and %rbx, while y86/rrmovq.yo

irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax

should still take 6 cycles and result in

| RAX:               22   RCX:               22   RDX:             162e |

like it did before.

I mention the number of cycles because the other solution (stalling) would increase them.

5 Understanding HCL Errors

5.1 Use debug mode

If your hcl compiles, you can run it in debug mode: mysimulator.exe -i -d somefile.yo

5.2 Initialize what you declare

The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say

wire baz:4;
register qB { xyxxy:32 = 0; }

then you must also say

baz = something;
q_xyxxy = something_else;

or else you will get

ERROR: failed to initialize baz, q_xyxxy

5.3 Known bug with error messages

If you put complicated expressions inside a mux, you might get nonsensical error messages. In particular, do not put a wire slice operator or a mux inside a mux.
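One way to follow this advice is to give the inner expression its own wire and use that wire in the outer mux. The sketch below uses a hypothetical wire name (moved_value) that is not part of the lab files, and the W_ signals assume the pipeline register fields described earlier on this page.

```
# risky: a mux nested inside another mux
#   reg_inputE = [
#       W_dstE != REG_NONE : [ W_icode == IRMOVQ : W_valC; 1 : W_rvalA; ];
#       1 : 0;
#   ];

# safer: compute the inner choice on its own wire first
wire moved_value:64;
moved_value = [
    W_icode == IRMOVQ : W_valC;
    1 : W_rvalA;
];
reg_inputE = [
    W_dstE != REG_NONE : moved_value;
    1 : 0;
];
```

The same approach works for slices: assign the slice to a named wire, then refer to that wire inside the mux.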

If you encounter another bug in HCL2D, email Prof. Tychonievich your .hcl file so he can diagnose and fix hcl2d.

5.4 `make` often

We’ve been telling you this for years now, but make pipelab1.exe often! Particularly when working with a language you don’t know well, frequent feedback is useful.

6 Submit

Submit pipelab1.hcl on the submission page.

If you didn’t have time to finish everything, still submit the file (it’s OK if it is incomplete; we are looking for effort more than correctness).

7 For your edification

If you want to understand pipelines more, I’d encourage you to add another pipeline register between Fetch and Decode. Don’t submit that three-stage-pipeline file, though.