This page does not represent the most current semester of this course; it is present merely as an archive.
In this lab we’ll add some basic pipelining to a subset of the Y86-64 instruction set: nop, halt, irmovq, and rrmovq. We’ll add just one pipeline register, between decode and writeback (there is no execute or memory phase for these instructions).
Download pipelab1_base.hcl to get a copy of the sequential simulator with only those instructions implemented.
To add pipelining, we’ll place a pipeline register between decode and writeback. Following the textbook’s tradition, we’ll call the input side of the register d for decode and the output side W for writeback:
register dW {
# todo: fill in the details here
}
Look through pipelab1_base.hcl; each value used as an input in writeback that is not also computed in that stage will need to be stored in the pipeline register. For example, the reg_inputE mux uses icode, reg_outputA, and valC as inputs, so we’ll need all three of those in our new register, along with entries for the other signals used as inputs in writeback.
Always pick the default values in the pipeline register to be the values you’d expect for nop: NOP in the icode, REG_NONE in any register spots, etc.
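As a rough sketch of what the filled-in register bank might look like, assuming the field names used elsewhere on this page (icode, valC, rvalA, dstE, Stat). The bit widths and the STAT_AOK default are assumptions, so check them against the corresponding wires in pipelab1_base.hcl.

```hcl
# Sketch only: widths and the Stat default are assumptions, not taken from the base file.
register dW {
    icode:4 = NOP;        # default instruction is a nop
    valC:64 = 0;          # immediate value; ignored by nop
    rvalA:64 = 0;         # value read from the register file; ignored by nop
    dstE:4 = REG_NONE;    # nop writes no register
    Stat:3 = STAT_AOK;    # assumed: a nop leaves the status at AOK
}
```

Each field listed here has to be assigned somewhere as d_fieldname, or the simulator will report that it was not initialized.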
Recall that if we name our register bank dW, then whatever signal we put into d_thing will come out of W_thing on the next cycle.
Go through each signal and, if it crosses the register bank, replace every use before the register bank with the d_... form and every use after the register bank with the W_... form.
For example, consider icode:
Remove the wire icode:4 declaration, since we have it in dW.
Replace every icode before the register bank with d_icode.
Replace every icode after the register bank with W_icode.
Do the same thing with valC.
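A minimal sketch of the renaming for icode and valC, assuming the base file pulls both out of the fetched instruction bytes with wire slices. The i10bytes name and the bit ranges below are assumptions; keep whatever right-hand sides pipelab1_base.hcl already has and change only the names being assigned and read.

```hcl
# Before the register bank (fetch): assign the d_ side instead of a plain wire.
# No separate wire declaration is needed; register dW supplies d_icode and W_icode.
d_icode = i10bytes[4..8];      # assumed slice; reuse the base file's expression
d_valC = i10bytes[16..80];     # assumed slice; reuse the base file's expression

# After the register bank (writeback): read the W_ side.
# For example, a test that used to read   icode == IRMOVQ
# becomes                                 W_icode == IRMOVQ
```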
The signals reg_outputA, reg_dstE, and Stat have to be treated specially because they interact with fixed functionality. Thus, reg_outputA (an output created during decode) will need to be saved into d_... during decode and used as W_... afterwards, as in
# in decode:
d_rvalA = reg_outputA;
# in writeback, use W_rvalA instead of reg_outputA
Similarly, reg_dstE will need to be originally computed as d_dstE during decode, and then reg_dstE = W_dstE placed in writeback to get that value back out. Stat is an output like reg_dstE and will need the same treatment (set d_Stat before the pipeline register and Stat = W_Stat afterward).
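Put together, the special-case signals might be wired roughly as follows. This is a sketch under assumptions: the rB wire name, the exact mux cases, and the fallback values are guesses at what pipelab1_base.hcl contains, so adapt the base file's own muxes rather than copying these.

```hcl
# Decode side: compute everything into the d_ inputs of the register bank.
d_rvalA = reg_outputA;
d_dstE = [
    d_icode in {IRMOVQ, RRMOVQ} : rB;   # rB is an assumed wire name
    1 : REG_NONE;
];
d_Stat = [
    d_icode == HALT : STAT_HLT;
    1 : STAT_AOK;                       # the base file may have more cases
];

# Writeback side: hand the registered values to the fixed-function outputs.
reg_dstE = W_dstE;
Stat = W_Stat;
reg_inputE = [
    W_icode == RRMOVQ : W_rvalA;
    W_icode == IRMOVQ : W_valC;
    1 : 0;                              # assumed fallback value
];
```

Every signal read in the writeback half comes from a W_ output, so nothing computed for the instruction currently in decode leaks into the instruction being written back.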
At this point, the rrmovq.yo we made in lab2
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should take 6 (not 5) cycles to set three registers:
| RAX: 22 RCX: 22 RDX: 162e |
and should leave the pc at address 0x18. It should also take a few fewer cycles overall than the 690 used by pipelab1_base.hcl as a result of increased throughput, though don’t be worried if it does not; we aren’t focusing on speed right now.
Consider
irmovq $1, %rax
rrmovq %rax, %rbx
In a pipeline diagram (given that we have no execute or memory phases), these will look like
Instr | cycle 1 | cycle 2 | cycle 3 |
---|---|---|---|
irmovq | FD | W | |
rrmovq | | FD | W |
Note that the immediate value won’t be written to the register file until the end of cycle 2, but the next instruction will try to read it during cycle 2. This is an example of a data dependency that exposes a hazard in our hardware design so far.
We can handle this hazard in two ways: we can either stall or forward data. Forwarding is always preferred to stalling if both are possible, so we’ll forward.
We want to grab the value that is being prepped for writing to the register file before it actually gets written, if it is destined for the register we are trying to read. Thus, d_rvalA will be reg_outputA unless reg_dstE is both (1) not REG_NONE and (2) the same as the decode phase’s reg_srcA; in that case, we’ll forward reg_inputE into d_rvalA instead.
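The rule just described can be sketched as a mux that takes the place of the plain d_rvalA = reg_outputA; assignment. This is one possible way to write it, not necessarily the intended solution; the signal names all come from the description above.

```hcl
d_rvalA = [
    # the instruction in writeback is about to write the register we are reading:
    # forward its value instead of the stale one from the register file
    reg_dstE != REG_NONE && reg_dstE == reg_srcA : reg_inputE;
    # otherwise the register file already holds the right value
    1 : reg_outputA;
];
```

The REG_NONE check matters because an instruction that reads no register also has reg_srcA set to REG_NONE, and without the check it would appear to match a writeback instruction that writes nothing.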
If correctly implemented, y86/irrr7.yo
irmovq $1, %rax
rrmovq %rax, %rbx
should take 4 cycles to put a 1 in both %rax and %rbx, while y86/rrmovq.yo
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should still take 6 cycles and result in
| RAX: 22 RCX: 22 RDX: 162e |
like it did before.
I mention the number of cycles because the other solution (stalling) would increase them.
If your hcl compiles, you can run it in debug mode: mysimulator.exe -i -d somefile.yo
The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say
wire baz:4;
register qB { xyxxy:32 = 0; }
then you must also say
baz = something;
q_xyxxy = something_else;
or else you will get
ERROR: failed to initialize baz, q_xyxxy
If you put complicated expressions inside a mux, you might get nonsensical error messages. In particular, do not put a wire slice operator or a mux inside a mux.
If you encounter another bug in HCL2D, email Prof. Tychonievich your .hcl so he can diagnose and fix hcl2d.
make often
We’ve been telling you this for years now, but make pipelab1.exe often! Particularly when working with a language you don’t know well, frequent feedback is useful.
Submit pipelab1.hcl on the submission page.
If you didn’t have time to finish everything, still submit the file (it’s OK if it is incomplete; we are looking for effort more than correctness).
If you want to understand pipelines more, I’d encourage you to add another pipeline register between Fetch and Decode. Don’t submit that three-stage-pipeline file, though.