This page does not represent the most current semester of this course; it is present merely as an archive.
In this lab we'll add some basic pipelining to a subset of the Y86 instruction set. In particular, we'll deal only with nop, halt, irmovl, and rrmovl, and add a pipeline register between decode and writeback. Download lab5_base.hcl to get a copy of the simulator with only those instructions implemented.
To add pipelining,
We'll explore this idea by adding a pipeline register after decode (the "E" register bank in the textbook).
register E {
# todo: fill in the details here
}
Put it up at the top of the file (we'll need it to be defined before we first use any e_... wire).
Let's consider each wire in our HCL file. In the reference lab4 solution those are:
- pc is output by our code before the pipeline register
- i6bytes is returned by the instruction memory before the pipeline register and used before it too
- icode is computed before the pipeline register but used on both sides
- need_regs is computed and used before the pipeline register
- need_immediate is computed and used before the pipeline register
- rA is computed and used before the pipeline register
- rB is computed and used before the pipeline register
- valC is computed before the pipeline register but used after it
- valP is computed and used before the pipeline register
- srcA is output by our code before the pipeline register
- rvalA is returned by the register file before the pipeline register but used after it
- dstE is computed before the pipeline register but used after it
- wvalE is output by our code after the pipeline register
- Stat is output by our code after the pipeline register, but also used to stall the P register bank before the pipeline register
- p_pc and stall_P are interfacing with the P register, which is before the pipeline register

Thus, we'll need copies of icode, valC, rvalA, dstE, and Stat in the E pipeline register.
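The classification above can be checked mechanically: a signal needs a slot in E exactly when it is produced before E and used after it. The following is a small Python sketch (not HCL, and not part of the lab files). The table is transcribed from the list of wires, and the Stat row is an interpretation of the text (produced before E, needed on both sides), so treat it as an assumption.

```python
# For each wire: (side of E where it is produced, sides of E where it is used).
# Transcribed from the lab text; the Stat row is an interpretation.
WIRES = {
    "pc": ("before", {"before"}),
    "i6bytes": ("before", {"before"}),
    "icode": ("before", {"before", "after"}),
    "need_regs": ("before", {"before"}),
    "need_immediate": ("before", {"before"}),
    "rA": ("before", {"before"}),
    "rB": ("before", {"before"}),
    "valC": ("before", {"after"}),
    "valP": ("before", {"before"}),
    "srcA": ("before", {"before"}),
    "rvalA": ("before", {"after"}),
    "dstE": ("before", {"after"}),
    "wvalE": ("after", {"after"}),
    "Stat": ("before", {"before", "after"}),
    "p_pc": ("before", {"before"}),
    "stall_P": ("before", {"before"}),
}

# A wire crosses E when it is made before E and needed after E.
crossing = [
    name
    for name, (produced, used) in WIRES.items()
    if produced == "before" and "after" in used
]

print(crossing)  # ['icode', 'valC', 'rvalA', 'dstE', 'Stat']
```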
Always pick the default values in the pipeline register to be sensible nop values that do nothing; that way, when we start running and it takes a cycle for E to receive values from the previous stages, it will not have done anything. In particular, that means putting icode:4 = NOP;, dstE:4 = REG_NONE;, and Stat:3 = STAT_AOK; in register E.
Recall that whatever signal we put into x_thing will come out of X_thing on the next cycle. Thus, any signal that needs to cross register bank E will need to use e_... on the pre-E side and E_... on the post-E side.
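To make the one-cycle delay concrete, here is a toy Python model of a register bank (a sketch only, not the simulator's implementation). The class RegisterBank and its methods are hypothetical names, and the numeric constants are the usual Y86 encodings, assumed here rather than taken from the lab files.

```python
# Assumed Y86 encodings (not taken from the lab files).
NOP, IRMOVL = 1, 3
REG_NONE = 0xF
STAT_AOK = 1


class RegisterBank:
    """Toy model of a pipeline register bank such as E (hypothetical helper)."""

    def __init__(self, **defaults):
        self.inputs = dict(defaults)   # the e_... side, driven this cycle
        self.outputs = dict(defaults)  # the E_... side, visible this cycle

    def drive(self, **values):
        """Set some e_... inputs during the current cycle."""
        self.inputs.update(values)

    def clock(self):
        """Clock edge: whatever was on e_... now appears on E_...."""
        self.outputs = dict(self.inputs)


# Defaults are nop-like, so E does nothing before real values arrive.
E = RegisterBank(icode=NOP, valC=0, dstE=REG_NONE, Stat=STAT_AOK)

E.drive(icode=IRMOVL, valC=5678, dstE=0)  # fetch/decode sets e_...
seen_this_cycle = dict(E.outputs)         # writeback still sees the nop
E.clock()
seen_next_cycle = dict(E.outputs)         # writeback now sees the irmovl

print(seen_this_cycle["icode"], seen_next_cycle["icode"])  # 1 3
```

Note that the writeback side sees the default nop during the first cycle, which is why the defaults should be values that do nothing.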
Go through each signal and, if it crosses E, replace every use before E with e_... and every use after E with E_... .
For example, consider icode:
- remove the wire:32 icode declaration, since we have it in E
- replace each use of icode before E with e_icode
- replace each use of icode after E with E_icode
Do the same thing with valC.
The signals rvalA, dstE, and Stat have to be treated specially because they are fixed inputs to or outputs from the simulator (for example, the register file), so we cannot simply rename them. Thus, rvalA (an output created during decode) will need to be saved into e_... during decode and used as E_... afterwards, as in
# in decode:
e_rvalA = rvalA;
# in execute and later phases, use E_rvalA instead of rvalA
Similarly, dstE will need to be originally computed as e_dstE during Fetch, with dstE = E_dstE placed in writeback to get that value back out. Stat is an output like dstE and will need the same treatment (set e_Stat before the pipeline register and Stat = E_Stat afterward).
At this point, the rrmovl.yo we made in lab2
irmovl $5678, %eax
irmovl $34, %ecx
rrmovl %eax, %edx
rrmovl %ecx, %eax
should take 6 (not 5) cycles to set three registers:
| EAX: 22 ECX: 22 EDX: 162e EBX: 0 |
and should leave the pc at address 0x11. It should also take a few fewer cycles overall than the 355 used by lab5_base.hcl as a result of increased throughput, though don't worry if it does not; we aren't focusing on speed right now.
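As a quick sanity check on those numbers, the arithmetic can be worked out by hand or with a few lines of Python. This sketch assumes the standard 32-bit Y86 instruction lengths and assumes rrmovl.yo ends with a halt (which is consistent with the final pc of 0x11, but is not stated above).

```python
# Assumed Y86 (32-bit) instruction lengths in bytes.
LENGTH = {"irmovl": 6, "rrmovl": 2, "halt": 1}

# The four instructions from the text, plus an assumed trailing halt.
program = ["irmovl", "irmovl", "rrmovl", "rrmovl", "halt"]

regs = {"eax": 0, "ecx": 0, "edx": 0, "ebx": 0}
regs["eax"] = 5678         # irmovl $5678, %eax
regs["ecx"] = 34           # irmovl $34, %ecx
regs["edx"] = regs["eax"]  # rrmovl %eax, %edx
regs["eax"] = regs["ecx"]  # rrmovl %ecx, %eax

# The pc ends just past the last instruction fetched.
final_pc = sum(LENGTH[name] for name in program)

# With two stages and no stalls, the last instruction reaches writeback
# one cycle after it is fetched, so we pay one extra cycle in total.
cycles = len(program) + 1

print({name: format(value, "x") for name, value in regs.items()})
# {'eax': '22', 'ecx': '22', 'edx': '162e', 'ebx': '0'}
print(hex(final_pc), cycles)  # 0x11 6
```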
Consider
irmovl $1, %eax
rrmovl %eax, %ebx
In a pipeline diagram (given that we have no execute or memory phases), these will look like
Instr | cycle 1 | cycle 2 | cycle 3 |
---|---|---|---|
irmovl | FD | W | |
rrmovl | | FD | W |
Note that the immediate value won't be written to the register file until the end of cycle 2, but the next instruction will try to read that register during cycle 2. This creates a data dependency (a read-after-write hazard).
We can resolve this dependency in two ways: we can either stall, or we can forward data. Let's try both solutions for our edification.
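To see the hazard and both fixes side by side, here is a toy cycle-level model of the two-stage pipeline in Python. It is a sketch under assumptions, not the lab simulator: the function run, its mode argument, and the tuple encoding of instructions are all made up for illustration, and in "stall" mode it feeds a nop into E while the pc is held, which is a modeling choice the lab text does not spell out.

```python
NONE = None  # stands in for REG_NONE


def run(program, mode):
    """Simulate fetch/decode + writeback. mode: "none", "stall", "forward".

    Returns (final registers, number of cycles until halt reaches writeback).
    """
    regs = {"eax": 0, "ecx": 0, "edx": 0, "ebx": 0}
    E = {"op": "nop", "valC": 0, "rvalA": 0, "dstE": NONE}  # nop defaults
    pc = 0
    cycles = 0
    while True:
        cycles += 1
        # Writeback stage: uses the E_... outputs.
        if E["op"] == "halt":
            return regs, cycles
        wvalE = E["valC"] if E["op"] == "irmovl" else E["rvalA"]
        dstE = E["dstE"]

        # Fetch/decode stage: reads registers before this cycle's write.
        op, a, b = program[pc]
        srcA = a if op == "rrmovl" else NONE
        rvalA = regs[srcA] if srcA is not NONE else 0
        hazard = dstE is not NONE and dstE == srcA

        if hazard and mode == "stall":
            # Hold the pc and send a nop into E (assumed bubble).
            e = {"op": "nop", "valC": 0, "rvalA": 0, "dstE": NONE}
        else:
            if hazard and mode == "forward":
                rvalA = wvalE  # grab the value about to be written
            e = {"op": op, "valC": a if op == "irmovl" else 0, "rvalA": rvalA,
                 "dstE": b if op in ("irmovl", "rrmovl") else NONE}
            pc += 1

        # Clock edge: the register file is written and E latches e_....
        if dstE is not NONE:
            regs[dstE] = wvalE
        E = e


SHORT = [("irmovl", 1, "eax"), ("rrmovl", "eax", "ebx"), ("halt", None, None)]
LONG = [("irmovl", 5678, "eax"), ("irmovl", 34, "ecx"),
        ("rrmovl", "eax", "edx"), ("rrmovl", "ecx", "eax"),
        ("halt", None, None)]

for mode in ("none", "stall", "forward"):
    print(mode, run(SHORT, mode), run(LONG, mode))
```

In this model, "none" leaves ebx at its stale value of 0 for the short program, while "stall" and "forward" both produce 1 and differ only in cycle count. The longer program has no back-to-back dependency, so all three modes agree on it.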
Copy your work so far into lab5_stall.hcl.
We want to stall the decode phase if it needs to read something a later phase intends to write. The register in front of decode is P, so we'll be setting stall_P. In addition to the logic we already have for that, we stall if the writeback phase's dstE is (1) not REG_NONE and (2) the same as the decode phase's srcA.
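The stall condition can be pinned down as a small predicate. The sketch below is Python rather than HCL, meant only to show the logic. The function name, the existing_stall parameter (standing in for whatever stall_P logic the file already has), and the register numbers (the usual Y86 encoding) are assumptions.

```python
# Assumed Y86 register encoding.
REG_NONE = 0xF
EAX, ECX, EDX, EBX = 0, 1, 2, 3


def stall_P(existing_stall, E_dstE, srcA):
    """Hold the P register if decode would read a register that the
    instruction currently in writeback is about to write."""
    data_hazard = E_dstE != REG_NONE and E_dstE == srcA
    return existing_stall or data_hazard


# rrmovl %eax, %ebx decoding while irmovl $1, %eax is in writeback:
print(stall_P(False, EAX, EAX))            # True
# Writeback targets a different register, so no stall is needed:
print(stall_P(False, ECX, EAX))            # False
# Neither instruction touches a register (both fields are REG_NONE):
print(stall_P(False, REG_NONE, REG_NONE))  # False
```

The REG_NONE check matters: without it, two instructions that use no registers at all would appear to conflict, because their dstE and srcA fields would both hold REG_NONE.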
If correctly implemented,
irmovl $1, %eax
rrmovl %eax, %ebx
should take 5 cycles to put a 1 in both eax and ebx, while
irmovl $5678, %eax
irmovl $34, %ecx
rrmovl %eax, %edx
rrmovl %ecx, %eax
should still take 6 cycles and result in
| EAX: 22 ECX: 22 EDX: 162e EBX: 0 |
like it did before.
Copy your pre-stall work into lab5_forward.hcl.
We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, e_rvalA will be rvalA unless dstE is (1) not REG_NONE and (2) the same as the decode phase's srcA; in that case, we'll forward wvalE into e_rvalA instead.
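The forwarding rule is a two-way choice, sketched here in Python rather than HCL. The function name next_e_rvalA is hypothetical and the register numbers follow the usual Y86 encoding, which is assumed.

```python
# Assumed Y86 register encoding.
REG_NONE = 0xF
EAX, ECX, EBX = 0, 1, 3


def next_e_rvalA(rvalA, srcA, dstE, wvalE):
    """Pick the value to latch into e_rvalA this cycle.

    rvalA is what the register file returned (possibly stale); dstE and
    wvalE describe the write that the writeback phase is about to perform.
    """
    if dstE != REG_NONE and dstE == srcA:
        return wvalE  # forward the value that is about to be written
    return rvalA      # no conflict: the register file value is current


# rrmovl %eax, %ebx decoding while irmovl $1, %eax is in writeback:
print(next_e_rvalA(rvalA=0, srcA=EAX, dstE=EAX, wvalE=1))  # 1
# Writeback targets %ecx, so the register file value is used as is:
print(next_e_rvalA(rvalA=7, srcA=EAX, dstE=ECX, wvalE=1))  # 7
```

Because the correct value is selected in the same cycle, nothing has to wait, which is why the forwarding version needs one cycle fewer than the stalling version on the dependent pair.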
If correctly implemented,
irmovl $1, %eax
rrmovl %eax, %ebx
should take 4 cycles to put a 1 in both eax and ebx, while
irmovl $5678, %eax
irmovl $34, %ecx
rrmovl %eax, %edx
rrmovl %ecx, %eax
should still take 6 cycles and result in
| EAX: 22 ECX: 22 EDX: 162e EBX: 0 |
like it did before.
Submit two files, one named lab5_stall.hcl and one named lab5_forward.hcl, on the submission page. You'll have to upload them one at a time…
If you didn't have time to finish everything, still submit both files (it's OK if they are incomplete; we are looking for effort more than correctness).
If you want to understand pipelines more, I'd encourage you to add another pipeline register between Fetch and Decode. Don't submit that three-stage-pipeline file, though.