In the previous HW you pipelined cmovXX. In lab you pipelined mrmovq (if you didn’t finish, we have posted an example solution). To finish your pipelined simulator, you need to combine those two and then add
You may approach this however you wish, but I suggest the following flow:
pipelab2.hcl and test the combination.
jXX with speculative execution and branch misprediction recovery. Predict that all branches are taken. Test.
ret with handling for the return hazard, and test.
All of the tests that either source file passed should still pass on the combined file.
This one is messy because we have branch prediction, speculative execution, and recovery from misprediction through stage bubbling. Let’s look at it through a set of questions.
jXX are taken (i.e., that the new PC is valC for all
predPC register inside the xF register bank (or pP or whatever else you called it).
By setting the pc to the predicted PC (pc = F_predPC).
If the condition codes evaluate to
At the end of the jXX’s execute stage (which is when we check the condition codes).
End of execute = beginning of memory, so we can look in the M_... register bank outputs.
We need to fetch the correct address (valP) and bubble any stages that we should not have run.
The jXX is in memory, so we keep that one (and writeback, which holds a pre-jXX instruction); since we are fixing fetch, we keep that one too; so we bubble just the decode and execute stages.
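The recovery rule above can be sketched as a small Python model. This is only an illustration, not HCL and not the posted solution: the class `MemoryStageInputs`, its field names, and the function `recovery` are hypothetical, and the sketch assumes the jump's fall-through address (valP) was carried along with it into the memory stage.

```python
from dataclasses import dataclass

JXX = 7  # Y86-64 icode for jXX


@dataclass
class MemoryStageInputs:
    """Hypothetical view of the M_... register bank outputs."""
    icode: int   # instruction currently in the memory stage
    cnd: bool    # condition-code check saved at the end of execute
    valP: int    # fall-through address carried along with the jump


def recovery(m: MemoryStageInputs, predicted_pc: int):
    """Return (fetch_pc, bubble_decode, bubble_execute) for this cycle."""
    # Every jXX was predicted taken, so the prediction was wrong
    # exactly when the condition check came out false.
    mispredicted = m.icode == JXX and not m.cnd
    if mispredicted:
        # Fetch the fall-through address and squash the two
        # wrongly fetched instructions behind the jump.
        return m.valP, True, True
    # Otherwise keep following the predicted PC.
    return predicted_pc, False, False
```

In this sketch, a jump in memory with a false condition redirects fetch to its saved valP and bubbles decode and execute; any other instruction leaves the predicted PC in place.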
By using a mux to pick
mispredicted: oldValP and
valP for a non-jump,
valC for a jump.
jXX in Memory and (2) the result of checking the condition codes stored in the eM register bank is
mrmovq except you use
rB and ±8 not
valC. Also has a writeback component for
popq updates two registers, so it will need both
popq reads from the old
pushq writes to the new
call 0x1234 is push valP; jXX 0x1234. Combining the logic of push and unconditional jump should be sufficient.
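As a rough reference for what pushq, popq, and call must leave behind, here is a minimal Python model of their architectural effects. It is a sketch, not a pipeline: the register dictionary, the byte-addressed `mem` dictionary, and all function names are assumptions made for illustration.

```python
MASK = 2**64  # registers are 64 bits wide


def read_quad(mem, addr):
    """Read 8 little-endian bytes; unwritten bytes read as 0."""
    return int.from_bytes(bytes(mem.get(addr + i, 0) for i in range(8)), "little")


def write_quad(mem, addr, value):
    """Write an 8-byte little-endian value."""
    for i, byte in enumerate((value % MASK).to_bytes(8, "little")):
        mem[addr + i] = byte


def pushq(regs, mem, rA):
    value = regs[rA]                    # read the source before changing rsp
    new_rsp = (regs["rsp"] - 8) % MASK
    write_quad(mem, new_rsp, value)     # push writes to the new rsp
    regs["rsp"] = new_rsp


def popq(regs, mem, rA):
    old_rsp = regs["rsp"]
    value = read_quad(mem, old_rsp)     # pop reads from the old rsp
    regs["rsp"] = (old_rsp + 8) % MASK  # first register update
    regs[rA] = value                    # second register update


def call(regs, mem, valP, valC):
    """Push the return address valP, then jump: returns the next PC."""
    new_rsp = (regs["rsp"] - 8) % MASK
    write_quad(mem, new_rsp, valP)
    regs["rsp"] = new_rsp
    return valC
```

For example, with rsp = 0x100 and rax = 3, `pushq(regs, mem, "rax")` leaves rsp = 0xf8 and the bytes 03 00 00 00 00 00 00 00 at 0xf8, which matches the pushq example in this writeup.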
ret is jump-to-the-read-value-when-popping. It always encounters the
You’ll have to stall the fetch stage as long as a ret is in decode, execute, or memory and forward the value from W_valM to the
Your code should have the same semantics as tools/yis: set the same registers and memory
As a general rule, your pipelined processor will need
1 cycle per instruction executed
4 extra cycles because we have a five-stage pipeline; even halt takes 5 cycles now.
+1 more cycle for each load-use hazard (i.e., read from memory in one cycle, use as src next cycle)
+2 more cycles for each conditional jump the code should not take (the misprediction penalty)
+3 more cycles for each ret
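The rule of thumb above can be written as a one-line formula, which is handy for checking a trace by hand. The function name and parameter names below are hypothetical. The sketch assumes that the 3-cycle penalty applies to each ret (consistent with the 13-cycle ret example in this writeup) and that the instruction count includes the final halt, whether written explicitly or fetched from zeroed memory.

```python
def expected_cycles(instructions, load_use_hazards=0,
                    mispredicted_jumps=0, rets=0):
    """Estimate total cycles for a run on the five-stage pipeline.

    instructions       -- instructions executed, including the final halt
    load_use_hazards   -- loads whose result is used by the next instruction
    mispredicted_jumps -- conditional jumps that turned out not to be taken
    rets               -- ret instructions executed (assumed 3-cycle penalty)
    """
    pipeline_fill = 4  # extra cycles for the last instruction to drain
    return (instructions + pipeline_fill
            + 1 * load_use_hazards
            + 2 * mispredicted_jumps
            + 3 * rets)
```

For instance, the conditional-jump example in this writeup executes 19 instructions with one mispredicted jge, so `expected_cycles(19, mispredicted_jumps=1)` gives 25; the ret example executes 6 instructions with one ret, giving 13.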
    irmovq $3, %rax
    irmovq $-1, %rbx
a:  jmp b
c:  jge a
    halt
b:  addq %rbx, %rax
    jmp c
takes 25 cycles and leaves
| RAX: ffffffffffffffff RCX: 0 RDX: 0 |
| RBX: ffffffffffffffff RSP: 0 RBP: 0 |
A full trace is available as pipe-jxx.txt
    irmovq $3, %rax
    irmovq $256, %rsp
    pushq %rax
takes 8 cycles and leaves
| RAX: 3 RCX: 0 RDX: 0 |
| RBX: 0 RSP: f8 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
|  0x000000f_:                         03 00 00 00 00 00 00 00 |
A full trace is available as pipe-push.txt
    irmovq $4, %rsp
    popq %rax
takes 7 cycles and leaves
| RAX: fb0000000000000 RCX: 0 RDX: 0 |
| RBX: 0 RSP: c RBP: 0 |
A full trace is available as pipe-pop.txt
    irmovq $256, %rsp
    call a
    addq %rsp, %rsp
a:  halt
takes 7 cycles and leaves
| RBX: 0 RSP: f8 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
|  0x000000f_:                         13 00 00 00 00 00 00 00 |
A full trace is available as pipe-call.txt
    irmovq $256, %rsp
    irmovq a, %rbx
    rmmovq %rbx, (%rsp)
    ret
    halt
a:  irmovq $258, %rax
    halt
takes 13 cycles and leaves
| RAX: 102 RCX: 0 RDX: 0 |
| RBX: 20 RSP: 108 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
|  0x0000010_: 20 00 00 00 00 00 00 00                         |
A full trace is available as pipe-ret.txt.
(It is okay to disagree with this trace about which instruction is fetched and ignored while waiting for the ret, but you should take the same number of cycles and produce the same final results.)
The same tests that should have worked on your single-cycle processor in seqhw should produce the correct results on your pipelined processor.
As an experiment this semester, one of our TAs prepared a video tutorial on debugging pipelined processors, which is available here.
Our general advice for debugging this assignment:
pipehw2.hcl on the submission page.