This page does not represent the most current semester of this course; it is present merely as an archive.
In the previous HW you pipelined rrmovq. In lab you pipelined mrmovq (if you didn't finish, we have an example solution). To finish your pipelined simulator, you need to combine those two and then add the remaining instructions: OPq, cmovXX, jXX, pushq, popq, call, and ret.
You may approach this however you wish, but I suggest the following flow:
1. Combine your previous HW solution with pipelab2.hcl and test the combination.
2. Add OPq and condition codes, and test.
3. Add jXX with speculative execution and branch misprediction recovery. Predict that all branches are taken. Test.
4. Add ret with handling for the return hazard, and test.
All of the tests that either source file passed should still pass with the combination.
OPq and condition codes
Put the condition codes in their own register bank; you don’t want to bubble them if you bubble another register bank as part of stalling a stage.
Also check the condition codes in execute (based on the ifun there) and store the result of that comparison in the eM register bank.
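As a sanity check on the OPq and condition-code logic, here is a small behavioral model in Python (not HCL) of what the execute stage computes. It assumes the standard Y86-64 definitions: OPq rA, rB computes valB op valA and sets ZF, SF, and OF, and each jXX/cmovXX ifun selects a predicate over those flags. The function names opq and cond are hypothetical.

```python
MASK = (1 << 64) - 1  # Y86-64 registers hold 64-bit values

# OPq ifun codes (the OPq icode is 6)
ADDQ, SUBQ, ANDQ, XORQ = 0, 1, 2, 3


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def opq(ifun, val_a, val_b):
    """Model 'OPq rA, rB': return (valE, (ZF, SF, OF)) where valE = valB op valA."""
    a, b = to_signed(val_a), to_signed(val_b)
    if ifun == ADDQ:
        val_e = (val_b + val_a) & MASK
        # Overflow: both operands share a sign that the result does not.
        of = (a < 0) == (b < 0) and (to_signed(val_e) < 0) != (a < 0)
    elif ifun == SUBQ:
        val_e = (val_b - val_a) & MASK
        # Overflow: operand signs differ and the result's sign differs from valB's.
        of = (a < 0) != (b < 0) and (to_signed(val_e) < 0) != (b < 0)
    elif ifun == ANDQ:
        val_e, of = val_b & val_a, False
    elif ifun == XORQ:
        val_e, of = val_b ^ val_a, False
    else:
        raise ValueError(f"unknown OPq ifun {ifun}")
    zf = val_e == 0
    sf = (val_e >> 63) == 1
    return val_e, (zf, sf, of)


def cond(ifun, cc):
    """Evaluate a jXX/cmovXX ifun (0=always, 1=le, 2=l, 3=e, 4=ne, 5=ge, 6=g)."""
    zf, sf, of = cc
    lt = sf != of
    table = [True, lt or zf, lt, zf, not zf, not lt, not lt and not zf]
    return table[ifun]
```

Replaying the OPq test program later on this page with opq (starting from all-zero registers) gives %rcx = 0xfffffffffffffffc, %rdx = 0xfffffffffffffffb, and %rbx = 3, the same values as the expected register dump.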
cmovXX and condition codes
For cmovXX, all you need to do is change reg_dstE to be either the value you'd normally predict for it or REG_NONE, depending on the truth of the condition codes. The easiest way to do that is probably to add a mux in the execute stage, something like

    e_dstE = [
        /* this is a cmovXX and the condition codes are not satisfied */ : REG_NONE;
        1 : E_dstE;
    ];
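To see why forcing dstE to REG_NONE is enough, here is a Python sketch (not HCL) that applies the same mux while replaying the cmovXX test program from later on this page. It assumes the Y86-64 register numbering (%rax = 0 through %rdi = 7, 0xF = no register), the shared rrmovq/cmovXX icode 2, and the fact that andq and xorq always clear OF; the helper names are hypothetical.

```python
MASK = (1 << 64) - 1
REG_NONE = 0xF  # Y86-64 "no register" ID
CMOVXX = 0x2    # icode shared by rrmovq (ifun 0, always true) and cmovXX
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI = range(8)
LE, L, E, NE, GE, G = range(1, 7)  # cmovXX ifun codes


def cond(ifun, zf, sf, of):
    lt = sf != of
    return [True, lt or zf, lt, zf, not zf, not lt, not lt and not zf][ifun]


def e_dst_e(e_icode, cond_met, e_dst):
    """Model of the execute-stage mux: a cmovXX whose condition fails writes nowhere."""
    if e_icode == CMOVXX and not cond_met:
        return REG_NONE
    return e_dst


def run(program):
    regs = [0] * 15
    # Initial flags do not matter here: the test sets them before any cmov.
    zf, sf, of = True, False, False
    for op, *args in program:
        if op == "irmovq":
            value, dst = args
            regs[dst] = value & MASK
        elif op in ("andq", "xorq"):
            src, dst = args
            if op == "andq":
                regs[dst] &= regs[src]
            else:
                regs[dst] ^= regs[src]
            zf, sf, of = regs[dst] == 0, regs[dst] >> 63 == 1, False
        elif op == "cmov":
            ifun, src, dst = args
            dst_e = e_dst_e(CMOVXX, cond(ifun, zf, sf, of), dst)
            if dst_e != REG_NONE:  # writeback only happens for a real register
                regs[dst_e] = regs[src]
    return regs


CMOV_TEST = [
    ("irmovq", 2766, RBX), ("irmovq", 1, RAX), ("andq", RAX, RAX),
    ("cmov", G, RBX, RCX), ("cmov", NE, RBX, RDX),
    ("irmovq", -1, RAX), ("andq", RAX, RAX),
    ("cmov", L, RBX, RSP), ("cmov", LE, RBX, RBP),
    ("xorq", RAX, RAX), ("cmov", E, RBX, RSI), ("cmov", GE, RBX, RDI),
    ("irmovq", 2989, RBX), ("irmovq", 1, RAX), ("andq", RAX, RAX),
    ("cmov", L, RBX, RCX), ("cmov", E, RBX, RDX),
    ("irmovq", -1, RAX), ("andq", RAX, RAX),
    ("cmov", GE, RBX, RSP), ("cmov", G, RBX, RBP),
    ("xorq", RAX, RAX), ("cmov", L, RBX, RSI), ("cmov", NE, RBX, RDI),
    ("irmovq", 0, RBX),
]
```

Every cmov in the first half succeeds (copying 0xace = 2766), and every cmov in the second half fails, so its dstE becomes REG_NONE and 0xbad never lands anywhere.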
jXX
This one is messy because we have branch prediction, speculative execution, and recovery from misprediction through stage bubbling. Let's look at it through a set of questions.
1. What do we predict? That all jXX are taken (i.e., that the new PC is valC for all jXX).
2. Where do we store the prediction? In a predPC register inside the xF register bank (or pP, or whatever else you called it).
3. How do we use the prediction? By setting the pc to the predicted PC (pc = F_predPC).
4. When do we know we mispredicted? If the condition codes evaluate to false at the end of the jXX's execute stage (which is when we check the condition codes).
5. Where do we look for that? End of execute = beginning of memory, so we can look in the M_... register bank outputs.
6. What do we do on a misprediction? We need to fetch the correct address (valP) and bubble any stages that we should not have run.
7. Which stages do we bubble? The jXX is in memory, so we keep that one (and writeback, which holds a pre-jXX instruction); since we are fixing fetch, we keep that one too; so we bubble just the decode and execute stages.
8. How do we fetch the correct address? By using a mux to pick the pc: if mispredicted, use oldValP (the valP of the mispredicted jXX, carried down to memory); otherwise use the predicted PC, which is valP for a non-jump and valC for a jump.
9. How do we know we mispredicted? When (1) a jXX is in memory and (2) the result of checking the condition codes stored in the eM register bank is false.
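The answers above can be checked with a small cycle-level model. This Python sketch (not HCL) tracks only which instruction occupies each stage; each jXX carries a precomputed cond outcome standing in for the condition-code check done in execute. The dictionary fields, the use of the memory-stage instruction's valP as oldValP, and the helper names are assumptions for illustration.

```python
HALT, NOP, JXX = 0x0, 0x1, 0x7  # Y86-64 icodes used here


def predict_pc(instr):
    """Value written to predPC: valC for a jump (predict taken), else valP."""
    return instr["valC"] if instr["icode"] == JXX else instr["valP"]


def fetch(program, pc):
    # Unlisted addresses read as zero bytes, which decode as halt.
    return program.get(pc, {"icode": HALT, "valP": pc + 1})


def count_cycles(program, start=0):
    """Cycles until a halt finishes writeback, with predict-taken and recovery."""
    f_pred_pc = start
    d = e = m = w = None  # None marks a bubble
    cycle = 0
    while True:
        cycle += 1
        if w is not None and w["icode"] == HALT:
            return cycle
        # (1) a jXX is in memory and (2) its condition check (from eM) was false.
        mispredicted = m is not None and m["icode"] == JXX and not m["cond"]
        pc = m["valP"] if mispredicted else f_pred_pc  # oldValP vs. predPC
        f = fetch(program, pc)
        if mispredicted:
            d = e = None  # bubble decode and execute: squash the wrong-path work
        # Clock edge: every stage hands its instruction to the next one.
        w, m, e, d = m, e, d, f
        f_pred_pc = predict_pc(f)
```

For a single jump that falls through to a halt, this gives 8 cycles (2 instructions + 4 + the 2-cycle misprediction penalty); if the same jump is taken to a halt, it gives 6.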
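pushq and popq
These are like mrmovq, except that you use rB (here, %rsp) and ±8, not valC. They also have a writeback component for %rsp: popq updates two registers, so it will need both a dstE and a dstM. Note that popq reads from the old %rsp, while pushq writes to the new %rsp.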
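Here is a Python sketch (not HCL) of the register and memory effects described above, using a bytearray as memory and little-endian 8-byte accesses as in Y86-64. The helper names are hypothetical.

```python
MASK = (1 << 64) - 1
RAX, RSP = 0, 4  # Y86-64 register IDs


def read_quad(mem, addr):
    return int.from_bytes(mem[addr:addr + 8], "little")


def write_quad(mem, addr, value):
    mem[addr:addr + 8] = (value & MASK).to_bytes(8, "little")


def pushq(regs, mem, r_a):
    """pushq rA: valE = %rsp - 8; memory writes valA at the NEW %rsp; dstE = %rsp."""
    val_e = (regs[RSP] - 8) & MASK
    write_quad(mem, val_e, regs[r_a])
    regs[RSP] = val_e


def popq(regs, mem, r_a):
    """popq rA: memory reads at the OLD %rsp; dstE = %rsp (valE = %rsp + 8), dstM = rA."""
    val_m = read_quad(mem, regs[RSP])
    regs[RSP] = (regs[RSP] + 8) & MASK  # dstE write
    regs[r_a] = val_m  # dstM write goes last, so popq %rsp keeps the loaded value
```

With memory holding the encoded popq test program from later on this page (bytes 30 f4 04 00 00 00 00 00 00 00 b0 0f 00), popq reads bytes 4 through 11 and loads 0x0fb0000000000000, which matches the expected RAX there.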
call
call 0x1234 is like push valP; jmp 0x1234. Combining the logic of push and unconditional jump should be sufficient.
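As a sketch of that combination (Python, not HCL; helper names hypothetical), call pushes valP exactly as pushq pushes a register and then returns valC as the next PC, just like an unconditional jump.

```python
MASK = (1 << 64) - 1
RSP = 4  # Y86-64 register ID for %rsp


def call(regs, mem, val_c, val_p):
    """call Dest: push the return address valP, then jump to valC."""
    val_e = (regs[RSP] - 8) & MASK  # same stack-pointer arithmetic as pushq
    mem[val_e:val_e + 8] = (val_p & MASK).to_bytes(8, "little")  # data is valP
    regs[RSP] = val_e
    return val_c  # like jmp: the next PC is valC
```

In the call test later on this page, call a sits at 0xa (so valP = 0x13) and a is at 0x15; starting from %rsp = 256, this leaves %rsp = 0xf8 with 0x13 stored there, matching the expected dump.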
ret
ret is a jump to the value read when popping. It always encounters the return hazard: you'll have to stall the fetch stage as long as a ret is in decode, execute, or memory, and forward the value from W_valM to the pc.
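The cost of that stall can be seen with a cycle-level sketch in Python (not HCL). It models only stage occupancy, and each ret carries its popped return address as valM. While a ret is in decode, execute, or memory, this sketch holds predPC and sends a bubble into decode in place of a fetched instruction (an assumption about how the stall is realized); once the ret reaches writeback, W_valM becomes the pc. The field and function names are hypothetical.

```python
HALT, RET = 0x0, 0x9  # Y86-64 icodes used here


def fetch(program, pc):
    # Unlisted addresses read as zero bytes, which decode as halt.
    return program.get(pc, {"icode": HALT, "valP": pc + 1})


def count_cycles(program, start=0):
    """Cycles until halt completes writeback, stalling fetch behind each ret."""
    f_pred_pc = start
    d = e = m = w = None  # None marks a bubble
    cycle = 0
    while True:
        cycle += 1
        if w is not None and w["icode"] == HALT:
            return cycle
        # Return hazard: the return address is unknown while ret is in D, E, or M.
        stall_fetch = any(s is not None and s["icode"] == RET for s in (d, e, m))
        if stall_fetch:
            f = None  # predPC is held; decode receives a bubble
        else:
            ret_in_w = w is not None and w["icode"] == RET
            pc = w["valM"] if ret_in_w else f_pred_pc  # forward W_valM to the pc
            f = fetch(program, pc)
            f_pred_pc = f["valP"]  # no jumps in this sketch, so predict valP
        # Clock edge: every stage hands its instruction to the next one.
        w, m, e, d = m, e, d, f
```

Fed the ret test program from later on this page (ret at 0x1e popping a = 0x20), this returns 13 cycles: 6 executed instructions + 4 + 3 for the ret.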
Your code should have the same semantics as tools/yis: it should set the same registers and memory.
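One way to check the register half of this is to diff final register dumps. This Python sketch assumes both your simulator and tools/yis end with a register summary in the "| RAX: 0 RCX: ... |" format used in the examples on this page; the function names are hypothetical, and memory contents would need a similar comparison.

```python
import re

# Matches "RAX: fffffffffffffffc"-style pairs in a register summary.
REG_PAIR = re.compile(r"\b([A-Z][A-Z0-9]*): ([0-9a-f]+)\b")


def parse_registers(dump):
    """Return {register name: value} from a '| RAX: 0 RCX: ... |' summary."""
    return {name: int(value, 16) for name, value in REG_PAIR.findall(dump)}


def register_mismatches(expected_dump, actual_dump):
    """List (name, expected, actual) for every register whose value differs."""
    expected = parse_registers(expected_dump)
    actual = parse_registers(actual_dump)
    names = sorted(set(expected) | set(actual))
    return [(n, expected.get(n), actual.get(n))
            for n in names if expected.get(n) != actual.get(n)]
```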
As a general rule, your pipelined processor will need:
- 1 cycle per instruction executed,
- 4 extra cycles because we have a five-stage pipeline (even halt takes 5 cycles now),
- +1 more cycle for each load-use hazard (i.e., a value read from memory in one cycle and used as a source the next cycle),
- +2 more cycles for each conditional jump the code should not take (the misprediction penalty), and
- +3 more cycles for each ret (the return hazard).
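These rules add up to a simple formula. The Python sketch below (the function name is hypothetical) counts the final halt as an executed instruction.

```python
def expected_cycles(executed, load_use=0, mispredicted_jumps=0, rets=0):
    """Predicted cycle count for the five-stage pipeline described above.

    executed counts every instruction that completes, including the final halt.
    """
    return executed + 4 + 1 * load_use + 2 * mispredicted_jumps + 3 * rets
```

For example, the jXX test below executes 19 instructions (including halt) and mispredicts one jge, giving 19 + 4 + 2 = 25 cycles; the ret test executes 6 instructions and one ret, giving 6 + 4 + 3 = 13.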
The program

    irmovq $7, %rdx
    irmovq $3, %rcx
    addq %rcx, %rbx
    subq %rdx, %rcx
    andq %rdx, %rbx
    xorq %rcx, %rdx
    andq %rdx, %rsi

takes 12 cycles and leaves

    | RAX: 0 RCX: fffffffffffffffc RDX: fffffffffffffffb |
    | RBX: 3 RSP: 0 RBP: 0 |

A full trace is available as pipe-opq.txt.
The program

    irmovq $2766, %rbx
    irmovq $1, %rax
    andq %rax, %rax
    cmovg %rbx, %rcx
    cmovne %rbx, %rdx
    irmovq $-1, %rax
    andq %rax, %rax
    cmovl %rbx, %rsp
    cmovle %rbx, %rbp
    xorq %rax, %rax
    cmove %rbx, %rsi
    cmovge %rbx, %rdi
    irmovq $2989, %rbx
    irmovq $1, %rax
    andq %rax, %rax
    cmovl %rbx, %rcx
    cmove %rbx, %rdx
    irmovq $-1, %rax
    andq %rax, %rax
    cmovge %rbx, %rsp
    cmovg %rbx, %rbp
    xorq %rax, %rax
    cmovl %rbx, %rsi
    cmovne %rbx, %rdi
    irmovq $0, %rbx

takes 30 cycles and leaves 0xace in %rcx, %rdx, %rsp, %rbp, %rsi, and %rdi.

A full trace is available as pipe-cmovXX.txt.
The program

        irmovq $3, %rax
        irmovq $-1, %rbx
    a:  jmp b
    c:  jge a
        halt
    b:  addq %rbx, %rax
        jmp c

takes 25 cycles and leaves

    | RAX: ffffffffffffffff RCX: 0 RDX: 0 |
    | RBX: ffffffffffffffff RSP: 0 RBP: 0 |

A full trace is available as pipe-jxx.txt.
The program

    irmovq $3, %rax
    irmovq $256, %rsp
    pushq %rax

takes 8 cycles and leaves

    | RAX: 3 RCX: 0 RDX: 0 |
    | RBX: 0 RSP: f8 RBP: 0 |
    | used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
    | 0x000000f_: 03 00 00 00 00 00 00 00 |

A full trace is available as pipe-push.txt.
The program

    irmovq $4, %rsp
    popq %rax

takes 7 cycles and leaves

    | RAX: fb0000000000000 RCX: 0 RDX: 0 |
    | RBX: 0 RSP: c RBP: 0 |

A full trace is available as pipe-pop.txt.
The program

        irmovq $256, %rsp
        call a
        addq %rsp, %rsp
    a:  halt

takes 7 cycles and leaves

    | RBX: 0 RSP: f8 RBP: 0 |
    | used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
    | 0x000000f_: 13 00 00 00 00 00 00 00 |

A full trace is available as pipe-call.txt.
The program

        irmovq $256, %rsp
        irmovq a, %rbx
        rmmovq %rbx, (%rsp)
        ret
        halt
    a:  irmovq $258, %rax
        halt

takes 13 cycles and leaves

    | RAX: 102 RCX: 0 RDX: 0 |
    | RBX: 20 RSP: 108 RBP: 0 |
    | used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
    | 0x0000010_: 20 00 00 00 00 00 00 00 |

A full trace is available as pipe-ret.txt.
Submit pipehw2.hcl on the submission page.