This page does not represent the most current semester of this course; it is present merely as an archive.
In the previous HW you pipelined nop, halt, irmovq, and rrmovq. In lab you pipelined rmmovq and mrmovq (if you didn’t finish, we have an example solution). To finish your pipelined simulator, you need to combine those two and then add OPq, jXX, cmovXX, pushq, popq, call, and ret.
You may approach this however you wish, but I suggest the following flow:
1. Combine pipehw1.hcl and pipelab2.hcl and test the combination.
2. Add OPq and condition codes and test.
3. Add cmovXX and test.
4. Add jXX with speculative execution and branch misprediction recovery. Predict that all branches are taken. Test.
5. Add pushq and test.
6. Add call and test.
7. Add popq and test.
8. Add ret with handling for the return hazard, and test.

All of the tests that either source file passed ought to still pass the combination.
OPq and condition codes
Put the condition codes in their own register bank; you don’t want to bubble them if you bubble another register bank as part of stalling a stage.
Also check the condition codes in execute (based on the ifun there) and store the result of that comparison in the eM register bank.
cmovXX and condition codes
To implement cmovXX, all you need to do is change reg_dstE to be either the value it would normally have or REG_NONE, depending on the truth of the condition codes. The easiest way to do that is probably by adding a mux in the execute stage, something like
e_dstE = [
/* this is a cmovXX and the condition codes are not satisfied */ : REG_NONE;
1 : E_dstE;
];
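As a rough illustration of the choice this mux makes, here is a small Python model (not HCL). It assumes the standard Y86-64 meaning of the ifun values and the flags ZF, SF, and OF. The names `condition_holds` and `select_dstE`, and the numeric value used for REG_NONE, are stand-ins chosen for this sketch rather than names from the simulator.

```python
REG_NONE = 0xF  # assumed encoding of "no register"

# ifun values shared by cmovXX and jXX, assuming the standard Y86-64 encoding
ALWAYS, LE, L, E, NE, GE, G = range(7)


def condition_holds(ifun, zf, sf, of=False):
    """Return True if the condition selected by ifun is satisfied."""
    less = sf != of  # signed "less than" is SF xor OF
    table = {
        ALWAYS: True,
        LE: less or zf,
        L: less,
        E: zf,
        NE: not zf,
        GE: not less,
        G: (not less) and (not zf),
    }
    return table[ifun]


def select_dstE(is_cmov, ifun, zf, sf, normal_dstE, of=False):
    """Model of the execute-stage mux: suppress the write for a failed cmovXX."""
    if is_cmov and not condition_holds(ifun, zf, sf, of):
        return REG_NONE
    return normal_dstE
```

For example, after `andq %rax, %rax` with %rax = 1, both ZF and SF are clear, so `select_dstE(True, G, False, False, 1)` keeps destination register 1 while `select_dstE(True, L, False, False, 1)` yields `REG_NONE`.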
jXX
This one is messy because we have branch prediction, speculative execution, and recovery from misprediction through stage bubbling. Let’s look at it through a set of questions:
- What do we predict? That all jXX are taken (i.e., that the new PC is valC for all jXX).
- Where do we store the prediction? In a predPC register inside the xF register bank (or pP or whatever else you called it).
- How do we act on the prediction? By setting the pc to the predicted PC (pc = F_predPC).
- When is the prediction wrong? If the condition codes evaluate to false.
- When do we learn that? At the end of the jXX’s execute stage (which is when we check the condition codes).
- Where can we see it? End of execute = beginning of memory, so we can look in the M_... register bank outputs.
- What do we do about it? We need to fetch the correct address (jXX’s valP) and bubble any stages that we should not have run.
- Which stages do we bubble? Since jXX is in memory, we keep that one (and writeback, which is a pre-jXX instruction); since we are fixing fetch we keep that one too; so we bubble just the decode and execute stages.
- How do we fetch the correct address? By using a mux to pick pc, with the cases mispredicted: oldValP and 1: F_predPC.
|        | 1 | 2 | 3 |             | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| jXX    | F | D | E | (available) | M | W |   |   |   |
| wrong1 |   | F | D | (needed)    |   |   |   |   |   |
| wrong2 |   |   | F | (needed)    |   |   |   |   |   |
| right1 |   |   |   |             | F | D | E | M | W |
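To summarize the changes:
- Keep the predicted PC in the xF register bank: valP for a non-jump, valC for a jump.
- Set pc to the jXX’s valP when (1) there is a jXX in Memory and (2) the result of checking the condition codes stored in the eM register bank is 0; otherwise set it to F_predPC.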
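The prediction and recovery rules described above can be sketched as a small Python model (not HCL). The icode value for jXX assumes the standard Y86-64 encoding, and the names `predict_pc`, `jxx_control`, `M_cnd`, and `M_valP` are placeholders for whatever you called the corresponding signals in your own register banks.

```python
JXX = 7  # assumed icode for jXX in the standard Y86-64 encoding


def predict_pc(icode, valC, valP):
    """Predicted next PC: always-taken for jumps, fall-through otherwise."""
    return valC if icode == JXX else valP


def jxx_control(M_icode, M_cnd, M_valP, F_predPC):
    """Decide the fetch address and which stages to bubble this cycle.

    M_cnd is the stored result of the condition check done in execute;
    M_valP is the fall-through address carried along with the jXX.
    """
    mispredicted = (M_icode == JXX) and not M_cnd
    return {
        "pc": M_valP if mispredicted else F_predPC,
        "bubble_D": mispredicted,  # wrong-path instruction headed for decode
        "bubble_E": mispredicted,  # wrong-path instruction headed for execute
    }
```

In this model only a jXX sitting in memory with a false condition changes anything; every other cycle simply follows the predicted PC.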
pushq and popq
These are like rmmovq and mrmovq, except that you use REG_RSP instead of rB and ±8 instead of valC. They also have a writeback component for REG_RSP.
Note that popq updates two registers, so it will need both reg_dstE and reg_dstM.
Note that popq reads from the old %rsp while pushq writes to the new %rsp.
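To make the old-versus-new %rsp distinction concrete, here is a minimal Python model of the architectural effect of the two instructions (not of the pipeline itself). Memory is modeled as a `bytearray` and values are stored little-endian; the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1


def pushq(mem, rsp, value):
    """Decrement %rsp by 8, then write value at the NEW %rsp."""
    new_rsp = rsp - 8
    mem[new_rsp:new_rsp + 8] = (value & MASK64).to_bytes(8, "little")
    return new_rsp


def popq(mem, rsp):
    """Read 8 bytes at the OLD %rsp, then increment %rsp by 8.

    Returns (value, new_rsp): two register updates, which is why popq
    needs both a dstM (the popped value) and a dstE (the new %rsp).
    """
    value = int.from_bytes(mem[rsp:rsp + 8], "little")
    return value, rsp + 8


mem = bytearray(0x200)
rsp = pushq(mem, 0x100, 3)
print(hex(rsp), mem[0xF8])  # 0xf8 3
```

Pushing 3 with %rsp = 0x100 leaves %rsp at 0xf8 with the byte 03 stored there, which matches the pushq test case on this page.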
call and ret
call 0x1234 is push valP; jXX 0x1234. Combining the logic of push and unconditional jump should be sufficient.
ret is jump-to-the-read-value-when-popping. It always encounters the ret-hazard:
|     | 1 | 2 | 3 | 4 |             | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| ret | F | D | E | M | (available) | W |   |   |   |   |
| ??? |   | F | F | F | (needed)    | F | D | E | M | W |
You’ll have to stall the fetch stage as long as a ret is in decode, execute, or memory and forward the value from W_valM to the pc.
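A minimal Python sketch of that control decision follows (again a model, not HCL). The icode value for ret assumes the standard Y86-64 encoding. Bubbling decode while fetch is stalled is an assumption of this sketch, made because whatever fetch reads during the stall is not the correct next instruction; the text above only states the stall and the forwarding.

```python
RET = 9  # assumed icode for ret in the standard Y86-64 encoding


def ret_control(D_icode, E_icode, M_icode, W_icode, W_valM, F_predPC):
    """Decide fetch stalling and the fetch address around a ret."""
    ret_in_flight = RET in (D_icode, E_icode, M_icode)
    return {
        "stall_F": ret_in_flight,
        # assumption: discard whatever was fetched while waiting
        "bubble_D": ret_in_flight,
        # once ret reaches writeback, the popped address is in W_valM
        "pc": W_valM if W_icode == RET else F_predPC,
    }
```

With a ret in decode, execute, or memory the model holds fetch in place for three consecutive cycles, which matches the three repeated F entries in the diagram above.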
Your code should have the same semantics as tools/yis: set the same registers and memory.
As a general rule, your pipelined processor will need
- 1 cycle per instruction executed
- 4 extra cycles because we have a five-stage pipeline; even halt takes 5 cycles now.
- +1 more cycle for each load-use hazard (i.e., read from memory in one cycle, use as src next cycle)
- +2 more cycles for each conditional jump the code should not take (the misprediction penalty)
- +3 more cycles for each ret executed
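The rule above is simple arithmetic, so it can be written as a one-line Python helper for sanity-checking your cycle counts. The function and parameter names are made up for this sketch. It assumes the instruction count includes the final halt, even in test programs that do not list one explicitly (presumably because zeroed memory decodes as halt).

```python
def expected_cycles(instructions, load_use=0, mispredicted_jumps=0, rets=0):
    """Expected cycle count for a five-stage pipeline under the rule above.

    instructions: instructions executed, counting the final halt
    load_use: number of load-use hazards encountered
    mispredicted_jumps: conditional jumps that turned out not to be taken
    rets: number of ret instructions executed
    """
    return instructions + 4 + load_use + 2 * mispredicted_jumps + 3 * rets


print(expected_cycles(8))                         # 12
print(expected_cycles(19, mispredicted_jumps=1))  # 25
print(expected_cycles(6, rets=1))                 # 13
```

The three calls correspond to the OPq, jXX, and ret test cases on this page: seven listed instructions plus the halt; 19 executed instructions with one not-taken `jge`; and six executed instructions including one ret.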
OPq
y86/opq.yo
irmovq $7, %rdx
irmovq $3, %rcx
addq %rcx, %rbx
subq %rdx, %rcx
andq %rdx, %rbx
xorq %rcx, %rdx
andq %rdx, %rsi
takes 12 cycles and leaves
| RAX: 0 RCX: fffffffffffffffc RDX: fffffffffffffffb |
| RBX: 3 RSP: 0 RBP: 0 |
A full trace is available as pipe-opq.txt
cmovXX
y86/cmovXX.yo
irmovq $2766, %rbx
irmovq $1, %rax
andq %rax, %rax
cmovg %rbx, %rcx
cmovne %rbx, %rdx
irmovq $-1, %rax
andq %rax, %rax
cmovl %rbx, %rsp
cmovle %rbx, %rbp
xorq %rax, %rax
cmove %rbx, %rsi
cmovge %rbx, %rdi
irmovq $2989, %rbx
irmovq $1, %rax
andq %rax, %rax
cmovl %rbx, %rcx
cmove %rbx, %rdx
irmovq $-1, %rax
andq %rax, %rax
cmovge %rbx, %rsp
cmovg %rbx, %rbp
xorq %rax, %rax
cmovl %rbx, %rsi
cmovne %rbx, %rdi
irmovq $0, %rbx
takes 30 cycles and leaves 0xace in %rcx, %rdx, %rsp, %rbp, %rsi, and %rdi.
A full trace is available as pipe-cmovXX.txt
jXX
y86/jxx.yo
irmovq $3, %rax
irmovq $-1, %rbx
a:
jmp b
c:
jge a
halt
b:
addq %rbx, %rax
jmp c
takes 25 cycles and leaves
| RAX: ffffffffffffffff RCX: 0 RDX: 0 |
| RBX: ffffffffffffffff RSP: 0 RBP: 0 |
A full trace is available as pipe-jxx.txt
pushq
y86/push.yo
irmovq $3, %rax
irmovq $256, %rsp
pushq %rax
takes 8 cycles and leaves
| RAX: 3 RCX: 0 RDX: 0 |
| RBX: 0 RSP: f8 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
| 0x000000f_: 03 00 00 00 00 00 00 00 |
A full trace is available as pipe-push.txt
popq
y86/pop.yo
irmovq $4, %rsp
popq %rax
takes 7 cycles and leaves
| RAX: fb0000000000000 RCX: 0 RDX: 0 |
| RBX: 0 RSP: c RBP: 0 |
A full trace is available as pipe-pop.txt
call
y86/call.yo
irmovq $256, %rsp
call a
addq %rsp, %rsp
a:
halt
takes 7 cycles and leaves
| RAX: 0 RCX: 0 RDX: 0 |
| RBX: 0 RSP: f8 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
| 0x000000f_: 13 00 00 00 00 00 00 00 |
A full trace is available as pipe-call.txt
ret
y86/ret.yo
irmovq $256, %rsp
irmovq a, %rbx
rmmovq %rbx, (%rsp)
ret
halt
a:
irmovq $258, %rax
halt
takes 13 cycles and leaves
| RAX: 102 RCX: 0 RDX: 0 |
| RBX: 20 RSP: 108 RBP: 0 |
| used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f |
| 0x0000010_: 20 00 00 00 00 00 00 00 |
A full trace is available as pipe-ret.txt
Submit pipehw2.hcl on the submission page.