Your task
- Submit your solutions to Problem Set-2 as a scanned PDF document on Gradescope. Problem Set-2 may be found in Collab under the resources folder. Alternatively, you may get a physical copy of the assignment from me during lecture or office hours (Tu/Th 12:30pm at Rice 312).
- Combine your solution from the previous HW and the previous lab into a new file called pipehw2.hcl to create a five-stage pipelined processor with forwarding and branch prediction, as described in the textbook, that implements:
  - nop
  - halt
  - irmovq
  - rrmovq
  - OPq
  - cmovXX
  - rmmovq
  - mrmovq

  We will provide an example lab solution.
- Add the jXX instruction (and make it predict all jumps as taken).
- Test your combined simulator with make test-pipehw2.
- Submit your solution to kytos.
Hints/Approach
General Approach
You may approach this however you wish, but I suggest the following flow:
- Combine your pipehw1.hcl and pipelab2.hcl and test the combination. All of the tests that either source file passed previously ought to still pass with the combination.
- Add jXX with speculative execution and branch misprediction recovery. Predict that all branches are taken. Test.
Implementing jXX
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| jXX | F | D | E (next PC available) | M | W | | | |
| wrong1 | | F | D | (bubble needed) | | | | |
| wrong2 | | | F | (bubble needed) | | | | |
| right1 | | | | F | D | E | M | W |
- Replace pc in the fF (or xF or pP or whatever else you called it) register bank with predPC, which will store a predicted PC value instead of the actual PC value.

  To speculatively use this prediction, we can set pc to the predicted PC (pc = F_predPC).
- Your processor should predict that all jXXs are taken (the new PC is valC).
- We will detect that predictions are wrong near the end of the jXX's execute stage (when we check the condition codes). We will fetch the correct instruction during the fetch stage in the next cycle (when jXX is in the memory stage).
- When we react to a misprediction, we need to:
  - Squash the mispredicted instructions (which are about to enter the decode and execute stages). This can be done by setting the bubble_X signals in the cycle before the corrected instruction is fetched. (Setting the bubble_X signal will make the X_* pipeline registers output their default values in the next cycle instead of using their input values.)
  - Fetch the corrected instruction next cycle (e.g. with a MUX in front of the pc signal).
- You can fetch the corrected instruction with a MUX in front of the pc signal:

      pc = [
          mispredicted : oldValP;
          ...
          1 : F_predPC;
      ];

  You may need to pass the conditionsMet signal or something equivalent through a pipeline register to be able to tell when a misprediction happened at the appropriate time.
- You will need access to the valP from the jXX instruction. To do so, you will probably need to pass it through pipeline registers.
- Make sure you correctly handle interactions between jXX and halt. Consider code like:

      jne foo
      halt
      foo:
      rmmovq %rax, (%rax)
      rmmovq %rax, (%rax)
      rmmovq %rax, (%rax)

  When the halt is executed, F_predPC may contain the address of an rmmovq instead of halt, so simply setting stall_F may not be enough to fetch a halt next cycle.

  Some solutions to this problem may involve using a technique other than setting stall_F to prevent the PC from changing, like adding a case to the pc = [...] MUX.
- If, instead of squashing the mispredicted instructions when they are about to enter the decode and execute stages (as suggested above), you squash them when they are about to enter the execute and memory stages, you will have to prevent the condition codes from being changed by one of the mispredicted instructions.
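The prediction and recovery steps above can be modelled outside of HCL to check your understanding of the timing. The following is a minimal Python sketch, not HCL and not a solution: the PipelineRegister class, the mux helper, the function names, and the example addresses are all assumptions made for illustration. It only mirrors the rules stated above (predict valC for every jXX, bubble the two wrong-path instructions, then select the old valP as the PC).

```python
NOP = "nop"  # stand-in for the default (bubble) contents of a pipeline register


class PipelineRegister:
    """Hypothetical one-value pipeline register bank, for illustration only."""

    def __init__(self, default=NOP):
        self.default = default
        self.output = default

    def clock(self, input_value, bubble=False):
        # With bubble set, the next output is the default value, not the input.
        self.output = self.default if bubble else input_value


def mux(*cases):
    """Return the value of the first (condition, value) pair whose condition is true."""
    for condition, value in cases:
        if condition:
            return value
    raise ValueError("no case matched")


def predict_pc(is_jXX, valC, valP):
    # Predict every jXX as taken: the next PC is valC, otherwise fall through to valP.
    return mux((is_jXX, valC), (True, valP))


def select_pc(mispredicted, old_valP, F_predPC):
    # Mirrors: pc = [ mispredicted : oldValP; 1 : F_predPC; ];
    return mux((mispredicted, old_valP), (True, F_predPC))


# Assumed example: a jXX at address 0x10 (9 bytes long) that targets 0x40
# but should NOT be taken.
valC, valP = 0x40, 0x19
F_predPC = predict_pc(True, valC, valP)  # fetch continues at the target

# Cycle 3: the jXX is in execute and its condition turns out to be false.
# wrong1 is about to enter execute and wrong2 is about to enter decode.
mispredicted = True
D = PipelineRegister()
E = PipelineRegister()
D.clock("wrong2", bubble=mispredicted)
E.clock("wrong1", bubble=mispredicted)

# Cycle 4: the jXX is in memory. In a real design the misprediction flag and
# valP would have been carried here through pipeline registers.
stale_predPC = 0x4A  # hypothetical wrong-path prediction, ignored below
pc = select_pc(mispredicted, old_valP=valP, F_predPC=stale_predPC)
print(hex(F_predPC), D.output, E.output, hex(pc))
```

In this model both wrong-path instructions become bubbles and the fetch address falls back to the instruction after the jXX, which matches the wrong1, wrong2, and right1 rows of the table above.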
Testing your code
- You can run the command make test-pipehw2 to run your processor on almost all the files in y86/, comparing its output to references supplied in testdata/pipe-reference. The list of tested files is in testdata/pipehw2.txt. For the files pop-forward2.yo, pop-forward3.yo, pop-forward4.yo, and load-store.yo, you should have the same values, but you may take fewer cycles.
- For each input file in y86/, there is a trace from our reference implementation in testdata/pipe-traces.
- Your code should have the same semantics as tools/yis: set the same registers and memory. You can use this to see if your processor does the correct thing on any input files, including files you come up with yourself.
- We will check the number of cycles your processor takes. As a general rule, your pipelined processor will need
  - 1 cycle per instruction executed
  - 4 extra cycles because we have a five-stage pipeline; even halt takes 5 cycles now.
  - +1 more cycle for each load-use hazard (i.e., read from memory in one cycle, use with ALU next cycle)
  - +2 more cycles for each conditional jump the code should not take (the misprediction penalty)
  - +3 more cycles for each ret executed
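The general rule above is simple arithmetic, so it can be written as a small helper for checking your own test files. This is a sketch in Python; the function name and parameters are made up for illustration. The instruction counts in the examples (9 for y86/j-cc.yo and 19 for y86/jxx.yo) were counted by hand from the listings in the test cases below and are an assumption, not something the reference traces state directly.

```python
def expected_cycles(instructions, load_use_hazards=0, mispredicted_jumps=0, rets=0):
    """Estimate the cycle count from the general rule for the five-stage pipeline."""
    pipeline_fill = 4  # extra cycles for a five-stage pipeline
    return (
        instructions              # 1 cycle per instruction executed
        + pipeline_fill
        + 1 * load_use_hazards    # +1 per load-use hazard
        + 2 * mispredicted_jumps  # +2 per conditional jump that should not be taken
        + 3 * rets                # +3 per ret executed
    )


# A lone halt: 1 instruction + 4 = 5 cycles.
print(expected_cycles(1))                         # 5
# y86/j-cc.yo: 9 instructions executed, the je is not taken (one misprediction).
print(expected_cycles(9, mispredicted_jumps=1))   # 15
# y86/jxx.yo: 19 instructions executed, only the final jge is not taken.
print(expected_cycles(19, mispredicted_jumps=1))  # 25
```

The last two results agree with the 15 and 25 cycles listed for those files. If your processor's count differs from this estimate, a missing or extra bubble is a likely place to look first.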
Specific Test Cases
jXX
y86/j-cc.yo
irmovq $1, %rsi
irmovq $2, %rdi
irmovq $4, %rbp
irmovq $-32, %rax
irmovq $64, %rdx
subq %rdx,%rax
je target
nop
halt
target:
addq %rsi,%rdx
nop
nop
nop
halt
takes 15 cycles and leaves
| RAX: ffffffffffffffa0 RCX: 0 RDX: 40 |
| RBX: 0 RSP: 0 RBP: 4 |
| RSI: 1 RDI: 2 R8: 0 |
A full trace is available in testdata/pipe-traces/j-cc.txt
y86/jxx.yo
irmovq $3, %rax
irmovq $-1, %rbx
a:
jmp b
c:
jge a
halt
b:
addq %rbx, %rax
jmp c
takes 25 cycles and leaves
| RAX: ffffffffffffffff RCX: 0 RDX: 0 |
| RBX: ffffffffffffffff RSP: 0 RBP: 0 |
A full trace is available in testdata/pipe-traces/jxx.txt (distributed with hclrs.tar)