Changelog:

  • 21 Nov 2025 (while still tentative): adjust assembly code in all problems to be different for this semester; enable answer sheet
  • 22 Nov 2025 (while still tentative): try to clarify description of what it means for operands to be ready for an instruction to be issued
  • 8 Dec 2025: in description of the issue change replace it with the data cache and each of these ALUs to hopefully avoid misinterpretation; also change this cycle to the same cycle; change it is written back to its result is written back

See the answer sheet.

1 Hypothetical OOO processor

For this assignment, we will consider an out-of-order processor which has two execution units for arithmetic instructions, and one execution unit for memory instructions.

1.0.1 This processor’s pipeline

Instructions in this processor are executed as follows:

Given these stages, an non-memory instruction will take 7 cycles to go through the pipeline if it does not need to wait for it operands or wait for earlier instructions to commit (fetch, decode, rename, issue/register read, execute, writeback, commit). A memory instruction will take 9 cycles (two extra cycles of `execution’ because of the speed of the data cache).

When operands are not available, an instruction will spend extra cycles waiting in the instruction queue between its rename and issue stage. When an instruction is written back but some prior instructions have not done so, then the instruction will spend extra cycles waiting in the reorder buffer before it is committed.

1.0.2 This processor’s registers

The processor has 15 logical registers %r01 through %r15, but these are implemented using 64 physical registers and register renaming. We call the physical registers %x01 through %x64.

We will show instructions in a three-argument form like (regardless of whether register renaming has taken place):

add %r01, %r02 -> %r03

this indicates to add %r01 and %r02 and put the result into %r03.

Answer the questions on the answer sheet linked above.

1.1 Part 1: register renaming

Assume initially that %r01 is assigned to %x01, %r02 to %x02, and so on. Registers %x16 through %x63 are available for register renaming.

Rewrite the following instructions using the %x01 taking into account how register renaming might occur that would allow all of these instructions to be placed in the instruction queue at once:

add %r01, %r02 -> %r02  
add %r02, %r01 -> %r01  
imul %r02, %r03 -> %r03 
imul %r04, %r01 -> %r01
sub %r02, %r01 -> %r01  
add %r03, %r04 -> %r04  
movq 0x0(%r04) -> %r01  
imul %r01, %r03 -> %r03 
movq %r03, 0x0(%r01)-> 

Record your answer at the answer sheet linked above.

1.2 Part 2: instruction dispatch

Suppose the instruction queue for this processor contains the following instructions after renaming

 A. movq (%x05)-> %x16
 B. add %x05, %x08 -> %x18
 C. sub %x01, %x02 -> %x19
 D. imul %x16, %x02 -> %x20
 E. movq (%x06) -> %x21
 F. add %x19, %x08 -> %x22
 G. add %x21, %x18 -> %x23
 H. movq (%x22) -> %x24
 I. imul %x22, %x20 -> %x25
 J. movq %x25, (%x23) ->

Initially the value of registers %x01 through %x15 is available.

Complete the timeline to show how all the instructions can be issued and executed. (Assume there are no cases where loads and stores access the same or overlapping memory regions.)

                cycle 1    2    3   4    5    6   7    8     9    10   11    12 
execution unit:       
arithmetic 1          B
arithmetic 2          C
memory stage 1        A
memory stage 2             A
memory stage 3                  A

Record your answer at the answer sheet linked above.

1.3 Part 3: pipeline diagram

Complete a pipeline diagram for the processor running the following instructions (shown in the form from before register renaming):

1. add %r02, %r01 -> %r01
2. add %r02, %r01 -> %r01
3. add %r02, %r03 -> %r03
4. add %r05, %r04 -> %r04
5. add %r04, %r05 -> %r05
6. add %r05, %r06 -> %r06
7. add %r02, %r03 -> %r03
8. add %r01, %r08 -> %r08

Identify the stages as:

Leave the specified stage blank when an instruction is not being processed in any of the above ways (such as when the instruction is in the instruction queue waiting to be issued or any cycles between when an instruction is written back and when it is committed).

The first three rows are:

1 F D R I E W C
2 F D R   I E W C
3   F D R I E W C

complete the remaining rows of the table at the answer sheet linked above.

You may assume that register values computed by previous instructions are available and that physical registers are free such that stages will not be delayed for missing previous register values or lack of available physical registers.