# Precise Exceptions and **Out-of-Order Execution**

Samira Khan

# Multi-Cycle Execution

- Not all instructions take the same amount of time for "execution"
- Idea: Have multiple different functional units that take different numbers of cycles
  - Can be pipelined or not pipelined
  - Can let independent instructions start execution on a different functional unit before a previous long-latency instruction finishes execution
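A minimal sketch of the last point, assuming fully pipelined units, one issue per cycle in program order, and no dependences between the instructions. The unit names and latencies in `LATENCY` are illustrative assumptions, not numbers from the slides.

```python
from dataclasses import dataclass

# Illustrative latencies in cycles; these numbers are assumptions, not from the slides.
LATENCY = {"int_add": 1, "fp_mul": 4, "fp_div": 10}


@dataclass
class Instr:
    name: str
    unit: str  # which functional unit executes this instruction


def completion_order(program):
    """Issue one instruction per cycle in program order to a pipelined
    functional unit. All instructions are assumed independent, so none
    waits for another. Returns (name, issue_cycle, done_cycle) tuples
    sorted by the cycle in which execution finishes."""
    timeline = []
    for issue_cycle, instr in enumerate(program):
        done_cycle = issue_cycle + LATENCY[instr.unit]
        timeline.append((instr.name, issue_cycle, done_cycle))
    return sorted(timeline, key=lambda t: t[2])


program = [Instr("DIV", "fp_div"), Instr("MUL", "fp_mul"), Instr("ADD", "int_add")]
for name, issued, done in completion_order(program):
    print(f"{name}: issued in cycle {issued}, finished in cycle {done}")
```

ADD is issued last but finishes first, because it does not have to wait behind the long-latency DIV. Execution therefore completes out of program order, which is what makes precise exceptions a problem.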






# HANDLING EXCEPTIONS IN PIPELINING

- Exceptions versus interrupts
- Cause
  - Exceptions: internal to the running thread
  - Interrupts: external to the running thread
- When to Handle
  - Exceptions: when detected (and known to be non-speculative)
  - Interrupts: when convenient
    - Except for very high priority ones
      - Power failure
      - Machine check
- Priority: process (exception), depends (interrupt)
- Handling Context: process (exception), system (interrupt)

## PRECISE EXCEPTIONS/INTERRUPTS

- The architectural state should be consistent when the exception/interrupt is ready to be handled
  1. All previous instructions should be completely retired.
  2. No later instruction should be retired.

Retire = commit = finish execution and update arch. state
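The two conditions can be written as a check over per-instruction retirement flags. This is a sketch under an assumed representation: a list `retired` indexed in program order. The slide does not say whether the faulting instruction itself is retired, so the check leaves it out.

```python
def is_precise(retired, faulting):
    """retired[i] is True if instruction i (in program order) has retired.
    Checks the two conditions for a precise exception at instruction
    `faulting`: every older instruction has retired and no younger one has.
    The faulting instruction itself is not checked here."""
    older_all_retired = all(retired[:faulting])
    younger_none_retired = not any(retired[faulting + 1:])
    return older_all_retired and younger_none_retired


# Instruction 2 faults.
print(is_precise([True, True, False, False], 2))  # True
print(is_precise([True, False, False, True], 2))  # False: 1 not retired, 3 retired
```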

### WHY DO WE WANT PRECISE EXCEPTIONS?

- Aid software debugging
- Enable (easy) recovery from exceptions, e.g. page faults
- Enable (easily) restartable processes























## SIMPLIFYING REORDER BUFFER ACCESS

- Idea: Use indirection
- Access register file first
  - If register not valid, register file stores the ID of the reorder buffer entry that contains (or will contain) the value of the register
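  - Mapping of the register to a ROB entry
- Access reorder buffer next
- What is in a reorder buffer entry?

| V | DestRegID | DestRegVal | StoreAddr | StoreData | BranchTarget | PC/IP | Control/valid bits |
|---|-----------|------------|-----------|-----------|--------------|-------|--------------------|

- Can it be simplified further?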
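A minimal sketch of the indirection, assuming each register file entry holds a valid bit, a value, and (when not valid) the ID of the ROB entry that will produce the value. Only the V, DestRegID, and DestRegVal fields of a ROB entry are modeled; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    valid: bool = False               # V
    dest_reg: Optional[int] = None    # DestRegID
    dest_val: Optional[int] = None    # DestRegVal (None until the instruction completes)
    # StoreAddr, StoreData, BranchTarget, PC/IP, control bits omitted in this sketch


@dataclass
class RegFileEntry:
    valid: bool = True                # True: the value in the register file is current
    value: int = 0
    rob_id: Optional[int] = None      # if not valid: ROB entry that holds/will hold the value


def read_operand(reg, regfile, rob):
    """Access the register file first and follow the ROB ID only if needed.
    Returns (ready, value) or (False, rob_id) when the producer is still in flight."""
    entry = regfile[reg]
    if entry.valid:
        return True, entry.value
    rob_entry = rob[entry.rob_id]
    if rob_entry.dest_val is not None:
        return True, rob_entry.dest_val  # completed but not yet retired
    return False, entry.rob_id           # wait for ROB entry `rob_id`


regfile = [RegFileEntry() for _ in range(4)]
rob = [ROBEntry() for _ in range(8)]
# r1's producer sits in ROB entry 3 and has completed with value 42.
rob[3] = ROBEntry(valid=True, dest_reg=1, dest_val=42)
regfile[1] = RegFileEntry(valid=False, rob_id=3)
# r2's producer sits in ROB entry 5 and has not completed yet.
rob[5] = ROBEntry(valid=True, dest_reg=2)
regfile[2] = RegFileEntry(valid=False, rob_id=5)
```

Only one ROB entry is touched per operand, at the cost of a second, serialized access after the register file read.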



## **REORDER BUFFER PROS AND CONS**

- Pro
  - Conceptually simple for supporting precise exceptions
- Con
  - Reorder buffer needs to be accessed to get the results that are yet to be written to the register file
  - CAM or indirection → increased latency and complexity
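For contrast with indirection, a CAM-style lookup compares the requested register against the DestRegID of every ROB entry and keeps the youngest match. This sketch assumes the ROB is a list ordered oldest to youngest and uses dicts with hypothetical field names; hardware performs the comparisons in parallel, which is where the area, power, and latency cost comes from.

```python
def cam_lookup(rob, reg):
    """Associative (CAM-style) search: compare `reg` against the
    DestRegID of every ROB entry and keep the youngest match.
    `rob` is ordered oldest -> youngest. Returns the matching entry,
    or None if no in-flight instruction writes `reg` (the register
    file value is current)."""
    match = None
    for entry in rob:              # hardware does these compares in parallel
        if entry["dest_reg"] == reg:
            match = entry          # a younger match overrides an older one
    return match


rob = [
    {"tag": 0, "dest_reg": 1, "value": 10},
    {"tag": 1, "dest_reg": 2, "value": None},
    {"tag": 2, "dest_reg": 1, "value": None},
]
```

Looking up register 1 returns entry 2 (the youngest writer, not yet complete), and register 3 returns `None`.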





- Decode (D): Access regfile/ROB, allocate entry in ROB, check if instruction can execute, if so dispatch instruction
- Execute (E): Instructions can complete out-of-order
- Completion (R): Write result to reorder buffer
- Retirement/Commit (W): Check for exceptions; if none, write result to architectural register file or memory; else, flush pipeline and start from exception handler

In-order dispatch/execution, out-of-order completion, in-order retirement
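A small simulation of the Retirement/Commit stage, assuming the ROB already holds three instructions with known completion cycles. The `Entry` fields and the `retire_in_order` name are hypothetical. Execution finishes out of order, but architectural registers are written only from the head of the ROB, and an exception is acted on only when its instruction reaches the head.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    dest: str
    value: int
    done_cycle: int           # cycle in which execution completes (may be out of order)
    exception: bool = False   # detected during execution, handled only at retirement


def retire_in_order(rob, max_cycles=100):
    """Each cycle, retire completed instructions from the head of the ROB.
    If the head has an exception, flush it and everything younger, so all
    older instructions have retired and no younger one has.
    Returns (retired names, architectural registers, faulting name or None)."""
    arch_regs = {}
    retired = []
    head = 0
    for cycle in range(max_cycles):
        while head < len(rob) and rob[head].done_cycle <= cycle:
            e = rob[head]
            if e.exception:
                return retired, arch_regs, e.name  # flush, start exception handler
            arch_regs[e.dest] = e.value
            retired.append(e.name)
            head += 1
        if head == len(rob):
            break
    return retired, arch_regs, None


rob = [
    Entry("I0", "r1", 5, done_cycle=8),                  # long latency, e.g. a divide
    Entry("I1", "r2", 7, done_cycle=2, exception=True),  # faults early
    Entry("I2", "r3", 9, done_cycle=1),                  # completes first
]
print(retire_in_order(rob))
```

I2 finishes in cycle 1, but `r3` is never written: I1's exception is handled only after I0 retires in cycle 8, and I2 is flushed along with I1.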









# **TOMASULO'S ALGORITHM**

- OoO with register renaming, invented by Robert Tomasulo
  - Used in IBM 360/91 Floating Point Units
  - Read: Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units," IBM Journal of R&D, Jan. 1967.
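A minimal sketch of the register renaming step, assuming three-operand instructions given as `(dest, src1, src2)` register names. Each destination gets a fresh tag, which stands in for the reservation-station (or ROB) IDs a real design would use. The tag names `t0`, `t1`, ... and the table name `rat` are hypothetical.

```python
from itertools import count


def rename(program):
    """program: list of (dest, src1, src2) architectural register names.
    Sources read the latest tag mapped to their register (or the register
    itself if no in-flight producer exists); then the destination is
    mapped to a fresh tag."""
    rat = {}                      # register alias table: arch reg -> tag
    fresh = count()
    renamed = []
    for dest, src1, src2 in program:
        s1 = rat.get(src1, src1)  # read sources BEFORE remapping dest
        s2 = rat.get(src2, src2)
        tag = f"t{next(fresh)}"
        rat[dest] = tag
        renamed.append((tag, s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),
    ("r4", "r1", "r5"),
    ("r1", "r6", "r7"),  # WAW with instr 0 and WAR with instr 1 on r1
    ("r8", "r1", "r4"),
]
for instr in rename(program):
    print(instr)
```

After renaming, the two writes of `r1` go to different tags (`t0` and `t2`), so the third instruction no longer has to wait for the first two; only true (read-after-write) dependences remain.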


- What is the major difference today?
  - Precise exceptions: IBM 360/91 did NOT have this
  - Patt, Hwu, Shebanow, "HPS, a new microarchitecture: rationale and introduction," MICRO 1985.
  - Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985.

- Variants are used in most high-performance processors
  - Initially in Intel Pentium Pro, AMD K5
  - Alpha 21264, MIPS R10000, IBM POWER5, IBM z196, Oracle UltraSPARC T4, ARM Cortex A15

- These slides are not covered in the class
- These are for students who want to know more

#### • What is the insight of OOO execution?

# OUT-OF-ORDER EXECUTION (DYNAMIC SCHEDULING)

- Idea: Move the dependent instructions out of the way of independent ones (s.t. independent ones can execute)
  - Rest areas for dependent instructions: Reservation stations
- Monitor the source "values" of each instruction in the resting area
- When all source "values" of an instruction are available, "fire" (i.e., dispatch) the instruction
  - Instructions dispatched in dataflow (not control-flow) order
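The reservation-station behavior above can be sketched as a cycle-by-cycle loop: each waiting instruction fires as soon as all of its source values are available, and its result becomes available `latency[op]` cycles later. This is a simplified model assuming unlimited functional units, renamed source tags, and instant wakeup; the function name, tuple format, and latencies are hypothetical.

```python
def dataflow_schedule(program, latency, initially_ready, max_cycles=1000):
    """program: list of (tag, op, src_tags) in program order.
    Returns (cycle, tag) fire events. An instruction fires in the first
    cycle in which every source tag's value is available."""
    ready_at = {tag: 0 for tag in initially_ready}  # tag -> cycle its value is available
    waiting = list(program)                         # the reservation stations
    fired = []
    cycle = 0
    while waiting:
        if cycle > max_cycles:
            raise RuntimeError("some source value is never produced")
        for entry in list(waiting):                 # oldest first
            tag, op, srcs = entry
            if all(s in ready_at and ready_at[s] <= cycle for s in srcs):
                fired.append((cycle, tag))
                ready_at[tag] = cycle + latency[op]  # result broadcast later
                waiting.remove(entry)
        cycle += 1
    return fired


latency = {"mul": 4, "add": 1}
program = [
    ("t0", "mul", ["r1", "r2"]),  # long-latency operation
    ("t1", "add", ["t0", "r3"]),  # depends on the multiply
    ("t2", "add", ["r4", "r5"]),  # independent
    ("t3", "add", ["t2", "r6"]),  # depends only on t2
]
print(dataflow_schedule(program, latency, ["r1", "r2", "r3", "r4", "r5", "r6"]))
```

`t1` rests in its reservation station until cycle 4, while `t2` and `t3` fire in cycles 0 and 1: the firing order follows the dataflow graph rather than program order.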

#### • Benefit:

Latency tolerance: Allows independent instructions to execute and complete in the presence of a long latency operation

# The Von Neumann Model/Architecture

- Also called *stored program computer* (instructions in memory). Two key properties:
- Stored program
  - Instructions stored in a linear memory array
  - Memory is unified between instructions and data
    - The interpretation of a stored value depends on the control signals
    - When is a value interpreted as an instruction?
- Sequential instruction processing
  - One instruction processed (fetched, executed, and completed) at a time
  - Program counter (instruction pointer) identifies the current instr.
    - Program counter is advanced sequentially except for control transfer instructions
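A toy stored-program machine illustrating both properties. The accumulator ISA (`load`, `add`, `store`, `jmp`, `halt`) and the `run_machine` name are hypothetical. One list holds both instructions and data; a word is treated as an instruction only because the program counter points at it, and the PC advances sequentially unless a control transfer instruction overrides it.

```python
def run_machine(memory, pc=0, max_steps=100):
    """Fetch-execute loop over a single linear memory array.
    Processes exactly one instruction at a time."""
    acc = 0
    for _ in range(max_steps):
        instr = memory[pc]        # fetch: this word is interpreted as an instruction
        op = instr[0]
        pc += 1                   # default: advance sequentially
        if op == "load":
            acc = memory[instr[1]]
        elif op == "add":
            acc += memory[instr[1]]
        elif op == "store":
            memory[instr[1]] = acc
        elif op == "jmp":
            pc = instr[1]         # control transfer overrides the sequential PC
        elif op == "halt":
            return memory
    raise RuntimeError("step limit reached")


mem = [
    ("load", 6),   # 0: acc = mem[6]
    ("add", 7),    # 1: acc += mem[7]
    ("jmp", 4),    # 2: skip word 3
    ("halt",),     # 3: never executed
    ("store", 8),  # 4: mem[8] = acc
    ("halt",),     # 5
    2, 3, 0,       # 6-8: data words in the same memory
]
run_machine(mem)
print(mem[8])
```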








## OOO EXECUTION: RESTRICTED DATAFLOW

- An out-of-order engine dynamically builds the dataflow graph of a piece of the program
  - Which piece?
- The dataflow graph is limited to the instruction window
  - Instruction window: all decoded but not yet retired instructions
- Can we do it for the whole program?
- Why would we like to?
- In other words, how can we have a large instruction window?
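A sketch of why window size matters, assuming instructions given as `(dest, srcs)` and assuming the window holds the oldest `window` not-yet-retired instructions. The engine can build read-after-write edges only among visible instructions, so an independent instruction just beyond the window cannot be found and started early. The function name and format are hypothetical.

```python
def window_graph(program, window):
    """program: list of (dest, srcs) in program order.
    Builds producer -> consumer (RAW) edges among the visible instructions
    only. Producers older than the window have retired, so their values are
    available. Returns (edges, independent), where `independent` lists
    visible instructions with no in-window producer."""
    visible = program[:window]
    last_writer = {}
    edges = []
    independent = []
    for i, (dest, srcs) in enumerate(visible):
        producers = {last_writer[s] for s in srcs if s in last_writer}
        edges.extend((p, i) for p in sorted(producers))
        if not producers:
            independent.append(i)
        last_writer[dest] = i
    return edges, independent


program = [
    ("r1", ["r0"]),  # 0: e.g. a long-latency load
    ("r2", ["r1"]),  # 1
    ("r3", ["r2"]),  # 2
    ("r4", ["r3"]),  # 3
    ("r5", ["r9"]),  # 4: independent, but outside a 4-entry window
]
print(window_graph(program, 4))
print(window_graph(program, 5))
```

With a 4-entry window only the dependence chain is visible and instruction 4 must wait; a 5-entry window exposes it as independent work that can overlap the chain.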



