| CS 3330 Final | Exam - S | pring | 2016 |
|---------------|----------|-------|------|
|---------------|----------|-------|------|

| Name: Computing ID:    |                                                                                                                                                                                                                                                                                                                                                               |                  |                    |
|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|--------------------|
| Wr<br>Bu<br>As         | tters go in the boxes unless otherwise specified (e.g., for <b>C</b> 8 write Letters clearly: if we are unsure of what you wrote you will globle and Pledge the exam or you will lose points.  sume unless otherwise specified:  little-endian 64-bit architecture                                                                                            |                  |                    |
| Mu<br>Vai<br>Ma        | %rsp points to the most recently pushed value, not to the next questions are single-selection unless identified as select-all ultiple-select: are all clearly marked; put 1 or more letters in the briable Weight: point values per question are marked in square brark clarifications: If you need to clarify an answer, do so, and oner of your answer box. | oox.<br>rackets. |                    |
| Qu                     | uestion 1 [2 pt]: Which of the following is an example of pipelin                                                                                                                                                                                                                                                                                             | ing the crea     | ition of books?    |
| A<br>B<br>on<br>C<br>D | Person A is writing volume 2 while Person B is binding volume<br>Person A drafts the contents while Person B creates the paper to<br>Person A is writing one book while Person B is writing another<br>all of the above<br>none of the above                                                                                                                  | print it         | Answer:            |
|                        | formation for questions 2–3 otimization strategies are not restricted to code                                                                                                                                                                                                                                                                                 |                  |                    |
|                        | <b>nestion 2 [2 pt]:</b> (see above) An online retail company buying a example of                                                                                                                                                                                                                                                                             | package de       | elivery company is |
| В                      | function inlining eliminating loop inefficiencies cache blocking loop unrolling multiple accumulators                                                                                                                                                                                                                                                         |                  | Answer:            |

|        | testion 3 [2 pt]: (see above) This exam randomizes question order; grouping te this question and its pair) is intended to optimize your performance by utility.                                                                                                                                             |                     |
|--------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| B<br>C | cache blocking function inlining eliminating loop inefficiencies multiple accumulators loop unrolling                                                                                                                                                                                                       | Answer:             |
| Qu     | estion 4 [2 pt]: Memory segments are defined by                                                                                                                                                                                                                                                             |                     |
| B<br>C | data structures used by hardware only<br>data structures used by both hardware and software<br>data structures used by software only<br>none of the above; they are just an abstraction                                                                                                                     | Answer:             |
| Qu     | estion 5 [2 pt]: An exception table is                                                                                                                                                                                                                                                                      |                     |
| B<br>C | an array a hash table a tree none of the above                                                                                                                                                                                                                                                              | Answer:             |
| Th     | formation for questions 6–8 re following questions ask about how each of the three main types of exceptioner two.                                                                                                                                                                                           | ons differ from the |
| Qu     | estion 6 [2 pt]: (see above) Faults are different from other exception types                                                                                                                                                                                                                                | in that             |
| B<br>C | faults are not caused by running an assembly instruction faults never cause Aborts or Signals faults are handled by a different mechanism than other exceptions faults are intentionally triggered by user code faults always cause Aborts or Signals faults are never intentionally triggered by user code | Answer:             |
| Qu     | estion 7 [2 pt]: (see above) Traps are different from other exception types in                                                                                                                                                                                                                              | n that              |
| В      | traps are handled by a different mechanism than other exceptions traps never cause Aborts or Signals traps always cause Aborts or Signals                                                                                                                                                                   | Answer:             |

D traps are not caused by running an assembly instruction
 E traps are always intentionally triggered by user code
 F traps are never intentionally triggered by user code

| CS 33                  | 330 Spring 2016 Final Exam                                                                                                                                                                           | Variant O page 3 of 10                                                         | Email ID:          |                           |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|--------------------|---------------------------|
| A in B in C in D in    | stion 8 [2 pt]: (see above) Internaterrupts are not caused by runraterrupts never cause Aborts or nterrupts always cause Aborts on terrupts are never intentionally nterrupts are always intentional | ning an assembly instructi<br>Signals<br>r Signals<br>r triggered by user code | 1 ,                | pes in that  Answer:      |
| F in  Infor  Cons  typ | nterrupts are handled by a differ<br>mation for questions 9–10<br>ider the following C definitions:<br>pedef struct node_t { TYPE                                                                    | rent mechanism than othe                                                       | ode;               |                           |
| Ques                   | pedef struct range_t { size<br>stion 9 [2 pt]: (see above) Which<br>)? If multiple options are tied fo                                                                                               | h list uses the least memor                                                    | ry overall (includ | ling both heap and        |
| <b>B</b> r <b>C</b> T  | YPE *list ange list The answer is different if TYPE is ode *list                                                                                                                                     | char than if TYPE is int                                                       |                    | Answer:                   |
| -                      | stion 10 [2 pt]: (see above) We are tied for largest, select all t                                                                                                                                   | Thich list type puts the mathemethral that apply.                              | ost data on the    | stack? If multiple        |
| B r<br>C T             | The answer is different if TYPE is ange list YPE *list ode *list                                                                                                                                     | char than if TYPE is int                                                       |                    | Answer:                   |
| enab                   | stion 11 [2 pt]: All of the follow<br>led without it?<br>ssembly address size can differ                                                                                                             | ring are enabled by virtua                                                     | -                  | h one would <i>not</i> be |
| <b>C</b> coaddro       | nultiple processes can share the ode can be written using labels,                                                                                                                                    | letting the assembler gene                                                     |                    | Answer:                   |
| pose                   | stion 12 [2 pt]: Computers typi<br>of allowing the kernel to create<br>requently for the same reason th                                                                                              |                                                                                | •                  | -                         |
| <b>A</b> p             | age tables should not have too r                                                                                                                                                                     | nany levels                                                                    |                    | Answer:                   |

B cache sets should not have too many entries
C pipelines should not be made too deep
D loops should not be unrolled too many times

#### Information for questions 13–14

For each of the following, assume u and v are both declared as unsigned ints. Select all that could apply for some values of u and v; for example, given "u v" you'd select <, =, and > I use  $\wedge$  as a carat and  $\sim$  as a tilde, both larger than usual for increased legibility.

**Question 13 [2 pt]:** (see above) (u<<16) & (u>>16) u

A =

**B** > **C** <

**Question 14 [2 pt]:** (see above)  $u + \sim v$  u - v

A =

B >

**C** <

Answer:

Answer:

#### Information for questions 15–16

Thus far, fast-and-expensive storage has always been volatile (like SRAM, DRAM, and registers) and slow-and-cheap storage always nonvolatile (like tape, disk, and flash).

Question 15 [1 pt]: (see above) Suppose someone invents a new storage technology: it is about as fast as magentic disk but costs a lot less and is volatile. What should we use it for?

# Select all that apply

**A** it be good for existing file systems

**B** it be good for existing virtual memory swapping

**C** it be good for existing cache hierarchies

**D** none of the above

Answer:

(see above) Suppose someone invents a new storage technology: it costs Question 16 [1 pt]: similar to SRAM but is a little faster and nonvolatile. What should we use it for?

#### Select all that apply

**A** it be good for existing file systems

**B** it be good for existing cache hierarchies

**C** it be good for existing virtual memory swapping

**D** none of the above

Answer:

| Email ID: |  |
|-----------|--|
|           |  |

**Question 17 [3 pt]:** In the following diagram, indicate the control signals to give each pipeline register by putting a single letter in each box; use N for normal, B for bubble, and S for stall.

Assume that  $i_4$  and  $i_5$  resulted from incorrect speculative execution and should not be allowed to continue; that  $i_3$  needs another cycle in the execute stage; and that all other instructions are OK and may continue to execute normally.

Some points are for picking the solution with the fewest stalls.

| Stage:              | F     | D     | E     | M     | W     |
|---------------------|-------|-------|-------|-------|-------|
| Stage:<br>Contains: | $i_5$ | $i_4$ | $i_3$ | $i_2$ | $i_1$ |
| Answers:            |       |       |       |       |       |
|                     |       |       |       |       |       |

**Question 18 [2 pt]:** If we replace a set-associative cache with a different cache with half as many sets each containing twice as many lines (without changing block size),

- **A** the tag gets longer
- **B** the tag stays the same size
- **C** the tag gets shorter

Answer:

**Question 19 [2 pt]:** pushq is a 10-byte instruction. We can replace pushq with other operations (math and register-memory moves); how does the storage requirements for push change if we use other operations instead of pushq?

- A increases by more than 3 bytes
- **B** decreases by more than 3 bytes
- **C** increases by 2 or 3 bytes
- **D** changes by no more than 1 byte
- **E** decreases by more than 2 or 3 bytes

Answer:

#### Information for questions 20–23

Consider a floating-point format with 7 bits overall, 4 of which are exponent bits.

**Question 20 [2 pt]:** (see above) What exponent bits are used to represent  $-\frac{5}{8}$ ? Answer as four bits, such as 0000

Answer:

**Question 21 [2 pt]:** (see above) Which of the following is true using this format?

- **A** 1.0 / 32.0 is 0.0
- **B** 6.0 + 1.0 is 6.0
- **C** (x x) == 0 is true for all x
- **D** None of the above

Answer:

**Question 22 [2 pt]:** (see above) What number is represented by the bits 0101010? Answer as a base-2 number such as -101.11

Answer:

D interruptE none of the above

| <b>Question 23 [2 pt]:</b> (see above) What fraction bits are used to represent $-\frac{5}{8}$ ? Answer as two bits, such as 00                                                                                                                                                                                                                                                              | Answer:           |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|
| <b>Question 24 [0 pt]:</b> Cognitive break. Write a joke or anecdote here, or doodle s ing, or just smile at the blank space worth 0 points and move on.                                                                                                                                                                                                                                     | omething interest |
|                                                                                                                                                                                                                                                                                                                                                                                              |                   |
|                                                                                                                                                                                                                                                                                                                                                                                              |                   |
| Question 25 [2 pt]: Suppose we add a new ifun for OPq, mulq that require cycles in the Execute stage. That means execute may stall for a single operation impact pipeline hazards?  Select all that apply                                                                                                                                                                                    |                   |
| <ul> <li>A Two consecutive OPqs will become a new kind of hazard.</li> <li>B The branch misprediction hazard may now result in more instructions being removed from the pipeline via bubbling.</li> <li>C The load-use hazard can now need more than a single cycle of stalling.</li> <li>D The return hazard can now need extra cycles of stalling.</li> <li>E None of the above</li> </ul> | Answer:           |
| Information for questions 26–29 Various topics discussed during our exploration of exceptions enabled comme elements of a computer system. The following questions ask about these comme                                                                                                                                                                                                     |                   |
| Question 26 [1 pt]: (see above) Communication from kernel to user is enabled                                                                                                                                                                                                                                                                                                                 | d by              |
| <ul> <li>A fault</li> <li>B interrupt</li> <li>C signal</li> <li>D trap</li> <li>E none of the above</li> </ul>                                                                                                                                                                                                                                                                              | Answer:           |
| Question 27 [1 pt]: (see above) Communication from hardware to kernel is er                                                                                                                                                                                                                                                                                                                  | nabled by         |
| A signal B trap C fault                                                                                                                                                                                                                                                                                                                                                                      | Answer:           |

| Question 28 [1 pt]: | (see above) Communication from user to kernel is enabled by | Ţ |
|---------------------|-------------------------------------------------------------|---|
|---------------------|-------------------------------------------------------------|---|

A trap

**B** interrupt

**C** signal

**D** fault

**E** none of the above

# Answer:

### Question 29 [1 pt]: (see above) Communication from kernel to hardware is enabled by

A signal

**B** fault

**C** interrupt

D trap

**E** none of the above



#### Question 30 [2 pt]: An exception handler is

**A** both user- and kernel-mode software

**B** both kernel-mode software and hardware

**C** primarily kernel-mode software

**D** primarily hardware

**E** primarily user-mode software



#### Information for questions 31–32

A binary tree can be stored in an array; entry i's left child is 2i and its right child is 2i + 1:

| 0        | 1    | 2      | 3      | 4        | 5        | 6        | 7        | 8          |  |
|----------|------|--------|--------|----------|----------|----------|----------|------------|--|
| (unused) | root | root.l | root.r | root.l.l | root.l.r | root.r.l | root.r.r | root.l.l.l |  |

Consider such an array used with a direct-mapped cache with 128 lines, each large enough to hold 4 array entries. Suppose the array is aligned so that entries 0, 1, 2, and 3 are in the same cache line.

**Question 31 [2 pt]:** (see above) If code accesses root, then root.l, then root.l.l, then root.l.l.l, etc.; how many entries can we accesses before we have to evict one of the other entry's cache lines? Answer as a base-10 number.

Answer:

# **Question 32 [2 pt]:** (see above) Which method of tree traversal would have the best spatial locality?

**A** pre-order depth first

**B** in-order depth-first

**C** breadth-first

**D** post-order depth first

**E** all of the above have the same locality

Answer:

#### Information for questions 33–36

In a multi-level page table,

| Question 33 [2 pt]:   | (see above) If part-way through following the page table the MMU hardware |
|-----------------------|---------------------------------------------------------------------------|
| finds the read-only b | oit set and the CPU is attempting to write to memory,                     |

**A** stop; the next page table will be in RAM but will also be marked read-only

**B** stop; the next page table might not even be in RAM

**C** keep going, only stopping if the last page table is marked read-only

**D** keep going; even if the last page table is marked read-only it is the OS, not the MMU hardware, that enforces read-only

Answer:

Question 34 [2 pt]: (see above) Which of the following tells the location of the first page table?

**A** the PO

**B** the VPN from high-order bits of the address

**C** the TLB

**D** the VPN from low-order bits of the address

**E** the PTBR

**F** sometimes one of the above, sometimes another, depending on if we have a hit or not

Answer:

**Question 35 [2 pt]:** (see above) The last VPN used is

A an index into a page containing data (not a page table)

**B** it depends on if there is a page fault or not

**C** an index into a page table

Answer:

**Question 36 [2 pt]:** (see above) In the common case where there are 3 or 4 levels of page table and several thousand pages are allocated in a few contiguous regions of virtual memory, table storage  $\div$  data storage is

**A** less than  $\frac{1}{100}$ 

**B** between  $\frac{1}{2}$  and 2

C between 2 and 100

**D** more than 100

**E** between  $\frac{1}{2}$  and  $\frac{1}{100}$ 

Answer:

#### Information for questions 37–39

Consider 38-bit virtual addresses and 4-byte page-table entries, where each PTE stores 8 bits of metadata (executable, protected, etc).

**Question 37 [2 pt]:** (see above) If you have 256-byte pages, then the largest possible physical address space is how many bytes? Answer as a power of two, such as 16B or 128GB.

Answer:

| <b>Question 38 [2 pt]:</b> (see above) If you want to have a single-level page table and to fit the entire page table in one page of memory, what is the smallest page size (in bytes) you could use? Answer as a power of two, such as 16B or 128GB.                                | Answer:              |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| <b>Question 39 [2 pt]:</b> (see above) If you want to have a three-level page table and to fit each page table in one page of memory, what is the smallest page size (in bytes) you could use? Answer as a power of two, such as 16B or 128GB.                                       | Answer:              |
| Question 40 [2 pt]: Suppose we wanted to add a conditional call instructional call (i.e., callg, callge, etc.) would require  Select all that apply                                                                                                                                  | on to Y86-64. Condi- |
| <ul> <li>A new branch prediction logic (beyond that already present for jXX)</li> <li>B more than the 9 bytes needed to encode call</li> <li>C more register read- or write-ports than call</li> <li>D none of the above</li> </ul>                                                  | Answer:              |
| Information for questions 41–42 The translation lookaside buffer is a cache that                                                                                                                                                                                                     |                      |
| Question 41 [2 pt]: (see above) Produces as output                                                                                                                                                                                                                                   |                      |
| <ul> <li>A the physical page number from an address</li> <li>B entire physical addresses</li> <li>C all the virtual page numbers from an address</li> <li>D entire virtual addresses</li> <li>E a single virtual page number from an address</li> <li>F none of the above</li> </ul> | Answer:              |
| Question 42 [2 pt]: (see above) Takes as input                                                                                                                                                                                                                                       |                      |
| <ul> <li>A all the virtual page numbers from an address</li> <li>B a single virtual page number from an address</li> <li>C entire virtual addresses</li> <li>D entire physical addresses</li> <li>E the physical page number from an address</li> <li>F none of the above</li> </ul> | Answer:              |
| Question 43 [2 pt]: Which of the following assembly snippets is removed by not present in the binary?                                                                                                                                                                                | y the assembler and  |
| <pre>A nop B loop: C addq %rax, %rcx D jg loop</pre>                                                                                                                                                                                                                                 | Answer:              |

Question 44 [2 pt]: After profiling your code you find that it is spending 20% of its time evaluating the comparison statement in the innermost for loop; and 50% of its time accessing elements of an array.

Which of the following optimizations would give the biggest speedup? Assume the optimizations add no overhead not explicitly mentioned.

A add an extra 30% to the runtime to set up, then split the entire program to run in parallel on two processors

**B** reorder loops to reduce memory access times by 30%

C blocking to reduce memory access times by 50%, but increase loop comparison time by  $1.5 \times$ 

**D**  $10 \times \text{loop unrolling}$ 

| Answer: |  |
|---------|--|
|         |  |
|         |  |
|         |  |
|         |  |

Question 45 [2 pt]: Suppose physical memory is larger than the virtual address space. Which of the following can benefit from swapping?

## Select all that apply

A a single user process by itself

**B** multiple concurrent user processes

**C** a single user process and the kernel

**D** none of the above

| Answer: |
|---------|
|         |
|         |
|         |

#### ..... Pledge:

On my honor as a student, I have neither given nor received aid on this exam.

Your signature here