All quizzes for Fall 2017 (key)

All quizzes for Fall 2017 (key)

Quiz 00

Question 1: Suppose a program has three tasks to perform:

  • Task A, which takes 10 seconds; then
  • Task B, which takes 6 seconds; then
  • Task C, which takes 3 seconds

Initially, the program performs each of these tasks one at a time. Which of the following changes will result in the fastest runtime? (Assume doing two tasks in parallel does not make either of the parallel tasks slower.)

  1. run task A, B, and C in parallel

  2. speedup B and C by a factor of 7

  3. (correct answer)

    speedup A and B by a factor of 3

  4. speedup A by a factor of 5

Question 2: Which of the following are true about object files?
Select all that apply

  1. (correct answer)

    The object file contains the names of labels from the corresponding assembly file.

  2. (correct answer)

    In addition to the machine code itself, the object file includes information about where in the machine code memory addresses will be when the program runs.

  3. In addition to the machine code itself, the object file includes information about where each instruction starts in the machine code.

Question 3: If the bytes 0x12, followed by 0x34, followed by 0x56, followed by 0x78 are interpreted as a 4-byte little endian integer, what value will they have?

  1. 0x12345678

  2. 0x21436587

  3. 0x87654321

  4. (correct answer)

    0x78563412

  5. none of the above

Question 4: After the statements char array[3] = {10, 12, 14}; char *ptr = array;, which of the following C expressions are true?
Select all that apply

  1. (correct answer)

    array + 0 == ptr

  2. (correct answer)

    ptr + 2 == &(array[2])

  3. (correct answer)

    *array + 1 == *ptr + 1

  4. (correct answer)

    *(ptr + 1) == 12

Quiz 01

Quiz 02

Question 1: After int array[3] = {10, 6, 2}; int *ptr = array;, which of the following C expressions are true:
Select all that apply

  1. (correct answer)

    sizeof(array[1]) == sizeof(ptr[1])

  2. sizeof(array) == 3 * sizeof(ptr)

  3. sizeof(array) == sizeof(&array[0])

  4. (correct answer)

    array + 1 == ptr + 1

Question 2: After the declaration:

    typedef struct foo {
        int x;
    } bar;

which of the following C statements declare a instance of the struct?
Select all that apply

  1. (correct answer)

    bar x;

  2. (correct answer)

    struct foo x;

  3. struct bar x;

  4. typedef bar baz;

Question 3: What is the value of %rax after running the following? (You may assume this assembly does not segfault or otherwise crash.)

    movq %rsp, %rax
    pushq %rax
    subq %rsp, %rax
    leaq (%rax, %rax, 2), %rax
  1. 12

  2. (correct answer)

    24

  3. 8

  4. it depends on the initial value of %rsp or on the contents of the stack

  5. none of the above

Question 4: What is the value of %rax after running the following?

    movq $10, %rax
    movq $20, %rbx
    subq %rbx, %rax
    jg skip
    addq $10, %rax
skip:
  1. -10

  2. (correct answer)

    0

  3. 10

  4. 20

  5. 30

  6. not enough information is provided

  7. none of the above

Quiz 03

Quiz 04

Question 1: Which of the following usually true about RISC-like instruction sets compared to CISC-like instruction sets?
Select all that apply

  1. simpler instructions can be encoded using less space than complicated instructions

  2. common operations (like pushing a value on the stack) have special, dedicated instructions

  3. (correct answer)

    they have more registers

  4. (correct answer)

    they have fewer ways of specifying instruction operands

Question 2: Consider the following C code:

    while (a + b > 0) {
        b += foo();
    }

If a stored in %r12 and b is stored in %rbx, which of the following are valid compilations of this code? (Note that both %r12 and %rbx are callee-saved registers.)
Select all that apply

  1. (correct answer)
        jmp middle
    top:
        call foo
        addq %rax, %rbx
    middle:
        movq %r12, %rax
        addq %rbx, %rax
        jg top
    bottom:
    
  2. top:
        addq %r12, %rbx
        jle bottom
        call foo
        addq %rax, %rbx
        jmp top
    bottom:
    
  3.    movq %r12, %rax
       addq %rbx, %rax
       jl bottom
    top:
       call foo
       addq %rax, %rbx
       movq %rbx, %rax
       addq %r12, %rax
       jg top
    

Question 3: Which of the following are valid Y86-64 assembly?
Select all that apply

  1. rrmovq %rax, %r15

  2. mrmovq 0x1234(%rbx,2), %r10

  3. (correct answer)

    xorq %rax, %rbx

  4. (correct answer)

    cmovne %rbx, %rcx

  5. (dropped from quiz)(correct answer)

    popq %rsp

Question 4: Which of the following are attributes of an instruction set?
Select all that apply

  1. the number of cycles each instruction takes to execute

  2. (correct answer)

    whether floating point is supported

  3. (correct answer)

    the number of registers available to programmers

  4. (dropped from quiz)(correct answer)

    the size of memory addresses

  5. (correct answer)

    whether condition codes or compare-and-branch instructions are used

Quiz 05

Question 1: Which of the following are true about combinatorial circuits?
Select all that apply

  1. They can implement a counter.

  2. They only act when triggered by the rising edge of a clock signal.

  3. (correct answer)

    They produce their output based on only their input.

Question 2: In the registers described in section 4.2.5 of our textbook, if a register is outputting the value a, and its input changes from a to b, the output will change:

  1. after a small delay that does not depend on a clock signal

  2. nearly immediately if the clock signal is high; when the clock signal becomes high otherwise

  3. upon the next rising or falling edge of the clock signal

  4. (correct answer)

    upon the next rising edge of the clock signal

  5. upon the next falling edge of the clock signal

Question 3: In the register file described in section 4.2.5 of our textbook, changes to the read register ID input will cause changes to the corresponding register value being outputted by the register file:

  1. (correct answer)

    after a small delay that does not depend on a clock signal

  2. nearly immediately if the clock signal is high; when the clock signal becomes high otherwise

  3. upon the next rising or falling edge of the clock signal

  4. upon the next rising edge of the clock signal

  5. upon the next falling edge of the clock signal

Question 4: According to the description of conceptual "stages" described in our textbook for a Y86-64 processor in section 4.3.1, which of the following instructions may need to perform work in the "execute" stage?
Select all that apply

  1. halt

  2. (correct answer)

    pushq

  3. (correct answer)

    call

  4. (dropped from quiz)

    irmovq

  5. (correct answer)

    addq

  6. (correct answer)

    mrmovq

Question 5: Our textbook's description of the conceptual "write back" stage for the Y86-64 processor notes that the processor may write "up to two results to the register file". Which instructions require two values to be written to the register file?
Select all that apply

  1. pushq

  2. (correct answer)

    popq

  3. rmmovq

  4. subq

  5. ret

  6. call

Quiz 06

Question 1 (0 points): If x and y are positive unsigned ints, then which of the following C expressions are always true?

  1. (dropped from quiz)

    (x & y) > (x | y)

  2. (dropped from quiz)

    (x >> 8) & 0xFF == (x & 0xFF80) >> 8 (key: ((x >> 8) & 0xFF) == ((x & 0xFF80) >> 8) would always be true)

  3. (dropped from quiz)

    ((x >> 2) | x) & y == (x & y) | ((x & y) >> 2)

  4. (dropped from quiz)

    (x + y) >> 7 == (x >> 7) + (y >> 7) (key: (127 + 1) >> 7 == 1, (127 >> 7) + (1 >> 7) == 0)

Question 2: If x is a signed short on our lab machines (two bytes, two's complement), then which of the following are possible results of x >> 3?
Select all that apply

  1. (correct answer)

    0

  2. (correct answer)

    -1

  3. (correct answer)

    7

  4. (correct answer)

    -7

  5. 10000

  6. -10000

Question 3: Given a signed integer x, which of the following C expressions results in x with the first and third least significant bits cleared?
Select all that apply

  1. ((x >> 1) << 1) | ((x >> 3) << 3)

  2. (((x >> 1) ^ 2) << 1)

  3. (correct answer)

    (((x >> 1) & (~2)) << 1)

  4. x ^ 5

  5. (correct answer)

    x & -6

For these questions, consider the processor "state" components we discussed in lecture --- the register file, ALU, instruction memory, data memory and PC register --- and the single-cycle processor designs we discussed based on connecting these with MUXes.

Question 4: (see above) Given the processor components we discussed, a processor executing the addq instruction would update the value stored in the result register in the register file ________ when it updates the value stored in the program counter register.

  1. (correct answer)

    at approximately the same time as (key: on the rising edge of the clock signal even though the values to store may be computed at different times)

  2. after

  3. before

  4. unknown --- it depends on how efficient the ALU and register file are

Question 5: (see above) If we were building a processor to implement the Y86 jmp, addq and mrmovq instructions only, which of the following inputs would need to be controlled by a MUX or similar logic depended on the instructions icode?
Select all that apply

  1. (correct answer)

    the input of the program counter register (key: jmp uses part of instruction memory output; others do not)

  2. the read register number inputs to the register file (key: harmless to read extra registers)

  3. (dropped from quiz)(correct answer)

    the write register number inputs to the register file (key: this or write value input needs something)

  4. the write enable input to the data memory (key: no instructions write, can be hard-wired to 0)

  5. (dropped from quiz)

    the inputs to the data memory (key: probably harmless to read memory when unneeded)

  6. the address input to the instruction memory (key: always equal to PC register output)

  7. (correct answer)

    the inputs to the ALU (key: mrmovq adds value from instruction; addq adds value from register)

Quiz 07

Question 1: In HCL (our flavor or the one described in the book), for which of the following values of x and y is the result 42?

    [ x > y : 41 ; x > y + 2 : 42 ; x + y < 0 : 43 ; 1 : x + y ]

Select all that apply

  1. (correct answer)

    x = 21, y = 21

  2. x = 23, y = 19

  3. (correct answer)

    x = 0, y = 42

  4. (dropped from quiz)

    x = 43, y = -1 (dropped because depends on how signed/unsigned comparisons work)

Question 2: Suppose you are debugging a processor you wrote in HCLRS and discover that %rax is being written by an instruction that should not modify any register. Which of the following are likely causes of this bug?
Select all that apply

  1. mem_writebit is 1 when it should be 0

  2. (correct answer)

    reg_dstE has the wrong value

  3. (correct answer)

    reg_dstM has the wrong value

  4. reg_inputE has the wrong value

  5. reg_srcB has the wrong value

  6. mem_addr has the wrong value

Question 3: In the processor design in the book, some instructions require a constant that is not part of the instruction's machine code to be fed to the ALU. These include:
Select all that apply

  1. mrmovq

  2. (correct answer)

    pushq

  3. (correct answer)

    ret

  4. xorq

Question 4: During the execution of the ret instruction in the processor design discussed in the book, the input to the program counter register should be equal to:

  1. (correct answer)

    the output of the data memory

  2. part of the output of the instruction memory

  3. one of the outputs of the register file

  4. the output of the ALU

  5. the output of an PC register plus a small constant

  6. different kinds of values depending on the condition codes

Question 5: During the execution of a popq %rax instruction in the processor design discussed in the book, which operations occur significantly before the value of %rsp stored in the register file changes?
Select all that apply

  1. the value of %rax stored in the register file changes

  2. (correct answer)

    the instruction is read from the instruction memory

  3. (correct answer)

    the value to be stored in %rax is read from the data memory

  4. (correct answer)

    the prior value of %rsp is read from the register file

Quiz 08

Question 1: The textbook's processor design performs an addition by 0 in the ALU for irmovq instruction to produce the value for the register file to write. What is an alternative to this design?

  1. (correct answer)

    add a MUX in front of one of the register file "value to write" inputs and connect one of the inputs to this MUX to part of the output of the instruction memory

  2. add an additional input to the MUX controlling one of the "register number to read" inputs to the register file and connect it to part of the output of the instruction memory

  3. add a MUX in front of the data memory "address to read or write" input and connect it to part of the output of the instruction memory

  4. none of the above

Question 2: Our textbook divides its single-cycle processor design into six stages. What is true about these stages?
Select all that apply

  1. the stages represent the order in which this processor executes an instruction

  2. (correct answer)

    some instructions have no useful operations in some stages

  3. the length of the current instruction is computed in the logic of the "decode" stage.

  4. (correct answer)

    the data memory access, if any, is part of the "memory" stage

Question 3: During execution of popq instruction, the address that is passed to the data memory is equal to one of the outputs of

  1. the ALU

  2. (correct answer)

    the register file

  3. the PC register

  4. the instruction memory

  5. it doesn't matter; the popq instruction doesn't use the data memory

Question 4: Given the following HCLRS code

    register xY {
        a : 64 = 0;
        b : 64 = 0;
    }
    x_a = Y_a + Y_b;
    x_b = Y_b + 1;

During the initial clock cycle Y_a and Y_b are 0. What is the value of Y_a after three rising clock edges?

  1. 0

  2. 1

  3. 2

  4. (correct answer)

    3

  5. 4

  6. 5

  7. 7

Quiz 09

Quiz 10

Question 1: A circuit that can be easily broken into multiple, approximately parts that take approximately equal amounts of time is a ________ candidate for pipelining than a circuit that can be easily broken into one complex, slow part and many fast, simple parts.

  1. (correct answer)

    better

  2. worse

  3. about as good

Question 2: Adding pipelining to a processor will usually ________ the amount of time it takes to execute a single function of several hundred instructions.

  1. increase

  2. (correct answer)

    decrease

  3. not change

Question 3: Adding pipelining to a processor will usually ________ the amount of time it takes to execute a single instruction.

  1. (correct answer)

    increase

  2. decrease

  3. not change

Question 4: Which of the following usually increases as the number of pipeline stages in a procesor design increases?
Select all that apply

  1. (correct answer)

    the portion of the cycle time devoted to register delays

  2. the complexity of each pipeline stage

Quiz 11

Quiz 12

Question 1: Consider the following Y86 assembly snippet running on the PIPE processsor described in our textbook:

    andq %rax, %rbx
    subq %rcx, %rdx
    rrmovq %r8, %r9
    xorq %r10, %r11

When the andq instruction is in its memory stage, then the rrmovq instruction is in the ______ stage.

  1. fetch

  2. (correct answer)

    decode

  3. execute

  4. memory

  5. writeback

  6. none; the instruction is not being run anywhere in the pipeline

Question 2: Consider the following Y86 assembly snippet running on the PIPE processor described in our textbook:

    andq %rax, %rbx
    call foo
    xorq %r10, %r11
    halt
foo:
    subq %rcx, %rdx
    ret

When the xorq instruction is in its fetch stage, then the subq instruction is in the _____ stage.

  1. fetch

  2. decode

  3. execute

  4. memory

  5. writeback

  6. (correct answer)

    none; the instruction is not being run anywhere in the pipeline

Question 3: Assume we are executing a program on the PIPE processor design described in the textbook. If one instruction in this program writes to the register %rax and a later instruction in the program reads that value from %rax, then
Select all that apply

  1. (correct answer)

    there must be data dependency between the two instructions

  2. there must be a data hazard involving these two instructions

  3. if the two instructions are in the same function, there must be a control hazard involving the two instructions

Question 4: Which of the following statements are true about forwarding, stalling, and branch prediction are true?
Select all that apply

  1. Forwarding can handle data hazards that stalling alone cannot.

  2. (correct answer)

    A combination of stalling and forwarding can be used together to handle a single data hazard.

Question 5: When branch prediction predicts the wrong instruction,
Select all that apply

  1. the processor needs to fetch the mispredicted conditional jump instruction twice.

  2. the processor does not fetch new instructions until the mispredicted conditional jump instruction's result is known

Quiz 13

Question 1 (2 points): Which of the following assembly snippets exercises a data hazard in the PIPE processor design described in our textbook?
Select all that apply

  1. (correct answer)
    addq %rax, %rbx
    addq %rbx, %rcx
    
  2. rmmovq %rcx, 0(%rdx)
    nop
    addq %rcx, %rdx
    
  3. (correct answer)
    addq %rcx, %rdx
    nop
    rmmovq %rcx, 0(%rdx)
    
  4. (correct answer)
    addq %rax, %rbx
    nop
    nop
    addq %rbx, %rcx
    

Question 2: If we want to squash the instruction currently in the execute stage, what should we do in HCL?

  1. set the "bubble" signal to 1 on the decode to execute pipeline registers

  2. set the "stall" signal to 1 on the decode to execute pipeline registers

  3. (correct answer)

    set the "bubble" signal to 1 on the execute to memory pipeline registers (preferred answer; this replaces the instruction with a nop)

  4. (correct answer)

    set the "stall" signal to 1 on the execute to memory pipeline registers (answer accepted, but this assumes we are also stalling the instruction currently in the memory stage)

  5. none of the above

Question 3: Consider the following assembly code:

    mrmovq 0(%rbx), %rcx
    jne foo
    addq %rcx, %rdx
...
foo:

On the PIPE processor described in the textbook and in lecture (that uses forwarding and branch prediction), if the jne is not taken (control goes to addq), the addq instruction will be fetched when the mrmovq instruction is in the ____ stage.

  1. fetch

  2. decode

  3. execute

  4. memory

  5. (correct answer)

    writeback

  6. none of the above, the instruction is not in the pipeline

Consider splitting up the "decode" and "writeback" stages of our PIPE processor in order to accommodate a slower, internally pipelined register file. This register file accepts register numbers to read every cycle, and outputs a result near the end of the following clock cycle. It also accepts register numbers and value pairs to write every cycle, but doesn't complete the write until the end of the following clock cycle.

In the resulting processor, the pipeline stages will be:

  1. Fetch
  2. Decode 1
  3. Decode 2
  4. Execute
  5. Memory
  6. Writeback 1
  7. Writeback 2

For one instruction to read a value another wrote from the register file, the Writeback 2 stage of the writing instruction needs to complete by the time the reading instruction starts its Decode 1 stage.

Question 4: (see above) Suppose this modified processor is executing the following assembly snippet:

    irmovq $0x1234, %rax
    mrmovq 4(%rax), %rbx
    subq %rbx, %rcx

If this processor resolves hazards in this assembly with stalling only, how many cycles of stalling will be required?

  1. 0

  2. 1

  3. 2

  4. 3

  5. 4

  6. 5

  7. more than 5 and less than 10

  8. (correct answer)

    10 or more

Question 5: (see above) Suppose this modified processor is executing the following assembly snippet:

    irmovq $0x1234, %rax
    mrmovq 4(%rax), %rbx
    subq %rbx, %rcx

If this processor resolves hazards in this assembly with stalling and forwarding, how many cycles of stalling will be required?

  1. 0

  2. (correct answer)

    1

  3. 2

  4. 3

  5. 4

  6. 5

  7. more than 5 and less than 10

  8. 10 or more

Quiz 14

Question 1: Which of the following are examples of temporal locality?
Select all that apply

  1. (correct answer)

    the instructions in a loop are fetched multiple times

  2. most consecutively executed instructions in a loop are adjacent in memory

  3. (correct answer)

    when searching a binary search tree repeatedly (for many different keys), the children of the root node are accessed repeatedly

  4. when accessing a hashtable repeatedly, accesses are evenly distributed throughout the hashtable

  5. when looking for a newline in a C-style string, the string is scanned from beginning to end

Question 2: Which of the following are examples of spatial locality?
Select all that apply

  1. the instructions in a loop are fetched multiple times

  2. (correct answer)

    most consecutively executed instructions in a loop are adjacent in memory

  3. when searching a binary search tree repeatedly (for many different keys), the children of the root node are accessed repeatedly

  4. when accessing a hashtable repeatedly, accesses are evenly distributed throughout the hashtable

  5. (correct answer)

    when looking for a newline in a C-style string, the string is scanned from beginning to end

Question 3: If a cache line is storing the values in memory around address X, then the tag in this cache line contains

  1. (correct answer)

    part of the address X

  2. part of the bytes in memory around address X

  3. information about when memory address X was last accessed

  4. information about whether the cache's value for memory address X is consistent with the value in memory

Question 4: In order to identify whether a memory access to the byte at address X hits in a cache, it is necessary to access:
Select all that apply

  1. (correct answer)

    the valid bits for the lines in the set of the cache corresponding to address X

  2. (correct answer)

    the tag bits for the lines in the set of the cache corresponding to address X

  3. the valid bits of sets adjacent to the set of the cache corresponding to address X

  4. the block offset bits of the address X

  5. (correct answer)

    the set index bits of the address X

In section 6.4.7, our textbook talks about miss rate, hit time and miss penalty. Together these determine the effective performance of a cache.

Question 5: (see above) Which of the following are likely to be improved by switching from a direct-mapped to set-associative cache of the same size?
Select all that apply

  1. hit time

  2. (correct answer)

    miss rate

  3. miss penalty

Question 6: (see above) Which of the following are likely to be improved by decreasing the size of the cache (but leaving the block size and associativity the same)?
Select all that apply

  1. (correct answer)

    hit time

  2. miss rate

  3. (dropped from quiz)(correct answer)

    miss penalty (I was thinking that it would be faster to write to a smaller cache, but it is reasonable to assume the cache completes the write to the cache while the processor is using the returned value, so no extra time is required)

Quiz 15

Consider a 1.5 MB 3-way set associative cache with 64B cache blocks, an LRU replacement policy, and a write-allocate, write-back policy.

Question 1: (see above) Values for which of the following addresses would be stored as part of the same cache block as 0x12345678 in this cache?
Select all that apply

  1. (correct answer)

    0x12345670

  2. 0x00045678

  3. 0x12345680

  4. 0x12300078

Question 2: (see above) Which of the following addresses have the same tag in this cache as 0x12345678 in this cache?
Select all that apply

  1. (correct answer)

    0x12345670

  2. 0x00045678

  3. (correct answer)

    0x12345680

  4. (correct answer)

    0x12300078

Question 3: (see above) How many sets are in this cache?

  1. 1024

  2. 2048

  3. 4096

  4. (correct answer)

    8192

  5. 16384

  6. 32768

  7. none of the above

Consider a program that reads a byte at each of the following addresses in the following order:

Question 4: (see above) How many cache misses will a 32B direct-mapped cache with 16B blocks experience on this access pattern, assuming the cache is initially empty?

  1. 0

  2. 3

  3. 6

  4. (correct answer)

    7

  5. 8

  6. 9

  7. none of the above

Question 5 (0 points): (see above) How many cache misses will a 4-way set associative 32B cache with 2B blocks experience on this access pattern, assuming the cache is initially empty?

  1. 0

  2. (correct answer)

    3

  3. 6

  4. 7

  5. 8

  6. 9

  7. none of the above

Question 6: Which of the following cache design changes increase the amount of metadata (non-data storage) required for a cache?
Select all that apply

  1. (correct answer)

    using a write-back policy instead of a write-through policy

  2. (correct answer)

    using an LRU replacement policy instead of a random replacement policy

  3. (correct answer)

    making the cache 8-way set associative instead of 4-way (without changing the block size or total number of data bytes stored)

  4. doubling the cache block size (without changing the associativity or total number of data bytes stored)

Quiz 16

Question 1: In section 5.1, the textbook discusses how memory aliasing can prevent compilers from performing some optimizations. Which of the following optimizations might be prohibited due to the possiblity of memory aliasing? Assume that pa and pb and declared as unsigned long *s, and x and y are declared as a unsigned longs.

[Dropped two options because the textbook says "two pointers point to the same place" rather than a more general description of aliasing. Also, the compiler can observe that the two droppped options would be an infinite loop if there were aliasing and still optimize them. (C allows compilers to remove infinite loops without side effects.)]

[I should have included setting y or *pb to 0 on the options below, too, though I think it was clear enough for most for the remaining options.]
Select all that apply

  1. (correct answer)

    converting while (*pb > 0) { *pa += 1; *pb -= 1; } into *pa += *pb;

  2. (dropped from quiz)(correct answer)

    converting while (y > 0) { *pa += 1; y -= 1; } into *pa += y;

  3. (dropped from quiz)(correct answer)

    converting while (*pb > 0) { x += 1; *pb -= 1; } into x += pb;

  4. converting while (y > 0) { x += 1; y -= 1; } into x += y;

Question 2: Which of the following optimizations are likely to increase the size of an executable?
Select all that apply

  1. (correct answer)

    function inlining

  2. (correct answer)

    loop unrolling

  3. accumulating results in registers instead of in memory

  4. relocating a function call outside of a loop

Question 3: Suppose I try to optimize code by using function inlining, but discover that the resulting code is slower. Which of the following are likely causes of this?
Select all that apply

  1. inlining created more memory aliasing, making it harder for the compiler to perform other optimizations

  2. (correct answer)

    inlining increased the number of instruction cache misses

  3. (accepted any answer)

    inlining increased the number of instructions in a loop, causing it to run slower [intended intrepretation was "number of instructions executions in a loop", but plausible explanation if you're thinking about the amount of space required to store all the instructions a loop contains/executes]

  4. inlining required the compiler to store values on the stack that were previously stored in registers

Question 4: Section 5.7.2 and FIgure 5.12 indicates that that the Intel Haswell microarchitecture has a fully pipelined functional unit that can issue one multiplication per cycle, but with a 3 cycle latency. Given this, how many cycles does it take to use this functional unit to compute (x * y) * z? (Give the total latency of the computation. Assume no extra time is required to provide inputs to or read outputs from the functional unit.)

  1. 2 cycles

  2. 3 cycles

  3. 4 cycles

  4. 5 cycles

  5. (correct answer)

    6 cycles

  6. 7 cycles

  7. 8 cycles

Quiz 17

Consider the following three pieces of C code:

/* version 1 */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        A[i * N + j] = B[j * N + i];

/* version 2 */
for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
        A[i * N + j] = B[j * N + i];

/* version 3 */
for (int ii = 0; ii < N; ii += 64)
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < ii + 64; ++i)
            A[i * N + j] = B[j * N + i];

[Version 3 was meant to start the innermost loop from ii. Given that, version 2 would have better spatial locality in B. Without that, it's ambiguous whether version 2 has better spatial locality in B in than version 3. Also, given that version 3 has better temporal locality in A.]

Assume N is a large multiple of 64.

Question 1: (see above) In which version do accesses to A have the best temporal locality?

  1. version 1

  2. version 2

  3. (correct answer)

    verison 3 [only one that accesses any element of A twice]

  4. (correct answer)

    two or more versions are tied for best spatial [copy/paste error, meant: temporal] locality for accesses to A

Question 2: (see above) In which version do accesses to A have the best spatial locality?

  1. (correct answer)

    version 1

  2. version 2

  3. verison 3

  4. two or more versions are tied for best spatial locality for accesses to A

Question 3: (see above) In which version do accesses to B have the best spatial locality?

  1. version 1

  2. (correct answer)

    version 2

  3. (correct answer)

    verison 3

  4. (correct answer)

    two or more versions are tied for best spatial locality for accesses to B

Consider the following C code:

long array[1024 * 1024];
for (int x = 0; x < 10; ++x)
    for (int i = 0; i < 1024 * 1024; i += 1024)
        array[i] += 1;

Assume that all variables but array are placed into registers and that array is placed in memory such that the address of array[0] is a multiple of the cache block size (i.e. all of its block offset bits are 0). Also, assume that longs are 8 bytes and all caches are initially empty.

[The numbers in parenthesis were meant to correspond to the numbers before, but due to some silly mathematical errors, one did not.]

Question 4: (see above) How many data cache misses will this C code experience on a 64KB (64 * 1024 byte) direct-mapped cache with 64B cache blocks?

  1. (correct answer)

    10240 [values at i and i + 1024 * 64 / 8 map to same cache set]

  2. 1024

  3. 9664 (1024 + (1024 - 64) * 9)

  4. 9600 ((1024-64) * 10)

  5. 5632 (1024 + (1024 - 512) * 9)

  6. 6144 ((1024 - 512) * 10)1024 + (1024 - 512) * 10

  7. none of the above

Question 5 (0 points): (see above) [Dropped, because I needed to specify the replacement policy for you to answer this question.]

How many data cache misses will this C code experience on a 1.5MB (1536 * 1024 byte) 3-way set associative cache with 64B cache blocks?

  1. (correct answer)

    10240 [values at i and i + 1024 * 512 / 8 and i + 1024 * 512 / 8 * 2 and i + 1024 * 512 / 8 * 3 and i + 1024 * 512 / 8 * 4 and up to * 6 map to same cache set, so this is what would happen with an LRU replacement policy]

  2. 1024 [if it was an array of bytes]

  3. 9664 (1024 + (1024 - 64) * 9)

  4. 9600 ((1024-64) * 10)

  5. 5632 (1024 + (1024 - 512) * 9)

  6. 6144 ((1024 - 512) * 10)1024 + (1024 - 512) * 10

  7. (correct answer)

    none of the above

Question 6: Which of the following optimizations are often difficult for compilers to do because of aliasing?
Select all that apply

  1. function inlining

  2. (dropped from quiz)

    loop unrolling [I think aliasing is not a big issue here, but it is reasonable to think that the loop bound being aliased would cause lots of problems and not be a rare occurrence.]

  3. (correct answer)

    cache blocking

  4. (correct answer)

    storing an repeatedly accessed array element in a register instead of loading it repeatedly from memory

Question 7: Which of the following optimizations reduce the number of instructions a program executes?
Select all that apply

  1. (correct answer)

    function inlining

  2. (correct answer)

    loop unrolling

  3. cache blocking

  4. changing loop orders

Quiz 18

Quiz 19

Question 1: Using multiple accumulators increases performance by
Select all that apply

  1. decreasing the number of instructions a program executes

  2. decreasing the number of cache misses a program experiences

  3. (correct answer)

    increasing the amount of the processor's parallelism a program can use

  4. decreasing the amount of pointer aliasing in a program

Question 2: Which of the following assembly snippets should be faster on a modern processor (like that described in section 5.7.1-2 of the book)?

  1. addq %r8, %rax
    addq %r9, %rax
    addq %r10, %rax
    
  2. (correct answer)
    addq %r8, %rax
    addq %r9, %r10
    addq %r10, %rax
    
  3. addq %r8, %r9
    addq %r9, %rax
    addq %r10, %rax
    

In section 8.1.2, our book classifies hardware exceptions into four categories:

For this question, indicate what kind of exception each scenario is, based on the book's terminology. None of the below are aborts, so we have omitted that answer.

Question 3: (see above) If a process requests the operating system read some data from a file into the process's memory, this will trigger what kind of exception?

  1. interrupt

  2. (correct answer)

    trap

  3. fault

Question 4: (see above) If a process accesses memory that is not valid, this will trigger what kind of exception?

  1. interrupt

  2. trap

  3. (correct answer)

    fault

Question 5: Each process has:
Select all that apply

  1. (correct answer)

    its own view of the contents of memory

  2. its own exception table base register

  3. (correct answer)

    its own program counter

  4. (correct answer)

    its own values for general-purpose registers like %rax

Question 6: Exception handlers (of the kind discussed in our textbook) are:

  1. (correct answer)

    a special kind of function that could be written in assembly

  2. hardware components built-in to a processor

  3. part of various I/O devices, like keyboards

  4. a special kind of process the operating system runs

Quiz 20

Mutlquestion Consider the following code:

u = a * b
v = c * d
w = u + v

x = a + b
y = x + c
z = y * d

Question 1: (see above) If this is run on a processor with the following functional units:

  • several pipelined multipliers which accept one multiply per cycle and have 3 cycle latency;
  • several pipelined adders which accept one add per cycle and have 2 cycle latency;

Then, how long should the computations take? (Assume there is no penalty for forwarding the results of a functional unit to another calculation.)

  1. 2 cycles

  2. 3 cycles

  3. 4 cycles

  4. 5 cycles

  5. 6 cycles

  6. (correct answer)

    7 cycles

  7. 8 cycles

  8. 9 cycles

  9. 10 cycles

  10. none of the above

Question 2: Which of the following are helpful properties for a function to have if a programmer or compiler is going to have it take advantage of vector instructions?
Select all that apply

  1. it has a variety of kinds of operations it performs that can take advantage of the variety of functional units available in the processor

  2. (correct answer)

    it performs the same operation on many different pieces of data

  3. (correct answer)

    the data it operates on is contiguous in memory

  4. the result of each computation is used immediately by the next computation

Question 3: If process A performs a system call and the exception handler for the system call performs a context switch to process B, then the context of process A is stored where?

  1. in special registers in the processor

  2. (correct answer)

    in the operating system's memory

  3. on the stack of process A

  4. in the exception table

  5. in the system call wrapper

Question 4: The exception table contains:

  1. (correct answer)

    a list of pointers to machine code

  2. a list of recently triggered exceptions

  3. the context of the program that was interrupted by an exception

  4. the contexts of exception handlers

Question 5: Which of the following are likely to run in kernel mode?
Select all that apply

  1. (correct answer)

    the timer interrupt handler

  2. the C standard library function sprintf

  3. (correct answer)

    the code that sets up the exception table

Question 6: Which of the following are part of a process's context?
Select all that apply

  1. (correct answer)

    its stack pointer

  2. (accepted any answer)

    the values stored on its stack (the book says contexts "consists of ... the user's stack ...". I think this is not what is almost ever met by a "context" since it's not something that the OS would change as part of a context switch, but I'll certainly accept that some may have answered this based on having read the book closely enough...)

  3. (correct answer)

    its program counter value

  4. (correct answer)

    its condition codes

  5. the next time at which a timer interrupt will be triggered

Quiz 21

Question 1: Which of the following are ways in which Linux's signals are similar to exceptions?
Select all that apply

  1. both signal and exception handlers are run in kernel mode

  2. (correct answer)

    both signals and exceptions can happen as a result of an event external to the program

  3. (correct answer)

    both signals and exceptions can happen as a result of an intentional action within a program

  4. (correct answer)

    both signals and exceptions can happen as a result of an invalid memory access within a program

Question 2: Some Linux signals correspond to hardware exceptions. When the corresponding hardware exception occurs and a process previously registered a signal handler for the signal,

  1. the hardware directly runs the signal handler, because the operating system configured it to do so

  2. a library function in the process detects that it needs to run the signal handler by making a system call

  3. the operating system creates or forks a new process to run the signal handler

  4. (correct answer)

    the operating system causes the signal handler to be invoked as part of a context switch to the process

Question 3: If values are part of the same virtual page, then, if the values are loaded in memory, the values are

  1. (correct answer)

    stored on the same physical page

  2. stored on different physical pages

  3. may or may not be stored on different physical pages

Question 4: When virtual memory is used for caching, what entity is responsible for loading data in response to a cache miss?

  1. (correct answer)

    the operating system

  2. the memory management unit (MMU)

  3. the L2 or L3 cache

  4. the DRAM

Question 5: A page table entry contains
Select all that apply

  1. (correct answer)

    physical page numbers

  2. virtual page numbers

  3. page contents

Quiz 22

Question 1: A typical operating system may disable interrupts

  1. to allow programs running in user-mode to run faster

  2. (correct answer)

    to more easily/safely manipulate data structures used by exception handlers

  3. while running a (Linux-style) signal handler in case it manipulates global data

  4. as the last step of a context switch

Question 2: Signal handlers
Select all that apply

  1. (correct answer)

    follow the same calling convention as ordinary C functions

  2. run in a seperate address space from the rest of their program

  3. (correct answer)

    trigger a fault if they access an invalid memory address

  4. (correct answer)

    can be interrupted by timer interrupts

  5. (correct answer)

    can invoke a system call

Consider a processor which has:

Question 3: (see above) How large is a physical page number on this processor?

  1. 3 bits

  2. 4 bits

  3. 5 bits

  4. 6 bits

  5. (correct answer)

    7 bits

  6. 8 bits

  7. 9 bits

  8. 10 bits

  9. 24 bits

  10. none of the above

Question 4: (see above) Which of the following virtual addresses are part of the same virtual page as 0x12345678 on this processor?
Select all that apply

  1. (correct answer)

    0x1234FFFF

  2. 0xFFFF5678

  3. (correct answer)

    0x12345FFF

  4. 0x12300678

Question 5: Suppose a processor has 4096-byte pages and 4 byte page table entries, and its page table base register is set to physical byte address 0x1234000. Assume the processor uses the scheme we discussed in lecture to store its page tables as a contigous array of page table entries in memory. At what physical address will the processor look for the page table entry for the virtual page number 3?

  1. 0x1234000

  2. 0x1234003

  3. (correct answer)

    0x123400C

  4. 0x1234C00

  5. 0x1234300

  6. 0x1240000

  7. none of the above

Quiz 23

Question 1: When a page fault occurs, and the operating system loads the virtual page that caused the fault from disk and updates the page table accordingly (the scenario depicted in Figure 9.13(b)), it will

  1. continue running the program starting with the instruction after the one which caused the fault

  2. (correct answer)

    continue running the program starting with the instruction which caused the fault

  3. restart the program from the beginning of its current method

  4. send the something like the SIGSEGV signal to the program

Question 2: In a multi-level page table like described in section 9.6.3, the page table entries for the first level contain

  1. the virtual addresses or virtual page numbers containing the second-level page tables

  2. the physical addresses of linked lists of page table entries

  3. (correct answer)

    the physical addresses or physical page numbers of the second-level page tables

  4. the physical page numbers where the program's actual data is stored

Question 3: Multi-level page tables are most efficient at representing

  1. (correct answer)

    address spaces with large, contiguous regions of unallocated memory

  2. address spaces where the allocated addresses are evenly distributed throughout the address space

  3. address spaces where several virtual pages map to each allocated physical page

  4. address spaces with smaller pages

Question 4: The Core i7 address translation described by the textbook uses a four-level page table. When the processor successfully fetches an one-byte instruction using this page table, that can require up to ___ memory accesses.

  1. 1

  2. 2

  3. 3

  4. 4

  5. (correct answer)

    5

  6. 6

  7. 8

Quiz 24

Quiz 25

Question 1: The TLB index is computed from

  1. (correct answer)

    part of the virtual page number

  2. part of the physical page number

  3. part of the page offset

  4. part of the page offset and/or part of the virtual page number

  5. part of the page offset and/or part of the physical page number

Question 2: When the processor tries to access a memory address X and a TLB miss happens, in the design described in the book, the MMU

  1. triggers a fault

  2. (correct answer)

    retrieves a page table entry from cache or memory

  3. retrieves a page from disk

  4. accesses memory address X in the L1 cache

Question 3: In the design described in the example in section 9.6.4 of the book, when performing a memory access the result of the TLB access is needed to
Select all that apply

  1. (correct answer)

    compute the cache tag to compare with the value stored in the cache

  2. compute the location of a page table entry

  3. compute the base address of a page table

  4. compute the physical page offset, which is part of the cache index

Quiz 26

Consider a system with:

Question 1: (see above) When the processor is accessing the virtual address 0x00CAFECAFE, what is the index of the TLB set being accessed? (The index of the first TLB set is 0.)

  1. 0

  2. 1

  3. 2

  4. (correct answer)

    3

  5. 4

  6. 5

  7. 6

  8. 7

  9. none of the above

Question 2: (see above) If the page table base register contains the byte address 0x100000, then where will the processor look for the first-level page table entry for the virtual address 0x0012345678

  1. 0x100000

  2. 0x100001

  3. 0x100004

  4. (correct answer)

    0x100010

  5. none of the above

Question 3: (see above) Suppose a process has the following virtual page numbers allocated in its page tables:

  • VPN 0x10
  • VPN 0xFF
  • VPN 0x5000
  • VPN 0x5001

All other pages are invalid. How many pages of page tables does this process need on this processor?

  1. 1

  2. 2

  3. (correct answer)

    3

  4. 4

  5. 5

  6. more than 5

Question 4: (see above) Suppose the system wants to have an L1 cache organized based on physical addresses (so, e.g., a given physical address always maps to the same set), but lookup the appropriate set of the L1 cache before it completes the TLB lookup. Which of the following L1 data cache designs would allow this?
Select all that apply

  1. 32KB, direct-mapped with 64B blocks

  2. (correct answer)

    32KB, 2-way set associative cache with 128B blocks

  3. 128KB, 4-way set associative cache with 32B blocks