All quizzes (key)

All quizzes (key)

Quiz 00

Question 1: When statically linking an executable, which of the following operations are done by the linker?
Select all that apply

  1. converting a string literal into its ASCII representation

  2. interpreting the machine code in object files to identify where each instruction begins and ends

  3. (correct answer)

    replacing parts of machine code that correspond to memory addresses

  4. (correct answer)

    deciding where to place each object's file data segment in memory

Question 2: Suppose the register %rax contains 0x100, and the register %rbx contains 0x10. What value is stored in %rcx as a result of movq 2(%rax,%rbx,4), %rcx?

  1. the integer 0x142

  2. the integer 0x112

  3. the integer 0x124

  4. the 64-bit value that was in memory at address 0x124

  5. (correct answer)

    the 64-bit value that was in memory at address 0x142

  6. the 64-bit value that was in memory at address 0x112

  7. not enough informtion is provided

  8. none of the above

Question 3: Consider the following C code:

long foo(long x, long y) {
    while (x >= 0) {
        y -= 2;
        x += y;
    }
    return x;
}

Which of the following are valid translations of it to assembly? (You may ignore what the function does in cases of integer overflow or underflow.)
Select all that apply

  1. (correct answer)
    foo:
        cmpq $0, %rdi
        jl end
        subq $2, %rsi
        addq %rsi, %rdi
        jmp foo
    end:
        movq %rdi, %rax
        ret
    
  2. foo:
        cmpq $0, %rdi
        jge end
    start:
        subq $2, %rsi
        addq %rsi, %rdi
        jge start
    end:
        movq %rdi, %rax 
        ret
    
  3. (correct answer)
    foo:
        addq $0, %rdi
        jmp middle
    start:
        subq $2, %rsi
        addq %rsi, %rdi
    middle:
        jge start
    end:
        movq %rdi, %rax 
        ret
    

Question 4: What is the value of foo after the following C code runs?

int foo[5] = {1, 3, 5, 7, 9};
int *p = &foo[1];
p += 1;
*p -= 1;
p += 2;
*p -= 1;
p -= 1;
  1. {1, 3, 5, 7, 9}

  2. {0, 3, 4, 7, 9}

  3. {0, 2, 4, 7, 9}

  4. {0, 2, 5, 7, 9}

  5. (correct answer)

    {1, 3, 4, 7, 8}

  6. {1, 3, 4, 7, 7}

  7. none of the above

Question 5: What are the possible values of the condition codes ZF and SF after the following is executed?

leaq (%rax, %rbx), %rcx
subq %rax, %rcx
subq %rbx, %rcx

Select all that apply

  1. ZF = 0, SF = 0

  2. (correct answer)

    ZF = 1, SF = 0

  3. ZF = 0, SF = 1

  4. ZF = 1, SF = 1

Quiz 01

Question 1: In order to tell the mnemonic (e.g. addq, rrmovq, jle) for a Y86-64 instruction from its machine code, one needs to know:
Select all that apply

  1. (correct answer)

    the value of the first byte of its machine code

  2. the value the condition codes will have when the instruction is executed

  3. the value of the second byte of its machine code, if the instruction takes a register operand

  4. how many bytes of machine code the instruction uses

Question 2: Which of the following are possible versions of the addq instruction on Y86-64? (The answer is different than for x86-64.)
Select all that apply

  1. addq $1, %rax

  2. (correct answer)

    addq %rcx, %r14

  3. addq (%rcx), %r14

  4. addq (%rcx), (%rdx)

Question 3: The textbook claims that each of the following attributes is typical of a RISC instruction set. Which of them is true about Y86-64?
Select all that apply

  1. (correct answer)

    having less than 100 instructions

  2. having fixed-length encodings

  3. (correct answer)

    supporting very few ways of specifying memory addresses

  4. (correct answer)

    memory is accessed by seperate "load" or "store" instructions that don't do other arithmetic or logic

Quiz 02

Question 1: Which of the following are parts of ISA (as opposed to microarchitecture)?
Select all that apply

  1. (correct answer)

    The set of instructions that exist

  2. The implementation of each instruction

  3. The time taken to run each instruction

  4. (correct answer)

    The semantics of each instruction

Question 2: RISC processors are typically described as having:
Select all that apply

  1. (correct answer)

    Fewer, simpler instructions compared to CISC processors

  2. (correct answer)

    Fixed-length instructions

  3. (correct answer)

    Few addressing modes compared to CISC processors

  4. Instructions closer to high-level languages

Question 3: What is the operation performed by mrmovq is Y86-64?

  1. Move a register value to a memory location

  2. Move an immediate value to a a memory location

  3. Move a value from a memory location to another memory location

  4. (correct answer)

    Move a value from a memory location to a register

Question 4: Given the declaration int array[100];, which of the following C expressions are always true?
Select all that apply

  1. sizeof(array) == sizeof(&array[0])

  2. sizeof(array[0]) / sizeof(array) == 100

  3. array[10] == *((int*) &array[0] + sizeof(int) * 10)

Quiz 03

Question 1: If x is an int, then the result of x & 0 in C is

  1. (correct answer)

    always zero

  2. zero if x is 0 or 1; otherwise non-zero

  3. zero if x is 0; otherwise non-zero

  4. zero if x is non-negative; otherwise non-zero

  5. dependent on the size of an int

  6. none of the above

Question 2: As described in section 4.2.5 of our textbook, if the (non-clock) input to a register changes from 11 to 99 and remains at 99 indefinitely, then the output of the register will change to 99

  1. (correct answer)

    the next time the clock signal rises

  2. the next time the clock signal falls

  3. after the clock signal falls and rises again

  4. after a fixed delay equal the interval between rising clock edges

  5. after an amount of time that does not depend on the clock signal

Question 3: In the register file described in section 4.2.5 of our textbook, if one of the read ports of the register file changes, then the corresponding output signal will change (to the value of the register being read)

  1. the next time the clock signal rises

  2. the next time the clock signal falls

  3. after the clock signal falls and rises again

  4. after a fixed delay equal the interval between rising clock edges

  5. (correct answer)

    after an amount of time that does not depend on the clock signal

Question 4: In the data memory described in section 4.2.5 of our textbook, when the "write" signal is set to 1, then the value stored in memory will change based on the "data input" and "address" inputs

  1. (correct answer)

    the next time the clock signal rises

  2. the next time the clock signal falls

  3. after the clock signal falls and rises again

  4. after a fixed delay equal the interval between rising clock edges

  5. after an amount of time that does not depend on the clock signal

Question 5: In the processor design described in the textbook, which of the following instructions do not use the result of the ALU?
Select all that apply

  1. rmmovq

  2. (correct answer)

    halt

  3. subq

  4. pushq

  5. (correct answer)

    jmp

Quiz 04

Question 1: If x and y are signed 4-byte integers whose absolute value is less than 9 million, which of the following C expressions are always true?
Select all that apply

  1. (x & y) < (x | y)

  2. ((x ^ y) & 0xFFF) < ((x | y) & 0xFFF)

  3. (accepted any answer)

    (x & 0x1F80) >> 7 == (x >> 7) & 0x3F

Note on dropped part: (x & 0x1F80) >> 7 == (x >> 7) & 0x3F parses as something like ((x & 0x1F80) >> 7 == (x >> 7)) & 0x3F, but this sort of precedence trivia is not what I wanted to ask about. I intended to write ((x & 0x1F80) >> 7) == ((x >> 7) & 0x3F).

Question 2: Given a signed integer x, which C expression results in that integer with the 11 through 16th (inclusive) least significant bits (bits 10-15 if counting the least significant bit as bit 0) set to 1?
Select all that apply

  1. ((x >> 10) | 0x1F) << 15

  2. (accepted any answer)

    x | (0x1F << 10)

  3. (accepted any answer)

    ((x >> 15) << 15) | 0x7C00 | (x & 0xFFF)

I should have written 11 to 15/10-14, but I didn't so none of them are right. But careful reading of selecting ranges of bits isn't really what I wanted to be testing here, so I'll just not deduct points for it.

Question 3: The addq instruction in our processor adds two values from registers and writes back the result in the destination register. Which of the following components does it use in the processor?
Select all that apply

  1. (correct answer)

    PC

  2. (correct answer)

    Register files

  3. (correct answer)

    ALU

  4. (correct answer)

    Instruction memory

  5. Data Memory

Question 4: The nop instruction updates the PC with some value. What is the correct value?

  1. PC

  2. (correct answer)

    PC + 1

  3. PC - 1

  4. PC + 8

Quiz 05

Question 1: Given the case expressions described in section 4.2.3 of our book, for which of the follow values of x and y is the result of the following expression 100

    [
        x > 90 : x + y,
        y < 50 : x - y,
        1 : x + x,
    ]

Select all that apply

  1. (correct answer)

    x = 50, y = 50

  2. x = 140, y = 40

  3. (correct answer)

    x = 99, y = 1

  4. x = 50, y = 40

Question 2: Suppose you are writing an HLCRS program and discover that values in memory are changing while running an instruction that is not supposed to update memory at all. Which of the following are likely causes of this?
Select all that apply

  1. mem_readbit is set to 1 when it should be 0

  2. (correct answer)

    mem_writebit is set to 1 when it should be 0

  3. reg_dstM has the wrong value

  4. reg_inputM has the wrong value

  5. mem_addr has the wrong value

  6. mem_input has the wrong value

Question 3: The SEQ processor design uses conceptual stages. Which of the following stages are used by the addq instruction?
Select all that apply

  1. (correct answer)

    Fetch

  2. (correct answer)

    Decode

  3. (correct answer)

    Execute

  4. Memory

  5. (correct answer)

    Writeback

Question 4: The popq instruction pops a value from the top of the stack and moves it to a register. Which of the following statement is true?

  1. It computes the starting location of the stack during the execution stage

  2. It computes the current location of the top of the stack during the execution stage

  3. (correct answer)

    It computes the new location of the top of the stack during the execution stage

  4. It does not use the execution stage

Quiz 06

Question 1: Consider the following HCLRS code:

    register xY {
        a : 32 = 0;
        b : 32 = 0;
    }
    x_a = Y_a + Y_b;
    x_b = Y_b + 1;

Assume we call the cycle in which Y_a and Y_b take on their initial value, 0, cycle number 1. Given this cycle numbering, what is the value of Y_a during cycle number 3?

  1. 0

  2. (correct answer)

    1

  3. 2

  4. 3

  5. 4

  6. none of the above

Question 2: Consider the following HCLRS code:

    reg_inputE = mem_output;

Which of the following is this snippet most likely to help accomplish?

  1. reading a value from a register in the register file and writing it into memory

  2. (correct answer)

    reading a value from memory and writing it into a register in the register file

  3. reading a value from memory and writing it into the program counter register

  4. storing a memory address inside a register in the register file

Question 3: Our textbook divides its single-cycle processor design into six stages. What is true about these stages?

  1. All stages are used in every instruction

  2. The "memory" stage is responsible for reading instructions from memory

  3. The "writeback" stage is responsible for writing a value in memory

  4. (correct answer)

    The "execute" stage is responsible for ALU operations

Question 4: Which stage is responsible for generating the data memory address?

  1. PC Update

  2. (correct answer)

    Memory

  3. Decode

  4. (correct answer)

    Execute

We think "Execute" is the better address, but the textbook's design has a MUX it considers part of the memory stage which chooses between the value computed by the ALU (used most of the time) and the value output by the register file (for cases when memory is read from %rsp directly).

Quiz 07

Question 1: Consider the following Y86 code

    rrmovq %rax, %rbx
    subq %rbx, %rcx
    irmovq $10, %rcx
    jle foo
    xorq %rcx, %rbx

Which of the following statements about data dependencies in this code are true?
Select all that apply

  1. (correct answer)

    there is a data dependency between the rrmovq and subq

  2. there is a data dependency between the rrmovq and irmovq

  3. there is a data dependency between the subq and the xorq

  4. (correct answer)

    there is a data dependency between the rrmovq and the xorq

Question 2: Adding more pipeline stages to a processor usually increases
Select all that apply

  1. (correct answer)

    the amount of time from when an instruction starts executing until when it finishes

  2. (correct answer)

    the number of instructions completed per unit time

  3. (correct answer)

    the number of components in the processor design

Question 3: Consider a pipelined processor that executes an mrmovq instruction by computing the address in one cycle, then reading the data memory from this address in the next cycle. To pass the address between these two cycles, the processor will

  1. store the address on the stack

  2. (correct answer)

    store the address in a pipeline register

  3. connect the ALU that computes the address directly to the data memory input

  4. store the address in the PC register

Question 4: When deciding on how to divide instruction execution into two pipeline stages, which of the following is the best strategy?

  1. have stage that does the fastest part of instruction execution and another which handles the rest

  2. have stage that handles one kind of instructions (e.g. arithmetic instructions) and another pipeline stage that handles all other kinds

  3. (correct answer)

    divide the instruction execution into two stage such that each stage takes about the same amount of time

Quiz 08

Question 1: Consider choosing between several options for designing a pipelined processor based on a single cycle design with a cycle time of 10 nanoseconds. Which of these options is likely to have the best performance?

  1. a processor with six pipeline stages: five pipeline stages that each require 1 nanosecond and one pipeline stage that requires 6 nanoseconds

  2. a processor with two pipeline stages that each require 5.5 nanoseconds

  3. (correct answer)

    a processor with three pipeline stages that each require 4 nanoseconds

  4. the original single cycle processor (with no pipelining added)

Question 2: Suppose one wants to find the processor design which will get the best performance on a benchmark program that takes many hours to run. From our possible processor designs, we should choose the one with

  1. the smallest average instruction latency

  2. the largest average instruction latency

  3. (correct answer)

    the largest average instruction throughput

  4. the largest number of pipeline stages

  5. the least amount of register delay

Question 3: To execute the mrmovq D(rB), rA instruction in the pipeline design we described in lecture, which of the following values must appear in the pipeline regsiters between the execute and memory stages?
Select all that apply

  1. (correct answer)

    the 4-bit register number for rA

  2. the 4-bit register number for rB

  3. the 64-bit register value read from register rB

  4. the 64-bit register value read from register rA

  5. (correct answer)

    the 64-bit computed address D(rB)

  6. the 64-bit value read from memory at address D(rB)

Question 4: Consider a five-stage pipelined processor with the following pipeline stages:

  • fetch
  • decode
  • execute
  • memory
  • writeback

Assume this processor reads registers during the middle of the decode stage of an instruction and writes registers at the end of the writeback stage. If the processor resolves hazards with stalling only, how many cycles of stalling will it require to execute the following assembly correctly?

addq %rcx, %rdx  
subq %rcx, %rax  
xorq %rax, %rdx  
  1. 0

  2. 1

  3. 2

  4. (correct answer)

    3

  5. 4

  6. 5

  7. 6

  8. 7

  9. 8

  10. none of the above

Quiz 09

Question 1: With branch prediction, the instruction fetched immediately after a conditional jump instruction

  1. does not start running until the conditional jump's condition codes are known

  2. computes the condition codes that will be used by the condtional jump

  3. (correct answer)

    might not be allowed to complete normally

  4. might cause the program to behave incorrectly if it alters the condition codes

  5. might cause the program to behave incorrectly if it is another branch

Question 2: Forwarding
Select all that apply

  1. (correct answer)

    requires less clock cycles than stalling

  2. requires the register file to support writing more values at a time

  3. (correct answer)

    requires register numbers to be compared between different instructions

  4. (correct answer)

    can replace register file reads

Question 3: Forwarding alone can't resolve the load/use hazard our textbook describes because

  1. (correct answer)

    memory reads occur in a later stage than register reads

  2. the data memory only performs writes on the rising edge of the clock signal

  3. the load instruction (e.g. mrmovq) might write into the same register it used to compute the address to load

  4. the load instruction might be immediately followed by a store of a different value to the same memory location

Question 4: A program has good temporal locality if
Select all that apply

  1. (accepted any answer)(correct answer)

    it accesses a very small amount of data many times

  2. it accesses its data at regular, predictable interval

  3. (correct answer)

    after it accesses any piece of data, it tends to access that piece of data many times before accessing other pieces of data

(dropped "very small amount of data many times" answer due to lack of specification about lack of intervening accesses, etc.)

Question 5: A program has good spatial locality if
Select all that apply

  1. it spreads its accesses out evenly over its data

  2. (correct answer)

    whenever it accesses an array element, it accesses the following array element next

  3. (correct answer)

    whenever it accesses a field of an object, it tends to access the other fields of the object shortly afterwards

Quiz 10

Consider the following assembly snippet

    mrmovq (%rax), %rcx
    addq %rcx, %rax
    irmovq $0x1234, %rsp
    popq %rcx
    ret

Question 1: (see above) To execute this on a five-cycle pipelined processor like the one described in lecture with minimum stalling what values will be forwarded?
Select all that apply

  1. (correct answer)

    %rcx from mrmovq to addq

  2. %rax from mrmovq to addq

  3. %rcx from mrmovq to popq

  4. (correct answer)

    %rsp from irmovq to popq

  5. %rax from addq to ret

  6. (correct answer)

    %rsp from popq to ret

Question 2: (see above) Consider a three-stage pipelined processor with the following stages:

  • Fetch and Decode
  • Execute
  • Memory and Writeback

How many cycles of stalling will this processor require to execute the above snippet on this three-stage processor?

  1. 0

  2. (correct answer)

    1

  3. 2

  4. 3

  5. (correct answer)

    4 or more

1 with forwarding, 4+ without (or if you consider the stalling for ret part of the ret and not the next instruction). This question was accidentally listed as a select-all-that-apply when the quiz was live.

Consider the five-stage pipelined processor described in lecture with forwarding and branch prediction that assumes all branches are taken. Suppose this processor executes the following assembly snippet in which the branch is not taken:

    popq %rax
    andq %rax, %rax
    jle foo
    mrmovq %rcx, %rcx

(note: This was the question as written, but I meant to write mrmovq (%rcx), %rcx for the last instruction.)

Question 3: (see above) How many cycles of stalling will be required to execute these instructions on this processor?

  1. it depends on what the instructions near the label foo are

  2. 0

  3. (correct answer)

    1

  4. 2

  5. 3 or more

Question 4: (see above) The mrmovq will perform its execute stage while

  1. not enough information; it depends on what the instructions near the label foo are

  2. popq performs its writeback stage

  3. andq performs its writeback stage

  4. jle performs its writeback stage

  5. (correct answer)

    none of the above

Question 5: To perform a pipeline stall -- that is, keeping an instruction from advancing to the next stage in the pipeline while letting the instructions in later stages proceed as usual -- using the stall and bubble signals provided by HCLRS, one needs to

  1. set the stall signals to true for all pipeline registers

  2. set the bubble signal to true for the register bank containing the PC

  3. (correct answer)

    set the bubble signal to true for one set of pipeline registers, and stall to true for the pipeline registers before it.

  4. set the stall signal to true for one set of pipeline registers, and bubble to true for the pipeline registers before it.

  5. set the stall signal to true for the PC and leave the stall and bubble signals false for all other pipeline registers

Quiz 11

Question 1: In order to determine if a request at address X hits/misses the cache, it is necessary to access:
Select all that apply

  1. (correct answer)

    The valid bits for the lines in the set of the cache corresponding to address X

  2. (correct answer)

    The tag bits for the lines in the set of the cache corresponding to address X

  3. The block offset bits of the address X

  4. The block offset bits of the address X+1

Question 2: The set index bits are part of which component?

  1. Tag bits

  2. Valid bit

  3. Data blocks

  4. (correct answer)

    Address

Question 3: Increasing the size of a cache without changing its block size or the number of blocks per set will usually decrease
Select all that apply

  1. (correct answer)

    the miss rate

  2. the hit time

Question 4: In section 6.3.1, our book divides cache misses into compulsory misses, conflict misses, and capacity misses. Switching from a direct-mapped cache to a set-associative cache (with a similar number of blocks and similar block size) would primarily address which kind of misses?

  1. compulsory

  2. capacity

  3. (correct answer)

    conflict

Quiz 12

Question 1: Which bits in the address are used to determine, after accessing a set, which way, if any, has data for the address when cache has the following properties: 32KB, 8-way set associative, 64B block size, uses 32-bit addresses?

  1. bit 0-5

  2. bit 6-11

  3. (correct answer)

    bit 12-31

  4. None of the above

Question 2: Which of the following cache design changes increase the amount of metadata (non-data storage) required for a cache?

  1. (correct answer)

    Increasing associativity without changing the cache size or block size

  2. Making a fully associative cache a direct mapped cache without changing the cache size and block size

  3. Doubling the cache block size without changing the associativity or the cache size

  4. Increasing the number of sets without changing the cache size and block size

Question 3: Recall the formula for average memory access time (AMAT):

average memory access time, AMAT = hit time + miss penalty × miss rate

Which of the following statement is true?

  1. AMAT will likely increase if we increase associativity (with or without changing the cache size)

  2. Miss penalty will likely decrease if we double the cache size

  3. (correct answer)

    Miss rate will likely decease if we double the cache size

  4. None of the above

Question 4: Suppose your program sums up every 4th element of an array. Each element of the array takes 4 bytes of space. Your cache block size is 16B. Initially your cache is empty.

    int sum = 0;
    for(int i = 0; i < N; i += 4)
            sum += v[i];

Which of the following statements are true?

  1. Every access will be a hit

  2. (correct answer)

    Every access will be a miss

  3. The pattern will be hit, miss, hit, miss, …

  4. The pattern will be miss, miss, hit, hit, miss, miss, hit, hit, …

Quiz 13

Question 1 (0 points): Consider the following two functions:

    void versionA(int *A, int *B, int n) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < 8; ++j)
                A[i] = A[i] * B[j];
    }

    void versionB(int *A, int *B, int n) {
        for (int j = 0; j < 8; ++j)
            for (int i = 0; i < n; ++i)
                A[i] = A[i] * B[j];
    }

Assuming n is very large and the compiler does not reorder or omit memory accesses to the arrays A or B (or keep values from A or B in registers), which is likely to experience fewer cache misses?

  1. (correct answer)

    versionA

  2. versionB

  3. They will have about the same number of cache misses.

Question 2 (0 points): Consider two functions foo and bar:

    void foo(int *px, int *py, int *pz) {
        int t = *px;
        *px = *py;
        *py = *pz;
        *pz = t;
    }
    void bar(int *px, int *py, int *pz) {
        int t = *py;
        *py = *pz;
        *pz = *px;
        *px = t;
    }

Compilers are not allowed to generate the same code for foo and bar. Why is this?
Select all that apply

  1. (correct answer)

    the two functions should behave differently if px == py

  2. the two functions should behave differently if *px == *py and px == pz

  3. the two functions should generate different cache misses

Question 3 (0 points): Consider the following two versions of a loop in assembly:

Version A:

    movq $1023, %rax
    movq $1, %rbx
loop:
    imulq (%rcx,%rax,8), %rbx
    sub $1, %rax
    jne loop

Version B:

    movq $1022, %rax
    movq $1, %rbx
loop:
    imulq 8(%rcx,%rax,8), %rbx
    imulq (%rcx,%rax,8), %rbx
    sub $2, %rax
    jne loop

Version A performs at least one hundred more ______ than version B.
Select all that apply

  1. (dropped from quiz)

    additions

  2. (correct answer)

    subtractions

  3. multiplications

  4. (correct answer)

    conditional jumps

Quiz 14

Question 1: Consider the following code:

    for (int k = 0; k < N; ++k) {
        for (int i = 0; i < N; ++i) {
                #A[i][k] loaded once in this loop
                for (int j = 0; j < N; ++j) {
                    #B[i][j], A[k][j] loaded each iteration (if N is big)
                    B[i*N+j] += A[i*N+k] * A[k*N+j];
                    }
        }
        }

Assuming n is very large and the compiler does not reorder or omit memory accesses to the arrays, which of the following statements are true: (In the options below, "loaded" means loaded into a cache with a typical organization.)
Select all that apply

  1. (correct answer)

    Values from A[i*N+k] are loaded N^2 times

  2. (correct answer)

    A[i*N+k] is used N times after loading

  3. (correct answer)

    Values from B[i*N+j] are loaded N^3 times

  4. Values from A[k*N+j] are loaded N^2 times

Question 2: Consider the following codes: Version 0

    for (int k = 0; k < N; ++k) {
        for (int i = 0; i < N; ++i) {
                for (int j = 0; j < N; ++j) {
                    B[i*N+j] += A[i*N+k] * A[k*N+j];
                }
        }
    }

Version 1

    for (int kk = 0; kk < N; kk+=2) {
    for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
            for (int k = kk; k < kk + 2; ++k) {
                    B[i*N+j] += A[i*N+k] * A[k*N+j];

            }
        }
    }
   }

Assuming n is very large and the compiler does not reorder or omit memory accesses to the arrays, which of the following statements are true:
Select all that apply

  1. B[i*N+j] has better spatial locality in version 1

  2. (correct answer)

    B[i*N+j] has better temporal locality in version 1

  3. (correct answer)

    A[i*N+k] has better spatial locality in version 1

  4. A[k*N+j] has better spatial locality in version 1

Question 3: Aliasing often makes it difficult for compilers to perform which of the following optimizations?
Select all that apply

  1. loop unrolling

  2. (correct answer)

    cache blocking

  3. function inlining

  4. (correct answer)

    removing redundant memory accesses

Question 4: Which of the following optimizations will result in a program that executes less instructions to perform the same amount of work? (For the purposes of this question, count each time an instruction is run as a seperate instruction executed; for example, a loop of ten instructions run five times executes fifty instructions.)
Select all that apply

  1. (correct answer)

    loop unrolling

  2. cache blocking

  3. (correct answer)

    function inlining

  4. (accepted any answer)(correct answer)

    removing redundant memory accesses

(removing redundant accesses may save instructions, e.g. add (%rax), %rbx versus mov %rbx, (%rcx); add (%rax), %rbx, but it might not, e.g., add %rax, %rbx versus add (%rax), %rbx)

Quiz 15

Question 1: In order for multiple accumulators to significantly improve performance the performance of a loop
Select all that apply

  1. (correct answer)

    the processor must be able to perform multiple calculations in parallel

  2. (correct answer)

    more intermediate results must be stored between loop iterations

  3. the loop must have an even number of iterations

  4. the loop must have experienced many data cache misses before being optimized

Question 2: Extending the multiple accumulator optimization to a very large number of accumulators generally results in slower performance primarily because

  1. (correct answer)

    some of the accumulators end up being stored on the stack instead of registers

  2. the code with more accumulators has more branches, which are often mispredicted

  3. each accumulator is accessed less frequently, so a pipelined processor will forward their values less often

Question 3: On an out-of-order processor that can perform many additions per cycle in parallel, which of the following orders of operations is likely to be most efficient to compute the sum of a, b, c, d, e, and f:

  1. (correct answer)

    ((a+b)+(c+d))+(e+f)

  2. (((a+b)+c)+(d+e))+f

  3. ((((a+b)+c)+d)+e)+f

  4. (a+((b+c)+d))+(e+f)

Question 4: When an exception (in the sense of section 8.1) occurs while executing an application's code, code that is part of __________ is called by logic located inside ____________.

  1. (correct answer)

    the operating system / the processor

  2. that application / the operating system

  3. that application / the processor

  4. the operating system / that application

Quiz 16

Question 1: SIMD arithmetic instructions improve performance primarily by
Select all that apply

  1. (correct answer)

    performing more calculations given one instruction

  2. requiring less stalling than normal arithmetic instructions

  3. reducing the number of comparisons performed in loops

Question 2: Suppose a loop unrolled 8 times executes at the same speed as a loop unrolled 4 times on a processor. Which of the following changes to the processor is likely to make the loop unrolled 8 times execute faster than the 4 times unrolled loop instead?
Select all that apply

  1. increasing the performance of the data cache

  2. increasing the number of execution units by providing duplicate copies of execution units used by the loop

  3. (correct answer)

    increasing the throughput of execution units used by the loop by converting unpipelined execution units into pipelined execution units

  4. (correct answer)

    decreasing the latency of execution units used by the loop

Suppose an out-of-order processor has a single pipelined multiplier that takes 3 cycles to perform each multiply and an independent single pipelined adder that takes 2 cycles to perform each add. Assume that the result of a multiply or add can be used in as input ot multiply or add immediately after it is computed.

Question 3: (see above) What is the minimum number of cycles

    addq %rax, %rbx
    mulq %rax, %rcx
    mulq %rdx, %rdx
    addq %rcx, %rbx

takes to execute on this processor?

  1. 4 or less

  2. (correct answer)

    5

  3. 6

  4. 7

  5. 8

  6. 9 or more

addq %rax, %rbx, mulq %rax, %rcx start during cycle 1, finish during 2 and 3 respectively; mulq %rdx, %rdx starts during cycle 2, finishes during cycle 4; addq %rcx, %rbx starts during cycle 4 (needs result of earlier add+mul), finishes during cycle 5.

Question 4: (see above) What is the minimum number of cycles

    addq %rcx, %rax
    addq %rdx, %rbx
    mulq %rax, %rbx

takes to execute on this processor?

  1. 4 or less

  2. 5

  3. (correct answer)

    6

  4. 7

  5. 8

  6. 9 or more

first addq starts during cycle 1, finishes during 2; second starts in cycle 2, finishes in 3; mulq starts in cycle 4 finishes in 6.

Quiz 17

Question 1: A process running in user mode is prevented from directly
Select all that apply

  1. (correct answer)

    executing privileged exceptions, like those that manipulate the exception table

  2. triggering an exception

  3. having the same value for %rsp as any concurrently executing process

Question 2: The number of programs a system can run concurrently is determined primarily based on

  1. the number of processors on the system

  2. (correct answer)

    the amount of memory (or similar storage) in the system

  3. the number of entries in the exception table on the system

  4. the size of an address space on the system

Question 3: A page table entry contains
Select all that apply

  1. the contents of the corresponding physical page

  2. (correct answer)

    the physical address of the corresponding physical page

  3. the virtual address of the corresponding physical page

  4. (correct answer)

    a bit to indicate whether or not a physical page is allocated for the corresponding virtual page

Question 4: As described in our book, when virtual memory is used, _______ is/are a cache for ______.

  1. (correct answer)

    main memory / data on disk

  2. virtual pages / physical pages

  3. physical pages / virtual pages

  4. virtual pages / data on disk

Question 5: As described in our book, when virtual memory is used to implement a cache, the replacement policy of that cache is determined primarily by __________.

  1. (correct answer)

    the operating system

  2. the processor

  3. the currently running process

Quiz 18

Question 1: Suppose the operating system contains a function that creates a new process, with its own new address space, and context switches to it. This function does not work outside of kernel mode. If a program tries to run this function from user mode, what will most likely occur? (For this question, consider our textbook's terminilogy for exceptions.)

  1. an interrupt

  2. (correct answer)

    a fault

  3. an abort

  4. a trap

  5. an infinite loop

Question 2: When the operating system performs a context switch between two processes, which of the following values are likely to change?
Select all that apply

  1. (correct answer)

    the stack pointer

  2. (correct answer)

    the condition codes

  3. (correct answer)

    the value of %rax

  4. (correct answer)

    the page table base register

  5. the exception table base register

Question 3: Suppose we are running program A, which then makes a system call to read from disk. While waiting for the disk to return the read data, the operating system runs program B for 5 milliseconds, then runs program C until the disk completes the read. Finally, it returns to running program A.

How many exceptions must have occured during this entire process?

  1. 0

  2. 1

  3. 2

  4. (correct answer)

    3

  5. 4 or more

Question 4: A page table is a map of <vpn, ppn>, where vpn is a virtual page number and ppn is a physical page number. The virtual address can be split as <vpn, po>, where po is the page offset. Given this information, how would you form the physical address?

  1. <vpn, po>

  2. (correct answer)

    <ppn, po>

  3. <ppn, vpn>

  4. <ppn>

Quiz 19

Question 1: Which of the following statements are true about the translation lookaside buffer (TLB)?
Select all that apply

  1. (correct answer)

    a small hardware cache

  2. part of the virtual address space

  3. (correct answer)

    used for address translation

  4. (correct answer)

    stores page table entries

Question 2: What do we use to index the TLB?

  1. part of the physical page number

  2. part of the physical page offset

  3. (correct answer)

    part of the virtual page number

  4. part of the virtual page offset

Question 3: What is the content of the first-level page table in a multi-level page hierarchy?

  1. the translated addresses of the recently accessed virtual addresses

  2. (correct answer)

    the physical page numbers of the second-level page tables

  3. the virtual page numbers of the second-level page tables

  4. the page table base register of the first-level table

Question 4: Which of the following statements are true?

  1. multi-level page table uses less space than a single-level table when program uses 100% of the virtual address space

  2. (correct answer)

    multi-level page table uses less space than a single-level table when program uses only a small fraction of the virtual address space

  3. the space for multi-level page table does not depend on the program data use

  4. the space for multi-level page table depends on the size of the TLB

Quiz 20

Question 1: Which of the following statement is true?

  1. (correct answer)

    TLB hits make address translation faster

  2. TLB caches first-level page table entries

  3. TLB misses are caused by L1 cache hits

  4. TLB entries are L1 data blocks

Question 2: Let’s assume there is a two-level page table in your system. The virtual address is split as <vpn part1, vpn part2, page offset>. Which of the following statement is true?

  1. vpn part 1 is used to index the physical page that contains data

  2. vpn part 1 is used to get the base address of the first-level table

  3. page offset is used to index the TLB

  4. (correct answer)

    vpn part 2 is used to index the PTE in the second-level page table

Core i7 memory divies virtual page numbers into four parts because it's MMU has a four-level page table hierarchy. Consider translating a single virtual address into a physical address.

Question 3: (see above) How many page table accesses will occur during that translation if there a TLB hit?

  1. (correct answer)

    0

  2. 2

  3. 4

  4. 5

Question 4: (see above) What is the maximum number of page table accesses that can occur during that translation if there a TLB miss?

  1. 0

  2. 2

  3. (correct answer)

    4

  4. 5

Question 5: You find that one particular configuration of TLB uses no TLB index bits during a lookup; Which of the following statement is true?

  1. TLB is 2-way set-associative

  2. TLB is 4-way set-associative

  3. TLB is direct-mapped

  4. (correct answer)

    TLB is fully associative