
Unless we explicitly mention the compiler, assume the compiler performs few optimizations (it uses registers to store local variables, but not much else).

Question 1: The textbook discusses how compilers cannot optimize this code

void twiddle1(long *xp, long *yp) { 
    *xp += *yp;
    *xp += *yp;
}

into this code

void twiddle2(long *xp, long *yp) { 
    *xp += 2 * (*yp);
}

Which of the following changes would allow the optimization?
Select all that apply

  1. making yp a long ** instead of a long *

  2. making yp a long instead of a long *

  3. making xp a long instead of a long * and returning a long instead of void

  4. making xp a long ** instead of a long *

  5. using int instead of long throughout

  6. making both xp and yp longs instead of long *s and returning a long instead of void

  7. making both xp and yp long **s instead of long *s
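The premise can be checked directly: when xp and yp point to the same location, the two functions compute different results, which is why the compiler may not substitute one for the other. A minimal sketch (the numeric traces in the comments assume a starting value of 5):

```c
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

void twiddle2(long *xp, long *yp) {
    *xp += 2 * (*yp);
}

/* With xp == yp the two functions disagree:
 *   twiddle1(&a, &a) starting from a == 5: a -> 10 -> 20
 *   twiddle2(&a, &a) starting from a == 5: a -> 5 + 2*5 == 15
 * Because the compiler cannot rule out this aliasing from the
 * declarations alone, it cannot rewrite twiddle1 as twiddle2. */
```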

Question 2: If the runtime of a particular program given input size n is f(n), then the "cycles per element" number the textbook refers to is

  1. the slope of f

  2. the asymptotic class of f

  3. the additive constant of f

  4. a combination of the above

  5. none of the above

Question 3: Consider the code

int rfind(const char *s, char c) {
    int ans = -1;
    for(int i = 0; i < strlen(s); i += 1) {
        if (s[i] == c) ans = i;
    }
    return ans;
}

If we pull strlen(s) out of the loop, like so

    int N = strlen(s);
    for(int i = 0; i < N; i += 1) {

how will the runtime of the code change?

  1. it will be basically the same as it was before

  2. it will be faster by an additive constant (e.g., 1 microsecond faster or the like)

  3. it will be faster by a multiplicative factor (e.g., 2× faster or the like)

  4. it will be slower by an additive constant (e.g., 1 microsecond slower or the like)

  5. it will be slower by a multiplicative factor (e.g., 2× slower or the like)

  6. it will change to a faster asymptotic complexity class (changing big-O)

  7. it will change to a slower asymptotic complexity class (changing big-O)
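For concreteness, here is the hoisted variant in full (a sketch; the name rfind2 is chosen here for illustration, not taken from the text):

```c
#include <string.h>

int rfind2(const char *s, char c) {
    int ans = -1;
    int N = strlen(s);              /* length computed once, before the loop */
    for (int i = 0; i < N; i += 1) {
        if (s[i] == c) ans = i;
    }
    return ans;
}
```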

Question 4: Consider the code

typedef struct {
    int x, y, z;
} foo;
int sum(foo x) {
    return x.x + x.y + x.z;
}
int megasum(const foo *bits, const int N) {
    int ans = 0;
    for(int i = 0; i < N; i += 1) {
        ans += sum(bits[i]);
    }
    return ans;
}

If we remove calls to sum, like so

        ans += bits[i].x + bits[i].y + bits[i].z;

how will the runtime of the code change?

  1. it will be basically the same as it was before

  2. it will be faster by an additive constant (e.g., 1 microsecond faster or the like)

  3. it will be faster by a multiplicative factor (e.g., 2× faster or the like)

  4. it will be slower by an additive constant (e.g., 1 microsecond slower or the like)

  5. it will be slower by a multiplicative factor (e.g., 2× slower or the like)

  6. it will change to a faster asymptotic complexity class (changing big-O)

  7. it will change to a slower asymptotic complexity class (changing big-O)
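For concreteness, here is the call-free variant in full (a sketch; the name megasum2 is chosen here for illustration):

```c
typedef struct {
    int x, y, z;
} foo;

int megasum2(const foo *bits, const int N) {
    int ans = 0;
    for (int i = 0; i < N; i += 1) {
        /* body of sum() substituted directly, removing the call */
        ans += bits[i].x + bits[i].y + bits[i].z;
    }
    return ans;
}
```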

Question 5: Which of the following are sufficient indications to know that your code has unneeded memory references?
Select all that apply

  1. it reads every value of a large array, then reads them all again

  2. it writes every value of a large array, then writes them all again

  3. it reads the same address in every pass of a loop

  4. it writes the same address in every pass of a loop

Question 6: Consider loop 1:

for(int i = 0; i < N; i += 1) {
    baz(a[i]);
}

and loop 2 (assume N is even, so a[i+1] stays in bounds):

for(int i = 0; i < N; i += 2) {
    baz(a[i]);
    baz(a[i+1]);
}

Which of the following are true?
Select all that apply

  1. loop 1 computes fewer comparisons than loop 2

  2. loop 1 computes more comparisons than loop 2

  3. loop 1 executes fewer conditional jumps than loop 2

  4. loop 1 executes more conditional jumps than loop 2
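One way to check the counts empirically is to instrument each loop's test expression. A sketch, assuming baz is a stand-in that just consumes its argument (the function names here are illustrative):

```c
#include <stddef.h>

/* stand-in for the real baz, which the problem does not show */
static void baz(int v) { (void)v; }

/* run loop 1 and return how many times the i < N test executed */
long count_loop1(const int *a, int N) {
    long cmps = 0;
    for (int i = 0; cmps++, i < N; i += 1) {
        baz(a[i]);
    }
    return cmps;
}

/* run loop 2 (N assumed even) and count its i < N tests */
long count_loop2(const int *a, int N) {
    long cmps = 0;
    for (int i = 0; cmps++, i < N; i += 2) {
        baz(a[i]);
        baz(a[i + 1]);
    }
    return cmps;
}
```

Each conditional jump in the compiled loop corresponds to one evaluation of the test, so comparing the two counters for the same N answers the question.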

Question 7: Figure 5.12 suggests integer multiplication has latency 3, issue 1. Assuming x, y, z, and w are integers, how many more cycles will it take to run (x * y) * z * w than to run x * y * (z * w)? Answer as a normal (possibly negative) base-10 integer.

Answer:

Question 8: In general, if I unroll a loop more I expect it to perform faster, but this is not always true in practice. Suppose I find that unrolling 50× is not faster than unrolling 15×. Which of the following hardware changes would be most likely to cause the more-unrolled version to be faster?

  1. a bigger register file

  2. a bigger L1 cache

  3. a faster L1 cache

  4. using power-of-two size unrolling (e.g., 64 and 16 instead of 50 and 15)

  5. better branch prediction
