CS3130 Spring 2023 quizzes



quiz for week 2

Answer the following question about the lecture material for this week.

Question 1 (5 points)

Suppose we have a program which is compiled from C code, but some of the C code is very repetitive. Rather than having the program's source code include a copy of the repetitive C code with the program, the program's authors want to include a Python script that generated that repetitive code, some hand-written C files, and a Makefile to build them all.

Suppose the Python script is called generate_lookup.py and when run it outputs the generated C code to its output. The Makefile writes the output of the Python script to lookup.c. The hand-written C files are called lookup.h and main.c.

Which of the following rules would be reasonable to include in the corresponding Makefile? ((tab) represents a tab character.) Select all that apply.

Suppose a single-core system is running three single-threaded processes A, B, and C.

The following events happen in the following order:

  1. Process A asks to read from the keyboard. No input is available initially.
  2. Process B does some computation.
  3. Process B sends a signal to process C.
  4. A key is pressed on the keyboard, causing process A to run.
  5. Process A accesses an invalid memory location and is terminated as a result.
  6. Process C's signal handler runs and prints a message to the screen.
Question 2 (4 points) (see above)

Which of the above steps are likely to involve making one or more system calls?

Write the numbers of the corresponding steps above; for example, if 1 and 5 involve system calls, write "15".

Answer:
Question 3 (4 points) (see above)

A non-system-call exception must occur during which of the steps above?

Write the numbers of the corresponding steps above; for example, if 1 and 5 must involve non-system-call exception, write "15".

Answer:

Consider the following C code snippet running on a Linux system:

char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);
Question 4 (3 points) (see above)

Suppose x / y results in a division by 0. This will cause an exception to occur. While the handler for that exception is running, most likely ____. Select all that apply.

Question 5 (5 points) (see above)

Suppose the program above is run so that it takes input from the keyboard and shows output on a screen. When no division by zero occurs, which of the following operations occur in kernel mode when the code above runs? Select all that apply.



quiz for week 3

Answer the following question about the lecture material for this week.

Consider the following C code snippet running on a Linux system:

char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);
Question 1 (4 points) (see above)

On x86-64 Linux, dividing by 0 results in a process being sent the SIGFPE (Floating point exception — yes, even for integer division) exception. Suppose a signal handler for that signal was registered before the code above ran and divided by 0. Which of the following would be true about how the signal handler is run? Select all that apply.

Question 2 (3 points)

Consider the following a signal handler for SIGINT (triggered by Ctrl+C) that requires Ctrl+C to be pressed twice before actually exiting:

int sigint_counter = 0;
void handle_sigint(int signum) {
    if (sigint_counter == 0) {
        const char *message = "Press Ctrl+C again to exit!\n";
        write(1, message, strlen(message));
        sigint_counter = 1;
    } else {
        _exit(1);
    }
}

(The _exit function is like exit except that it does not run functions registered with atexit() or flush pending output to stdout and other stdio streams.) The signal handler is installed using the following code:

struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sa.sa_handler = &handle_sigint;
sigaction(SIGINT, &sa, NULL);

Which of the following is true about the above signal handler when running on x86-64 Linux? Select all that apply.

Question 3 (4 points)

POSIX access control lists like described in lecture would be suitable for enforcing which of the following policies on a system shared between students and an instructor? Select all that apply.

Question 4 (3 points)

In lecture, we mentioned the much more restricted permissions systems that Unix supports more commonly than ACLs. In this system, files have an owner user ID, a group ID, and 9 permission bits (read/write/execute for that user, that group, and others). This permission bit system is much less capable than ACLs. Which of the following policies can be achieved using these permission bits? (Ignore that the special administrator user root/superuser would have more access in all of these cases.) Select all that apply.

Consider the following page table:

virtual page #valid?physical page #
00---
110x2
210x3
310x9
40---
510x5
610x1
710x6

Suppose this page table is used on a system with 4096 byte pages. On this system, virtual addresses are divided into a 3-bit page number and a 12-bit page offset and physical addresses are divided into a 4-bit page number and a 12-bit page offset.

Question 5 (4 points) (see above)

If a program reads from virtual address 0x1234, what physical address will it read from?

Write your answer as a hexadecimal number. If an exception would occur instead, write "exception".

Answer:


quiz for week 4

Answer the following questions about the lecture material for this week.

Consider a system with a virtual memory system with:

Question 1 (3 points) (see above)

A program tries to read a value from virtual address 0x12. The page table base register is set to physical byte address 0x100, so the processor looks up a page table entry from physical address ___. Write your answer as a hexadecimal number. Remember to take into account that page table entries are two bytes.

Answer:
Question 2 (3 points) (see above)

Suppose the value of the page table entry (from the previous question) is 0x4307 (when interpreted as an 2-byte integer). The program will end up accessing physical address _____. Write your answer as a hexadecimal number.

Answer:

Consider a system with a virtual memory system with:

Question 3 (4 points) (see above)

If a program can access 1000 distinct pages of memory (without exceptions occuring), then its page tables must take up at least ____ bytes. Remember to include both its first-level page tables and its second-level page tables.

Answer:
Question 4 (4 points) (see above)

When a program tries to access a value from memory, the processor:

  • reads its page table base register and gets the physical address 0x120000, then
  • reads a page table entry from physical address 0x120008 and that page table entry is valid and contains physical page number 0x123, then
  • reads a page table entry from physical address 0x123040 and that page table entry is valid and contains physical page number 0x6, then
  • reads the value from physical address 0x6010

What was the virtual address the program tried to access? Remember to take into account the page table entry size. Write your answer as a hexadecimal number. If not enough information is supplied, write "unknown" and explain in comments.

Answer:

Suppose a system uses 4096-byte pages, so addresses have 12-bit page offsets.

Assume the system running this program is as lazy as possible in setting up page table entries -- it waits until a program accesses a virtual page to setup the page table entry for that page, following the allocate/load-on-demand strategy we discussed in lecture. (Perhaps a better strategy would be somewhere in between this lazy strategy and the strategy of loading everything in advance.)

The program starts, with all its pages initially unallocated (marked invalid in its page table). It then runs a function, whose machine code is loaded at addresses 0x2040-0x2072, which writes 3 8-byte values to the stack at addresses 0xFFF8, 0xFFF0, and 0xFFE8.

Question 5 (4 points) (see above)

As a result of the above operations, how many page faults will occur? (If not enough information is given write unknown and explain in the comments.)

Answer:


quiz for week 5

Answer the following questions about the lecture material for this week.

Suppose a system has:

Question 1 (4 points) (see above)

Suppose when looking up the virtual address 0x8001 on this system, the first-level page table entry the processor looks up has the value 0x1231. What is the (physical) address of the first byte of the second-level page table this processor will use?

Answer:

Consider a network which provides the best-effort "mailbox" model gaurentees we discussed in lecture.

Suppose we try to send data from machine A to machine B that doesn't fit in a single message in parts using the following broken protocol:

Question 2 (4 points) (see above)

Suppose we send the data "ABCD" by splitting it into the four messages, "A", "B", "C", and "D". If the network does not corrupt, delay, or reorder messages, but does lose some messages, which of the following is it possible for machine B to think it was sent when it tries to reconstruct the data? Select all that apply.

Question 3 (4 points) (see above)

If the network loses every fourth message a machine sends, then how many messages will machine end up sending?

Question 4 (4 points) (see above)

In the layers we discuss, a message sent on the link layer will have a destination address for the link layer in its frame. In Ethernet, this destination address is a MAC address. This destination address will identify where the message should go for the purpose of the link layer's functionality.

Suppose a program sends a frame containing message that is going to be sent to a faraway machine. In order to get to that machine, the data will first be received by a router on the local network, then that router will send it to another router, which will send it to another router, which will send it a final router. That final router will then send the data to the final machine.

When the message is initially sent (before it is forwarded by the first router), what best describes what the link-layer destination address that accompanies it will contain?

Question 5 (4 points) (see above)

Consider the scenario of the previous question.

In addition to the destination address for the link layer, the message will also contain a destination address for the network layer. The network layer also has an destination address in its packets. For example, in IP version 4, this would be an IP version 4 address (like 128.148.63.2).

When the message is initially sent (before it is forwarded by the first router), what best describes what the network-layer destination address that accompanies it will contain?



quiz for week 6

Answer the following questions about the lecture material for this week.

Question 1 (4 points)

Consider the following code where socket_fd is the file descriptor for a UDP socket (that has already been configured, including having its default destination address set with connect):

write(socket_fd, "FIRST", strlen("FIRST"));
write(socket_fd, "SECOND", strlen("SECOND"));
write(socket_fd, "THIRD", strlen("THIRD"));

Assuming the writes above do not fail, when code running on the other end runs:

char buffer1[100] = "initial value";
char buffer2[100] = "initial value";
read(other_socket_fd, buffer1, sizeof buffer1);
read(other_socket_fd, buffer2, sizeof buffer2);

Assuming messsage data is not corrupted on the network (despite the lack of end-to-end checksums), then, which of the following are possible outcomes in buffer1 and buffer2? Select all that apply.

Suppose www.foo.com's DNS server when queried returns a record containing the IP address for www.foo.com specifying a time-to-live of 10000 seconds.

Question 2 (4 points) (see above)

We mentioned in lecture that an ISP's DNS server could obtain this record by first querying one of the "root" DNS servers. This root server would most likely ____.

Question 3 (4 points) (see above)

www.foo.com is accessed many times over the course of 5 hours by each of 1000 users. These 1000 users a split evenly across 10 different ISPs.

Each of these ISPs runs their own DNS server for their users that caches the returned records for as long as possible. About how many queries for the IP address of www.foo.com will www.foo.com's DNS server get?

Answer:
Question 4 (4 points)

Suppose A and B each prepare to perform secure communications with each other as follows:

After this preparation is complete, we would expect A to have ____. Select all that apply.

Question 5 (4 points)

Consider the following (broken) scheme for allowing users to run a command on a remote server:

The system administrator sets up MAC keys for every user of the system. These keys are distributed securely to those users when their account is setup and are used in addition to passwords.

When a user asks to login to the server remotely:

Which of the following are scenarios this protocol prevents? Select all that apply.



quiz for week 7

Answer the following questions about the lecture material for this week.

Question 1 (4 points)

In lecture, we discussed certificates used to verify website identities for TLS.

As typically used in TLS, the web server will have a certificate, which ___. Select all that apply.

In lecture, we mentioned key agreement protocols where two parties A and B would send each other key shares derived from secret values, then use the other's key share and their secret value to construct a shared secret value (known by A and B, but not anyone else).

Question 2 (4 points) (see above)

Suppose A and B use key agreement as follows:

  • first, they send each other key shares over an insecure network,
  • then, use their shared secret value to construct a key for symmetric encryption and a key for message authentication codes.

If other steps are not taken, this design would allow an attacker with sufficient access to the network to ___. Select all that apply.

Consider two strategies for eliminating duplicates from an array of values:

  1. constructing a hashtable from the list of values, discarding any values found that were already in the hashtable
  2. sorting the array of values, then removing consecutive items that have the same value
Question 3 (2 points) (see above)

Which of the two approaches would one expect to exhibit better spatial locality?

Question 4 (4 points) (see above)

In approach 1, at least when there are many duplicates to eliminate, the steps where the values are looked up in the hashtable would likely have significant ____. Select all that apply.

Question 5 (3 points) (see above)

In approach 1, a hash function that better distributes values in the hashtable and better avoids hash collisions would likely ____. Select all that apply.

Suppose a 8-set direct mapped cache with 4B blocks has the following contents:

set indextag (written in binary)validdata bytes (in hexadecimal; lowest address first)
00
1001101102 FF 41 02
2001101101 11 23 45
3011000133 55 77 9A
4000000100 01 02 03
50000011F0 80 90 01
60
70
Question 6 (4 points) (see above)

With the above cache accessing a two-byte integer from an address with tag 001101 (written in binary), set index 4, and offset 2 will result in a cache miss. As a result of the cache miss, what bytes currently stored in the cache will be replaced?



quiz for week 9

Answer the following quesitons about the lecture material for this week.

Consider the following C code:

unsigned char array[12];
...
a = array[0]; // (1)
b = array[0]; // (2)
sum += a * b;
a = array[4]; // (3)
b = array[1]; // (4)
sum += a * b;
a = array[8]; // (5)
b = array[2]; // (6)

For the following questions assume that:

  1. array starts at an address that is a multiple of 4096 (2 to the 12th power)
  2. only accesses to array use the data cache
  3. the compiler does not reorder or optimize the code to reduce the number of memory accesses for array
  4. the data cache is initially empty (all blocks invalid) when the code labeled (1) starts.
Question 1 (4 points) (see above)

With a direct-mapped 4-byte data cache with 2-byte blocks, the accesses labeled (1), (3), (4), (5) and (6) above will all be cache misses. The access labeled (1) is a compulsory miss also known as a cold miss. Which other misses are also compulsory or cold misses? Select all that apply.

Question 2 (4 points) (see above)

Consider 2-way (fully associative) 8-byte data cache with 4-byte blocks and an least-recently-used replacement policy. The array access labeled (1) above will be cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.

Question 3 (4 points) (see above)

Consider 2-way (fully associative) 8-byte cache with 4-byte blocks and first-in, first-out replacement policy The array access labeled (1) above will be cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.

(In a first-in, first-out replacement policy, the block replaced longest ago ("first-in") in a set is the next one replaced.)

Consider the following C snippet:

unsigned char array1[4096];
unsigned char array2[4096];
...
for (int i = 0; i < 2048; ++i) {
    array1[i] = array2[i + 2048] * array2[i];
}

For the following questions, assume:

  1. the snippet uses a 32768-byte 4-way set-associative cache with 64-byte blocks and an LRU replacement policy;
  2. the data cache is initially empty when the for loop starts;
  3. array1 and array2 are both located at addresses that are multiples of 32768.
Question 4 (4 points) (see above)

If the data cache has a write-through policy, during the execution of for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to read or write values in array1?

Answer:
Question 5 (4 points) (see above)

If the data cache has a write-back policy, during the execution of the for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to store values in array1?

Answer:


quiz for week 10

Consider a system with:

Question 1 (4 points) (see above)

Suppose this system runs an instruction like:

movq 0x12345, %rax

which loads 8 bytes from 0x12345 into %rax. Suppose while this instruction runs, all related TLB and cache accesses are hits. As part of running this instruction the system will ____. Select all that apply.

Question 2 (4 points) (see above)

Accessing the virtual address 0x12345 will use the same TLB set as accessing which of the following virtual addresses? Select all that apply.

Question 3 (4 points) (see above)

The amount of storage required to implement the data TLB is ____.

Question 4 (4 points)

Consider the following C code that uses the POSIX API:

printf("1"); fflush(stdout);
pid_t pid1 = fork();
pid_t pid2 = fork();
if (pid1 > 0 && pid2 > 0) {
    printf("2"); fflush(stdout);
    waitpid(pid1, NULL, 0);
    waitpid(pid2, NULL, 0);
    printf("3"); fflush(stdout);
}
printf("4"); fflush(stdout);
exit(0);

Which of the following are possible outputs of the above? Select all that apply.

Question 5 (4 points)

Consider the following C code that uses the POSIX API:

int pipe_fds[2];
pipe(pipe_fds);
int out_fd = open("foo.txt", O_WRONLY | O_CREAT);
dup2(pipe_fds[1], out_fd);
dup2(pipe_fds[0], STDIN_FILENO);

Which (if any) of the following are true after this code snippet runs, assuming pipe, open, and dup2 do not fail? Select all that apply.



quiz for week 11

Answer the following questions about the lecture material for this week.

Consider the following C code:

void *thread_func(void *arg) {
    int *p;
    p = (int*) arg;
    *p = 1;         /* <----------------------------------- LINE A */
    return (void*) (p + 2);
}

void foo() {
    int array[4] = {0, 0, 0, 0};
    pthread_t thread_ids[2];
    for (int i = 0; i < 2; i += 1) {
        int *q = &array[i];
        pthread_create(&thread_ids[i], NULL, thread_func, (void*) q);
    }
    for (int i = 0; i < 2; i += 1) {
        int *r = NULL;
        pthread_join(&thread_ids[i], (void**) &r);
        *r = 2;     /* <----------------------------------- LINE B */
    }
    printf("%d %d %d %d\n", array[0], array[1], array[2], array[3]);
}
Question 1 (3 points) (see above)

When the function foo() runs, the line labeled LINE A will most likely modify a value stored _____.

Question 2 (3 points) (see above)

When the function foo() runs, the line labeled LINE B will most likely modify values stored _____.

Question 3 (6 points)

Consider the following C code:

struct info {
    int *array;
    int start;
    int end;
};

void *thread_func(void *arg) {       
    struct info *info;
    info = (struct info*) arg;
    for (int i = info->start; i < info->end; i += 1) {
        info->array[i] *= 2;
    }
    return NULL;
}

void double_all(int *array, int size) {
    pthread_t threads[2];
    for (int i = 0; i < 2; i += 1) {
        struct info cur_info;
        cur_info.array = array;
        cur_info.start = i * (size / 2):
        cur_info.end = (i + 1) * (size / 2);
        pthread_create(&threads[i], NULL, thread_func, &cur_info);
    }
    for (int i = 0; i < 2; i += 1) {
        pthread_join(&threads[i], NULL);
    }
}

double_all is intended to multiple every element of its input array by 2, where the array is specified using a pointer to its first element and its size. The implementation tries to use two threads to take advantage of multiple cores to do this, computing a range of the array (specified by start and end) for each thread to work on. But the above code is buggy. Which of the following are bugs in the code? Select all that apply.

Consider the following C code:

struct dynamic_array {
    int *data;
    size_t size;
    pthread_mutex_t lock;
};

void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
    pthread_mutex_lock(&dest->lock);
    pthread_mutex_lock(&to_add->lock);
    int *new_data;
    size_t new_size = (dest->size + to_add->size) * sizeof(int);

    /* BEFORE REALLOC */
    new_data = realloc(dest->data, new_size);
        /* was incorrectly new_data = realloc(dest, new_size)  in quiz as released */

    if (!new_data) { abort(); /* out of memory */ }
    dest->data = new_data;
    /* AFTER REALLOC */

    memcpy(dest->data + dest->size, to_add->data, to_add->size * sizeof(int));
        /* was incorrectly dest->new_data + dest->size  in quiz as released */
    dest->size += to_add->size;
    pthread_mutex_unlock(&to_add->lock);
    pthread_mutex_unlock(&dest->lock);
}
Question 4 (4 points) (see above)

For this question, assume that all code follows the convention that dynamic_array structs are only read or modified while holding the lock in the struct. (That is, only after a locking the lock and without unlocking the lock until after the read/modification is done.)

In the append_to_array function, the lock for to_add is kept locked even while the realloc operation is run, even though that operation does not involve to_add. An idea one might have to take advantage of this is to temporarily release the lock on to_add by inserting a pthread_mutex_unlock(&to_add->lock) in place of the comment /* BEFORE REALLOC */ and inserting pthread_mutex_lock(&to_add->lock) in place of the coment /* AFTER REALLOC */. It might seem that this would allow more threads to work in parallel.

However, this change would allow for some incorrect behavior. Which of the following problems could we expect as a result of the attempt to temporarily release the to_add lock described above? Select all that apply.



quiz for week 12

Answer the following questions about the lecture material for this week.

Consider the following C code:

struct dynamic_array {
    int *data;
    size_t size;
    pthread_mutex_t lock;
};

void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
    pthread_mutex_lock(&dest->lock);
    pthread_mutex_lock(&to_add->lock);
    int *new_data;
    size_t new_size = (dest->size + to_add->size) * sizeof(int);
    new_data = realloc(dest->data, new_size);
    if (!new_data) { abort(); /* out of memory */ }
    dest->data = new_data;
    memcpy(dest->new_data + dest->size, to_add->data, to_add->size * sizeof(int));
    dest->size += to_add->size;
    pthread_mutex_unlock(&to_add->lock);
    pthread_mutex_unlock(&dest->lock);
}
Question 1 (4 points) (see above)

It is possible for two simulatenous calls to append_to_array to hang indefinitely. Which, if any, of the following are examples of sets of simulatenous calls where this could happen? (Assume A, B, C, D, etc. represent distinct dynamic_array struct pointers.) Select all that apply.

Question 2 (4 points) (see above)

Which of the following changes to the above code would prevent the deadlock? Select all that apply.

Suppose we want to implement a variant of the producer-consumer pattern where producing threads can produce two types of values (type A and type B), and consumer threads can either wait for a value of a particular type A OR they can wait for a value of both type A and type B. When the threads wait, they should wait without consuming a lot of processor time.

To implement this, we'll use a Queue implementation that provides queue_enqueue, queue_dequeue, and queue_size operations.

The following quesitons ask about the following incompete implementation. Blanks (formatted like _____________) represent omitted code:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Queue a_values;
pthread_cond_t a_available = PTHREAD_COND_INITIALIZER;
Queue b_values;
pthread_cond_t both_available = PTHREAD_COND_INITIALIZER;

void EnqueueA(Item item) {
    pthread_mutex_lock(&lock);
    queue_enqueue(&a_values, item);
    _________________________________ /* BLANK 1 */
    pthread_mutex_unlock(&lock);
}

void EnqueueB(Item item) {
    pthread_mutex_lock(&lock);
    queue_enqueue(&b_values, item);
    pthread_cond_signal(&both_available);
    pthread_mutex_unlock(&lock);
}

Item DequeueA() {
    pthread_mutex_lock(&lock);
    while (queue_size(a_values) == 0) {
        pthread_cond_wait(&a_available, &lock);
    }
    Item result = queue_dequeue(&a_values);      /* was b_values when quiz originally released */
    pthread_mutex_unlock(&lock);
}

void DequeueBoth(Item *a_item_ptr, Item *b_item_ptr) {
    pthread_mutex_lock(&lock);

    while (_______________________________) { /* BLANK 2 */

        pthread_cond_wait(_________________ /* BLANK 3 */, &lock);
    }
    *a_item_ptr = queue_dequeue(&a_values);
    *b_item_ptr = queue_dequeue(&b_values);
    pthread_mutex_unlock(&lock);
}
Question 3 (4 points) (see above)

Which of the following would be most appropriate to place in the blank labeled "BLANK 1"?

Question 4 (4 points) (see above)

Which of the following would be most appropriate to place in the blank labeled "BLANK 2"?

Suppose we have a system where three threads are running in parallel and we want a function to send a value from thread 0 to thread 1, and another value from thread 1 to thread 2, and another value from thread 2 to thread 0. To do this, each thread calls a function like:

other_threads_value = SendAndReceiveValue(thread_id, my_value);

where my_value is the current thread's value and thread_id is 0, 1, or 2, depending on which thread is making the call.

Functions can call SendAndReceiveValue multiple times; each time they call it they should receive values from the following call to SendAndReceiveValue.

Consider the following implementation of this function using semaphores:

int values[3];
sem_t value_ready[3];
sem_t value_empty[3];

int SendAndReceiveValue(int thread_id, int value) {
    int other_thread_id = (thread_id + 1) % 3;
    sem_wait(&value_empty[other_thread_id]);
    values[other_thread_id] = value;
    sem_post(&value_ready[other_thread_id]);
    sem_wait(&value_ready[thread_id]);
    int received_value = values[thread_id];
    sem_post(&value_empty[thread_id]);
    return received_value;
}
Question 5 (2 points) (see above)

What should the initial value of the semaphore value_ready[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)

Answer:
Question 6 (2 points) (see above)

What should the initial value of the semaphore value_empty[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)

Answer:


quiz for week 13

Answer the following questions about the lecture for this week.

Question 1 (4 points)

Suppose we build a pipelined processor with the five-stage design we discussed in lecture and, when run alone:

If the above processor is running an assembly program that includes the instruction addq %r8, %r9 (and there are no relevant data hazards or control hazards), then this instruction would start its writeback stage approximtely ___ ps after it starts its fetch stage.

Answer:

For this question, consider a processor uses pipelining with 6 pipeline stages and a cycle time of 600 picoseconds (.000 000 6 milliseconds).

Question 2 (4 points) (see above)

Suppose we execute a program with 100000000 (100 million) instructions on this processor and the program experiences no data or control hazards. How long in microseconds, rounded to the nearest millisecond, would the program take to execute on the pipelined processor?

Answer:
Question 3 (4 points) (see above)

Suppose we modified this processor to have 5 pipeline stages by combining its third and fourth pipeline stages. Which of the following would likely be true about the modified processor? Select all that apply.

Question 4 (4 points)

Suppose when running a benchmark program on a pipelined processor we discover that:

The pipelined processor has a 1000 ps cycle time.

On average, when running the benchmark program, the processor will complete an instruction approximately every ____ ps.

Answer:

Consider a nine-stage pipelined processor with the following pipeline stages:

This processor uses forwarding combined with (when necessary) stalling to resolve data hazards.

Question 5 (4 points) (see above)

When running the assembly:

addq %r8, %r9
xorq %r8, %r11
subq %r9, %r10

on this nine-stage processor, the subq instruction will complete ____ cycles after the addq instruction completes.

(Assume that any instructions the processor is running not included in the snippet above:

  • do not use %r8, %r9, %r10, or %r11; and
  • do not require stalling

)

Answer:


quiz for week 14

Answer the following questions about the lecture material for this week.

For the following question, consider a six-stage pipelined processor with the following pipeline stages:

For the two execute stages:

Suppose the processor is running the following assembly snippet:

    movq (%r8), %r9  
    subq $128, %r9   
    jle foo          
    addq %r9, %r10   

and the jle is mispredicted.

Question 1 (4 points) (see above)

During the execution of the above assembly snippet, the addq instruction will finish ___ cycles after the movq instruction.

Answer:

Consider the following assembly snippet:

    movq $4, %rax       // rax <- 4
start_loop:
    movq %r8, %r9       // r9 <- r8
    andq $3, %r9        // r9 <- r9 & 3
    cmpq $0, %r9        // set condition codes using r9 - 0
    jne skip            // if (r9 != 0) goto skip
    addq $1, %r8        // r8 <- r8 + 1     ***
skip:
    subq $1, %rax       // rax <- rax - 1
    cmp $0, %rax        // set condition codes using rax - 0
    jne start_loop      // if (rax != 0) goto start_loop
end_loop:

For the questions below, assume %r8 has an initial value of 4 every time the snippet above is run, in this case:

Question 2 (4 points) (see above)

If the processor running this snippet uses the forwards-not-taken, backwards-taken branch prediction strategy discussed in lecture, then the conditional jumps (jne skip and jne start_loop) will be predicted correctly ___% of the time. Round to the nearest percent.

Answer:
Question 3 (4 points) (see above)

Suppose the processor running this snippet predicts each conditonal jump as taken if it was taken the last time it was run, and predicts it as not taken if it was not taken the previous time it was run.

Assume the processor just ran the snippet and then runs the snippet a second time (that is, there is some loop surrounding the snippet). During the second execution, the conditional jumps in the snippet will be predicted correctly ____% of the time. Round to the nearest percent.

Answer:
Question 4 (4 points)

Consider the following assembly snippet:

mov (%rax), %r9
add %r9, %r10
mov (%rbx), %r9
add %r9, %r11
mov (%rcx), %r9
add %r9, %r12
xor %r10, %r11

Consider an out-of-order processor design like we discussed in lecture. Assume that can perform multiple arithmetic operations at a time and can perform multiple loads at a time. (Its data cache is capable of doing multiple reads at a time, even if they are started in the same cycle.)

On that processor, which of the following is true about how this code could be executed? Select all that apply.

Question 5 (4 points)

Consider an out-of-order processor design similar to what we discussed in lecture, with execution units supporting running several arithmetic instructions at a time.

Suppose after register renaming, the end of the instruction queue for the processor contains:

add  %x10, %x19 -> %x20
sub  %x20, %x21 -> %x22
xor  %x18, %x22 -> %x24
imul %x18, %x18 -> %x25

where %x... represents a physical register name. Assume that there are also other instructions in the instruction queue, not shown above.

What of the following are true about these instructions and how they could be run on this processor? Select all that apply.



quiz for week 15

Answer the following questions about the lecture for this week.

Question 1 (5 points)

Consider the following assembly snippet:

addq %rcx, %rax
subq %rcx, %r9
xorq %rax, %r8
movq (%r8), %r10
imulq %r10, %r9
imulq %rax, %rax

On an out-of-order processor, the xorq instruction could do its computation at the same time as ____ is doing is computation. Select all that apply.

Question 2 (4 points)

Consider the following assembly snippet:

    cmpq %rax, %rbx
    jle other
    addq %r12, %rdx
    movq (%rdx), %r8
    subq %r8, %r9
    jmp after
other:
    addq %r13, %rcx
    movq (%rcx), %r8
    addq %r8, %r9
after:

Suppose this executes on an out-of-order processor, and the jle other is predicted as taken, but is actually not taken.

Then, ____. Select all that apply.

Question 3 (4 points)

Consider the following C code:

int IsPowerOfTwo(unsigned long x) {
    while (x > 1) {
        if ((x & 1) == 1) {
            return 0;
        }
        x = x >> 1;
    }
    return 1;
}

Assume this is translated to assembly in a manner that does omit any bitwise or arithmetic operations.

Suppose IsPowerOfTwo(a) takes substantially longer to execute than IsPowerOfTwo(b) for some non-zero a and non-zero b. (Note that IsPowerOfTwo's return value is not specified.)

Based on the time difference between IsPowerOfTwo(a) and IsPowerOfTwo(b), what is most likely true about a and b? Select all that apply.

Question 4 (4 points)

Consider the following C snippet:

t = lookupTable[x];

where lookupTable is an array of 16384 chars located in memory at address 0x20CD00.

Suppose using the PRIME+PROBE strategy discused in lecture, we determine that the access to lookupTable modifies set 23 of a 65536-byte 4-way set-associative cache with 256-byte blocks.

What is a possible value of x?

To simplify this problem, assume all addresses are physical.

Answer:

Consider the following C code:

if (x < 1024) {
    char y = array1[x];
    if (y < 64) {
        foo(array2[y*2048]);
    }
}

where array1 and array2 are arrays of chars.

If an attacker can control x, they can use the Spectre vulnerability to read from memory using this code.

Suppose array1 is located at address 0x1000000, array2 is located at address 0x2000000. To simplfiy the problem, assume all addresses are physical.

Question 5 (4 points) (see above)

The attacker will need to set x to a value related to the memory location that they want to read from. If they want to get information about the int at memory address 0x4000000, they should set x to ____. Write your answer as a hexadecimal.

Answer: