Changelog:

Your task

  1. For all of the tasks below, answer the corresponding question on the answer sheet. Remember that for the lab (but not the corresponding homework) you may collaborate with other students; if you do please, note this in the comments on the answer sheet.

Linking

  1. (answer sheet part 1) Using the supplied object file format description

    • translate these Y86 files sum-main.ys and sum.ys into “object files”, and copy-and-paste it to the answer sheet. (we originally linked a different, less useful version of sum.ys; you can submit an object file based on either version.)

    You may (and we would strongly recommend) use the yas (Y86 assembler tool) distributed with HCLRS to help you. See the instructions below.

ISA tradeoffs

  1. (answer sheet part 2) Consider a fixed-instruction-length ISA. Complete the table below describing options for instruction layout and register count:

    instruction size (bits) opcode size (bits) maximum register count (2 operands) maximum register count (3 operands)
    8 3 4 2
    16 4 64 16
    16 5 ~ ~
    16 8 ~ ~
    32 10 ~ ~

    (You may assume we only use power-of-two register counts.)

    Record your answers on the answer sheet.

  2. We have supplied some statistics from instructions in some benchmark programs described below, which will be used for couple of the below questions and some questions on the corresponding homework.

  3. (answer sheet part 3) For three of the benchmarks, we provide statistics about them compiled with 8 registers available and 16 registers available. Answer the following questions about these statistics:

    • (subpart a) Some of the programs have fewer total instructions executed when fewer registers are available. This is related to x86-64 allowing most instructions to compute on values in memory (unlike Y86-64, where they must be moved into a register first). Give an example of how a compiler would generate code with fewer instructions when computing on a value primarily kept in memory versus primarily kept in a register.

      [Hint: consider what the compiler has to do when it’s done computing on a value in a register.]

    • (subpart b) Suppose we are comparing two processors, one with 8 registers with a 2.2 GHz clock rate (2.2 billion clock cycles per second) where instructions that only access registers take 1 cycle and instructions that access memory take an average of 3 cycles and another with 16 registers, a 2GHz clock rate and the same cycles per instruction.

      For the benchmark programs, how much slower or faster would the benchmark programs be on the 16-register procesor based on our benchmark results? Describe your calculation.

using YAS

Our textbook authors have written a Y86 assembler, which we distribute with HCLRS. To use it:

  1. First download hclrs.tar.

  2. Extract hclrs.tar. You can do this from the command line using tar xvf hclrs.tar, which will create an hclrs directory.

  3. In the hclrs directory, run make.

  4. In a text editor, create an example Y86 assembly program simple.ys:

    start:
        irmovq $100, %rax
        irmovq $1, %rbx
        irmovq $3, %rcx
        irmovq $4, %rdx
    loop:
        subq %rbx, %rax
        addq %rdx, %rcx
        jg loop
        halt
    
  5. Run tools/yas simple.ys to produce simple.yo:

    0x000:                      |         start:
    0x000: 30f06400000000000000 |             irmovq $100, %rax
    0x00a: 30f30100000000000000 |             irmovq $1, %rbx
    0x014: 30f10300000000000000 |             irmovq $3, %rcx
    0x01e: 30f20400000000000000 |             irmovq $4, %rdx
    0x028:                      |         loop:
    0x028: 6130                 |             subq %rbx, %rax
    0x02a: 6021                 |             addq %rdx, %rcx
    0x02c: 762800000000000000   |             jg loop
    0x035: 00                   |             halt
    
  6. This file includes the machine code for the assembly program on the right-hand side. For example, reading the line 0x00a: 30f30100000000000000 tells us that the byte with index 0xa is 0x30, the byte with index 0xb is 0xf3, with 0xc is 0x01`, and so on.

    Note that this format is deliberately very similar to what our object file format expects, so you can copy-and-paste most of the instructions.

  7. yas only handles complete assembly programs, so if you try to assemble a file which references a label from elsewhere it will fail. To deal with this, I would recommend replacing the label with one that’s actually present in the assembly file so yas will work, and then manually determining where that label’s address was placed.

benchmark statistics

We have built a tool that captures on 64-bit x86 Linux programs some statistics about the instructions in that program. We both check how many times the instruction is present in the program’s executable and its libraries and how many times the instruction is actually issued. For example, given the program:

main: movq $0, %rax start_loop: addq $1, %rax cmp $20, %rax jle start_loop ret

we would say that the cmp instruction is in the program 1 time and is issued or executed 20 times (the number of iterations the loop is executed).

Our tool identifies each instruction fits into categories like “register-to-register unconditional mov”, and outputs both counts:

(I believe these statistics are mostly correct, but given the complexities of obtaining them and interpreting x86-64 instructions, it’s likely there are some inaccuracies.)

Our tool is based on processing a list of executed instructions, so it does not count instructions that are never executed, even if they are present in the executable/library.

You can download our tool here [last updated 2020-09-14]; it has only been tested on x86-64 Linux and requires valgrind and objdump (from GNU binutils) to be installed. (I believe both of these are present on department machines like portal.cs.virginia.edu.)

However, you do not have to run this tool to complete this lab; instead, we will supply several outputs you will use for later questions:

You can download these in this archive [last updated 2020-09-15 (to correct file format issue)]. The outputs each contain a table, whose columns are labeled in the second line of the output and are:

(In addition, we supply the source code for the custom programs involved.)

For the first three programs, we also supply statistics from them compiled in a way where only 16 registers (instead of 8) is available (for both general purpose and floating point/vector registers), by telling GCC that the other registers are reserved using the -ffixed-REG compiler option.

Note that some of the categories of statistics overlap; for example, we report both “instructions that write memory” and “non-mov/cmov/push instructions that write memory”.