Changelog:

14 Sep 2020 7:05 PM: fix syntax error (use of rrmovq instead of mrmovq) in sum.ys.
14 Sep 2020 10PM: in description of benchmark data, clarify that instructions that are never executed are not counted
15 Sep 2020: rename instr-reports.tar.gz to instr-reports.tar to reflect that it’s not compressed
15 Sep 2020 7PM: in part 2, be explicit that we only use power-of-two register counts, so we don’t have to consider complex register number encoding schemes
15 Sep 2020 7:10PM correct sum.ys to use registers that actually make sense; we will accept object files based on either version.
15 Sep 2020: state what columns in the benchmark outputs since it can be tricky to see where the column labels are in the output files; also reformat description of the two types of instruction counts earlier to make it clearer
15 Sep 2020: clarify “on a value in memory versus in register” to “on a value primarily kept in memory versus primarily kept in a register”
16 Sep 2020: remove table row from part 2 that we did not ask to be copied to the answer sheet to avoid confusion.

Your task

For all of the tasks below, answer the corresponding question on the answer sheet. Remember that for the lab (but not the corresponding homework) you may collaborate with other students; if you do please, note this in the comments on the answer sheet.

Linking

(answer sheet part 1) Using the supplied object file format description
- translate these Y86 files sum-main.ys and sum.ys into “object files”, and copy-and-paste it to the answer sheet. (we originally linked a different, less useful version of sum.ys; you can submit an object file based on either version.)
You may (and we would strongly recommend) use the yas (Y86 assembler tool) distributed with HCLRS to help you. See the instructions below.

ISA tradeoffs

(answer sheet part 2) Consider a fixed-instruction-length ISA. Complete the table below describing options for instruction layout and register count:

instruction size (bits)	opcode size (bits)	maximum register count (2 operands)	maximum register count (3 operands)
8	3	4	2
16	4	64	16
16	5	~	~
16	8	~	~
32	10	~	~

(You may assume we only use power-of-two register counts.)

Record your answers on the answer sheet.

We have supplied some statistics from instructions in some benchmark programs described below, which will be used for couple of the below questions and some questions on the corresponding homework.
(answer sheet part 3) For three of the benchmarks, we provide statistics about them compiled with 8 registers available and 16 registers available. Answer the following questions about these statistics:
- (subpart a) Some of the programs have fewer total instructions executed when fewer registers are available. This is related to x86-64 allowing most instructions to compute on values in memory (unlike Y86-64, where they must be moved into a register first). Give an example of how a compiler would generate code with fewer instructions when computing on a value primarily kept in memory versus primarily kept in a register.
  
  [Hint: consider what the compiler has to do when it’s done computing on a value in a register.]
- (subpart b) Suppose we are comparing two processors, one with 8 registers with a 2.2 GHz clock rate (2.2 billion clock cycles per second) where instructions that only access registers take 1 cycle and instructions that access memory take an average of 3 cycles and another with 16 registers, a 2GHz clock rate and the same cycles per instruction.
  
  For the benchmark programs, how much slower or faster would the benchmark programs be on the 16-register procesor based on our benchmark results? Describe your calculation.

using YAS

Our textbook authors have written a Y86 assembler, which we distribute with HCLRS. To use it:

First download hclrs.tar.
Extract hclrs.tar. You can do this from the command line using tar xvf hclrs.tar, which will create an hclrs directory.
In the hclrs directory, run make.

In a text editor, create an example Y86 assembly program simple.ys:

start:
    irmovq $100, %rax
    irmovq $1, %rbx
    irmovq $3, %rcx
    irmovq $4, %rdx
loop:
    subq %rbx, %rax
    addq %rdx, %rcx
    jg loop
    halt

Run tools/yas simple.ys to produce simple.yo:

0x000:                      |         start:
0x000: 30f06400000000000000 |             irmovq $100, %rax
0x00a: 30f30100000000000000 |             irmovq $1, %rbx
0x014: 30f10300000000000000 |             irmovq $3, %rcx
0x01e: 30f20400000000000000 |             irmovq $4, %rdx
0x028:                      |         loop:
0x028: 6130                 |             subq %rbx, %rax
0x02a: 6021                 |             addq %rdx, %rcx
0x02c: 762800000000000000   |             jg loop
0x035: 00                   |             halt

This file includes the machine code for the assembly program on the right-hand side. For example, reading the line 0x00a: 30f30100000000000000 tells us that the byte with index 0xa is 0x30, the byte with index 0xb is 0xf3, with 0xc is 0x01`, and so on.

Note that this format is deliberately very similar to what our object file format expects, so you can copy-and-paste most of the instructions.
yas only handles complete assembly programs, so if you try to assemble a file which references a label from elsewhere it will fail. To deal with this, I would recommend replacing the label with one that’s actually present in the assembly file so yas will work, and then manually determining where that label’s address was placed.

benchmark statistics

We have built a tool that captures on 64-bit x86 Linux programs some statistics about the instructions in that program. We both check how many times the instruction is present in the program’s executable and its libraries and how many times the instruction is actually issued. For example, given the program:

main: movq $0, %rax start_loop: addq $1, %rax cmp $20, %rax jle start_loop ret

we would say that the cmp instruction is in the program 1 time and is issued or executed 20 times (the number of iterations the loop is executed).

Our tool identifies each instruction fits into categories like “register-to-register unconditional mov”, and outputs both counts:

the number of times those instructions appear in the program and its libraries (assuming it was executed once); and
the number of times the instruction was issued during a particular execution.

(I believe these statistics are mostly correct, but given the complexities of obtaining them and interpreting x86-64 instructions, it’s likely there are some inaccuracies.)

Our tool is based on processing a list of executed instructions, so it does not count instructions that are never executed, even if they are present in the executable/library.

You can download our tool here [last updated 2020-09-14]; it has only been tested on x86-64 Linux and requires valgrind and objdump (from GNU binutils) to be installed. (I believe both of these are present on department machines like portal.cs.virginia.edu.)

However, you do not have to run this tool to complete this lab; instead, we will supply several outputs you will use for later questions:

from a matrix multiply of size 1024x1024
from a 1024-variable linear solve
from a program counting the number of solutions to the N-queens problem for N=13
from the chess engine LazyGull analyzing a particular chess position
from GCC compiling the N-queens program with optimizations enabled

You can download these in this archive [last updated 2020-09-15 (to correct file format issue)]. The outputs each contain a table, whose columns are labeled in the second line of the output and are:

category (description of type of instruction)
issued (number of times instructions of that type executed)
% of issued (number of times instructions of that type executed divided total number of times instructions exectued times 100)
in program (number of instructions of that type)
% of in program (number of instructions of that type divided by total number of instructions times 100)

(In addition, we supply the source code for the custom programs involved.)

For the first three programs, we also supply statistics from them compiled in a way where only 16 registers (instead of 8) is available (for both general purpose and floating point/vector registers), by telling GCC that the other registers are reserved using the -ffixed-REG compiler option.

Note that some of the categories of statistics overlap; for example, we report both “instructions that write memory” and “non-mov/cmov/push instructions that write memory”.

Contents

Your task

Linking

ISA tradeoffs

using YAS

benchmark statistics