HW4 - Binary Squaring

Update 10/6: Clarified the wording of function calls and calling conventions.

This is the second of two assignments in which you will write binary code for the simple machine you created a simulator for in lab. In this assignment, we will be building upon our multiplication code from hw3 and extending our ISA to provide better support for function calls.

Adding a Stack

Our initial machine did not directly support a stack. While we could have managed a stack in our ISA using icodes 1, 2, and 3, we would like to provide the functionality at the hardware level. To do that, we decided to introduce a new register: the stack pointer rsp. This register will contain the memory address of the top of our stack.

Aside: In our machine’s version of the stack, we will only store 8-bit values. However, in practice we will want to store data of different sizes in our stack. We will see more of this as we move to x86-64.

The (Updated) Instructions

Remember that when our reserved bit (the 7th bit) is 0, we used the following icodes:

icode Behaviors

0

rA = rB

1

rA += rB

2

rA &= rB

3

rA = read from memory at address rB

4

write rA to memory at address rB

5

do different things for different values of b:

`b`	action
0	`rA = ~rA`
1	`rA = -rA`
2	`rA = !rA`
3	`rA = pc`

6

do different things for different values of b:

`b`	action
0	`rA =` read from memory at `pc + 1`
1	`rA +=` read from memory at `pc + 1`
2	`rA &=` read from memory at `pc + 1`
3	`rA =` read from memory at the address stored at `pc + 1`

In all 4 cases, increase pc by 2, not 1, at the end of this
instruction

7

Compare rA (as an 8-bit 2’s-complement number) to 0;

if rA <= 0, set pc = rB
otherwise, increment pc like normal.

Now, we will add one more icode when the reserved bit is set to 1:

0

do different things for different values of b:

`b`	action
0	Decrement `rsp` and push the contents of `rA` to the stack
1	Pop the top value from the stack into `rA` and increment `rsp`
2	Push `pc+2` onto the stack, set `pc = M[pc+1]`
3	`pc =` pop the top value from the stack

If b is not 2, update the pc as normal.

The first 2 actions (b = 0, 1) allow us to interact with the stack. When we push a value onto the stack, we decrement rsp to make space, then write the value of register A to memory at the address stored in rsp. When a value is popped off the stack, we read the value from memory at position rsp into register A and then increment rsp. Note, this does still leave the old value in memory, and that’s okay. Also note that our stack grows downward in memory.

The next two actions (b = 2, 3) provide additional support for function calls.

The first (b=2) is the function call. It adds 2 to the current pc and writes that value to the stack (also updating rsp). Consider: why might we write pc+2 to the stack instead of pc? It then performs an unconditional jump to the address given as the immediate value following the instruction (i.e., the next byte of our program). This is essentially calling the function.
The second (b=3) is the function return. It reads the top value from the stack (also updating rsp) and sets that value as the new pc. This is an unconditional jump to the address stored at the top of the stack. Use this at the end of the function to return back to the code that called it.

Aside: One thing to note here is that we are assuming that the top value of the stack, when executing instruction 83, is the address that called the current function. If the called function pushes any values onto the stack, it must pop them before returning, or else we may return to an unexpected position in memory.

We will see more of how the stack works, including the handling of return addresses, parameters, and return values, as we discuss x86-64 assembly in greater detail.

Function Call Syntax

We need more than just the address of our function code and the address to return when the function is complete. We also need parameters. Consider the following function call (in python-like syntax):

z = mult(x,y)

We need to know:
- The location of our function mult, i.e., the address in memory where that code starts
- The location of where x and y are stored
- A place to put the result, z

Our ISA will define the following calling conventions for functions. Calling conventions determine where the parameters and return values are located and also include any requirements on the registers that the function call is not using.

Instructions may have up to 2 arguments. They will be placed in registers 2 and 3; with the first parameter in register 2 and the second in register 3.
The return value for the instruction is stored in register 0.
Registers 0, 2, and 3 are considered volatile; that means that if the code calling the function wants to keep those values, it should save them to memory or push them onto the stack before calling the function (using instruction 82).
Register 1 is considered nonvolatile; that means that if the function wants to use register 1, it must save the value to memory (i.e., push onto the stack) and restore it (i.e., pop from the stack) before returning to the caller (using instruction 83).

Updating our example above with these conventions, our function call would look something like the following (in python syntax):

r0 = mult(r2, r3)

Or more specifically:

r2 = x
r3 = y
jump to (address of mult() code)
z = r0

Therefore, when we want to make a function call, we will place the parameters in registers 2 and 3 before making the call with instruction 82. We cannot assume that the values in registers 2 and 3 will be the same when the function has completed and returns. When the function completes, it will jump back (using instruction 83) to the next instruction in our code to continue execution. Immediately after the function returns, we can safely assume that the function’s return value has been stored in register 0 (overwriting the value previously stored in that register before the function call).

What about r1? What does nonvolatile mean? This means that if we were using r1 before we jumped to our mult() code, then it should still be the same afterwards. So, for example, if we wanted to pre-load our destination address 0xA0:

r1 = 0xA0
r2 = x
r3 = y
jump to (address of mult() code)
z = r0
// r1 should still have A0

It appears that the function’s code is not allowed to use register 1. But it can! It must simply stash the value away somewhere for safe-keeping, then bring it back out before the function returns. We also now have a place to store values like this: the stack. So our function can do something like:

push r1
// all the code to multiply, r0 = r2 * r3,
// using r1 as needed
pop r1 // put r1's value back
return // use reserved=1, icode=0, b=3 to jump back

So remember: register 1 will have its same value after the function returns, but registers 0, 2, and 3 may have changed.

Your tasks

This homework is divided into two separate tasks:

Update the simulator from Lab 4 to provide the stack and new instructions
Use the new instructions to write a binary program that squares the input value

Updating Our Simulators

We must first update our simulators from Lab 4 to support the stack and this new instruction.

Add the rsp to your code. In our simulators, we will initialize the stack pointer to a location late in memory (0xFF) and create a stack that grows downward in memory. There are trade offs for this choice (if our stack gets too big, it might overwrite our program!).
1. If you are using Java, update the following section of SimBase.java:
```
 // memory and registers
 public byte[] M;
 public byte[] R;
 public byte rsp = (byte) 0xFF; // new!
```
2. If you are using Python, update the following section of sim_base.py:
```
 # initialize memory and registers
 R = [0 for i in range(4)]
 M = [0 for i in range(256)]
 rsp = 0xFF # new!
```
Add the new instruction defined above (when the reserved bit is 1 and the icode is 0) to your execute() method. Note: it is up to the binary that our simulators execute to properly read the function parameters from memory and write return values to register 0. Your simulator should correctly update rsp and provide the jumps.
If the reserved bit is 1 and the icode is not 0, then set the next pc to the current pc instead of advancing it and do nothing else.

Square

Next, you will write a binary program square.binary using this new icode and associated operations.

Modify your multiplication code from HW3 to be used as a function. You will include the updated version of this code at the end of your program below.
- Instead of reading immediate values at bytes 01 and 03, it should start with the two values already stored in registers 2 and 3 (respectively).
- You should save the contents of register 1, since it should be unchanged when your function returns. Just before returning, save the value back to register 1.
- Instead of storing the result at memory location A0, return the result to the caller. That is, store the result in register 0 and use the return instruction (83).
Write a new binary program that computes the square of a number (i.e., \(x^2 = x \times x\)) and stores the product at address 0xA0. Your new code should:
1. Load the value in memory at address 0x01 into a register
2. Call your original multiply code to compute \(x \times x\) (hint: you will need the address where your mult() code starts)
3. Store the result (the squared \(x\)) at address 0xA0
4. Halt once it is done.
Include your updated multiplication code (from step 1 above) at the end of your new instructions. Note: you will need to know the starting address of your multiplication code to call it.

You should ignore overflow, so since 0x79^2 = 0x3931, the answer should be 31. This is likely to happen automatically without your explicit planning for it. You may assume that the input value will not be negative, but it may be 0.

Thus, if square.binary begins __ 0A then when it is finished it should have 64 in address 0xA0; We should be able to change the second byte of your program to compute other squares too.

Testing Your Square Binary

To test your code, do one of

python3 sim_base.py square.binary

java SimBase square.binary

or going to our updated online simulator and click the file upload button at the top of the page to load your square.binary into the simulator’s memory.

Hints, tips, and suggestions

How to write binary

We suggest following these steps, carefully, saving the result of each in a file so you can go back and fix them if they were wrong:

Write pseudocode that does the desired task
Convert any for loops to while loops with explicit counters
Change any if or while guards to the form something <= 0
- a <= b becomes a-b <= 0
- a < b becomes a+1 <= b becomes a+1-b <= 0
- a >= b becomes 0 >= b-a becomes b-a <= 0
- a > b becomes 0 > b-a becomes b+1-a <= 0
- a == b becomes a-b == 0 becomes !(a-b) == 1 becomes !!(a-b) <= 0
- a != b becomes a-b != 0 becomes !(a-b) == 0 becomes !(a-b) <= 0
Add more variables to split multi-operation lines into a series of single-operation lines
Add more operations to convert ones not in the instruction set into ones in the instruction set
Change each loop into a pair of instructions, opening with “spot1 = pc” and closing with “if …, goto spot1”
Count the number of variables needed
- If¹ it is ≤ 4, skip to step 10
- else², continue with next step
Pick a memory address for each variable. Make these big enough your code is unlikely to get that big; for example, you might pick 0x80 though 0x80 + number of variables
Convert each statement that uses variables into
1. register ← load variable’s memory
2. original statement
3. store variable’s memory ← register
translate each instruction into numeric (icode, a, b) triples, possibly followed by a M[pc+1] immediate value
turn (icode, a, b) into hex
Write all the hex into square.binary

Debugging binary is hard. That’s part of why we don’t generally write code in binary. If you get stuck, you should probably try pulling just the part you are stuck on separate from the rest and test it until it works, then put it back in the main solution.

Submit

Submit both your square.binary and SimBase.java or sim_base.py files via Gradescope. Submissions will open on Thursday.

depending on how you write your original code, this is possible for this task … ↩
… some solutions are in this case instead. ↩