- compilation pipeline file contents
    - C/high-level language --->
    - assembly [lists each instruction, in a human-readable-ish way]
        with placeholders for things from other files
        and names for things other files might want
    - object file --> machine code:
        actual bytes for the instructions that the HW wants
        actual bytes for data [constants, etc.]
        but with placeholders for addresses that we don't know yet
            found in other files: "relocation entries"
            don't know where this file will be in memory
        indications of addresses other files might want: symbol table
            e.g. main is here
            e.g. printf starts here
    - executables
        bytes ready to be loaded into memory to run the program
        linker combines relocations (addresses to fill in) and
            symbol table entries (where is something found in a file)
            to produce executable ready to run
        don't need names from symbol tables/relocation entries anymore
            but we might have them around for debugging

- Spring 2018 Q9 encoding of pushq
    pushq in Y86: 0xA0 [rA][F]
    why don't we instead do 0xA[rA]
    options from Q about what this does:
    * harder to compute instruction lengths?
        --- no b/c this is exactly what we do for nop/ret versus other instructions
    * additional MUXes or MUX inputs/other logic between instruction memory and register file?
        --- yes
        inputs to the register file: four register numbers (srcA, srcB, dstE, dstM)
            usually come from instr memory
        when running push: srcA = rA, srcB = rsp; dstE = rsp
            rA is normally part of the second byte of the instruction
        now we need to get rA from somewhere else (first byte instead of second)
            --> change the MUX input?
            can't just change it because add, etc. still need the second byte
    * additional kind of ALU operation?
        --- no, ALU just subtracts from RSP
    * register file with more inputs?
        --- no b/c we can use a MUX instead

- Fall 2018 Q5, 6 about machine code simulation
    0x000: 30 f4 30 00 00 00 00 00 00 00 | irmovq $0x30, %rsp
        RSP = 0x30
    0x00a: 30 f0 25 00 00 00 00 00 00 00 | irmovq $0x25, %rax
        RAX = 0x25
        the 8 bytes at 0x00b through 0x012 (f0 25 00 00 00 00 00 00) are what popq loads below
            8 bytes interpreted based on little-endian
            0xf0 is the least significant byte
            0x25 is the next least significant
            every other byte is 0x00
            0x25f0 -- final value of RAX
    0x014: 61 04 | subq %rax, %rsp
        RSP = RSP - RAX
        RSP = 0x30 - 0x25 = 11 (0x0B)
    0x016: b0 0f | popq %rax
        RAX <- M[RSP] = M[0x0B] --- loading 8 bytes
            = 0x25f0 (8 bytes, little endian)
        RSP <- RSP + 8 = 0x0B + 8 = 19 (0x13)
    0x018: 00 | halt

- Spring 2018 Q3 about condition codes
    0x000: 63 00 | xorq %rax, %rax <-- sets CC
        RAX <- RAX xor RAX = 0
    0x002: 50 30 0b 00 00 00 00 00 00 00 | mrmovq 0xB(%rax), %rbx
        RBX <- M[0xB + RAX] = M[0xB] = 0x306200 = 0x0000000000306200
            (8 bytes starting at 0x00b, little endian: 00 62 30 00, then zeros)
    0x00c: 62 30 | andq %rbx, %rax <-- sets CC
        RAX <- RBX bitwise-and RAX = 0
        ZF = 1 iff the result is 0 --> ZF = 1
        SF = 1 iff the result is negative --> SF = 0
    0x00e: 00 | halt

- difference between testq and cmpq
    testq %rax, %rbx
        similar to
            pushq %rbx
            andq %rax, %rbx
                computes rax AND rbx = result
                ZF = 1 iff the result (rax AND rbx) is 0
                ...
            popq %rbx
        or "andq %rax, %rbx but only set condition codes (don't touch %rbx)"
    cmpq %rax, %rbx
        similar to
            pushq %rbx
            subq %rax, %rbx
            popq %rbx
        or "subq %rax, %rbx but only set condition codes (don't touch %rbx)"
    common uses of test --- figure out if value is zero or negative
        testq %rax, %rax
            computes rax AND rax = rax
            ZF = 1 iff the result (rax) is 0
            SF = 1 iff the result (rax) is negative
        je only_if_rax_is_zero
            jump if equal (based on subtraction)
            jump iff ZF = 1
        jl only_if_rax_is_negative
            jump if less than (based on subtraction)
            jump iff SF = 1
    cmpq %rax, %rbx
        rbx - rax == 0 --> je
        rbx - rax < 0 --> jl

- when in clock cycle do things happen
    - some operations are triggered by our clock signal (synchronizing signal)
        - writes to registers (incl cond codes), register files, memories
          on rising edge of clock
    - other operations happen as inputs are available
        - register file can't read the correct register until it gets that as input
        - it can't get it as input until instruction memory reads the instruction
        - physical dependencies determine order
        - components take a certain amount of time once they get inputs to produce stable outputs
    [note: not determined by how we divide things into stages in the single-cycle CPU, but there might be a correlation]

- pipelining throughput/latency
    - throughput: instructions per amount of time
    - latency: time from when instruction starts until when it finishes

    [------ 1000 ps of computation ------]                single-cycle
    [---- 600 ps -------][register][---- 400 ps ----]     two-stage pipeline

    latency: 600 ps + however long the register takes + 400 ps
    throughput:
        every 600 ps + however long the register takes
            we can insert a new instruction into the first part
        every 600 ps + however long the register takes,
            we'll get a new result from the second part
        1 instruction / (600 ps + however long the register takes) --> throughput
-----
- little endian versus big endian on exams
    when I write a number in English, I don't write 001 to mean one hundred
    when we write a
        number on an exam, we don't write 001 to mean one hundred
        or 0x001 to mean 1 * 16^3
    endianness matters when we divide numbers into parts explicitly
    when we just write it out in English, we rely on the English convention
        that the most significant part is leftmost
        so doesn't matter if it's just one number written down
    but if we write a sequence of numbers to mean one number,
        you have to ask "how is the sequence ordered?"
        this is where endianness matters

- F2018 question 9 about adding instr to SEQ
    adding pop2q rA, rB to single-cycle while still being single-cycle
        rA <- 64 bits from memory at RSP
        rB <- next 64 bits from memory (at RSP + 8)
        RSP <- RSP + 16
    [*] modifying the data memory to read 128 bits --> yes
    [-] modifying the register file to read three registers at a time --> no, only need RSP
    [*] modifying the register file to write three registers at a time --> yes, RSP, rA, and rB
    [-] increasing the size of each register in the register file to 128 bits --> no

- F2017 question 2 circuits and combinatorial logic and clock sigs
    [drawing]

- arrays and strings in C
    arrays are kinda like pointers, but:
        you can't assign to them (unlike pointers)
        and their sizeof() is different
            (the size of the actual data in the array, not the address of it)
    pointer arithmetic
        add X to a pointer --> add X * sizeof(what the pointer points to) to the address
        means that array[X] --> *(array + X)
            take the address of the beginning of the array
            add X times the sizeof each array element to that
            then access memory there
    strings in C
        string constants are character arrays with a 0 entry at the end
        printf("Hello") roughly the same as
            const char unnamed_constant[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
            ...
            printf(unnamed_constant); /* same as printf(&(unnamed_constant[0])); */
        sizeof(unnamed_constant) == 6 [bytes]
            based on how it is declared
        const char *pointer = unnamed_constant;
        sizeof(pointer) == 8 [bytes] --- b/c that's an address
            based on how pointer is declared
    int array[4] = {1,2,3,4};
        sizeof(array) == sizeof(int) * 4 (16 on our machines)
        array[4] = 5; // <-- undefined behavior
    char not_terminated[5] = {'H', 'e', 'l', 'l', 'o'};
        printf(not_terminated); // <-- undefined behavior
            // because printf is accessing not_terminated[5] which is out of bounds
----
- undefined behavior and the question on quiz 2
    ~ definition is "compiler can do whatever it wants"
    ~ compiler producing different result for computation every time
        this is likely for some types of undefined behavior
        example: accessing an uninitialized value
            "easiest" thing for compiler to do --- use whatever's in that variable's memory/register location
            could be different each time (new addresses each time your program is run)
                e.g. your stack might be at a different place each time
        example: out-of-bounds of an array
            "easiest" thing for compiler to do --- access whatever's in memory after the array
            could be different each time (new addresses each time your program is run)
---
- Fall 2018 Q4 -- calculation by ALU in textbook's design
    [-] jmp --- no, we just change the PC, don't care what the ALU does
    [*] pushq --- need to compute the new stack pointer
    [*] mrmovq --- need to compute the address to pass to the data memory
    [*] call --- need to compute the new stack pointer
----
- computed jump in Y86
    not really a thing, but...
        pushq %rax
        ret
    computed jump to %rax (but changes the stack)