- powers of two K/M/G/etc.
  ~ we didn't talk about this yet ~
  K = 2^10
  M = 2^20
  G = 2^30
  T = 2^40
  2^14 = 2^4 * 2^10 = 2^4 * K --> 16K
  2G = 2^1 * 2^30 = 2^31
- pipeline question on quiz --- timing
  not pipelined: took 10 ns
  pipelined: 2 ns per stage x 5 stages

  cycle#   stage:  1    2    3    4    5
  ------          ---  ---  ---  ---  ---
    1             #1   --   --   --   --
    2             #2   #1   --   --   --
    3             #3   #2   #1   --   --
    4             #4   #3   #2   #1   --
    5             #5   #4   #3   #2   #1
    6             --   #5   #4   #3   #2
    7             --   --   #5   #4   #3
    8             --   --   --   #5   #4
    9             --   --   --   --   #5

  9 cycles * 2 ns per cycle = 18 ns
  5 cycles to start everything, then 4 more cycles for the last one to finish
  (or: 5 cycles for the first one to finish, then 1 cycle for each other one to finish)
- S2018 Q8 ~ LEA
  C declaration: long **x;
    x is a pointer to pointer to long
    x is stored in %r8
  x += **x;
    suppose x = %r8 contains 0x1000 (pointer to pointer to long)
    *x is the value in memory at address 0x1000
      suppose memory at 0x1000 contains 0x2000 (pointer to long)
    **x is the value in memory at address 0x2000
      suppose memory at 0x2000 contains 42 (long, not a pointer) AKA 0x2a
    x += 42;
      "advance x by 42 of whatever it points to"
      x becomes 0x1000 + 0x2a * sizeof(what x points to)
              = 0x1000 + 0x2a * 8    [8 is sizeof(long*)]
              = 0x1150
  load *x from 0x1000 (giving 0x2000), load **x from 0x2000, then use lea to add it (scaled by 8) to x
  leaq offset(base, index, scale), output ---> output = offset + base + index * scale
    movq (%r8), %rax            # rax <- *x
    movq (%rax), %rax           # rax <- **x
    leaq (%r8, %rax, 8), %r8    # r8 <- r8 + rax * 8
  movq ((%r8)), %rax --> assembler error (no double-indirect addressing mode; dereference one level per instruction)
- conditional jumps and CCs
  ZF and SF (Y86) / OF and CF (extra ones to handle overflow)
  jle: jump if last result was <= 0
    if no overflow: ZF = 1 --> result was 0
                 or SF = 1 --> result was negative
  jg: jump if last result was > 0
    if no overflow: ZF = 0 --> result was not 0
                and SF = 0 --> result was not negative
                (together: result was positive)
  if OF is set (signed overflow), then SF is wrong (if worried about signed overflow)
    SF ^ OF --> SF "corrected for signed overflow"
  if CF is set (unsigned overflow: carry/borrow), the sign bit means nothing for unsigned values
    so unsigned comparisons (x86 jb, ja, ...) use CF (and ZF) instead of SF
  last result: set by almost any arithmetic
    includes ALL OPq instructions in Y86 (addq, andq, subq, xorq)
    not reset because
- Y86 mrmovq encoding
  question (from quiz)
  - read the encoding table
  - we won't require you to memorize rA/rB ordering exceptions (we'll give the relevant part of the table)
  - constants are always little endian
    the lowest-address byte holds the least significant byte (the one with the 1's place)
    textbook's table has the lowest address in the column labelled "0"
- register file inputs and outputs
  - 2 "read ports" --- read a register during a cycle
    each read port (A, B) has:
      4-bit source register # input (which register to read)
      64-bit register value output (value read from that register)
  - 2 "write ports" --- write a register at the rising edge of the clock
    each write port (E, M) has:
      4-bit destination register # input (which register to write)
        15 means "no register" = 0xF = REG_NONE in HCLRS
      64-bit register value input (value to write to that register)
- timing for single-cycle processor
  - reads and computation happen between rising edges of the clock
  - writes happen at the rising edge of the clock ("end of the clock cycle")
  - every instruction takes one clock cycle
  - everything written for the instruction is written all at once
- stages in general
  - in single-cycle: the stages don't tell you when things actually happen
  - they're an organizational division --- makes it easier to think about processor design
- stages for PUSH/POP
  - fetch [read instruction, split instruction into pieces, compute address of next instruction]
    PUSH/POP: read instruction, find icode
              extract rA
              compute PC + 2 (because PUSH/POP are both two bytes: icode/ifun byte + rA/rB byte)
  - decode [read registers]
    PUSH: read rA
          read RSP (so we know where to write rA. Also we need to update RSP based on its old value)
    POP: read RSP (so we know where to read new value of rA.
         Also to update RSP)
  - execute [use ALU: address AND "normal" arithmetic]
    PUSH: compute new RSP = old RSP - 8
    POP: compute new RSP = old RSP + 8
    "%rsp points to the most recently pushed value, not to the next unused stack address"
  - memory [use data memory]
    PUSH: write rA into memory at address new RSP = old RSP - 8 = ALU output
    POP: read from memory at address old RSP (= register file output != ALU output)
  - writeback [send values to register file to update]
    PUSH: RSP <- new RSP = ALU output
    POP: rA <- value read from data memory = data memory output
         RSP <- new RSP = ALU output
  - PC update
    PC register input <- PC + 2 (computed in the fetch stage)
- what do we need to memorize about stages
  - you should know what each stage does (what components/operations are "part" of it)
  - you should be able to figure out a correct way of using the processor components to do something
------------
- format of the exam
  - similar to prior semesters
  - multiple choice or very short answer (one word/number)
  - 20-25 questions
- icode versus opcode
  - I've been sloppy about this
  - sometimes we use opcode to mean the whole first byte, which has
      icode (4-bit number indicating which instruction it is, counting jXX, cmovXX and OPq each as one instruction) and
      ifun (4-bit number indicating which OPq, jXX or cmovXX variant the instruction is)
  - sometimes we use opcode to mean only the icode
- compilation -- when do you know what
  [C/C++/etc. source code] ---> [assembly] ---> [object file] ----[combined with other .o files]---> executable
  - source code ---> assembly (compiler): decided what instructions to use
  - assembly ---> object file (assembler): translated instructions to machine code, but doesn't know where anything is in memory
      also translated constants, etc.
      to bytes
  - object files ---> executable (linker): decided on the addresses of everything, and filled in any address fields in the machine code
  - to let filling in addresses work, in the object file we have
    [relocations]: "at this part of the machine code, replace this with the address of something"
      call printf ---> "replace bytes 2-10 with address of 'printf'"
    [symbol table entries]: "the name something is at this part of the machine code"
      ...
      printf:
        pushq ...
      --> "'printf' is at byte 100 of the machine code of this object file"
    each object file has its own symbol table and own relocation table
    linker combines all the object files' symbol and relocation tables together
- how do things like printf get linked in
  gcc -o file.exe main.o -- actually runs the linker with a lot of extra .o and .so files
  sometimes the linker can be told to only include .o or .so files if they're used
- dynamic and static linking
  - dynamic linking --- we do what the linker does, but at runtime instead of when producing the executable
  - main advantage: smaller executables
    ~ load, e.g., the C library from a common file at runtime (don't have N copies of the library among N executables)
- RISC versus CISC
  - RISC: simpler for the hardware maker (and who cares about the software)
    ~ simpler instructions
    ~ no accessing memory and computing something in the same instruction
    ~ ...
    ~ RISC: typically more registers
      more registers is an easy way of using extra HW to speed things up
  - CISC: do whatever's convenient for the software people (even if it makes it hard to make the hardware work)
    ~ adding instructions for common tasks, no matter how complicated the tasks ("string in string" or "memcpy" or "push/pop")
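Back to the powers-of-two note at the top: a minimal C sketch of the same arithmetic. Shifting 1 left by n gives 2^n, and splitting n into tens picks the K/M/G/T prefix. The macro names `KIB`/`MIB`/`GIB`/`TIB` and the helper `split_power_of_two` are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Prefix multipliers as powers of two (hypothetical names):
   K = 2^10, M = 2^20, G = 2^30, T = 2^40. (1 << n) computes 2^n. */
#define KIB (UINT64_C(1) << 10)
#define MIB (UINT64_C(1) << 20)
#define GIB (UINT64_C(1) << 30)
#define TIB (UINT64_C(1) << 40)

/* Write 2^n as 2^(n % 10) times the prefix for 2^(10 * (n / 10)).
   Returns the multiplier; *prefix_index gets 0 (none), 1 (K), 2 (M), 3 (G) or 4 (T).
   Example: n = 14 -> returns 16, prefix K, i.e. 16K. */
static uint64_t split_power_of_two(unsigned n, unsigned *prefix_index) {
    assert(n < 50);                 /* only prefixes up to T are handled */
    *prefix_index = n / 10;
    return UINT64_C(1) << (n % 10);
}
```

For example, `split_power_of_two(31, &p)` returns 2 with `p == 3`, i.e. 2G = 2^31.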
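The pipeline quiz answer follows a simple formula: with an ideal pipeline (no stalls), the first instruction needs `stages` cycles and each later instruction finishes one cycle after the one before it. This sketch assumes, as the cycle table suggests, 5 instructions, and reads "not pipelined took 10 ns" as 10 ns per instruction; the function names are hypothetical.

```c
#include <assert.h>

/* Ideal pipeline: (stages + n - 1) cycles for n instructions,
   i.e. `stages` cycles for the first, then 1 more cycle per remaining instruction. */
static int pipelined_time_ns(int n, int stages, int ns_per_stage) {
    assert(n >= 1 && stages >= 1);
    return (stages + n - 1) * ns_per_stage;
}

/* No pipelining: each instruction runs start to finish before the next one starts. */
static int unpipelined_time_ns(int n, int ns_per_instruction) {
    assert(n >= 1);
    return n * ns_per_instruction;
}
```

With the quiz numbers, `pipelined_time_ns(5, 5, 2)` gives the 9 cycles * 2 ns = 18 ns from the table, versus 5 * 10 ns = 50 ns without pipelining (under the per-instruction reading above).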
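The S2018 Q8 walkthrough hinges on C pointer arithmetic scaling by the size of the pointed-to type. A minimal C sketch of `x += **x;` for `long **x` is below; the helper names are hypothetical, and the caller must make x point into an array with at least `**x` more elements, because moving a pointer past its array is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

/* Same as `x += **x;`: the step is scaled by sizeof(*x) == sizeof(long *),
   which is what `leaq (%r8, %rax, 8), %r8` does on x86-64 (8-byte pointers). */
static long **advance_by_pointee(long **x) {
    assert(x != NULL && *x != NULL);
    return x + **x;
}

/* Byte distance between two pointers into the same array. */
static ptrdiff_t bytes_moved(long **before, long **after) {
    return (char *)after - (char *)before;
}
```

If `*x` points at a long holding 42, the result is `&x[42]`, which is 42 * sizeof(long *) = 336 bytes (0x150) further on a typical 64-bit system, matching 0x1000 -> 0x1150 in the notes.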
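For the conditional-jump notes, here is a sketch in C that computes x86-64-style ZF, SF, OF and CF for r = a - b (what subq/cmpq compute) and then evaluates jle, jg, and the unsigned jb from them. It models full x86-64 flags, including CF, which the notes list as an extra beyond the Y86 ones. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf, sf, of, cf; };

/* Flags after r = a - b. Arithmetic is done on uint64_t so wraparound is well defined. */
static struct flags flags_after_sub(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ur = ua - ub;
    struct flags f;
    f.zf = (ur == 0);
    f.sf = (ur >> 63) & 1;                       /* top bit of the result */
    /* signed overflow: a and b have different signs, and the result's sign differs from a's */
    f.of = (((ua ^ ub) & (ua ^ ur)) >> 63) & 1;
    f.cf = (ua < ub);                            /* unsigned borrow */
    return f;
}

/* signed a <= b: result zero, or negative once SF is corrected by OF */
static bool jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }
/* signed a > b: result not zero and not negative (after correction) */
static bool jg_taken(struct flags f)  { return !f.zf && (f.sf == f.of); }
/* unsigned a < b: only CF matters */
static bool jb_taken(struct flags f)  { return f.cf; }
```

`flags_after_sub(INT64_MIN, 1)` is the interesting case: the result wraps to a positive number (SF = 0) but OF = 1, so SF ^ OF says "negative" and jle is correctly taken.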
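For the mrmovq encoding question, this sketch in C builds the 10-byte Y86-64 encoding of `mrmovq D(rB), rA`: byte 0 is icode 5 / ifun 0, byte 1 has rA in the high nibble and rB in the low nibble, and bytes 2-9 are D in little-endian order. It assumes the textbook's register numbering (%rax = 0, %rcx = 1, %rbx = 3, %rsp = 4). The enum and function names are hypothetical; REG_NONE mirrors the HCLRS name for 0xF.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook Y86-64 register numbers (subset); 0xF = no register. */
enum { RAX = 0x0, RCX = 0x1, RDX = 0x2, RBX = 0x3, RSP = 0x4, REG_NONE = 0xF };

static void encode_mrmovq(uint8_t out[10], unsigned rA, unsigned rB, int64_t D) {
    assert(rA < 16 && rB < 16);
    uint64_t d = (uint64_t)D;
    out[0] = 0x50;                              /* icode 5, ifun 0 */
    out[1] = (uint8_t)((rA << 4) | rB);         /* rA high nibble, rB low nibble */
    for (int i = 0; i < 8; i++) {
        out[2 + i] = (uint8_t)(d >> (8 * i));   /* lowest address = least significant byte */
    }
}
```

So `mrmovq 0x12345678(%rbx), %rax` becomes `50 03 78 56 34 12 00 00 00 00`: the constant's bytes appear "backwards" because the 0x78 byte (containing the 1's place) sits at the lowest address.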
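The PUSH/POP stage breakdown and the register-file notes can be traced with a toy single-cycle model in C. Each step first computes every value from the old state (fetch, decode, execute, memory), then does all the writes, standing in for "everything is written at the rising edge". The machine (`struct cpu`, 256 bytes of memory) and all function names are hypothetical; the `valA`/`valB`/`valE`/`valM`/`valP` names follow the textbook's SEQ convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_RSP  0x4
#define REG_NONE 0xF
#define MEM_SIZE 256

struct cpu {
    uint64_t reg[15];           /* registers 0x0-0xE; 0xF means "no register" */
    uint8_t  mem[MEM_SIZE];
    uint64_t pc;
};

static uint64_t mem_read(const struct cpu *c, uint64_t addr) {
    assert(addr + 8 <= MEM_SIZE);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | c->mem[addr + i];   /* little endian */
    return v;
}

static void mem_write(struct cpu *c, uint64_t addr, uint64_t v) {
    assert(addr + 8 <= MEM_SIZE);
    for (int i = 0; i < 8; i++) c->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* A write port: destination REG_NONE means "write nothing". */
static void reg_write(struct cpu *c, unsigned dst, uint64_t v) {
    if (dst != REG_NONE) c->reg[dst] = v;
}

static void do_push(struct cpu *c, unsigned rA) {
    assert(rA < 15);
    uint64_t valP = c->pc + 2;                /* fetch: 2-byte instruction */
    uint64_t valA = c->reg[rA];               /* decode: read rA ...       */
    uint64_t valB = c->reg[REG_RSP];          /* ... and old %rsp          */
    uint64_t valE = valB - 8;                 /* execute: new %rsp         */
    /* all values are computed; now do every write for this cycle */
    mem_write(c, valE, valA);                 /* memory: store at new %rsp */
    reg_write(c, REG_RSP, valE);              /* writeback (port E)        */
    c->pc = valP;                             /* PC update                 */
}

static void do_pop(struct cpu *c, unsigned rA) {
    assert(rA < 15);
    uint64_t valP = c->pc + 2;                /* fetch */
    uint64_t valA = c->reg[REG_RSP];          /* decode: old %rsp on both read ports */
    uint64_t valB = c->reg[REG_RSP];
    uint64_t valE = valB + 8;                 /* execute: new %rsp */
    uint64_t valM = mem_read(c, valA);        /* memory: read at OLD %rsp, not the ALU output */
    reg_write(c, REG_RSP, valE);              /* writeback port E */
    reg_write(c, rA, valM);                   /* writeback port M */
    c->pc = valP;                             /* PC update */
}
```

Pushing a value and then popping it into another register should copy the value and leave %rsp where it started, with the PC advanced by 2 per instruction.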
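To make the relocation/symbol-table idea concrete, here is a toy linker step in C. Each symbol entry says "name is at byte offset N of the object's code", and each relocation says "write the address of name at byte offset N". After choosing a base address for the code that defines the symbols, the sketch patches an 8-byte little-endian address into the machine code. This is a hypothetical, heavily simplified model, not a real object-file format such as ELF, and all struct and function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct symbol { const char *name; size_t offset; };  /* name is at byte `offset` */
struct reloc  { size_t offset; const char *name; };  /* put address of `name` at byte `offset` */

/* Final address of `name`, given the address `base` chosen for the code defining it. */
static int lookup(const struct symbol *syms, size_t nsyms, uint64_t base,
                  const char *name, uint64_t *addr) {
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *addr = base + syms[i].offset;
            return 1;
        }
    }
    return 0;   /* undefined symbol: a real linker would report an error */
}

/* Fill in every relocation with the symbol's 8-byte address, little endian. */
static int apply_relocs(uint8_t *code, size_t code_len,
                        const struct reloc *relocs, size_t nrelocs,
                        const struct symbol *syms, size_t nsyms, uint64_t syms_base) {
    for (size_t i = 0; i < nrelocs; i++) {
        uint64_t addr;
        if (!lookup(syms, nsyms, syms_base, relocs[i].name, &addr)) return 0;
        assert(relocs[i].offset + 8 <= code_len);
        for (int b = 0; b < 8; b++)
            code[relocs[i].offset + b] = (uint8_t)(addr >> (8 * b));
    }
    return 1;
}
```

For instance, 9 bytes shaped like a Y86-64 `call` (0x80 followed by an 8-byte destination) with a relocation at offset 1 for "printf", where "printf" is at byte 100 of code placed at 0x400000, get the bytes `64 00 40 00 00 00 00 00` (address 0x400064) written after the 0x80.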