- exam format - similar to prior semesters (which are posted, so you can practice) (link "Study Materials") - mostly multiple choice or very short answer (i.e. one number/word) ~20-25 questions - what is covered - everything in lecture up to and including Thursday - everything in labs/HW up to and including HCL2 - what do I need to remember about the Y86 instructions - we do not expect you to remember constants (register #s, instruction opcodes, etc.) - we expect you to know the general layout of Y86 instructions - we expect you to know the order stack operations happen (but note at top of exam says to remind you: "%rsp points to the most recently pushed value, not to the next unused stack address." ) - S2018 Q7 ~ powers of two - we didn't go over this this semester - K ~ 2^10 / M ~ 2^20 / G ~ 2^30 / T ~ 2^40 16K = 2^4 * 2^10 --> 2^14 2^32 -> 2^2*2^30 = 2^2G = 4G - memory and endianness - memory is an array of bytes - each byte has an address - sometimes, we want to get a multi-byte value from memory decision: which byte is which part of the value? - reading a 4-byte int from address 1000 read four byte from memory byte @ 1000: 0x34 byte @ 1001: 0x01 byte @ 1002: 0x02 byte @ 1003: 0x03 - little endian: byte at smallest address (1000) --> least significant (1's place) 0x 03 02 01 34 --> 0x03020134 - big endian: byte at largest address (1003) --> least significant (1's place) 0x 34 01 02 03 --> 0x34010203 - register and endianness --- doesn't matter -- byte 0 of a register doesn't have an address - S2018 Q2 ~ Y86 -- register values after running 0x000: 63 00 | xorq %rax, %rax 0x002: 50 30 0b 00 00 00 00 00 00 [00] | mrmovq 0xB(%rax), %rbx 0x00c: [62] [30] | andq %rbx, %rax 0x00e: [00] | halt [00] [00] [00] [00] 1 XOR 1 --> 0 0 XOR 0 --> 0 xorq %rax, %rax same as irmovq $0, %rax %rax = 0 mrmovq 0xB(%rax), %rbx read memory at 0xB + value of %rax, store in %rbx at 0xB bytes 0xB: 00 <--- 1's place 0xC: 62 0xD: 30 0xE: 00 0xF: 00 .. 00 0x00 00 ...00 30 62 00 = 0x306200 - stages - stages != order things happen in the single-cycle processor - stages for single-cycle are an organizational tool - fetch: use the instruction memory find next instruction and instruction length split instruction into fields (rA, rB, etc.) - decode: read registers (and figure out which registers to read) might need to read RSP even if not mentioned in instruction - execute: use the ALU both for computation which is the point of instruction (add, ...) and for address computation (mrmovq, push, ...) - memory: use the data memory - writeback: write registers (and figure out which registers to write) might need to write RSP even if not mentioned in instruction - PC update: update the PC (handling jumps, calls, return, etc.) - single-cycle processor ~ data flow - work backwards: what does the instruction change (register/memory/PC), figure out --- where does that value have to come from? --- and control signals (e.g. register numbers) come from? computation that's not PC + X ---> from ALU from memory that's not in the instruction ---> from data memory if from memory --- where does memory get its address - remmeber what has to happen in each "stage" and figure it out from that - Q2 on the most recent quiz ~ popq and ALU POPQ's ALU result is used for what? "%rsp points to the most recently pushed value, not to the next unused stack address." what does POPQ do read the old value of RSP read from memory at _______ [is this computed with the ALU?] RSP (address of most recently pushed value) NOT computed by the ALu write to the register file the value being popped (read from memory) a new value of RSP [is this computed with the ALU?] new RSP <- old RSP + 8 <-- yes, ALU operation update the PC to point to the next instruction -------------- - arithmetic versus logical shift - only applies to >> - arithemtic: shift in copies of the sign bit (-1) ARITHMETIC >> N --> -1 1111111...11 --> 11111..111 (-1) LOGICAL >> 2 --> big value 11111...11 --> 001111..11 (-2) ARITHMETIC >> 1 --> -1 11111..1110 --> 111111..1111 (-2) LOGICAL >> 1 --> big value 11111..1110 --> 011111..1111 ^ - object files - write down the informatoin the linker needs - plus machine code and data what's loaded in memory - but linker needs to: - fix up addresses in the machine code (problem: assembler doesn't know where machine code will go) - "relocations": holes with addresses to fill in call printf ---> turns into [CALL OPCODE] [HOLE] in machine code - "symbol table entries": information about "symbols" that might be memory locations that go in the hole printf: pushq %rbx ... symbol table: printf ---> starts 10054 bytes into this object file - linker: takes information about location in object file + holes --> fills in hole with where 'printf' ends up in memory - what happens in what part of the clock cycle - between rising edges: values are read (from memory, register files) and computations happen - at a rising edge: writes happen registers are updated based on their current inputs so output after the rising edge has the new value memories, register files are updated based on their current inputs - F2017 Q2 ~ circuit tracing see picture - order of operations in POP/PUSH single-cycle processor: all writes happen together RSP update happens at the same time as {register update for POP or memory update for PUSH} BUT: POP reads from old RSP, not new RSP, so we have to pass the OLD RSP to the data memory (even though we're computing the new one)` BUT: PUSH reads from new RSP, not old RSP, so we have to pass the NEW RSP to the data memory (and not use the old one, even though we have a way to do that) textbook's implementatoin of POP reads RSP twice you don't need to do this in HCLRS (can add more MUXes than textbook uses) - bit manipulation (some example?) "mask and shift" mask = value with 1s where you want to do something (or sometimes 0s where you want to do something) do something ~ "keep certain bits" (bitwise AND) "flip certain bits" (bitwise XOR) "set certain bits to 1" (bitwise OR) "clear certain bits" (bitwise AND with 0s) shift = move bits the part of the number you want example: swap last two bits of a number ABCDEF --> ABCDFE (if ABCDEF are bits of number) (x & 1) << 1 --> last bit moved to second-to-last place ^^^^^^^ mask with 0000001 to extract last bit (x & 2) >> 1 --> second-to-last bit moved ot last place ^^^^^^ mask with 0000010 to extract second-to-last bit (x & (~3)) ^^^^^^^^^^ mask with 11111100 to keep all but last and to second-to-last bits have 0 or correct bit set exactly once for every bit, so we can OR everything together ((x & 1) << 1) | ((x & 2) >> 1) | (x & (~3)) - extracting values from i10bytes 1's place for a little endian number is in this byte vvvvv byte 0 1 2 3 4 .... ic if ra rb .... ^ \---- 1's place is the least significant part of this byte ic = i10bytes[4..8] (NOT THE 1's PLACE of the first byte) byte 9 8 7 6 4 .... 1 0 ra rb ic if - pipelining and register delays - pipelining and figuring out how long things take - idea: compute cycle time compute time needed to be done every cycle (every stage runs in parallel --- time of the worst stage) add register delay (because we need to store the result of the comtpuation) - count cycles when in doubt make a chart with values + cycle # each input needs to go through every stage to get to output second input follows first input - malloc/free - malloc(# bytes) allocates memory on the heap - free frees up whatever was malloc'd - rule: always call free EXACTLY ONCE for each call to malloc - not followed: memory leaks or something that might crash