- topic list
    - my recommendation: look at lecture topics (non-exhaustive)
        - intro (powers of two, why C), compilation pipeline
        - assembly (AT&T syntax, lea, condition codes)
        - bitwise
        - C (structs, pointer arithmetic, undefined behavior)
        - ISAs (tradeoffs, RISC/CISC, machine code formats)
        - SEQ
            * components (register files, registers, memory)
            * MUXes and how they're used
            * conceptual stages
        - pipelining (throughput, latency, idea of dividing up a circuit with regs)
    - things that we don't expect you to memorize
        - any opcodes
        - HCLRS: you may have to read HCLRS describing a circuit
- structs
    - structs are like classes in C++ but
        --- you have to write "struct" before the name or use a typedef
        --- everything is public and there are no methods
- compilation pipeline
    - C (or other high-level language)
        --> assembly [what instructions, but human-readable]
        --> object files [machine code for instructions and data, but not "linked" with other files yet]
        --> executables [machine code for instructions and data, with all processing to handle combining multiple source files done]
    - object files
        machine code
        data
        but we're missing some addresses in the machine code:
            for addresses that come from other files:
                what we are looking for: "relocations"
                (e.g. "figure out where printf is, then put its address here")
            for addresses that we want to provide to other files:
                "symbol table" entries
                (e.g. the "main" function starts here)
    - what contains what
        assembly file:
            contains all the instructions (but maybe not enough to figure out what the original C code was)
            and labels (human-readable-ish names)
            while (x > 0) { ... } and while (x >= 1) { ... } --> same assembly?
        object file:
            contains all the instructions (but not human-readable --- just the raw bytes),
            and names that other object files might need in symbol + relocation tables
            contains all the data needed to run the code (constants, etc.)
        executable file:
            contains all the information needed to run the program
            labels and other names not needed, but might be preserved for debugging
- labels
    - stand in for a memory address within instructions
        movq $global_variable, %rax
      becomes
        movq $0x12345678, %rax
      if global_variable is located at 0x12345678
    - used to mark where we want a convenient name for a memory address
        'foo: ...' ---> make 'foo' a name for whatever the address of '...' is
    - usually turn into symbol table entries (where defined) and relocation entries (where used)
- RISC v CISC
    - Reduced Instruction Set --- simpler instructions
    - RISC: exposing what the hardware can do well to compilers/assembly writers/etc.
        - and spending time optimizing simple instruction execution
          e.g. more registers instead of more complex instructions
    - CISC: exposing what is convenient for assembly/compiler writers
    - in practice: almost never clear if an architecture is really CISC or RISC
        e.g. Y86:
            RISC-like:
                * small number of addressing modes (ways of specifying operands)
                * small set of instructions
                * doesn't have any loop-like operations in instructions
                * doesn't access data memory more than once in an instruction
                * doesn't require two+ ALU operations/instruction
            CISC-like:
                * push, pop, call, ret
- machine code format
    - Y86 general format:
        [opcode byte][register byte][immediate]
        opcode byte
            icode -- primary opcode
            ifun ("function code") -- secondary opcode
                distinguishes between instructions within a family of related instructions:
                    cmovl, cmovg, rrmovq, ...
                    jl, jg, jle, jmp, ...
                    add, sub, xor, ...
        register byte --- only included if the instruction uses registers
            contains rA and/or rB (depending on the instruction)
        immediate value --- only included if the instruction uses a constant
            8-byte little-endian value representing
                * a constant (irmovq) or displacement (rmmovq, mrmovq)
                * an address (jmp, call)
    - variable length because some instructions omit register byte/immediate value
    - each field is in roughly the same place in each instruction, e.g.
        wire up the same values to the register file's register number inputs for many instructions
- C to assembly with dereferencing
    - a C pointer is represented in assembly by a register/something containing an address
        int *p = &x;  // p is a pointer to a 4-byte int
        p might be stored in %rax as the address of 'x'.
    - adding/subtracting to a pointer adds/subtracts from the address, but in units of whatever the pointer points to:
        p += 4;  --> addq $(4 * sizeof(int)), %rax
    - dereferencing a pointer computes the appropriate address, then uses it
        *p = 10;  -->
            movl $10, (%rax)   // 'l' suffix: 4-byte value
        *(p + 4) = 10;  -->
            movq %rax, %r8
            addq $(4 * sizeof(int)), %r8   // add 16
            movl $10, (%r8)
          -or-
            movl $10, 16(%rax)
        *(p + x) = 10;  ---> say x is in %rcx
            movq %rax, %r8
            movq %rcx, %r9
            imulq $4, %r9
            addq %r9, %r8   // rax + rcx * 4
            movl $10, (%r8)
          -or-
            movl $10, (%rax, %rcx, 4)
- testq
    testq %rax, %rbx
  is the same as
    pushq %rbx
    andq %rax, %rbx   // to set condition codes
    popq %rbx
  or "and, but only set condition codes (don't change normal registers)"
  typical use is for testing positive/negative/zero
    testq %rax, %rax   // ZF = 1 iff %rax = 0; SF = 1 iff %rax is negative
    je somewhere       // je -- jump if ZF = 1 (subtraction yielded zero --> equal)
                       // jump if %rax was zero
    jl somewhere       // jl -- jump if SF = 1 here (on x86, jl checks SF != OF, and testq clears OF)
                       // jump if %rax was negative
- cmp A, B computes B - A, and if B - A < 0 (i.e. B < A as signed values), then jl will be taken
- conditional mov instructions cmovXX
    cmovg %rax, %rbx  --> conditional move if greater than
        "if the flags say 'greater than', then copy %rax into %rbx"
      acts like
        jle after_mov
        rrmovq %rax, %rbx
        after_mov:
- when condition codes are set
    == the value becomes what the operation result says
    - cmp, test --- special cases
    - most arithmetic --- including all the "OPq" instructions on Y86
    - not set by mov, lea, push, pop, etc.
    - not set by jumps, etc.
    - generally, we won't ask about really tricky cases b/c that's what Intel's manual is for
    - not cleared (unless you deliberately do arithmetic/etc.
      to do that)
- register updating and clock cycles (single-cycle)
    - before rising edge, inputs to each register have a value
    - after rising edge, outputs reflect the input just before rising edge
        "copy input to output on rising edge"
      then wait for new input
- order in single-cycle / stages and timing in single-cycle
    - timing in single-cycle:
        - writes (registers, memories) happen at the rising edge of the clock
            single synchronization point
            causes us to move on to next instruction b/c the PC is a register
        - everything else happens as values are available based on dependencies
            can't read correct register until instruction memory outputs what the register number is
    - stages in single-cycle: fetch / decode / execute / memory / writeback / PC update
        - sometimes correspond with timing, but not as a rule
        - a way of organizing the processor
    - stages in pipelined: fetch (+PC update) / decode / execute / memory / writeback
        compute PC early so we can fetch the next instruction immediately after
- delays in physical hardware
    - registers take some amount of time
        write is not actually instantaneous, so there is a "register delay" for it to happen
        (turns out the register delay is before+after rising edge)
    - other components need time
        e.g. adder can't instantaneously compute the sum of two numbers
        need to input the correct values for a while to get the result
----
- call/ret x86/Y86
    call foo --->
        pushq $after_call         // "the return address"
                                  // decrement the stack pointer
        jmp foo
        after_call:
    ret --->
        popq temporary-location   // "the return address"
                                  // increment the stack pointer
        jmp *temporary-location   // go back there
- ISA versus microarchitecture
    - ISA: interface from the point of view of assembly + assembler/linker writers
        in order to know what programs will do
        informed by possible implementations, but doesn't assume any particular one
    - microarchitecture: a particular implementation strategy
        usually ISA "suggests" some parts of the microarchitecture
        but still tons of flexibility: e.g.
            single-cycle versus pipelined
            e.g. how long do instructions take in cycles?
                maybe don't do single-cycle, so you can make pop take more time
                in order to have only one write port on the register file
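- sketch: checking the pointer-arithmetic scaling from "C to assembly with dereferencing"
    not from lecture; the helper names int_ptr_byte_offset and store_at are made up for illustration.
    the idea is just to confirm in C that p + n moves the address by n * sizeof(int) bytes,
    and that *(p + x) = value stores into element x, assuming the usual 4-byte int on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the notes): number of bytes between
 * p and p + n when p is an int pointer.  Mirrors
 *     addq $(n * sizeof(int)), %rax
 * n must be at most 16 so p + n stays within (or one past) the array. */
size_t int_ptr_byte_offset(size_t n) {
    static int scratch[16];
    assert(n <= 16);
    int *p = scratch;
    int *q = p + n;                          /* arithmetic in units of int */
    return (size_t)((char *)q - (char *)p);  /* distance measured in bytes */
}

/* Hypothetical helper: *(p + x) = value.  With p in %rax and x in %rcx,
 * a compiler could emit a single scaled-index store such as
 *     movl $10, (%rax,%rcx,4)
 * for value == 10.  The caller must ensure p + x is in bounds. */
void store_at(int *p, size_t x, int value) {
    *(p + x) = value;
}
```

    with a 4-byte int, int_ptr_byte_offset(4) is 16, matching the "add 16" comment in the notes;
    store_at(a, 4, 10) changes only a[4].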