- forbidden topics
    - we don't want you to memorize:
        HCL signal names
        the Y86 encoding figure
            but you should know what fields instructions have
    - we won't have you write HCL code

- Y86 stages
    - this is a way our textbook organizes the CPU
    - right now: they don't mean anything physically
    - fetch (should be called "fetch and decode"):
        read instruction memory
        compute instruction length (valP -- next instruction addr)
        split instruction
    - decode (should be called "register read")
        read the register file
    - execute
        perform an ALU operation
    - memory
        perform a data memory operation
        ^^^^^^^ --- set up the inputs so the read happens or the write will happen
    - writeback
        write the register file
        ^^^^^ --- set the inputs so the write will happen
    - PC update
        write the PC
        ^^^^^ --- set up the PC register input

- push/pop/call/ret in SEQ stages
    - fetch:
    - decode:
        read RSP and maybe another register
                     ^^^^^ push
    - execute
        compute the new RSP
    - memory --- set up inputs to the data memory
        read the stack at the OLD RSP
                              ^^^^^^^ address NOT from ALU
        write the stack at the NEW RSP
                               ^^^^^^^ address is from ALU
            if call: value written is PC + 9 (the return address)
    - writeback --- set up inputs to the register file
        write RSP and maybe another register
                      ^^^^^ pop
        popq %rax --> writes %rax and %rsp
    - PC update
        if ret: PC = data memory output
        if call: PC = immediate from instruction
        otherwise: normal

- when does the clock matter
    - writes to registers (any kind) and to memory happen at the rising edge of the clock
        "end of the cycle"
    - everything else happens as inputs become available

- Y86 instructions and register file, memory inputs
    - reading:
        - figure out all the registers we need to read
            - NOTE: not necessarily all the register numbers in the instruction
                - e.g.: %rsp for call/ret/push/pop
                - e.g.: irmovq doesn't need to read any register
        - make sure all the corresponding register numbers are input to the register file
            (HCL: reg_srcA, reg_srcB)
        - corresponding 64-bit outputs are those values IN THE SAME CLOCK CYCLE
            (HCL: reg_outputA, reg_outputB)
        - data memory:
            - figure out the address (usually ALU result)
            - input the address (mem_addr), read the 64-bit output (mem_output)
                IN THE SAME CLOCK CYCLE
    - writing
        - figure out register #s we need to write
        - make sure the corresponding register file inputs are set to them
            (reg_dstE, reg_dstM)
        - set 64-bit value input to the value to write (e.g. ALU result for add)
            (reg_inputE, reg_inputM)
        - data memory:
            - set write enable input to 1
            - figure out the address
            - input the address (mem_addr) and the 64-bit value (mem_input)

- HCL tracing on most recent quiz
        register xY {
            a : 32 = 0;
            b : 32 = 0;
        }
        x_a = Y_a + Y_b;
        x_b = Y_b + 1;

        cycle | Y_a  Y_b  x_a  x_b
          1   |  0    0    0    1
              --- rising edge of the clock happens ---
          2   |  0    1    1    2
              ---
          3   | *1*   2

- encoding/decoding Y86
    - we don't memorize the figure
    - the immediates need to appear in the machine code
    - the register numbers need to appear in the machine code
    - we try to have them in the same place for each instruction
    - always use byte 0 to give icode + function code

- Q9 S2017 - Y86 program and machine code -- outputs
    - endianness from mrmovq (and generally)
                     01 00                   |
                     vv vv
        0x000: 30[f2 01 00 00 00 00 00 00]00 | irmovq $1, %rdx          %rdx = 1
        0x00a: 50 02 00 00 00 00 00 00 00 00 | mrmovq 0(%rdx), %rax
                                                      ^ ^
                                               %rax <- memory @ %rdx + 0 = memory @ addr 1 (8 bytes)
        0x014: 00                            | halt

        f2 is least significant
        01 is second least significant, etc.
        0x00000000000001F2 --> 0x1F2

- mrmovq and rB
    - ISA choice:
        **either D(rA) and D(rB) to compute memory locations**
        OR some memory instructions write to rA, some write to rB

- ISA tradeoffs
    - is it easier to implement in some way?
    - is it easier for assemblers/compilers?
    - big set of tradeoffs:
        RISC [easier to implement] <------> CISC [closer to software needs]
        [but less knowledge of what code is actually doing --- fewer opportunities for special optimizations]
    - other tradeoffs: what kind of HW implementations?
        - lots of registers --- more HW but maybe faster?
        - variable-length instructions --- more complicated HW but maybe less space for machine code?

- what is a microarchitecture
    - a particular implementation
    - example: SEQ is one microarchitecture for Y86
        - chooses one cycle/instruction
        - chooses to use a register file with a REG_NONE option
    - later on, we'll have PIPE --- a different Y86 microarch
        - chooses ~five cycles/instruction (but in parallel)
    - could have made a microarchitecture that takes multiple cycles per instruction
        e.g. read one register/cycle

- casting char to int and comparing
    - x == y --> C converts both to the same type (if large enough) before comparing
    - rule of thumb: when in doubt, cast both to the same type
    - (int) 0xFFFFFFFF --> negative (-1 when int is 32 bits)

- typedef struct
    - struct Foo { };
        struct Foo x; --- declares 'x'
        Foo x; --- ILLEGAL
    - typedef struct Foo { } Bar;
        struct Foo x; --- declares 'x'
        Bar x; --- declares 'x' (same type)
        struct Bar x; --- ILLEGAL
        Foo x; --- ILLEGAL
    - typedef struct { } Bar;
        Bar x;
    - typedef struct Foo { Foo * next; } Bar; --- ILLEGAL
    - typedef struct Foo { Bar * next; } Bar; --- ILLEGAL (Bar not declared in time)
    - typedef struct Foo { struct Foo * next; } Bar;

- object file versus executable and linking
    - assembly file:
        instruction and register names
        labels
    - object file:
        machine code (not individual instructions)
        labels
            locations (in the file) of labels defined: "symbol table"
            locations (in the file) of labels used: "relocations"
    - executable
        machine code w/ labels used replaced with actual memory addresses

- bit masks