- topic list
    - my recommendation: look at lecture topics
        (non-exhaustive)
        - intro (powers of two, why C), compilation pipeline
        - assembly (ATT syntax, lea, condition codes)
        - bitwise
        - C (structs, pointer arithmetic, undefined behavior)
        - ISAs (tradeoffs, RISC/CISC, machine code formats)
        - SEQ 
            * components (register files, registers, memory)
            * MUXes and how they're used
            * conceptual stages
        - pipelining (throughput, latency, idea of dividing up a circuit with regs)

    - things that we don't expect you to memorize
        - any opcodes
    - HCLRS: you may have to read HCLRS describing a circuit

- structs
    - structs are like classes in C++ but
        --- you have to write "struct" before the name or use a typedef
        --- everything is public and there are no methods
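
    a minimal sketch of the two points above; the type and function names
    are made up for illustration:

```c
/* Hypothetical example type. */
struct point {
    int x;
    int y;
};

/* With a typedef, the "struct" keyword can be dropped where the type is used. */
typedef struct point point_t;

/* No methods: behavior is an ordinary function that takes the struct. */
int point_sum(const struct point *p) {
    return p->x + p->y;   /* every field is accessible; nothing is private */
}

/* Same idea, using the typedef name and passing the struct by value. */
int point_sum_typedef(point_t p) {
    return p.x + p.y;
}
```

    both spellings name the same type, so a `point_t` can be passed wherever
    a `struct point` is expected.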


- compilation pipeline
    - C (or other high-level language) -->
      assembly [what instructions, but human-readable] -->
      object files [machine code for instructions and data, but not "linked" with other
                    files yet] --->
      executables [machine code for instructions and data, but with all processing to
                   handle combining multiple source files done]
- object files
    machine code
    data

    but we're missing some addresses in the machine code:
        for addresses that come from other files: what are we looking for
                "relocations" (e.g. "figure where printf is, then put its address here")
        for addresses that we want to provide to other files
                "symbol table" entries
                    (e.g. the "main" function starts here)
- what contains what
    assembly file: contains all the instructions (but maybe not enough to figure out
        what the original C code was) and labels (human-readable-ish names)
            while (x > 0) { ... }  and while (x >= 1) { ... } --> same assembly?
    object file: contains all the instructions (but not in human-readable form --- just
        the raw bytes), and names that other object files might need in symbol + relocation
        tables
        contains all the data needed to run the code (constants, etc.)
    executable file: contains all the information needed to run the program
        labels and other names not needed, but might be preserved for debugging
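
    the two `while` conditions mentioned for assembly files, written out as a
    sketch (function names are hypothetical). For an int x the conditions are
    equivalent, so a compiler is free to emit identical instructions for both;
    whether it actually does depends on the compiler and its options:

```c
/* Counts loop iterations using the condition x > 0. */
int count_down_gt(int x) {
    int iterations = 0;
    while (x > 0) {
        x--;
        iterations++;
    }
    return iterations;
}

/* Same loop, with the condition written as x >= 1. */
int count_down_ge(int x) {
    int iterations = 0;
    while (x >= 1) {
        x--;
        iterations++;
    }
    return iterations;
}
```

    if the assembly is the same, it cannot tell us which of the two the
    programmer originally wrote.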

- labels
    - stands in for a memory address within instructions
            movq $global_variable, %rax
            movq $0x12345678, %rax if global_variable is located at 0x12345678
    - used to mark where we want a convenient name for a memory address
        'foo: ...' ---> make 'foo' a name for whatever the address of '...' is

    - usually turn into symbol table entries (where defined) and relocation entries
        (where used)
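
    a sketch of where names come from on the C side (function names are
    hypothetical; the comments describe what a typical toolchain does, not
    a guarantee):

```c
#include <stdio.h>

/* Defined in this file: typically becomes a symbol table entry,
   so other files can find it. */
int global_variable = 42;

/* Uses the name global_variable: the object file typically records a
   relocation so the final address can be filled in at link time. */
int *address_of_global(void) {
    return &global_variable;
}

/* printf is defined in another file (the C library), so this call
   needs a relocation: "figure out where printf is, then put its address here". */
void print_global(void) {
    printf("%d\n", global_variable);
}
```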

- RISC v CISC
    - Reduced Instruction Set --- simpler instructions
        - RISC: exposing what the hardware can do well to compilers/assembly writers/etc.
            - and spending time optimizing simple instruction execution
                e.g. more registers instead of more complex instructions
        - CISC: exposing what is convenient for assembly/compiler writers
    - in practice: almost never clear whether an architecture is really CISC or RISC
        e.g. Y86:
            RISC-like:
            * small number of addressing modes (ways of specifying operands)
            * small set of instructions
            * don't have any loop-like operations in instructions
            * don't access data memory more than once in an instruction
            * don't require two+ ALU operations/instruction
            CISC-like:
            * push, pop, call, ret

- machine code format
    - Y86 general format:
        [opcode byte][register byte][immediate]
        ^^^^^^^^^^^^^
        |
        icode -- primary opcode
        ifun ("function code") -- secondary opcode
            distinguished between families of related instructions:
                cmovl, cmovg, rrmovq, ...
                jl, jg, jle, jmp, ...
                add, sub, xor, ...

        register byte --- only included if instruction uses registers
            contains rA and/or rB (depending on the instruction)

        immediate value --- only included if the instruction uses a constant
            8-byte little endian constant representing
                * a constant (irmovq, rmmovq, mrmovq)
                * an address (jmp, call)

    - variable length because some instructions omit register byte/immediate value
    - each field is in roughly the same place in each instruction
        e.g. wire up same values to register file register number inputs for many instructions
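
    a sketch of pulling the fields out of the bytes. It assumes the usual
    textbook Y86-64 layout (icode in the high 4 bits of the opcode byte,
    ifun in the low 4 bits; rA in the high 4 bits of the register byte,
    rB in the low 4 bits). The struct and function names are made up:

```c
#include <stdint.h>

/* Fields taken from the opcode byte and the register byte. */
struct y86_fields {
    unsigned icode;  /* primary opcode */
    unsigned ifun;   /* secondary opcode ("function code") */
    unsigned rA;     /* first register field */
    unsigned rB;     /* second register field */
};

/* Split the two leading bytes into their 4-bit fields. */
struct y86_fields split_bytes(uint8_t opcode_byte, uint8_t register_byte) {
    struct y86_fields f;
    f.icode = opcode_byte >> 4;
    f.ifun = opcode_byte & 0xF;
    f.rA = register_byte >> 4;
    f.rB = register_byte & 0xF;
    return f;
}

/* Assemble an 8-byte little-endian immediate: bytes[0] is least significant. */
uint64_t read_immediate(const uint8_t bytes[8]) {
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

    because the fields sit in the same bit positions for every instruction,
    the same shifts and masks (or, in hardware, the same wires) work
    regardless of which instruction is being decoded.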

- C to assembly with dereferencing
    - C pointer is represented in assembly by a register/something containing address
        int *p = &x;  // p is a pointer to a 4-byte int
        p might be stored in %rax as the address of 'x'.
    - adding to/subtracting from a pointer adds to/subtracts from the address, but
        in units of whatever the pointer points to:
        p += 4; --> add $(4 * sizeof(int)), %rax
    - dereferencing a pointer computes the appropriate address, then uses it
        *p = 10; --> movl $10, (%rax)
                        ^--- 4 byte value
        *(p + 4) = 10 -->
            movq %rax, %r8
            addq $(4 * sizeof(int)), %r8  // add 16
            movl $10, (%r8)

            -or-

            movl $10, 16(%rax)

        *(p + x) = 10; --->  say x is in  %rcx
            movq %rax, %r8
            movq %rcx, %r9
            imulq $4, %r9
            addq %r9, %r8  /// rax + rcx * 4
            movl $10, (%r8)

            -or-
            movl $10, (%rax, %rcx, 4)
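
    the scaling by sizeof(int) can be checked from C itself. A small sketch
    (helper names are hypothetical) that measures the byte distance the
    assembly above adds to the address:

```c
#include <stddef.h>

/* Byte distance between p and p + n, computed from the two addresses. */
size_t byte_offset(const int *p, size_t n) {
    return (size_t)((const char *)(p + n) - (const char *)p);
}

/* *(p + x) = value; this is the same store as p[x] = value. */
void store_at(int *p, size_t x, int value) {
    *(p + x) = value;
}
```

    with a 4-byte int, byte_offset(p, 4) is 16, matching the
    `16(%rax)` displacement in the assembly.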

- testq
        testq %rax, %rbx  is the same as

        pushq %rbx
        andq %rax, %rbx      // to set condition codes
        popq %rbx

        or "and, but only set condition codes (don't change normal registers)"

        typical use is for testing positive/negative/zero

            testq %rax, %rax  // ZF = 1 iff %rax = 0; SF = 1 iff %rax is negative
            je somewhere    // je -- jump if ZF = 1 (subtraction yielded zero --> equal)
                                // jump if %rax was zero
            jl somewhere    // jl -- jump if SF != OF; test always sets OF = 0, so here: SF = 1
                                // jump if %rax was negative
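
        a sketch of what testq computes, modeling only ZF and SF (the struct
        and function names are made up; OF and CF, which test clears, are
        left out):

```c
#include <stdbool.h>
#include <stdint.h>

/* The two condition codes this sketch models. */
struct flags {
    bool zf;  /* result was zero */
    bool sf;  /* result had its sign bit set (negative) */
};

/* Model of "testq a, b": AND the operands, keep only the flags. */
struct flags model_testq(int64_t a, int64_t b) {
    uint64_t result = (uint64_t)a & (uint64_t)b;  /* result itself is discarded */
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 63) != 0;
    return f;
}
```

        since x & x == x, calling model_testq(x, x) reports whether x
        itself is zero or negative, which is the typical use shown above.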

- cmp A, B   computes B - A, and if B - A < 0, then jl will happen
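
    a sketch of cmp followed by jl, assuming jl jumps when SF != OF (which
    is how it stays correct even when B - A overflows 64 bits). The function
    name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "cmpq a, b" followed by "jl": returns true if jl would jump. */
bool jl_after_cmpq(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t result = ub - ua;            /* B - A, wrapping to 64 bits */
    bool sf = (result >> 63) != 0;        /* sign bit of the wrapped result */
    /* Signed overflow: A and B differ in sign, and the result's sign
       differs from B's. */
    bool of = (((ub ^ ua) & (ub ^ result)) >> 63) != 0;
    return sf != of;                      /* true exactly when B < A (signed) */
}
```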

- conditional mov instructions
    
        cmovXX

        cmovg %rax, %rbx --> conditional move if greater than
            "if the flags say 'greater than', then copy %rax into %rbx"

                      jle after_mov
                      rrmovq %rax, %rbx
            after_mov:
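
        the same equivalence as a C sketch (function name is hypothetical;
        the flag check is passed in as a bool rather than modeled):

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "cmovg src, dst": returns the new value of dst. */
int64_t model_cmovg(bool flags_say_greater, int64_t src, int64_t dst) {
    if (!flags_say_greater) {
        return dst;   /* like "jle after_mov": skip the move, dst unchanged */
    }
    return src;       /* like "rrmovq src, dst" */
}
```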

- when condition codes are set == the value becomes what the operation result says
    - cmp, test --- special cases
    - most arithmetic --- including all the "OPq" instructions on Y86
    - not set by mov, lea, push, pop, etc.
    - not set by jumps, etc.
    - generally, we won't ask about really tricky cases b/c that's what Intel's manual is for
    - not cleared (unless you deliberately do arithmetic/etc. to do that)

- register updating and clock cycles (single-cycle)
    - before rising edge, inputs to each register have a value
    - after rising edge, outputs reflect the input just before rising edge
            "copy input to output on rising edge"
            then wait for new input
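
    a sketch of one register as a C struct (names are made up). Changing
    the input does nothing to the output until the rising edge:

```c
#include <stdint.h>

/* A hypothetical model of one clocked register. */
struct reg {
    uint64_t input;   /* value currently presented at the input */
    uint64_t output;  /* value currently visible at the output */
};

/* Rising edge of the clock: copy input to output. */
void rising_edge(struct reg *r) {
    r->output = r->input;
}
```

    between calls to rising_edge, the rest of the circuit can keep
    changing `input` without disturbing whatever is reading `output`.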
- order in single-cycle / stages and timing in single-cycle
    - timing in single-cycle:
        - writes (registers, memories) happen at the rising edge of the clock
            single synchronization point
            causes us to move on to next instruction b/c the PC is a register
        - everything else happens as values are available
            based on dependencies
                can't read correct register until instruction memory outputs what
                the register number is
    - stages in single-cycle:  fetch / decode / execute / memory / writeback / PC update
        - sometimes correspond with timing, but not as a rule
        - organizing the processor
- stages in pipelined: fetch (+ PC update) / decode / execute / memory / writeback
    compute PC early so we can fetch the next instruction immediately after

- delays in physical hardware
    - registers take some amount of time
        write is not actually instantaneous, so "register delay" for it to happen
            (turns out the register delay is before+after rising edge)

    - other components need time
        e.g. adder can't instantaneously compute the sum of two numbers
            need to input the correct values for a while to get the result


----
- call/ret
        x86/Y86
            call foo --->
            
                            pushq $after_call  // "the return address"  
                                // decrement the stack pointer
                            jmp foo
                after_call:

            ret     --->
                            popq temporary-location // "the return address"
                                // increment the stack pointer
                            jmp *temporary-location // go back there
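
        the same expansion as a C sketch over a made-up machine state (the
        struct, its field names, and the tiny word-indexed memory are all
        assumptions for illustration):

```c
#include <stdint.h>

/* Hypothetical machine state: a PC, a stack pointer, and a small memory. */
struct machine {
    uint64_t pc;
    uint64_t rsp;          /* byte address; kept a multiple of 8 in this sketch */
    uint64_t memory[64];   /* memory[i] holds the 8 bytes at address 8 * i */
};

/* call: push the return address, then jump to the target. */
void do_call(struct machine *m, uint64_t target, uint64_t after_call) {
    m->rsp -= 8;                         /* decrement the stack pointer */
    m->memory[m->rsp / 8] = after_call;  /* store "the return address" */
    m->pc = target;                      /* jmp foo */
}

/* ret: pop the return address, then jump to it. */
void do_ret(struct machine *m) {
    uint64_t temporary = m->memory[m->rsp / 8];  /* load "the return address" */
    m->rsp += 8;                                 /* increment the stack pointer */
    m->pc = temporary;                           /* jmp *temporary-location */
}
```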

- ISA versus microarchitecture
    - ISA: interface from the point of view of assembly + assembler/linker writers
        in order to know what programs will do
                informed by possible implementations, but doesn't assume any particular one
                
    - microarchitecture: a particular implementation strategy
        usually ISA "suggests" some parts of the microarchitecture
        but still tons of flexibility:
            e.g. single-cycle versus pipelined
            e.g. how long do instructions take in cycles?
                maybe don't do single-cycle, so you can make pop take more time
                in order to have only one write port on the register file