- compilation pipeline file contents
    - C/high-level language ---> 
    - assembly [lists each instruction, in a human-readable-ish way] 
        with placeholders for things from other files
        and names for things other files might want
    - object file -->
        machine code: actual bytes for the instructions that the HW wants
        actual bytes for data [constants, etc.]
        but with placeholders for addresses that we don't know yet
            found in other files
                "relocation entries"
            don't know where this file will be in memory
        indications of addresses other files might want
            symbol table
                e.g. main is here
                e.g. printf starts here
    - executables
        bytes ready to be loaded into memory to run the program
            linker combines relocations (addresses to fill in) and symbol table entries
                (where is something found in a file) to produce executable ready to run
        don't need names from symbol tables/relocation entries anymore
            but we might have them around for debugging

- Spring 2018 Q9 encoding of pushq
    pushq in Y86:
        0xA0  [rA][F]

    why don't we instead do
        0xA[rA]

    options from the question about what this change would require:
        * harder to compute instruction lengths? --- no
            b/c this is exactly what we do for nop/ret versus other instructions
        * additional MUXes or MUX inputs/other logic between instruction memory and register file? --- yes
            inputs to the register file:
                four register numbers (srcA, srcB, dstE, dstM) usually come from instr memory
                when running push: srcA = rA, srcB = rsp; dstE = rsp
                                          ^^----- part of the second byte of the instruction
                now we need to get rA from somewhere else (first byte instead of second)
                    --> change the MUX input?
                            can't just rewire it, because addq, etc. still need rA from the second byte
                            --> need a MUX (or an extra MUX input) choosing between the two, based on icode
        * additional kind of ALU operation? --- no
            ALU still just subtracts 8 from RSP, same as before
        * register file with more inputs? --- no
            b/c we can use a MUX instead


- Fall 2018 Q5, 6 about machine code simulation
0x000: 30 f4 30 00 00 00 00 00 00 00 | irmovq $0x30, %rsp 
                                                                RSP = 0x30
0x00a: 30 f0 25 00 00 00 00 00 00 00 | irmovq $0x25, %rax 
                                                         RAX = 0x25
          ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^
          the 8 bytes at addresses 0x0B-0x12 (read by the popq below), interpreted as little-endian:
            0xf0 is the least significant byte
            0x25 is the next least significant
            every other byte is 0x00
            0x25f0 -- final value of RAX (after the popq)

0x014: 61 04                         | subq %rax, %rsp 
                                                                RSP = RSP - RAX
                                                                RSP = 0x30 - 0x25 = 11 (0x0B)
0x016: b0 0f                         | popq %rax 
                                                              { RAX <- M[RSP] = M[0x0B] --- loading 8 bytes 
                                                                                ^^^^^^--- 0x25f0 (8 bytes, little end)
                                                              { RSP <- RSP + 8 = 0x0B + 8 = 19 (0x13)
                                                                                                ^^^^
0x018: 00                            | halt 

- Spring 2018 Q3 about condition codes
0x000: 63 00                         | xorq %rax, %rax           <-- sets CC
                                            RAX <- RAX xor RAX = 0
0x002: 50 30 0b 00 00 00 00 00 00 00 | mrmovq 0xB(%rax), %rbx
                                  ^^
                                            RBX <- M[0xB + RAX] = M[0xB] = 0x306200 = 0x0000000000306200
0x00c: 62 30                         | andq %rbx, %rax           <-- sets CC
       ^^ ^^
                                            RAX <- RBX bitwise-and RAX = 0
                                                ZF = 1 iff the result is 0 -->        ZF = 1
                                                SF = 1 iff the result is negative --> SF = 0
0x00e: 00                            | halt
       ^^ ^^ ^^ ^^ ^^   with the marked bytes above: M[0xB] = 00 62 30 00 00 00 00 00 (memory past halt assumed 0)

- difference between testq and cmpq

        testq %rax, %rbx similar to
            
                pushq %rbx
                andq %rax, %rbx
                        computes rax AND rbx = result
                            ZF = 1 iff the result (rax AND rbx) is 0
                            ...
                popq %rbx

        or "andq %rax, %rbx  but only set condition codes (don't touch %rbx)"

        cmpq %rax, %rbx similar to

                pushq %rbx
                subq %rax, %rbx
                popq %rbx
        
        or "subq %rax, %rbx  but only set condition codes (don't touch %rbx)"

        
    common uses of test --- figure out if value is zero or negative

            testq %rax, %rax 
                    computes rax AND rax = rax
                        ZF = 1 iff the result(rax) is 0
                        SF = 1 iff the result(rax) is negative
            je only_if_rax_is_zero
                    jump if equal (named for the subtraction in cmpq)
                    jump iff ZF = 1
            jl only_if_rax_is_negative
                    jump if less than (named for the subtraction in cmpq)
                    jump iff SF != OF; testq always sets OF = 0, so here: jump iff SF = 1


            cmpq %rax, %rbx
                rbx - rax == 0  --> je
                rbx - rax < 0, i.e. rbx < rax as signed values (SF != OF also accounts for overflow) -->   jl

- when in clock cycle do things happen
        - some operations are triggered by our clock signal (synchronizing signal)
            - writes to registers (incl cond codes), register files, memories on rising edge of clock
        - other operations happen as inputs are available
            - register file can't read the correct register until it gets that as input
            - it can't get it as input until instruction memory reads the instruction
            - physical dependencies determine order
    - components take a certain amount of time once they get inputs to produce stable outputs

    [note: not determined by how we divide things into stages
        in the single-cycle CPU, but there might be a correlation]


- pipelining throughput/latency
    - throughput: instructions per amount of time
    - latency: time from when instruction starts until when it finishes

        [------ 1000 ps of computation ------]

        single-cycle 

        
        [---- 600 ps -------][register][---- 400 ps ----]

        two-stage pipeline
            latency: 600 ps + however long the register takes + 400 ps
            throughput: every 600 ps + however long the register takes, we can insert a new instruction into
                        the first part

                        every 600 ps + however long the register takes, we'll get a new result from
                        the second part
            1 instruction / (600 ps + however long the register takes) --> throughput


-----

- little endian versus big endian on exams
    when I write a number in English, I don't write 001 to mean one hundred
    when we write a number on an exam, we don't write 001 to mean one hundred
                                        or 0x001 to mean 0x100 (1 * 16^2)

    endianness matters when we divide numbers into parts explicitly
        when we just write it out in English, we rely on the English convention that the
            most significant part is leftmost
    so it doesn't matter if it's just one number written down
    but if we write a sequence of numbers to mean one number, you have to ask "how is the sequence ordered?"
        this is where endianness matters

- F2018 question 9 about adding instr to SEQ
    adding pop2q rA, rB to the single-cycle CPU while keeping it single-cycle
        rA <- 64 bits from memory at RSP
        rB <- next 64 bits from memory (at RSP + 8)
        RSP <- RSP + 16
        [*] modifying the data memory to read 128 bits --> yes
        [-] modifying the register file to read three registers at a time --> no, only need RSP
        [*] modifying the register file to write three registers at a time --> yes, RSP, rA, and rB
        [-] increasing the size of each register in the register file to 128 bits --> no, each register still holds 64 bits

- F2017 question 2 about circuits, combinational logic, and clock signals
    [drawing]

- arrays and strings in C
    arrays are kinda like pointers, but:
        you can't assign to them (unlike pointers)
        and their sizeof() is different (the size of the actual data in the array, not the address of it)

    pointer arithmetic
        add X to a pointer --> add X * sizeof(what the pointer points to) to the address
        means that array[X] --> *(array + X)
            take the address of the beginning of the array
            add X times the sizeof each array element to that
            then access memory there

    strings in C
        string constants are character arrays with a 0 entry at the end
        printf("Hello") roughly the same as

            const char unnamed_constant[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
            ...
            printf(unnamed_constant); /* same as printf(&(unnamed_constant[0])); */

                sizeof(unnamed_constant) == 6 [bytes]
                    based on how it is declared

            const char *pointer = unnamed_constant;

                sizeof(pointer) == 8 [bytes] on our 64-bit machines --- b/c that's an address
                    based on how pointer is declared

        int array[4] = {1,2,3,4};
        sizeof(array) = sizeof(int) * 4 (16 on our machines)

        array[4] = 5; // <-- undefined behavior (valid indices are 0-3)


        char not_terminated[5] = {'H', 'e', 'l', 'l', 'o'};

        printf(not_terminated); // <-- undefined behavior
                        // because printf keeps reading until it finds a '\0', so it reads
                        // not_terminated[5], which is out of bounds

----

- undefined behavior and the question on quiz 2

    ~ definition is "compiler can do whatever it wants" (the C standard places no requirements on what happens)
    ~ the program producing a different result for the computation every time it runs
        this is likely for some types of undefined behavior
        examples:
            accessing an uninitialized value
                "easiest" thing for compiler to do --- use whatever's in that variable's memory/register location
                could be different each time (new addresses each time your program is run)
                    e.g. your stack might be at a different place each time
            out-of-bounds of an array
                "easiest" thing for compiler to do --- access whatever's in memory after the array
                could be different each time (new addresses each time your program is run)
                
---

- Fall 2018 Q4 -- calculation by ALU in textbook's design
    [-] jmp --- no, we just change the PC, don't care what the ALU does
    [*] pushq --- need to compute the new stack pointer
    [*] mrmovq --- need to compute the address to pass to the data memory
    [*] call --- need to compute the new stack pointer

----
- computed jump in Y86
    not really a thing (Y86 has no indirect jump instruction), but...

        pushq %rax
        ret

        computed jump to %rax (but changes the stack: the 8 bytes just below %rsp now hold %rax; %rsp itself ends where it started)