- covered topics [POSSIBLY INCOMPLETE LIST]
  - early in the semester:
    - C
    - assembly
    - Y86 and ISAs (RISC/CISC/etc.)
    - bitwise operators
    - SEQ
    - pipelining and PIPE
    - caching
    - locality in caches
    - cache blocking
  - later in the semester (most of the exam):
    - optimizations
      - loop unrolling, etc.
      - multiple accumulators
      - inlining
    - out-of-order processors / functional units
    - exceptions
    - signals
    - processes
    - virtual memory
- S2017 Q22 (page faults)
- last quiz, Q1 (TLB tag?)
  - split the virtual address:
        [      VPN     ][ PO ]
        [TLB tag][TLB idx]
                  ^^^^^^^ (which TLB set to use?)
  - # TLB sets = 32-entry TLB / (4 entries/set) --> 8 sets --> 3 set index bits
  - for 0x00CAFECAFE:
    - remove the page offset (16KB pages --> 14 bits) to get the VPN
    - look at the bottom 3 bits of the VPN (the TLB index bits)
- last quiz, Q2 (page table entry address)
  - split the VPN:
        [        VPN       ][ PO ]
        [VPN pt1][ VPN pt2 ]
         ^^^^^^^ 4096 entries --> 12 bits to determine which entry
    - 4-byte page table entries are used for the first-level lookup
  - extract VPN pt 1 (12 bits) = the first 12 bits of the 38-bit VA
        VA starts 0x0012...:
            [00]00 0000 0001 0010 ...
             ^^ not part of the virtual address (b/c it's only 38 bits)
        first 12 VA bits: 0000 0000 0100 --> VPN pt 1 = 4
  - the first-level page table is an array of entries; we want index 4
        0x10000 is index 0
        0x10004 is index 1
        0x10008 is index 2
        ...
        0x10010 is index 4
- SEQ instruction knowledge
  - we expect you to know what all the instructions are supposed to do
  - we won't ask questions that depend on whether or not the ALU is used to add 0, etc.
- cache tag/offset formulas
  - there is a figure in the book
- fork
  - we covered this more in S2017 than this semester
  - won't be on the exam
- last quiz -- L1 cache and TLB overlap
  - abbreviations: VPN = virtual page number, PPN = physical page number,
    PO = page offset, CT = cache tag, CI = cache index, CO = cache offset
  - the address splits:
        [ VPN ][ PO ]
        [ PPN ][ PO ]
        [CT ][CI ][CO]
              ^^^^^ needed to look up the set -- the first step of a cache lookup
  - question: is the cache index part of the page offset or not?
  - if yes --> we know the cache index without knowing the PPN
        [  PPN  ][    PO    ]
        [    CT     ][CI][CO]
    - don't need the PPN until later (but it's as if we used the physical address)
  - if no --> we don't know the cache index until we find the PPN
        [  PPN  ][    PO    ]
        [  CT ][ CI  ][ CO  ]
  - the rule:
        size of a way = cache size / associativity
        2^(CI + CO) = size of a way
        log2(size of a way) = # of CI + CO bits
  - 32KB direct-mapped cache --> a way is 32KB = 2^15 B
        CI + CO = 15 > 14 (PO size)
        CI overlaps with the PPN --> no
  - 32KB, 2-way --> a way is 16KB = 2^14 B
        CI + CO = 14 <= 14
        CI does not overlap with the PPN
  - 128KB, 4-way --> a way is 32KB (same as the direct-mapped case above)
- S2017 Q10 (dividing a virtual address)
  - 8KB pages --> 13 page offset bits
  - 32-bit VAs, 28-bit PAs
  - 32KB cache --- can we overlap the accesses?
        [ PPN ][ PO ]
        [CT ][CI][CO]
  - what associativity do we need?
        1-way --> each way is 32KB --> CI + CO = 15, not <= 13
        2-way --> each way is 16KB --> CI + CO = 14, not <= 13
        4-way --> CI + CO = 13, and 13 <= 13: works
        (doubling the associativity --> remove an index bit)
- S2017 Q11
  - 8KB pages --> 13 page offset bits
  - 32-bit VAs, 28-bit PAs
  - 2-level page table: 1KB page tables at level 1, 8KB at level 2
  - 4-byte page table entries
  - 1KB first-level page tables with 4B entries
        --> 2^8 = 256 = 1KB / 4B entries in each first-level page table
        --> 8 bits to identify the index of an entry
  - which VAs (from a list) share the same second-level PT?
        <- 8 -><-- 11 --><- 13 ->
        [       VPN     ][  PO  ]
        [VPN pt1][VPN pt2][  PO  ]
         ^^^^^^^ lookup in the first-level page table finds the second-level table
                 ^^^^^^^ lookup in the second-level page table
  - which VAs share the same VPN pt 1 with 0x12345678?
        0x123FD000
          ^^ same top 8 bits --> yes
        0x00046678
          ^^ different --> no
- Q23 Q3 --- multi-level PTs
  - multilevel PTs are most efficient at representing:
    - (yes) address spaces with large, contiguous regions of unallocated memory
      - an entry in the 1st- or 2nd- (or other non-last-) level table is invalid
      - so there are no later page tables at all
    - (no) address spaces where the allocated addresses are evenly distributed
      - the addresses likely don't share the first part of the VPN
      - so most first/second-level page table entries are valid
      - so we don't get to omit many page tables (we need close to the maximum
        number of second-level page tables)
    - (no) address spaces with lots of virtual pages using the same physical page
      - no savings just because the PPNs happen to be the same
    - (no) address spaces with smaller pages
      - more page table entries
- S2017 Q12 (signals + fork)
  - we won't talk about fork on the exam
- S2017 Q1-2 (bitwise ops and caches)
  - 64KB 4-way set associative cache with 256B blocks
  - Q1: which C expression gets the tag?
    - 64KB cache / 4 ways --> 16KB per way --> CI + CO = 14
          C = B * S * E --- C / E = B * S = 16KB
          log2(C / E) = log2(B * S) = log2(B) + log2(S)
                      = block offset bits + set index bits
          log2(16K) = 14
    - CO = 8 (256B blocks)
    - 64KB / 256B = # blocks; # blocks / 4 = # sets --> 2^CI
    - the tag is bits 14 and up
    - look at the bitwise ops; see which one makes bit 14 become bit 0
    - make sure all the other higher bits of the address are shifted similarly
  - Q2: which expression gets the index?
    - CO = 8, CI = 6
    - index --> shift right by 8 and select 6 bits (& 0x3F selects the bottom 6 bits)
- S2017 Q18-19 (caching)
  - 32-byte 2-way set associative cache with 4-byte blocks and LRU replacement
  - 8 blocks --> 4 sets --> 2 set index bits
  - 4-byte blocks --> 2 block offset bits
  - the bottom 4 bits determine set + offset
    - 0, 16, 32, and 48 map to the same location
    - 1, 17, 33, and 49 map to the same location
      (because they share the bottom 4 bits)
  - 2-byte accesses to decimal addresses: 0 16 32 2 14 34 12 20 36
  - capacity/compulsory/conflict misses?
  - get the block address --- round down to a multiple of 4
        address:        0   16  32  2   14  34  12  20  36
        block address:  0   16  32  0   12  32  12  20  36
        set index:      0   0   0   0   3   0   3   1   1
        compulsory:     X   X   X       X           X   X   --> 6 compulsory misses
        conflict:                   X
                                    ^ conflicts with 32; higher associativity
                                      would have prevented this miss
  - which set does each byte range map to?
        set:     0     1     2      3       0       1
        bytes: [0-3] [4-7] [8-11] [12-15] [16-19] [20-23] ...
  - per-set contents, in access order:
        set 0: 0 [evicted for 32], 16 [evicted for 0], 32 [hit], 0
        set 1: 20, 36
        set 2: (unused)
        set 3: 12 [hit]
- another example: 16B cache
        accesses: 0 2 4 6 8 10 12 14 16 18 0 2 4 6 ...
                                            ^^^^^^^
        the repeated accesses miss -- the 20 bytes touched don't fit in a 16B cache
- vector instructions (what to know)
  - basic idea: one instruction says to do multiple computations on matching pairs
    of values from fixed-size arrays
  - versus multiple accumulators/reassociation, where the CPU discovers multiple
    parallel computations itself (vector instructions can be combined with that, too)
  - we won't ask about specific intrinsics/instructions without giving you a
    description of them on the exam
- pipelining and what each stage does
  - there are multiple pipelined CPU designs -- we talked about one
  - fetch --- read the instruction, split it up, compute the next predicted PC
  - decode --- read registers (probably a bad name)
  - execute --- use the ALU, set/read condition codes
  - memory --- use data memory
  - writeback --- write registers