bit puzzles (finish) / ISAs + Y86

## Changelog

10 September: constructing masks: explicitly mention idea of AND'ing

10 September: fully multibit: clarify what ! and !! do
12 September: fully multibit: ...and correct typo of $\neq 04$ for $=0$

## last time

bitshifts
logical (0s) and arithmetic (copy sign bit) right shift
left shift
relationship to division, rounding
bitwise operations
array of gates, two bit input, one bit output
mask: set certain bits to $1 / 0$
complement ~: flip all bits

today: strategies for harder bit-puzzles, including
some tricks with two's complement
using bitwise operations to do things in parallel
(...and then about ISAs)

## change to schedule

next week was going to be HCL1 (lab)/HCL 2 (HW)
would likely require rushing lecture somewhat
new assignment on linking + ISA tradeoffs in its place
new = I'm less sure about the amount of work being right
a lot more manual grading
not finalized yet, will be by Tuesday
(I'd like to have me + my TAs have a chance to review)
needed changes - originally planned for later
we'll talk about ISAs+Y86 today + Tuesday

## bit-puzzles

assignments: bit manipulation puzzles
solve some problem with bitwise ops
maybe that you could do with normal arithmetic, comparisons, etc.
why?
good for thinking about HW design
good for understanding bitwise ops
unreasonably common interview question type

## simple operation performance

typical modern desktop processor:
bitwise and/or/xor, shift, add, subtract, compare: ~1 cycle
integer multiply: ~1-3 cycles
integer divide: ~10-150 cycles
(smaller/simpler/lower-power processors are different)

add/subtract/compare are more complicated in hardware!
but much more important for typical applications

## note: ternary operator

w = (x ? y : z)
if (x) { w = y; } else { w = z; }

## ternary as bitwise: simplifying

(x ? y : z)
if (x) return y; else return z;
task: turn into non-if/else/etc. operators
assembly: no jumps probably
strategy today: build a solution from simpler subproblems
(1) with x, y, z 1 bit: (x ? y : 0) and (x ? 0 : z)
(2) with x, y, z 1 bit: (x ? y : z)
(3) with x 1 bit: (x ? y : z)
(4) (x ? y : z)

## one-bit ternary

(x ? y : z)
constraint: $x, y$, and $z$ are 0 or 1
now: reimplement in C without if/else/||/etc.
(assembly: no jumps probably)

divide-and-conquer:
(x ? y : 0)
(x ? 0 : z)

## one-bit ternary parts (1)

constraint: $x, y$, and $z$ are 0 or 1

(x ? y : 0)

| | y = 0 | y = 1 |
| :--- | :--- | :--- |
| x = 0 | 0 | 0 |
| x = 1 | 0 | 1 |

→ (x & y)

## one-bit ternary parts (2)

```
(x ? y : 0) = (x & y)
(x ? 0 : z)
```

opposite of x: ~x
((~x) & z)

## one-bit ternary

constraint: x, y, and z are 0 or 1

(x ? y : z)
(x ? y : 0) | (x ? 0 : z)
(x & y) | ((~x) & z)
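The one-bit result can be checked exhaustively, since there are only eight input combinations. A minimal sketch in C, assuming the hypothetical function name `one_bit_ternary` (not from the slides) and that all three inputs really are 0 or 1:

```c
/* One-bit select: x, y, z are each assumed to be 0 or 1.
   The name one_bit_ternary is illustrative only. */
unsigned one_bit_ternary(unsigned x, unsigned y, unsigned z) {
    /* (x & y) contributes y when x == 1;
       ((~x) & z) contributes z when x == 0.
       Because z is 0 or 1, the high bits set by ~x are masked away. */
    return (x & y) | ((~x) & z);
}
```

Comparing against the built-in `x ? y : z` for all eight inputs is enough to confirm the identity under the stated constraint.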

## multibit ternary

constraint: x is 0 or 1
old solution ((x & y) | ((~x) & z)) only gets least sig. bit
(x ? y : z)
(x ? y : 0) | (x ? 0 : z)

## constructing masks

constraint: x is 0 or 1
(x ? y : 0)
turn into y & MASK, where MASK = ???
"keep certain bits"
if x = 1: want 1111111111...1 (keep y)
if x = 0: want 0000000000...0 (want 0)
a trick: -x (-1 is 1111...1)

## constructing other masks

constraint: x is 0 or 1

(x ? 0 : z)
if x = 0: want 1111111111...1
if x = 1: want 0000000000...0
mask: -(x ^ 1)
## multibit ternary

constraint: x is 0 or 1
old solution ((x & y) | ((~x) & z)) only gets least sig. bit

(x ? y : z)
(x ? y : 0) | (x ? 0 : z)
((-x) & y) | ((-(x ^ 1)) & z)
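The two masks can be written out as named intermediate values to make the "keep certain bits" idea visible. This is a sketch only: the function name `select01` and the variable names are assumptions, and `unsigned` is used so that negation is well-defined modular arithmetic.

```c
/* Multibit select for x restricted to 0 or 1.
   select01, keep_y and keep_z are illustrative names. */
unsigned select01(unsigned x, unsigned y, unsigned z) {
    unsigned keep_y = -x;         /* x == 1 -> all ones, x == 0 -> all zeros */
    unsigned keep_z = -(x ^ 1u);  /* the opposite mask */
    return (keep_y & y) | (keep_z & z);
}
```

With x equal to 1 the first mask keeps every bit of y and the second clears every bit of z; with x equal to 0 the roles swap.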

## fully multibit

constraint dropped: x is no longer restricted to 0 or 1
(x ? y : z)
easy C way: !x = 1 (if x = 0) or 0, !(!x) = 0 or 1
x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)
(x ? y : 0) | (x ? 0 : z)
((-!!x) & y) | ((-!x) & z)
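Putting the pieces together gives a branch-free select for arbitrary x. The sketch below assumes a hypothetical function name `ternary` and uses `unsigned` operands; it normalizes x with `!` first and then builds both masks from that one result.

```c
/* Branch-free (x ? y : z) for any x.
   The name ternary is illustrative, not from the slides. */
unsigned ternary(unsigned x, unsigned y, unsigned z) {
    unsigned is_zero = !x;            /* 1 if x == 0, otherwise 0 */
    unsigned is_nonzero = !is_zero;   /* 0 if x == 0, otherwise 1 */
    /* Negating 0 or 1 yields all zeros or all ones. */
    return ((-is_nonzero) & y) | ((-is_zero) & z);
}
```

Values of x such as 2 or 0x80000000, which have a zero low bit, are the cases where the earlier 0-or-1 version would give the wrong answer.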

## problem: any-bit

is any bit of x set?
goal: turn 0 into 0, nonzero into 1
easy C solution: !(!(x))
another solution if you have - or + (bang in lab)
what if we don't have ! or - or +
how do we solve this if x is, say, four bits?
((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

## wasted work (1)

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))
in general: (x & 1) | (y & 1) == (x | y) & 1
distributive property
(x | (x >> 1) | (x >> 2) | (x >> 3)) & 1
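The factoring step can be confirmed by writing both forms and comparing them. A small sketch, where the function names `any_of_four_naive` and `any_of_four_factored` are assumptions made for illustration:

```c
/* Both functions answer "is any of the low four bits of x set?".
   Function names are illustrative only. */
unsigned any_of_four_naive(unsigned x) {
    /* Mask each shifted copy separately: four ANDs, three ORs. */
    return (x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1);
}

unsigned any_of_four_factored(unsigned x) {
    /* Distribute the "& 1" out: three ORs, one AND. */
    return (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1;
}
```

The two agree on every input, not only on four-bit values, because both compute the OR of bits 0 through 3.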

## wasted work (2)

4-bit any set: (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1
performing 3 bitwise ors
...each bitwise or does 4 OR operations
but we only use the result of one of the 4!


## any-bit: looking at wasted work



$$
y = (x \mid (x \gg 1)) = \left(0 \mid x_{3}\right) \quad \left(x_{3} \mid x_{2}\right) \quad \left(x_{2} \mid x_{1}\right) \quad \left(x_{1} \mid x_{0}\right)
$$

final value wanted: $x_{3} \mid x_{2} \mid x_{1} \mid x_{0}$
previously: compute x | (x >> 1) for $x_{1} \mid x_{0}$; (x >> 2) | (x >> 3) for $x_{3} \mid x_{2}$
observation: got both parts with just x | (x >> 1)

## any-bit: divide and conquer

four-bit input $x=x_{3} x_{2} x_{1} x_{0}$

$x \mid (x \gg 1) = \left(x_{3} \mid 0\right)\left(x_{2} \mid x_{3}\right)\left(x_{1} \mid x_{2}\right)\left(x_{0} \mid x_{1}\right) = y_{1} y_{2} y_{3} y_{4}$
$y \mid (y \gg 2) = \left(y_{1} \mid 0\right)\left(y_{2} \mid 0\right)\left(y_{3} \mid y_{1}\right)\left(y_{4} \mid y_{2}\right) = z_{1} z_{2} z_{3} z_{4}$
$z_{4} = \left(y_{4} \mid y_{2}\right) = \left(\left(x_{2} \mid x_{3}\right) \mid \left(x_{0} \mid x_{1}\right)\right) = x_{0} \mid x_{1} \mid x_{2} \mid x_{3}$ "is any bit set?"
unsigned int any_of_four(unsigned int x) {
    unsigned int part_bits = (x >> 1) | x;
    return ((part_bits >> 2) | part_bits) & 1;
}

## any-bit-set: 32 bits

unsigned int any(unsigned int x) {
    x = (x >> 1) | x;
    x = (x >> 2) | x;
    x = (x >> 4) | x;
    x = (x >> 8) | x;
    x = (x >> 16) | x;
    return x & 1;
}
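The shift amounts double at each step, so the number of steps grows with the logarithm of the word size. The sketch below writes that pattern as a loop purely to show the structure; a bit-puzzle assignment would normally forbid loops, and the name `any_bit_loop` is an assumption.

```c
#include <limits.h>

/* Same doubling pattern as the unrolled version, written as a loop.
   any_bit_loop is an illustrative name; this is not puzzle-legal code. */
unsigned any_bit_loop(unsigned x) {
    unsigned width = sizeof(x) * CHAR_BIT;  /* 32 on typical platforms */
    for (unsigned shift = 1; shift < width; shift *= 2) {
        /* After this step, bit 0 covers twice as many original bits. */
        x |= x >> shift;
    }
    return x & 1;
}
```

For a 32-bit `unsigned` the loop runs with shifts 1, 2, 4, 8 and 16, matching the five unrolled statements.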

## bitwise strategies

use paper, find subproblems, etc.
mask and shift
(x & 0xF0) >> 4
factor/distribute
(x & 1) | (y & 1) == (x | y) & 1
divide and conquer
common subexpression elimination
return ((-!!x) & y) | ((-!x) & z)
becomes
d = !x; return ((-!d) & y) | ((-d) & z)

## exercise

Which of these will swap the last and second-to-last bits of an unsigned int x? (bits uvwxyz become uvwxzy)

```
/* version A */
    return ((x >> 1) & 1) | (x & (~1));
/* version B */
    return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
/* version C */
    return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
/* version D */
    return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
```


## version A

/* version A */
return ((x >> 1) & 1) | (x & (~1));
// ((x >> 1) & 1): uvwxyz --> 0uvwxy --> 00000y
// (x & (~1)): uvwxyz --> uvwxy0
// 00000y | uvwxy0 = uvwxyy

## version B

/* version B */
return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
// ((x >> 1) & 1): uvwxyz --> 0uvwxy --> 00000y
// ((x << 1) & (~2)): uvwxyz --> vwxyz0 --> vwxy00
// (x & (~3)): uvwxyz --> uvwx00

## version C

/* version C */
return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
// (x & (~3)): uvwxyz --> uvwx00
// ((x & 1) << 1): uvwxyz --> 00000z --> 0000z0
// ((x >> 1) & 1): uvwxyz --> 0uvwxy --> 00000y

## version D

/* version D */
return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
// ((x & 1) << 1): uvwxyz --> 00000z --> 0000z0
// ((x & 3) >> 1): uvwxyz --> 0000yz --> 00000y
// 0000zy ^ uvwxyz --> uvwx (z XOR y) (y XOR z)

## expanded code

```
int lastBit = x & 1;
int secondToLastBit = x & 2;
int rest = x & ~3;
int lastBitInPlace = lastBit << 1;
int secondToLastBitInPlace = secondToLastBit >> 1;
return rest | lastBitInPlace | secondToLastBitInPlace;
```
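One way to check the walkthroughs above is to run candidate versions on a concrete value. The sketch below wraps versions C and D in functions; the names `swap_low_two_c` and `swap_low_two_d` are assumptions, and the bodies are copied from the exercise.

```c
/* Version C from the exercise, wrapped in a function (name is illustrative). */
unsigned swap_low_two_c(unsigned x) {
    return (x & (~3u)) | ((x & 1) << 1) | ((x >> 1) & 1);
}

/* Version D from the exercise, wrapped in a function (name is illustrative). */
unsigned swap_low_two_d(unsigned x) {
    return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
}
```

For x = 0x2D (binary 101101), swapping the low two bits should give 0x2E (binary 101110). Version C produces that value, while version D produces 0x2F, consistent with its low bits being XORs rather than a swap.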

## ISAs being manufactured today

(ISA = instruction set architecture)
x86 - dominant in desktops, servers
ARM - dominant in mobile devices
POWER - Wii U, IBM supercomputers and some servers
MIPS - common in consumer wifi access points
SPARC - some Oracle servers, Fujitsu supercomputers
z/Architecture - IBM mainframes
Z80 - TI calculators
SHARC - some digital signal processors
RISC V - some embedded

## microarchitecture v. instruction set

microarchitecture - design of the hardware
"generations" of Intel's x86 chips
different microarchitectures for very low-power versus laptop/desktop
changes in performance/efficiency
instruction set - interface visible by software
what matters for software compatibility
many ways to implement (but some might be easier)

## exercise

which of the following changes to a processor are instruction set changes?
A. increasing the number of registers available in assembly
B. decreasing the runtime of the add instruction
C. making the machine code for add instructions shorter
D. removing a multiply instruction
E. allowing the add instruction to have two memory operands (instead of two register operands)

## instruction set architecture goals

exercise: what are some goals to have when designing an instruction set?

## ISA variation

| instruction set | instr. length | # normal registers | approx. # instrs. |
| :--- | :--- | :--- | :--- |
| x86-64 | 1-15 byte | 16 | 1500 |
| Y86-64 | 1-10 byte | 15 | 18 |
| ARMv7 | 4 byte* | 16 | 400 |
| POWER8 | 4 byte | 32 | 1400 |
| MIPS32 | 4 byte | 31 | 200 |
| Itanium | 41 bits* | 128 | 300 |
| Z80 | 1-4 byte | 7 | 40 |
| VAX | 1-14 byte | 8 | 150 |
| z/Architecture | 2-6 byte | 16 | 1000 |
| RISC V | 4 byte* | 31 | 500* |

## other choices: condition codes?

instead of:
cmpq %r11, %r12
je somewhere
could do:
/* Branch if EQual */
beq %r11, %r12, somewhere

## other choices: addressing modes

ways of specifying operands. examples:
x86-64: 10(%r11,%r12,4)
ARM: %r11 << 3 (shift register value by constant)
VAX: ((%r11)) (register value is pointer to pointer)
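To make the x86-64 example concrete: an operand written as displacement(base, index, scale) refers to memory at base + index * scale + displacement. The sketch below models only that address arithmetic in C; the function name `effective_address` and its parameter names are assumptions, and nothing here reads or writes memory.

```c
#include <stdint.h>

/* Address computed by an x86-64 operand of the form disp(base, index, scale).
   effective_address is an illustrative helper, not a real API. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp) {
    /* scale is 1, 2, 4 or 8 in real x86-64 encodings. */
    return base + index * scale + (uint64_t)disp;
}
```

For 10(%r11,%r12,4) with %r11 holding 0x1000 and %r12 holding 3, the address would be 0x1000 + 3 * 4 + 10 = 0x1016.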

## other choices: number of operands

add src1, src2, dest: ARM, POWER, MIPS, SPARC, ...
add src2, src1=dest: x86, AVR, Z80, ...
VAX: both

## CISC and RISC

RISC - Reduced Instruction Set Computer
reduced from what?
CISC - Complex Instruction Set Computer

## some VAX instructions

MATCHC haystackPtr, haystackLen, needlePtr, needleLen
Find the position of the string in needle within haystack.

POLY x, coefficientsLen, coefficientsPtr
Evaluate the polynomial whose coefficients are pointed to by coefficientsPtr at the value x.

EDITPC sourceLen, sourcePtr, patternLen, patternPtr
Edit the string pointed to by sourcePtr using the pattern string specified by patternPtr.

## microcode

MATCHC haystackPtr, haystackLen, needlePtr, needleLen
Find the position of the string in needle within haystack.
loop in hardware???
typically: lookup sequence of microinstructions ("microcode")
secret simpler instruction set

## Why RISC?

complex instructions were usually not faster (even though programs with simple instructions were bigger)
complex instructions were harder to implement
compilers were replacing hand-written assembly
correct assumption: almost no one will write assembly anymore
incorrect assumption: okay to recompile frequently

## typical RISC ISA properties

fewer, simpler instructions
separate instructions to access memory
fixed-length instructions
more registers
no "loops" within single instructions
no instructions with two memory operands
few addressing modes

## ISAs: who does the work?

CISC-like (harder to make hardware, easier to use assembly)
choose instructions with particular assembly language in mind?
hardware designer provides operations compiler wants

RISC-like (easier to make hardware, harder to use assembly)
choose instructions with particular HW implementation in mind?
hardware designer exposes what it can do efficiently to compiler

## ISAs: who does the work?

CISC-like
less work for assembly-writers
more work for hardware
choose assembly, design instructions?
harder to build/test CPU
design new instrs for target apps?

RISC-like
more work for assembly-writers
less work for hardware
design for particular kind of HW?
easier to build/test CPU
spend more time optimizing HW?

## backup slides

## registers



## state in Y86-64



## memories

address → Instr. Mem. → data

## register file


register number input register value output

## ALUs



## instruction memory in HCL

built-in component
always present, with predefined wires
input wire (address): pc
64-bit value - address to read from
output wire (data): i10bytes
80-bits (size of largest instruction)
little-endian number
generally, can look up these names in the HCLRS README (course website)

## other choices: instruction complexity

instructions that write multiple values?
x86-64: push, pop, movsb, ...
more?

