## bitwise 2 / ISAs

## last time

## right shift

logical (unsigned): shift in zeroes
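arithmetic (signed): shift in zeroes or ones depending on whether the value is negative
equivalent to division by a power of two, but with different rounding (rounds toward negative infinity instead of toward zero)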
left shift $\approx$ multiplication by power of two
bitwise operators and masks
mask "selects" certain bits
bitwise and: certain bits to keep (1) / clear (0)
bitwise or: certain bits to set (1) / leave unchanged (0)
bitwise xor: certain bits to flip (1) / leave unchanged (0)
bitwise divide and conquer
ternary operator example: x ? y : z
syntax for if (x) y else z in C/C++/Java
result = result-or-zero | result-or-zero trick
(9am) -1 -> 1111...1111 trick

## note: ternary operator

$$
\begin{aligned}
& w = (x \; ? \; y : z) \\
& \text{if (x) \{ w = y; \} else \{ w = z; \}}
\end{aligned}
$$

## ternary as bitwise: simplifying

(x ? y : z) means: if (x) return y; else return z;
task: turn into operators that are not if/else/etc.
(assembly: no jumps probably)
strategy today: build a solution from simpler subproblems
(1) with x, y, z 1 bit: (x ? y : 0) and (x ? 0 : z)
(2) with x, y, z 1 bit: (x ? y : z)
(3) with x 1 bit: (x ? y : z)
(4) (x ? y : z)

## one-bit ternary

(x ? y : z) = if (x) y else z
constraint: x, y, and z are 0 or 1
now: reimplement in C without if/else/||/etc.
(assembly: no jumps probably)
divide-and-conquer:
(x ? y : 0)
(x ? 0 : z)

## one-bit ternary parts (1)

constraint: $x, y$, and $z$ are 0 or 1

$$
\begin{aligned}
& \text { (x ? y : 0) } \\
& \begin{array}{l|ll} 
& \mathbf{y}=\mathbf{0} & \mathbf{y = 1} \\
\hline \mathbf{x}=\mathbf{0} & 0 & 0 \\
\mathbf{x}=\mathbf{1} & 0 & 1
\end{array} \\
& \rightarrow(x \& y)
\end{aligned}
$$

## one-bit ternary parts (2)

$$
(x \quad ? y \quad: 0)=(x \& y)
$$

(x ? 0 : z)
opposite x: ~x
((~x) & z)

## one-bit ternary

constraint: x, y, and z are 0 or 1

$$
\begin{aligned}
& (x \; ? \; y : z) = \text{if } x \text{ then } y \text{ else } z \\
& (x \; ? \; y : 0) \mid (x \; ? \; 0 : z) \\
& (x \& y) \mid ((\sim x) \& z)
\end{aligned}
$$

## one-bit ternary: evaluating example (1)

constraint: $x, y$, and $z$ are 0 or 1

$$
\begin{aligned}
& (x \quad ? y: z)=\text { if } x \text { then } y \text { else } z \\
& (x \& y) \mid((\sim x) \& z) \\
& x=1, y=0, z=1 \\
& (1 \& 0) \mid((\sim 1) \& 1)= \\
& (1 \& 0) \mid(11 \ldots 1110 \& 00 \ldots 0001)=0
\end{aligned}
$$
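A minimal C sketch of the one-bit construction, assuming x, y, and z are each 0 or 1. The function names `ternary_1bit` and `check_ternary_1bit` are made up for illustration; the check compares the bitwise formula against the real ternary operator on all eight one-bit inputs.

```c
#include <assert.h>

/* (x ? y : z) for one-bit inputs, built only from &, | and ~.
   Assumption: x, y, z are each 0 or 1 (hypothetical helper name). */
unsigned ternary_1bit(unsigned x, unsigned y, unsigned z) {
    return (x & y) | ((~x) & z);
}

/* Compare against the real ternary operator on every one-bit input. */
void check_ternary_1bit(void) {
    for (unsigned x = 0; x <= 1; x++)
        for (unsigned y = 0; y <= 1; y++)
            for (unsigned z = 0; z <= 1; z++)
                assert(ternary_1bit(x, y, z) == (x ? y : z));
}
```

Calling `ternary_1bit(1, 0, 1)` reproduces the worked example and gives 0: the high bits set by `~x` are cleared again because z has only its lowest bit.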

## one-bit ternary: not general yet

if (x) y else z
constraint: x, y, and z are 0 or 1
DOES NOT WORK: x = 1, y = 4, z = 2
(1 & 4) | ((~1) & 2) =
(00...0001 & 00...0100) | (11...1110 & 00...0010) =
(0) | (00...0010) = 2 (expected y, which is 4)

## multibit ternary

constraint: x is 0 or 1
old solution ((x & y) | ((~x) & z)) only gets the least significant bit right

$$
\begin{aligned}
& (x \; ? \; y : z) \quad (\text{if } (x) \; y \text{ else } z) \\
& (x \; ? \; y : 0) \mid (x \; ? \; 0 : z)
\end{aligned}
$$

## constructing masks

constraint: x is 0 or 1

```
(x ? y : 0) (if (x) y else 0)
turn into y & MASK, where MASK = ???
    "keep certain bits"
if x = 1: want 1111111111...1 (keep y)
if x = 0: want 0000000000...0 (want 0)
```

a trick: MASK = $-x$ (in two's complement, $-1$ is 1111...1 and $-0$ is 0000...0)

## constructing other masks

constraint: x is 0 or 1

(x ? 0 : z) (if (x) 0 else z)
if x = 0: want 1111111111...1
if x = 1: want 0000000000...0
mask: -(x ^ 1) (not -x, which gives the opposite mask)

## multibit ternary

constraint: x is 0 or 1
old solution ((x & y) | ((~x) & z)) only gets the least significant bit right
(x ? y : z) (if (x) y else z)
(x ? y : 0) | (x ? 0 : z)
((-x) & y) | ((-(x ^ 1)) & z)
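The difference between the one-bit formula and the mask-based one can be sketched in C as follows. This assumes x is 0 or 1 while y and z are arbitrary; the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* One-bit formula: only correct when y and z are also 0 or 1. */
unsigned ternary_old(unsigned x, unsigned y, unsigned z) {
    return (x & y) | ((~x) & z);
}

/* Mask formula. Assumption: x is 0 or 1.
   -x       is all ones when x == 1 and all zeroes when x == 0;
   -(x ^ 1) is the opposite mask. */
unsigned ternary_masked(unsigned x, unsigned y, unsigned z) {
    return ((-x) & y) | ((-(x ^ 1)) & z);
}

void check_ternary_masked(void) {
    assert(ternary_old(1, 4, 2) == 2);     /* wrong: y == 4 was expected */
    assert(ternary_masked(1, 4, 2) == 4);  /* x == 1 selects y */
    assert(ternary_masked(0, 4, 2) == 2);  /* x == 0 selects z */
}
```

Negating an unsigned value is well defined in C (it wraps modulo 2^N), which is why the sketch uses `unsigned` rather than `int`.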

## fully multibit

constraint dropped: x may now be any value, not just 0 or 1
(x ? y : z)
easy C way: !x is 1 (if x is 0) or 0; !(!x) is 0 (if x is 0) or 1
x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)
(x ? y : 0) | (x ? 0 : z)
((-!!x) & y) | ((-!x) & z)
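A small C sketch of the fully general version, where x can be any value. The name `ternary_any` is hypothetical; the body is the formula from the slide, with `unsigned` operands so that the `-1` produced by negating `!!x` or `!x` becomes an all-ones mask.

```c
#include <assert.h>

/* (x ? y : z) for arbitrary x, y, z without if/else or ?:.
   !!x is 1 when x is nonzero, !x is 1 when x is zero;
   negating either 1 gives the all-ones mask. */
unsigned ternary_any(unsigned x, unsigned y, unsigned z) {
    return ((-!!x) & y) | ((-!x) & z);
}

void check_ternary_any(void) {
    assert(ternary_any(5, 4, 2) == 4);            /* nonzero x selects y */
    assert(ternary_any(0, 4, 2) == 2);            /* zero x selects z */
    assert(ternary_any(0x80000000u, 7, 9) == 7);  /* only the top bit set */
}
```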

## problem: any-bit

is any bit of $x$ set?
goal: turn 0 into 0, nonzero into 1
easy C solution: !(!(x))
another solution if you have - or + (bang in lab)
what if we don't have ! or - or +?
(closer to the components real hardware has to work with)
how do we solve this if x is, say, four bits?

$$
((x \& 1)|((x \gg 1) \& 1)|((x \gg 2) \& 1) \mid((x \gg 3) \& 1))
$$

## wasted work (1)

$((x \& 1)|((x \gg 1) \& 1)|((x \gg 2) \& 1) \mid((x \gg 3) \& 1))$
in general: $(x \& 1) \mid(y \& 1)==(x \mid y) \& 1$
distributive property
$(x|(x \gg 1)|(x \gg 2) \mid(x \gg 3)) \& 1$
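As a sanity check of the factoring step, the sketch below writes both four-bit forms in C and compares them on all sixteen inputs. The function names are hypothetical, and x is assumed to hold a four-bit value.

```c
#include <assert.h>

/* Mask each bit first, then OR the four one-bit results. */
unsigned any4_separate(unsigned x) {
    return (x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1);
}

/* OR the shifted values first, then mask once (distributive property). */
unsigned any4_factored(unsigned x) {
    return (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1;
}

void check_any4(void) {
    for (unsigned x = 0; x < 16; x++) {
        assert(any4_separate(x) == any4_factored(x));
        assert(any4_factored(x) == (unsigned)(x != 0));
    }
}
```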

## wasted work (2)

4-bit any set: $(x|(x \gg 1)|(x \gg 2) \mid(x \gg 3)) \& 1$
performing 3 bitwise ors
...each bitwise or does 4 OR operations
but we only use the result of one of the 4!


## any-bit: looking at wasted work



$$
\left(0 \mid x_{3}\right) \quad\left(x_{3} \mid x_{2}\right) \quad\left(x_{2} \mid x_{1}\right) \quad\left(x_{1} \mid x_{0}\right) \quad \mathrm{y}=(\mathbf{x} \mid \mathbf{x} \gg 1)
$$

final value wanted: $x_{3} \mid x_{2} \mid x_{1} \mid x_{0}$

previously:

$$
\begin{aligned}
& \text { compute } \mathrm{x} \mid(\mathrm{x} \gg 1) \text { for } x_{1} \mid x_{0} \text {; } \\
& (\mathrm{x} \gg 2) \mid(\mathrm{x} \gg 3) \text { for } x_{3} \mid x_{2}
\end{aligned}
$$

observation: got both parts with just $x \mid(x \gg 1)$

## any-bit: divide and conquer

four-bit input $x=x_{3} x_{2} x_{1} x_{0}$

$\mathbf{x} \mid(\mathbf{x} \gg 1)=\left(x_{3} \mid 0\right)\left(x_{2} \mid x_{3}\right)\left(x_{1} \mid x_{2}\right)\left(x_{0} \mid x_{1}\right)=y_{1} y_{2} y_{3} y_{4}$
$\mathrm{y} \mid(\mathrm{y} \gg 2)=\left(y_{1} \mid 0\right)\left(y_{2} \mid 0\right)\left(y_{3} \mid y_{1}\right)\left(y_{4} \mid y_{2}\right)=z_{1} z_{2} z_{3} z_{4}$
$z_{4}=\left(y_{4} \mid y_{2}\right)=\left(\left(x_{2} \mid x_{3}\right) \mid\left(x_{0} \mid x_{1}\right)\right)=x_{0} \mid x_{1} \mid x_{2} \mid x_{3}$ "is any bit set?"

    unsigned int any_of_four(unsigned int x) {
        unsigned int part_bits = (x >> 1) | x;
        return ((part_bits >> 2) | part_bits) & 1;
    }
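The two-step version can be traced in C. This sketch assumes a four-bit input and uses a hypothetical `check_any_of_four` helper to follow x = 1000 through both steps and then test all sixteen inputs.

```c
#include <assert.h>

/* Divide and conquer: two OR steps instead of three. */
unsigned int any_of_four(unsigned int x) {
    unsigned int part_bits = (x >> 1) | x;      /* low bit is now x1 | x0, bit 2 is x3 | x2 */
    return ((part_bits >> 2) | part_bits) & 1;  /* low bit is now x3 | x2 | x1 | x0 */
}

void check_any_of_four(void) {
    /* x = 1000: y = 0100 | 1000 = 1100, then z = 0011 | 1100 = 1111 */
    assert(((0x8u >> 1) | 0x8u) == 0xCu);
    assert(((0xCu >> 2) | 0xCu) == 0xFu);
    for (unsigned int x = 0; x < 16; x++)
        assert(any_of_four(x) == (unsigned int)(x != 0));
}
```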

## any-bit-set: 32 bits

    unsigned int any(unsigned int x) {
        x = (x >> 1) | x;
        x = (x >> 2) | x;
        x = (x >> 4) | x;
        x = (x >> 8) | x;
        x = (x >> 16) | x;
        return x & 1;
    }
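A hedged C sketch of the 32-bit version with a small check. It uses `uint32_t` to make the 32-bit assumption explicit (the slide's `unsigned int` is assumed to be 32 bits wide), and the names `any32` and `check_any32` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* After each step, the low bit is the OR of twice as many input bits:
   2, 4, 8, 16, and finally all 32. */
uint32_t any32(uint32_t x) {
    x = (x >> 1) | x;
    x = (x >> 2) | x;
    x = (x >> 4) | x;
    x = (x >> 8) | x;
    x = (x >> 16) | x;
    return x & 1;
}

void check_any32(void) {
    assert(any32(0) == 0);
    for (int bit = 0; bit < 32; bit++)          /* every single-bit input */
        assert(any32((uint32_t)1 << bit) == 1);
    assert(any32(0xFFFFFFFFu) == 1);
}
```

Five shift-and-OR steps cover 32 bits because each step doubles the number of input bits folded into the low bit.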

## bitwise strategies

use paper, find subproblems, etc.
mask and shift

$$
(x \& \texttt{0xF0}) \gg 4
$$

factor/distribute

$$
(x \& 1) \mid(y \& 1)==(x \mid y) \& 1
$$

divide and conquer
common subexpression elimination

$$
\begin{aligned}
& \text{return } ((-!!x) \& y) \mid ((-!x) \& z); \\
& \text{becomes} \\
& d = !x; \ \text{return } ((-!d) \& y) \mid ((-d) \& z);
\end{aligned}
$$
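Two of these strategies are easy to try out in C. The sketch below is illustrative only, with hypothetical function names: it shows mask-and-shift extracting the upper four bits of the low byte, and checks that the version with the shared subexpression `d` returns the same results as the direct formula.

```c
#include <assert.h>

/* Mask and shift: keep bits 4-7, then move them down to bits 0-3. */
unsigned upper_nibble_of_low_byte(unsigned x) {
    return (x & 0xF0) >> 4;
}

/* Direct formula: !x is computed twice (once inside !!x). */
unsigned ternary_direct(unsigned x, unsigned y, unsigned z) {
    return ((-!!x) & y) | ((-!x) & z);
}

/* Common subexpression elimination: compute d = !x once and reuse it. */
unsigned ternary_shared(unsigned x, unsigned y, unsigned z) {
    unsigned d = !x;
    return ((-!d) & y) | ((-d) & z);
}

void check_strategies(void) {
    assert(upper_nibble_of_low_byte(0xAB) == 0xA);
    for (unsigned x = 0; x < 4; x++)
        assert(ternary_direct(x, 12, 34) == ternary_shared(x, 12, 34));
    assert(ternary_shared(7, 12, 34) == 12);
    assert(ternary_shared(0, 12, 34) == 34);
}
```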

## exercise

Which of these will swap least significant and second least significant bit of an unsigned int $x$ ? (bits uvwxyz become uvwxzy)

```
/* version A */
    return ((x >> 1) & 1) | (x & (~1));
/* version B */
    return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
/* version C */
    return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
/* version D */
    return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
```


## version A

```
/* version A */
return ((x >> 1) & 1) | (x & (~1));
//     ^^^^^^^^^^^^^^
//     uvwxyz --> 0uvwxy --> 00000y
//                      ^^^^^^^^^^
//                      uvwxyz --> uvwxy0
//     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
//     00000y | uvwxy0 = uvwxyy
```


## version B

```
/* version B */
return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
//                      ^^^^^^^^^^^^^^^^^
//                      uvwxyz --> vwxyz0 --> vwxy00
//                                          ^^^^^^^^^^
//                                          uvwxyz --> uvwx00
```


## version C

```
/* version C */
return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
//     ^^^^^^^^^^
//     uvwxyz --> uvwx00
//                  ^^^^^^^^^^^^^^
//                  uvwxyz --> 00000z --> 0000z0
//                                   ^^^^^^^^^^^^^^
//                                   uvwxyz --> 0uvwxy --> 00000y
```


## version D

```
/* version D */
return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
//      ^^^^^^^^^^^^^^
//      uvwxyz --> 00000z --> 0000z0
//                       ^^^^^^^^^^^^^^
//                       uvwxyz --> 0000yz --> 00000y
//     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//     0000zy ^ uvwxyz --> uvwx(z XOR y)(y XOR z)
```


## expanded code

```
int lastBit = x & 1;
int secondToLastBit = x & 2;
int rest = x & ~3;
int lastBitInPlace = lastBit << 1;
int secondToLastBitInPlace = secondToLastBit >> 1;
return rest | lastBitInPlace | secondToLastBitInPlace;
```
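The four candidate versions can also be checked mechanically. The sketch below wraps each one in a C function (the wrapper names and the `swap_low_two` reference are hypothetical) and compares them against a straightforward swap for all six-bit inputs.

```c
#include <assert.h>

unsigned version_a(unsigned x) { return ((x >> 1) & 1) | (x & (~1)); }
unsigned version_b(unsigned x) { return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3)); }
unsigned version_c(unsigned x) { return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1); }
unsigned version_d(unsigned x) { return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x; }

/* Reference: pull out the two low bits and put each in the other's place. */
unsigned swap_low_two(unsigned x) {
    unsigned rest = x & ~3u;
    unsigned last = x & 1;
    unsigned second_to_last = (x >> 1) & 1;
    return rest | (last << 1) | second_to_last;
}

void check_swap_versions(void) {
    for (unsigned x = 0; x < 64; x++)
        assert(version_c(x) == swap_low_two(x));  /* C agrees on every input */
    /* x = ...01 should become ...10 (that is, 2); the others get it wrong */
    assert(swap_low_two(1) == 2);
    assert(version_a(1) == 0);
    assert(version_b(1) == 0);
    assert(version_d(1) == 3);
}
```

The single input x = 1 is enough to rule out versions A, B, and D; only version C matches the reference.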


## exercise

Which of these are true only if $x$ has all of bits 0, 3, 6, and 9 set (where bit 0 = least significant bit)?

```
/* version A */
    x = (x >> 6) & x;
    x = (x >> 3) & x;
    return x & 1;
/* version B */
    return ((x >> 9) & 1) & ((x >> 6) & 1) & ((x >> 3) & 1) &
/* version C */
    return (x & 0x100) & (x & 0x40) & (x & 0x04) & (x & 0x01);
/* version D */
    return (x & 0x145) == 0x145;
```

## ISAs being manufactured today

(ISA = instruction set architecture)
x86 - dominant in desktops, servers
ARM - dominant in mobile devices
POWER - Wii U, IBM supercomputers and some servers
MIPS - common in consumer wifi access points
SPARC - some Oracle servers, Fujitsu supercomputers
z/Architecture - IBM mainframes
Z80 - TI calculators
SHARC - some digital signal processors
RISC-V - some embedded

## microarchitecture v. instruction set

microarchitecture - design of the hardware
"generations" of Intel's x86 chips
different microarchitectures for very low-power versus laptop/desktop
changes in performance/efficiency

instruction set - interface visible to software
what matters for software compatibility
many ways to implement (but some might be easier)

## exercise

which of the following changes to a processor are instruction set changes?
A. increasing the number of registers available in assembly
B. decreasing the runtime of the add instruction
C. making the machine code for add instructions shorter
D. removing a multiply instruction
E. allowing the add instruction to have two memory operands (instead of two register operands)

## ISA "extensions"

I've been saying x86-64, ARM is an ISA
but there have been new instructions
(that weren't supported by original x86-64 or ARM processors)
really a bunch of variants of x86-64 (or ARM or ...), each of which is a different ISA
primary purpose of new processor designs usually to make non-ISA changes

ISA extensions won't improve performance of existing compiled code

## instruction set architecture goals

exercise: what are some goals to have when designing an instruction set?

## ISA variation

| instruction set | instr. length | # normal registers | approx. # instrs. |
| :--- | :--- | :--- | :--- |
| x86-64 | 1-15 byte | 16 | 1500 |
| Y86-64 | 1-10 byte | 15 | 18 |
| ARMv7 | 4 byte* | 16 | 400 |
| POWER8 | 4 byte | 32 | 1400 |
| MIPS32 | 4 byte | 31 | 200 |
| Itanium | 41 bits* | 128 | 300 |
| Z80 | 1-4 byte | 7 | 40 |
| VAX | 1-14 byte | 8 | 150 |
| z/Architecture | 2-6 byte | 16 | 1000 |
| RISC-V | 4 byte* | 31 | 500* |

## other choices: condition codes?

instead of:
cmpq \%r11, \%r12
je somewhere
could do:
/* _B_ranch if _EQ_ual */
beq \%r11, \%r12, somewhere

## other choices: addressing modes

ways of specifying operands. examples:
x86-64: 10(\%r11,\%r12,4) (address = \%r11 + \%r12 × 4 + 10)
ARM: \%r11 << 3 (shift register value by constant)
VAX: ((\%r11)) (register value is pointer to pointer)
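To make the three operand forms concrete, here is a rough C model of what each one computes. The function names are hypothetical and registers are modeled as plain 64-bit values or pointers; this is a sketch of the arithmetic, not of any assembler's actual behavior.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 style disp(base, index, scale): base + index * scale + disp */
uint64_t x86_effective_address(uint64_t base, uint64_t index,
                               uint64_t scale, uint64_t disp) {
    return base + index * scale + disp;
}

/* ARM-style shifted register operand: register value shifted by a constant */
uint64_t arm_shifted_operand(uint64_t reg, unsigned shift) {
    return reg << shift;
}

/* VAX-style double indirection: the register holds a pointer to a pointer */
long vax_double_deref(long **reg) {
    return **reg;
}

void check_addressing(void) {
    long value = 42;
    long *pointer = &value;
    assert(x86_effective_address(0x1000, 3, 4, 10) == 0x1016);  /* 0x1000 + 12 + 10 */
    assert(arm_shifted_operand(5, 3) == 40);                    /* 5 << 3 */
    assert(vax_double_deref(&pointer) == 42);
}
```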

## other choices: number of operands

add src1, src2, dest ARM, POWER, MIPS, SPARC, ...
add src2, src1=dest x86, AVR, Z80, ...

VAX: both

## CISC and RISC

RISC - Reduced Instruction Set Computer
reduced from what?

CISC - Complex Instruction Set Computer

## some VAX instructions

MATCHC haystackPtr, haystackLen, needlePtr, needleLen
Find the position of the string in needle within haystack.

POLY $x$, coefficientsLen, coefficientsPtr
Evaluate the polynomial whose coefficients are pointed to by coefficientsPtr at the value $x$.

EDITPC sourceLen, sourcePtr, patternLen, patternPtr
Edit the string pointed to by sourcePtr using the pattern string specified by patternPtr.

## microcode

MATCHC haystackPtr, haystackLen, needlePtr, needleLen
Find the position of the string in needle within haystack.
loop in hardware???
typically: lookup sequence of microinstructions ("microcode")
secret simpler instruction set

## Why RISC?

complex instructions were usually not faster
(even though programs with simple instructions were bigger)
complex instructions were harder to implement
compilers were replacing hand-written assembly
correct assumption: almost no one will write assembly anymore
incorrect assumption: okay to recompile frequently

## typical RISC ISA properties

fewer, simpler instructions
separate instructions to access memory
fixed-length instructions
more registers
no "loops" within single instructions
no instructions with two memory operands
few addressing modes

## is CISC the winner?

well, can't get rid of x86 features
backwards compatibility matters
more application-specific instructions
but...compilers tend to use more RISC-like subset of instructions
modern x86: often convert to RISC-like "microinstructions"
sounds really expensive, but ...
lots of instruction preprocessing used in 'fast' CPU designs (even for RISC ISAs)

## ISAs: who does the work?

CISC-like (harder to make hardware, easier to use assembly)
choose instructions with particular assembly language in mind?
hardware designer provides operations assembly-writers want
let the hardware worry about optimizing it?

RISC-like (easier to make hardware, harder to use assembly)
choose instructions with particular HW implementation in mind?
hardware designer exposes things it can do efficiently to assembly-writers
building blocks for compiler to make efficient programs?

note: general differences - no firm RISC v. CISC line

## backup slides

## parallel operations

key observation: bitwise and, or, etc. do many things in parallel
can have single instruction do work of a loop
more than just bitwise operations:
e.g. "add four pairs of values together"
later: single-instruction, multiple data (SIMD)

## base-10 parallelism

compute $14+23$ and $13+99$ in parallel?

$$
\begin{array}{r}
000014000013 \\
+\; 000023000099 \\
\hline
000037000112
\end{array}
$$

$14+23=37$ and $13+99=112$ - one add!
apply same principle in binary?


## base-2 parallelism

compute $110_{\text{TWO}} + 011_{\text{TWO}}$ and $010_{\text{TWO}} + 101_{\text{TWO}}$ in parallel?

$$
\begin{array}{r}
000110000010 \\
+\; 000011000101 \\
\hline
001001000111
\end{array}
$$

(base 2)

$110_{\text{TWO}} + 011_{\text{TWO}} = 1001_{\text{TWO}}$; $010_{\text{TWO}} + 101_{\text{TWO}} = 111_{\text{TWO}}$
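The same packing idea can be tried in C. This sketch assumes a made-up layout with two six-bit fields in one word (the helper names are hypothetical) and performs both additions from the slide with a single `+`.

```c
#include <assert.h>

/* Hypothetical layout: `high` in bits 6-11, `low` in bits 0-5. */
unsigned pack2(unsigned high, unsigned low) { return (high << 6) | low; }
unsigned high_field(unsigned packed) { return (packed >> 6) & 0x3F; }
unsigned low_field(unsigned packed) { return packed & 0x3F; }

void check_packed_add(void) {
    /* 110 + 011 in the high field, 010 + 101 in the low field: one add */
    unsigned sum = pack2(6, 2) + pack2(3, 5);
    assert(sum == 0x247);           /* 001001 000111 in binary */
    assert(high_field(sum) == 9);   /* 1001 */
    assert(low_field(sum) == 7);    /* 111 */
}
```

This only works while each individual sum fits in its own field; if the low sum reached 64, its carry would spill into the high field and corrupt that result.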

## miscellaneous bit manipulation

common bit manipulation instructions are not in C:
rotate (x86: ror, rol) - like shift, but wrap around
first/last bit set (x86: bsf, bsr)
population count (some x86: popcnt) - number of bits set
byte swap (x86: bswap)
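Although C has no operators for these, they can be written with shifts and masks. The following is a portable sketch with hypothetical function names, assuming 32-bit values; whether a compiler turns any of these into the single x86 instruction is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left: bits shifted out of the top wrap around to the bottom. */
uint32_t rotate_left32(uint32_t x, unsigned n) {
    n &= 31;                                   /* keep shift counts in range */
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Population count: number of bits set, one bit per loop iteration. */
unsigned popcount32(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        count += x & 1;
        x >>= 1;
    }
    return count;
}

/* Byte swap: reverse the order of the four bytes. */
uint32_t byte_swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

void check_bit_helpers(void) {
    assert(rotate_left32(0x80000001u, 1) == 0x00000003u);
    assert(rotate_left32(0x12345678u, 0) == 0x12345678u);
    assert(popcount32(0xF0F0u) == 8);
    assert(byte_swap32(0x11223344u) == 0x44332211u);
}
```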

## simple operation performance

typical modern desktop processor:
bitwise and/or/xor, shift, add, subtract, compare - about 1 cycle
integer multiply - about 1-3 cycles
integer divide - about 10-150 cycles
(smaller/simpler/lower-power processors are different)
add/subtract/compare are more complicated in hardware!
but much more important for typical applications

## other choices: instruction complexity

instructions that write multiple values?
x86-64: push, pop, movsb, ...
more?

