Table of Contents

This is designed to be a short summary. For more, see also Bryan and O’Halleron (from CMU)’s 46-page chapter on writing assembly using AT&T syntax; Bloomfield (from UVA)’s 12-page chapter on assembly using Intel syntax and 6-page chapter on calling conventions; Intel’s 2,226-page decription of all instruction; and AMD’s multi-volume manual.

1 Addressing modes

Operands of most operations may be either a register, an immediate value, or the contents of memory. A memory address in general is made of an immediate, two registers, and a scale on one of the registers: imm + rA + rB*s where s is one of the four specific values 1, 2, 4, or 8.

2 Two syntaxes

For mostly historical reasons, x86-64 has two different syntaxes.

Feature	Intel syntax	AT&T syntax	Mnemonic
Register	`rsp`	`%rsp`	AT&T has a & in its name, and wants similar symbols elsewhere
Immediate	`23`	`$23`	AT&T has a & in its name, and wants similar symbols elsewhere
Reg+Imm Addr	`[rsp+23]`	`23(%rsp)`	Intel uses infix operators
R+R*4+Imm Addr	`[rsp+r8*4+23]`	`23(%rsp,%r8,4)`	Intel uses infix operators
`a += b`	`add rax,rbx`	`addq %rbx, %rax`	Intel starts with what that answer goes into; AT&T starts with the argument.

In general, AT&T syntax is more explicit: there are prefixes for types, operations have width suffixes, etc. Intel syntax, on the other hand, is more loose, and has to add things like QWORD PTR if the instructions operands do not make the width of a command obvious.

Intel syntax	AT&T syntax
`mov QWORD PTR [rdx+0x227],rax`	`mov %rax,0x227(%rdx)`

The width specifiers are

bits	historical name	Intel name	AT&T Suffix	register names
8	byte	BYTE	b	ah, al, r9b, …
16	word, as this was the native size of the 8086 processor	WORD	w	ax, r9w, …
32	double word	DWORD	l	eax, r9d, …
64	quad word	QWORD	q	rax, r9, …

The most popular *nix toolchains default to AT&T syntax. The most popular Windows toolchains default to Intel syntax.

3 Registers

The general-purpose program registers in x86-64 have somewhat ideosyncratic names:

8-bit	16-bit	32-bit	64-bit	calling	callee-save	notes
al, ah	ax	eax	rax	return		special meaning for multiply and divide instructions
cl, ch	cx	ecx	rcx	arg 4
dl, dh	dx	edx	rdx	arg 3		special meaning for multiply and divide instructions
bl, bh	bx	ebx	rbx		yes
spl	sp	esp	rsp		yes	stack pointer
bpl	bp	ebp	rbp		yes
sil	si	esi	rsi	arg 2
dil	di	edi	rdi	arg 1
r8b	r8w	r8d	r8	arg 5
r9b	r9w	r9d	r9	arg 6
r10b	r10w	r10d	r10
r11b	r11w	r11d	r11
r12b	r12w	r12d	r12		yes
r13b	r13w	r13d	r13		yes
r14b	r14w	r14d	r14		yes
r15b	r15w	r15d	r15		yes

The registers overlap in the low-order bits. Thus, if r15 is 0x0123456789abcedf then r15d is 0x89abcdef, r15w is 0xcdef, and r15b is 0xef. Some registers also have names for both the lowest-byte _l and next-higher-byte _h. Thus, if rax is 0x0123456789abcedf then eax is 0x89abcdef, ax is 0xcdef, al is 0xef, and ah is 0xcd.

In part because x86-64 has preserved backwards compatibility with many previous architectures all the way back to the 16-bit integer-only 8086, many newer operations have been placed in their own register bank with their own operations. As one of the larger examples, floating-point operations are not handled from the main registers. However, we’ll restrict ourselves to the registers above and the operations that work on them.

4 The most important instructions

x86-64 has thousands of instructions, but many of them are used only in fairly specialized cases. The following instructions are the most important to understand x86-64 code.

4.1 Move

The various mov instructions implement, in effect, the assignment operator =. Moves can be done between registers, memory, and immediates, with some limitations; as a rule of thumb, either the source or destination must be a register.

When moving from a smaller source to a larger destination, mov has two variants: movzx (zero-extend) fills in the extra high-order bits with zeros, and movsx (sign-extend) fills in the extra high-order bits with copies of the high-order bit of the source.

There is a special “swap” instruction xchg, though it is usually used to implement a no-op not a move.

There are special moves for moving between register banks (as, e.g., moves to and from the XMM registers, etc).

There are also conditional moves which only moves if the condition codes indicate the last compared value had a particular relationship to 0, although those are fairly uncommon in compiled code.

4.2 Jumps

Jumps move the pc to a new location. jmp does this unconditionally, and various other instructions do so conditionally. Conditions in x86-64 are based on the “condition codes”, a set of single-bit flags that store enough information to compare a value to 0. Condition codes are set by most ALU operations, as well as by the special cmp and test operations.

Because comparisons are done differently for signed and unsigned values, there are multiple versions of comparions:

je, jne: Jump if the compared values were equal (je) or not equal (jne) or the result of the last operations was equal to 0 (je) or not (jne)
ja, jae, jb, jbe: Jump if the first compared values was above/below the other, using unsigned comparisons.
jg, jge, jl, jle: Jump if the first compared values was greater/lesser that the other, using signed comparisons.

There are also conditional jumps that check just single bits of the conditions codes, one of moderate commonness being js which checks the sign bit.

4.3 Load Effective Address

One specific instruction, lea, is widely used. It looks like a memory-to-register move, but instead of loading the contents of memory at an address it loads the address itself.

Because addresses are computed by adding two registers and an immediate, with one address being multiplied by a small power-of-two constant, lea is commonly used to perform basic arithmetic. For example, code like a = 5*b + 20 can be written in AT&T syntax x86-64 as lea 20(rbx,rbx,4),%rax.

4.4 ALU operations

Most ALU operations are implemented in x86-64 as assignment operators in code.

Instruction	Is like
`add`	`+=`
`sub`	`-=`
`and`	`&=`
`or`	`\|=`
`xor`	`^=`
`shl`	`<<=`
`shr`	`>>=`, zero-extending
`sar`	`>>=`, sign-extending

These instructions also set the condition codes. Additionally, cmp sets the condition codes like sub, and test like and, but both without storing the result in a register.

Multiplication and division are implemented differently. The result of addition can be one bit larger than its largest operand, which makes += a relatively safe way to handle it; but multiplication can result in twice as many bits as the largest operand, and the circuitry that does division also does modulus as the same time, meaning both effectively have multiple registers of return value(s). There are several variants, but to get a feel two are described below:

imul X: multiplies rax by register X, storing the 128-bit result with the high-order 64-bits in rdx and the low-order in rax.
idiv X: divides a 128-bit numerator (high-order bits in rdx, low-order in rax) by register X, storing the quotient in rax and the remainder in rdx.

4.5 Push and pop

The behavior of push X can be described as

rsp -= 8
            memory[rsp] = X

The behavior of pop X can be described as

X = memory[rsp]
            rsp += 8

Note that some programs use only 32-bit and smaller values, and use a variant of push and pop that adjust esp by 4 instead of rsp by 8.

Push and pop are widely used in common function call protocols, for argument passing and register saving, as explained in Calling Conventions.

4.6 Call and return

call X means "push the address of the next instruction, then jmp X. ret means pop PC – an instruction not otherwise writeable using pop because PC is not a program register.

4.7 No operation

Compilers generate a surprising number of operations that do nothing. Called “no-ops” or “nops,” these are used to align certain instructions with multiple-of-8 addresses and other less-than-obvious optimizations.

Some no-ops may the specific no-op instruction nop, and others are encoded as meaningless instructions like xchg %eax,%eax

5 Calling Conventions

Although not intrinsically dictated by the ISA itself, it is common for ISAs to be accompanied by a recommended calling convention. This involves three primary components:

Argument passing

Invoking a function (with call) involves jumping to its code and storing where to return to. That code needs to know where to find it’s arguments.

x86-64’s most common calling convention¹ puts the arguments, in order, in rdi, rsi, rdx, rcx, r8, and r9. Remaining arguments, if any, are pushed onto the stack, last to first, before the call.

Return value passing

The code that invokes a function needs to know from where to retrieve it’s return value. x86-64’s most common calling conventions put the return value into rax.

Callee- and caller-save registers

In general, both the code that invokes a function and the code of the function itself will use all the program registers. This means that the old values of these registers must be saved and restored.

x86-64 calling conventions distinguish between callee-save and caller-save registers.

A caller-save register is one that the invoking code must assume the invoked code might have changed, thus necessitating saving it before the call if it contains meaningful data to the invoking code. It is also one that the invoked code can use without first saving and later restoring.

A callee-save register is one that the invoking code can assume the invoked code will not change, and thus the invoking code does not need to save before the call. It is also one that the invoked code cannot use unless it first saves its value and restores that saved value to the register before returning.

The most common way to save a register is to push its contents onto the stack using push (as, e.g., push %rax) or a similar rsp-based mov (as, e.g., mov %rax,-32(rsp)).

x86-64’s most common calling convention² identifies rsp, rbp, and r12 through r15 as callee-save registers and all others (rax, rbx, rcx, rdx, rsi, rdi, and r8 through r11) as caller-save.

Note that all of the above is merely convention. A program could violate all of these rules and still work fine, but it might have some difficulty interacting with other functions if it does. In that way it is similar to conventions about what side of the street to drive on or what color of traffic signal light means “stop”: the decision is fairly arbitrary, but if you make a different arbitrary decision than others do then things are not likely to go well for you.

6 The most common x86-64 instructions

For the curious, I counted how many times different instructions occurred in the 200,723,121 instructions comprising the programs in the /usr/bin directory of my installation of Manjaro Linux. The following table lists the most frequent, omitting those that use different register sets.

Frequency	instruction	meaning
72,239,722	`mov`	`=`
14,145,074	`lea`	“load effective address,” usually used for addition `lea rbp,[rip+0x20aa7e]` is equivalent to `rbp = rip + 0x20aa7e`
12,327,021	`call`	push the PC and jump to address
9,228,101	`add`	`+=`
8,346,941	`cmp`	set flags as if performing subtraction
7,897,873	`jmp`	unconditionally jump to new address
7,572,220	`test`	set flags as if performing `&`
7,539,235	`je`	jump if and only if flags indicate `== 0`
5,651,123	`pop`	pops a value off of the stack First reads from address in `rsp`, then increases `rsp` by the size of the value read.
5,555,926	`push`	pushes a value onto the stack First decreases `rsp` by the size of the value, then writes the value into memory at the address thereafter stored in `rsp`.
5,534,972	`xor`	`^=`
5,272,088	`jne`	jump if and only if flags indicate `!= 0`
5,216,558	`nop`	do nothing
4,902,189	`int3`	used to contact the operating system
2,696,014	`sub`	`-=`
2,366,491	`ret`	pop the PC from the stack
1,683,246	`movzx`	move, zero-extending (for assigning from smaller- to larger-sized register or memory region)
1,083,966	`and`	`&=`
863,217	`shl`	`<<=`
859,984	`movsx`	move, sign-extending (for assigning from smaller- to larger-sized register or memory region)
685,614	`jbe`	jump if and only if flags indicate `<= 0` using unsigned comparison (`b` indicates “below”)
662,053	`ja`	jump if and only if flags indicate `> 0` using unsigned comparison (`a` indicates “above”)
643,204	`or`	`\|=`
596,148	`shr`	`>>=`, zero-extending
493,925	`xchg`	swap the contents of the two arguments
444,418	`jle`	jump if and only if flags indicate `<= 0` using unsigned comparison (`l` indicates “less”)
403,570	`jb`	jump if and only if flags indicate `< 0` using unsigned comparison (`b` indicates “below”)
371,140	`jae`	jump if and only if flags indicate `>= 0` using unsigned comparison (`a` indicates “above”)
360,654	`jg`	jump if and only if flags indicate `> 0` using unsigned comparison (`g` indicates “greater”)
335,154	`sar`	`>>=`, sign-extending
320,519	`movabs`	(special move for large immediates)
307,118	`js`	jump if sign bit set
292,134	`imul`	integer multiply of 32-bit values. ???
259,750	`ud2`	unreachable code

Unlike other platforms, Microsoft toolchains use just 4 registers for arguments: rdx, rcx, r8, and r9↩︎
Unlike other platforms, Microsoft toolchains use rbx, rbp, rdi, rsi, rsp, r12, r13, r14, and r15 as callee-save and rax, rcx, rdx, r8, r9, r10 and r11 as caller-save↩︎