Contents: Registers | Memory and Addressing | Instructions | Calling Convention (System V AMD64)

This page is a 64-bit (x86-64) adaptation of the classic 32-bit x86 Assembly Guide (AT&T syntax / GNU as). It keeps the same teaching style and subset of instructions, but updates: register names and sizes, stack behavior, addressing, and the calling convention.

We assume the GNU assembler (gas) using the standard AT&T syntax on UNIX-like systems. Operand order is source, destination. Registers are prefixed with % and immediates with $.


Registers

In 64-bit mode, x86-64 provides sixteen general purpose registers, each 64 bits wide. Many have historical names (accumulator, counter, etc.), but today they are largely general-purpose. Two registers are still used by convention for the stack and stack frames: the stack pointer %rsp and (optionally) the base/frame pointer %rbp.

General purpose registers (64-bit)

%rax accumulator / return value
%rbx callee-saved general register
%rcx counter / shift count uses %cl
%rdx used with mul/div; also arg register
%rsi arg register (often “source index” historically)
%rdi arg register (often “destination index” historically)
%rbp frame pointer (optional)
%rsp stack pointer
%r8%r15 additional general registers

Most registers also have smaller “views” (sub-registers) used for 32-bit, 16-bit, or 8-bit operations.

64-bit 32-bit 16-bit 8-bit low 8-bit high
%rax %eax %ax %al %ah
%rbx %ebx %bx %bl %bh
%rcx %ecx %cx %cl %ch
%rdx %edx %dx %dl %dh
%rsi %esi %si %sil (none)
%rdi %edi %di %dil (none)
%rbp %ebp %bp %bpl (none)
%rsp %esp %sp %spl (none)
%r8 %r8d %r8w %r8b (none)
%r9 %r9d %r9w %r9b (none)
%r10 %r10d %r10w %r10b (none)
%r11 %r11d %r11w %r11b (none)
%r12 %r12d %r12w %r12b (none)
%r13 %r13d %r13w %r13b (none)
%r14 %r14d %r14w %r14b (none)
%r15 %r15d %r15w %r15b (none)

Important x86-64 rule: writing a 32-bit sub-register (e.g. %eax) zero-extends into the full 64-bit register (so writing %eax clears the upper 32 bits of %rax). This does not happen for 8-bit or 16-bit writes.


Memory and Addressing Modes

Declaring Static Data Regions

Static data regions (like global variables) are declared after the .data directive. In addition to .byte (1 byte), .short (2 bytes), and .long (4 bytes), x86-64 code often uses .quad (8 bytes).

Example declarations:

.data
var:
    .byte 64                 # 1 byte at var, value 64
    .byte 10                 # 1 byte at var + 1, value 10

x:
    .short 42                # 2 bytes at x, value 42

y:
    .long 30000              # 4 bytes at y, value 30000

z:
    .quad 0x1122334455667788 # 8 bytes at z

Arrays are contiguous memory cells. For 64-bit integer arrays, prefer .quad. For byte arrays, strings and .zero are still useful.

arr32:
    .long 1, 2, 3        # 3 x 4 bytes, so arr32 + 8 is 3
arr64:
    .quad 1, 2, 3        # 3 x 8 bytes, so arr64 + 16 is 3
barr:
    .zero 10             # 10 zero bytes at barr
str:
    .string "hello"      # bytes for hello followed by NUL

Addressing Memory

In 64-bit mode, pointers and addresses are 64-bit quantities (though current CPUs/OSes often use fewer than 64 bits of virtual address space). Labels are replaced by the assembler/linker with an address.

As in 32-bit x86, memory addresses can be computed using: a base register + an index register × scale + displacement. In AT&T syntax, this is written: disp(base, index, scale) where scale ∈ {1,2,4,8}.

New in x86-64: code commonly uses RIP-relative addressing for globals, written symbol(%rip). This is the standard position-independent form.

Examples using mov:

movq (%rbx), %rax        # load 8 bytes from address in RBX into RAX
movq %rbx, var(%rip)     # store RBX into global variable var
movl -4(%rsi), %eax      # load 4 bytes from (RSI-4) into EAX (zero-extends into RAX)
movb %cl, (%rsi,%rax,1)  # store 1 byte (CL) to address RSI+RAX
movq (%rsi,%rbx,4), %rdx # load 8 bytes from address RSI + 4*RBX into RDX

Some invalid address calculations (same restrictions as 32-bit):

movq (%rbx,%rcx,-1), %rax      # invalid: scale must be 1,2,4, or 8 (not -1)
movq %rbx, (%rax,%rsi,%rdi,1)  # invalid: at most 2 registers in the address computation

Operation Suffixes

The instruction suffix indicates operand size: b=1 byte, w=2 bytes, l=4 bytes, q=8 bytes.

Sometimes the operand size is ambiguous, e.g. mov $2, (%rbx). In such cases, use an explicit suffix:

movb $2, (%rbx)  # store 1 byte
movw $2, (%rbx)  # store 2 bytes
movl $2, (%rbx)  # store 4 bytes
movq $2, (%rbx)  # store 8 bytes

Instructions

Machine instructions fall into three broad categories: data movement, arithmetic/logic, and control-flow. This is not exhaustive; it is a useful subset.

Notation used below:

<reg64> Any 64-bit register (%rax, %rbx, …, %r15)
<reg32> Any 32-bit register (%eax, %ebx, …)
<reg16> Any 16-bit register (%ax, %bx, …)
<reg8> Any 8-bit register (%al, %cl, %r8b, …)
<mem> A memory operand (e.g. (%rax), 8(%rbp), var(%rip), (%rax,%rbx,4))
<con64> Any 64-bit immediate constant
<con> Any immediate constant (size depends on instruction)

Immediate operands use a dollar sign prefix: $123, $0xABC, etc.

Data Movement Instructions

mov — Move

Copies data from the first operand to the second. Register-to-register is allowed; memory-to-memory is not (use a register as a temporary).

Syntax

mov{b,w,l,q} <reg>, <reg>
mov{b,w,l,q} <reg>, <mem>
mov{b,w,l,q} <mem>, <reg>
mov{b,w,l,q} <con>, <reg>
mov{b,w,l,q} <con>, <mem>

Examples

movq %rbx, %rax        # copy RBX into RAX
movb $5, var(%rip)     # store 5 into the byte at var
movl $0, %eax          # set EAX to 0 (also clears upper half of RAX)

push — Push on stack

Pushes an 8-byte value onto the stack: decrements %rsp by 8, then stores the value at (%rsp).

Syntax

pushq <reg64>
pushq <mem>
pushq <con>

Examples

pushq %rax
pushq var(%rip)

pop — Pop from stack

Pops an 8-byte value from the stack: loads from (%rsp), then increments %rsp by 8.

Syntax

popq <reg64>
popq <mem>

Examples

popq %rdi
popq (%rbx)

lea — Load effective address

Computes an address and places it in a register (does not load memory contents). Often used for pointer arithmetic and for RIP-relative addresses.

Syntax

leaq <mem>, <reg64>

Examples

leaq (%rbx,%rsi,8), %rdi   # RDI = RBX + 8*RSI
leaq var(%rip), %rax       # RAX = &var

Arithmetic and Logic Instructions

add — Integer addition

Adds the two operands, storing the result in the second operand. At most one operand may be memory.

Examples

addq $10, %rax       # RAX = RAX + 10
addb $10, (%rax)     # add 10 to the byte at address RAX

sub — Integer subtraction

Subtracts first operand from the second operand, storing the result in the second operand.

Examples

subq $216, %rax
sub  %ah, %al        # still valid for 8-bit sub-registers

inc, dec — Increment / Decrement

Increment or decrement by one.

Examples

decq %rax
incl var(%rip)       # add one to a 32-bit integer at var

imul — Integer multiplication

The two-operand form multiplies its operands and stores the result in the second operand (a register). A three-operand form exists with an immediate multiplier.

Syntax

imulq <reg64>, <reg64>
imulq <mem>, <reg64>
imulq <con>, <reg64>, <reg64>

Examples

imulq (%rbx), %rax       # RAX *= *(qword*)RBX
imulq $25, %rdi, %rsi    # RSI = RDI * 25

idiv — Signed integer division

Divides the signed 128-bit integer in %rdx:%rax (high:low) by the operand. Quotient is stored in %rax, remainder in %rdx.

Typically you prepare %rdx:%rax using cqto (sign-extend RAX into RDX).

Example

cqto
idivq %rbx      # (RDX:RAX) / RBX  -> quotient in RAX, remainder in RDX

and, or, xor — Bitwise logical operations

Perform the operation and store the result in the second operand (AT&T order still source, destination).

Examples

andq $0x0f, %rax     # clear all but the last 4 bits
xorq %rdx, %rdx      # set RDX to zero

not — Bitwise NOT

Example

notq %rax

neg — Two’s complement negation

Example

negq %rax

shl, shr, sar — Shifts

Shift count is an 8-bit immediate or %cl. For 64-bit operands, shift counts are effectively taken modulo 64, and the operand can be shifted up to 63 places.

Examples

shlq $1, %rax     # RAX *= 2 (if no overflow concern)
shrq %cl, %rbx    # RBX = floor(RBX / 2^CL) for unsigned values
sarq %cl, %rbx    # arithmetic right shift (sign-propagating)

Control Flow Instructions

The processor maintains an instruction pointer %rip, a 64-bit value pointing to the current instruction. It cannot be written directly, but is changed by control-flow instructions.

jmp — Jump

Unconditional jump to a label or indirect target.

Examples

jmp begin
jmp *%rax      # indirect jump to address in RAX

j<condition> — Conditional jump

Conditional branches based on flags set by a previous instruction (often cmp). Common conditions: je, jne, jg, jge, jl, jle.

cmpq %rbx, %rax
jle done

cmp — Compare

Like subtraction for flags, but discards the result.

Example

cmpb $10, (%rbx)
je loop

call, ret — Call and return

call pushes an 8-byte return address (next %rip) onto the stack and jumps to the target. ret pops that address and jumps back.


Calling Convention (System V AMD64)

In 32-bit x86, a common “C calling convention” passes parameters on the stack. In 64-bit UNIX-like systems, the standard is the System V AMD64 ABI, which passes the first arguments in registers. (Windows uses a different convention; see note at the end.)

Argument passing

The first six integer/pointer arguments are passed in registers:

arg1 %rdi
arg2 %rsi
arg3 %rdx
arg4 %rcx
arg5 %r8
arg6 %r9

Additional arguments (7 and beyond) are passed on the stack. Integer/pointer return values are placed in %rax.

Caller-saved vs callee-saved

Registers are divided into those the caller must assume can be clobbered (caller-saved), and those a callee must preserve if it uses them (callee-saved). A common summary:

Callee-saved %rbx %rbp %r12 %r13 %r14 %r15
Caller-saved %rax %rcx %rdx %rsi %rdi %r8 %r9 %r10 %r11

Stack alignment

Before executing a call, the stack pointer %rsp must be aligned to a 16-byte boundary. Because call pushes an 8-byte return address, a typical callee entry sees %rsp misaligned by 8 and fixes alignment in its prologue as needed.

Example: making a call (caller side)

Call myFunc(p1, 216, *p3) where: p1 is in %rax, and %rbx holds a pointer to the third argument value.

movq %rax, %rdi      # arg1 = p1
movq $216, %rsi      # arg2 = 216
movq (%rbx), %rdx    # arg3 = *p3
# ensure 16-byte stack alignment here if needed
call myFunc          # return value in %rax

Example: function definition (callee side)

A simple function that returns arg1 + (arg2 + arg3). This version uses a frame pointer (like the 32-bit guide) for clarity.

.text
.globl myFunc
.type myFunc, @function
myFunc:
    # Prologue
    pushq %rbp
    movq  %rsp, %rbp
    subq  $16, %rsp        # space for locals, keeps stack aligned

    # Body (args in %rdi, %rsi, %rdx)
    movq  %rdx, -8(%rbp)   # local = arg3
    addq  %rsi, -8(%rbp)   # local += arg2
    movq  %rdi, %rax       # rax = arg1
    addq  -8(%rbp), %rax   # rax += local

    # Epilogue
    leave
    ret

Windows note

If you are targeting Windows x64, the integer argument registers are RCX, RDX, R8, R9, and the caller must reserve 32 bytes of “shadow space” on the stack. The rest of this section assumes SysV AMD64.


Credits: Based on the structure of the classic x86 Assembly Guide (Ferrari/Batson/Lack/Jones/Evans), and later AT&T-syntax revisions. This page is a teaching-focused x86-64 adaptation.