Contents: Registers | Memory and Addressing | Instructions | Calling Convention (System V AMD64)

This page is a 64-bit (x86-64) adaptation of the classic 32-bit x86 Assembly Guide, rewritten for NASM in Intel syntax.

We assume x86-64 on UNIX-like systems (Linux/BSD/macOS) and focus on the instruction subset used in typical systems/CS courses.

NASM + Intel syntax basics

Compared to AT&T syntax:

  • Operand order is destination, source.
  • Registers are written without a % prefix (e.g., rax, not %rax).
  • Immediate constants are written without a $ prefix (e.g., 123, 0xABC).
  • Memory operands use brackets: [...] (e.g., [rbp-8]).
  • NASM comments use ;.

Sections and symbols in NASM

NASM uses section directives (not .text/.data):

  • section .text for code
  • section .data for initialized globals
  • section .bss for uninitialized globals

Symbols are exported/imported with:

  • global name
  • extern name

64-bit mode and RIP-relative addresses

For x86-64 NASM source, it’s common to include:

  • bits 64 — assemble for 64-bit mode
  • default rel — treat bare [symbol] memory references as RIP-relative where appropriate (very convenient for position-independent code)

Minimal skeleton:

bits 64
default rel

section .text
global myFunc

myFunc:
    ret

Registers

In 64-bit mode, x86-64 provides sixteen general purpose registers, each 64 bits wide. Two registers are still used by convention for stack management: the stack pointer rsp and (optionally) the base/frame pointer rbp.

General purpose registers (64-bit)

Meaning (by convention)
rax accumulator / return value
rbx callee-saved general register
rcx counter / shift count uses cl
rdx used with mul/div; also arg register
rsi arg register (often “source index” historically)
rdi arg register (often “destination index” historically)
rbp frame pointer (optional)
rsp stack pointer
r8r15 additional general registers

Most registers also have smaller “views” (sub-registers) used for 32-bit, 16-bit, or 8-bit operations.

64 32 16 8 low 8 high
rax eax ax al ah
rbx ebx bx bl bh
rcx ecx cx cl ch
rdx edx dx dl dh
rsi esi si sil (none)
rdi edi di dil (none)
rbp ebp bp bpl (none)
rsp esp sp spl (none)
r8 r8d r8w r8b (none)
r9 r9d r9w r9b (none)
r10 r10d r10w r10b (none)
r11 r11d r11w r11b (none)
r12 r12d r12w r12b (none)
r13 r13d r13w r13b (none)
r14 r14d r14w r14b (none)
r15 r15d r15w r15b (none)

Important x86-64 rule: writing a 32-bit sub-register (e.g., eax) zero-extends into the full 64-bit register (so writing eax clears the upper 32 bits of rax). This does not happen for 8-bit or 16-bit writes.


Memory and Addressing Modes

Declaring Static Data Regions

Static data regions (like global variables) typically live in section .data (initialized) or section .bss (uninitialized).

NASM uses:

  • db (1 byte)
  • dw (2 bytes)
  • dd (4 bytes)
  • dq (8 bytes)

Example declarations:

section .data

var: db 64
     db 10

x:   dw 42

y:   dd 30000

z:   dq 0x1122334455667788

Arrays are contiguous memory cells. For 64-bit integer arrays, use dq. For byte arrays and strings, use db. For large areas of zeros you can use times (initialized) or .bss (uninitialized).

section .data

arr32: dd 1, 2, 3            ; 3 x 4 bytes, so arr32 + 8 is 3
arr64: dq 1, 2, 3            ; 3 x 8 bytes, so arr64 + 16 is 3
barr:  times 10 db 0         ; 10 zero bytes
str:   db "hello", 0         ; bytes for hello followed by NUL

section .bss
buf:   resb 10                ; 10 uninitialized bytes

Addressing Memory

In 64-bit mode, pointers and addresses are 64-bit quantities. Labels are replaced by the assembler/linker with addresses.

Memory addresses are written in brackets using the form:

[ base + index*scale + displacement ]

where scale ∈ {1,2,4,8} and the index*scale part is optional.

RIP-relative globals in NASM

In x86-64, globals are commonly accessed using RIP-relative addressing.

With default rel, you can usually write:

  • mov rax, [var] (load)
  • mov [var], rbx (store)
  • lea rax, [var] (address of global)

Without default rel, you can write the explicit form:

  • mov rax, [rel var]
  • lea rax, [rel var]

Examples using mov:

mov rax, [rbx] Load 8 bytes from address in RBX into RAX.
mov [var], rbx Store RBX into global variable var (RIP-relative with default rel).
mov eax, dword [rsi-4] Load 4 bytes from (RSI-4) into EAX (zero-extends into RAX).
mov [rsi+rax], cl Store 1 byte (CL) to address RSI+RAX.
mov rdx, [rsi+rbx*4] Load 8 bytes from address RSI + 4*RBX into RDX.

Some invalid address calculations (same restrictions as 32-bit):

mov rax, [rbx + rcx*3] Scale must be 1,2,4, or 8 (not 3).
mov [rax + rsi + rdi], rbx At most 2 registers in the address computation.

Operand size specifiers

NASM usually infers operand size from registers (e.g., mov eax, [rsi] implies a 32-bit load). But when a memory operand’s size is ambiguous, specify it explicitly:

  • byte (1 byte)
  • word (2 bytes)
  • dword (4 bytes)
  • qword (8 bytes)

For example, storing 2 to memory is ambiguous without a size:

mov byte  [rbx], 2
mov word  [rbx], 2
mov dword [rbx], 2
mov qword [rbx], 2

Instructions

Machine instructions fall into three broad categories: data movement, arithmetic/logic, and control-flow. This is not exhaustive; it is a useful subset.

Notation used below:

<reg64> Any 64-bit register (rax, rbx, …, r15)
<reg32> Any 32-bit register (eax, ebx, …)
<reg16> Any 16-bit register (ax, bx, …)
<reg8> Any 8-bit register (al, cl, r8b, …)
<mem> A memory operand (e.g. [rax], [rbp+8], [var], [rax+rbx*4])
<imm> Any immediate constant (size depends on instruction/assembler)

Immediate constants are written without a prefix: 123, 0xABC, etc.

Data Movement Instructions

mov — Move

Copies data from the source operand into the destination operand. Register-to-register is allowed; direct memory-to-memory is not (use a register as a temporary).

Syntax
mov <reg>, <reg>
mov <reg>, <mem>
mov <mem>, <reg>
mov <reg>, <imm>
mov <mem>, <imm>

Examples
mov rax, rbx — copy RBX into RAX
mov byte [var], 5 — store 5 into the byte at var (RIP-relative with default rel)
mov eax, 0 — set EAX to 0 (also clears upper half of RAX)

push — Push on stack

Pushes an 8-byte value onto the stack: decrements rsp by 8, then stores the value at [rsp].

Syntax
push <reg64>
push <mem>
push <imm>

Examples
push rax
push qword [var]

pop — Pop from stack

Pops an 8-byte value from the stack: loads from [rsp], then increments rsp by 8.

Syntax
pop <reg64>
pop <mem>

Examples
pop rdi
pop qword [rbx]

lea — Load effective address

Computes an address and places it in a register (does not load memory contents). Often used for pointer arithmetic and for RIP-relative addresses.

Syntax
lea <reg64>, <mem>

Examples
lea rdi, [rbx + rsi*8] — RDI = RBX + 8*RSI
lea rax, [var] — RAX = &var (RIP-relative with default rel)

Arithmetic and Logic Instructions

add — Integer addition

Adds the two operands, storing the result in the destination operand. At most one operand may be memory.

Examples
add rax, 10 — RAX = RAX + 10
add byte [rax], 10 — add 10 to the byte at address RAX

sub — Integer subtraction

Subtracts the source operand from the destination operand, storing the result in the destination operand.

Examples
sub rax, 216
sub al, ah — still valid for 8-bit sub-registers

inc, dec — Increment / Decrement

Increment or decrement by one.

Examples
dec rax
inc dword [var] — add one to a 32-bit integer at var

imul — Integer multiplication

The two-operand form multiplies its operands and stores the result in the destination operand (a register). A three-operand form exists with an immediate multiplier.

Examples
imul rax, qword [rbx] — RAX *= (qword)RBX
imul rsi, rdi, 25 — RSI = RDI * 25

idiv — Signed integer division

Divides the signed 128-bit integer in rdx:rax (high:low) by the operand. Quotient is stored in rax, remainder in rdx.

Typically you prepare rdx:rax using cqo (sign-extend RAX into RDX).

Example

cqo
idiv rbx        ; (RDX:RAX) / RBX

and, or, xor — Bitwise logical operations

Perform the operation and store the result in the destination operand.

Examples
and rax, 0x0f — clear all but the last 4 bits
xor rdx, rdx — set RDX to zero

not — Bitwise NOT

Example
not rax

neg — Two’s complement negation

Example
neg rax

shl, shr, sar — Shifts

Shift count is an 8-bit immediate or cl. For 64-bit operands, shift counts are effectively taken modulo 64, and the operand can be shifted up to 63 places.

Examples
shl rax, 1 — RAX *= 2 (if no overflow concern)
shr rbx, cl — RBX = floor(RBX / 2^CL) for unsigned values
sar rbx, cl — arithmetic right shift (sign-propagating)

Control Flow Instructions

The processor maintains an instruction pointer rip, a 64-bit value pointing to the current instruction. It cannot be written directly, but is changed by control-flow instructions.

jmp — Jump

Unconditional jump to a label or indirect target.

Examples
jmp begin
jmp rax — indirect jump to address in RAX

j<condition> — Conditional jump

Conditional branches based on flags set by a previous instruction (often cmp). Common conditions: je, jne, jg, jge, jl, jle.

cmp rax, rbx
jle done

cmp — Compare

Like subtraction for flags, but discards the result.

Example

cmp byte [rbx], 10
je loop

call, ret — Call and return

call pushes an 8-byte return address (next rip) onto the stack and jumps to the target. ret pops that address and jumps back.


Calling Convention (System V AMD64)

In 32-bit x86, a common “C calling convention” passes parameters on the stack. In 64-bit UNIX-like systems, the standard is the System V AMD64 ABI, which passes the first arguments in registers. (Windows uses a different convention; see note at the end.)

Argument passing

The first six integer/pointer arguments are passed in registers:

arg1 rdi
arg2 rsi
arg3 rdx
arg4 rcx
arg5 r8
arg6 r9

Additional arguments (7 and beyond) are passed on the stack. Integer/pointer return values are placed in rax.

Caller-saved vs callee-saved

Registers are divided into those the caller must assume can be clobbered (caller-saved), and those a callee must preserve if it uses them (callee-saved). A common summary:

Callee-saved rbx rbp r12 r13 r14 r15
Caller-saved rax rcx rdx rsi rdi r8 r9 r10 r11

Stack alignment

Before executing a call, the stack pointer rsp must be aligned to a 16-byte boundary. Because call pushes an 8-byte return address, a typical callee entry sees rsp misaligned by 8 and fixes alignment in its prologue as needed.

Example: making a call (caller side)

Call myFunc(p1, 216, *p3) where: p1 is in rax, and rbx holds a pointer to the third argument value.

mov rdi, rax        ; arg1 = p1
mov rsi, 216        ; arg2 = 216
mov rdx, [rbx]      ; arg3 = *p3
; ensure 16-byte stack alignment here if needed
call myFunc         ; return value in rax

Example: function definition (callee side)

A simple function that returns arg1 + (arg2 + arg3). This version uses a frame pointer (like the 32-bit guide) for clarity.

bits 64
default rel

section .text
global myFunc

myFunc:
    ; Prologue
    push rbp
    mov  rbp, rsp
    sub  rsp, 16            ; space for locals, keeps stack aligned

    ; Body (args in rdi, rsi, rdx)
    mov  qword [rbp-8], rdx
    add  qword [rbp-8], rsi
    mov  rax, rdi
    add  rax, qword [rbp-8]

    ; Epilogue
    leave
    ret

Windows note

If you are targeting Windows x64, the integer argument registers are RCX, RDX, R8, R9, and the caller must reserve 32 bytes of “shadow space” on the stack. The rest of this section assumes SysV AMD64.


Credits: Based on the structure of the classic x86 Assembly Guide (Ferrari/Batson/Lack/Jones/Evans), and later AT&T-syntax revisions. This page is a teaching-focused x86-64 adaptation.