HW4: System Calls
This homework teaches you how to set up xv6, start debugging it, and finally extend it with a simple system call.
You will program the xv6 operating system. We suggest that you use our xv6-64 course fork, which includes small modifications to support VSCode debugger integration. For this assignment, and for future assignments involving xv6, follow the xv6-64 setup instructions. After you’re done, you’ll be ready to start working on the assignment.
- Custom xv6-64 repo: here
- Native Debug VSCode extension (make sure you have this for better VSCode-to-GDB integration): here
Exercise 1: Debugging xv6
After launching VSCode and connecting to your remote server (presumably CADE), press Ctrl/Cmd + K and then Ctrl/Cmd + O to open a folder. The shortcut opens a panel where you can navigate to the correct directory. If the directory does not exist yet, create it from the terminal by following the xv6-64 setup instructions.
This first part of the assignment teaches you to debug the xv6 kernel with VSCode and GDB. First, let’s start the debugger and set a breakpoint on the main function.
From inside your xv6-64 directory, generate the debugger helper files once:
CADE$ make .gdbinit
CADE$ make launch.json
This creates .gdbinit, which tells GDB how to load xv6-64 symbols, and .vscode/launch.json, which VSCode uses to attach GDB to this QEMU process.
Now launch xv6-64 under QEMU with GDB enabled:
CADE$ make qemu-nox-gdb
Now open the Run and Debug tab in VSCode (Ctrl/Cmd + Shift + D) and click the Attach to QEMU button.
If everything is set up correctly, VSCode will attach GDB to the waiting QEMU instance. If you switch to the Debug Console tab in the VSCode terminal, you will see output roughly like this:
The target architecture is set automatically (currently i386:x86-64)
=cmd-param-changed,param="architecture",value="i386"
=> 0x80000103a93 <main>: push %rbp
Thread 1 hit Breakpoint 1, main () at main.c:19
19 {
You can set breakpoints in VSCode by clicking the red dot to the left of the line number, for example on main() in main.c. With this set, execution will stop when it reaches that line.
At the bottom of the VSCode Debug Console you can enter GDB commands, but prefix them with -exec. For example, -exec info registers lists the registers in use, -exec disas shows the assembly instructions, and -exec c continues execution. If you run GDB directly in a terminal instead of through VSCode, use the same commands without -exec.
To stop GDB, press Ctrl-C and then Ctrl-D. To exit xv6 running under QEMU, press Ctrl-A and then X.
GDB is attached early in boot, before xv6 has fully taken over the machine. If you want, you can stop execution at the beginning and single-step through the BIOS and early platform-initialization code with si. But that code is hard to interpret and not necessary for the homework, so it is fine to skip ahead. If you are curious, OSDev has background on System Initialization.
Part 1: Break inside the kernel and explain the stack (20%)
Now, set a breakpoint at the freerange() function in kalloc.c by clicking the red circle to the left of the function signature for freerange(). You may need to restart the debugging session for the breakpoint to take effect.
When the breakpoint is triggered, you can enter GDB commands in the Debug Console to inspect the registers and to dump the 96 bytes starting at the address held in rsp.
-exec info registers
...
...
# Dump 96 bytes from the 64-bit stack pointer
-exec x/12gx $rsp
0x8000010fab0 <stack+4048>: 0x0000080000102da9 0x00000000000100e8
0x8000010fac0 <stack+4064>: 0x0000000000000000 0x000008000010fad8
0x8000010fad0 <stack+4080>: 0x0000080000103ab7 0x0000000000007bf8
0x8000010fae0 <bcache>: 0x0000000000000000 0x0000000000000000
0x8000010faf0 <bcache+16>: 0x0000000000000000 0x0000000000000000
0x8000010fb00 <bcache+32>: 0x0000000000000000 0x0000000000000000
(gdb)
Note: GDB prints two 8-byte values on each line in this stack dump.
Note: In the VSCode Debug Console, use -exec. In raw GDB, use the same commands without -exec.
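The dump labels are enough to tell which words lie inside the kernel stack. `<stack+4048>` sits at 0x8000010fab0, so `stack` begins at 0x8000010eae0. The first `<bcache>` line starts at 0x8000010fae0. The sketch below treats the kernel stack as the bytes from `stack` up to that first `<bcache>` address. That boundary is inferred from the GDB labels, so confirm it against the kernel's symbol table. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Addresses taken from the x/12gx $rsp dump above. */
#define DUMP_RSP     0x8000010fab0ULL  /* printed as <stack+4048> */
#define STACK_OFFSET 4048ULL           /* offset shown in that label */
#define BCACHE_ADDR  0x8000010fae0ULL  /* first line labeled <bcache> */

/* Hypothetical helper: start address of the `stack` symbol. */
uint64_t stack_base(void) {
    return DUMP_RSP - STACK_OFFSET;
}

/* Hypothetical helper: 1 if the 8-byte word at addr lies entirely
   inside [stack, bcache), i.e. inside the kernel stack array. */
int in_stack(uint64_t addr) {
    return addr >= stack_base() && addr + 8 <= BCACHE_ADDR;
}

/* Count how many of the n 8-byte words dumped from rsp are stack words. */
int stack_words(uint64_t rsp, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += in_stack(rsp + 8 * (uint64_t)i);
    return count;
}

/* Print one line per dumped word (x/12gx shows 12 words = 96 bytes). */
void print_dump_layout(void) {
    for (int i = 0; i < 12; i++) {
        uint64_t addr = DUMP_RSP + 8 * (uint64_t)i;
        printf("%#llx  %s\n", (unsigned long long)addr,
               in_stack(addr) ? "stack" : "beyond stack (bcache)");
    }
}
```

With these numbers, `stack` spans exactly 4096 bytes. The first six dumped words (0x8000010fab0 through 0x8000010fad8) are stack contents. The last six are zero-initialized `bcache` data.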
Your task is to explain every value shown in the dump. Not all of the displayed memory belongs to the freerange() stack frame, so part of your answer should identify which values belong to that frame and which do not, and explain why. To do this well, you need to understand two things: what is placed on the stack when execution enters a function such as foo(), and what the stack looks like when a breakpoint is hit inside that function. On x86-64 (System V ABI), the first six integer or pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9), so the stack mainly holds return addresses, saved registers such as rbp, and local variables. Also keep in mind that x/12gx $rsp prints 96 bytes starting at $rsp regardless of where the stack ends, so the dump can run past the top of the stack into adjacent global memory. The lines labeled <bcache>, for example, are not part of the stack at all; they are global data in .bss.
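You can observe the same frame layout in an ordinary host program using the GCC/Clang builtins `__builtin_frame_address` and `__builtin_return_address`. This sketch assumes x86-64 with a conventional `push %rbp; mov %rsp,%rbp` prologue. Under that prologue, the word at the frame address is the caller's saved rbp, and the word above it is the return address pushed by `call`. `frame_words_match` is a hypothetical name, and `noinline` keeps the call real.

```c
#include <assert.h>

/* Assumes x86-64 with frame pointers. Using __builtin_frame_address(0)
   makes GCC/Clang keep rbp as this function's frame pointer. */
__attribute__((noinline))
int frame_words_match(void) {
    void **frame = (void **)__builtin_frame_address(0);
    /* frame[0]: the caller's saved rbp, pushed by this function's prologue.
       frame[1]: the return address, pushed by the caller's call instruction. */
    return frame[1] == __builtin_return_address(0);
}
```

Look for the same pattern in the xv6 dump. Values that point back into `stack` are candidate saved rbp values. Values inside kernel text, for example a few bytes past main at 0x80000103a93, are candidate return addresses.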
To better understand the dump, use additional debugging features. For example, you can inspect the assembly currently being executed:
-exec disas
The required task is still to explain the stack dump at freerange(). The following low-level tracing steps are optional, but they can help you understand where the stack contents come from.
You can try setting a breakpoint on the instruction in entry.S that initializes the stack, 0x0000080000100050 <start64+0>: movabs $0x8000010fae0,%rsp. Then trace what happens to the stack by single-stepping with the si (step instruction) command. VSCode may not let you set a breakpoint directly on this assembly instruction, but you can use GDB directly:
(gdb) set architecture i386:x86-64
(gdb) file kernel
(gdb) add-symbol-file kernel 0x80000000000
(gdb) disas start64
Dump of assembler code for function start64:
0x0000080000100050 <+0>: movabs $0x8000010fae0,%rsp
0x000008000010005a <+10>: movabs $0x80000103a93,%rax
0x0000080000100064 <+20>: jmpq *%rax
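start64 loads rsp with 0x8000010fae0, the first address past the 4096-byte stack array, because x86-64 stacks grow downward. Each push and call first subtracts 8 from rsp and then stores at the new rsp. Note also that start64 reaches main with jmpq, not call, so no return address is pushed before main runs.

The host-side model below replays a call followed by a push %rbp from that initial value, so you can compare it with what si shows. The struct and function names are hypothetical; this is not xv6 code.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP  0x8000010fae0ULL   /* initial rsp loaded by start64 */
#define STACK_SIZE 4096ULL

/* Hypothetical model of the kernel stack: mem[i] holds the 8-byte word
   at address STACK_TOP - STACK_SIZE + 8*i. */
struct model {
    uint64_t rsp;
    uint64_t mem[STACK_SIZE / 8];
};

/* Map a stack address to its slot in the model. */
static uint64_t *slot(struct model *m, uint64_t addr) {
    return &m->mem[(addr - (STACK_TOP - STACK_SIZE)) / 8];
}

/* push: decrement rsp by 8, then store the value at the new rsp. */
void push(struct model *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* call: push the address of the next instruction (the return address). */
void call(struct model *m, uint64_t return_addr) {
    push(m, return_addr);
}

/* Read the word stored at a stack address. */
uint64_t read_word(struct model *m, uint64_t addr) {
    return *slot(m, addr);
}
```

After `call` and one `push`, rsp has dropped by 16. The pushed value sits at rsp and the return address sits at rsp + 8, which is the same pairing you see in a real frame.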
Note: There is a little trick here. The kernel is linked to run at 8 TB + 1 MB, but because the boot loader loads it at the 1 MB physical address, the entry point recorded in the kernel ELF file is the corresponding low address just above 1 MB. You can verify this with readelf:
bash-4.4$ readelf -a kernel
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x10000c
The actual linked address of the entry symbol, however, is 0x8000010000c:
bash-4.4$ readelf -a kernel | grep entry
113: 000008000010000c 0 NOTYPE GLOBAL DEFAULT 1 entry
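The fields readelf prints come from the start of the ELF64 header:

- a 16-byte e_ident, beginning 7f 45 4c 46 and followed by class 02 for ELF64;
- then e_type, e_machine, and e_version;
- then e_entry at byte offset 24.

The sketch below reads those fields from a raw byte buffer. `elf64_entry` is a hypothetical helper; a real loader such as xv6's exec() reads a full header struct from disk instead. The sketch assumes a little-endian file, matching the "Data: 2's complement, little endian" line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian integer of `size` bytes starting at p. */
static uint64_t read_le(const unsigned char *p, int size) {
    uint64_t v = 0;
    for (int i = size - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Return the entry point of an x86-64 ELF64 image, or 0 if the header
   does not look like one. Offsets follow the 64-byte ELF64 header layout. */
uint64_t elf64_entry(const unsigned char *hdr, size_t len) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    if (len < 64 || memcmp(hdr, magic, 4) != 0)
        return 0;
    if (hdr[4] != 2)                 /* EI_CLASS: 2 = ELF64 */
        return 0;
    if (hdr[5] != 1)                 /* EI_DATA: 1 = little endian */
        return 0;
    if (read_le(hdr + 18, 2) != 62)  /* e_machine: 62 = EM_X86_64 */
        return 0;
    return read_le(hdr + 24, 8);     /* e_entry */
}
```

Fed the first 64 bytes of the kernel file, this should return the same value as the "Entry point address" line (0x10000c).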
To set a breakpoint at the low physical entry point, subtract the 8 TB offset (0x80000000000) from the linked address. At startup, use the b (breakpoint) command. In the VSCode Debug Console, prefix it with -exec; in raw GDB, type it without the prefix:
-exec b *0x10000c
If you restart the GDB session, set this breakpoint, and continue execution, you will break at the kernel entry point.
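The offset arithmetic is easy to check. 0x80000000000 is 2^43 bytes (8 TB), so subtracting it from the linked address 0x8000010000c leaves 0x10000c, just past the 1 MB mark (0x100000). `LINK_OFFSET` and the helper names below are hypothetical. xv6-64's headers may use a different name for this constant, so take the value from the kernel's linker script.

```c
#include <assert.h>
#include <stdint.h>

#define LINK_OFFSET 0x80000000000ULL  /* 8 TB = 1ULL << 43 (assumed offset) */

/* Linked (high) address -> low address where the boot loader placed the code. */
uint64_t linked_to_load(uint64_t linked) {
    return linked - LINK_OFFSET;
}

/* Inverse: low load address -> linked address that GDB has symbols for. */
uint64_t load_to_linked(uint64_t load) {
    return load + LINK_OFFSET;
}
```

Going the other way, `load_to_linked(0x10000c)` recovers the 0x8000010000c that readelf reports for the `entry` symbol.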
Note
- The kernel is linked at a high virtual address (the 0x80000000000 offset plus its load address), but the boot loader jumps to the low physical entry point, around 0x10000c.
- When GDB looks at the current instruction pointer (rip), it is still in the low region just above 1 MB, which does not match the linked symbol addresses. So GDB cannot map it to a function name and will not show the usual disassembly unless you force it.
- To force it, disassemble the bytes at the current instruction pointer:
disas $rip,+16
Dump of assembler code from 0x10000c to 0x10001c:
=> 0x000000000010000c: mov %cr4,%rax
0x000000000010000f: or $0x20,%eax
0x0000000000100012: mov %rax,%cr4
0x0000000000100015: mov $0x10b000,%eax
0x000000000010001a: mov %rax,%cr3
End of assembler dump.
This is the power of GDB. Such tricks are hard or impossible to do with VSCode.
If you choose to follow this path, use si (step instruction) or ni (next instruction) to single-step until you reach freerange(). You can monitor the stack after each instruction.
Eventually you will reach the C code of the freerange() function, where you set your original breakpoint.
If you want to avoid typing these commands one by one, you can combine them into a user-defined GDB command that single-steps and prints the disassembly in a single shot. Add this to the .gdbinit file that GDB actually loads. If your repo generates .gdbinit from .gdbinit.tmpl, then update the template and regenerate it, or copy the changes into .gdbinit before starting GDB:
define mystep
si
disas $rip,+16
end
Now you can use mystep as a GDB command. You can also add stack dumping there so you can monitor stack changes.
Remember: your goal is to explain every value you see when you dump the stack.
Part 2: Process create system call (80%)
Now you’re ready to start on the main part of the homework, in which you will add a new pcreate() system call to the xv6 kernel. The main point of the exercise is for you to see some of the different pieces of the system call machinery, as well as the internals of process creation in xv6.
Your new system call serves as a replacement for the fork() and exec() pair: it creates a new process running a new program without first forking the caller.
Specifically, your new system call will have the following interface:
int pcreate(char *path, char **argv, int fds[16]);
Similar to exec(), pcreate() takes the path to the binary of the new process and its argument vector argv. An additional array argument, fds, specifies how the parent's file descriptors are shared with the child.

For example, if fds contains {3, 4, -1, -1, -1, ...}, file descriptors 3 and 4 of the caller are copied into file descriptors 0 and 1 of the new process. An entry of -1 means that the corresponding file descriptor is left unallocated in the new process. At a high level, pcreate() can replace the fork() and exec() combination in most cases.
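One way to read the fds contract is as a table copy. Child descriptor i refers to the same open file as parent descriptor fds[i], or stays empty when fds[i] is -1.

The host-side model below captures only that mapping. `struct file` here, `NOFILE_MODEL`, and `map_fds` are hypothetical. A real xv6 implementation would call filedup() on each shared file rather than bumping a counter by hand. Rejecting invalid or closed descriptors before touching any reference count is one reasonable design choice, not a requirement stated in the assignment.

```c
#include <assert.h>
#include <stddef.h>

#define NOFILE_MODEL 16   /* matches the fds[16] parameter of pcreate() */

/* Stand-in for xv6's struct file: only the reference count matters here. */
struct file { int ref; };

/* Fill child[] from parent[] according to fds[]. Returns 0 on success,
   or -1 if some fds[i] names a descriptor the parent does not have open. */
int map_fds(struct file *parent[NOFILE_MODEL], const int fds[NOFILE_MODEL],
            struct file *child[NOFILE_MODEL]) {
    /* Validate first so a bad entry leaves every refcount untouched. */
    for (int i = 0; i < NOFILE_MODEL; i++) {
        if (fds[i] == -1)
            continue;
        if (fds[i] < 0 || fds[i] >= NOFILE_MODEL || parent[fds[i]] == NULL)
            return -1;
    }
    for (int i = 0; i < NOFILE_MODEL; i++) {
        if (fds[i] == -1) {
            child[i] = NULL;             /* slot stays unallocated */
        } else {
            child[i] = parent[fds[i]];   /* share the same open file... */
            child[i]->ref++;             /* ...as filedup() would */
        }
    }
    return 0;
}
```

With fds = {3, 4, -1, ...}, child descriptors 0 and 1 share the parent's files 3 and 4, and every other child slot is empty.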
In order to test your system call, you should create a user-level program ptest that uses your new system call to create new processes. In order to make your new ptest program available to run from the xv6 shell, look at how other programs are implemented, such as ls and wc, and make the appropriate modifications to the Makefile so that the new application gets compiled, linked, and added to the xv6 filesystem.
When you’re done, you should be able to invoke your ptest program from the shell. You can follow the example template for ptest.c below, but feel free to extend it in any way you like:
#include "types.h"
#include "stat.h"
#include "user.h"
int
main(int argc, char *argv[])
{
/* Syscall invocation here */
exit();
}
Concretely, to make ptest available from the xv6 shell, add _ptest to the UPROGS definition in the Makefile.
Your strategy for making the pcreate system call should be to clone all of the pieces of code that are specific to an existing system call, for example the uptime system call or read. You should grep for uptime in all the source files using grep -n uptime *.[chS]. You can also copy the code from the exec() system call to create the new process.
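The pieces you find with grep fit together roughly like this:

- syscall.h gives each call a number;
- usys.S holds a user-space stub that traps into the kernel with that number;
- the syscalls[] table in syscall.c maps the number to a sys_ handler.

The host-side model below mimics only the table-dispatch step. SYS_uptime is 14 in upstream xv6. SYS_pcreate = 22 and the stub handlers are assumptions for illustration; use the next free number in your syscall.h.

```c
#include <assert.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Assumed syscall numbers; the real values live in syscall.h. */
#define SYS_uptime  14
#define SYS_pcreate 22

/* Stub handlers standing in for the kernel's sys_* functions. */
static int sys_uptime(void)  { return 1234; }
static int sys_pcreate(void) { return 5; }   /* e.g. the new child's pid */

/* Table indexed by syscall number, similar in style to syscall.c. */
static int (*syscalls[])(void) = {
    [SYS_uptime]  = sys_uptime,
    [SYS_pcreate] = sys_pcreate,
};

/* Dispatch num the way syscall() does: a valid number calls its handler,
   and anything else returns -1. */
int dispatch(int num) {
    if (num > 0 && (unsigned)num < NELEM(syscalls) && syscalls[num])
        return syscalls[num]();
    return -1;
}
```

Inside the real sys_pcreate, fetch the three user arguments with the same argument-fetching helpers that sys_exec uses, rather than declaring C parameters.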
Hints
- You will want to allocate a new process similar to how it's done in fork() or userinit().
- Since this is a new process, you will have to create a correct trapframe. Look at how it's done in userinit().
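Setting up the new process's user stack is the part of exec() you are most likely to reuse. Argument strings are copied to the top of the stack, then an array of pointers to them, terminated by 0, is placed below them. The trapframe is then pointed at the result.

The model below builds that layout in an ordinary byte buffer standing in for one user stack page. `build_argv_stack`, `copy_out`, `MAXARG_MODEL`, and the 8-byte alignment are assumptions for this sketch. Take the exact way xv6-64 hands argc and argv to user main (registers or stack slots) from its exec.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define USTACK_SIZE 4096
#define MAXARG_MODEL 32

/* Stand-in for one user stack page at virtual addresses [base, base+USTACK_SIZE). */
struct ustack {
    uint64_t base;
    unsigned char page[USTACK_SIZE];
};

/* Copy len bytes to virtual address va inside the page (like copyout()). */
static int copy_out(struct ustack *s, uint64_t va, const void *src, size_t len) {
    if (va < s->base || va + len > s->base + USTACK_SIZE)
        return -1;
    memcpy(s->page + (va - s->base), src, len);
    return 0;
}

/* Lay out argv exec-style, starting from the top of the page.
   On success, return the final stack pointer and store argc and the
   user address of the argv array; return 0 on failure. */
uint64_t build_argv_stack(struct ustack *s, char *const argv[],
                          uint64_t *argc_out, uint64_t *argv_out) {
    uint64_t ptrs[MAXARG_MODEL + 1];
    uint64_t sp = s->base + USTACK_SIZE;
    uint64_t argc;

    for (argc = 0; argv[argc]; argc++) {
        if (argc >= MAXARG_MODEL)
            return 0;
        size_t len = strlen(argv[argc]) + 1;     /* include the NUL */
        sp = (sp - len) & ~(uint64_t)7;          /* keep sp 8-byte aligned */
        if (copy_out(s, sp, argv[argc], len) < 0)
            return 0;
        ptrs[argc] = sp;                         /* user address of the string */
    }
    ptrs[argc] = 0;                              /* argv[argc] == NULL */

    sp -= (argc + 1) * sizeof(uint64_t);
    if (copy_out(s, sp, ptrs, (argc + 1) * sizeof(uint64_t)) < 0)
        return 0;
    *argc_out = argc;
    *argv_out = sp;
    return sp;
}
```

In the kernel, the trapframe's stack pointer would then be set to the returned sp. argc and the argv address are passed to user main the same way exec.c passes them.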
Extra credit (10%)
Use your new pcreate() implementation to create the first process in the system. That is, instead of starting from the embedded assembly sequence (initcode) that userinit() installs, use the internals of pcreate() to load the ELF binary of the init process from disk.
Submit your work
Submit your solution as a compressed zip file of your xv6-64 source tree, after running make clean.
Your submission must include:
- Part 1 write-up: add an explanation.txt, explanation.md, or explanation.pdf file that explains each stack address/value from your freerange() dump.
- Part 2 code: your pcreate() implementation and ptest user program in your xv6 source tree.
You can use the following command to create the compressed zip file.
CADE$ cd xv6-64
CADE$ make clean
CADE$ zip -r ../hw4.zip .
Or, if you are using the original xv6 repository, which is not recommended:
CADE$ cd xv6-public
CADE$ make clean
CADE$ zip -r ../hw4.zip .