HW4: System Calls

This homework teaches you how to set up xv6, start debugging it and finally extend it with a simple system call.

You will program the xv6 operating system. We suggest you use a custom version of xv6 that includes small modifications that support VSCode Debugger integration. For this assignment (and for future xv6 assignments), follow the xv6 setup instructions. After you're done with them, you'll be ready to start working on the assignment.

Exercise 1: Debugging xv6

Use the shortcut Ctrl/Cmd + K Ctrl/Cmd + O to open a directory after launching VSCode and connecting to your remote server (CADE presumably). The shortcut will open a panel for you to navigate into the correct directory. If your directory does not exist yet, you can use the Terminal / Command-Line, and follow the xv6 setup instructions.

This first part of the assignment teaches you to debug the xv6 kernel with VSCode and GDB. First, let’s start the debugger and set a breakpoint on the main function.

From inside your xv6-cs5460 folder, launch QEMU (our version of the repo will automatically generate a json template in .vscode/launch.json which will allow your vscode to attach gdb to this qemu process:

CADE$ make qemu-nox-gdb
...

Now open the Run and Debug tab on VSCode (Ctrl/Cmd + Shift + D), and hit the Attach to QEMU button.

Congratulations! The VSCode debugger is connected to xv6. If you shift to the Debug Console tab on the VSCode Terminal, you might see the following:

Reading symbols from /home/u0478645/projects/cs5460/xv6-cs5460/kernel...
Warning: 'set target-async', an alias for the command 'set mi-async', is deprecated.
Use 'set mi-async'.

0x0000fff0 in ?? ()
+ symbol-file kernel
add symbol table from file "kernel"
Continue with 'c' or 'continue'

The GDB is ready to start running the assembly code of the BIOS that QEMU executes as part of the platform initialization. The BIOS starts at address 0xfff0 (you can read more about it in the How Does an Intel Processor Boot? blog post. You can single step through the BIOS machine code with the si (single instruction) GDB command if you like, but it's hard to make sense of what is going on so lets skip it for now and get to the point when QEMU starts executing the xv6 kernel.

At the bottom of your Debug Console you can enter GDB commands, like c (short for continue) to continue booting (you can also click the continue button at the top).

Note: if you need to exit GDB you can press Ctrl-C and then Ctrl-D. To exit xv6 running under QEMU you can terminate it with Ctrl-A X.

Part 1: Breaking inside the kernel and explaining what is on the stack (20%)

Now, set a breakpoint at the freerange() funciton in kalloc.c by clicking on the red circle to the left of the function signature of freerange(). You may need to restart the debugging session for the breakpoint to come into effect.

When breakpoint is triggered you can enter GDB commands in the Debug Console to inspect state of the registers and dump 24 bytes pointed by the esp register.

(gdb) info reg
...
(gdb) x/24x $esp
0x8010b590 <stack+4048>:	0x00010094	0x00010094	0x8010b5b8	0x80102fc1
0x8010b5a0 <stack+4064>:	0x801154a8	0x80400000	0x00000000	0x00000000
0x8010b5b0 <stack+4080>:	0x8010b5c4	0x00010094	0x00007bf8	0x00000000
0x8010b5c0 <bcache>:	0x00000000	0x00000000	0x00000000	0x00000000
0x8010b5d0 <bcache+16>:	0x00000000	0x00000000	0x00000000	0x00000000
0x8010b5e0 <bcache+32>:	0x00000000	0x00000000	0x00000000	0x00000000
(gdb)

Your assignment is to explain every value you see in this dump. Note, not all of it is stack (explain why). Also to better understand the meaning of the dump, you can use additional debugging features. For example, you can try to inspect the assembly code you are executing:

(gdb) disas

You can try to set a breakpoint right when the stack is initialized in entry.S. Note VSCode might not let you to set a breakpoint on the assembly instruction, but you can use GDB directly.

(gdb) disas entry
Dump of assembler code for function entry:
   0x8010000c <+0>:	mov    %cr4,%eax
   0x8010000f <+3>:	or     $0x10,%eax
   0x80100012 <+6>:	mov    %eax,%cr4
   0x80100015 <+9>:	mov    $0x109000,%eax
   0x8010001a <+14>:	mov    %eax,%cr3
   0x8010001d <+17>:	mov    %cr0,%eax
   0x80100020 <+20>:	or     $0x80010000,%eax
   0x80100025 <+25>:	mov    %eax,%cr0
   0x80100028 <+28>:	mov    $0x8010b5c0,%esp
   0x8010002d <+33>:	mov    $0x80102fa0,%eax
   0x80100032 <+38>:	jmp    *%eax
   0x80100034 <+40>:	xchg   %ax,%ax
   0x80100036 <+42>:	xchg   %ax,%ax
   0x80100038 <+44>:	xchg   %ax,%ax
   0x8010003a <+46>:	xchg   %ax,%ax
   0x8010003c <+48>:	xchg   %ax,%ax
   0x8010003e <+50>:	xchg   %ax,%ax
End of assembler dump.

You can also set a breakpoint at the entry point of the kernel (before the instruction that initializes the stack, i.e.,0x80100028 <+28>: mov $0x8010b5c0,%espand trace what is happening to the stack by single stepping execution with the step instruction (si) command.

Note There is a little trick here. The kernel is linked to run at 2GB + 1MB, but since the boot loader loads it at the 1MB address, the entry point of the kernel ELF file is still set to to the 1MB range. E.g., you can verify this by running readelf like this:

\>readelf kernel -a
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x10000c

While the actual linked address of the entry symbol is 0x8010000c

\>readelf kernel -a |grep entry
    83: 8010000c     0 NOTYPE  GLOBAL DEFAULT    1 entry

Ok, so to set a breakpoint we will subtract these 2GB (0x80000000 in hex) like this: on start in the Debug Console use the b (breakpoint) command:

>b *0x10000c

If you restart the GDB session, set the breakpoint, and hit continue you will break on the entry point of the kernel (you can re-watch the boot section to recall what is going on).

Note since the entry is linked at 2MB + 1GB but the instruction pointer is at 1MB mark GDB cannot match the addresses and show you a disassembly. To force the disassembly of the current code use this command (disassemble instructions pointed by the eip + 16 bytes):

disas $eip,+16
Dump of assembler code from 0x10000c to 0x10001c:
=> 0x0010000c:	mov    %cr4,%eax
   0x0010000f:	or     $0x10,%eax
   0x00100012:	mov    %eax,%cr4
   0x00100015:	mov    $0x109000,%eax
   0x0010001a:	mov    %eax,%cr3
End of assembler dump.

This is the power of GDB… such tricks are hard or impossible to do with VSCode.

Now use si (step instruction) or ni (next instruction) to single step until you reach freerange(). You can monitor what is going on on the stack after each instruction

Eventually you will reach the C code of the freerange() function where you set your original breakpoint.

Note, it’s a bit annoying to invoke one by one, so you can combine them in user-defined functions, like for example a command that single steps and prints the disassembly in a single shot. Add this to the .gdbinit.tmpl file:

define mystep
  si
  disas $eip,+16
end

Now you can use mystep as a GDB command. You can also add stack dumping there so you can monitor the stack changes.

Remember your goal is to explain every value you see when you dump the stack.

Part 2 (80%): process create system call

Now you're ready to start on the main part of the homework in which you will add a new pcreate() system call to the xv6 kernel. The main point of the exercise is for you to see some of the different pieces of the system call machinery as well as the internals of process creation in xv6.

Your new system call will serve as a replacement for fork() and exec() or more specifically, it will allow you to create new processes without forking.

Specifically, your new system call will have the following interface:

 int pcreate(char *path, char **argv, int fds[16]); 

Here similar to exec() the pcreate() takes the path to the binary of the new proces, it’s arguments **argv and an aditional argument an array that specifies how the file decriptors of the parent are shared with the child. For example if fds contains {3, 4, -1, -1, -1, ...} it means that file descriptors 3 and 4 of the caller are copied into thefile descriptors 0 and 1 of the new process. -1 means that the file decriptor will be unallocated in the new process. At a high-level, it seems that pcreate() can replace the fork() and exec() combination in most cases.

In order to test your system call you should create a user-level program ptest that uses your new system call to create new processes. In order to make your new ptest program available to run from the xv6 shell, look at how other programs are implemented, e.g., ls and wc and make appropriate modifications to the Makefile such that the new application gets compiled, linked, and added to the xv6 filesystem.

When you're done, you should be able to invoke your ptest program from the shell. You can follow the following example template for bt.c, but feel free to extend it in any way you like:

#include "types.h"
#include "stat.h"
#include "user.h"

int
main(int argc, char *argv[])
{
  /* Syscall invocation here */

  exit();
}

In order to make your new ptest program available to run from the xv6 shell, add _ptest to the UPROGS definition in Makefile.

Your strategy for making the pcreate system call should be to clone all of the pieces of code that are specific to some existing system call, for example the "uptime" system call or "read". You should grep for uptime in all the source files, using grep -n uptime *.[chS]. You can also copy the code from the exec() system call to create the new process.

Hints

You will want to allocate a new process similar to how it’s done in fork() or userinit().

Since it’s the new process you will have to create a correct trapframe, look at how it’s done in userinit()

Extra credit (10%)

Use your new pcreate() implementation to create the first process in the system, i.e., instead of using the assembly sequence use internals of pcreate() to load the ELF binary of the init process from disk.

Submit your work

Submit your solution through as a compressed zip file of your xv6 source tree (after running make clean). You can use the following command to create a compressed zip file.

CADE$ cd xv6-cs5460
CADE$ make clean
CADE$ zip -r ../hw4.zip .

Or, if you are using the original xv6 repository (not recommended):

CADE$ cd xv6-public
CADE$ make clean
CADE$ zip -r ../hw4.zip .