Homework 3: Boot into C

This assignment will teach you to build a minimal bootable kernel that boots into C on real x86_64 hardware. It will boot using GRUB, print "Hello, world!" on the screen, and then print "Hello from C!" on the serial line from the main() function.

Technically, you can do this assignment on any operating system that lets you use GCC, make, GRUB2, and QEMU (CADE machines, your laptop running Linux, a Linux VM such as WSL on Windows, and even macOS with cross-compilation via Nix, etc.). You don’t need to set up xv6 for this assignment, but if you’re running on CADE you’ll have to install QEMU; see the QEMU setup instructions. Submit your code through Gradescope (see instructions at the bottom of this page).

NOTE: YOU CANNOT PUBLICLY RELEASE SOLUTIONS TO THIS HOMEWORK. It’s ok to show your work to your future employer as a private Git repo; however, any public release is prohibited.

Much of this assignment is based on the intermezzOS project, the OSDev wiki, and the xv6-x64 project.

Assignment overview

Download the starter files here and extract them with tar xvf src.tgz or with your graphical file manager.

The starter files we provide will:

  1. Boot under GRUB with a Multiboot header.
  2. Enable paging and enter long mode in assembly.
  3. Print "Hello, world!" onto the screen.
  4. Set up a stack and call C main().

Most of this assignment describes the starter files. You will need to understand them for the final, but the code we provide already does all of this.

For the required part of HW3, your main.c must:

  1. Set up a new 4-level page-table hierarchy using 4KB pages.
  2. Identity-map the first 8MB of virtual addresses to the first 8MB of physical memory.
  3. Load the new PML4 into CR3 with write_cr3().

For the required submission, you only need to submit main.c. Do not modify the other files.

Environment setup on macOS

This assignment does not need special setup on Linux. However, if you’re using macOS, you will need to install a cross-compilation toolchain to build a kernel for x86_64. The easiest way to do this is with brew install x86_64-elf-binutils x86_64-elf-gcc. This will give you cross-compiling versions of gcc and ld with the prefix x86_64-elf-. So instead of using gcc and ld, you will use x86_64-elf-gcc and x86_64-elf-ld to compile and link your kernel. Also use brew install qemu to install QEMU on macOS. You still will not be able to create a QEMU-bootable ISO on macOS with this toolchain alone; a Docker-based workaround is described later. Alternatively, you can set up a Linux VM on macOS and do the development there.

Boot overview

When you turn on a computer, it loads the BIOS from some special flash memory. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable devices. If it finds one, the control is transferred to its bootloader, which is a small portion of executable code stored at the device’s beginning. The bootloader has to determine the location of the kernel image on the device and load it into memory. It also needs to switch the CPU to the so-called protected mode because x86 CPUs start in the very limited real mode by default (to be compatible with programs from 1978). (By the way, on modern PCs, the BIOS is replaced by the UEFI firmware, which usually boots all the way to 64-bit long mode.)

We won’t write a bootloader because that would be a complex project on its own (we partially covered this in class since xv6 implements a simple boot loader with two files: bootasm.S and bootmain.c). Instead, we will use one of the many well-tested bootloaders out there to boot our kernel from a CD-ROM.

Multiboot headers

Let’s get going! The very first thing we’re going to do is create a multiboot header. What’s that, you ask? Well, to explain it, let’s take a small step back and talk about how a computer boots up.

One of the amazing and terrible things about the x86_64 architecture is that it’s maintained backwards compatibility throughout the years. This has been a competitive advantage, but it’s also meant that the boot process is largely a pile of hacks. Each time a new iteration comes out, a new step gets added to the process. That’s right, when your fancy new computer starts up, it thinks it’s an 8086 from 1978. And then, through a succession of steps, we transition through more and more modern architectures until we end at the latest and greatest.

The first mode is called “real mode”. This is a 16-bit mode that the original x86 chips used. The second is “protected mode”. This 32-bit mode adds new things on top of real mode. It’s called “protected” because real mode sort of let you do whatever you wanted, even if it was a bad idea. Protected mode was the first time that the hardware enabled certain kinds of protections that allow us to exercise more control around such things as RAM. We’ll talk more about those details later.

The final mode is called “long mode”, and it’s 64 bits. In this homework, we will transition from 32-bit protected mode into 64-bit long mode before entering main().

So that’s the task ahead of us: make the jump up the ladder and get to 64-bit long mode. We can do it! Let’s talk more details.

Firmware and the BIOS

So let’s begin by turning on the computer.

When we press the power button, a bunch of low-level initialization protocols are executed: Management Engine, BIOS, etc.

With the BIOS we’re already in the land of software, but unlike software that you may be used to writing, the BIOS comes bundled with its computer and is located in read-only memory (ROM).

One of the first things the BIOS does is run a “POST” or power-on self-test which checks for the availability and integrity of all the pieces of hardware that the computer needs including the BIOS itself, CPU registers, RAM, etc. If you’ve ever heard a computer beeping at you as it boots up, that’s the POST reporting its findings.

Assuming no problems are found, the BIOS starts the real booting process.

By the way…

For a while now most commercial computer manufacturers have hidden their BIOS booting process behind some sort of splash screen. It’s usually possible to see the BIOS’ logs by pressing some collection of keys when your computer is starting up.

The BIOS also has a menu where you can see information about the computer like CPU and memory specs and all the hardware the BIOS detected like hard drives and CD and DVD drives. Typically this menu is accessed by pressing some other weird collection of keyboard keys while the computer is attempting to boot.

The BIOS automatically finds a “bootable drive” by looking in certain pre-determined places like the computer’s hard drive and CD and DVD drives. A drive is “bootable” if it contains software that can finish the booting process. In the BIOS menu, you can usually change in what order the BIOS looks for bootable drives or tell it to boot from a specific drive.

The BIOS knows it’s found a bootable drive by looking at the first sector of the drive (its first 512 bytes) and looking for some magical numbers set in that drive’s memory. This won’t be the last time some magical numbers or hacky-sounding things are used on our way to building an OS. Such is life at such a low level…

When the BIOS has found its bootable drive, it loads part of the drive into memory and transfers execution to it. With this process, we move away from what comes dictated by the computer manufacturer and move ever closer to getting our OS running.

If you are booting a modern machine manufactured in the last decade or so, it is almost certainly using UEFI instead of the BIOS. UEFI is a newer firmware standard that replaces the BIOS, and it has a different boot process. However, we will not go into the details of UEFI in this assignment to keep things simple. The following sections will focus on the BIOS boot process, but the high-level concepts (e.g. how a machine transitions from 16-bit real mode to 64-bit long mode, how the page tables work, etc.) are similar in both BIOS and UEFI.

Bootloaders

The part of our bootable drive that gets executed is called a “bootloader”, since it loads things at boot time. The bootloader’s job is to take our kernel, put it into memory, and then transition control to it.

Some people start their operating systems journey by writing a bootloader. For example, in class we started by looking at the xv6 bootloader that is loaded by the BIOS at the 0x7c00 address. In this assignment we will not be doing that.

In the interest of actually getting around to implementing a kernel, instead, we’ll use an existing bootloader: GRUB.

GRUB and Multiboot

GRUB stands for “grand unified bootloader”, and it’s a common one for GNU/Linux systems. GRUB implements a specification called Multiboot, which is a set of conventions for how a kernel should get loaded into memory. By following the Multiboot specification, we can let GRUB load our kernel.

The way that we do this is through a “header”. We’ll put some information in a format that Multiboot specifies right at the start of our kernel. GRUB will read this information, and follow it to do the right thing.

Another advantage of using GRUB is that it handles the transition from real mode to protected mode for us, skipping the first step. We then finish the job ourselves by setting up paging and switching to 64-bit long mode. If you’re curious about the things you would have needed to know, put “A20 line” into your favorite search engine, and get ready to cry yourself to sleep.

Writing our own Multiboot header

I said we were going to get to the code, and then I went on about more history. Sorry about that. It’s code time for real. You can download the entire folder that contains a skeleton for the homework files here. Inside your homework folder there is a file called multiboot_header.S (case matters). Open it in your favorite editor. I use emacs, but you should feel free to use anything you’d like.

$ emacs multiboot_header.S

This is a .S file, which is the extension for “assembly”. That’s right, we’re going to write some assembly code here. Don’t worry! It’s not super hard.

An aside about assembly

Have you ever watched Rich Hickey’s talk “Simple vs. Easy”? It’s a wonderful talk. In it, he draws a distinction between these two words, which are commonly used as synonyms.

Assembly coding is simple, but that doesn’t mean that it’s easy. We’ll be doing a little bit of assembly programming to build our operating system, but we don’t need to know that much. It is completely learnable, even for someone coming from a high-level language. You might need to practice a bit, and take it slow, but I believe in you. You’ve got this. A manual on GNU assembler (the assembler we’ll be using) can be found here.

The Magic Number

Our first assembly file will be almost entirely data, not code. Here’s the first line:

    .long 0xe85250d6 # magic number

Ugh! Gibberish! Let’s start with the # symbol. It’s a comment that lasts until the end of the line. This particular comment says “magic number”. As we said, you’ll be seeing a lot of magic numbers in your operating system work. The idea of a magic number is that it’s completely and utterly arbitrary. It doesn’t mean anything. It’s just magic. The very first thing that the multiboot specification requires is that we have the magic number 0xe85250d6 right at the start.

By the way…

Wondering how a number can have letters inside of it? 0xe85250d6 is written in hexadecimal notation. Hexadecimal is an example of a “numeral system”, which is a fancy term for a system for conveying numbers. The numeral system you’re probably most familiar with is the decimal system, which conveys numbers using a combination of the symbols 0 - 9.

Hexadecimal, on the other hand, uses a combination of 16 symbols: 0 - 9 and a - f. Along with its fellow numeral system, binary, hexadecimal is used a lot in low-level programming. In order to tell if a number is written in hexadecimal, you may be tempted to look for the use of letters in the number, but a more surefire way is to look for a leading 0x. While 100 isn’t a hexadecimal number, 0x100 is.

What’s the value in having an arbitrary number there? Well, it’s a kind of safeguard against bad things happening. This is one of the ways in which we can check that we actually have a real multiboot header. If it doesn’t have the magic number, something has gone wrong, and we can throw an error.

I have no idea why it’s 0xe85250d6, and I don’t need to care. It just is.

Finally, the .long directive. This is a GNU assembler directive that says “put the following 32-bit value into the file at this location”. The 0xe85250d6 is a 32-bit value, so it fits perfectly. If we had a bigger number, we’d have to use a different directive. If we had a smaller number, we could still use .long, but it would just be padded with zeros on the left. In case you were wondering, .short is the same thing but for 16-bit numbers, .byte for 8-bit numbers and .quad for 64-bit numbers.

The Mode Code

Okay, time to add a second line:

    .long 0xe85250d6 # magic number
    .long 0 # architecture (0 for i386)

This is another form of magic number. This value is the Multiboot2 architecture field. For x86_64 kernels, this value is 0 (i386 architecture). Even though we later switch to long mode, the header still uses 0 here: GRUB handles the transition to 32-bit protected mode, and then we handle the transition to long mode ourselves.

Header length

The next thing that’s required is a header length. We could use .long and count out exactly how many bytes our header is, but there are two reasons why we’re not doing that:

  1. Computers should do math, not people.
  2. We’re going to add more stuff, and we’d have to recalculate this number each time. Or wait until the end and come back. See #1.

Here’s what this looks like:

header_start:
    .long 0xe85250d6 # magic number
    .long 0 # architecture (0 for i386)
    .long header_end - header_start # total header length
header_end:

The header_start: and header_end: things are called “labels”. Labels let us use a name to refer to a particular part of our code. Labels also refer to the memory occupied by the data and code that directly follow them. So in our code above, the label header_start points directly to the memory at the very beginning of our magic number and thus to the very beginning of our header.

Our third .long line uses those two labels to do some math: the header length is the value of header_end minus the value of header_start. Because header_start and header_end are just the addresses of places in memory, we can simply subtract to get the distance between those two addresses. When we compile this assembly code, the assembler will do this calculation for us. No need to figure out how many bytes there are by hand. Awesome.

You’ll also notice that I indented the .long statements. Usually, labels go in the first column, and you indent actual instructions. How much you indent is up to you; it’s a pretty flexible format.

The Checksum

The fourth field Multiboot requires is a “checksum”. The idea is that we sum up some numbers, and then use that number to check that they’re all what we expected things to be. It’s similar to a hash, in this sense: it lets us and GRUB double-check that everything is accurate.

Here’s the checksum:

header_start:
    .long 0xe85250d6 # magic number
    .long 0 # architecture (0 for i386)
    .long header_end - header_start # total header length

    .long -(0xe85250d6 + 0 + (header_end - header_start)) # checksum
header_end:

Again, we’ll use math to let the computer calculate the sum for us. We add up the magic number, the architecture code, and the header length, and then negate that sum to get the checksum. This way, when we add the checksum back to the other three values, we should get zero. .long then puts that value into this spot in our file.

Ending tag

After the checksum, you can list a series of “tags”, which is a way for the OS to tell the bootloader to do some extra things before handing control over to the OS, or to give the OS some extra information once started. We don’t need any of that yet, though, so we just need to include the required “end tag”, which looks like this:

header_start:
    .long 0xe85250d6 # magic number
    .long 0 # architecture (0 for i386)
    .long header_end - header_start # total header length

    .long -(0xe85250d6 + 0 + (header_end - header_start)) # checksum

    # required end tag
    .short 0 # type
    .short 0 # flags
    .short 8 # size
header_end:

Here, we use .short to define a 16-bit value. The Multiboot specification demands that this be exactly a 16-bit value. You’ll find that this is super common in operating systems: the exact size and amount of everything matters. It’s just a side-effect of working at a low level.

The Section and Alignment

We have two last things to do: add a “section” annotation and make sure our header is aligned to an 8-byte boundary. We’ll talk more about sections later, so for now, just put what I tell you at the top of the file.

Alignment is a way to make sure that our data is stored in memory in a certain way. For example, if we have two 3-byte values stored back to back, you might expect them to be stored like this:

| 0x00 | 0x01 | 0x02 | 0x03 | 0x04 | 0x05 |
|  A   |  A   |  A   |  B   |  B   |  B   |

However, there could be many reasons we don’t want them to be stored like that. The most important reason is that the CPU might be optimized to read data that is stored in a certain way. For example, it might be optimized to read data in 8-byte chunks. Often, when we are working with some specification (like Multiboot), the specification will also require that our data be stored in a certain way. Still taking the example above, if we wanted to align the second value to an 8-byte boundary, it would look like this:

| 0x00 | 0x01 | 0x02 | 0x03 | 0x04 | 0x05 | 0x06 | 0x07 |
|  A   |  A   |  A   |      |      |      |      |      |
| 0x08 | 0x09 | 0x0A | 0x0B | 0x0C | 0x0D | 0x0E | 0x0F |
|  B   |  B   |  B   |      |      |      |      |      |

In the case of Multiboot, it requires that the header be aligned to an 8-byte boundary. This means that the header must start at a memory address that is a multiple of 8. To align our header to an 8-byte boundary, we can use the .p2align directive, which tells the assembler to align the next data to a boundary that is a power of 2. Since 8 is 2^3, we use .p2align 3 to align to an 8-byte boundary.

Here’s the final file:

.section .multiboot_header
.p2align 3 # 8 bytes alignment
header_start:
    .long 0xe85250d6 # magic number
    .long 0 # architecture (0 for i386)
    .long header_end - header_start # total header length

    .long -(0xe85250d6 + 0 + (header_end - header_start)) # checksum

    # required end tag
    .short 0 # type
    .short 0 # flags
    .short 8 # size
header_end:

That’s it! Congrats, you’ve written a multiboot compliant header. It’s a lot of esoterica, but it’s pretty straightforward once you’ve seen it a few times.

Assembling with GNU assembler

We can’t use this file directly; we need to turn it into binary. We can use a program called an “assembler” to “assemble” our assembly code into binary code. It’s very similar to using a “compiler” to “compile” our source code into binary. But when it’s assembly, people often use the more specific name.

We have a few options for assemblers, but the one we’ll be using is the GNU assembler, or as (sometimes called “gas”). It’s a common assembler that comes with the GNU toolchain, and it’s widely used in the industry. It’s also the assembler that gcc uses under the hood when you compile C code. Here, we don’t call as directly, but instead we use gcc to invoke it for us. Remember, gcc is a driver program that calls a bunch of different tools to do the work of compiling and linking. When we call gcc with an assembly file, it knows to call as to assemble it.

$ gcc -c multiboot_header.S -o multiboot_header.o

The option -c says “compile this file, but don’t link it yet”. This is because we just want to turn our assembly code into an object file, which is a binary representation of our code that still has some metadata in it. We don’t want to link it yet because we haven’t written the rest of our kernel code that we need to link it with.

After you run this command, you should see a multiboot_header.o file in the same directory. This is our “object file”, hence the .o. Don’t let the word “object” confuse you. It has nothing to do with anything object oriented. “Object files” are just binary code with some metadata in a particular format — in our case ELF. Later, we’ll take this file and use it to build our OS.

You can inspect the bytes of the header with hexdump (depending on the environment the address may be different, but the following content should be somewhere in the output).

$ hexdump -x multiboot_header.o
0000040    50d6    e852    0000    0000    0016    0000    af14    17ad
0000050    0000    0000    0008    0000

Summary

Congratulations! This is the first step towards building an operating system. We learned about the boot process, the GRUB bootloader, and the Multiboot specification. We wrote a Multiboot-compliant header file in assembly code, and used GNU as through gcc to create an object file from it.

Next, we’ll write the actual code that prints “Hello world” to the screen.

Hello, World!

Now that we’ve got the headers out of the way, let’s do the traditional first program: Hello, world!

The smallest kernel

Our hello world will be just 20 lines of assembly code. Let’s begin. Open a file called boot.S (still, case matters) and put this in it:

start:
    hlt

You’ve seen the name: form before: it’s a label. This lets us name a line of code. We’ll call this label start, which is the traditional name. GRUB will use this convention to know where to begin.

The hlt statement is our first bit of “real” assembly. So far, we had just been declaring data. This is actual, executable code. It’s short for “halt”. In other words, it stops the CPU until the next interrupt arrives, which for our tiny kernel effectively ends the program.

By giving this line a label, we can call it, sort of like a function. That’s what GRUB does: “Call the function named start.” This function has just one line: stop.

Unlike many other languages, you’ll notice that there’s no way to say if this “function” takes any arguments or not. We’ll talk more about that later.

This code won’t quite work on its own though. We need to do a little bit more bookkeeping first. Here’s the next few lines:

.global start

.code32
.section .text
start:
    hlt

Three new bits of information. The first:

.global start

This says “I’m going to define a label start, and I want it to be available outside of this file.” If we don’t say this, GRUB won’t know where to find its definition. You can kind of think of it like a “public” annotation in other languages.

.code32

GRUB will boot us into protected mode, aka 32-bit mode. Similar to how the xv6 bootloader starts in 16-bit real mode, GRUB is loaded by the BIOS and switches into 32-bit protected mode for us. Our bootstrap starts in .code32, and later in the homework we will switch into 64-bit mode before calling C.

.section .text

We saw .section briefly, but I told you we’d get to it later. The place where we get to it is at the end of this chapter. For the moment, all you need to know is this: code goes into a section named .text. Everything that comes after the .section line is in that section, until another .section line.

That’s it! We could theoretically stop here, but instead, let’s actually print the “Hello world” text to the screen. We’ll start off with an “H”:

.global start

.code32
.section .text
start:
    movw $0x0248, 0xb8000 # H
    hlt

This new line is the most complicated bit of assembly we’ve seen yet. There’s a lot packed into this little line.

The first important bit is movw. This is one variant of the mov instruction (short for “move”), and the w stands for “word”. Remember, when x86 first started, it was a 16-bit architecture. That meant that the amount of data that could be held in a CPU register (or one “word” as it’s commonly known) was 16 bits. To transition to a 32-bit architecture without losing backwards compatibility, x86 got the concept of a “double word”, or double 16 bits.

Anyway, the mov instruction sorta looks like this:

mov{size} thing, place

where size could be b for byte, w for word, l for long (32 bits), or q for quad (64 bits). Oh, # starts a comment, remember? So the # H is just for us. I put this comment here because this line prints an H to the screen!

Yup, it does. Okay, so here’s why: mov copies thing into place. The amount of stuff it copies is determined by size.

#  size thing    place
#  |    |        |
#  V    V        V
movw    $0x0248, 0xb8000 # H

“Copy one word: the number 0x0248 to the memory location 0xb8000.”

The thing looks like a number just like 0xb8000, but it has a dollar sign $ in front of it. This dollar sign is special. It means “this is an immediate value”. In other words, we’re copying the number 0x0248 itself, not the value stored at some memory location. That’s what this line does. In the meantime, the place is just a number without a dollar sign, which actually is not a plain number — it refers to the memory location 0xb8000. So we’re copying the number 0x0248 to the memory location 0xb8000.

Why? Well, we’re using the screen as a “memory mapped” device. Specific positions in memory correspond to certain positions on the screen. And the position 0xb8000 is one of those positions: the upper-left corner of the screen.

By the way…

“Memory mapping” is one of the fundamental techniques used in computer engineering to help the CPU know how to talk to all the different physical components of a computer. The CPU itself is just a weird little machine that moves numbers around. It’s not of any use to humans on its own: it needs to be connected to devices like RAM, hard drives, a monitor, and a keyboard. The way the CPU does this is through a bus, which is a huge pipeline of wires connecting the CPU to every single device that might have data the CPU needs. There’s one wire per bit (since a wire can store a 1 or a 0 at any given time). A 32-bit bus is literally 32 wires in parallel that run from the CPU to a bunch of devices like Christmas lights around a house.

There are two buses that we really care about in a computer: the address bus and the data bus. There’s also a third signal that lets all the devices know whether the CPU is requesting data from an input (reading, like from the keyboard) or sending data to an output (writing, like to the monitor via the video card). The address bus is for the CPU to send location information, and the data bus is for the CPU to either write data to or read data from that location. Every device on the computer has a unique hardcoded numerical location, or “address”, literally determined by how the thing is wired up at the factory. In the case of an input/read operation, when it sends 0x1001A003 out on the address bus and the control signal notifies every device that it’s a read operation, it’s asking: “What is the data currently stored at location 0x1001A003?” If the keyboard happens to be identified by that particular address, and the user is pressing SPACE at this time, the keyboard says, “Oh, you’re talking to me!” and sends back the ASCII code 0x00000020 (for “SPACE”) on the data bus.

What this means is that memory on a computer isn’t just representing things like RAM and your hard drive. Actual human-scale devices like the keyboard, the mouse, the video card have their own memory locations too. But instead of writing a byte to a hard drive for storage, the CPU might write a byte representing some color and symbol to the monitor for display. There’s an industry standard somewhere that says video memory must live in the address range beginning 0xb8000. In order for computers to work out of the box, this means the BIOS needs to be manufactured to assume video memory lives at that location and the motherboard (which is where the bus is all wired up) has to be manufactured to route a 0xb8000 request to the video card. It’s kind of amazing this stuff works at all! Anyway, “memory mapped hardware”, or “memory mapping” for short, is the name of this technique.

Now, we are copying 0x0248. Why this number? Well, it’s in three parts:

 __ background color
/  __foreground color
| /
V V
0 2 48 <- letter, in ASCII

We’ll start at the right. The last two digits are the letter, in ASCII. H is 72 in ASCII, and 72 is 48 in hexadecimal: (4 * 16) + 8 = 72. So this will write H.

The other two digits are colors. There are 16 colors available, each with a number. Here’s the table:

| Value | Color          |
|-------|----------------|
| 0x0   | black          |
| 0x1   | blue           |
| 0x2   | green          |
| 0x3   | cyan           |
| 0x4   | red            |
| 0x5   | magenta        |
| 0x6   | brown          |
| 0x7   | gray           |
| 0x8   | dark gray      |
| 0x9   | bright blue    |
| 0xA   | bright green   |
| 0xB   | bright cyan    |
| 0xC   | bright red     |
| 0xD   | bright magenta |
| 0xE   | yellow         |
| 0xF   | white          |

So, 02 is a black background with a green foreground. Classic. Feel free to change this up, use whatever combination of colors you want!

So this gives us a H in green, over black. Next letter: e.

.global start

.code32
.section .text
start:
    movw $0x0248, 0xb8000 # H
    movw $0x0265, 0xb8002 # e
    hlt

Lower case e is 65 in ASCII, at least, in hexadecimal. And 02 is our same color code. But you’ll notice that the memory location is different.

Okay, so we copied four hexadecimal digits into memory, right? For our H. 0248. A hexadecimal digit has sixteen values, which is 4 bits (for example, 0xf would be represented in bits as 1111). Two of them make 8 bits, i.e. one byte. Since we need half a word for the colors (02), and half a word for the H (48), that’s one word in total (or two bytes). Each place that the memory address points to can hold one byte (a.k.a. 8 bits or half a word). Hence, if our first memory position is at 0, the second letter will start at 2.

You might be wondering, “If we’re in 32 bit mode, isn’t a word 32 bits?” since sometimes “word” is used to talk about native CPU register size. Well, the “word” keyword in the context of x86_64 assembly specifically refers to 2 bytes, or 16 bits of data. This is for reasons of backwards compatibility.

This math gets easier the more often you do it. And we won’t be doing that much more of it. There is a lot of working with hex numbers in operating systems work, so you’ll get better as we practice.

With this, you should be able to get the rest of Hello, World. Go ahead and try if you want: each letter needs to bump the location by two, and you need to look up the letter’s number in hex.

If you don’t want to bother with all that, here’s the final code:

.global start

.code32
.section .text
start:
    movw $0x0248, 0xb8000 # H
    movw $0x0265, 0xb8002 # e
    movw $0x026c, 0xb8004 # l
    movw $0x026c, 0xb8006 # l
    movw $0x026f, 0xb8008 # o
    movw $0x022c, 0xb800a # ,
    movw $0x0220, 0xb800c # (space)
    movw $0x0277, 0xb800e # w
    movw $0x026f, 0xb8010 # o
    movw $0x0272, 0xb8012 # r
    movw $0x026c, 0xb8014 # l
    movw $0x0264, 0xb8016 # d
    movw $0x0221, 0xb8018 # !
    hlt

Finally, now that we’ve got all of the code working, we can assemble our boot.S file with gcc (actually as under the hood), just like we did with the multiboot_header.S file:

$ gcc -c boot.S -o boot.o

This will produce a boot.o file. We’re almost ready to go!

Linking it together

Okay! So we have two different .o files: multiboot_header.o and boot.o. But what we need is one file with both of them. Our OS doesn’t have the ability to do anything yet, let alone load itself in two parts somehow. We just want one big binary file.

Enter “linking”. If you haven’t worked in a compiled language before, you probably haven’t had to deal with linking before. Linking is how we’ll turn these two files into a single output: by linking them together.

Open up a file called linker.ld and put this in it:

ENTRY(start)

SECTIONS {
  . = 0x100000; /* Tells GRUB to load the kernel starting at the 1MB mark */

  .rodata :
  {
    /* ensure that the multiboot header is at the beginning */
    KEEP(*(.multiboot_header))
    *(.rodata .rodata.*)
    . = ALIGN(4K);
  }

  .text :
  {
    *(.text .text.*)
    . = ALIGN(4K);
  }

  .data :
  {
    *(.data .data.*)
    . = ALIGN(4K);
  }

  .bss :
  {
    *(.bss .bss.*)
    . = ALIGN(4K);
  }
}

This is a “linker script”. It controls how our linker will combine these files into the final output. Let’s take it bit-by-bit:

ENTRY(start)

This sets the “entry point” for this executable. In our case, we called our entry point by the name people use: start. Remember? In boot.S? Same name here.

SECTIONS {

Okay! I’ve been promising you that we’d talk about sections. Everything inside of these curly braces is a section. We annotated parts of our code with sections earlier, and here, in this part of the linker script, we will describe each section by name and where it goes in the resulting output.

. = 0x100000;

This line means that we will start putting sections at the one megabyte mark. This is the conventional place to put a kernel, at least to start. Below one megabyte is all kinds of memory-mapped stuff. Remember the VGA stuff? It wouldn’t work if we mapped our kernel’s code to that part of memory… garbage on the screen!

.rodata :

This will create a section named rodata. And inside of it…

*(.multiboot_header)

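… goes every section named .multiboot_header. Remember how we defined that section in multiboot_header.S? It’ll be here, at the start of the rodata section, which is the first section in the output. That’s what we need for GRUB to see it. (In the full script the pattern is wrapped in KEEP(...), which tells the linker never to discard this section.)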

.text :

Next, we define a text section. The text section is where you put code. And inside of it…

*(.text)

… goes every section named .text. See how this is working? The syntax is a bit weird, but it’s not too bad.

We do the same for the data and bss sections.
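Each output section in the script ends with . = ALIGN(4K), which rounds the linker’s location counter up to the next 4KB boundary so that the following section starts on a fresh page. A minimal C sketch of that rounding follows; align_up is a hypothetical helper written for this illustration, not something the linker or the starter code provides.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two, as 4K is. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, if a section’s contents end at 0x100018, the next section starts at align_up(0x100018, 4096), which is 0x101000.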

That’s it for our script! We can then compile the C files and use ld to link all of this stuff together:

$ gcc -c -m64 -fno-pie -no-pie main.c -o main.o
$ gcc -c -m64 -fno-pie -no-pie console.c -o console.o
$ ld -T linker.ld -o kernel.bin multiboot_header.o boot.o main.o console.o

(On CADE machines, the command should be ld --nmagic -T linker.ld -o kernel.bin multiboot_header.o boot.o main.o console.o.)

Recall that on macOS you will want to use the linker from a cross-compiling toolchain, with the name x86_64-elf-ld.

By running this command, we do a few things:

-T linker.ld

This is the linker script we just made; the -T flag asks the linker to use it.

-o kernel.bin

This sets the name of our output file. In our case, that’s kernel.bin. We’ll be using this file in the next step. It’s our whole kernel!

multiboot_header.o boot.o main.o console.o

Finally, we pass all the .o files we want to link together.

That’s it! We’ve now got our kernel in the kernel.bin file. Next, we’re going to make an ISO out of it, so that we can load it up in QEMU.

Making an ISO

Now that we have our kernel.bin, the next step is to make an ISO. Remember compact discs? Well, by making an ISO file, we can both test our Hello World kernel in QEMU and run it on actual hardware!

To do this, we’re going to use a GRUB tool called grub2-mkrescue. We have to create a certain structure of files on disk, run the tool, and we’ll get a hello.iso file at the end.

Doing so is not very much work, but we need to make the files in the right places. First, we need to make several directories:

$ mkdir -p build/isofiles/boot/grub

The -p flag to mkdir will make the directory we specify, as well as any “parent” directories, hence the p. In other words, this will make a build directory with an isofiles directory inside that has boot inside, and finally the grub directory inside of that.

Next, create the grub.cfg file inside of that build/isofiles/boot/grub directory, and put this in it:

set timeout=0
set default=0

menuentry "cs5460os" {
    multiboot2 /boot/kernel.bin
    boot
}

This file configures GRUB. Let’s talk about the menuentry block first. GRUB lets us load up multiple different operating systems, and it usually does this by displaying a menu of OS choices to the user when the machine boots. Each menuentry section corresponds to one of these. We give it a name, in this case, cs5460os, and then a little script to tell it what to do. First, we use the multiboot2 command to point at our kernel file. In this case, that location is /boot/kernel.bin. Remember how we made a boot directory inside of isofiles? Since we’re making the ISO out of the isofiles directory, everything inside of it is at the root of our ISO. Hence /boot.

Let’s copy our kernel.bin file there now:

$ cp kernel.bin build/isofiles/boot/

Finally, the boot command in grub.cfg says “that’s all the configuration we need to do, boot it up.”

But what about those timeout and default settings? Well, the default setting controls which menuentry we want to be the default. The numbers start at zero, and since we only have that one, we set it as the default. When GRUB starts, it will wait for timeout seconds, and then choose the default option if the user didn’t pick a different one. Since we only have one option here, we just set it to zero, so it will start up right away.

The final layout should look like this:

build/
|---isofiles/
    |---boot
        |-- grub
        |   |-- grub.cfg
        |-- kernel.bin

Now create the bootable ISO from build/isofiles.

Use the command name that exists on your system:

# Fedora/RHEL/CentOS
$ grub2-mkrescue -o hello.iso build/isofiles

# Debian/Ubuntu
$ grub-mkrescue -o hello.iso build/isofiles

You should get something like this:

$ grub-mkrescue -o hello.iso build/isofiles/
xorriso 1.5.6 : RockRidge filesystem manipulator, libburnia project.

Drive current: -outdev 'stdio:hello.iso'
Media current: stdio file, overwriteable
Media status : is blank
Media summary: 0 sessions, 0 data blocks, 0 data,  917g free
Added to ISO image: directory '/'='/tmp/grub.POwzr4'
xorriso : UPDATE :     295 files added in 1 seconds
Added to ISO image: directory '/'='/home/manvik/classes/os/hw3/src/build/isofiles'
xorriso : UPDATE :     299 files added in 1 seconds
xorriso : NOTE : Copying to System Area: 512 bytes from file '/usr/lib/grub/i386-pc/boot_hybrid.img'
ISO image produced: 2510 sectors
Written to medium : 2510 sectors at LBA 0
Writing to 'stdio:hello.iso' completed successfully.

If you are on Debian/Ubuntu and do not have the command yet, install:

sudo apt update
sudo apt install grub-common grub-pc-bin xorriso mtools

If you are on Fedora/RHEL/CentOS and do not have the command yet, install:

sudo dnf install grub2-tools-extra xorriso mtools

The -o flag controls the output filename, which we choose to be hello.iso. And then we pass it the directory to make the ISO out of, which is the build/isofiles directory we just set up.

Note: if you’re on a CADE machine, you likely don’t have the GRUB i386-pc modules installed, so add the option -d /home/cs5460/grub/lib/grub/i386-pc to the grub2-mkrescue command to point it at a copy of them:

$ grub2-mkrescue -d /home/cs5460/grub/lib/grub/i386-pc -o hello.iso build/isofiles

After this, you have a hello.iso file with our teeny kernel on it. You could burn this to a USB stick or CD and run it on an actual computer if you wanted to! But doing so would be really annoying during development. So in the next section, we’ll use an emulator, QEMU, to run the ISO file on our development machine.

Troubleshooting GRUB issues

There is a chance you might encounter the following issue:

grub2-mkrescue: error: xorriso not found

Solution: if on your own machine, install xorriso. On CADE, xorriso should already be installed.

On some systems, the command might be grub-mkrescue instead of grub2-mkrescue. If you get a “command not found” error, try the other one.

On macOS, even if you installed GRUB and have grub-mkrescue, you still won’t be able to create a QEMU-bootable ISO. If you still want to develop on macOS, you can use Docker to create the ISO file in a Linux environment, and then copy it back to your machine. The following is an example docker command to do that:

docker run --platform linux/amd64 -it --rm -v $(pwd):/root -w /root ubuntu:latest bash -c "
apt update && \
apt install -y grub2-common grub-pc-bin xorriso mtools && \
grub-mkrescue -o hello.iso build/isofiles"

Running in QEMU

Let’s actually run our kernel! To do this, we’ll use QEMU, a full-system emulator. Using QEMU is fairly straightforward.

If you’re running on CADE inside an ssh terminal, you don’t have a GUI, so you need to pass -display curses. Curses is a library designed to provide GUI-like functionality on a text-only device (see a wiki page).

$ qemu-system-x86_64 -display curses -cdrom hello.iso

Type it in, hit Enter, and you should see green Hello, world! at the top-left corner of the terminal. (To exit, hit Alt+2, or Esc then 2, and type quit in the console.)

If you’re running on your own machine with a GUI terminal you can simply run:

$ qemu-system-x86_64 -cdrom hello.iso

If you prefer TUI, you can also use:

qemu-system-x86_64 -boot d -cdrom hello.iso -nographic -serial mon:stdio -no-reboot

You should see something that really looks like a computer screen with Hello, world! on it. (To exit, hit Alt+2 and type quit in the console.)

If it shows up for you, congrats! If not, something may have gone wrong. Double-check that you followed the examples exactly. Maybe you missed something, or made a mistake while copying things down.

Note all of the other stuff around the Hello World message: this part may look different depending on your version of GRUB, and since we didn’t clear the screen, GRUB’s output remains visible. We’ll write a function to do that eventually.

Let’s talk about this command before we move on:

qemu-system-x86_64

We’re running the x86_64 variant of QEMU. In this homework the kernel transitions from 32-bit protected mode into 64-bit long mode, so this matches the target architecture directly. QEMU emulates an x86-64 machine, and our early 32-bit bootstrap code still runs correctly because 32-bit mode is part of that architecture.

-cdrom hello.iso

We’re going to start QEMU with a CD-ROM drive, and its contents are the hello.iso file we made.

That’s it. Even though none of those steps was too complicated on its own, there were a lot of them. Each time we make a change, we have to go through the whole process again. In the next section, we’ll use Make to automate it.

Troubleshooting QEMU issues

If you see the following error on the screen:

error: no multiboot header found.
error: you need to load the kernel first.

Add --nmagic to the ld command when linking the kernel.

If you see the following error on the screen:

Boot failed: Could not read from CDROM (code 0004)

See the i386 module issue on CADE above.

If you see the following error on the screen:

Boot failed: Could not read from CDROM (code 0009)

Try running the command sudo apt install grub-pc-bin, or similar command for your package manager. See here.

Automation with Make

Typing all of these commands out every time we want to build the project is tiring and error-prone. It’s nice to be able to have a single command that builds our entire project. To do this, we’ll use make. Download this Makefile and look over it.

To make this Makefile work, create a boot folder in the same directory as Makefile and put the grub.cfg file you created earlier into that folder. Your tree should look like this (copy any missing files from src.tgz):

$ tree
.
|-- Makefile
|-- boot
|   `-- grub.cfg
|-- boot.S
|-- console.c
|-- console.h
|-- linker.ld
|-- main.c
|-- mmu.h
|-- multiboot_header.S
`-- types.h

The makefile starts by defining several variables: kernel and iso name the output files we want to make, while linker_script and grub_cfg name the input files the build needs. It also finds all the assembly and C source files in the current directory, and creates lists of their corresponding object files.

kernel := build/kernel.bin
iso := build/hello.iso

linker_script := linker.ld
grub_cfg := boot/grub.cfg

We then create two lists: a list of the assembly files in the folder, assembly_source_files, and a list of the C source files, c_source_files. We then use the patsubst function to generate two more lists with the same file names, but placed under build/ and with .o as the extension. Note that the assembly pattern must be *.S to match boot.S and multiboot_header.S:

assembly_source_files := $(wildcard *.S)
assembly_object_files := $(patsubst %.S, build/%.o, $(assembly_source_files))
c_source_files := $(wildcard *.c)
c_object_files := $(patsubst %.c, build/%.o, $(c_source_files))
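For a single file name, the patsubst calls above behave roughly like the C helper below. This is only a sketch of the substitution: object_path is a hypothetical name, real patsubst works on whole lists of words, and it passes non-matching words through unchanged where this helper reports an error instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a source name such as "main.c" to "build/main.o".
 * ext is the source extension including the dot (for example ".c").
 * Returns 0 on success, -1 if src does not end in ext or out is too small. */
static int object_path(const char *src, const char *ext, char *out, size_t outlen)
{
    assert(src && ext && out);

    size_t slen = strlen(src);
    size_t elen = strlen(ext);

    if (slen <= elen || strcmp(src + slen - elen, ext) != 0)
        return -1;

    /* The stem is everything before the extension (the % in the pattern). */
    int stem = (int)(slen - elen);
    int n = snprintf(out, outlen, "build/%.*s.o", stem, src);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```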

CFLAGS is a variable that defines all flags to the GCC compiler.

CFLAGS = -fno-pic -static -fno-builtin -fno-strict-aliasing -Og -Wno-infinite-recursion -Wall -MD -ggdb -Werror -fno-omit-frame-pointer -mno-default -fno-pie -no-pie -m64 -mcmodel=large

Next, there is a bunch of code trying to find the correct gcc, ld, qemu, and grub2-mkrescue commands. This is to make sure the Makefile works in different environments, including CADE and your own machine. However, if you encounter issues with this part and are very sure about the correct command to use, you can just set the variables directly.

Finally, we have the actual rules.

.PHONY: all clean qemu qemu-nox qemu-gdb qemu-gdb-nox iso kernel

all: $(kernel)

clean:
	rm -r build

qemu: $(iso)
	$(QEMU) -cdrom $(iso) -display curses -vga std -serial file:serial.log

qemu-nox: $(iso)
	$(QEMU) -m 128 -cdrom $(iso) -vga std -no-reboot -nographic

qemu-gdb: $(iso)
	$(QEMU) -S -m 128 -cdrom $(iso) -display curses -vga std -s -serial file:serial.log -no-reboot -no-shutdown -d int,cpu_reset

qemu-gdb-nox: $(iso)
	$(QEMU) -S -m 128 -cdrom $(iso) -vga std -s -serial file:serial.log -no-reboot -no-shutdown -d int,cpu_reset -nographic

iso: $(iso)
	@echo "Done"

$(iso): $(kernel) $(grub_cfg)
	@mkdir -p build/isofiles/boot/grub
	cp $(kernel) build/isofiles/boot/kernel.bin
	cp $(grub_cfg) build/isofiles/boot/grub
ifneq (,$(findstring Darwin,$(UNAME_A)))
	@echo "Building ISO via Docker (AMD64) to support BIOS booting..."
	$(DOCKER_CMD) bash -c "$(APT_INSTALL) && grub-mkrescue -o $(iso) build/isofiles"
else
	$(mkrescue) $(MKRESCUEFLAGS) -o $(iso) build/isofiles #2> /dev/null
endif

kernel: $(kernel)
	@echo "Done"

$(kernel): $(c_object_files) $(assembly_object_files) $(linker_script)
	$(LD) $(LDFLAGS) -T $(linker_script) -o $(kernel) $(assembly_object_files) $(c_object_files)

# compile C files
build/%.o: %.c
	@mkdir -p $(shell dirname $@)
	$(CC) $(CFLAGS) -c $< -o $@

# compile assembly files
build/%.o: %.S
	@mkdir -p $(shell dirname $@)
	$(CC) $(CFLAGS) -c $< -o $@

Our default action is all (it will build the kernel by invoking the linker). Of course before linking the kernel, all object files have to be compiled.

It’s also useful to add targets that describe specific actions. To run the kernel in QEMU, we add a rule:

qemu: $(iso)
	$(QEMU) -cdrom $(iso) -display curses -vga std -serial file:serial.log

Finally, there’s another useful common rule: clean. The clean rule should remove all of the generated files, and allow us to do a full re-build.

Now there’s just one more wrinkle. Several of our targets aren’t really files on disk, they are just actions: all, clean, qemu, qemu-nox, qemu-gdb, qemu-gdb-nox, iso and kernel. Remember we said earlier that make decides whether or not to execute a command by comparing the last time a target was built with the last-modified-time of its prerequisites? Well, it determines the last time a target was built by looking at the last-modified-time of the target file. If the target file doesn’t exist, then it’s definitely out-of-date so the command will be run.

But what if we accidentally create a file called clean? It doesn’t have any prerequisites so it will always be up-to-date and the commands will never be run! We need a way to tell make that this is a special target, it isn’t really a file on disk, it’s an action that should always be executed. We can do this with a magic built-in target called .PHONY:

.PHONY: all clean qemu qemu-nox qemu-gdb qemu-gdb-nox iso kernel
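The decision described above can be modeled in a few lines of C. This is a simplified sketch, not how make is actually implemented: must_rebuild is a hypothetical function, and real make also checks first whether the prerequisites themselves need rebuilding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Decide whether make would run a target's recipe.
 *   phony         - the target is listed under .PHONY
 *   target_exists - a file with the target's name exists
 *   target_mtime  - that file's last-modified time (ignored if missing)
 *   prereq_mtimes - last-modified times of the n prerequisites
 */
static bool must_rebuild(bool phony, bool target_exists, time_t target_mtime,
                         const time_t *prereq_mtimes, size_t n)
{
    assert(n == 0 || prereq_mtimes != NULL);

    if (phony || !target_exists)
        return true;             /* always run, or nothing to compare against */

    for (size_t i = 0; i < n; i++)
        if (prereq_mtimes[i] > target_mtime)
            return true;         /* a prerequisite is newer than the target */

    return false;                /* up to date */
}
```

A stray file named clean with no prerequisites corresponds to must_rebuild(false, true, t, NULL, 0), which is false, so the recipe never runs. Marking the target as phony flips the first argument and the recipe always runs.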

Paging

Up until now we did a lot of work that wasn’t actually writing kernel code. So let’s review what we’re up to:

  1. GRUB loaded our kernel, and started running it.
  2. We’re currently running in “protected mode”, a 32-bit environment.
  3. We still need to set up x86_64 paging and explicitly enter long mode.

Our plan now:

  1. Build a temporary page table.
  2. Enable paging.
  3. Load a 64-bit-capable GDT.
  4. Far jump into 64-bit code.
  5. Set stack and call main().

Paging, the concept

Paging is implemented by a part of the CPU called an “MMU”, for “memory management unit”. The MMU will translate virtual addresses into their respective physical addresses automatically; we can write all of our software with virtual addresses only. The MMU does this with a data structure called a “page table”. As an operating system, we fill in the page table in memory, tell the CPU where it is, and then tell the CPU to enable paging. This is the task ahead of us; paging must be set up before we transition to long mode.

How should we do our mapping of virtual to physical addresses? You can make this easy, or complex, and it depends on exactly what you want your OS to be good at. Some strategies are better than others, depending on the kinds of programs you expect to be running. We’re going to keep it simple, and use a strategy called “identity mapping”. This means that every virtual address will map to a physical address of the same number. Nothing fancy.

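Let’s talk more about the page table. In x86_64 long mode, the page table is four levels deep, and the most common page size is 4096 bytes. Here are the four levels, from the top down:

  1. PML4 (Page Map Level 4): the top-level table.
  2. PDPT (Page Directory Pointer Table).
  3. PD (Page Directory).
  4. PT (Page Table): its entries point to the 4096-byte pages themselves.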

Each of these tables is an array of entries. The size of each entry is 8 bytes, and the size of each table is 4096 bytes (so that it fits in one page). Therefore, each table has 512 entries (4096 / 8 = 512). So, to index into one of these tables, we need 9 bits (since 2^9 = 512). Inside one page, we have 12 bits for the offset (since 2^12 = 4096). Combined together, we have 9 bits for the index in PML4, 9 bits for the index in PDPT, 9 bits for the index in PD, 9 bits for the index in PT, and 12 bits for the offset within the page. This adds up to 48 bits, which is the number of virtual-address bits that 4-level paging translates. (In a canonical x86_64 address, the upper 16 bits are copies of bit 47.)
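The split into four 9-bit indices and a 12-bit offset can be written down directly with shifts and masks. The helper names below are invented for this sketch and are not taken from mmu.h.

```c
#include <assert.h>
#include <stdint.h>

/* Index helpers for a 48-bit virtual address with 4KB pages.
 * Each index is 9 bits wide; the page offset is 12 bits. */
static unsigned pml4_index(uint64_t va)  { return (unsigned)((va >> 39) & 0x1ff); }
static unsigned pdpt_index(uint64_t va)  { return (unsigned)((va >> 30) & 0x1ff); }
static unsigned pd_index(uint64_t va)    { return (unsigned)((va >> 21) & 0x1ff); }
static unsigned pt_index(uint64_t va)    { return (unsigned)((va >> 12) & 0x1ff); }
static unsigned page_offset(uint64_t va) { return (unsigned)(va & 0xfff); }

/* Reassemble the low 48 bits of an address from its parts.
 * This sketch does not sign-extend into the upper 16 bits. */
static uint64_t make_va(unsigned i4, unsigned i3, unsigned i2, unsigned i1,
                        unsigned off)
{
    assert(i4 < 512 && i3 < 512 && i2 < 512 && i1 < 512 && off < 4096);
    return ((uint64_t)i4 << 39) | ((uint64_t)i3 << 30) |
           ((uint64_t)i2 << 21) | ((uint64_t)i1 << 12) | off;
}
```

For the 1MB mark, 0x100000, the PML4, PDPT and PD indices are all 0, the PT index is 256, and the offset is 0.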

Each of the entries in these tables is 64 bits (8 bytes), which matches the size of a 64-bit address. But we do not need all 64 bits to represent an address, and not all 64 bits are used for the address portion anyway. Naturally, since the page size is 4096 bytes, we do not need the last 12 bits to represent the address of a page, so those bits can be used for flags. Even beyond that, x86-64 does not use all 64 bits for physical addresses, so additional bits are available for flags as well. The main idea is that each page-table entry stores both the physical address of the next-level table (or final page frame) and the flags that tell the MMU how to interpret that entry.
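As a small illustration of how one 64-bit entry carries both an address and flags, here is a sketch in C. The flag names PTE_P and PTE_W follow the ones this document uses from mmu.h, and their values are the architectural bit positions; the helper functions and PTE_ADDR_MASK are made up for this example, so treat mmu.h as authoritative for what the starter code really defines.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P 0x001ull /* present (bit 0) */
#define PTE_W 0x002ull /* writable (bit 1) */

/* Bits 12..51 hold the physical address of a 4KB-aligned table or page. */
#define PTE_ADDR_MASK 0x000ffffffffff000ull

/* Combine a 4KB-aligned physical address with flag bits. */
static uint64_t pte_make(uint64_t pa, uint64_t flags)
{
    assert((pa & ~PTE_ADDR_MASK) == 0);   /* aligned and within 52 bits */
    assert((flags & PTE_ADDR_MASK) == 0); /* flags stay out of the address */
    return pa | flags;
}

/* Split an entry back into its two parts. */
static uint64_t pte_addr(uint64_t pte)  { return pte & PTE_ADDR_MASK; }
static uint64_t pte_flags(uint64_t pte) { return pte & ~PTE_ADDR_MASK; }
```

An entry pointing at a table at physical address 0x101000 that is present and writable is pte_make(0x101000, PTE_P | PTE_W), which is 0x101003.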

However, during the transition into long mode, we will not use the full 4-level page-table setup yet, for a practical reason.

We need to make sure the address of the code we’re currently running is still valid and points to the right place after paging is enabled. Because of how the code is linked and loaded, our code will be located around the 1MB mark in physical memory. So we need to set up the page tables so that the address of our code, and all registers pointing to it, still refers to the same place as before paging is enabled. The most straightforward way to do this is to set up an identity mapping covering those addresses. However, covering even the first 2MB with 4KB pages would require filling in 512 page-table entries, which is a lot of work, especially in assembly before we have switched into C.

Luckily, x86_64 long mode also supports a 2MB page size, usually called a “huge page”. One such huge page can cover the address space from 0 to 2MB, which is more than enough for our code. This is done by setting a specific flag in the page-directory entry, which tells the hardware to treat this entry as mapping a huge page instead of pointing to the next level page table. This lets us set up the page tables with only one entry per level while still creating an identity mapping for the address space around 1MB. So in this part of the bootstrapping process, we use a 3-level setup with a huge page instead of a full 4-level page-table structure.
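The saving is easy to quantify. The sketch below counts how many last-level entries are needed to map a region with each page size; leaf_entries is a hypothetical helper used only for this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K (4096ull)
#define PAGE_2M (2ull * 1024 * 1024)

/* Number of leaf entries needed to map `bytes` of memory with pages of
 * `page_size` bytes, rounding up to whole pages. */
static uint64_t leaf_entries(uint64_t bytes, uint64_t page_size)
{
    assert(page_size == PAGE_4K || page_size == PAGE_2M);
    return (bytes + page_size - 1) / page_size;
}
```

Mapping the first 2MB takes leaf_entries(PAGE_2M, PAGE_4K), which is 512 entries, with 4KB pages, but only one entry with a 2MB page.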

Paging, the technical details

Transitioning a processor from 32-bit protected mode to 64-bit long mode requires a specific sequence of steps. Conceptually, the process is:

  1. Set the PAE enable bit in CR4.
  2. Load CR3 with the physical address of the PML4 (Level 4 Page Map).
  3. Enable long mode by setting the LME flag (bit 8) in MSR 0xC0000080 (aka EFER).
  4. Enable paging by setting the PG bit in CR0.

You might be asking, “What are all these abbreviations? What are CR3, CR4, MSR, EFER, LME, PAE?” The following sections will explain all of these terms, and why they are important for bootstrapping into long mode.

PAE and CR4

PAE stands for Physical Address Extension. CR4 is a control register used to control processor features, and there is a specific bit in CR4 that enables PAE. In older 32-bit systems, memory addresses were limited to 32 bits, capping RAM at 4GB. PAE was originally designed to allow 32-bit processors to access more than 4GB of physical memory by expanding the page table entries. Long mode requires PAE to be enabled because 64-bit paging structures are based on the PAE format. You cannot enter Long Mode without this bit being set first. So even though we are heading to long mode, we still must set the PAE bit in CR4 to 1 while in protected mode before enabling paging for long mode.

CR3 and PML4

CR3 is another control register, also known as the Page Directory Base Register (PDBR). It holds the physical address of the top-level page table (the PML4). When we enable paging, the CPU will use the address in CR3 to find the PML4, and then use the page tables to translate virtual addresses to physical addresses. So before we can enable paging, we need to load CR3 with the physical address of our PML4. (Of course, we need to have already set up the PML4 and the rest of the page tables in memory before we can do this.)

EFER, LME, and MSRs

MSR stands for Model-Specific Register. These are special registers that control various features of the CPU. They are accessed and modified using the rdmsr and wrmsr instructions. EFER (Extended Feature Enable Register) is one of them. This register controls “extended” features that weren’t part of the original Pentium architecture. EFER has a specific bit called LME (Long Mode Enable) that we need to set in order to enable long mode. This is a separate step from enabling paging, and it must be done before enabling paging.

CR0 and PG

CR0 is another control register that controls various aspects of the processor’s operation. The PG (Paging) bit in CR0 is what actually turns on paging. When we set this bit to 1, the CPU starts using the page tables to translate virtual addresses to physical addresses. This is the final step in enabling long mode, but it must be done after setting up CR3, CR4, and EFER.
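To summarize how the three bits interact, here is a toy model in C. It does not touch real registers: struct cpu_model and long_mode_active are invented for this sketch, and only the bit positions (PAE is bit 5 of CR4, LME is bit 8 of EFER, PG is bit 31 of CR0) come from the architecture. The macro names mirror the ones used from mmu.h later in this document, but the definitions here are local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG       (1u << 31) /* paging enable */
#define CR4_PAE      (1u << 5)  /* physical address extension */
#define EFER_MSR_LME (1u << 8)  /* long mode enable */

/* A toy model of the three registers involved; not real hardware access. */
struct cpu_model {
    uint32_t cr0;
    uint32_t cr4;
    uint32_t efer; /* low 32 bits of the EFER MSR */
};

/* Long mode becomes active only once PAE, LME and PG are all set. */
static bool long_mode_active(const struct cpu_model *cpu)
{
    assert(cpu != 0);
    return (cpu->cr4 & CR4_PAE) &&
           (cpu->efer & EFER_MSR_LME) &&
           (cpu->cr0 & CR0_PG);
}
```

Setting PAE and LME alone leaves the model in protected mode; it is the final write of PG that makes long_mode_active return true, which is why that step comes last.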

Paging, the code

All of the steps above write to specific registers, and assembly is the most straightforward way to do that.

Include Macros

To make our life easier, we can define some macros for the flags we need to set in the page table entries. The provided mmu.h file defines these macros for us. We can include this header file in our assembly code to use these macros. To do this, we need to add the following line at the top of our boot.S file:

#include "mmu.h"

Yes, you can include header files in assembly code! The syntax is the same as in C. This is because we make the assembly files have the .S (capital S) extension, which tells gcc to run them through the C preprocessor before assembling them. If we had used the .s (lowercase s) extension, then the preprocessor would not run, and the #include directive would not work. That is why the case of the file extension matters here.

Allocating Space for Page Tables

Before we can load CR3 with the physical address of the PML4, we need to have the space for the page tables allocated. We can do this in assembly by reserving some space in the .bss section:

.section .bss
.p2align 12 # 4KB alignment
/* Reserve memory for the 3 page tables (4KB each) */
p4_table: .skip 4096
p3_table: .skip 4096
p2_table: .skip 4096

The .section .bss directive tells the assembler to put the following in the BSS section. The .p2align 12 directive ensures that the next data is aligned to a 4096-byte boundary, which is required for page tables. We then reserve 4096 bytes for each of the three page tables (PML4, PDPT, and PD) using the .skip directive. .skip just reserves a certain number of bytes in the output.
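Only one .p2align is needed because each table is exactly one page long, so the tables that follow stay aligned automatically. The C sketch below checks that arithmetic. The helper names are made up, and the base address 0x104000 used in the example is an arbitrary assumption; the real address is chosen by the linker.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BYTES 4096u

/* True if addr is a multiple of 4096, i.e. its low 12 bits are zero. */
static int is_page_aligned(uint32_t addr)
{
    return (addr & 0xfffu) == 0;
}

/* Address of the n-th table (0 = p4_table, 1 = p3_table, 2 = p2_table)
 * when the tables are laid out back to back from an aligned base. */
static uint32_t table_addr(uint32_t base, unsigned n)
{
    assert(is_page_aligned(base)); /* what .p2align 12 guarantees */
    assert(n < 3);
    return base + n * TABLE_BYTES;
}
```

With a base of 0x104000, the three tables would sit at 0x104000, 0x105000 and 0x106000, all of which are page aligned.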

Technically, this piece of code can be put in either of the two assembly files, multiboot_header.S or boot.S, since they will be linked together. However, it makes more sense to put it in boot.S, since it’s more related to the bootstrapping process, and that saves us from writing .global directives for these labels.

Filling in Page Table Entries

After we have reserved space for the page tables, we need to fill in the entries in these tables to set up our identity mapping. This is done by the following code:

    mov $p3_table, %eax
    or $(PTE_P | PTE_W), %eax
    mov %eax, p4_table

    mov $p2_table, %eax
    or $(PTE_P | PTE_W), %eax
    mov %eax, p3_table

    mov $(PTE_P | PTE_W | PTE_PS), %eax
    mov %eax, p2_table

This code goes in the start function in the .text section we defined in boot.S. You could put it after the Hello World message, before it, or in place of it.

Now we explain this code step by step.

    mov $p3_table, %eax
    or $(PTE_P | PTE_W), %eax
    mov %eax, p4_table

This code sets up an entry in the PML4 (Level 4 Page Map) that points to the PDPT (Page Directory Pointer Table). Remember how mov works? It moves the value of the first argument (the “thing”) into the second argument (the “place”). Here, the “thing” is $p3_table. Remember we said that the dollar sign means “immediate value”? So, $p3_table is the immediate value of p3_table, which is a label pointing to the reserved space for the PDPT, that is, an address. If we wrote p3_table without the dollar sign, it would mean “the value at the address p3_table”, that is, the data currently stored in the space reserved for the PDPT, which is not what we want to put in the PML4 entry. We load the address into the eax register.

Then, we set the flags for this entry. PTE_P is the “present” flag, which tells the MMU that this entry is valid and should be used for address translation. PTE_W is the “writable” flag, which allows writing to the pages mapped by this entry. or is a bitwise operation that calculates the bitwise OR of the first argument with the second argument, and stores the result in the second argument. So, or $(PTE_P | PTE_W), %eax sets the present and writable flags in eax. These flags are all in the lower 12 bits of the entry, and because the tables are 4096-byte aligned those bits of the address are always zero, so the flags don’t interfere with the address part of the entry.

Finally, we move the value in eax into the PML4 entry at p4_table. Note that this time, we don’t use the dollar sign, because we want to write the value of eax into the memory location p4_table, which is the reserved space for the PML4. This sets up the PML4 entry to point to the PDPT with the appropriate flags.

    mov $p2_table, %eax
    or $(PTE_P | PTE_W), %eax
    mov %eax, p3_table

These 3 lines of code do the same thing for the PDPT: they set up an entry in the PDPT that points to the PD (Page Directory) with the present and writable flags.

    mov $(PTE_P | PTE_W | PTE_PS), %eax
    mov %eax, p2_table

Finally, these 2 lines of code set up an entry in the PD that maps a 2MB huge page starting at physical address 0 with the present, writable, and page size flags. The PTE_PS flag tells the MMU that this entry maps a huge page instead of pointing to the next level page table. By setting the value of eax to PTE_P | PTE_W | PTE_PS, we are saying that this entry is present, writable, and maps a huge page. What is the address of this huge page? Since the address part of the entry is 0 (because we didn’t set any bits for the address), it maps the physical address range from 0 to 2MB. This is enough to cover the code and data of our kernel, which are located around the 1MB mark in physical memory. In the end, we mov this value into the PD entry at p2_table, which sets up the mapping for the huge page.
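To see what these three entries achieve together, the sketch below builds the same entries in a small C model and walks them in software the way the MMU would. Everything here is illustrative: the table addresses 0x1000, 0x2000 and 0x3000 are made up, the function names are hypothetical, and the flag values are the architectural bit positions rather than a copy of mmu.h.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P  0x001ull /* present */
#define PTE_W  0x002ull /* writable */
#define PTE_PS 0x080ull /* in a PD entry: maps a 2MB page */
#define ADDR_MASK 0x000ffffffffff000ull

/* Made-up physical addresses for the three tables in this model. */
#define P4_ADDR 0x1000ull
#define P3_ADDR 0x2000ull
#define P2_ADDR 0x3000ull

static uint64_t tables[3][512]; /* zero-initialized, like .bss */

/* Look up the model table that lives at a given fake physical address. */
static uint64_t *table_at(uint64_t pa)
{
    assert(pa == P4_ADDR || pa == P3_ADDR || pa == P2_ADDR);
    return tables[pa / 0x1000 - 1];
}

/* Mirror of the three assembly stores described above. */
static void fill_boot_tables(void)
{
    table_at(P4_ADDR)[0] = P3_ADDR | PTE_P | PTE_W;
    table_at(P3_ADDR)[0] = P2_ADDR | PTE_P | PTE_W;
    table_at(P2_ADDR)[0] = 0 | PTE_P | PTE_W | PTE_PS;
}

/* Walk the model tables. Returns 0 and stores the physical address in *pa,
 * or -1 if an entry on the path is not present. Only the 2MB-page case
 * built above is handled. */
static int translate(uint64_t va, uint64_t *pa)
{
    uint64_t e4 = table_at(P4_ADDR)[(va >> 39) & 0x1ff];
    if (!(e4 & PTE_P)) return -1;
    uint64_t e3 = table_at(e4 & ADDR_MASK)[(va >> 30) & 0x1ff];
    if (!(e3 & PTE_P)) return -1;
    uint64_t e2 = table_at(e3 & ADDR_MASK)[(va >> 21) & 0x1ff];
    if (!(e2 & PTE_P) || !(e2 & PTE_PS)) return -1;
    /* For a 2MB page the low 21 bits of va are the offset. */
    *pa = (e2 & 0x000fffffffe00000ull) | (va & 0x1fffff);
    return 0;
}
```

After fill_boot_tables(), every address below 2MB translates to itself, including the kernel at 0x100000 and the VGA buffer at 0xb8000, while 0x200000 and above are not mapped.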

Telling the CPU where the PML4 is

    mov $p4_table, %eax
    mov %eax, %cr3

As we mentioned before, CR3 is the control register that holds the physical address of the PML4. After we have set up the PML4 and the rest of the page tables in memory, we need to load CR3 with the physical address of the PML4 so that the CPU knows where to find our page tables when we enable paging. The first line loads the address of p4_table into eax, and the second line moves this value into cr3. This tells the CPU where our PML4 is located in memory.

Enabling PAE

    mov %cr4, %eax
    or $(CR4_PAE), %eax
    mov %eax, %cr4

This code enables PAE by setting the PAE bit in CR4. The first line moves the current value of CR4 into eax. The second line sets the PAE bit in eax using a bitwise OR operation. The third line moves the modified value back into CR4, which enables PAE.

Setting LME in EFER

    mov $(EFER_MSR), %ecx
    rdmsr
    or $(EFER_MSR_LME), %eax
    wrmsr

This code enables long mode by setting the LME bit in the EFER MSR. The first line loads the MSR index for EFER into ecx. The rdmsr instruction then reads the value of the EFER MSR into eax and edx (since MSRs are 64 bits, they are split across these two registers). The next line sets the LME bit in eax using a bitwise OR operation. Finally, the wrmsr instruction writes the modified value in eax and edx back to the EFER MSR.
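The way a 64-bit MSR value is split across edx and eax, and why modifying only eax is enough here, can be shown with a short C model. The struct and function names are invented for this sketch and it performs no real rdmsr or wrmsr; only the position of LME (bit 8) comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define EFER_MSR_LME (1u << 8) /* Long Mode Enable, bit 8 of EFER */

/* rdmsr returns a 64-bit MSR as two halves: edx (high) and eax (low). */
struct msr_halves {
    uint32_t eax;
    uint32_t edx;
};

static struct msr_halves msr_split(uint64_t value)
{
    struct msr_halves h = { (uint32_t)value, (uint32_t)(value >> 32) };
    return h;
}

static uint64_t msr_join(struct msr_halves h)
{
    return ((uint64_t)h.edx << 32) | h.eax;
}

/* Same read-modify-write as the assembly: only eax changes. */
static uint64_t efer_with_lme(uint64_t efer)
{
    struct msr_halves h = msr_split(efer);
    h.eax |= EFER_MSR_LME;
    assert((msr_join(h) >> 32) == (efer >> 32)); /* high half untouched */
    return msr_join(h);
}
```

Because LME sits in the low 32 bits, edx is written back unchanged, so any bits already set in the upper half of EFER are preserved.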

Enabling Paging

    mov %cr0, %eax
    or $(CR0_PG), %eax
    mov %eax, %cr0

Finally, this code enables paging by setting the PG bit in CR0. The first line moves the current value of CR0 into eax. The second line sets the PG bit in eax using a bitwise OR operation. The third line moves the modified value back into CR0, which enables paging.

Switching to Long Mode

Finally, we are almost ready to switch to long mode. After enabling paging, the CPU is technically in the compatibility mode of long mode, which means that it is still executing 32-bit code, but it is using the paging structures we set up for long mode. To actually switch to long mode, we need to first set up a GDT with a 64-bit code segment descriptor, and then do a far jump to that code segment. After the far jump, we will be executing 64-bit code in long mode.

The GDT

The GDT (Global Descriptor Table) is a data structure used by x86 processors to define the characteristics of the various memory segments used in protected mode and long mode. In long mode, segmentation is mostly disabled for normal code and data access, in the sense that the “ranges” of the segments defined in the GDT are ignored, and all segments are treated as if they have a base of 0 and a limit of 2^64. However, we still need a GDT to define at least a 64-bit code segment descriptor and some data segment descriptors. The code segment descriptor will have the L (long) bit set to indicate that it is a 64-bit code segment. The data segment descriptors can be set up as normal, but they won’t actually be used for segmentation in long mode. The GDT is also required for privilege transitions and for setting up the TSS and interrupts later on, but these features are not relevant for this homework.
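To make the L bit concrete, the sketch below names a few architectural bit positions inside an 8-byte segment descriptor and checks whether a descriptor describes a 64-bit code segment. The macro and function names are made up for this example, and the descriptors produced by the starter code’s own macros may set additional bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions inside an 8-byte code-segment descriptor. */
#define DESC_READ    (1ull << 41) /* type bit: code segment is readable */
#define DESC_EXEC    (1ull << 43) /* type bit: executable (code) segment */
#define DESC_S       (1ull << 44) /* 1 = code/data, 0 = system descriptor */
#define DESC_PRESENT (1ull << 47) /* segment present */
#define DESC_LONG    (1ull << 53) /* L bit: 64-bit code segment */

/* True if the descriptor describes a present 64-bit code segment. */
static bool is_long_code_segment(uint64_t desc)
{
    const uint64_t need = DESC_EXEC | DESC_S | DESC_PRESENT | DESC_LONG;
    return (desc & need) == need;
}

/* One possible encoding of a kernel 64-bit code segment (an assumption
 * for illustration; base and limit are left at zero). */
static uint64_t long_code_descriptor(void)
{
    uint64_t d = DESC_READ | DESC_EXEC | DESC_S | DESC_PRESENT | DESC_LONG;
    assert(is_long_code_segment(d));
    return d;
}
```

The null descriptor (all zeros) and a typical flat 32-bit code descriptor such as 0x00cf9a000000ffff both fail the check, because neither has the L bit set.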

Setting up the GDT

We set up a minimal GDT. A GDT always starts with a null descriptor, which is not used for anything. Then, we define a kernel code segment descriptor with the L bit set, and a kernel data segment descriptor. First, add the following line to the top of boot.S to include some macros for defining GDT entries:

#include "asm.h"

Then, we can define the GDT in the .data section like this:

.section .data
.p2align 3
gdt64:
  SEG_NULLASM                             # null seg
  SEG64_ASM(STA_X|STA_R, SEG64_CODE)      # code seg
  SEG64_ASM(STA_W, SEG64_OTHER)           # data seg

gdt64desc:
  .word   (gdt64desc - gdt64 - 1)         # sizeof(gdt64) - 1
  .quad   gdt64                           # address gdt64

We put the GDT in the .data section, and we align it to an 8-byte boundary per the requirements of the GDT.

gdt64 labels the actual GDT, which contains three entries. Besides the null entry, we have a code segment descriptor defined by SEG64_ASM(STA_X|STA_R, SEG64_CODE), which sets the execute and read flags and marks the segment as 64-bit code. We also have a data segment descriptor defined by SEG64_ASM(STA_W, SEG64_OTHER), which sets the write flag for a data segment.

gdt64desc is the GDT descriptor, which is a special structure that contains the size of the GDT and the address of the GDT. This is the structure that we will load into the GDTR register using the lgdt instruction to tell the CPU how big our GDT is and where it is located in memory. Since gdt64 is defined right before gdt64desc, the size of the GDT can be calculated as gdt64desc - gdt64, and we subtract 1 from it because the size field in the GDT descriptor is defined as the size of the GDT minus 1. This is stored as a 16-bit word, and the address of the GDT is stored as a 64-bit quadword.
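
To make the layout concrete, here is a hedged C sketch of how such descriptors and the size field could be computed. The macro and function names are made up for illustration, and the exact bits emitted by SEG64_ASM in asm.h may differ (for example, it may also fill in limit or granularity fields, which are ignored in 64-bit mode):

```c
#include <stdint.h>

/* Bit positions within an 8-byte segment descriptor. */
#define DESC_TYPE(t)  ((uint64_t)(t) << 40)   /* 4-bit type field */
#define DESC_S        (1ULL << 44)            /* 1 = code or data segment */
#define DESC_DPL(d)   ((uint64_t)(d) << 45)   /* privilege level 0..3 */
#define DESC_P        (1ULL << 47)            /* present */
#define DESC_L        (1ULL << 53)            /* 64-bit code segment */

#define TYPE_CODE_XR  0xAu   /* execute + read */
#define TYPE_DATA_RW  0x2u   /* read + write */

static uint64_t code64_desc(unsigned dpl)
{
    return DESC_TYPE(TYPE_CODE_XR) | DESC_S | DESC_DPL(dpl) | DESC_P | DESC_L;
}

static uint64_t data_desc(unsigned dpl)
{
    return DESC_TYPE(TYPE_DATA_RW) | DESC_S | DESC_DPL(dpl) | DESC_P;
}

/* Value of the 16-bit size field: table size in bytes minus one. */
static uint16_t gdt_limit(unsigned nentries)
{
    return (uint16_t)(nentries * sizeof(uint64_t) - 1);
}
```

For the three-entry table above, the size field works out to 3 * 8 - 1 = 23.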

Then, in the start function in the .text section, we append the following code:

    lgdt gdt64desc                     # load GDT descriptor

This code uses the lgdt instruction to load the GDT descriptor into the GDTR register. The gdt64desc symbol refers to the GDT descriptor structure we defined earlier.

Far Jump to Long Mode

Now the only thing left to do is to jump to the 64-bit code.

First, we define a label for the entry point of our long mode code in the .text section in boot.S:

.code64
start64:
    hlt

This defines a label start64 that marks the entry point of our long mode code. The .code64 directive tells the assembler that the following code is 64-bit code.

Then, after loading the GDT, we do a far jump to this label:

    ljmp $(SEG_KCODE<<3), $start64

This code uses the ljmp instruction to do a far jump to the start64 label. The first argument is the segment selector for the code segment we defined in the GDT, which is SEG_KCODE shifted left by 3 bits (because each GDT entry is 8 bytes, so the index in the GDT is multiplied by 8 to get the selector). The second argument is the offset of the start64 label. This far jump will cause the CPU to load the new code segment descriptor from the GDT, which has the L bit set, and switch to long mode. After this instruction, we will be executing 64-bit code at the start64 label.
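
The selector arithmetic can be sketched in C as follows. The helper make_selector is hypothetical, and the example assumes SEG_KCODE is 1 (the code segment is the second GDT entry); the low three bits of a selector hold the table indicator and the requested privilege level, which are both zero here:

```c
#include <stdint.h>

/* Build a segment selector: index in bits 3..15, TI in bit 2, RPL in bits 0..1. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}
```

Under that assumption, make_selector(1, 0, 0) yields 0x08, the same value as SEG_KCODE<<3, and the data segment at index 2 would be 0x10.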

Goodbye Assembly, Hello C

At this point, we have successfully switched to long mode, and we are executing 64-bit code. However, our start64 function is still in assembly, and it just has a hlt instruction for now. We want to be able to write our kernel code in C, so the next step is to call a main function written in C. In order to do that, we need to set up the stack.

Setting up the Stack

We have been avoiding using the stack so far, but since we are going to call a C function, we need to set up the stack properly, because C code relies on the stack for function calls, local variables, and so on. We can reserve memory for a stack like this:

.comm stack, 4096

This reserves 4096 bytes for the stack and defines a symbol stack that points to the beginning of this reserved space. This is basically another way to reserve space in memory, similar to using .bss with .skip.

Then, in the start64 function, before calling main, we need to set up the stack pointer (rsp) to point to the top of this stack. Since the stack grows downwards (from higher addresses to lower addresses), we want to set rsp to point to the end (the top) of the reserved stack space:

    movabs $(stack + 4096), %rsp

Here, we use movabs to move a 64-bit immediate value into rsp. movabs is a special version of mov that allows for 64-bit immediate values, which is necessary here because stack + 4096 is a 64-bit address. You could also observe that we are finally using rsp here, which is the name of the stack pointer register in 64-bit mode (in 32-bit mode, it was esp).
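
A small C model of the address arithmetic may help (illustrative only; stack_top and push64 are hypothetical helpers, and the base address used in the note below is made up):

```c
#include <stdint.h>

#define STACK_SIZE 4096u

/* Initial stack pointer: one past the highest byte of the reserved area. */
static uint64_t stack_top(uint64_t stack_base)
{
    return stack_base + STACK_SIZE;
}

/* Model of a 64-bit push: rsp moves down by 8 before the value is stored. */
static uint64_t push64(uint64_t rsp)
{
    return rsp - 8;
}
```

If the stack symbol happened to be at 0x104000, rsp would start at 0x105000, and the first push (for example, the return address stored by call main) would land at 0x104ff8, inside the reserved area.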

Now let’s call the main function in C!

    call main

In GNU as syntax, it is not necessary to declare main as an external symbol before calling it, because the assembler will automatically treat any non-local symbol as external. However, it is good practice to declare it explicitly for clarity. You can add the following line at the top of your boot.S file, though it does not change the behavior of the assembler, only serves as a declaration for the reader:

.extern main

Compiling main()

Download the skeleton for the main.c, console.c, console.h and types.h files. A minimal main() function can look something like this:

#include "console.h"
#include "types.h"

int main(void)
{
    // Initialize the page table here

    // Initialize the console
    uartinit();

    printk("Hello from C\n");

    return 0;
}

It calls the uartinit() function to initialize the serial line and then prints “Hello from C” on the serial line.

Serial ports are a legacy communications port common on IBM-PC compatible computers. Use of serial ports for connecting peripherals has largely been deprecated in favor of USB and other modern peripheral interfaces, however it is still commonly used in certain industries for interfacing with industrial hardware such as CNC machines or commercial devices such as POS terminals. Historically it was common for many dial-up modems to be connected via a computer’s serial port, and the design of the underlying UART hardware itself reflects this.

Serial ports are typically controlled by UART hardware. This is the hardware chip responsible for encoding and decoding the data sent over the serial interface. Modern serial ports typically implement the RS-232 standard, and can use a variety of different connector interfaces. The DE-9 interface is the most commonly used connector for serial ports in modern systems.

Serial ports are of particular interest to operating-system developers since they are much easier to implement drivers for than USB, and are still commonly found in many x86 systems. It is common for operating-system developers to use a system’s serial ports for debugging purposes, since they do not require sophisticated hardware setups and are useful for transmitting information in the early stages of an operating-system’s initialization. Many emulators such as QEMU and Bochs allow the redirection of serial output to either stdio or a file on the host computer.

Why Use a Serial Port?

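During the early stages of kernel development, you might wonder why you would bother writing a serial driver. The reasons are the ones outlined above: a serial driver is far simpler than a USB driver, it works even in the earliest stages of initialization, and emulators such as QEMU can redirect the serial output to stdio or to a file on the host.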

Serial line driver

To print something on the serial line we need to implement a minimal serial line driver. In this homework assignment we provide you with a simple serial driver in console.c, but it can be useful to look over the Serial Ports page on OSDev.org, which describes the details of the serial line protocol. At a high level, we define which I/O port the serial line is connected to:

#define COM1    0x3f8

We then use a couple of helper functions that provide the interface to assembly in and out instructions.

static inline unsigned char inb(unsigned short port)
{
    unsigned char data;

    asm volatile("in %1,%0" : "=a" (data) : "d" (port));
    return data;
}

static inline void outb(unsigned short port, unsigned char data)
{
    asm volatile("out %0,%1" : : "a" (data), "d" (port));
}

We then use the uartinit() function to initialize the serial line interface:

void uartinit(void)
{

  // Turn off the FIFO
  outb(COM1+2, 0);

  // 115200 baud (divisor 1), 8 data bits, 1 stop bit, parity off.
  outb(COM1+3, 0x80);    // Unlock divisor
  outb(COM1+0, 115200/115200);
  outb(COM1+1, 0);
  outb(COM1+3, 0x03);    // Lock divisor, 8 data bits.
  outb(COM1+4, 0);
  outb(COM1+1, 0x01);    // Enable receive interrupts.

  // If status is 0xFF, no serial port.
  if(inb(COM1+5) == 0xFF)
      return;

  uart = 1;

  // Acknowledge pre-existing interrupt conditions;
  // enable interrupts.
  inb(COM1+2);
  inb(COM1+0);
}
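
The value written to COM1+0 and COM1+1 while the divisor latch is unlocked is 115200 divided by the desired baud rate. A minimal sketch of that arithmetic (the helper names are hypothetical and are not part of console.c):

```c
#include <stdint.h>

#define UART_CLOCK_RATE 115200u   /* maximum baud rate of a PC UART */

/* Divisor programmed into the divisor latch for a given baud rate. */
static uint16_t uart_divisor(unsigned baud)
{
    return (uint16_t)(UART_CLOCK_RATE / baud);
}

/* Low byte goes to COM1+0, high byte to COM1+1 while the latch is unlocked. */
static uint8_t divisor_lo(uint16_t d) { return (uint8_t)(d & 0xff); }
static uint8_t divisor_hi(uint16_t d) { return (uint8_t)(d >> 8); }
```

A divisor of 1, as written by uartinit() above, selects 115200 baud; 9600 baud would need a divisor of 12.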

The uartputc() function writes an individual character to the serial line:

void uartputc(int c)
{
  int i;

  if(!uart)
      return;

  for(i = 0; i < 128 && !(inb(COM1+5) & 0x20); i++)
      microdelay(10);

  outb(COM1+0, c);
}
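
The loop in uartputc() polls bit 0x20 of the line status register (COM1+5), which is set when the transmit holding register is empty, and gives up after a bounded number of tries. The same pattern can be exercised off-target with a fake status register. This is only a sketch: wait_tx_ready, fake_lsr, and LSR_THRE are hypothetical names, and unlike uartputc() the model reports a timeout instead of writing the byte anyway:

```c
#include <stdint.h>

#define LSR_THRE 0x20u   /* line status bit: transmit holding register empty */

/* Poll a status source up to max_tries times; return the number of polls
 * used before the transmitter was ready, or -1 if it never became ready. */
static int wait_tx_ready(uint8_t (*read_lsr)(void), int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (read_lsr() & LSR_THRE)
            return i + 1;
    }
    return -1;
}

/* Fake status register for experimentation: ready on the third read. */
static int fake_reads;
static uint8_t fake_lsr(void)
{
    return (++fake_reads >= 3) ? LSR_THRE : 0;
}
```

Bounding the loop keeps the kernel from hanging forever if the UART never reports that it is ready.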

And finally the printk() function prints a string on the serial line:

void printk(char *str)
{
    int i, c;

    for(i = 0; (c = str[i]) != 0; i++){
        uartputc(c);
    }
}

Booting into C

Now we’re finally ready to boot into C. If you put all the files in the correct places, you can run make and get “Hello from C” on the serial line. The serial line is configured to be recorded in the serial.log file.

$ make qemu

If you want to see only the serial console (without the VGA window), you can run:

$ make qemu-nox

Debugging

When developing an operating system, you will inevitably encounter various bugs and issues. Debugging is an essential skill for any operating-system developer. In this section, we will discuss how to use the VSCode debugger and GDB to debug your kernel.

Debugging with VSCode Debugger

Create a launch.json file as before, using the Debugging Panel in VSCode. Within the launch.json, add the following configuration:

{
    "name": "Debug QEMU",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/build/kernel.bin",
    "cwd": "${workspaceFolder}",
    "miDebuggerPath": "/usr/local/bin/gdb",
    "miDebuggerServerAddress": "127.0.0.1:1234",
    "MIMode": "gdb",
    "stopAtEntry": true,
    "setupCommands": [
	{
	    "description": "Pretty Printing",
	    "text": "-enable-pretty-printing",
	    "ignoreFailures": false
	},
	{
	    "description": "Set architecture",
	    "text": "set arch i386:x86-64",
	    "ignoreFailures": false
	}
    ]
}

Now, in your terminal, launch the QEMU GDB process using make qemu-gdb or make qemu-gdb-nox, and then use the Debug QEMU option in the Debugging tab, and you should be good to go!

Debugging with GDB

Another interesting skill to learn while working on this homework is debugging kernels with GDB. To do this we will be using GDB’s remote debugging feature and QEMU’s remote GDB debugging stub. Remote debugging is a very important technique for kernel development in general: the basic idea is that the main debugger (GDB in this case) runs separately from the program being debugged (your kernel running atop QEMU). They could even be on completely separate machines.

Finding and breaking at an address

For example, if you want to break at the very first instruction of your kernel, you can use the readelf tool to see where this address is (remember that the kernel is the same kind of ELF file that you loaded in your previous homework):

$ readelf -h build/kernel.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x101098

In this case, the entry point is 0x101098.
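
For reference, the field that readelf prints here can be read directly from the first bytes of the file. The following is a minimal sketch, not part of the starter files: elf64_entry is a hypothetical helper, and it handles only little-endian ELF64 headers like the one shown above:

```c
#include <stdint.h>
#include <string.h>

/* Return the entry point stored in an ELF64 header, or 0 if the buffer
 * does not start with a valid little-endian ELF64 identification. */
static uint64_t elf64_entry(const unsigned char *hdr)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint64_t entry = 0;

    if (memcmp(hdr, magic, 4) != 0 || hdr[4] != 2 /* ELFCLASS64 */ ||
        hdr[5] != 1 /* little endian */)
        return 0;

    /* e_entry is an 8-byte little-endian field at byte offset 24. */
    for (int i = 7; i >= 0; i--)
        entry = (entry << 8) | hdr[24 + i];
    return entry;
}
```

The caller is expected to pass at least the first 32 bytes of the file; for a header with the bytes 98 10 10 00 at offset 24, the function returns 0x101098.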

Now we can start QEMU with GDB and break at this address. Open two terminals, either as panes in a terminal multiplexer like tmux or as separate terminal windows. Run make qemu-gdb in the first terminal. In the other terminal, change to your homework directory and start gdb. Remember to put the provided .gdbinit file in your homework 3 directory.

$ make qemu-gdb
$ cd <path_to_hw3>
$ gdb -q
+ target remote localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x000000000000fff0 in ?? ()
+ symbol-file kernel
(gdb)

What you see on the screen is the BIOS code that QEMU executes as part of the platform initialization. The BIOS starts at address 0xfff0 (you can read more about it in the How Does an Intel Processor Boot? blog post). You can single step through the BIOS machine code with the si (single instruction) GDB command if you like, but it’s hard to make sense of what is going on, so let’s skip it for now and get to the point when QEMU starts executing your kernel.

Set a breakpoint at the address of the entry point, e.g.

(gdb) br *0x00101098
Breakpoint 1 at 0x101098: file boot.S, line 9.

The details of what you see may differ slightly from the above output.

Troubleshooting GDB issues

You might get the following warning from gdb.

warning: File "/home/uXXXXXXX/hw3/src/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/uXXXXXXX/hw3/src/.gdbinit
line to your configuration file "/home/uXXXXXXX/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/uXXXXXXX/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

GDB uses a file called .gdbinit to initialize things. We provide you with a .gdbinit file with the required setup. However, to allow this local .gdbinit file to be used, we have to add a line to the global GDB configuration file. Add the add-auto-load-safe-path line to ~/.config/gdb/gdbinit as the warning suggests, and then restart gdb.

Try to examine the .gdbinit file that we provide you. It tells gdb that the file to read symbols from while debugging is build/kernel.bin. Since we are using remote debugging, gdb and the target environment communicate over a network socket. The other lines in the .gdbinit set up the communication.

Making yourself familiar with GDB

This part of the homework teaches you how to use GDB. If your OS and GDB are still running, exit them. You can exit QEMU with Ctrl-A X, or (if you’re running on CADE) by pressing Esc followed by 2, or Alt+2, to switch to the QEMU command prompt and then typing quit. You can exit GDB by pressing Ctrl-C and then Ctrl-D.

Start your OS and gdb again as you did before. Use two terminals: one to start the OS in QEMU (make qemu-gdb) and one to start GDB (gdb).

Now we explore the other ways of setting breakpoints. Instead of br *0x00101098, you can use the name of the function or an assembly label, e.g., to set the breakpoint at the beginning of the start label you can use:

(gdb) br start

BTW, autocomplete works inside GDB, so you can just type “s” and hit Tab. Similarly, you can set a breakpoint on the main() function.

(gdb) br main

If you need help with GDB commands, GDB can show you a list of all commands with

(gdb) help all

Now since you set two breakpoints you can continue execution of the system until one of them gets hit. In gdb enter the “c” (continue) command to run your kernel until it hits the first breakpoint (start).

(gdb) c

Now use the si (step instruction) command to single step your execution (execute it one machine instruction at a time). Remember that the start label is defined in the assembly file boot.S to be the entry point for the kernel. Enter si a couple of times. Note that you don’t have to type si every time: if you just press Enter, GDB will repeat the last command.

(gdb) si

If everything is working correctly, you should see the assembly instructions being executed one by one. For example, if you didn’t delete the “Hello World” message in the start function and put other code after it, you should see one character of the message being printed on the screen every time you enter si and execute the next instruction.

Every time you enter si, it executes one machine instruction and shows you the next machine instruction, so you know what is coming next:

(gdb) si
10          movw $0x0265, 0xb8002 # e

You can either continue single stepping until you reach your code or the main() function, or you can enter “c” to continue execution until the next breakpoint.

(gdb) c
Continuing.

Breakpoint 2, main () at main.c:11
11      {

You should be able to view the surrounding source using the l (list) command. Since we compiled the kernel with the “-g” flag, which includes the symbol information in the ELF file, we can see the source code that we’re executing. Note that this applies to both C and assembly code.

Breakpoint 2, main () at main.c:11
11      {
(gdb) l
6       {
7           asm volatile("hlt" : : );
8       }
9
10      int main(void)
11      {
12          int i;
13          int sum = 0;
14
15          // Initialize the page table here
(gdb)

Remember that when you hit the main breakpoint GDB showed you that you’re at line 11 in the main.c file (main.c:11). You can either step into the functions with the s (step) command (note, in contrast to the si step instruction command, this one will execute one C line at a time), or step over the functions with the n (next) command which will not enter the function, but instead will execute it till completion. If you have used the debugging capabilities of an IDE before, you can think of s as the “Step Into” button and n as the “Step Over” button.

Try stepping into one of the functions you built. Once gdb has stopped at the line where you invoke a function, type s for step.

(gdb) s

The whole listing of the source code seems a bit inconvenient (entering l every time you want to see the source line is a bit annoying). GDB provides a more conventional way of following the program execution with the TUI mechanism. Enable it with the following GDB command

(gdb) tui enable

Now you see the source code window and the machine instructions at the bottom. You can use the same commands to walk through your program. You can scroll the source with arrow keys, PgUp, and PgDown.

TUI can show you the state of the registers and how they are changing as you execute your code

(gdb) tui reg general

TUI is a very cute part of GDB, and it is worth reading more about its capabilities: http://sourceware.org/gdb/onlinedocs/gdb/TUI-Commands.html. For example, you can specify the assembly layout to single step through machine instructions similar to source code:

(gdb) layout asm

Or you can use them both (try it)

(gdb) layout split

Or you can look at the registers too:

(gdb) layout regs

Beej’s Quick Guide to GDB is a wonderful introduction to GDB using TUI.

You can also print variables and data structures. For example, to see what’s the value of the sum variable you can do:

p sum

If you want to see the address of the sum variable:

p &sum

If you want to print it as raw memory (3 ushorts shown as hex, for example)

x /3xh &sum

Debugging with QEMU’s built-in monitor

QEMU has a built-in monitor that can inspect and modify the machine state. To enter the monitor, press Alt+2. Some of the following commands should be helpful in the monitor.

QEMU 6.0.1 monitor - type 'help' for more information
(qemu) info mem
0000000000000000-0000000040000000 0000000040000000 -rw
(qemu)

This displays mapped virtual memory and permissions. The above example tells us that 0x0000000040000000 bytes of memory from 0x0000000000000000 to 0x0000000040000000 are mapped read/write.

The info registers command displays a full dump of the machine’s internal register state. Note that the GDT line shows the limit and base of the GDT (this is helpful!).


Implementing the page table

Finally, your assignment is to build on the boot code that we’ve discussed above and add a page table that maps the first 8MB of virtual addresses to the first 8MB of physical memory. In the assembly code we provided you, we’ve already set up a page table based on huge pages that maps the first 2MB of virtual addresses to the first 2MB of physical memory. You need to write C code in main.c to set up a new 4-level page table that does not use huge pages, but instead uses normal 4KB pages to map the first 8MB of virtual addresses to the first 8MB of physical memory. Use the definitions from mmu.h to set up the page table entries, and use the write_cr3 function in the template main.c to load the address of your new PML4 into CR3.
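
Before writing the mapping code, it can help to work out the sizes involved. The sketch below is only the arithmetic, not a solution to the assignment; the helper names are hypothetical and do not come from mmu.h. It shows how a virtual address splits into four 9-bit table indices and how many last-level page tables an 8MB mapping needs:

```c
#include <stdint.h>

#define PAGE_SIZE        4096u
#define ENTRIES_PER_TBL  512u

/* Each level of the hierarchy is indexed by 9 bits of the virtual address. */
static unsigned pml4_index(uint64_t va) { return (va >> 39) & 0x1ff; }
static unsigned pdpt_index(uint64_t va) { return (va >> 30) & 0x1ff; }
static unsigned pd_index(uint64_t va)   { return (va >> 21) & 0x1ff; }
static unsigned pt_index(uint64_t va)   { return (va >> 12) & 0x1ff; }

/* Number of 4KB pages needed to cover `bytes` of address space. */
static uint64_t pages_needed(uint64_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Number of last-level page tables needed to hold that many entries. */
static uint64_t page_tables_needed(uint64_t bytes)
{
    return (pages_needed(bytes) + ENTRIES_PER_TBL - 1) / ENTRIES_PER_TBL;
}
```

For 8MB this gives 2048 page-table entries spread over 4 last-level tables, all reachable through entry 0 of the PML4 and entry 0 of the next level; the last mapped address, 0x7fffff, uses index 3 at the page-directory level and index 511 at the last level.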

Summary

The starter files we provide will:

  1. Boot under GRUB with a Multiboot header.
  2. Enable paging and enter long mode in assembly.
  3. Print "Hello, world!" onto the screen.
  4. Set up a stack and call C main().

Most of this assignment describes the starter files. You will need to understand it for the final, but we provide code that already does this.

For the required part of HW3, your main.c must:

  1. Set up a new 4-level page-table hierarchy using 4KB pages.
  2. Identity-map the first 8MB of virtual addresses to the first 8MB of physical memory.
  3. Load the new PML4 into CR3 with write_cr3().

For the required submission, you only need to submit main.c. Do not modify the other files.

Extra credit

VGA console (15% bonus)

Implement a simple VGA driver, i.e., when you use printk(), it should print both on the serial line, as it does now, and on the VGA screen. Only edit the console.c file for this part.
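
As a starting point, recall how the assembly "Hello, world!" code writes to the text buffer: each character cell is a 16-bit value with the attribute in the high byte and the character in the low byte. The helpers below are a hypothetical sketch of that encoding, assuming the standard 80-column text mode; they are not a complete driver:

```c
#include <stdint.h>

#define VGA_COLS 80u   /* assumed: standard text mode, 80 columns by 25 rows */

/* A text-mode cell is 16 bits: attribute in the high byte, character in the low byte. */
static uint16_t vga_cell(unsigned char c, uint8_t attr)
{
    return (uint16_t)((attr << 8) | c);
}

/* Byte offset of the cell at (row, col) from the start of the text buffer. */
static unsigned vga_offset(unsigned row, unsigned col)
{
    return 2u * (row * VGA_COLS + col);
}
```

For example, vga_cell('e', 0x02) is 0x0265 and vga_offset(0, 1) is 2, which matches the movw $0x0265, 0xb8002 instruction seen in the GDB session earlier.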

Real hardware boot (5% bonus)

Boot on real hardware. I.e., try booting your code on a real desktop or laptop by either burning a CD-ROM or a USB flash drive. Virtual machines don’t count. Record a video of your code booting.

GDT privilege experiment (5% bonus)

Change the descriptor privilege level in the GDT to 3. Analyse (understand and explain) what happens.

Submit your work

Submit your solution through Gradescope CS5460/6460 Operating Systems. Please zip all of your files and submit them. If you have done extra credit then place files required for extra credit part into separate folders extra1, extra2 and extra3. The structure of the zip file should be the following:

/
  - main.c
  - extra1/                         -- optional
    - console.c
  - extra2/                         -- optional
    - Video or a text file with link to a video (no Rick Roll please)
  - extra3/                         -- optional
    - explanation.txt or explanation.md or explanation.pdf

Your code will be compiled using all other files provided in the src folder, so you don’t need to submit those. Only submit the required files for each part or extra part, and put them in the right place.