Homework 3: Memory Allocation

This assignment will teach you how to extend your minimal kernel with support for memory allocation and Box&lt;T&gt;. Technically, you can do this assignment on Linux, macOS, or WSL (Windows Subsystem for Linux). Submit your code through Gradescope (see instructions at the bottom of this page).

NOTE: YOU CANNOT PUBLICLY RELEASE SOLUTIONS TO THIS HOMEWORK.

It’s ok to show your work to your future employer as a private Git repo, however any public release is prohibited.

Overview

This homework asks you to extend your HW2 submission with support for allocating memory for kernel objects via the Rust Box, for kernel pages (e.g., pages used to construct the page table), and for pages mapped into the memory of user processes.

To make things a bit simpler, instead of building the full stack of typical kernel allocators (slab, buddy, zones, and nodes like in Linux), we plan to build a memory allocator that supports only two allocation sizes: 4KB and 2MB. Both allocators will be implemented as doubly-linked lists. We will use the 4KB list for allocating kernel objects via Box&lt;T&gt; (while wasting an entire 4KB page on an object that is smaller than a page), and both the 4KB and 2MB lists for pages of user processes.

Well, if you take a look at an implementation of a buddy allocator, you will see it's not that hard, so you may wonder why the hell we're dealing with these two lists instead. One additional reason (besides the fact that you have to build it) is that we have a verified version of a similar allocator in Atmosphere, which hopefully we will look at towards the end of this class.

We will organize our allocator as two main data structures:

- the page_array: a metadata array with one entry per physical page, tracking the state of that page;
- two free lists: a doubly-linked list of free 4KB pages and another of free 2MB pages.

We also plan to support two operations, split() and merge(), that allow us to convert 2MB pages into 4KB pages (split()) and merge them back (merge()). I.e., the split operation splits one 2MB page into 512 4KB pages, and merge reassembles 512 free 4KB pages into one 2MB page.

Allocating the metadata region (page_array)

A natural question is how to allocate the memory for the page_array itself. After all, we don't have a memory allocator yet. One option is to use a statically allocated variable (allocated in the data or BSS section of the program), but this is not ideal since we don't know the size of the page_array until we boot.

This is why we make a practical choice to allocate the metadata region dynamically after booting into the kernel and picking up the size of available physical memory from the GRUB boot loader.

Specifically, we utilize two bits of information. First, we use the __end symbol that we add to the linker script to mark the last address occupied by the kernel in memory (the kernel is loaded by GRUB at the 1MB address and extends to __end). Then our plan is to allocate the page_array right above the kernel, i.e., starting from the next page above __end (we round the __end address up to the next page boundary).
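The round-up step is the usual align-up bit trick. A minimal sketch (the constant and function names are ours, not part of the assignment's required interface):

```rust
const PAGE_SIZE: usize = 4096; // assuming 4KB pages

/// Round an address up to the next page boundary; the page_array would
/// start at page_round_up(address of __end).
fn page_round_up(addr: usize) -> usize {
    (addr + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}
```

Note that this only works because PAGE_SIZE is a power of two.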

The __end symbol

To introduce the __end symbol, we modify the linker script and add the symbol similar to how it's done in RedLeaf (another Rust OS we did a couple of years ago).

We can then access this symbol similar to how RedLeaf does it: we define a static __end: u8; variable and implement a cute wrapper around it.
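As a sketch, the linker-script addition and the matching Rust declaration might look like the following (check RedLeaf's linker script and sources for the authoritative version; the wrapper name here is ours):

```
/* linker script: place __end right after the last kernel section */
. = ALIGN(4096);
__end = .;
```

```rust
// Rust side: the *address* of the symbol is the value we care about.
extern "C" {
    static __end: u8;
}

pub fn kernel_end() -> usize {
    unsafe { &__end as *const u8 as usize }
}
```

The symbol itself carries no data; only its address, assigned by the linker, is meaningful.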

Memory size

The size of the page_array is defined by the number of physical pages in our system. Physical memory can have special ACPI regions, and even gaps, so it's better to say it's defined by the max physical address returned by GRUB. In other words, the page_array has enough elements to describe the state of every page from 0 to the max physical address.

To learn how much memory our system has, we utilize the Multiboot v2 protocol that is supported by GRUB. Multiboot2 is a long and dry specification, and while you're welcome to look through it, it's much easier to use modern AI tools to ask a couple of questions about it, e.g., "How does multiboot v2 pass memory information to the kernel?"

At a high level, a multiboot2-compliant boot loader like GRUB passes a pointer to the multiboot information structure in the ebx register (remember, GRUB leaves us in 32-bit mode). This structure is a sequence of tags, each describing a different piece of boot-time data (command line, framebuffer, modules, memory map, etc.). Here is how it looks in memory:

+--------------------------+
| total_size (u32)         |
| reserved (u32, must=0)   |
+--------------------------+
| Tag #1                   |
+--------------------------+
| Tag #2                   |
+--------------------------+
| Tag #3                   |
+--------------------------+
| ...                      |
+--------------------------+
| Tag #N (type=0, end tag) |
+--------------------------+

Each tag has a common header:

struct multiboot_tag {
    uint32_t type;
    uint32_t size; // size of the entire tag including header
    // followed by tag-specific data
};

Each tag starts on an 8-byte boundary. If a tag’s size is not a multiple of 8, it is padded with zeros.
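To make the traversal concrete, here is a hedged Rust sketch of walking the tag list. The struct and function names are ours, and a real kernel version would be no_std and hand out references rather than collecting into a Vec:

```rust
/// Common header shared by every multiboot2 tag.
#[repr(C)]
struct TagHeader {
    typ: u32,
    size: u32, // size of the entire tag, including this header
}

const TAG_END: u32 = 0;

/// Walk the tag list starting at `info` (the address GRUB passed in ebx)
/// and collect each tag's (type, offset within the info structure).
unsafe fn walk_tags(info: *const u8) -> Vec<(u32, usize)> {
    let total_size = *(info as *const u32) as usize;
    let mut tags = Vec::new();
    let mut off = 8; // skip total_size + reserved
    while off < total_size {
        let hdr = &*(info.add(off) as *const TagHeader);
        if hdr.typ == TAG_END {
            break;
        }
        tags.push((hdr.typ, off));
        // every tag is padded so the next one starts on an 8-byte boundary
        off += (hdr.size as usize + 7) & !7;
    }
    tags
}
```

The `(size + 7) & !7` step implements the 8-byte padding rule described above.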

The memory map tag is just one of these tags, inserted by the bootloader:

+-----------------------------------------------------------+
| type = 6 (MULTIBOOT_TAG_TYPE_MMAP)                        |
| size = sizeof(struct multiboot_tag_mmap) + entries        |
| entry_size                                                |
| entry_version (0)                                         |
|   +----------------------------------------------+        |
|   | addr (u64) | len (u64) | type (u32) | zero   | entry1 |
|   +----------------------------------------------+        |
|   | addr (u64) | len (u64) | type (u32) | zero   | entry2 |
|   +----------------------------------------------+        |
|   | ...                                          | ...    |
|   +----------------------------------------------+        |
+-----------------------------------------------------------+

Or in other words:

struct multiboot_tag_mmap {
    uint32_t type;      // = MULTIBOOT_TAG_TYPE_MMAP (6)
    uint32_t size;      // size of this tag including all entries
    uint32_t entry_size;  // size of each mmap entry (may include extra fields)
    uint32_t entry_version; // version (currently 0)
    struct multiboot_mmap_entry {
        uint64_t addr;   // base physical address
        uint64_t len;    // length in bytes
        uint32_t type;   // 1 = available, other values = reserved, ACPI, etc.
        uint32_t zero;   // reserved, must be zero
    } entries[];
};

Each entry describes a physical memory region:

- addr: the base physical address of the region;
- len: the length of the region in bytes;
- type: what the region can be used for.

The types can be:

- 1: available RAM that the kernel is free to use;
- 3: memory holding ACPI information (reclaimable once the tables are parsed);
- 4: reserved memory that must be preserved on hibernation;
- 5: defective RAM;
- any other value: reserved, do not use.
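Given the parsed entries, computing the max physical address (which sizes the page_array) can be sketched as follows (the struct mirrors the C layout above; the function name is ours):

```rust
#[repr(C)]
struct MmapEntry {
    addr: u64, // base physical address
    len: u64,  // length in bytes
    typ: u32,  // 1 = available
    zero: u32, // reserved, must be zero
}

const MMAP_AVAILABLE: u32 = 1;

/// Highest usable physical address (exclusive) across all available
/// regions; the page_array needs one entry per page below this address.
fn max_phys_addr(entries: &[MmapEntry]) -> u64 {
    entries
        .iter()
        .filter(|e| e.typ == MMAP_AVAILABLE)
        .map(|e| e.addr + e.len)
        .max()
        .unwrap_or(0)
}
```

In the kernel, `entries` would be derived from the mmap tag's size and entry_size fields rather than a Rust slice literal.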

While you can implement your own boot info parsing logic, I suggest that you grab the multiboot2 crate from RedLeaf and adapt it to your kernel.

Note that since the pointer to the memory info region is passed by GRUB in the EBX register, we need to save it early during boot, before we overwrite the register value.

We have to modify our boot loader similar to how RedLeaf does it in boot.asm by calling check_multiboot. Here RedLeaf checks that the multiboot magic number in eax is correct and then saves the pointer to the multiboot info structure in the _bootinfo variable (we should add it to our code as well, like it's done here in RedLeaf, i.e., we declare static _bootinfo: usize;).

We then pass it to the multiboot2 crate here.

Note that since we cannot be sure we won't accidentally overwrite the boot info data structure from GRUB, RedLeaf moves it right above the __end symbol in the kernel (we know that memory is available there). This means that we can't really use __end for the page_array and instead should maintain a logical KERNEL_END, similar to RedLeaf.

Organization of page_array

Inside the page_array we track the state of each memory page. For example, for all pages allocated by the kernel (from 0 to KERNEL_END), we mark the pages in page_array as Unavailable. This means that we assume these pages are reserved and cannot be used by the allocator (we do the same for the regions of memory that are not available due to ACPI or other gaps, as indicated by the memory map).

Starting from KERNEL_END and up to the next 2MB boundary we mark the pages as Free4K. The pages that are aligned at a 2MB boundary we mark as Free2MB.
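A toy sketch of this initialization. A Vec stands in for the metadata region that the kernel places right above __end, the state names follow the text, and the helper names are ours:

```rust
const PAGE_SIZE: usize = 4096;
const PAGES_PER_2MB: usize = 512;

#[derive(Clone, Copy, PartialEq, Debug)]
enum PageState {
    Unavailable,
    Free4K,
    Free2MB,
}

/// `kernel_end` and `max_phys` are byte addresses (KERNEL_END and the
/// max physical address reported by GRUB, respectively).
fn init_page_array(kernel_end: usize, max_phys: usize) -> Vec<PageState> {
    let npages = max_phys / PAGE_SIZE;
    let mut pages = vec![PageState::Unavailable; npages];
    // Pages from KERNEL_END up to the next 2MB boundary become Free4K.
    let first_free = (kernel_end + PAGE_SIZE - 1) / PAGE_SIZE;
    let first_2mb =
        (first_free + PAGES_PER_2MB - 1) / PAGES_PER_2MB * PAGES_PER_2MB;
    for i in first_free..first_2mb.min(npages) {
        pages[i] = PageState::Free4K;
    }
    // Every full 2MB-aligned region above that is tracked by its head page.
    let mut head = first_2mb;
    while head + PAGES_PER_2MB <= npages {
        pages[head] = PageState::Free2MB;
        head += PAGES_PER_2MB;
    }
    pages
}
```

One design choice made here (and it is only one option): a 2MB superpage is represented by marking its head page Free2MB while the other 511 entries stay Unavailable until a split().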

As I mentioned, we plan to maintain two lists of free pages: 4KB list and 2MB list. When we initialize the allocator we add all Free4K pages on the 4KB list and all Free2MB pages on the 2MB list.

Note that we implement both lists as doubly-linked lists to support fast removal from the list which we utilize for the merge() operation.

When we allocate a 4KB object we simply take the first element from the head of the list and mark the corresponding element in the page_array as Allocated4K. Note that on each allocation of a 4KB page we decrement the counter that tracks how many 4KB pages are free in the enclosing 2MB superpage.

When we run out of pages on the 4KB list we need to split one 2MB page into 4KB pages. We take a 2MB page from the list of 2MB pages (it has to be in the Free2MB state) and change the state of each 4KB page within this 2MB page to Free4K while adding all 4KB pages to the 4KB list. Note we also set the counter of the head page to 512, indicating that all 512 4KB pages are currently free. The next 4KB allocation will decrement this counter to 511, and so on.

Later on, when we start freeing 4KB pages, we will increment this counter, and when it reaches 512 again we will perform the merge() operation, i.e., merge all 4KB pages back into a single 2MB page. Note that when we perform the merge() we have to remove all 4KB pages of the 2MB page from the 4KB free list. But this is easy: we simply iterate over all 4KB pages of the 2MB page and delete each of them from the list (since the list is doubly linked, deletion is easy).
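Putting alloc, split(), and merge() together, here is a runnable toy model of the scheme described above. The free lists are VecDeques of page indices for brevity; the real allocator threads intrusive doubly-linked lists through the page_array so that merge() can unlink each page in O(1). All type and field names are ours:

```rust
use std::collections::VecDeque;

const PAGES_PER_2MB: usize = 512;

#[derive(Clone, Copy, PartialEq, Debug)]
enum PageState {
    Unavailable, // reserved, or a non-head page inside a 2MB superpage
    Free4K,
    Free2MB,
    Allocated4K,
}

struct Allocator {
    pages: Vec<PageState>,    // the page_array
    free4k: VecDeque<usize>,  // 4KB free list (page indices)
    free2mb: VecDeque<usize>, // 2MB free list (head-page indices)
    free_in_2mb: Vec<u16>,    // per-superpage count of free 4KB pages
}

impl Allocator {
    fn alloc_4k(&mut self) -> Option<usize> {
        if self.free4k.is_empty() {
            self.split()?; // refill the 4KB list from a 2MB page
        }
        let idx = self.free4k.pop_front()?;
        assert_eq!(self.pages[idx], PageState::Free4K); // hardening check
        self.pages[idx] = PageState::Allocated4K;
        self.free_in_2mb[idx / PAGES_PER_2MB] -= 1;
        Some(idx)
    }

    /// split(): break one Free2MB page into 512 Free4K pages.
    fn split(&mut self) -> Option<()> {
        let head = self.free2mb.pop_front()?;
        assert_eq!(self.pages[head], PageState::Free2MB);
        for i in head..head + PAGES_PER_2MB {
            self.pages[i] = PageState::Free4K;
            self.free4k.push_back(i);
        }
        self.free_in_2mb[head / PAGES_PER_2MB] = PAGES_PER_2MB as u16;
        Some(())
    }

    fn free_4k(&mut self, idx: usize) {
        assert_eq!(self.pages[idx], PageState::Allocated4K); // double-free check
        self.pages[idx] = PageState::Free4K;
        self.free4k.push_back(idx);
        let sp = idx / PAGES_PER_2MB;
        self.free_in_2mb[sp] += 1;
        if self.free_in_2mb[sp] == PAGES_PER_2MB as u16 {
            self.merge(sp * PAGES_PER_2MB); // all 512 pages are free again
        }
    }

    /// merge(): collapse 512 Free4K pages back into one Free2MB page.
    fn merge(&mut self, head: usize) {
        // Unlink every constituent 4KB page from the 4KB list; with the
        // real doubly-linked list each unlink is O(1).
        self.free4k.retain(|&i| i < head || i >= head + PAGES_PER_2MB);
        for i in head..head + PAGES_PER_2MB {
            self.pages[i] = PageState::Unavailable;
        }
        self.pages[head] = PageState::Free2MB;
        self.free2mb.push_back(head);
    }
}
```

The assert_eq! calls are the hardening checks discussed below: a double free, or freeing a 4KB page as a 2MB page, trips them immediately.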

Note that some of this metadata is redundant, i.e., we already know that a page is free if it's on the free list. However, tracking this state explicitly as Free4K hardens our allocator and guards against possible bugs when we try to free a page multiple times (a double-free error), or try to free a 4KB page as a 2MB page, etc. All modern allocators implement some degree of hardening to protect against double-free bugs.

Locking and synchronization

To simplify the implementation, let's execute all allocations under two big locks that guard the 4KB and 2MB lists, i.e., a Mutex&lt;T&gt; in Rust. You can take a look at how we do this in RedLeaf.

Note, however, that since we are working in the kernel we cannot acquire locks with interrupts enabled (or we would have to guarantee that interrupt handlers never try to acquire the same mutexes). Hence, we extend our mutex with support for disabling interrupts, which means that you have to implement your own Mutex&lt;T&gt; primitive.
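Here is one way such a mutex could be sketched. The interrupt control is stubbed with an atomic flag so the example runs in user space; in the kernel, save_and_disable_interrupts() would read rflags and execute cli, and restore_interrupts() would conditionally execute sti (all names here are ours):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Interrupt-control stubs. Kernel versions would use inline assembly
// (pushfq to save rflags, then cli; conditionally sti on restore).
static INTERRUPTS_ON: AtomicBool = AtomicBool::new(true);

fn save_and_disable_interrupts() -> bool {
    INTERRUPTS_ON.swap(false, Ordering::SeqCst)
}

fn restore_interrupts(were_on: bool) {
    if were_on {
        INTERRUPTS_ON.store(true, Ordering::SeqCst);
    }
}

/// A spinlock-based Mutex<T> that keeps interrupts disabled for as long
/// as the lock is held, so an interrupt handler can never deadlock
/// against the lock holder on the same CPU.
pub struct Mutex<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

unsafe impl<T: Send> Sync for Mutex<T> {}

impl<T> Mutex<T> {
    pub const fn new(value: T) -> Self {
        Mutex { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    pub fn lock(&self) -> MutexGuard<'_, T> {
        // Disable interrupts first, then spin for the lock.
        let were_on = save_and_disable_interrupts();
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        MutexGuard { lock: self, were_on }
    }
}

pub struct MutexGuard<'a, T> {
    lock: &'a Mutex<T>,
    were_on: bool,
}

impl<T> std::ops::Deref for MutexGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> std::ops::DerefMut for MutexGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for MutexGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
        restore_interrupts(self.were_on); // re-enable only if they were on
    }
}
```

Saving the previous interrupt state (rather than unconditionally re-enabling) matters when locks nest: the inner unlock must not turn interrupts back on while an outer lock is still held.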

Note that if we decide to support non-maskable interrupts, we still have to guarantee that NMI handlers don't acquire the same mutexes. A runtime check that tracks the execution context (i.e., NMI vs non-NMI) and panics if the mutex is acquired from the NMI context is a good idea to guard against mysterious bugs.

Integrating with Box<T>

To implement integration with Box&lt;T&gt;, we can utilize the instructions from Philipp Oppermann's blog. At a high level, you will need to implement the GlobalAlloc trait for your allocator. You can use the Allocator interface in RedLeaf as a reference. Then you have to declare a global_allocator.
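A hedged sketch of the GlobalAlloc glue. The backing store here is a static pool with a bump counter and dealloc is a no-op, purely so the example runs in user space; in your kernel, alloc and dealloc would call into your 4KB (and 2MB) free lists, and you would uncomment the #[global_allocator] attribute:

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicUsize, Ordering};

const PAGE_SIZE: usize = 4096;
const POOL_PAGES: usize = 16;

// Stand-in backing store; the kernel would pull pages off its free list.
#[repr(align(4096))]
struct Pool([u8; PAGE_SIZE * POOL_PAGES]);
static mut POOL: Pool = Pool([0; PAGE_SIZE * POOL_PAGES]);
static NEXT: AtomicUsize = AtomicUsize::new(0);

pub struct PageAllocator;

unsafe impl GlobalAlloc for PageAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Every allocation up to a page gets a whole 4KB page (wasteful
        // for small objects, as the assignment acknowledges).
        if layout.size() > PAGE_SIZE || layout.align() > PAGE_SIZE {
            return core::ptr::null_mut(); // 2MB path elided in this sketch
        }
        let i = NEXT.fetch_add(1, Ordering::SeqCst);
        if i >= POOL_PAGES {
            return core::ptr::null_mut(); // out of memory
        }
        core::ptr::addr_of_mut!(POOL.0).cast::<u8>().add(i * PAGE_SIZE)
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Kernel version: return the page to the 4KB free list.
    }
}

// In the kernel you would register it (see the submission section):
// #[global_allocator]
// pub static ALLOCATOR: PageAllocator = PageAllocator;
```

Once the #[global_allocator] is registered and the alloc crate is enabled, Box::new() routes through this alloc() automatically.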

Submitting your work

To test your work, we provide a test.rs that you will integrate into your project. To integrate test.rs, you need to provide a few things:

  1. Implement a global allocator instance named "ALLOCATOR" under the module crate::memory
  2. Enable the Rust alloc crate

Add a Rust module by adding the following files

In mod.rs, add your global allocator instance.

#[global_allocator]
pub static ALLOCATOR: YourAllocatorType = ...;

To enable the Rust alloc crate, add the following to your .cargo/config.toml.

[unstable]
build-std = [ "core","alloc" ]
unstable-options = true

Finally, call the test_all() function in your main.rs AFTER initializing your memory allocator.

To submit, do not print anything in your timer interrupt handler. You may use submission.sh to zip up all your files. Make sure to include the .cargo/config.toml file in your submission.