Homework 3: Memory Allocation
This assignment will teach you how to extend your minimal kernel with support
for memory allocation and Box<T>. Technically, you can do this
assignment on Linux, Mac or WSL (Windows Subsystem for Linux). Submit your
code through Gradescope (see instructions at the bottom of this page).
NOTE: YOU CANNOT PUBLICLY RELEASE SOLUTIONS TO THIS HOMEWORK.
It’s ok to show your work to your future employer as a private Git repo, however any public release is prohibited.
Overview
This homework asks you to extend your HW2 submission with support for allocating memory
for both kernel objects via the Rust Box, kernel pages (e.g., pages used to
construc the page table), and pages mapped as memory of user processes.
To make things a bit simpler, insted of building the full stack of typical
kernel allocators, i.e., slab, buddy, zones and nodes like in Linux, we plan to
build a memory allocator that supports only two allocation sizes: 4K and 2MB.
Both allocators will be imlemented as doubly-linked lists.We will use the 4KB
list for allocating kernel objects via Box<T> (well wasting an antire 4KB
page on an object that is smaller than a page), and for both 4KB and 2MB pages
for user processes.
Well, you can take a look on implementation of Buddy and it’s not hard so maybe you can ask a question why the hell we’re dealing with these two lists. One addtional reason (besides the fact that you have to build it) is that we have a verified version of a similar allocator in Atmosphere which hopefully we will look at towards the end of this class.
We will organize our allocator as two main data structures:
-
A metadata region (
page_array) that tracks the state of all physical pages in the system as well as maintains the metadata about allocated and free pages. I.e., each element ofpage_arrayrepresents one physical page starting from page 0 (we can call it a page with the page frame number (PFN) 0) to the last physical page the machine on which we boot has. -
Two lists that track free pages of 4KB and 2MB sizes. Note, we plan to maintain the list pointers, i.e.,
nextandprevinside thepage_arrayelements.
We also plan to support two operations: split() and merge() that allow us
to convert 2MB pages into 4KB pages (split()) and merging them back
(merge()). I.e., the split operation splits one 2MB page into 512 4KB pages.
Allocating the metadata region (page_array)
A natural question is how to allocate the memory for the page_array itself.
After all we don’t have a memory allocator yet. One option is to use a
statically allocated variable (allocated in the data or BSS section of the
program), but this is not ideal since we don’t know the size of the
page_array until we boot.
This is why we make a practical choice to allocate the metadata region dynamically after booting into the kernel and picking up the size of available physical memory from the GRUB boot loader.
Specifically, we utilize two bits of information. First, we use the __end
symbol that we add to the linker to mark the last address occupied by the
kernel in memory (the kernel is loaded by GRUB at 1MB address and goes until
the __end). Then our plan is to allocate page_array right above the kernel,
i.e., starting from the next page above __end (we round up the __end
address to the next page).
The end symbol
To introduce the __end symbol, we modify the the linker
script and
add the simbol similar to how it’s done in (another Rust OS we did a couple of
RedLeaf
years ago).
We can then access this symbol similar to how
RedLeaf
does it by defining a static __end: u8; variable and implementing a cute
wrapper around it.
Memory size
The size of the page_array is defined by the number of physical pages in our
system. Physical memory can have special ACPI regions, and even gaps, so it’s
better to say it’s defined by the max physical addres returned by GRUB. In
other words, the page_array has enough elements to describe the state of
every page from 0 to the max physical address.
To learn how much memory our system has, we utilize the Multiboot v2 that is supported by GRUB. Multiboot2 is a long and dry specification, and while you’re welcome to look through it, it’s much easier to use modern AI tools to ask a couple of questions about it, e.g., “How does multiboot v2 passes memory information to the kernel”.
At a high level, a multiboot2 compliant boot loader like GRUB passes a pointer
to the multiboot information structure in the ebx register (remember GRUB
leaves us in 32bit mode). This structure is a sequence of tags – each
describing a different piece of boot-time data (command line, framebuffer,
modules, memory map, etc.). Here is how it looks in memory:
+--------------------------+
| total_size (u32) |
| reserved (u32, must=0) |
+--------------------------+
| Tag #1 |
+--------------------------+
| Tag #2 |
+--------------------------+
| Tag #3 |
+--------------------------+
| ... |
+--------------------------+
| Tag #N (type=0, end tag) |
+--------------------------+
Each tag has a common header:
struct multiboot_tag {
uint32_t type;
uint32_t size; // size of the entire tag including header
// followed by tag-specific data
};
Each tag starts on an 8-byte boundary. If a tag’s size is not a multiple of 8, it is padded with zeros.
The memory map tag is just one of these tags, inserted by the bootloader:
+-----------------------------------------------------------+
| type = 6 (MULTIBOOT_TAG_TYPE_MMAP) |
| size = sizeof(struct multiboot_tag_mmap) + entries |
| entry_size |
| entry_version (0) |
| +----------------------------------------------+ |
| | addr (u64) | len (u64) | type (u32) | zero | entry1 |
| +----------------------------------------------+ |
| | addr (u64) | len (u64) | type (u32) | zero | entry2 |
| +----------------------------------------------+ |
| | ... | ... |
| +----------------------------------------------+ |
+-----------------------------------------------------------+
Or in other words:
struct multiboot_tag_mmap {
uint32_t type; // = MULTIBOOT_TAG_TYPE_MMAP (6)
uint32_t size; // size of this tag including all entries
uint32_t entry_size; // size of each mmap entry (may include extra fields)
uint32_t entry_version; // version (currently 0)
struct multiboot_mmap_entry {
uint64_t addr; // base physical address
uint64_t len; // length in bytes
uint32_t type; // 1 = available, other values = reserved, ACPI, etc.
uint32_t zero; // reserved, must be zero
} entries[];
};
Each entry describes a physical memory region:
-
addr = physical start address
-
len = length in bytes
-
type = 1 (available), 2 (reserved), etc.
-
zero = reserved (set to 0)
The types can be:
- 1 Available RAM (usable by OS)
- 2 Reserved
- 3 ACPI Reclaimable Memory
- 4 ACPI NVS Memory
- 5 Bad Memory (defective)
While you can implement your own boot info parsing logic I suggest that you grab the mutliboot2 crate from RedLeaf and adapt it to your kernel.
Note that since the pointer to the memory info region is passed by GRUB in the
EBX register we need to save it early on boot in our boot loader before we
overwrite the register value.
We have to modify our boot loader similar to how RedLeaf does it in boot.asm
by calling
check_multiboot.
Here RedLeaf checks if the mutliboot magic number is correct in eax and then
saves the pointer to the multiboot info stucture in the _bootinfo variable
(we should also add it to our code, like it’s done here in
RedLeaf,
i.e., we declare static _bootinfo: usize;.
We then pass it to the mutliboot crate here
Note that since we are not sure if we accidentally overwrite the boot info data
structure from GRUB, RedLeaf moves it right above the __end symbol in the
kernel (well, we know that memory is available there). This means that we can’t
really use __end for the page_array and instead should maintain the logical
KERNEL_END
similar to RedLeaf.
Organization of page_array
Inside the page_array we track the state of each memory page. For example,
for all pages allocated by the kernel (from 0 to KERNEL_END), we mark the
pages in page_array as Unavailable. It means that we assume that these
pages are reserved and cannot be used by the allocator (we do the same for the
regions of memory that are not available due to ACPI or other gaps as passed in
the memory map).
Starting from KERNEL_END and up to the next 2MB page we mark the pages as
Free4K. The pages that are aligned at 2MB boundary we mark as Free2MB.
As I mentioned, we plan to maintain two lists of free pages: 4KB list and 2MB
list. When we initialize the allocator we add all Free4K pages on the 4KB
list and all Free2MB pages on the 2MB list.
Note that we implement both lists as doubly-linked lists to support fast
removal from the list which we utilize for the merge() operation.
When we allocate a 4KB object we simply take the first element from the head of
the list and mark the corresponding element in the page_array as
Allocated4K. Note that on each allocation of a 4K page we update the counter
(decrementing it) that tracks how many of 4KB pages are free in the 2MB
superpage.
When we run out of pages on the 4KB list we need to split one 2MB page into 4KB
pages. We take the 2MB page from the list of 2MB pages (it has to be in the
Free2MB state) and change the state of each 4KB page within this 2MB page to
Free4K while adding all 4KB pages on the 4KB page list. Note we also update
the counter of the head page to 512 indicating that all 512 4KB pages are
currently free. The next 4KB allocation will decrement this counter to 511 and
so on.
Later on when we start freeing 4KB pages, we will start incrementing this
counter and when it reaches 512 again we will implement the merge()
operation. I.e., we will merge all 4KB pages back into a single 2MB page. Note
that when we perform the merge() we will have to remove all 4KB pages of the
2MB page from the free list of 4KB pages. But this is easy, we simply iterate
over all 4KB pages of the 2MB page and delete each of them from the list of 4KB
pages (since the list is doubly linked deletion is easy).
Note, that some metadata is excessive. I.e., we already know that the page is
free if it’s on the free list. However, tracking this state explicitly as
Free4K will harden our allocator and guard against possible bugs when we try
to either free a page multiple times (a double free error), or try to free a
4KB page as a 2MB page, etc. All modern allocators implement some degree of
hardening to protect against double free bugs.
Locking and synchronization
To simplify implementation, lets execute all allocations under two big locks
that guard 4KB and 2MB lists, i.e., a Mutex<T> in Rust. You can take a look
how we do this in
RedLeaf.
Note however that since we are working in the kernel we cannot acquire locks
with interrupts enabled (or we have to guarantee that interrupt handlers don’t
try to acquire the same mutexes). Hence, we extend our mutex with support for
disabling interrupts. Which means that you have to implement your own
Mutex<T> primitive.
Note that if we deside to support non-maskable interrupts we still have to guarantee that NMI handlers don’t acquire the same mutexes. A runtime check that tracks the execution context (i.e., NMI vs non-NMI and panics if the mutex is acquired from the NMI context is a good idea to guard for mesterious bugs).
Integrating with Box<T>
To implement integration with Box<T>, we can utilizere instructions from
Philipp Oppermann’s
blog. At a
high level you will need to implement GlobalAlloc trait for your allocator.
You can use Allocator interface in
RedLeaf
as a reference. Then you have to declare
global_allocator.
Submitting your work
To test your work, we provide a test.rs that you will be integrating into your project. To integrate test.rs, a few things you need to provide
- Implement a Global allocator instance named “ALLOCATOR” under module
crate::memory - Enable rust std alloc crate
Add a rust module by add following files
- src/memory/mod.rs
- src/memory/test.rs (MAKE SURE you place test.rs in this path, because autograder will replace your test.rs at this path.)
In mod.rs, add your global allocator instance.
#[global_allocator]
pub static ALLOCATOR: YourAllocatorType = ...;
To enable rust std alloc crate, add following into your .cargo/config.toml.
[unstable]
build-std = [ "core","alloc" ]
unstable-options = true
Finally, call test_all() function in your main.rs AFTER initializing your memory allocator.
To submit, do not print anything in your timer interrupt handler. you may use submission.sh to zip up all your files. Make sure to submit with .cargo/config.toml file.