m1rko March 27, 2019 at 21:03

Writing an operating system in Rust. Implementing paging (new)

Translation

In this article, we will figure out how to implement paging support in our kernel. First, we will study several techniques for making the physical page table frames accessible to the kernel and discuss their advantages and disadvantages. Then we will implement an address translation function and a function that creates a new mapping.

This series of articles is published on GitHub. If you have any questions or problems, open an issue there. All sources for the article are in this branch.

Another article about paging?
If you follow this series, you saw the article “Advanced Paging” at the end of January. But I was criticized for the recursive page tables. Therefore, I decided to rewrite the article, using a different approach for accessing frames.

Here is the new version. The article still explains how recursive page tables work, but we use a simpler and more powerful implementation. We will not delete the previous article, but we will mark it as outdated and no longer update it.

I hope you enjoy the new version!

Content

Introduction
- Dependency Updates
Access to page tables
Bootloader support
- Boot information
- Entry point macro
Implementation
Summary
What's next?

Introduction

In the last article, we learned about the principles of paging and how four-level page tables work on x86_64. We also found that the bootloader has already set up a page table hierarchy for our kernel, so the kernel runs on virtual addresses. This improves safety because illegal memory accesses cause a page fault instead of modifying arbitrary physical memory.

The article ended with the problem that we could not access the page tables from our kernel, because they are stored in physical memory while the kernel already runs on virtual addresses. Here we continue the topic and explore different options for accessing the page table frames from the kernel. We will discuss the advantages and disadvantages of each of them and then choose a suitable option for our kernel.

Bootloader support is required, so we will configure it first. Then we will implement a function that walks the page table hierarchy to translate virtual addresses into physical ones. Finally, we will learn how to create new mappings in the page tables and how to find unused memory frames for creating new tables.

Dependency Updates

This article requires bootloader version 0.4.0 or higher and x86_64 version 0.5.2 or higher in the dependencies. You can update the dependencies in Cargo.toml:

[dependencies]
bootloader = "0.4.0"
x86_64 = "0.5.2"

For the changes in these versions, see the bootloader changelog and the x86_64 changelog.

Access to page tables

Accessing page tables from the kernel is not as easy as it might seem. To understand the problem, take another look at the four-level table hierarchy from the previous article:

The important thing is that each page table entry stores the physical address of the next table. This avoids having to translate these addresses as well, which would hurt performance and could easily lead to endless translation loops.

The problem is that we cannot directly access physical addresses from the kernel, since the kernel also runs on virtual addresses. For example, when we access the address 4 KiB, we access the virtual address 4 KiB, not the physical address 4 KiB where the level 4 page table is stored. If we want to access the physical address 4 KiB, we need to use some virtual address that is translated to it.

Therefore, to access the page table frames, we need to map some virtual pages to them. There are different ways to create such mappings.

Identity mapping

A simple solution is to identity map all page tables.

In this example, we see several identity-mapped page table frames. The physical addresses of the page tables are also valid virtual addresses, so we can easily access the page tables of all levels, starting from the CR3 register.

However, this approach clutters the virtual address space and makes it difficult to find large contiguous regions of free memory. Let's say we want to create a 1000 KiB virtual memory region in the above figure, for example, to memory-map a file. We cannot start the region at 28 KiB, because it would collide with the already mapped page at 1004 KiB. Therefore, we have to look further until we find a large enough unmapped area, for example at 1008 KiB. This is the same fragmentation problem as with segmentation.

In addition, creating new page tables becomes much more complicated, since we need to find physical frames whose corresponding pages are not yet in use. For example, suppose we reserved the 1000 KiB virtual memory region starting at 1008 KiB for our file. Now we can no longer use any frame with a physical address between 1000 KiB and 2008 KiB, because we cannot identity map it.

Fixed offset map

To avoid cluttering the virtual address space, we can map the page tables into a separate memory region. Instead of identity mapping the page table frames, we map them at a fixed offset in the virtual address space. For example, the offset can be 10 TiB:

By reserving this range of virtual memory exclusively for page table mappings, we avoid the problems of identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory. On x86_64 this is not a problem, because the 48-bit address space is 256 TiB large.
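The fixed-offset idea boils down to a single addition. The following is a minimal sketch, not part of the kernel code: the names PAGE_TABLE_OFFSET and frame_to_virt are assumptions made for illustration, and the 10 TiB value is simply the example offset from the text.

```rust
/// Hypothetical fixed offset at which page table frames are mapped (10 TiB).
const PAGE_TABLE_OFFSET: u64 = 10 << 40;

/// Size of the 48-bit virtual address space on x86_64, in bytes.
const VIRT_ADDR_SPACE: u64 = 1 << 48;

/// Returns the virtual address through which the frame at `phys` is
/// reachable, or `None` if the result would leave the lower half of the
/// 48-bit address space.
fn frame_to_virt(phys: u64) -> Option<u64> {
    let virt = phys.checked_add(PAGE_TABLE_OFFSET)?;
    // Addresses below 2^47 are valid without any sign extension.
    if virt < VIRT_ADDR_SPACE / 2 {
        Some(virt)
    } else {
        None
    }
}
```

For example, a page table frame at the physical address 4 KiB would be reachable at 10 TiB + 4 KiB under these assumptions.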

But this approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. In addition, it does not allow access to the page tables of other address spaces, which would be useful when creating a new process.

Full physical memory mapping

We can solve these problems by mapping the complete physical memory instead of only the page table frames:

This approach allows the kernel to access arbitrary physical memory, including page table frames of other address spaces. The reserved range of virtual memory has the same size as before, with the difference that it no longer contains unmapped pages.

The disadvantage of this approach is that additional page tables are needed to map the physical memory. These page tables have to be stored somewhere, so they use up part of the physical memory, which can be a problem on devices with a small amount of RAM.

However, on x86_64 we can use huge pages of 2 MiB for the mapping instead of the default 4 KiB pages. This way, mapping 32 GiB of physical memory requires only 132 KiB for page tables: one level 3 table and 32 level 2 tables. Huge pages are also more cache-efficient because they use fewer entries in the translation lookaside buffer (TLB).
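The 132 KiB figure can be checked with a little arithmetic. The sketch below is only an illustration; the function name tables_for_huge_mapping is an assumption, and the level 4 table is not counted because it exists in any case.

```rust
const TABLE_SIZE: u64 = 4096; // one page table occupies one 4 KiB frame
const ENTRIES: u64 = 512; // entries per page table
const HUGE_PAGE: u64 = 2 * 1024 * 1024; // 2 MiB page mapped by a level 2 entry

/// Returns (level 3 tables, level 2 tables, total bytes) needed to map
/// `phys_bytes` of physical memory with 2 MiB pages.
fn tables_for_huge_mapping(phys_bytes: u64) -> (u64, u64, u64) {
    // integer division that rounds up
    let div_ceil = |a: u64, b: u64| (a + b - 1) / b;
    let pages = div_ceil(phys_bytes, HUGE_PAGE);
    let l2 = div_ceil(pages, ENTRIES);
    let l3 = div_ceil(l2, ENTRIES);
    (l3, l2, (l3 + l2) * TABLE_SIZE)
}
```

For 32 GiB this yields one level 3 table, 32 level 2 tables, and 33 × 4 KiB = 132 KiB in total.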

Temporary mapping

For devices with very little physical memory, we can map the page table frames only temporarily, when we need to access them. For the temporary mappings, we only need a single identity-mapped level 1 table:

In this figure, the level 1 table controls the first 2 MiB of the virtual address space. This is because it is reachable from the CR3 register by following the entries with index 0 in the tables of levels 4, 3 and 2. The entry with index 8 translates the virtual page at 32 KiB to the physical frame at 32 KiB, thereby identity mapping the level 1 table itself. The figure shows this identity mapping as a horizontal arrow.

By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry needed for the identity mapping). In the above example, the kernel created two temporary mappings:

By mapping entry 0 of the level 1 table to the frame at 24 KiB, it created a temporary mapping of the virtual page at 0 KiB to the physical frame of the level 2 page table, indicated by the dotted arrow.
By mapping entry 9 of the level 1 table to the frame at 4 KiB, it created a temporary mapping of the virtual page at 36 KiB to the physical frame of the level 4 page table, indicated by the dotted arrow.

Now the kernel can access the level 2 table by writing to the page that starts at 0 KiB, and the level 4 table by writing to the page that starts at 36 KiB.

Thus, access to an arbitrary frame of the page table with temporary mappings consists of the following actions:

Find a free entry in the identity-mapped level 1 table.
Map this entry to the physical frame of the page table we want to access.
Access this frame through the virtual page associated with the entry.
Set the entry back to unused, thereby removing the temporary mapping.

With this approach, the virtual address space stays clean, since the same 512 virtual pages are reused for all mappings. The disadvantage is that it is somewhat cumbersome, especially since a new mapping may require changes to several table levels, which means repeating the described process several times.
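The four steps can be modeled with a toy data structure. This is only a simulation under stated assumptions: Level1Table, map_temporarily, unmap and example_table are hypothetical names, entries hold plain frame addresses without flag bits, and a real kernel would also have to flush the TLB after changing an entry. Unlike the figure, the sketch always picks the first free entry.

```rust
const PAGE_SIZE: u64 = 4096;

/// Toy model of the identity-mapped level 1 table: each entry is either
/// unused (`None`) or holds the physical address of the mapped frame.
struct Level1Table {
    entries: [Option<u64>; 512],
}

impl Level1Table {
    /// Steps 1 and 2: maps `frame` into the first unused entry and returns
    /// the virtual address of the page controlled by that entry. Because the
    /// table covers the first 2 MiB, entry `i` controls the page at `i * 4 KiB`.
    fn map_temporarily(&mut self, frame: u64) -> Option<u64> {
        let index = self.entries.iter().position(|e| e.is_none())?;
        self.entries[index] = Some(frame);
        Some(index as u64 * PAGE_SIZE)
    }

    /// Step 4: marks the entry that controls `page` as unused again.
    fn unmap(&mut self, page: u64) {
        self.entries[(page / PAGE_SIZE) as usize] = None;
    }
}

/// Builds the starting state from the figure: only entry 8 is in use and
/// identity maps the level 1 table itself at 32 KiB.
fn example_table() -> Level1Table {
    let mut entries = [None; 512];
    entries[8] = Some(32 * 1024);
    Level1Table { entries }
}
```

Step 3, the actual access, would happen through the returned virtual address between the calls to map_temporarily and unmap.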

Recursive Page Tables

Another interesting approach, which requires no additional page tables at all, is to map the page table recursively.

The idea is to map some entry of the level 4 table to the level 4 table itself. This way, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.

Let's look at an example to understand how this all works:

The only difference from the example at the beginning of the article is an additional entry with index 511 in the level 4 table, which is mapped to the physical frame 4 KiB, the frame of the level 4 table itself.

When the CPU follows this entry during a translation, it does not reach a level 3 table but the same level 4 table again. This is similar to a recursive function that calls itself. The important point is that the processor assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because the tables of all levels have exactly the same layout on x86_64.

By following the recursive entry one or more times before starting the actual translation, we can effectively shorten the number of levels that the processor traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the processor thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame in physical memory. This means that we can now read and write the level 1 page table, because the processor thinks that it is the mapped frame. The figure below shows the five steps of this translation:

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

Let's go through this procedure step by step. First, the CPU follows the recursive entry in the level 4 table and thinks that it has reached a level 3 table. Then it follows the recursive entry again and thinks that it has reached a level 2 table, but in reality it is still on the level 4 table. When it now follows a different entry, it lands on a level 3 table but thinks that it is already on a level 1 table. Finally, when it follows the next entry, it lands on a level 2 table but thinks that it has reached a mapped frame in physical memory. This allows us to read and write the level 2 table.

The tables of levels 3 and 4 are accessed in the same way. To access the level 3 table, we follow the recursive entry three times: the processor then thinks that it is already on a level 1 table, and the next step takes us to the level 3 table, which the CPU treats as the mapped frame. To access the level 4 table itself, we simply follow the recursive entry four times, until the processor treats the level 4 table itself as the mapped frame (shown in blue in the figure below).

The concept is hard to understand at first, but in practice it works pretty well.
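A small simulation may help to see the effect. The sketch below models physical memory as a hash map and performs the same four lookups as the CPU. All names and frame addresses (PhysMem, walk, example_memory, L4, L3, L2, L1, DATA) are assumptions chosen for this example, and entries are plain frame addresses without flag bits.

```rust
use std::collections::HashMap;

/// Simulated physical memory: frame address -> the 512 entries of the page
/// table stored in that frame. An entry is just the address of the next
/// frame, with 0 meaning "not present".
type PhysMem = HashMap<u64, [u64; 512]>;

// Hypothetical frame addresses of the four tables and of one data frame.
const L4: u64 = 0x1000;
const L3: u64 = 0x4000;
const L2: u64 = 0x6000;
const L1: u64 = 0x8000;
const DATA: u64 = 0x3000;

/// Performs the four table lookups of a translation, starting at the frame
/// in `cr3`, and returns the frame that `virt` finally resolves to.
fn walk(mem: &PhysMem, cr3: u64, virt: u64) -> Option<u64> {
    let mut frame = cr3;
    for shift in [39, 30, 21, 12] {
        let index = ((virt >> shift) & 0o777) as usize;
        frame = mem.get(&frame)?[index];
        if frame == 0 {
            return None; // entry not present
        }
    }
    Some(frame)
}

/// Maps virtual page 0 to DATA and makes level 4 entry 511 recursive.
fn example_memory() -> PhysMem {
    let (mut l4, mut l3, mut l2, mut l1) =
        ([0u64; 512], [0u64; 512], [0u64; 512], [0u64; 512]);
    l4[0] = L3;
    l3[0] = L2;
    l2[0] = L1;
    l1[0] = DATA;
    l4[511] = L4; // the recursive entry points back to the level 4 frame
    HashMap::from([(L4, l4), (L3, l3), (L2, l2), (L1, l1)])
}
```

With this setup, walking the address 0 ends at DATA, while an address with the level 4 index 511 and zeros elsewhere ends at the level 1 table frame, because the first lookup leads back to the level 4 table.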

Address Calculation

So, we can access the tables of all levels by following the recursive entry one or more times. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. As we recall, the page table indexes are extracted from the address as follows:

Suppose we want to access the level 1 table that maps a specific page. As we learned above, we need to follow the recursive entry once and then continue with the indexes of levels 4, 3 and 2. To do this, we move each address block one block to the right and put the index of the recursive entry in the place of the original level 4 index:

To access the level 2 table of this page, we move each index block two blocks to the right and put the recursive index in the place of both original blocks, level 4 and level 3:

To access the level 3 table, we do the same but move each block three blocks to the right.

Finally, to access the level 4 table, we move each block four blocks to the right.

Now we can calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.

The table below shows the structure of addresses for accessing various types of frames:

Virtual address for	Address structure ( octal )
Page	`0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE`
Entry in level 1 table	`0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD`
Entry in a level 2 table	`0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC`
Entry in a level 3 table	`0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB`
Entry in level 4 table	`0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA`

Here AAA is the level 4 index, BBB the level 3 index, CCC the level 2 index, and DDD the level 1 index of the mapped frame, and EEEE is the offset into it. RRR is the index of the recursive entry. An index (three digits) is turned into an offset (four digits) by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address points directly to the corresponding page table entry.

SSSSSS are the sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture, which we discussed in the previous article.

We write the addresses in octal, since each octal digit represents three bits, which allows us to clearly separate the 9-bit indexes of the different table levels. This is not possible in hexadecimal, where each digit represents four bits.
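To see the octal layout for a concrete address, a small helper can print the four indexes and the offset. This is just an illustrative sketch; the name octal_layout is an assumption, and the sign extension bits are left out.

```rust
/// Formats the page table indexes and the page offset of `addr` in octal,
/// using the AAA_BBB_CCC_DDD_EEEE layout from the table above.
fn octal_layout(addr: u64) -> String {
    format!(
        "{:03o}_{:03o}_{:03o}_{:03o}_{:04o}",
        (addr >> 39) & 0o777, // level 4 index
        (addr >> 30) & 0o777, // level 3 index
        (addr >> 21) & 0o777, // level 2 index
        (addr >> 12) & 0o777, // level 1 index
        addr & 0o7777,        // page offset
    )
}
```

For the address 0x20010a, for example, the helper returns 000_000_001_000_0412: level 2 index 1, all other indexes 0, and the page offset 0o412.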

Rust code

You can construct such addresses in Rust code using bitwise operations:

// the virtual address whose corresponding page tables you want to access
let addr: usize = […];
let r = 0o777; // recursive index
let sign = 0o177777 << 48; // sign extension
// retrieve the page table indices of the address that we want to translate
let l4_idx = (addr >> 39) & 0o777; // level 4 index
let l3_idx = (addr >> 30) & 0o777; // level 3 index
let l2_idx = (addr >> 21) & 0o777; // level 2 index
let l1_idx = (addr >> 12) & 0o777; // level 1 index
let page_offset = addr & 0o7777;
// calculate the table addresses
let level_4_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (r << 12);
let level_3_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12);
let level_2_table_addr =
    sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12);
let level_1_table_addr =
    sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12);

This code assumes that the last level 4 entry, with index 0o777 (511), is mapped recursively. This is currently not the case, so the code will not work yet. See below for how to tell the bootloader to set up the recursive mapping.
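The same bit operations can also produce the address of a single page table entry, as described above. The following sketch uses u64 instead of usize and derives the sign extension from bit 47; the names R, sign_extend and level_1_entry_addr are assumptions, and the code presumes that the level 4 entry 0o777 is mapped recursively.

```rust
/// Index of the recursive level 4 entry (assumed to be the last entry).
const R: u64 = 0o777;

/// Copies bit 47 into bits 48 to 63, producing a canonical address.
fn sign_extend(addr: u64) -> u64 {
    (((addr << 16) as i64) >> 16) as u64
}

/// Virtual address of the level 1 page table entry that maps `addr`.
fn level_1_entry_addr(addr: u64) -> u64 {
    let l4_idx = (addr >> 39) & 0o777;
    let l3_idx = (addr >> 30) & 0o777;
    let l2_idx = (addr >> 21) & 0o777;
    let l1_idx = (addr >> 12) & 0o777;
    // Every index moves one block to the right; the level 1 index becomes
    // a byte offset into the table (8 bytes per entry).
    sign_extend((R << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12) | (l1_idx * 8))
}
```

An address with the indexes 1, 2, 3 and 4 thus maps to 0o177777_777_001_002_003_0040, where the offset 0o40 is 4 × 8.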

As an alternative to performing the bitwise operations by hand, you can use the RecursivePageTable type of the x86_64 crate, which provides safe abstractions for various page table operations. For example, the code below shows how to translate a virtual address to its mapped physical address:

// in src/memory.rs
use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};
// Create a RecursivePageTable instance from the level 4 address.
let level_4_table_addr = […];
let level_4_table_ptr = level_4_table_addr as *mut PageTable;
let recursive_page_table = unsafe {
    let level_4_table = &mut *level_4_table_ptr;
    // no trailing semicolon: the block evaluates to the new instance
    RecursivePageTable::new(level_4_table).unwrap()
};
// Retrieve the physical address for the given virtual address.
let addr: u64 = […];
let addr = VirtAddr::new(addr);
let page: Page = Page::containing_address(addr);
// perform the translation
let frame = recursive_page_table.translate_page(page);
frame.map(|frame| frame.start_address() + u64::from(addr.page_offset()))

Again, this code requires a valid recursive mapping. With such a mapping, the missing level_4_table_addr can be calculated as in the first code example.

Recursive mapping is an interesting technique that shows how powerful a single mapping in a page table can be. It is relatively easy to implement and requires only minimal setup (just one recursive entry), so it is a good choice for first experiments.

But it has some disadvantages:

It occupies a large amount of virtual memory (512 GiB). This is not a problem in the large 48-bit address space, but it may lead to suboptimal cache behavior.
It only gives easy access to the currently active address space. Access to other address spaces is still possible by changing the recursive entry, but switching requires a temporary mapping. We described how to do this in the previous (outdated) article.
It relies heavily on the x86 page table format and may not work on other architectures.

Bootloader support

All the approaches described above require changes to the page tables for their setup. For example, the physical memory has to be identity mapped, or an entry of the level 4 table has to be mapped recursively. The problem is that we cannot create these mappings without already having access to the page tables.

So we need help from the bootloader. It has access to the page tables, so it can create any mappings that we need. In its current implementation, the bootloader crate supports two of the above approaches, controlled through cargo features:

The map_physical_memory feature maps the complete physical memory somewhere into the virtual address space. Thus, the kernel gains access to all physical memory and can follow the approach of mapping the complete physical memory.
With the recursive_page_table feature, the bootloader maps an entry of the level 4 page table recursively. This allows the kernel to work as described in the "Recursive Page Tables" section.

For our kernel, we choose the first option, because it is a simple, platform-independent and more powerful approach (it also gives access to frames that are not page tables). To get the required support from the bootloader, add the map_physical_memory feature to the bootloader dependency:

[dependencies]
bootloader = { version = "0.4.0", features = ["map_physical_memory"]}

If this feature is enabled, the bootloader maps the complete physical memory to some unused range of virtual addresses. To communicate this range to the kernel, the bootloader passes a boot information structure.

Boot information

The bootloader crate defines the BootInfo struct, which contains all the information passed to the kernel. The struct is still being finalized, so expect some breakage when updating to future, semver-incompatible versions of the bootloader. Currently, the struct has two fields, memory_map and physical_memory_offset:

The memory_map field gives an overview of the available physical memory. It tells the kernel how much physical memory is available in the system and which memory regions are reserved for devices such as the VGA hardware. The memory map can be queried from the BIOS or UEFI firmware, but only very early in the boot process. For this reason, the bootloader must provide it, because the kernel has no way to retrieve this information later. The memory map will come in handy later in this article.
The physical_memory_offset field reports the virtual start address of the physical memory mapping. By adding this offset to a physical address, we get the corresponding virtual address. This gives the kernel access to arbitrary physical memory.

The bootloader passes the BootInfo struct to the kernel as a &'static BootInfo argument of the _start function. Let's add it:

// in src/main.rs
use bootloader::BootInfo;
#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument
    […]
}

It is important to specify the correct argument type, because the compiler does not know the correct signature of our entry point function.

Entry point macro

Since the _start function is called externally from the bootloader, its signature is not checked. This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.

To make sure that the entry point function always has the correct signature, the bootloader crate provides the entry_point macro. Let's rewrite our function using this macro:

// in src/main.rs
use bootloader::{BootInfo, entry_point};
entry_point!(kernel_main);
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
}

We no longer need extern "C" or no_mangle for the entry point, since the macro defines the real lower-level _start entry point for us. The kernel_main function is now a completely normal Rust function, so we can choose an arbitrary name for it. The important thing is that it is type-checked, so a compilation error occurs if we use a wrong signature, for example by adding an argument or changing its type.
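How can a macro enforce a signature? One possible technique is to assign the given function to a variable with an explicit function pointer type. The sketch below is a simplified illustration, not the code of the bootloader crate: BootInfo is a reduced stand-in, and checked_entry_point and start_impl are hypothetical names. A real kernel would additionally export the generated function under the name _start with the C calling convention.

```rust
/// Reduced stand-in for `bootloader::BootInfo` (an assumption for this sketch).
struct BootInfo {
    physical_memory_offset: u64,
}

/// Generates a wrapper that only compiles if `$path` has the expected
/// signature `fn(&'static BootInfo) -> !`.
macro_rules! checked_entry_point {
    ($path:path) => {
        fn start_impl(boot_info: &'static BootInfo) -> ! {
            // This assignment is the type check: it is rejected at compile
            // time if `$path` takes other arguments or can return.
            let f: fn(&'static BootInfo) -> ! = $path;
            f(boot_info)
        }
    };
}

checked_entry_point!(kernel_main);

fn kernel_main(_boot_info: &'static BootInfo) -> ! {
    loop {}
}
```

Changing the argument of kernel_main to, say, a u64 would make the assignment inside the generated wrapper fail to compile.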

Implementation

Now we have access to physical memory, and we can finally start the implementation. First, we will take a look at the currently active page tables that the kernel runs on. In the second step, we will create a translation function that returns the physical address that a given virtual address is mapped to. In the last step, we will try to modify the page tables to create a new mapping.

First, create a new memory module in the code:

// in src/lib.rs
pub mod memory;

For the module, create an empty file src/memory.rs.

Access to page tables

At the end of the previous article, we tried to look at the page tables that our kernel runs on, but could not access the physical frame that the CR3 register points to. Now we can continue from there: the active_level_4_table function returns a reference to the active level 4 page table:

// in src/memory.rs
use x86_64::structures::paging::PageTable;
/// Returns a mutable reference to the active level 4 table.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be only called once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn active_level_4_table(physical_memory_offset: u64)
    -> &'static mut PageTable
{
    use x86_64::{registers::control::Cr3, VirtAddr};
    let (level_4_table_frame, _) = Cr3::read();
    let phys = level_4_table_frame.start_address();
    let virt = VirtAddr::new(phys.as_u64() + physical_memory_offset);
    let page_table_ptr: *mut PageTable = virt.as_mut_ptr();
    &mut *page_table_ptr // unsafe
}

First, we read the physical frame of the active level 4 table from the CR3 register. Then we take its physical start address and convert it to a virtual address by adding physical_memory_offset. Finally, we convert the address to a *mut PageTable raw pointer with the as_mut_ptr method and then unsafely create a &mut PageTable reference from it. We create a &mut reference instead of a & reference because we will modify the page tables later in the article.

We do not need an unsafe block here, because Rust treats the whole body of an unsafe fn as one large unsafe block. This makes our code more dangerous, because we could accidentally introduce an unsafe operation in the previous lines without noticing. It also makes unsafe operations harder to spot. An RFC has already been created to change this behavior of Rust.

Now we can use this function to print the entries of the level 4 table:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS
    use blog_os::memory::active_level_4_table;
    let l4_table = unsafe {
        active_level_4_table(boot_info.physical_memory_offset)
    };
    for (i, entry) in l4_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("L4 Entry {}: {:?}", i, entry);
        }
    }
    println!("It did not crash!");
    blog_os::hlt_loop();
}

We pass the physical_memory_offset field of the BootInfo struct to the function. Then we use the iter function to iterate over the page table entries and the enumerate combinator to add an index i to each element. We only print the non-empty entries, because all 512 entries would not fit on the screen.

When we run the code, we see this result:

We see several non-empty entries, which map to different level 3 tables. So many memory regions are in use because separate regions are needed for the kernel code, the kernel stack, the physical memory mapping, and the boot information.

To traverse the page tables further and take a look at a level 3 table, we can again convert the mapped frame of an entry to a virtual address:

// in the for loop in src/main.rs
use x86_64::{structures::paging::PageTable, VirtAddr};
if !entry.is_unused() {
    println!("L4 Entry {}: {:?}", i, entry);
    // get the physical address from the entry and convert it
    let phys = entry.frame().unwrap().start_address();
    let virt = phys.as_u64() + boot_info.physical_memory_offset;
    let ptr = VirtAddr::new(virt).as_mut_ptr();
    let l3_table: &PageTable = unsafe { &*ptr };
    // print non-empty entries of the level 3 table
    for (i, entry) in l3_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("  L3 Entry {}: {:?}", i, entry);
        }
    }
}

To look at the tables of levels 2 and 1, we repeat this process for the entries of levels 3 and 2, respectively. As you can imagine, the amount of code grows very quickly, so we will not show the full listing.

Traversing the tables manually is interesting because it helps to understand how the processor translates addresses. But usually we are only interested in the physical address that a specific virtual address is mapped to, so let's create a function for that.

Address Translation

To translate a virtual address into a physical address, we must go through a four-level page table until we reach the mapped frame. Let's create a function that performs this address translation:

// in src/memory.rs
use x86_64::{PhysAddr, VirtAddr};
/// Translates the given virtual address to the mapped physical address, or
/// `None` if the address is not mapped.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`.
pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: u64)
    -> Option<PhysAddr>
{
    translate_addr_inner(addr, physical_memory_offset)
}

We forward the call to a safe function, translate_addr_inner, to limit the scope of unsafe code. As noted above, Rust treats the whole body of an unsafe fn as one large unsafe block. By calling a single safe function, we make each unsafe operation explicit again.

The private inner function contains the actual implementation:

// in src/memory.rs
/// Private function that is called by `translate_addr`.
///
/// This function is safe to limit the scope of `unsafe` because Rust treats
/// the whole body of unsafe functions as an unsafe block. This function must
/// only be reachable through `unsafe fn` from outside of this module.
fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: u64)
    -> Option<PhysAddr>
{
    use x86_64::structures::paging::page_table::FrameError;
    use x86_64::registers::control::Cr3;
    // read the active level 4 frame from the CR3 register
    let (level_4_table_frame, _) = Cr3::read();
    let table_indexes = [
        addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()
    ];
    let mut frame = level_4_table_frame;
    // traverse the multi-level page table
    for &index in &table_indexes {
        // convert the frame into a page table reference
        let virt = frame.start_address().as_u64() + physical_memory_offset;
        let table_ptr: *const PageTable = VirtAddr::new(virt).as_ptr();
        let table = unsafe {&*table_ptr};
        // read the page table entry and update `frame`
        let entry = &table[index];
        frame = match entry.frame() {
            Ok(frame) => frame,
            Err(FrameError::FrameNotPresent) => return None,
            Err(FrameError::HugeFrame) => panic!("huge pages not supported"),
        };
    }
    // calculate the physical address by adding the page offset
    Some(frame.start_address() + u64::from(addr.page_offset()))
}

Instead of reusing our active_level_4_table function, we read the level 4 frame from the CR3 register again, because this simplifies the prototype implementation. Do not worry, we will improve the solution soon.

The VirtAddr struct already provides methods for computing the indexes into the page tables of the four levels. We store these indexes in a small array, because this allows us to traverse all the tables with a for loop. Outside the loop, we remember the last visited frame in order to calculate the physical address later. The frame variable points to page table frames during the iteration and to the mapped frame after the last iteration, that is, after following the level 1 entry.

Inside the loop, we again use physical_memory_offset to convert the frame into a page table reference. Then we read the entry of the current page table and use the PageTableEntry::frame function to retrieve the mapped frame. If the entry is not mapped to a frame, we return None. If the entry maps a huge page of 2 MiB or 1 GiB, we panic for now.
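Supporting huge pages would mainly change the last step: the walk stops one or two levels early, and correspondingly more low bits of the address count as the page offset. The following sketch only illustrates this offset calculation; PageSize and phys_addr are hypothetical names and not part of our kernel or of the x86_64 crate API used above.

```rust
/// The three page sizes of x86_64 paging.
#[derive(Clone, Copy)]
enum PageSize {
    Size4KiB, // mapped by a level 1 entry
    Size2MiB, // mapped by a level 2 entry with the huge flag
    Size1GiB, // mapped by a level 3 entry with the huge flag
}

impl PageSize {
    fn bytes(self) -> u64 {
        match self {
            PageSize::Size4KiB => 1 << 12,
            PageSize::Size2MiB => 1 << 21,
            PageSize::Size1GiB => 1 << 30,
        }
    }
}

/// Combines the start address of the mapped frame with the offset of
/// `virt` inside its page. The offset is 12, 21 or 30 bits wide.
fn phys_addr(frame_start: u64, virt: u64, size: PageSize) -> u64 {
    frame_start + (virt & (size.bytes() - 1))
}
```

For a 2 MiB page, the level 1 index bits of the virtual address become part of the offset instead of selecting a table entry.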

So, let's check the translation function at some addresses:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS
    use blog_os::memory::translate_addr;
    use x86_64::VirtAddr;
    let addresses = [
        // the identity-mapped vga buffer page
        0xb8000,
        // some code page
        0x20010a,
        // some stack page
        0x57ac_001f_fe48,
        // virtual address mapped to physical address 0
        boot_info.physical_memory_offset,
    ];
    for &address in &addresses {
        let virt = VirtAddr::new(address);
        let phys = unsafe {
            translate_addr(virt, boot_info.physical_memory_offset)
        };
        println!("{:?} -> {:?}", virt, phys);
    }
    println!("It did not crash!");
    blog_os::hlt_loop();
}

When we run the code, we get the following result:

As expected, the identity-mapped address 0xb8000 translates to the same physical address. The code page and the stack page translate to arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel. The physical_memory_offset address should be mapped to the physical address 0, but the translation fails, because the mapping uses huge pages for efficiency. A future version of the bootloader may apply the same optimization to the kernel and stack pages.

Using MappedPageTable

Translation of virtual addresses into physical addresses is a typical task of the OS kernel, therefore the crate x86_64provides an abstraction for it. It already supports huge pages and several other functions, except translate_addr, therefore, we use it instead of adding support for large pages to our own implementation.

The abstraction is based on two traits that define various page table mapping functions:

The Mapper trait is generic over the page size and provides functions that operate on pages. Examples are translate_page, which translates a given page to a frame of the same size, and map_to, which creates a new mapping in the page table.
The MapperAllSizes trait implies that the implementor implements Mapper for all page sizes. In addition, it provides functions that work with pages of different sizes, such as translate_addr or the general translate.
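The relationship between the two traits can be modeled in a few lines. This is a toy model only: the names `ToyMapper`, `ToyMapperAllSizes`, and `ToyTable` are invented for this sketch, addresses are plain `u64` values, and the real traits of the x86_64 crate have more methods and different signatures.

```rust
/// Toy counterpart of a page-size marker trait.
trait PageSize {
    const SIZE: u64;
}

struct Size4KiB;
struct Size2MiB;

impl PageSize for Size4KiB {
    const SIZE: u64 = 4096;
}
impl PageSize for Size2MiB {
    const SIZE: u64 = 2 * 1024 * 1024;
}

/// Toy mapper trait that is generic over the page size.
trait ToyMapper<S: PageSize> {
    /// Takes the start address of a page and returns the start of its frame.
    fn translate_page(&self, page_start: u64) -> Option<u64>;
}

/// Toy "all sizes" trait: requires `ToyMapper` for every size and adds a
/// function that works regardless of the page size behind an address.
trait ToyMapperAllSizes: ToyMapper<Size4KiB> + ToyMapper<Size2MiB> {
    fn translate_addr(&self, addr: u64) -> Option<u64> {
        // Try the huge page mapping first, then fall back to 4 KiB pages.
        let huge_start = addr & !(Size2MiB::SIZE - 1);
        if let Some(frame) = ToyMapper::<Size2MiB>::translate_page(self, huge_start) {
            return Some(frame + (addr - huge_start));
        }
        let page_start = addr & !(Size4KiB::SIZE - 1);
        ToyMapper::<Size4KiB>::translate_page(self, page_start)
            .map(|frame| frame + (addr - page_start))
    }
}

/// A lookup table of (page start, frame start) pairs per page size.
struct ToyTable {
    small: Vec<(u64, u64)>,
    huge: Vec<(u64, u64)>,
}

fn lookup(pairs: &[(u64, u64)], page_start: u64) -> Option<u64> {
    pairs
        .iter()
        .find(|(page, _)| *page == page_start)
        .map(|(_, frame)| *frame)
}

impl ToyMapper<Size4KiB> for ToyTable {
    fn translate_page(&self, page_start: u64) -> Option<u64> {
        lookup(&self.small, page_start)
    }
}
impl ToyMapper<Size2MiB> for ToyTable {
    fn translate_page(&self, page_start: u64) -> Option<u64> {
        lookup(&self.huge, page_start)
    }
}
impl ToyMapperAllSizes for ToyTable {}
```

The point of the split is that code which only creates 4 KiB mappings can ask for the narrower per-size trait, while code that translates arbitrary addresses asks for the "all sizes" trait.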

The traits only define the interface; they do not provide any implementation. The x86_64 crate currently provides two types that implement the traits: MappedPageTable and RecursivePageTable. The first requires that each page table frame is mapped somewhere (for example, at an offset). The second can be used if the level 4 table is mapped recursively.

We have the complete physical memory mapped at physical_memory_offset, so we can use the MappedPageTable type. To initialize it, we create a new init function in the memory module:

use x86_64::structures::paging::{PhysFrame, MapperAllSizes, MappedPageTable};
use x86_64::PhysAddr;
/// Initialize a new MappedPageTable.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be only called once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn init(physical_memory_offset: u64) -> impl MapperAllSizes {
    let level_4_table = active_level_4_table(physical_memory_offset);
    let phys_to_virt = move |frame: PhysFrame| -> *mut PageTable {
        let phys = frame.start_address().as_u64();
        let virt = VirtAddr::new(phys + physical_memory_offset);
        virt.as_mut_ptr()
    };
    MappedPageTable::new(level_4_table, phys_to_virt)
}
// make private
unsafe fn active_level_4_table(physical_memory_offset: u64)
    -> &'static mut PageTable
{…}

We cannot return MappedPageTable directly from the function because it is generic over a closure type, and closure types cannot be named. We get around this problem with the impl Trait syntax. An additional advantage is that we can later switch the kernel to RecursivePageTable without changing the function signature.

The MappedPageTable::new function expects two parameters: a mutable reference to the level 4 page table and a phys_to_virt closure that converts a physical frame into a page table pointer *mut PageTable. For the first parameter, we can reuse the active_level_4_table function. For the second, we create a closure that uses physical_memory_offset to perform the conversion.
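The closure and the impl Trait return type can be tried out in isolation. The sketch below uses plain `u64` addresses instead of the crate's types, and the function name `make_phys_to_virt` is an assumption for this example only.

```rust
/// Returns a closure that converts a physical address into the virtual
/// address at which it is reachable, given a fixed mapping offset.
/// The concrete type of the closure cannot be written out, hence `impl Fn`.
fn make_phys_to_virt(physical_memory_offset: u64) -> impl Fn(u64) -> u64 {
    // `move` copies the offset into the closure so it can outlive this call.
    move |phys| phys + physical_memory_offset
}
```

Calling `make_phys_to_virt(offset)` once and reusing the returned closure mirrors how a single phys_to_virt closure is handed to the page table type and then called for every table frame it visits.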

We also make active_level_4_table a private function, because from now on it is only called from init.

To use the MapperAllSizes::translate_addr method instead of our own memory::translate_addr function, we only need to change a few lines in kernel_main:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS
    // new: different imports
    use blog_os::memory;
    use x86_64::{structures::paging::MapperAllSizes, VirtAddr};
    // new: initialize a mapper
    let mapper = unsafe { memory::init(boot_info.physical_memory_offset) };
    let addresses = […]; // same as before
    for &address in &addresses {
        let virt = VirtAddr::new(address);
        // new: use the `mapper.translate_addr` method
        let phys = mapper.translate_addr(virt);
        println!("{:?} -> {:?}", virt, phys);
    }
    println!("It did not crash!");
    blog_os::hlt_loop();
}

When we run it, we see the same translation results as before, with the difference that the huge page translation now also works:

As expected, the virtual address physical_memory_offset translates to the physical address 0x0. By using the translation function of the MappedPageTable type, we avoid having to implement huge page support ourselves. We also have access to other page functions, such as map_to, which we will use in the next section. At this point, we no longer need the memory::translate_addr function, so you can delete it if you want.

Create a new mapping

So far, we have only looked at the page tables without modifying anything. Let's change that by creating a new mapping for a previously unmapped page.

We will use the map_to function of the Mapper trait, so let's first look at that function. The documentation says that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a frame allocator frame_allocator. The frame allocator is needed because mapping the given page may require creating additional page tables, which need unused frames as backing storage.

Function `create_example_mapping`

The first step of our implementation is to create a new create_example_mapping function that maps a given page to 0xb8000, the physical frame of the VGA text buffer. We choose this frame because it makes it easy to check whether the mapping was created correctly: we just need to write to the newly mapped page and see whether the write appears on the screen.

The create_example_mapping function looks like this:

// in src/memory.rs
use x86_64::structures::paging::{Page, Size4KiB, Mapper, FrameAllocator};
/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;
    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;
    let map_to_result = unsafe {
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}

In addition to the page that should be mapped, the function expects a mapper instance and a frame_allocator. The mapper is a type that implements the Mapper<Size4KiB> trait, which provides the map_to method. The generic Size4KiB parameter is needed because the Mapper trait is generic over the PageSize trait, so that it works with both standard 4 KiB pages and huge pages of 2 MiB and 1 GiB. We only want to create 4 KiB pages, so we can use Mapper<Size4KiB> instead of requiring MapperAllSizes.

For the mapping, we set the PRESENT flag, since it is required for all valid entries, and the WRITABLE flag to make the mapped page writable. Calling map_to is unsafe: it is possible to violate memory safety with invalid arguments, so we have to use an unsafe block. For a list of all possible flags, see the “Page Table Format” section of the previous article.

The map_to function can fail, so it returns a Result. Since this is just example code that does not need to be robust, we simply use expect to panic when an error occurs. On success, the function returns a MapperFlush type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its flush method. Like Result, the type uses the #[must_use] attribute to emit a warning if we accidentally forget to use it.
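The combination of Result and a #[must_use] flush handle can be imitated with a small standalone example. The types and functions here (`ToyFlush`, `toy_map`) are hypothetical stand-ins and not part of the x86_64 crate; the sketch only illustrates the pattern.

```rust
/// Hypothetical stand-in for a flush handle. The compiler warns when a
/// value of a `#[must_use]` type is dropped without being used.
#[must_use = "the mapping is not visible until the TLB entry is flushed"]
struct ToyFlush {
    page_start: u64,
}

impl ToyFlush {
    /// Consumes the handle. A real kernel would invalidate the TLB entry
    /// for the page here; this toy version just returns the page address.
    fn flush(self) -> u64 {
        self.page_start
    }
}

/// Toy mapping function: fails for unaligned addresses, otherwise returns
/// a handle that the caller is expected to flush.
fn toy_map(page_start: u64) -> Result<ToyFlush, &'static str> {
    if page_start % 4096 != 0 {
        return Err("address is not page aligned");
    }
    Ok(ToyFlush { page_start })
}
```

Writing `toy_map(0x1000).expect("map failed");` without the trailing `.flush()` compiles, but should produce an unused-value warning, which is the safety net the attribute is meant to provide.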

Fictitious `FrameAllocator`

To call create_example_mapping, we first need to create a FrameAllocator. As noted above, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the simplest case, the level 1 page table for the page already exists and we only need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we first need to create new level 3, level 2, and level 1 page tables.

Let's start with the simple case and assume that we don't need to create new page tables. For this, a frame allocator that always returns None is enough. We create such an EmptyFrameAllocator for testing our mapping function:

// in src/memory.rs
/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;
impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}

We now need to find a page that can be mapped without creating new page tables. The bootloader loads itself into the first megabyte of the virtual address space, so we know that a valid level 1 table exists for this region. For our example, we can choose any unused page in this memory region, for example, the page at address 0x1000.

To test the function, we first map page 0x1000 and then try to write to the screen through the mapping:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS
    use blog_os::memory;
    use x86_64::{structures::paging::Page, VirtAddr};
    let mut mapper = unsafe { memory::init(boot_info.physical_memory_offset) };
    let mut frame_allocator = memory::EmptyFrameAllocator;
    // map a previously unmapped page
    let page = Page::containing_address(VirtAddr::new(0x1000));
    memory::create_example_mapping(page, &mut mapper, &mut frame_allocator);
    // write the string `New!` to the screen through the new mapping
    let page_ptr: *mut u64 = page.start_address().as_mut_ptr();
    unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e)};
    println!("It did not crash!");
    blog_os::hlt_loop();
}

First, we create the mapping for the page at 0x1000 by calling our create_example_mapping function with mutable references to the mapper and frame_allocator instances. This maps page 0x1000 to the VGA text buffer frame, so anything we write to the page should appear on the screen.

Then we convert the page to a raw pointer and write a value to offset 400. We don't write to the start of the page because the top line of the VGA buffer is shifted off the screen by the next println. We write the value 0x_f021_f077_f065_f04e, which represents the string “New!” on a white background. As we learned in the “VGA Text Mode” article, writes to the VGA buffer should be volatile, so we use the write_volatile method.
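To see where the magic value and the offset come from, the value can be decoded on the host. The sketch below is standalone Rust that does not touch the VGA buffer. The helper names `decode_vga_cells` and `vga_position` are made up for this example, and it assumes the usual 80-column text mode with two bytes per character cell.

```rust
/// Splits a `u64` into the four 16-bit VGA text cells it covers, in memory
/// order (x86_64 is little-endian, so the lowest 16 bits come first).
/// Each cell is returned as (ASCII character, attribute byte).
fn decode_vga_cells(value: u64) -> [(char, u8); 4] {
    let mut cells = [(' ', 0u8); 4];
    for (i, cell) in cells.iter_mut().enumerate() {
        let word = (value >> (16 * i)) as u16;
        // Low byte: character code. High byte: background and foreground color.
        *cell = ((word & 0xff) as u8 as char, (word >> 8) as u8);
    }
    cells
}

/// Row and column of the first cell written through `ptr.offset(index)`
/// for a `*mut u64` that points at the start of an 80-column text buffer.
fn vga_position(index: usize) -> (usize, usize) {
    let byte_offset = index * 8; // a u64 is 8 bytes
    let cell = byte_offset / 2; // each cell is 2 bytes
    (cell / 80, cell % 80)
}
```

The attribute byte 0xf0 selects background color 0xf (white) and foreground color 0x0 (black), and offset 400 lands at the start of row 20, well below the top line.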

When we run the code in QEMU, we see the following result:

After the write to page 0x1000, the string “New!” appears on the screen. So we have successfully created a new mapping in the page tables.

This mapping worked because a level 1 table already existed for the page at 0x1000. When we try to map a page for which no level 1 table exists yet, the map_to function fails because it tries to allocate frames from the EmptyFrameAllocator to create new page tables. We can see this happen when we try to map page 0xdeadbeaf000 instead of 0x1000:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}

When we run it, a panic occurs with the following error message:

panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5

To map pages that do not yet have a level 1 page table, we need to create a proper FrameAllocator. But how do we know which frames are unused and how much physical memory is available?
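Why the two addresses behave so differently becomes visible when they are split into their four table indexes. The following sketch does this with plain bit operations; the function name `table_indexes` is an assumption for this example, and the x86_64 crate offers its own accessors for the same values.

```rust
/// Returns the level 4, 3, 2, and 1 table indexes of a virtual address.
/// Each index is 9 bits wide; the lowest 12 bits are the page offset.
fn table_indexes(virt: u64) -> [u16; 4] {
    [
        ((virt >> 39) & 0x1ff) as u16,
        ((virt >> 30) & 0x1ff) as u16,
        ((virt >> 21) & 0x1ff) as u16,
        ((virt >> 12) & 0x1ff) as u16,
    ]
}
```

For 0x1000 this gives the indexes 0, 0, 0, 1, so the page sits in the same low-level tables as the first megabyte. For 0xdeadbeaf000 it gives 27, 427, 223, 175. It already differs at level 4, which is consistent with map_to having to allocate new tables for it.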

Frame Selection

To create new page tables, we need a proper frame allocator. Let's start with a generic skeleton:

// in src/memory.rs
pub struct BootInfoFrameAllocator<I> where I: Iterator<Item = PhysFrame> {
    frames: I,
}
impl<I> FrameAllocator<Size4KiB> for BootInfoFrameAllocator<I>
    where I: Iterator<Item = PhysFrame>
{
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        self.frames.next()
    }
}

The frames field can be initialized with an arbitrary iterator of frames. This allows us to simply delegate allocation calls to the Iterator::next method.

To initialize the BootInfoFrameAllocator, we use the memory_map that the bootloader passes as part of the BootInfo struct. As explained in the “Boot information” section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader has already called the required functions for us.

The memory map consists of a list of MemoryRegion structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region. By creating an iterator that yields frames from unused regions, we can create a valid BootInfoFrameAllocator.

The initialization of the BootInfoFrameAllocator happens in a new init_frame_allocator function:

// in src/memory.rs
use bootloader::bootinfo::{MemoryMap, MemoryRegionType};
/// Create a FrameAllocator from the passed memory map
pub fn init_frame_allocator(
    memory_map: &'static MemoryMap,
) -> BootInfoFrameAllocator<impl Iterator<Item = PhysFrame>> {
    // get usable regions from memory map
    let regions = memory_map
        .iter()
        .filter(|r| r.region_type == MemoryRegionType::Usable);
    // map each region to its address range
    let addr_ranges = regions.map(|r| r.range.start_addr()..r.range.end_addr());
    // transform to an iterator of frame start addresses
    let frame_addresses = addr_ranges.flat_map(|r| r.step_by(4096));
    // create `PhysFrame` types from the start addresses
    let frames = frame_addresses.map(|addr| {
        PhysFrame::containing_address(PhysAddr::new(addr))
    });
    BootInfoFrameAllocator { frames }
}

This function uses iterator combinator methods to transform the initial MemoryMap into an iterator of usable physical frames:

First, we call the iter method to convert the memory map to an iterator of MemoryRegion items. Then we use the filter method to skip reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as InUse or similar. Thus we can be sure that Usable frames are not used somewhere else.
In the second step, we use the map combinator and Rust's range syntax to transform the iterator of memory regions into an iterator of address ranges.
The third step is the most complicated: we treat each range as an iterator (ranges already implement Iterator, so no explicit into_iter call is needed) and then pick every 4096th address with step_by. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory regions, so we don't need any alignment or rounding code here. By using flat_map instead of map, we get an Iterator<Item = u64> instead of an Iterator<Item = Iterator<Item = u64>>.
In the final step, we convert the start addresses to PhysFrame types to construct the required Iterator<Item = PhysFrame>. We then use this iterator to create and return a new BootInfoFrameAllocator.
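The four steps can be tried on the host with a simplified memory map. In the sketch below, `ToyRegion` and `usable_frame_starts` are hypothetical stand-ins for the bootloader's types, and frames are represented by their start addresses as plain `u64` values.

```rust
/// Hypothetical, simplified stand-in for a memory map entry.
struct ToyRegion {
    start: u64,
    end: u64, // exclusive
    usable: bool,
}

/// Yields the start address of every 4 KiB frame inside usable regions.
fn usable_frame_starts(regions: &[ToyRegion]) -> impl Iterator<Item = u64> + '_ {
    regions
        .iter()
        // step 1: keep only usable regions
        .filter(|r| r.usable)
        // step 2: turn each region into an address range
        .map(|r| r.start..r.end)
        // step 3: pick every 4096th address and flatten the nested iterators
        .flat_map(|range| range.step_by(4096))
}
```

With `map` in place of `flat_map`, the function would yield one iterator per region rather than a single stream of addresses, and the caller would have to flatten it.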

We can now modify our kernel_main function to pass a BootInfoFrameAllocator instance instead of an EmptyFrameAllocator:

// in src/main.rs
#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let mut frame_allocator = memory::init_frame_allocator(&boot_info.memory_map);
    […]
}

This time the mapping succeeds, and we again see the black-on-white “New!” on the screen. Behind the scenes, the map_to method creates the missing page tables in the following way:

Allocate an unused frame from the passed frame_allocator.
Zero the frame to create a new, empty page table.
Map the entry of the higher-level table to that frame.
Continue with the next table level.
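The steps above can be simulated with a toy model in which physical memory is a HashMap from frame numbers to tables. Everything here is an assumption for illustration: `Table`, `table_indexes`, and `toy_map_to` are invented names, entries hold bare frame numbers without flags, and the real map_to of the x86_64 crate works on actual page table memory.

```rust
use std::collections::HashMap;

/// A toy page table: 512 entries, each either empty or a frame number.
type Table = [Option<u64>; 512];

/// Level 4, 3, 2, and 1 table indexes of a virtual address.
fn table_indexes(virt: u64) -> [usize; 4] {
    [
        ((virt >> 39) & 0x1ff) as usize,
        ((virt >> 30) & 0x1ff) as usize,
        ((virt >> 21) & 0x1ff) as usize,
        ((virt >> 12) & 0x1ff) as usize,
    ]
}

/// Maps `virt` to `target_frame`, creating missing tables on the way.
/// Returns the number of tables that had to be created.
fn toy_map_to(
    memory: &mut HashMap<u64, Table>,
    level_4_frame: u64,
    virt: u64,
    target_frame: u64,
    frame_allocator: &mut impl Iterator<Item = u64>,
) -> Result<usize, &'static str> {
    let indexes = table_indexes(virt);
    let mut table_frame = level_4_frame;
    let mut created = 0;
    // Walk levels 4, 3, and 2; each entry points to the next lower table.
    for &index in &indexes[..3] {
        let entry = memory.get(&table_frame).ok_or("table not present")?[index];
        table_frame = match entry {
            Some(next) => next,
            None => {
                // 1. allocate an unused frame
                let frame = frame_allocator.next().ok_or("FrameAllocationFailed")?;
                // 2. "zero" it to get a new, empty table
                memory.insert(frame, [None; 512]);
                // 3. point the higher-level entry to the new table
                memory.get_mut(&table_frame).ok_or("table not present")?[index] = Some(frame);
                created += 1;
                // 4. continue with the next level
                frame
            }
        };
    }
    // Finally, write the level 1 entry for the page itself.
    memory.get_mut(&table_frame).ok_or("table not present")?[indexes[3]] = Some(target_frame);
    Ok(created)
}
```

In this model, mapping 0xdeadbeaf000 into an otherwise empty level 4 table consumes three frames (for the level 3, 2, and 1 tables), while mapping a neighboring page afterwards consumes none.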

While our create_example_mapping function is just example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory and implementing multithreading in future articles.

Summary

In this article, we learned about different techniques for accessing the physical frames of page tables, including identity mapping, mapping of the complete physical memory, temporary mapping, and recursive page tables. We chose to map the complete physical memory because it is simple and powerful.

We cannot map the physical memory from our kernel without page table access, so we need support from the bootloader. The bootloader crate creates the required mappings through optional cargo features. It passes the required information to the kernel as a &BootInfo argument of the entry point function.

For our implementation, we first traversed the page tables manually to implement a translation function, and then used the MappedPageTable type of the x86_64 crate. We also learned how to create new mappings in the page table and how to create the necessary FrameAllocator on top of the memory map passed by the bootloader.

What's next?

In the next article, we will create a heap memory region for our kernel, which will allow us to allocate memory and use various collection types.

Tags:
page organization of memory
operating system
x86
address translation
virtual memory
OS kernel
recursive page tables
full memory display
