Virtual memory in ARMv7
Hello!
This article gives an overview of the virtual memory system architecture (VMSA) of ARMv7.
This article does not cover the finer points of caching, DMA, LPAE and the like. For a more detailed description, see the literature at the end of the article.
Introduction
A virtual memory system serves several purposes. First, it places user processes in separate address spaces that are isolated from one another. This makes the system more reliable, because errors in one process do not affect the others. Second, the OS can give a process more memory than the system physically has: unused pages are swapped out to persistent storage and the needed ones are loaded back in, creating the illusion of more memory than is actually available. Third, a contiguous virtual address space makes user software easier to write. Every process sees the same address space layout, and the OS hides the real memory configuration of the system from it.
Definitions
The following definitions are used in the article:
Virtual address - the address used by the processor core. The stack pointer, the program counter and the link register hold virtual addresses.
Physical address - the address output on the processor bus.
Page - a unit of virtual memory addressing.
Section - similar to a page, but larger.
Frame - a unit of physical memory addressing.
Page table - an array of entries used for address translation.
ASID - address space identifier.
TLB - translation lookaside buffer, a fast cache of address translations.
MMU - memory management unit.
TLB
The TLB is a very fast hardware buffer that holds the results of recent address translations. The processor core's request to translate a page address is sent to the TLB together with the current ASID. If a valid entry is found, the access permissions for that memory are checked, and the memory type and the corresponding frame address are returned to the MMU. If the access is not permitted, a hardware exception is raised. On a TLB miss (no entry found), what happens next depends on the TTBCR register: the MMU either searches the page tables or raises an exception.
Note that whenever the page tables are modified, the affected TLB entries must be invalidated, because stale translations may otherwise remain there.
TLB entries are replaced transparently to the programmer. The replacement policy depends on the implementation, and round-robin is a common choice.
Some TLB entries can also be loaded and locked so that they are not evicted.
Figure 1. TLB
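The lookup described above can be modelled in a few lines of C. This is only a software sketch for illustration, not a description of the hardware: the `tlb_entry` layout, the 4-entry size and all function names are assumptions, only 4 KB pages are modelled, and permission checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4u   /* illustrative size, real TLBs are larger */

/* Hypothetical software model of one TLB entry (4 KB pages only). */
struct tlb_entry {
    bool     valid;
    bool     global;  /* nG == 0: the entry matches any ASID */
    uint8_t  asid;
    uint32_t vpn;     /* virtual page number, VA[31:12] */
    uint32_t pfn;     /* physical frame number, PA[31:12] */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned next;    /* round-robin replacement pointer */
};

/* Returns true and fills *pa on a hit; false means a TLB miss. */
static inline bool tlb_lookup(const struct tlb *t, uint32_t va,
                              uint8_t asid, uint32_t *pa)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *e = &t->e[i];
        if (e->valid && e->vpn == (va >> 12) &&
            (e->global || e->asid == asid)) {
            *pa = (e->pfn << 12) | (va & 0xFFFu);
            return true;
        }
    }
    return false;
}

/* Stores a translation, overwriting entries in round-robin order. */
static inline void tlb_fill(struct tlb *t, struct tlb_entry entry)
{
    entry.valid = true;
    t->e[t->next] = entry;
    t->next = (t->next + 1u) % TLB_ENTRIES;
}

/* Drops every entry; this models what is needed after page tables change. */
static inline void tlb_invalidate_all(struct tlb *t)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

In this model a non-global entry filled for ASID 1 is not visible to a lookup with ASID 2, which is why a context switch does not have to flush the whole buffer.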
Page tables
ARMv7 is a 32-bit architecture, so 4 GB of virtual address space is available.
The page tables have two levels, L1 and L2.
Table L1 describes the entire 4 GB address space. It consists of 4096 32-bit entries, each describing 1 MB. An entry is selected by the upper 12 bits of the virtual address, bits [31:20].
Fig. 2 Search for an entry in table L1
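The selection of an L1 entry comes down to a shift and an OR. Here is a minimal sketch; the function names are made up for illustration, and the table base is assumed to be the 16 KB-aligned address of a full 4096-entry table.

```c
#include <stdint.h>

/* Index into table L1: VA[31:20], one entry per 1 MB of virtual space. */
static inline uint32_t l1_index(uint32_t va)
{
    return va >> 20;
}

/* Physical address of the L1 entry that describes va.
 * l1_base is assumed to be 16 KB aligned; each entry is 4 bytes. */
static inline uint32_t l1_desc_addr(uint32_t l1_base, uint32_t va)
{
    return (l1_base & 0xFFFFC000u) | (l1_index(va) << 2);
}
```

For example, with the table at 0x80004000, the virtual address 0xC0100000 selects entry 0xC01, which lives at 0x80007004.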
Table L1 resides in physical memory and is aligned to a 16 KB boundary. Its entries come in four kinds: descriptors for page tables, sections and supersections, plus an empty (fault) entry for memory that has not been mapped yet.
Fig. 3 Types of records in L1
Bits [1:0] give the entry type: 00b is a fault entry, 01b a page table descriptor, 10b a section or supersection descriptor. Bit 18 tells the last two apart: 0 for a section, 1 for a supersection.
If the memory is mapped with pages, the L1 entry holds the address of an L2 table (physical, aligned to 1 KB). Bit 9 is implementation defined, bits [8:5] select the domain (the domain mechanism is deprecated in ARMv7), and SBZ fields should be zero.
If the memory is mapped with sections, the physical address is written directly into L1. A section entry refers to a 1 MB region of physical memory aligned to 1 MB, and no L2 table is needed. A supersection is a special case of a section that covers 16 MB: its entry must be repeated 16 times in table L1, and both the physical and the virtual block must be aligned to 16 MB.
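The entry types above can be decoded and built with a few bit operations. The sketch below is an illustration under stated assumptions: the names are hypothetical, the attribute bits (access permissions, memory type and so on) are left at zero, and the encoding 11b is simply treated as a fault.

```c
#include <stdint.h>

enum l1_type { L1_FAULT, L1_PAGE_TABLE, L1_SECTION, L1_SUPERSECTION };

/* Classifies an L1 entry by bits [1:0] and, for sections, bit 18. */
static inline enum l1_type l1_type_of(uint32_t desc)
{
    switch (desc & 0x3u) {
    case 0x1u:
        return L1_PAGE_TABLE;
    case 0x2u:
        return (desc & (1u << 18)) ? L1_SUPERSECTION : L1_SECTION;
    default:
        /* 00b is a fault entry; 11b is treated as a fault in this sketch. */
        return L1_FAULT;
    }
}

/* Page table descriptor: L2 base in bits [31:10], domain in bits [8:5]. */
static inline uint32_t l1_make_page_table(uint32_t l2_base, uint32_t domain)
{
    return (l2_base & 0xFFFFFC00u) | ((domain & 0xFu) << 5) | 0x1u;
}

/* Section descriptor: 1 MB base in bits [31:20], domain in bits [8:5].
 * Attribute bits are left at zero here, so this is not a usable mapping yet. */
static inline uint32_t l1_make_section(uint32_t pa, uint32_t domain)
{
    return (pa & 0xFFF00000u) | ((domain & 0xFu) << 5) | 0x2u;
}
```

A real mapping would also have to fill in the permission and memory type bits before the entry is written to the table.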
Table L2 consists of 256 32-bit entries and must be aligned to 1 KB.
Fig. 4 Finding an entry in table L2
The index into table L2 is formed from the middle 8 bits of the virtual address, bits [19:12]. Each entry in the table holds a frame address.
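As a minimal sketch of this second-level step, the helpers below compute the L2 index, the address of the L2 entry, and the final physical address for a 4 KB page. The function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Index into table L2: VA[19:12], one entry per 4 KB. */
static inline uint32_t l2_index(uint32_t va)
{
    return (va >> 12) & 0xFFu;
}

/* Physical address of the L2 entry, given the L1 page table descriptor
 * (L2 base in bits [31:10]) and the virtual address. */
static inline uint32_t l2_desc_addr(uint32_t l1_desc, uint32_t va)
{
    return (l1_desc & 0xFFFFFC00u) | (l2_index(va) << 2);
}

/* Physical address for a small (4 KB) page: the frame address from
 * bits [31:12] of the L2 entry plus the 12-bit offset from the VA. */
static inline uint32_t small_page_pa(uint32_t l2_desc, uint32_t va)
{
    return (l2_desc & 0xFFFFF000u) | (va & 0xFFFu);
}
```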
Fig. 5 Types of entries in L2
Pages come in two sizes: 64 KB (large page) and 4 KB (small page).
The AP and APX bits set the read/write permissions for privileged and unprivileged (kernel and user) modes. The TEX, C, B and S bits define the memory type, its caching, its read/write buffering and its shareability. The nG (not global) bit determines whether the page is visible to all processes or only to one specific ASID.
Large pages reduce the number of TLB entries: instead of 16 entries (16 × 4 KB = 64 KB), only one is stored. The price is that table L2 must contain 16 identical entries for the page.
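The replication of a large page entry can be sketched as follows. The function name and the `attrs` parameter are hypothetical; `attrs` stands for attribute bits that the caller has already placed in bits [15:2] of the descriptor.

```c
#include <stdint.h>

#define L2_ENTRIES 256u

/* Maps one 64 KB large page by writing the same descriptor into the
 * 16 consecutive L2 slots that cover the 64 KB virtual range.
 * Returns 0 on success, -1 if va or pa is not 64 KB aligned. */
static inline int l2_map_large_page(uint32_t l2[L2_ENTRIES], uint32_t va,
                                    uint32_t pa, uint32_t attrs)
{
    if ((va & 0xFFFFu) != 0u || (pa & 0xFFFFu) != 0u)
        return -1;

    /* Large page descriptor: base in bits [31:16], type bits [1:0] = 01b. */
    uint32_t desc = (pa & 0xFFFF0000u) | (attrs & 0x0000FFFCu) | 0x1u;

    /* First slot; a multiple of 16 because va is 64 KB aligned. */
    uint32_t first = (va >> 12) & 0xFFu;

    for (uint32_t i = 0; i < 16u; i++)
        l2[first + i] = desc;
    return 0;
}
```

All 16 slots are needed because any 4 KB-sized piece of the range can be the one that misses in the TLB and triggers the table search.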
Being able to map blocks of different sizes lets you allocate memory at the granularity you need on the one hand, and reduce the number of accesses to page tables in relatively slow memory on the other.
Registers
In the ARM architecture, the system (including the MMU) is controlled through a special coprocessor, CP15. A number of its registers deal with memory management. We are interested in a few of them: Control, TTBR0/TTBR1, TTBCR and ContextID.
In the Control register, bit 0 turns the MMU on and off; nothing complicated here.
The TTBR0/TTBR1 register pair holds the physical addresses of the first-level tables. The MMU starts its search for the required page from these addresses.
The TTBCR register lets you split the address space into two parts between TTBR0 and TTBR1, each of which translates its own share of the addresses. The split is set by bits [2:0]. The value N written there (0 to 7) says how many of the most significant bits of the virtual address are examined: if they are all zero, the address is translated through TTBR0, otherwise through TTBR1. With N = 0, all addresses are translated through TTBR0. With N = 1, bit 31 is examined: the lower 2 GB of virtual space go through TTBR0 and the upper 2 GB through TTBR1. With N = 2, bits 31 and 30 are examined, giving a split of 1 GB and 3 GB respectively. The lower part of the address space can thus be used for user applications, with TTBR0 reloaded for each new process, while the upper part is kept for system use.
Fig. 6 Split address space
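The choice between the two base registers can be written down as a small function. This is a sketch with hypothetical names; `n` stands for the value of TTBCR bits [2:0].

```c
#include <stdint.h>

/* Returns 0 if va is translated through TTBR0 and 1 if through TTBR1,
 * for a given value n (0..7) of TTBCR bits [2:0]. */
static inline int ttbr_select(uint32_t va, unsigned n)
{
    n &= 7u;
    if (n == 0u)
        return 0;                       /* everything goes through TTBR0 */
    /* TTBR0 is used only when the top n bits of the address are zero. */
    return (va >> (32u - n)) != 0u ? 1 : 0;
}

/* Size in bytes of the lower region covered by TTBR0.
 * Returned as 64 bits because n = 0 means the full 4 GB. */
static inline uint64_t ttbr0_region_size(unsigned n)
{
    return (uint64_t)1 << (32u - (n & 7u));
}
```

With `n` equal to 2, addresses below 0x40000000 select TTBR0 and everything from 0x40000000 upward selects TTBR1, which matches the 1 GB and 3 GB split described above.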
Bits [5:4] control the behavior on a TLB miss: a search in the page tables or an exception.
The ContextID register holds the ASID of the current process. It must be changed together with the contents of TTBR0 on a context switch.
Address Translation
Virtual addresses are translated to physical ones as follows:
- The requested virtual address and the ASID are looked up in the TLB.
- If the TLB does not contain the required address, the hardware searches the page tables.
If the core has requested the virtual page before, its translation is held in the TLB. In that case the MMU takes it from this cache and nothing more needs to be done. If the page is requested for the first time (or its entry has been evicted, since the TLB is not very large), tables L1 and L2 are searched. The virtual address is then mapped to a physical one as follows:
- The address of table L1 is taken from TTBR0 or TTBR1.
- The upper 12 bits of the virtual address form the index into the table.
- a) If the entry describes a section (supersection), the section attributes are checked and, if everything is in order, the physical address is composed of the section (supersection) base address and the lower 20 (24) bits of the virtual address.
Fig. 7 Address translation in a supersection
b) If the entry points to a table L2, the search continues there. The middle part of the virtual address forms the index into that table.
Fig. 8 Address translation via table L2
- The TLB is updated.
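The table search in these steps can be modelled in software. The sketch below is an illustration only: the `phys_mem` structure and the function names are assumptions, a single L1 table of full size is used, permission and domain checks are omitted, and the TLB update is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat model of physical memory:
 * data[i] holds the word at physical address base + 4 * i. */
struct phys_mem {
    uint32_t        base;
    uint32_t        words;
    const uint32_t *data;
};

static inline bool mem_read32(const struct phys_mem *m, uint32_t pa,
                              uint32_t *out)
{
    if (pa < m->base || (pa & 3u) != 0u)
        return false;
    uint32_t idx = (pa - m->base) >> 2;
    if (idx >= m->words)
        return false;
    *out = m->data[idx];
    return true;
}

/* Searches L1 and, if needed, L2. Returns true and sets *pa on success;
 * false models a translation fault. */
static inline bool table_walk(const struct phys_mem *m, uint32_t ttbr,
                              uint32_t va, uint32_t *pa)
{
    uint32_t l1, l2;

    /* Step 1: L1 entry, indexed by VA[31:20]. */
    if (!mem_read32(m, (ttbr & 0xFFFFC000u) | ((va >> 20) << 2), &l1))
        return false;

    switch (l1 & 0x3u) {
    case 0x2u:                          /* section or supersection */
        if (l1 & (1u << 18))
            *pa = (l1 & 0xFF000000u) | (va & 0x00FFFFFFu);
        else
            *pa = (l1 & 0xFFF00000u) | (va & 0x000FFFFFu);
        return true;
    case 0x1u:                          /* L2 table, indexed by VA[19:12] */
        if (!mem_read32(m, (l1 & 0xFFFFFC00u) |
                           (((va >> 12) & 0xFFu) << 2), &l2))
            return false;
        if (l2 & 0x2u) {                /* small page, 4 KB */
            *pa = (l2 & 0xFFFFF000u) | (va & 0x00000FFFu);
            return true;
        }
        if (l2 & 0x1u) {                /* large page, 64 KB */
            *pa = (l2 & 0xFFFF0000u) | (va & 0x0000FFFFu);
            return true;
        }
        return false;                   /* empty L2 entry */
    default:
        return false;                   /* fault (11b also treated as one) */
    }
}
```

A caller would place an L1 table at the start of the modelled memory, put an L2 table right after it, and pass the base address of the memory as `ttbr`.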
To sum up, the virtual memory subsystem consists of the following parts:
- Several CP15 control registers
- Page tables containing the address translation rules
- TLB - a cache of successful translations
- MMU - the unit that performs address translation
Literature
ARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition
ARM Cortex-A Series Programmer's Guide