
Four ways to write to a write-protected page
This article is about writing to a hardware write-protected memory address on the x86 architecture, and about how this is done under Linux. In kernel mode, of course, because from user space such tricks are impossible. It happens, you know: an irresistible desire to write to a protected area ... say, when you sit down to write a virus or a trojan ...
Description of the problem
... but seriously, the need to write to write-protected RAM pages arises from time to time when programming kernel modules for Linux. For example, when changing the sys_call_table system call dispatch table in order to modify, embed, replace, or intercept a system call (different publications name this action differently). But not only for that ... In the briefest summary, the situation is as follows:
- The x86 architecture has a protection mechanism that raises an exception on any attempt to write to a write-protected memory page.
- Access rights to a page (write permitted or forbidden) are described by the _PAGE_BIT_RW bit (bit 1) of the pte_t structure corresponding to that page. Clearing this bit forbids writing to the page.
- On the processor side, write protection is controlled by the X86_CR0_WP bit (bit 16) of the CR0 system control register: while this bit is set, an attempt to write to a write-protected page raises an exception on that processor.
The relevance of the task is indicated by the noticeable number of publications on the topic and by the number of methods proposed for solving it. The rest of the review considers these methods one by one. For each method, the following is given:
- sample code, tested and usable;
- the references to the authorship of such code that are known to me (authorship here is very relative, because there are quite a few independent solutions to this problem).
Turn off page protection, assembler
The simplest solution is to temporarily disable page protection by clearing the X86_CR0_WP bit of the CR0 register. I have been using this method for a good ten years, and it is mentioned in several publications over the years, for example WP: Safe or Not? (Dan Rosenberg, 2011). One way to do it is with inline assembler (macros; a GCC compiler extension). In my version, used in the demo test, it looks like this (file rw_cr0.c):
static inline void rw_enable( void ) {
   asm( "cli \n"
        "pushl %eax \n"
        "movl %cr0, %eax \n"
        "andl $0xfffeffff, %eax \n"
        "movl %eax, %cr0 \n"
        "popl %eax" );
}

static inline void rw_disable( void ) {
   asm( "pushl %eax \n"
        "movl %cr0, %eax \n"
        "orl $0x00010000, %eax \n"
        "movl %eax, %cr0 \n"
        "popl %eax \n"
        "sti " );
}
(Saving and restoring the eax register could be omitted; it is shown here solely for the purity of the experiment.)
The first objection always raised against this method at first glance is that, since it manipulates one specific processor, on an SMP system the module could be rescheduled onto another processor, one on which page protection has not been disabled, between changing CR0 and writing to the protected area. The likelihood of such a combination of circumstances is about the same as being bitten in the center of Moscow by a snake that escaped from the zoo; still, the probability exists and is finite, albeit vanishingly small. To rule this situation out, the assembler code disables local interrupts on the processor with the cli instruction before the write and re-enables them with sti only after the write has completed (Dan Rosenberg does the same in the publication cited).
What is much more unpleasant about the code shown is that it is written for the 32-bit architecture (i386); on the 64-bit architecture it will not merely fail to run, it will not even compile. This is resolved by providing architecture-specific variants:
#ifdef __i386__
// ... what was shown above
#else
static inline void rw_enable( void ) {
   asm( "cli \n"
        "pushq %rax \n"
        "movq %cr0, %rax \n"
        "andq $0xfffffffffffeffff, %rax \n"
        "movq %rax, %cr0 \n"
        "popq %rax " );
}

static inline void rw_disable( void ) {
   asm( "pushq %rax \n"
        "movq %cr0, %rax \n"
        "orq $0x0000000000010000, %rax \n"   // set bit 16 (WP) back
        "movq %rax, %cr0 \n"
        "popq %rax \n"
        "sti " );
}
#endif
Turn off page protection, kernel API
You can do the same as above, relying not on assembler but on the kernel API (file rw_pax.c). Here is a snippet of such code in almost the same form as Dan Rosenberg gives it:
#include <linux/kernel.h>
#include <linux/version.h>
#include <linux/preempt.h>
#include <asm/processor.h>
static inline unsigned long native_pax_open_kernel( void ) {
   unsigned long cr0;
   preempt_disable();
   barrier();
   cr0 = read_cr0() ^ X86_CR0_WP;
   BUG_ON( unlikely( cr0 & X86_CR0_WP ) );
   write_cr0( cr0 );
   return cr0 ^ X86_CR0_WP;
}

static inline unsigned long native_pax_close_kernel( void ) {
   unsigned long cr0;
   cr0 = read_cr0() ^ X86_CR0_WP;
   BUG_ON( unlikely( !( cr0 & X86_CR0_WP ) ) );
   write_cr0( cr0 );
   barrier();
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0)
   preempt_enable_no_resched();
#else
   preempt_count_dec();
#endif
   return cr0 ^ X86_CR0_WP;
}
The word “almost” refers to the fact that the preempt_enable_no_resched() call was available to modules up to kernel 3.13 (and so in 2011, when the cited article was written). Starting with kernel 3.14 the call is hidden from modules by the following conditional preprocessor definition:
#ifdef MODULE
/*
 * Modules have no business playing preemption tricks.
 */
#undef sched_preempt_enable_no_resched
#undef preempt_enable_no_resched
#endif
In later kernels, though, the macros preempt_enable_no_resched() and preempt_count_dec() are almost identical.
What is more unpleasant is that on later kernels (3.14 and newer) the code shown executes safely, but soon after it runs, the kernel begins to issue warning messages of the following form:
[ 337.230937] ------------[ cut here ]------------
[ 337.230949] WARNING: CPU: 1 PID: 3410 at /build/buildd/linux-lts-utopic-3.16.0/init/main.c:802 do_one_initcall+0x1cb/0x1f0()
[ 337.230955] initcall rw_init+0x0/0x1000 [srw] returned with preemption imbalance
(I did not delve into the details of what was happening ... I did not consider it necessary, but it was somehow connected with the balancing of work between SMP processors, or with the accounting of such balancing.)
Even though these are only warnings in the kernel log, they are serious enough that I would like to get rid of them. This can be achieved by repeating the local-interrupt trick from the assembler code described earlier (file rw_pai.c):
static inline unsigned long native_pai_open_kernel( void ) {
   unsigned long cr0;
   local_irq_disable();
   barrier();
   cr0 = read_cr0() ^ X86_CR0_WP;
   BUG_ON( unlikely( cr0 & X86_CR0_WP ) );
   write_cr0( cr0 );
   return cr0 ^ X86_CR0_WP;
}

static inline unsigned long native_pai_close_kernel( void ) {
   unsigned long cr0;
   cr0 = read_cr0() ^ X86_CR0_WP;
   BUG_ON( unlikely( !( cr0 & X86_CR0_WP ) ) );
   write_cr0( cr0 );
   barrier();
   local_irq_enable();
   return cr0 ^ X86_CR0_WP;
}
This code compiles and works successfully on both the 32-bit and the 64-bit architecture, which is its advantage over the previous variant.
Unprotect a memory page
The next method is to set the _PAGE_BIT_RW bit in the PTE entry that describes the memory page of interest to us (file rw_pte.c):
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
static inline void mem_setrw( void **table ) {
   unsigned int l;
   pte_t *pte = lookup_address( (unsigned long)table, &l );
   pte->pte |= _PAGE_RW;
   __flush_tlb_one( (unsigned long)table );
}

static inline void mem_setro( void **table ) {
   unsigned int l;
   pte_t *pte = lookup_address( (unsigned long)table, &l );
   pte->pte &= ~_PAGE_RW;
   __flush_tlb_one( (unsigned long)table );
}
The logic of the code is completely transparent. I first came across this code, in almost the form shown here, in a discussion on Habrahabr (Alexey Derlaft, Vladimir, 2013), and later, elaborated much more thoroughly, in the forum discussion “modification of system calls” (Max Filippov, St. Petersburg, 2015).
This code has been tested on both the 32-bit and the 64-bit architecture.
Overlay mapping of a memory region
Another method (the last one considered today) is proposed in the article “A kosher way to modify write-protected areas of the Linux kernel” (Ilya V. Matveychikov, Moscow, late 2013). I will say nothing, good or bad, about the author’s culinary preferences in his national cuisine ... I do not know them; but as for the proposed technique itself, I must note that it is original and beautiful (file rw_map.c):
static void *map_writable( void *addr, size_t len ) {
   void *vaddr;
   int nr_pages = DIV_ROUND_UP( offset_in_page( addr ) + len, PAGE_SIZE );
   struct page **pages = kmalloc( nr_pages * sizeof( *pages ), GFP_KERNEL );
   void *page_addr = (void*)( (unsigned long)addr & PAGE_MASK );
   int i;
   if( pages == NULL )
      return NULL;
   for( i = 0; i < nr_pages; i++ ) {
      if( __module_address( (unsigned long)page_addr ) == NULL ) {
         pages[ i ] = virt_to_page( page_addr );
         WARN_ON( !PageReserved( pages[ i ] ) );
      } else {
         pages[ i ] = vmalloc_to_page( page_addr );
      }
      if( pages[ i ] == NULL ) {
         kfree( pages );
         return NULL;
      }
      page_addr += PAGE_SIZE;
   }
   vaddr = vmap( pages, nr_pages, VM_MAP, PAGE_KERNEL );
   kfree( pages );
   if( vaddr == NULL )
      return NULL;
   return vaddr + offset_in_page( addr );
}

static void unmap_writable( void *addr ) {
   void *page_addr = (void*)( (unsigned long)addr & PAGE_MASK );
   vfree( page_addr );
}
This method works on both the 32-bit and the 64-bit architecture. Its drawback is a certain cumbersomeness for such a simple task (“a cannon against sparrows”), given that at first glance it shows no significant advantages over the previous methods. On the other hand, this technique (with the code practically unchanged) can be successfully applied to a much wider range of tasks than the one discussed here.
Execution test
And now, so as not to be unsubstantiated, it is time to verify everything said above with a live experiment. For the check, we create a kernel module (file srw.c):
#include <linux/module.h>
#include <linux/kallsyms.h>

#include "rw_cr0.c"
#include "rw_pte.c"
#include "rw_pax.c"
#include "rw_map.c"
#include "rw_pai.c"

#define PREFIX "! "
#define LOG(...) printk( KERN_INFO PREFIX __VA_ARGS__ )
#define ERR(...) printk( KERN_ERR PREFIX __VA_ARGS__ )

#define __NR_rw_test 31                 // an unused slot in sys_call_table

static unsigned int mode = 0;
module_param( mode, uint, 0 );

#define do_write( addr, val ) {                  \
   LOG( "writing address %p\n", addr );          \
   *addr = val;                                  \
}

static bool write( void **addr, void *val ) {
   switch( mode ) {
      case 0:
         rw_enable();
         do_write( addr, val );
         rw_disable();
         return true;
      case 1:
         native_pax_open_kernel();
         do_write( addr, val );
         native_pax_close_kernel();
         return true;
      case 2:
         mem_setrw( addr );
         do_write( addr, val );
         mem_setro( addr );
         return true;
      case 3:
         addr = map_writable( (void*)addr, sizeof( val ) );
         if( NULL == addr ) {
            ERR( "wrong mapping\n" );
            return false;
         }
         do_write( addr, val );
         unmap_writable( addr );
         return true;
      case 4:
         native_pai_open_kernel();
         do_write( addr, val );
         native_pai_close_kernel();
         return true;
      default:
         ERR( "illegal mode %u\n", mode );
         return false;
   }
}

static int __init rw_init( void ) {
   void **taddr;                                // address of sys_call_table
   asmlinkage long (*sys_ni_syscall)( void );   // the original __NR_rw_test handler
   if( NULL == ( taddr = (void**)kallsyms_lookup_name( "sys_call_table" ) ) ) {
      ERR( "sys_call_table not found\n" );
      return -EFAULT;
   }
   LOG( "sys_call_table address = %p\n", taddr );
   sys_ni_syscall = (void*)taddr[ __NR_rw_test ];          // save the original
   if( !write( taddr + __NR_rw_test, (void*)0x12345 ) ) return -EINVAL;
   LOG( "modified sys_call_table[%d] = %p\n", __NR_rw_test, taddr[ __NR_rw_test ] );
   if( !write( taddr + __NR_rw_test, (void*)sys_ni_syscall ) ) return -EINVAL;
   LOG( "restored sys_call_table[%d] = %p\n", __NR_rw_test, taddr[ __NR_rw_test ] );
   return -EPERM;   // deliberately fail insmod, so the module never stays loaded
}

module_init( rw_init );
MODULE_LICENSE( "GPL" );
The somewhat heavy, cumbersome code is explained only by the fact that:
- a single piece of code had to reconcile the different prototypes of the write-enabling functions belonging to the different methods (they are identical in operation but are called differently);
- the implementation of each method was kept as close as possible to how its author wrote it (changes were made only to match the syntax of more recent kernels), which explains the variety of function prototypes.
And here is how the alternating use of all the methods looks on just one of the tested configurations (in reality, at least 5 different combinations of architecture and kernel version were tested):
$ uname -r
3.16.0-48-generic
$ uname -m
x86_64
$ sudo insmod srw.ko mode=0
insmod: ERROR: could not insert module srw.ko: Operation not permitted
$ dmesg | tail -n6
[ 7258.575977] ! detected 64-bit platform
[ 7258.584504] ! sys_call_table address = ffffffff81801460
[ 7258.584579] ! writing address ffffffff81801558
[ 7258.584653] ! modified sys_call_table[31] = 0000000000012345
[ 7258.584654] ! writing address ffffffff81801558
[ 7258.584666] ! restored sys_call_table[31] = ffffffff812db550
$ sudo insmod srw.ko mode=2
insmod: ERROR: could not insert module srw.ko: Operation not permitted
$ dmesg | tail -n6
[ 7282.625539] ! detected 64-bit platform
[ 7282.633020] ! sys_call_table address = ffffffff81801460
[ 7282.633129] ! writing address ffffffff81801558
[ 7282.633178] ! modified sys_call_table[31] = 0000000000012345
[ 7282.633228] ! writing address ffffffff81801558
[ 7282.633291] ! restored sys_call_table[31] = ffffffff812db550
$ sudo insmod srw.ko mode=3
insmod: ERROR: could not insert module srw.ko: Operation not permitted
$ dmesg | tail -n6
[ 7297.040272] ! detected 64-bit platform
[ 7297.059764] ! sys_call_table address = ffffffff81801460
[ 7297.065930] ! writing address ffffc900001e6558
[ 7297.066000] ! modified sys_call_table[31] = 0000000000012345
[ 7297.066035] ! writing address ffffc9000033d558
[ 7297.066073] ! restored sys_call_table[31] = ffffffff812db550
$ sudo insmod srw.ko mode=4
insmod: ERROR: could not insert module srw.ko: Operation not permitted
$ dmesg | tail -n6
[ 7309.831119] ! detected 64-bit platform
[ 7309.836299] ! sys_call_table address = ffffffff81801460
[ 7309.836311] ! writing address ffffffff81801558
[ 7309.836359] ! modified sys_call_table[31] = 0000000000012345
[ 7309.836368] ! writing address ffffffff81801558
[ 7309.836424] ! restored sys_call_table[31] = ffffffff812db550
Discussion
This review does not pretend to be a textbook or a guide to action. It merely collects, in one systematic place, techniques with essentially equivalent effect as used by different authors.
It would be interesting to continue the discussion of the advantages and disadvantages of each of these methods.
Or to supplement the listed ways of performing the action with new options ... a 5th, a 6th, and so on.
All the code discussed (for verification, use, or further improvement) can be taken here or here .