Embedding in the Linux kernel: function hooking
Intercepting kernel functions is a basic technique for overriding or extending kernel mechanisms. Because the Linux kernel is written almost entirely in C, apart from small architecture-dependent parts, intercepting the corresponding functions is enough to embed into most kernel components.
This article continues the previously announced series on implementing add-on security mechanisms and, in particular, on embedding them into software systems.
The purpose of intercepting a function is to gain control at the moment it is called. What happens next depends on the task: in some cases the system implementation of an algorithm has to be replaced with your own, in others it only has to be supplemented. Either way, it is important to retain the ability to call the intercepted function for your own purposes.
Interception is traditionally implemented with “wrappers”, which allow pre- and post-processing while keeping access to the original functionality of the intercepted function.
Most interception methods are based on patching: modifying kernel code so that control is transferred to the interceptor when the target function is called. Thanks to the rich x86 instruction set, there are many ways to redirect the execution flow (JMP is only one of them).
Interception Technique
The method described here modifies the prologue (the beginning) of the target function so that executing it transfers control to the handler function.
In other words, for each target function we overwrite the start of the prologue with a JMP instruction, which switches the execution flow from the target function to the corresponding handler.
For example, suppose that before interception the inode_permission function looks like this:
inode_permission:
0xffffffff811c4530 <+0>: nopl 0x0(%rax,%rax,1)
0xffffffff811c4535 <+5>: push %rbp
0xffffffff811c4536 <+6>: test $0x2,%sil
0xffffffff811c453a <+10>: mov 0x28(%rdi),%rax
0xffffffff811c453e <+14>: mov %rsp,%rbp
0xffffffff811c4541 <+17>: jne 0xffffffff811c454a
0xffffffff811c4543 <+19>: callq 0xffffffff811c4470 <__inode_permission>
After interception, the prologue of this function will look like this:
inode_permission:
0xffffffff811c4530 <+0>: jmpq 0xffffffffa05a60e0 => TRANSFER OF CONTROL TO THE INTERCEPTOR
0xffffffff811c4535 <+5>: push %rbp
0xffffffff811c4536 <+6>: test $0x2,%sil
0xffffffff811c453a <+10>: mov 0x28(%rdi),%rax
0xffffffff811c453e <+14>: mov %rsp,%rbp
0xffffffff811c4541 <+17>: jne 0xffffffff811c454a
0xffffffff811c4543 <+19>: callq 0xffffffff811c4470 <__inode_permission>
Control is transferred by the five-byte JMP instruction written over the original instructions, encoded as E9 XX XX XX XX (opcode E9 followed by a 32-bit relative displacement). This is the essence of the described interception method. The rest of the article covers the specifics of implementing it in the Linux kernel.
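To make the E9 encoding concrete, here is a minimal user-space sketch of how the five bytes could be computed. The function name encode_jmp_rel32 is hypothetical and is not taken from the module; the only facts it relies on are that the displacement is counted from the end of the instruction and is stored little-endian.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5 /* opcode E9 + 4-byte displacement */

/*
 * Encode "jmp rel32" into buf. `from` is the address at which the
 * instruction will be placed, `to` is the jump destination.
 * Returns 0 on success, -1 if `to` cannot be reached with a signed
 * 32-bit displacement.
 */
int encode_jmp_rel32(uint8_t buf[JMP_REL32_LEN], uint64_t from, uint64_t to)
{
    /* The displacement is relative to the end of the JMP instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t rel = (uint32_t)(int32_t)disp;

    buf[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(rel >> (8 * i)); /* little-endian */
    return 0;
}
```

Called with the two addresses from the listing above (0xffffffff811c4530 as `from`, 0xffffffffa05a60e0 as `to`), it produces the bytes that replace the original five-byte NOP.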
Implementation specifics of function interception
As noted, patching means modifying kernel code. The main obstacle is that the memory pages containing code cannot simply be written to: the x86 architecture has a protection mechanism under which a write to a write-protected memory area raises an exception. This mechanism, page protection, underlies many kernel features, such as copy-on-write (COW). The processor's behavior is governed by the WP bit of the CR0 register, and the access rights of a page are described in its page table entry (PTE). When CR0.WP is set, a write to a write-protected page (R/W bit cleared in the PTE) makes the processor raise a page fault (#PF).
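The rule for supervisor-mode writes can be written down as a tiny predicate. This is only a model of the decision for a present page, not kernel code; the macro and function names are made up for the sketch, while the bit positions (CR0.WP is bit 16, the PTE R/W flag is bit 1) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CR0_WP (UINT64_C(1) << 16) /* CR0.WP: write protect */
#define DEMO_PTE_RW (UINT64_C(1) << 1)  /* PTE R/W: page is writable */

/*
 * Model of a supervisor-mode write to a present page:
 * it faults only if the page is read-only AND CR0.WP is set.
 */
bool supervisor_write_faults(uint64_t cr0, uint64_t pte)
{
    bool page_read_only = (pte & DEMO_PTE_RW) == 0;
    bool wp_enforced = (cr0 & DEMO_CR0_WP) != 0;

    return page_read_only && wp_enforced;
}
```

With WP cleared the predicate is false for every page, which is exactly why clearing the bit "works" and also why it is dangerous: all read-only pages become writable for the kernel on that processor.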
A common workaround is to temporarily disable page protection by clearing the WP bit in CR0. This works, but it must be applied with caution because, as noted, many kernel mechanisms rely on page protection. In addition, CR0 is a per-processor register: on SMP systems, a thread that clears WP on one processor can be preempted and moved to another processor, where the bit is still set.
A better and sufficiently universal approach is to create temporary mappings. The MMU allows several page table entries with different attributes to refer to the same physical frame, which makes it possible to create a writable mapping of the target memory area. This method is used in the Ksplice project (a fork is available on GitHub). Below is the map_writable function, which creates such a mapping:
/*
 * map_writable creates a shadow page mapping of the range
 * [addr, addr + len) so that we can write to code mapped read-only.
 *
 * It is similar to a generalized version of x86's text_poke. But
 * because one cannot use vmalloc/vfree() inside stop_machine, we use
 * map_writable to map the pages before stop_machine, then use the
 * mapping inside stop_machine, and unmap the pages afterwards.
 *
 * STOLEN from: https://github.com/jirislaby/ksplice
 */
static void *map_writable(void *addr, size_t len)
{
    void *vaddr;
    int nr_pages = DIV_ROUND_UP(offset_in_page(addr) + len, PAGE_SIZE);
    struct page **pages = kmalloc(nr_pages * sizeof(*pages), GFP_KERNEL);
    void *page_addr = (void *)((unsigned long)addr & PAGE_MASK);
    int i;

    if (pages == NULL)
        return NULL;

    for (i = 0; i < nr_pages; i++) {
        if (__module_address((unsigned long)page_addr) == NULL) {
            pages[i] = virt_to_page(page_addr);
            WARN_ON(!PageReserved(pages[i]));
        } else {
            pages[i] = vmalloc_to_page(page_addr);
        }
        if (pages[i] == NULL) {
            kfree(pages);
            return NULL;
        }
        page_addr += PAGE_SIZE;
    }

    vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
    kfree(pages);
    if (vaddr == NULL)
        return NULL;

    return vaddr + offset_in_page(addr);
}
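The page arithmetic in map_writable is easy to get wrong by one, so it may help to check it in isolation. The sketch below reproduces it in user space; it assumes 4 KiB pages, and the DEMO_ macros and function names are illustrative stand-ins for PAGE_SIZE, PAGE_MASK, offset_in_page and DIV_ROUND_UP.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(4096) /* assumption: 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Number of pages touched by the byte range [addr, addr + len). */
size_t pages_spanned(uint64_t addr, size_t len)
{
    uint64_t offset = addr & (DEMO_PAGE_SIZE - 1); /* offset_in_page */

    /* DIV_ROUND_UP(offset + len, PAGE_SIZE) */
    return (size_t)((offset + len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}

/* Address rounded down to the start of its page. */
uint64_t page_base(uint64_t addr)
{
    return addr & DEMO_PAGE_MASK;
}
```

A 32-byte window normally fits in one page, but if the target function starts within the last 31 bytes of a page, two pages have to be mapped, which is why the count is derived from the in-page offset and not from the length alone.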
This function creates a writable mapping for any memory area. In this implementation the region is released with vfree (vunmap is the documented counterpart of vmap); in either case the argument must be the address rounded down to a page boundary. More information on this way of modifying write-protected pages is given in this article.
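The idea of two mappings of the same physical memory with different permissions can be tried out in user space. The following sketch is only an analogy to map_writable, not kernel code: it maps one page of a temporary file twice, read-only and read-write, writes through the second mapping and reads the result through the first. The function name write_through_alias is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write `msg` through a read-write alias and read it back through a
 * read-only mapping of the same page. Returns 0 on success, -1 on error.
 */
int write_through_alias(const char *msg, char *out, size_t outlen)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || strlen(msg) >= (size_t)page)
        return -1;

    FILE *backing = tmpfile(); /* anonymous backing store */
    if (backing == NULL)
        return -1;

    int fd = fileno(backing);
    if (ftruncate(fd, page) != 0) {
        fclose(backing);
        return -1;
    }

    /* Two views of the same page with different access rights. */
    void *ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    void *rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    int rc = -1;

    if (ro != MAP_FAILED && rw != MAP_FAILED) {
        strcpy(rw, msg);                              /* write via alias */
        snprintf(out, outlen, "%s", (const char *)ro); /* read via RO view */
        rc = 0;
    }

    if (ro != MAP_FAILED)
        munmap(ro, (size_t)page);
    if (rw != MAP_FAILED)
        munmap(rw, (size_t)page);
    fclose(backing);
    return rc;
}
```

The read-only view never becomes writable; the data changes underneath it because both views resolve to the same page, which is the same effect the kernel module relies on.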
The next important point is that patching inevitably overwrites part of the target function's prologue. This does not matter if the function will not be used afterwards. If, however, the algorithm implemented by the target function is still needed after patching, the "old" code must remain executable despite the damaged prologue.
The following illustration schematically shows a function being intercepted while access to the original functionality is preserved.

In the illustration, 1 marks the transfer of control from the target function to the interceptor (JMP instruction), 2 marks the call to the original function through the saved part of the prologue (CALL instruction), 3 marks the return of control to the unmodified part of the original function (JMP instruction), and 4 marks the return from the original function to the interceptor (RET instruction). This makes the functionality of the intercepted function available to the interceptor.
Implementing hooking functions
Each intercepted function is described by the following structure:
typedef struct {
    /* target's name */
    char * name;
    /* target's insn length */
    int length;
    /* target's handler address */
    void * handler;
    /* target's address and rw-mapping */
    void * target;
    void * target_map;
    /* origin's address and rw-mapping */
    void * origin;
    void * origin_map;
    /* usage counter */
    atomic_t usage;
} khookstr_t;
Here, name is the name of the intercepted function (the symbol name); length is the length of the overwritten sequence of prologue instructions; handler is the address of the interceptor function; target is the address of the target function and target_map is the address of its writable mapping; origin is the address of the adapter function used to reach the original functionality and origin_map is the address of its writable mapping; usage is a counter of the threads currently inside the interceptor (including those sleeping there).
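The role of a usage counter can be sketched with C11 atomics. The module's own counter handling is not shown in the article, so everything below (the struct and the hook_enter, hook_leave and hook_idle names) is an assumption made for illustration: a counter that is raised on entry to a handler, lowered on exit, and checked before the handler's code may be removed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the `usage` field of the hook structure. */
struct hook_usage {
    atomic_int users;
};

/* Called on entry to the handler. */
void hook_enter(struct hook_usage *u)
{
    atomic_fetch_add(&u->users, 1);
}

/* Called just before the handler returns. */
void hook_leave(struct hook_usage *u)
{
    atomic_fetch_sub(&u->users, 1);
}

/* The handler's code may only be freed once nobody is inside it. */
bool hook_idle(struct hook_usage *u)
{
    return atomic_load(&u->users) == 0;
}
```

Without such a counter, unloading the module while a thread sleeps inside a handler would leave that thread returning into freed memory.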
Every intercepted function must be represented by such a structure. To simplify registering interceptors, the DECLARE_KHOOK(...) macro is used, defined as follows:
#define __DECLARE_TARGET_ALIAS(t) \
    void __attribute__((alias("khook_"#t))) khook_alias_##t(void)

#define __DECLARE_TARGET_ORIGIN(t) \
    void notrace khook_origin_##t(void){\
        asm volatile ( \
            ".rept 0x20\n" \
            ".byte 0x90\n" \
            ".endr\n" \
        ); \
    }

#define __DECLARE_TARGET_STRUCT(t) \
    khookstr_t __attribute__((unused,section(".khook"),aligned(1))) __khook_##t

#define DECLARE_KHOOK(t) \
    __DECLARE_TARGET_ALIAS(t); \
    __DECLARE_TARGET_ORIGIN(t); \
    __DECLARE_TARGET_STRUCT(t) = { \
        .name = #t, \
        .handler = khook_alias_##t, \
        .origin = khook_origin_##t, \
        .usage = ATOMIC_INIT(0), \
    }
The helper macros __DECLARE_TARGET_ALIAS(...) and __DECLARE_TARGET_ORIGIN(...) declare the interceptor and the adapter (32 NOPs). The structure itself is declared by __DECLARE_TARGET_STRUCT(...), which uses the section attribute to place it in a dedicated section (.khook). When the kernel module is loaded, all registered hooks, represented by the structures in the .khook section, are enumerated (see khook_for_each). For each of them the address of the corresponding symbol is looked up (see get_symbol_address) and the auxiliary fields are set up, including the writable mappings (see map_writable):
static int init_hooks(void)
{
    khookstr_t * s;

    khook_for_each(s) {
        s->target = get_symbol_address(s->name);
        if (s->target) {
            s->target_map = map_writable(s->target, 32);
            s->origin_map = map_writable(s->origin, 32);
            if (s->target_map && s->origin_map) {
                if (init_origin_stub(s) == 0) {
                    atomic_inc(&s->usage);
                    continue;
                }
            }
        }
        debug("Failed to initialize \"%s\" hook\n", s->name);
    }

    /* apply patches */
    stop_machine(do_init_hooks, NULL, NULL);

    return 0;
}
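The khook_for_each iterator is not shown in the article. One common way to enumerate structures collected in a section can be sketched in user space as follows. The sketch assumes an ELF target built with GCC or Clang and GNU ld or lld, and it deliberately uses a section name that is a valid C identifier (khook_demo, without a leading dot) so that the linker generates the __start_ and __stop_ boundary symbols by itself; the kernel module may obtain the boundaries differently. All names here are hypothetical, and the hook names are used only as strings.

```c
#include <stddef.h>
#include <string.h>

struct hook_desc {
    const char *name;
    void *handler;
};

/* Place one descriptor per hook into the "khook_demo" section. */
#define DECLARE_DEMO_HOOK(t)                                             \
    static struct hook_desc __attribute__((used, section("khook_demo"),  \
        aligned(__alignof__(struct hook_desc)))) demo_hook_##t = {       \
        .name = #t,                                                      \
        .handler = NULL,                                                 \
    }

/* Section boundaries, synthesized by the linker. */
extern struct hook_desc __start_khook_demo[];
extern struct hook_desc __stop_khook_demo[];

#define demo_hook_for_each(p) \
    for ((p) = __start_khook_demo; (p) < __stop_khook_demo; (p)++)

DECLARE_DEMO_HOOK(inode_permission);
DECLARE_DEMO_HOOK(vfs_read);

/* Count the descriptors registered in the section. */
size_t count_demo_hooks(void)
{
    const struct hook_desc *p;
    size_t n = 0;

    demo_hook_for_each(p)
        n++;
    return n;
}

/* Look a descriptor up by symbol name; NULL if it is not registered. */
const struct hook_desc *find_demo_hook(const char *name)
{
    const struct hook_desc *p;

    demo_hook_for_each(p)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

The benefit of this layout is that adding a hook needs only one declaration next to its handler; no central table has to be edited.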
An important role is played by init_origin_stub, which builds the adapter used to call the original function after interception:
static int init_origin_stub(khookstr_t * s)
{
    ud_t ud;

    ud_initialize(&ud, BITS_PER_LONG,
                  UD_VENDOR_ANY, (void *)s->target, 32);

    while (ud_disassemble(&ud) && ud.mnemonic != UD_Iret) {
        if (ud.mnemonic == UD_Ijmp || ud.mnemonic == UD_Iint3) {
            debug("It seems that \"%s\" is not a hooking virgin\n", s->name);
            return -EINVAL;
        }

#define JMP_INSN_LEN (1 + 4)

        s->length += ud_insn_len(&ud);
        if (s->length >= JMP_INSN_LEN) {
            memcpy(s->origin_map, s->target, s->length);
            x86_put_jmp(s->origin_map + s->length, s->origin + s->length, s->target + s->length);
            break;
        }
    }

    return 0;
}
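To see the resulting adapter layout on concrete bytes, the same steps can be modelled in user space. In this sketch the instruction lengths are passed in as an array, standing in for what a length disassembler such as udis86 would report, and the jump is encoded inline because the body of x86_put_jmp is not shown in the article. The function name build_origin_stub and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JMP_LEN   5  /* E9 + rel32 */
#define DEMO_STUB_SIZE 32 /* the adapter is 32 bytes of NOPs */

/*
 * Copy whole instructions from the start of `target` into `stub` until at
 * least DEMO_JMP_LEN bytes are covered, then append a JMP back to the first
 * untouched instruction of the target. `stub_addr` and `target_addr` are the
 * addresses at which the two buffers would execute.
 * Returns the number of relocated bytes, or 0 on failure.
 */
size_t build_origin_stub(uint8_t stub[DEMO_STUB_SIZE], uint64_t stub_addr,
                         const uint8_t *target, uint64_t target_addr,
                         const size_t *insn_len, size_t n_insn)
{
    size_t copied = 0;

    /* Never split an instruction: accumulate whole instruction lengths. */
    for (size_t i = 0; i < n_insn && copied < DEMO_JMP_LEN; i++)
        copied += insn_len[i];

    if (copied < DEMO_JMP_LEN || copied + DEMO_JMP_LEN > DEMO_STUB_SIZE)
        return 0;

    memcpy(stub, target, copied);

    /* Displacement is counted from the end of the appended JMP. */
    int64_t disp = (int64_t)((target_addr + copied) -
                             (stub_addr + copied + DEMO_JMP_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t rel = (uint32_t)(int32_t)disp;

    stub[copied] = 0xE9;
    for (int i = 0; i < 4; i++)
        stub[copied + 1 + i] = (uint8_t)(rel >> (8 * i));

    return copied;
}
```

Note that copying instructions verbatim is only correct for position-independent ones. A relative branch or a RIP-relative memory operand among the relocated bytes would point to the wrong place when executed from the adapter.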
As the code of init_origin_stub shows, the udis86 disassembler is used to determine how many bytes of prologue instructions the patch overwrites. In principle, any disassembler that can report the length of an instruction (a so-called Length-Disassembler Engine, LDE) would do. I use the full-fledged udis86 disassembler, which is BSD-licensed and has proven itself well. Once enough whole instructions have been counted, they are copied to origin_map, the RW mapping of the 32-byte origin adapter. Finally, x86_put_jmp places a jump after the saved instructions, which returns control to the unmodified part of the target function's code.
The last element needed to make kernel code modification safe is the stop_machine mechanism:
#include <linux/stop_machine.h>
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
In essence, stop_machine executes fn on the set of processors selected by the cpumask argument while all other online processors are stopped, spinning with interrupts disabled. This is what makes the mechanism suitable for modifying kernel code: no other kernel thread can be executing the code being modified, so there is no need to track such threads.
Usage
As an example, consider intercepting the inode_permission function. With the macros described above, the interception looks as follows:
#include <linux/fs.h>
DECLARE_KHOOK(inode_permission);
int khook_inode_permission(struct inode * inode, int mode)
{
    int result;

    KHOOK_USAGE_INC(inode_permission);

    debug("%s(%pK,%08x) [%s]\n", __func__, inode, mode, current->comm);
    result = KHOOK_ORIGIN(inode_permission, inode, mode);
    debug("%s(%pK,%08x) [%s] = %d\n", __func__, inode, mode, current->comm, result);

    KHOOK_USAGE_DEC(inode_permission);

    return result;
}
For the DECLARE_KHOOK(...) macro to work, a prototype of the intercepted function must be visible (linux/fs.h for inode_permission). Beyond that, the interceptor function (the one with the khook_ prefix) can do anything. In this example I print a debug message before and after calling the original inode_permission. Thus, interception makes it possible to replace functions as well as to alter the arguments passed and the result returned, which matches the concept of embedding: the ability to override or supplement the kernel mechanisms of the OS.
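Altering the result of the original function can be illustrated without any kernel machinery. The sketch below is a plain user-space wrapper around a function pointer, so it shows only the pre-processing, original call and post-processing pattern, not the patching itself. All names in it (demo_permission, hooked_permission, DEMO_MAY_WRITE) are invented for the example.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAY_WRITE 0x2 /* hypothetical "write access requested" flag */

typedef int (*permission_fn)(const void *inode, int mask);

/* Stand-in for the original function: grants every request. */
int demo_permission(const void *inode, int mask)
{
    (void)inode;
    (void)mask;
    return 0;
}

/* Pointer through which the wrapper reaches the original. */
permission_fn demo_origin = demo_permission;

/* Number of calls observed by the wrapper. */
int demo_calls_seen;

int hooked_permission(const void *inode, int mask)
{
    int result;

    demo_calls_seen++;                  /* pre-processing */
    result = demo_origin(inode, mask);  /* original functionality */

    /* post-processing: override the result for write requests */
    if (result == 0 && (mask & DEMO_MAY_WRITE))
        result = -EACCES;

    return result;
}
```

The same three-step shape applies inside a real handler: whatever policy is added goes before or after the call to the original, and the original remains the default behavior.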
As usual, the kernel module code that implements function interception is available on GitHub.