Making all Linux kernel symbols available. Part 1

The state of affairs

This discussion concerns the kernel of the Linux operating system and is of interest to developers of kernel modules and drivers for it. For everyone else, these notes are hardly of interest.

Everyone who has written even the simplest Linux kernel module knows (and every book on writing Linux drivers says so) that a module's own code can use only those names (mainly kernel API functions) that the kernel exports. Exporting kernel symbols is one of the most confusing concepts in the Linux kernel domain. For a name from kernel space to be available for binding in another module, two conditions must be met: a) the name must have global scope (in your module such names must not be declared static), and b) the name must be explicitly declared as exported, that is, passed as the argument of the EXPORT_SYMBOL macro (or EXPORT_SYMBOL_GPL, whose consequences are far from the same).

All names known to the kernel are listed dynamically in the /proc/kallsyms pseudo-file, and their number is huge:

$ uname -r 
3.13.0-37-generic 
$ cat /proc/kallsyms | wc -l 
108960

The number of names exported by the kernel (provided for use in the program code of the modules) is significantly less:

$ cat /lib/modules/`uname -r`/build/Module.symvers | wc -l 
17533

As you can see, more than a hundred thousand names are defined in the kernel (the exact number depends on the kernel version). But only a small part of them (about 16% in this case) are declared as exported and are available for use (binding) in kernel module code.

Recall that kernel API calls are made to the absolute address of the name. Each name exported by the kernel (or by any module) has an address associated with it, and this address is used for binding when a module that uses the name is loaded. This is the main mechanism of interaction between a module and the kernel. At run time the module is loaded dynamically and becomes an integral part of the kernel code. This is why a Linux kernel module can be compiled only for a specific kernel (usually on the machine where it is installed), and why an attempt to load such a binary module into a different kernel can crash the operating system.

To sum up this brief excursion: the Linux kernel developers give developers of extensions (kernel modules) a very limited (and extremely poorly documented) API set which, in their opinion, is sufficient for writing kernel extensions. This opinion may not coincide with that of the driver developers themselves, who would like to have the entire arsenal of the kernel at hand. And it is quite possible to use it, as the remainder of the text discusses.

Search address by name

Let's look at the structure of the entry for any kernel name (out of the 108960) in /proc/kallsyms:

$ sudo cat /proc/kallsyms | grep ' T ' | grep sys_close 
c1176ff0 T sys_close

This is the exported name of the handler for the POSIX close() system call. (On some Linux distributions the addresses are filled in only when the file is read with root privileges; for other users the address field shows zero.)
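
The three fields of such a line (address, type letter, name) are easy to pick apart in ordinary user-space C. The sketch below is only an illustration: the structure ksym, the helper parse_kallsyms_line() and the constant KSYM_NAME_MAX are names made up for this example and are not part of any kernel or libc API.

```c
#include <stdio.h>

#define KSYM_NAME_MAX 128   /* hypothetical limit chosen for this sketch */

/* One parsed /proc/kallsyms record: "<hex address> <type> <name>". */
struct ksym {
    unsigned long long addr;   /* shown as 0 when read without root privileges */
    char type;                 /* type letter, e.g. 'T' in the listing above */
    char name[KSYM_NAME_MAX];
};

/* Returns 0 on success, -1 if the line does not contain the three fields. */
static int parse_kallsyms_line(const char *line, struct ksym *out)
{
    /* %127s bounds the name to KSYM_NAME_MAX - 1 characters plus the NUL. */
    if (sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->name) != 3)
        return -1;
    return 0;
}
```

Feeding it the line "c1176ff0 T sys_close" fills in the address 0xc1176ff0, the type 'T' and the name sys_close; anything after the name on the line is simply ignored by this sketch.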

We could well call the sys_close() function from our module's code. But we cannot do the same with the completely symmetric sys_open(), because that name is not exported by the kernel. When building such a module, we get a warning similar to the following:

$ make 
...
  MODPOST 2 modules 
WARNING: "sys_open" [/home/olej/2011_WORK/LINUX-books/examples.DRAFT/sys_call_table/md_0o.ko] undefined!
...

But an attempt to load such a module will fail:

$ sudo insmod md_0o.ko 
insmod: error inserting 'md_0o.ko': -1 Unknown symbol in module
$ dmesg
md_0o: Unknown symbol sys_open

Such a module cannot be loaded because it violates the kernel integrity rules: it contains an unresolved external symbol, one that the kernel does not export for binding. (In other words, what the build tools report as a mere warning is a critical error from the developer's point of view.)

Does this mean that only exported kernel symbols are available to the code of our module? No. It only means that the recommended method of binding by name (through the absolute address of the name) applies only to exported names. Exporting is one more line of control protecting the integrity of the kernel, where the slightest mistake leads to a complete crash of the operating system, sometimes before it even manages to print the message: Oops ...

Since all kernel symbols are listed in the /proc/kallsyms pseudo-file, module code could take them from there. Moreover, this means that the kernel API has methods for locating all names, and these methods can be used in your code for the same purpose. Skipping the intermediate solutions, we consider only two options, two exported calls (both declared in <linux/kallsyms.h> in the kernel, or see lxr.free-electrons.com/source/include/linux/kallsyms.h):

The call:

unsigned long kallsyms_lookup_name( const char *name );

Here name is the name we are looking for, and the return value is its absolute address. The disadvantage of this option is that it became available somewhere between kernel versions 2.6.32 and 2.6.35 (roughly between the distribution releases of summer 2010 and spring 2011); more precisely, the function existed earlier but was not exported. For embedded and small systems, this can be a serious obstacle.

More general call:

int kallsyms_on_each_symbol( int (*fn)(void*, const char*, struct module*, unsigned long), void *data );

This call is more complicated and needs a brief explanation. The first parameter (fn) is a pointer to your own function, which will be called in turn (in a loop) for every symbol in the kernel table; the second (data) is a pointer to an arbitrary block of data (parameters) that is passed to each call of fn().

The prototype of the user-defined function fn, which is called cyclically for each name:

int func( void *data, const char *symb, struct module *mod, unsigned long addr );

Here:
data - the parameter block filled in by the caller and passed in through the second parameter of kallsyms_on_each_symbol(), as described above; it is convenient to pass the name of the symbol we are looking for here;
symb - the name, as a string, from the kernel name table that is being processed by the current call of func;
mod - the kernel module to which the symbol being processed belongs;
addr - the address of the symbol in the kernel address space (this is, in fact, what we are looking for).

The enumeration of names in the kernel table can be interrupted at the current step (for efficiency, if we have already processed the symbols we need): it stops as soon as the user-defined function func returns a non-zero value.

To use kallsyms_on_each_symbol(), we prepare our own wrapper function, equivalent in meaning to kallsyms_lookup_name():

static void* find_sym( const char *sym ) {  // find address kernel symbol sym
   static unsigned long faddr = 0;          // static !!!
   // ----------- nested functions are a GCC extension ---------
   int symb_fn( void* data, const char* sym, struct module* mod, unsigned long addr ) {
      if( 0 == strcmp( (char*)data, sym ) ) {
         faddr = addr;
         return 1;                           // found: stop the enumeration
      }
      else return 0;                         // not this one: continue
   };
   // --------------------------------------------------------
   faddr = 0;                                // forget the result of any previous call
   kallsyms_on_each_symbol( symb_fn, (void*)sym );
   return (void*)faddr;                      // NULL if the name was not found
}

Here we used a trick: a nested definition of the symb_fn() function. This is perfectly legal as a GCC extension (relative to the C language standard), and kernel modules are compiled exclusively with GCC anyway. Such code avoids declaring a global intermediate variable, keeps the namespace clean, and helps keep the code local.
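
For comparison, the same early-exit contract can be met in standard C, without nested functions, by passing a small context structure through data. The sketch below only models this in user space: the table toy_symbols and the iterator each_symbol() are stand-ins invented for this example (they merely imitate the callback shape of kallsyms_on_each_symbol()), and the addresses are simply the ones that appear in the listings of this text.

```c
#include <stddef.h>
#include <string.h>

struct module;                       /* opaque here, as in the callback prototype */

/* Toy stand-in for the kernel symbol table. */
struct toy_sym { const char *name; unsigned long addr; };
static const struct toy_sym toy_symbols[] = {
    { "sys_restart_syscall", 0xc1067840UL },
    { "sys_exit",            0xc1059280UL },
    { "sys_fork",            0xc1055eb0UL },
    { "sys_close",           0xc1176ff0UL },
};

static int visited;                  /* counts callback invocations, for illustration */

/* Hypothetical iterator imitating kallsyms_on_each_symbol(): stops on non-zero. */
static int each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long),
                       void *data)
{
    size_t i;
    for (i = 0; i < sizeof(toy_symbols) / sizeof(toy_symbols[0]); i++) {
        int ret = fn(data, toy_symbols[i].name, NULL, toy_symbols[i].addr);
        if (ret != 0)
            return ret;              /* the callback asked to stop the enumeration */
    }
    return 0;
}

/* The search request and its result travel together through the data pointer. */
struct lookup_ctx { const char *name; unsigned long addr; };

static int match_fn(void *data, const char *sym, struct module *mod, unsigned long addr)
{
    struct lookup_ctx *ctx = data;
    (void)mod;                       /* unused in this sketch */
    visited++;
    if (strcmp(ctx->name, sym) != 0)
        return 0;                    /* not our name: keep going */
    ctx->addr = addr;
    return 1;                        /* found: interrupt the enumeration */
}

static unsigned long find_sym_ctx(const char *name)
{
    struct lookup_ctx ctx = { name, 0 };
    visited = 0;
    each_symbol(match_fn, &ctx);
    return ctx.addr;                 /* 0 if the name was not found */
}
```

In this model a lookup of "sys_exit" stops after two callback invocations, while a lookup of a name missing from the table walks all four entries and returns 0. The price of the portable variant is one extra structure type and a file-scope callback, which is exactly what the nested function avoids.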

Usage example

One of the most sacred places in the Linux operating system is the sys_call_table selector table, through which every system call passes: after preparing the parameters and placing the number (selector) of the system call as the first parameter, the caller executes the instruction that enters the kernel: int 80h (in older versions) or sysenter, which is essentially the same thing. The system call number (the selector, the first parameter) is the index into sys_call_table, an array of pointers to the kernel functions that handle system calls. We can see the numbers of all system calls, for example for the i386 architecture:

$ cat /usr/include/i386-linux-gnu/asm/unistd_32.h 
...
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
...

This is the table of system call indexes (numbers) as used in user address space by the standard C library, libc.so. An exact counterpart of this table is also present in the kernel header files, for the kernel address space. Similar tables of system call indexes exist for every architecture supported by Linux (the tables for different architectures differ in size, composition, and the numerical values of the indexes for equivalent calls!).
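
The role of sys_call_table can be pictured as an ordinary array of function pointers indexed by the system call number. The following is a toy user-space model, not kernel code: the handler functions, the toy_call_table array and the dispatch() helper are all invented for this illustration, and only the index values are taken from the i386 listing above.

```c
#include <stddef.h>

/* Index values as in the i386 unistd_32.h excerpt; NR_MAX is just the toy table size. */
enum { NR_exit = 1, NR_read = 3, NR_write = 4, NR_close = 6, NR_MAX = 9 };

typedef long (*toy_handler_t)(long a, long b, long c);

/* Made-up handlers: each returns a recognizable value instead of doing real work. */
static long toy_exit(long a, long b, long c)  { (void)b; (void)c; return a; }
static long toy_write(long a, long b, long c) { (void)a; (void)b; return c; }
static long toy_close(long a, long b, long c) { (void)a; (void)b; (void)c; return 0; }

/* The "table": slot i holds the handler for system call number i (others stay NULL). */
static const toy_handler_t toy_call_table[NR_MAX] = {
    [NR_exit]  = toy_exit,
    [NR_write] = toy_write,
    [NR_close] = toy_close,
};

#define TOY_ENOSYS 38   /* numeric value of ENOSYS on Linux */

/* Selects the handler by number, the way the selector indexes the real table. */
static long dispatch(unsigned int nr, long a, long b, long c)
{
    if (nr >= NR_MAX || toy_call_table[nr] == NULL)
        return -TOY_ENOSYS;          /* no such call in this toy table */
    return toy_call_table[nr](a, b, c);
}
```

Here dispatch(NR_write, 1, 0, 19) reaches toy_write() through slot 4 of the array, while an unfilled slot such as NR_read, or a number beyond the table, yields the negative error value. Whoever knows the address of the array can read (or replace) any slot, which is why the real table is such a sensitive place.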

Since kernel version 2.6, the sys_call_table symbol has been excluded from export, for security reasons as the kernel development team rather peculiarly understands them (I can only assume that security here is meant in the sense that the kernel developers' piece of bread is protected from third-party programmers). All books on writing Linux drivers state that sys_call_table cannot be used in driver code. Now, and even more so in the subsequent parts of this discussion, we will show that this is not so!

For a fairly long time (since 2011) I have been working with the subject under discussion and have read many publications about it. Virus writers and assorted riffraff who frighten themselves with the scary word "hacker" have invented all sorts of things to find sys_call_table: they even decode binary dumps of kernel memory on the fly, scanning regions of kernel memory (searching, for example, for the position of sys_close(), which is always exported). As will be shown now, all this can be done much more simply. The real secret of Linux's resilience is not that troublemakers cannot find something in there, but that access-rights enforcement will not let anyone (without root rights) do anything nasty outside those rules ... and nobody gives root rights to troublemakers.

But back to the task of resolving non-exported kernel symbols. The first variant (file mod_kct.c) demonstrates the use of kallsyms_lookup_name() (for simplicity and brevity the header includes and the necessary macros such as MODULE_*() are not shown; all of this is in the archive files):

static int __init ksys_call_tbl_init( void ) { 
   void** sct = (void**)kallsyms_lookup_name( "sys_call_table" ); 
   printk( "+ sys_call_table address = %p\n", sct ); 
   if( sct ) { 
      int i; 
      char table[ 120 ] = "sys_call_table : ";
      for( i = 0; i < 10; i++ ) 
         sprintf( table + strlen( table ), "%p ", sct[ i ] ); 
      printk( "+ %s ...\n", table ); 
   } 
   return -EPERM; 
} 
module_init( ksys_call_tbl_init );

Here, the address of the sys_call_table table is retrieved and then the addresses of the handlers of the first 10 system calls (__NR_restart_syscall ... __NR_link) contained in it are extracted:

$ sudo insmod mod_kct.ko 
insmod: ERROR: could not insert module mod_kct.ko: Operation not permitted 
$ dmesg | tail -n 2 
[39473.496040] + sys_call_table address = c1666140 
[39473.496045] + sys_call_table : c1067840 c1059280 c1055eb0 c1179ee0 c1179f70 c1178cb0 c1176ff0 c1059570 c1178d10 c1188860  ...

(The error 'Operation not permitted' should not be confusing: we never intended to leave the module loaded, which is why the init function returns the non-zero code -EPERM. We only wanted our code to run in privileged supervisor mode, in processor protection ring zero.)

Let's check what the addresses found at the beginning of the sys_call_table array correspond to:

$ sudo cat /proc/kallsyms | grep c1067840 
c1067840 T sys_restart_syscall 
$ sudo cat /proc/kallsyms | grep c1059280 
c1059280 T SyS_exit 
c1059280 T sys_exit 
$ sudo cat /proc/kallsyms | grep c1055eb0 
c1055eb0 T sys_fork

... and so on (compare with the table of system call numbers shown earlier).

The next variant is a little harder to understand, since it uses the kallsyms_on_each_symbol() function, but it is also more universal (file mod_koes.c):

static int __init ksys_call_tbl_init( void ) { 
   void **sct = find_sym( "sys_call_table" );   // table sys_call_table address 
   printk( "+ sys_call_table address = %p\n", sct ); 
   if( sct != NULL ) { 
      int i; 
      char table[ 120 ] = "sys_call_table : ";
      for( i = 0; i < 10; i++ ) 
         sprintf( table + strlen( table ), "%p ", sct[ i ] ); 
      printk( "+ %s ...\n", table ); 
   } 
   return -EPERM; 
} 
module_init( ksys_call_tbl_init );

Textually it repeats the previous variant almost completely; all the real work is done by the find_sym() function given and discussed above. The result of execution is the same:

$ sudo insmod mod_koes.ko 
insmod: ERROR: could not insert module mod_koes.ko: Operation not permitted 
$ dmesg | tail -n2 
[42451.186648] + sys_call_table address = c1666140 
[42451.186654] + sys_call_table : c1067840 c1059280 c1055eb0 c1179ee0 c1179f70 c1178cb0 c1176ff0 c1059570 c1178d10 c1188860  ...

Discussion

A skeptic may object: “So what?” The point is that we have shown mechanisms that are necessary and sufficient for using any kernel API in the code of a dynamically loaded kernel module. This technique expands the capabilities available to the author of a kernel module by orders of magnitude! The prospects are so broad that examining them will take the subsequent parts of this discussion.

... but so that the story does not end on such a dry note, we will show one simple but impressive application: executing the code of a system call (generally speaking, any system call), normally reached through a user-space library, from kernel module code.

Have you been told that kernel module code can write only to the system log (printk()) and cannot write to the terminal (printf())? Now we will show that this is not so ... Here is a simple kernel module that writes to the terminal:

static asmlinkage long (*sys_write)(
   unsigned int, const char __user *, size_t );
static int __init wr_init( void ) {
   char buf[ 80 ] = "Hello from kernel!\n"; 
   int len = strlen( buf ), n; 
   sys_write = find_sym( "sys_write" ); 
   printk( "+ sys_write address = %p\n", sys_write ); 
   printk( "+ [%d]: %s", len, buf ); 
   if( sys_write != NULL ) { 
      mm_segment_t fs = get_fs(); 
      set_fs( get_ds() ); 
      n = sys_write( 1, buf, len ); 
      set_fs( fs ); 
      printk( "+ printf() return : %d\n", n ); 
   } 
   return -EPERM; 
} 
module_init( wr_init );

And here is its execution (an attempt to load it, ending with the deliberate error return code):

$ sudo insmod mod_wrc.ko 
Hello from kernel! 
insmod: ERROR: could not insert module mod_wrc.ko: Operation not permitted
$ dmesg | tail -n3
[23942.974587] + sys_write address = c1179f70
[23942.974591] + [19]: Hello from kernel!
[23942.974612] + printf() return : 19

The first line of output here is produced by the write() system call. Naturally, the output goes to the controlling terminal of the user process insmod, but the important thing is that we execute the write() system call from kernel-space code. Some details here may need additional explanation:

Where did I get such a “tricky” prototype for the sys_write pointer variable? Of course, I shamelessly copied it from the original declaration of the sys_write() function in the kernel, in the <linux/syscalls.h> header file, as the comment in the code shows (in the full code, in the archive):

/* <linux/syscalls.h> 
asmlinkage long sys_write( unsigned int fd, 
                           const char __user *buf, 
                           size_t count ); */

And this is the only way to proceed for every non-exported kernel name you use: copy the prototypes of the implementing functions from the corresponding header files. Even a minimal prototype mismatch can lead to an immediate crash of the operating system!

What do the calls get_ds(), get_fs(), and set_fs() mean? This is a small trick: a temporary substitution of the data segment limit in the kernel. The prototype of the sys_write() system call handler carries the __user qualifier, which indicates that the pointer refers to data in user space. The system call code checks this (only by the numerical range of the address), and if the address points into kernel space (as in our case) the call fails. With this trick we tell the checking code to treat our address as acceptable, as though it belonged to user space. In such cases the trick can be applied mechanically, without thinking much about its meaning.
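
What the address check amounts to can be modelled in a few lines of user-space C. This is a simplified illustration of the idea only, under the assumption of a 32-bit system where user space lies below 0xC0000000; the names toy_addr_limit, toy_get_fs(), toy_set_fs(), toy_access_ok() and toy_write() are invented here and merely imitate the roles of get_fs(), set_fs() and the range check inside the system call.

```c
#include <stdint.h>

#define TOY_USER_DS   0xC0000000u    /* assumed top of user space in this model */
#define TOY_KERNEL_DS 0xFFFFFFFFu    /* "no limit": the whole 32-bit address space */

static uint32_t toy_addr_limit = TOY_USER_DS;   /* the limit the check compares against */

static uint32_t toy_get_fs(void)         { return toy_addr_limit; }
static void     toy_set_fs(uint32_t lim) { toy_addr_limit = lim; }

/* Non-zero if [addr, addr + size) ends at or below the current limit. */
static int toy_access_ok(uint32_t addr, uint32_t size)
{
    uint64_t end = (uint64_t)addr + size;   /* 64-bit sum avoids wrap-around */
    return end <= (uint64_t)toy_addr_limit;
}

/* Imitates a handler that refuses buffers failing the range check. */
static long toy_write(uint32_t buf, uint32_t len)
{
    if (!toy_access_ok(buf, len))
        return -14;                  /* -EFAULT on Linux */
    return (long)len;                /* pretend all bytes were written */
}
```

In this model a buffer at a kernel-range address such as 0xC1000000 is rejected while the limit is TOY_USER_DS and accepted after toy_set_fs( TOY_KERNEL_DS ); restoring the saved limit afterwards brings the rejection back, which mirrors the save, set, call, restore sequence in the module above.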

Notes

Experiments with code like this, and even more so with the more involved cases I intend to discuss later, are fraught with trouble: even minor errors in the code instantly bring down the operating system. Even worse, the system crashes in an undefined, unstable state, and there is a non-zero (though small) probability that it will not recover even after a reboot.

While experimenting with such code I was constantly concerned with the question: can it be developed and tested in a virtual machine? This matters all the more because later we will have to do very machine-dependent things, such as writing to hidden hardware registers of the processor, for example CR0.

I can state with satisfaction that all the code discussed runs properly in virtual machines under Oracle VirtualBox, at least in relatively recent versions, starting from those of 2013.

Therefore, I strongly recommend working with such code in virtual machines first, in order to avoid serious trouble.
The mention of Oracle VirtualBox does not mean that things will be any different in other virtual machine managers; I simply did not test the code in them (almost certainly everything will be fine in QEMU/KVM, since VirtualBox takes its virtualization code from QEMU).

The archive of files (code) for experiments mentioned in the text can be taken here or here.
