Identification of Loadable Linux Kernel Modules [Part 1]: Sources

    In this post, I describe my search for signs that let you tell whether a set of source files builds a loadable Linux kernel module (LKM) rather than a regular executable.
    Suppose that there is no information about the purpose of the sources, or that someone is deliberately trying to hide it.
    Upd: The code base is larger than 4 GB, and you need to quickly select only those sources that implement kernel modules.

    # 01 __KERNEL__ 

    When a module's source code is built, the preprocessor symbol __KERNEL__ is defined.

    As Alessandro Rubini and Jonathan Corbet write in Linux Device Drivers:

    “Since the module does not link to any of the standard libraries, the source code of the module should not include regular header files. Kernel modules can only use functions that are exported by the kernel. All the header files that are related to the kernel are located in the include/linux and include/asm directories, inside the directory tree with the kernel sources (usually this is the /usr/src/linux directory).
    Earlier versions of Linux (based on libc version 5 and earlier) installed symbolic links from /usr/include/linux and /usr/include/asm to the actual directories in the kernel sources, so the libc header tree could refer to the kernel header files. This made it possible to include kernel header files in user applications when the need arose.
    But even now, when the kernel header files are separated from the header files used by application programs, it is still sometimes necessary to include them in programs running in user space in order to take advantage of definitions that are not found in ordinary header files. However, most of the definitions from the kernel header files refer exclusively to the kernel and are “invisible” to regular applications, since access to these definitions is enclosed in #ifdef __KERNEL__ blocks. This, by the way, is one of the reasons why you need to define the __KERNEL__ symbol when building the module.”

    For example, the line “CFLAGS = -D__KERNEL__” may be present in the makefile.
    Or "-D__KERNEL__" can be found in build logs.
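
    With gigabytes of makefiles and build logs to go through, this check is easy to automate. Below is a minimal sketch in C, not a finished tool: the function name defines_symbol is my own, and it assumes the flag is written without a space after -D (gcc also accepts "-D NAME", which this sketch does not handle).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `text` (a makefile line or a build-log
 * line) contains the compiler flag -D<symbol> with <symbol> as a whole
 * macro name, e.g. -D__KERNEL__ or -D__KERNEL__=1, and 0 otherwise. */
int defines_symbol(const char *text, const char *symbol)
{
    size_t len = strlen(symbol);
    const char *p = text;

    while ((p = strstr(p, "-D")) != NULL) {
        p += 2; /* step over "-D" */
        if (strncmp(p, symbol, len) == 0) {
            unsigned char next = (unsigned char)p[len];
            /* The name must end here: -D__KERNEL__X defines another macro. */
            if (!(isalnum(next) || next == '_'))
                return 1;
        }
    }
    return 0;
}
```

    Calling defines_symbol(line, "__KERNEL__") on every line of a makefile or log is enough for a first pass. The same helper works for any other macro name passed as the second argument.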

    # 02 MODULE

    If the module is not linked statically into the kernel, the string "-DMODULE" will be present in the CFLAGS variable. This preprocessor symbol must be defined before the linux/module.h file is included.
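
    Some sources define the symbol themselves at the top of the file instead of relying on CFLAGS. The ordering requirement can be checked with a sketch like the one below. Both function names are my own, and the check is deliberately naive: it expects the exact spelling "#define MODULE" with a single space and does not skip comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first "#define MODULE" in
 * `src` where MODULE is the whole macro name (so MODULE_NAME is skipped),
 * or NULL if there is no such line. */
const char *find_module_define(const char *src)
{
    static const char needle[] = "#define MODULE";
    size_t len = sizeof(needle) - 1;
    const char *p = src;

    while ((p = strstr(p, needle)) != NULL) {
        unsigned char next = (unsigned char)p[len];
        if (!(isalnum(next) || next == '_'))
            return p;
        p += len;
    }
    return NULL;
}

/* Returns 1 if the file defines MODULE itself and does so before the first
 * mention of linux/module.h; returns 0 otherwise. */
int defines_module_before_include(const char *src)
{
    const char *def = find_module_define(src);
    const char *inc = strstr(src, "linux/module.h");

    return def != NULL && inc != NULL && def < inc;
}
```

    A result of 0 does not rule out a module: the symbol may come from the command line instead.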

    # 03 All names are declared static and have a unique prefix

    In this way the developer avoids "polluting" the kernel namespace; otherwise, when debugging, they would have to pick out the names of their module from among all the kernel names. Using a prefix also frees the developer from having to invent names that are guaranteed not to collide with names already present in the kernel namespace.
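
    Checking for a shared prefix is simple once the function names of a file have been extracted (the extraction itself is not shown here and is assumed to be done by some other tool). A minimal sketch, with a function name of my own choosing:

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix shared by
 * all `count` strings in `names`. Returns 0 for an empty list. */
size_t common_prefix_len(const char *const *names, size_t count)
{
    size_t len = 0;

    if (count == 0)
        return 0;

    for (;;) {
        char c = names[0][len];

        if (c == '\0')
            return len; /* the first name ended */
        for (size_t i = 1; i < count; i++) {
            /* A shorter name yields '\0' here, which also ends the prefix. */
            if (names[i][len] != c)
                return len;
        }
        len++;
    }
}
```

    For an illustrative list such as "scull_open", "scull_read", "scull_release" the function returns 6, the length of "scull_". A long shared prefix across most functions of a file is only a hint, since user-space libraries use prefixes too.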

    # 04 printk()

    In the source code, the printk() function is used instead of the printf() function. “Linux Device Drivers” says:
    “The printk function is defined in the kernel and behaves much like the printf function from the standard C library. Why, then, does the kernel have its own function? The answer is simple: the kernel is stand-alone code that is built without the help of the C library.”
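
    To compare how often the two functions are called in a file, a rough counter is enough. The sketch below is my own illustration: count_calls and is_ident_char are made-up names, and the counter does not skip comments or string literals, so treat its numbers as approximate.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical helper: counts places where `name` appears as a whole
 * identifier followed by optional blanks and '(', e.g. "printk(" or
 * "printk (". "snprintf(" is not counted as a call to printf. */
size_t count_calls(const char *text, const char *name)
{
    size_t len = strlen(name);
    size_t count = 0;
    const char *p = text;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        const char *after = p + len;
        int starts_ident = (p == text) || !is_ident_char((unsigned char)p[-1]);

        while (*after == ' ' || *after == '\t')
            after++;
        if (starts_ident && *after == '(')
            count++;
        p += len;
    }
    return count;
}
```

    A file where count_calls(src, "printk") is positive and count_calls(src, "printf") is zero is a good candidate for a closer look.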

    # 05 init_module and cleanup_module

    “Linux Device Drivers” says:

    “An application runs as a single task, from start to finish. A module simply registers itself with the kernel, preparing to serve future requests, and its “main” function returns immediately after being called. In other words, the task of the init_module function (the entry point) is to prepare the module's functions for later calls. It is as if the module were telling the kernel: “Hey! I'm here! Here is what I can do!” The second entry point, cleanup_module, is called immediately before the module is unloaded. It tells the kernel: “I'm leaving! Don't ask me for anything else!””

    Upd: A more reliable sign is the presence of the cleanup_module function in the text, because functions with this name occur about 20 times less often than functions named “init_module”. Apparently, the name “init_module” is popular not only among kernel module writers.
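
    One way to turn this observation into a filter is to give cleanup_module more weight than init_module. The sketch below does that; the function names and the weights 1 and 2 are my own arbitrary choices, not measured values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical helper: returns 1 if `name` occurs in `text` as a whole
 * identifier (so my_init_module does not match init_module). */
int contains_identifier(const char *text, const char *name)
{
    size_t len = strlen(name);
    const char *p = text;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int left_ok = (p == text) || !is_ident_char((unsigned char)p[-1]);
        int right_ok = !is_ident_char((unsigned char)p[len]);

        if (left_ok && right_ok)
            return 1;
        p++;
    }
    return 0;
}

/* Scores a source text: +1 for init_module, +2 for cleanup_module.
 * The weights are an assumption made for illustration only. */
int entry_point_score(const char *src)
{
    int score = 0;

    if (contains_identifier(src, "init_module"))
        score += 1;
    if (contains_identifier(src, "cleanup_module"))
        score += 2;
    return score;
}
```

    A score of 3 means both entry points are present; a score of 1 is the weakest signal, for the reason given above.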

    # 06 Using current->

    “Linux Device Drivers” says:

    “<...> Kernel code can determine the current process that has accessed the module through the global item current, a pointer to a struct task_struct, which is declared in the 2.4 kernel in the file. The current pointer refers to the current user process. When making system calls, such as read or write, the current process is the one that made the call. The kernel can take advantage of information about the current process using the current pointer if the need arises. <...>
    In fact, current is no longer a global kernel variable, as it once was. The developers optimized access to the structure that describes the current process by moving it to the stack. You will find the current implementation in the file. But before you go exploring this file, remember that Linux is an SMP-compatible system (SMP stands for Symmetric Multi-Processing), so a simple global variable is simply not applicable here. The implementation details are hidden in other kernel subsystems; even so, a device driver can include the header file and access the current pointer.
    From the module's point of view, current is an ordinary external reference, like printk. A module can refer to current wherever it sees fit. For example, the following code prints the identifier (ID) of the process and the name of the command that started it:
    printk("The process is \"%s\" (pid %i)\n", current->comm, current->pid);

    The command name is stored in the current->comm field and is the name of the program file.”
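
    Searching for this sign needs a little care, because "current" is also a common name for struct fields and local variables. Here is a rough sketch that looks only for the bare identifier followed by "->". The function names are mine, and comments and string literals are not skipped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical helper: returns 1 if `src` dereferences a bare identifier
 * named current ("current->pid", "current -> comm"). Member accesses such
 * as "it->current->next" or "s.current->x" are ignored. */
int uses_current_pointer(const char *src)
{
    static const char name[] = "current";
    size_t len = sizeof(name) - 1;
    const char *p = src;

    while ((p = strstr(p, name)) != NULL) {
        const char *after = p + len;
        int left_ok = (p == src) || !is_ident_char((unsigned char)p[-1]);
        int is_member = (p > src && p[-1] == '.') ||
                        (p > src + 1 && p[-2] == '-' && p[-1] == '>');

        while (*after == ' ' || *after == '\t')
            after++;
        if (left_ok && !is_member && after[0] == '-' && after[1] == '>')
            return 1;
        p++;
    }
    return 0;
}
```

    A local variable that happens to be a pointer named current would still be reported, so this sign is best combined with the others.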

    What other differences are there between a kernel module and a regular executable at the source level?

