On driver parameters in Linux, or how I spent the weekend

    "We are lazy and incurious"




    This time the post was prompted by an article in a good journal on Linux (hereinafter L), in which an invited “expert” praised a driver for connecting an LCD to the Raspberry board. Since such things (the connection, not the OS) fall within my professional interests, I read the article attentively, then found the actual text of the “driver” and was slightly surprised that THIS could be praised. The expert's level can be judged from the mere fact that he persistently called the program a driver, although it is nothing of the kind. One might shrug it off, since what a person writes for himself is his own business, but to publish something like that in open access: “I didn’t know that it was possible.”

    I was particularly pleased that the address of the device on the I2C bus was hard-coded in the program text, so changing it required recompiling the program (well, at least not the whole kernel). Incidentally, I have noticed that on forums dedicated to L the most popular answer to any question about software problems is “rebuild the latest version of the kernel”. This approach seems somewhat strange to me; I guess there is something I don’t know. Nevertheless, the question arose of how driver parameterization is actually implemented (on the inside; on the outside everything is simple and clear), and this post is devoted to the answer.

    Not that I write drivers for L all the time, but the process as a whole is familiar, and googling confirmed my vague memories that there is a set of macros to be used in the source text of a module so that operating parameters, for example the address of the device on the bus, can be passed to it. However, the mechanics of the process were not described anywhere. In numerous links I saw one and the same text describing these macros (by the way, an interesting question: why place someone else’s piece of text on your own resource? I don’t really understand the point of this operation). I did not find any mention of the mechanism that carries out the operation. For another well-known operating system (Windows) I would have had to accept this fact and leave it at that, but one of the advantages of L is the availability of its source texts and the ability to find an answer to any question about its internal structure, which is what we will do. I’ll note right away that I will try not to duplicate information you can gather from other sources and will limit myself to what is needed to understand the text.

    But before looking at the source code, let’s think a bit about how we would do it ourselves if we were given a similar task (what if I am suddenly invited to join the maintainers of L after this post; one could hardly refuse). So, it is possible to create a module: a specially designed software unit that can be loaded into memory for execution by a system utility (insmod, hereafter AND), with a string of characters passed as launch parameters. This string may contain strictly defined lexical units whose format is specified when the source text of the module is written, and these units carry information that allows the values of the module's internal variables to be changed.

    Let us look more closely at how these lexical units are described; we need this to weigh the various solutions. A unit of parsing is defined by invoking a macro that is given the necessary information: the name of the variable to be modified during configuration, its external name (usually the same as the internal one), the type of the variable from a limited set, and the access rights to the variable in the rw-rw-rw style. In addition, an optional text string describing the variable can be specified. Obviously, this information (together with the rules for separators and tokens) is necessary and sufficient for building a parser of a parameter list given as a text string, but it leaves room for different ways of distributing the functions among the participants in the process.

    To configure the module we need to:

    1. build and store the table of the settings described above (this happens at compile time, so it can be done in any way you like, although it is still interesting how exactly),
    2. parse the input parameters according to this table, and
    3. change certain areas of memory in accordance with the result of parsing each syntactic unit.
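
    To make the three steps concrete, here is a minimal user-space sketch. It is not how L does it; every name in it (demo_param, demo_table, demo_parse and so on) is hypothetical, and the "name=value" syntax with spaces as separators is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor: external name, type tag, variable to modify. */
enum demo_type { DEMO_INT, DEMO_BYTE };

struct demo_param {
    const char *name;      /* external name used in the parameter string */
    enum demo_type type;   /* one of a limited set of types */
    void *target;          /* address of the variable to be modified */
};

/* Internal variables of the "module" with their default values. */
static int demo_speed = 100;
static unsigned char demo_addr = 0x27;

/* Step 1: the table of settings, built at compile time. */
static const struct demo_param demo_table[] = {
    { "speed", DEMO_INT,  &demo_speed },
    { "addr",  DEMO_BYTE, &demo_addr  },
};

/* Step 3: store a parsed value. Returns 0 on success, -1 on a bad value. */
static int demo_store(const struct demo_param *p, const char *text)
{
    char *end;
    long v = strtol(text, &end, 0);   /* base 0 also accepts 0x.. input */

    if (end == text || *end != '\0')
        return -1;
    switch (p->type) {
    case DEMO_INT:
        *(int *)p->target = (int)v;
        return 0;
    case DEMO_BYTE:
        if (v < 0 || v > 255)
            return -1;
        *(unsigned char *)p->target = (unsigned char)v;
        return 0;
    }
    return -1;
}

/* Step 2: parse "name=value name=value ..." against the table.
 * The string is modified in place. Returns 0 on success, -1 on error. */
static int demo_parse(char *args)
{
    assert(args != NULL);
    for (char *tok = strtok(args, " "); tok; tok = strtok(NULL, " ")) {
        char *eq = strchr(tok, '=');
        size_t i, n = sizeof(demo_table) / sizeof(demo_table[0]);

        if (!eq)
            return -1;
        *eq = '\0';                   /* split the token into name and value */
        for (i = 0; i < n; i++)
            if (strcmp(tok, demo_table[i].name) == 0)
                break;
        if (i == n || demo_store(&demo_table[i], eq + 1) != 0)
            return -1;
    }
    return 0;
}
```

    Calling demo_parse on a writable copy of "speed=250 addr=0x3f" would update both variables; an unknown name or an out-of-range byte makes it return -1. Who owns each of the three pieces is exactly what the options below differ in.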

    Let's speculate a little in the style of “if I were the director” and come up with possible implementations. How could this behavior of the system utility and the module be implemented? We will go through the options in order of increasing complexity.

    The first solution: the AND utility does almost nothing; it simply calls the module indicated to it and passes it the rest of the parameters in command-line style, and the module itself parses them, relying on the information it contains, and makes the necessary modifications. This solution is simple, understandable and quite feasible, but one circumstance should be taken into account: the parsing of parameters must on no account be left to the discretion of the module's author, since this would give him unacceptable latitude, and two programmers will always write three variants of a parser. We have already gone far enough to meet him by allowing parameters of an undefined type whose value is a text string; that will suffice.

    Therefore a standard parser should be included in the text of the module automatically, which is easy to implement at the macro substitution level.

    This solution has two drawbacks:

    1. it is not clear why we need AND at all, since the module could be called with parameters directly from the command line,
    2. the module code (its initialization part) must contain all three pieces of the necessary information, and this information is needed only when the module starts and is not used afterwards, yet it always takes up space. Let me note right away that this information necessarily takes up space in the file, but it need not end up in memory when the module is loaded if everything is done carefully. To do just that, recall the __init and __initdata directives (by the way, how they work also needs figuring out; that is a topic for the next post, which you will await with impatience, won't you?). But even in that case, pieces 2 and 3 in the file are clearly redundant, since the same code will be present in many modules, maliciously violating the DRY principle.

    Because of these drawbacks, the implementation of this option is highly unlikely. Moreover, it is not clear why the macro should then be given information about the type of the parameter, because the module itself knows perfectly well what it modifies (although the parser may need it when checking parameters). My overall estimate of the likelihood of such a solution is 2-3 percent.

    A necessary digression on drawback number 2. I was formed as a specialist in the days when 256 KB of RAM was enough to run 4 workstations, a two-task OS fit in 56 KB, and a single-task OS started working at 16 KB. And the 640 KB that should be enough for any program were something out of unscientific fiction. So I am used to considering RAM a scarce resource and strongly disapprove of wasting it unless this is dictated by extreme necessity (as a rule, by speed requirements), and I do not see such a situation here. Since most of my readers were formed in different realities, you may have your own opinion on which option is preferable.

    The second solution: the parser itself is moved into AND, which passes the extracted data, namely the parameter number and its value, to the module (to its initialization part). This keeps the way parameters are specified uniform and reduces the size of the module. The question remains how AND obtains the list of possible parameters, but this is handled by the macros, which create a predetermined structure in the module and place the block in a particular location (in the file or in memory). This solution is better than the previous one, but some excess memory in the module still remains. On the whole I like this solution, since my own parser (why should I be worse than all other programmers? I have my own parser, not without flaws, but certainly none of them fatal) works exactly according to this scheme, returning to the main program the number of the recognized rule and the value of the parameter.

    A sub-variant of the second solution is to pass the extracted parameters not to the startup part of the module but directly to its loaded working part, for example via ioctl; the memory requirements are the same. We get a unique opportunity to change parameters “on the fly”, which the other variants do not provide. It is not very clear why we might need such a feature, but it looks beautiful. The drawbacks are that 1) part of the function space has to be reserved in advance for a request that may never be used, and 2) the code that modifies the parameters has to stay in memory permanently. Estimated probability of implementation: about 5 percent.

    The third solution: the modification of the parameters is also moved into AND. Then, while loading the binary code of the module, AND can modify the data in intermediate memory and load the driver code with already modified parameters to its permanent location, or make the modifications directly in the memory into which the binary has been loaded; the parameter table present in the file may or may not be loaded into memory (remember the directives). The solution is a demanding one: like the previous one, it requires a predetermined communication area between the module and AND to store the description of the parameters, but it reduces the excess memory in the module even further. Let us note right away the main drawback of this solution, the inability to check the values of the parameters and their consistency, but there is nothing to be done about that.

    A variant of the third solution: the information about the parameters is stored not in the module itself but in some auxiliary file, so there is simply no excess memory in the module. In principle, the same can be achieved in the previous variant, when the module contains a configuration part that AND uses during loading but that is not loaded into RAM together with the actually executed part of the module. Compared with the previous variant, an extra file has been added and it is not clear what we gain for the price, but perhaps this was done before the initialization directives were invented: 5 percent.

    Let's leave the remaining 7 percent for other options that I could not think of. And now that our imagination has exhausted itself (mine certainly has; if you have more ideas, please share them in the comments), let's start studying the source code of L.

    To begin with, I note that the art of distributing source texts among files has apparently been lost together with the OS that fit in 16 KB, since the directory structure, the directory names and the file names have a little more than nothing to do with their content. Given the nested includes, the classical study of downloaded sources with an editor turns into a weird quest and is unproductive. Fortunately, there is the charming Elixir utility, available online, which allows contextual search, and with it the process becomes much more interesting and fruitful. I conducted my further research on the site elixir.bootlin.com. Yes, this site is not an official collection of kernel sources, unlike kernel.org, but let's hope that the sources there are identical.

    First, let's look at the macro that defines parameters: firstly, we know its name, and secondly, it should be easier (yeah, right). It is located in the moduleparam.h file, which is quite reasonable, and a pleasant surprise considering what we will see later. The macro

    {0}module_param(name,type,perm)

    is a wrapper over

     {0a}module_param_named(n,n,t,p)

    which makes it syntactic sugar for the most common case. For some reason, though, the list of permissible values of one of the parameters, namely the type of the variable, is given in the comment before the wrapper and not before the second macro, which actually does the work and can be used directly.

    The macro {0a} contains calls to three macros:

    {1}param_check_##t(n,&v)

    (there is a set of macros for all valid types here),

    {2}module_param_cb(n,&op##t,&v,p)

    and

    {3}__MODULE_PARM_TYPE(n,t)

    (pay attention to the names; lovely, aren't they?), and the first of them is not used anywhere else, so the recommendations of Ockham and the KISS principle are also ignored by the creators of L; apparently this is some groundwork for the future. Of course, these are only macros and they cost nothing, but still ...

    The first of the three macros, {1}, as its name suggests, checks that the parameter types are consistent and wraps

    __param_check(n,p,t)

    Note that the level of abstraction of the macro decreases at the first stage of wrapping, and at the second there is probably no other way to go; it only seems to me that it could be simpler and more logical, especially considering that the middle macro is not used anywhere else. Okay, let's file away one more way of checking macro parameters and move on.

    The following two macros, however, actually generate an element of the parameter table. Why two and not one? Don't ask me; I have long ceased to understand the logic of the creators of L. Most likely, judging by the difference in the style of these two macros, starting with their names, the second one was added later to extend the functionality, and the existing structure could not be modified because initially they had been too stingy to allocate space for specifying the parameter variant. The macro {2}, as usual, hides from us the macro

    {2a}__module_param_call(MODULE_PARAM_PREFIX,n,ops,arg,p,-1,0)

    (funnily enough, this macro is not invoked directly anywhere except in 8250_core.c, and there it is invoked with the same additional parameters), and it is this last macro that finally produces the source code.

    A small remark: in the course of the search we see that navigation through the texts works well, but there are two unpleasant circumstances: search by a fragment of a name does not work (param_check_ was not found, although param_check_byte was), and search works only on object declarations (a variable is not found by the built-in source search, although Ctrl+F then finds it in the same file). Not too encouraging, because we may need to search for an object outside the current file, but “in the end, we have no other.”

    As a result of the work of these macros, in the text of a compiled module with the following two lines

    module_param_named(name, c, byte, 0x444);
    module_param_named(name1, i, int, 0x444);

    a fragment like the one below appears

    static const char __param_str_name[] = "MODULE"".""name";
    static struct kernel_param const __param_name \
      __attribute__((__used__)) \
      __attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) \
      = { __param_str_name, ((struct module *)0), &param_ops_byte, (0x444), -1, 0, { &c } };
    static const char __UNIQUE_ID_nametype72[] \
      __attribute__((__used__)) __attribute__((section(".modinfo"), unused, aligned(1))) \
      = "parmtype""=""name"":""byte";
    static const char __param_str_name1[] = "MODULE"".""name1";
    static struct kernel_param const __param_name1 \
      __attribute__((__used__)) \
      __attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) \
      = { __param_str_name1, ((struct module *)0), &param_ops_int, (0x444), -1, 0, { &i } };
    static const char __UNIQUE_ID_name1type73[] __attribute__((__used__)) \
      __attribute__((section(".modinfo"), unused, aligned(1))) \
      = "parmtype""=""name1"":""int";

    (in fact, single lines are generated there; I have split them up for readability), and we can state right away that there is no hint of a program section containing a parser or code assigning values to parameters appearing in the source text of the module, so options 1 and 2 can be excluded from further consideration. The special attributes for the linker seem to hint at the existence of a communication area located in a certain predetermined place, through which the description of the parameters is passed. At the same time, we note with surprise the complete absence of any description of the generated block of possible parameters in the form of text that could be used by a parser module. It is clear that well-written code is self-documenting, but not to that extent, and this again does not raise the probability of option 1 or 2.

    The combination of the __used__ and unused attributes in one and the same generated line looks funny, especially if you look at the following fragment of macro code

    #if GCC_VERSION < 30300
    # define __used			__attribute__((__unused__))
    #else
    # define __used			__attribute__((__used__))
    #endif

    What on earth are the developers of L smoking, so tortuous is the train of their thought embodied in the code. I know that both forms of writing an attribute can be used, but I don’t understand why it has to be done within one line.

    One more interesting feature of the resulting code can be noted: the information about the name of the variable and its type is duplicated. It is not yet clear why this was done, but the fact itself is beyond doubt. Of course, this information is consistent because it is generated automatically, and the consistency will be preserved when the source text changes (and this is good), but it is duplicated (and this is bad); maybe later we will understand why such a solution was needed. It also remains unclear why a unique name has to be formed using the line number of the source code, because the first generated line managed without it.

    Another remark: finding out what exactly a parameter definition turns into was not an entirely trivial task, but thanks to MinGW it was accomplished all the same. Stringification and double pasting of parameters, the formation of unique names and other clever macro tricks remain under the hood; I present only the results. Summing up the intermediate result, I can say that studying the macros of L is not something I would like to do for a living; it is possible only as entertainment. But let us continue.
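
    For readers who have not met these tricks, here is a greatly simplified user-space imitation of stringification and token pasting. This is not the kernel macro: struct demo_kparam, DEMO_PARAM_NAMED and the other demo_ names are hypothetical, and the linker attributes and the unique-ID generation are left out on purpose.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the kernel's parameter record (an assumption). */
struct demo_kparam {
    const char *name;   /* "prefix.name" as text */
    const char *type;   /* type name as text */
    void *arg;          /* address of the variable to modify */
};

#define DEMO_PREFIX "MODULE."

/* #name turns a macro argument into a string literal (stringification),
 * adjacent string literals are then concatenated by the compiler, and
 * ## glues tokens together into a new identifier (token pasting). */
#define DEMO_PARAM_NAMED(name, var, type)                        \
    static const char demo_str_##name[] = DEMO_PREFIX #name;     \
    static const struct demo_kparam demo_param_##name = {        \
        demo_str_##name, #type, &var                             \
    }

static unsigned char demo_c;
static int demo_i;

/* These two lines mirror the two parameter definitions discussed above. */
DEMO_PARAM_NAMED(name, demo_c, byte);
DEMO_PARAM_NAMED(name1, demo_i, int);

/* Returns the type recorded for an entry, as text. */
static const char *demo_type_of(const struct demo_kparam *p)
{
    assert(p != NULL);
    return p->type;
}
```

    The first definition produces an object called demo_param_name whose name field is "MODULE.name" and whose type field is "byte". The external name and the type are stored as plain text next to the address of the variable, which is the same kind of information the real expansion carries.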

    Further study of the macros will not advance our understanding of the problem, so we turn to the source text of the AND utility and try to understand what it does.

    First of all, we are amazed to see that the sources we need are not included in the kernel sources. Yes, I am ready to agree that AND is a utility that interacts with the kernel through the entry point for loading a module, but every book on drivers for L tells us about this utility, so the absence of an “official” version of its sources somewhere alongside the kernel sources puzzles me. Well, okay, Google did not disappoint, and we got to the sources anyway.

    The second surprising thing is that this utility is built from a package whose name has nothing to do with the name of the utility; there is more than one such package, and each is called differently in different places, which is funny, to say the least. If you have L installed, a command will tell you which package the AND utility was built from, and you can go on looking for that package, but if we are doing theoretical research (I personally do not keep L on my home computer for a number of reasons, some of which I have set out in my posts; such is the theoretical boxer), this method is not available to us and only an Internet search remains; fortunately, it gives results.

    And the third surprising thing is that the actual name of the utility does not appear anywhere in the source code and is not used in the file names; it is found only in the makefile. I know that in C we are obliged to call the main function main, and this is not up for discussion (personally I am not delighted with it, spoiled as I am by Pascal, but nobody asked my opinion when the language was designed), but the external name of the utility could at least have been written in a comment. A necessary remark: a lot of things in C are done according to the principle “that is how it is done here”; it was probably difficult, or even impossible, to do otherwise, so what can you do now but drag the suitcase without a handle further.

    We find two packages containing the source text of AND, we also find the sources on GitHub, we see that they are identical, and we take it on faith that this is exactly what the source code of the utility looks like. From here on we study only the file on GitHub, especially since there it is simply called insmod.c. We find that AND first converts the parameter list into one long null-terminated string in which the individual elements are separated by spaces. After that it calls two functions: the first is called grab_file and obviously opens the binary, while the second is named init_module, takes a pointer to the module’s binary and the parameter string, and ends up in load_module, which suggests loading together with parameter modification.

    We turn to the text of the second function, which lies in the file ... and here comes a letdown: it is in none of the files of the repository under study on GitHub (well, that is only logical, it is part of the kernel and does not belong here). Google hurries to the rescue again and returns us to the kernel sources on Elixir and to the module.c file. It should be noted that, surprisingly, the name of the file containing the functions for working with modules looks logical; I don’t even know how to explain this, it probably happened by accident.

    Now we understand why the text of AND does not sit next to the kernel: the utility does almost nothing, it only converts the parameters from one form to another and hands control over to the kernel itself, so it does not even deserve a place beside it. From this point on it also becomes clear why there is no intelligible external information about the structure of the parameters: the kernel passes them to itself through its own macros and knows everything about them perfectly well, while nobody else needs to know anything about their internal structure (given that the sources are open for review, a few comments would not hurt, but in principle everything is indeed clear without them). This, however, sheds almost no light on how the mechanism is actually carried out.

    A remark: I got a bit carried away about transferring control to the kernel. We can see for certain that the function is used and that its source code is in the kernel, but whether its binary is linked into the module or lies in the kernel image itself is as yet unknown and needs further investigation. The fact that the entry point for this function is declared in a special way, through SYSCALL_DEFINE3, indirectly argues in favor of the second option, but I realized long ago that my ideas about the logical and the illogical, the acceptable and the unacceptable, and the permissible and the impermissible differ very substantially from those of the developers of L.

    A note: one more stone thrown at the built-in search. When searching for the definition of this macro, I saw many places where it is used in the manner of a function, among which the definition itself, as a macro, was hidden very modestly.

    For example, I don’t understand at all why one should make an external utility that converts the parameters from the form standard for the operating system (argc, argv) into a null-terminated string with spaces as separators, which is then processed by a system module; this approach is somewhat beyond my cognitive abilities. Especially if we take into account that the user enters the parameters as a null-terminated string with spaces as separators and a utility in the kernel converts it into the (argc, argv) form. It strongly resembles the old joke “take the kettle off the stove, pour the water out of it, and you get a problem whose solution is already known.” And since I try to adhere to the principle “Consider your interlocutor no more stupid than yourself until he proves the opposite. And even after that you may be mistaken”, and with regard to the developers of L the first sentence is unquestionably true, this means that I am misunderstanding something, and I am not used to that. If anyone can offer a reasonable explanation for this double conversion, please do so in the comments. But we will continue the investigation.
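
    The conversion in question is easy to picture. Below is a minimal sketch of gluing command-line arguments into one space-separated, null-terminated string; it is my own illustration of the idea rather than the actual code of the utility, and the function name join_args is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins argv[first] .. argv[argc-1] into one NUL-terminated string with a
 * single space between items. Returns a malloc'ed buffer that the caller
 * must free, or NULL if memory allocation fails. */
static char *join_args(int argc, char *const argv[], int first)
{
    size_t len = 1;                   /* room for the terminating NUL */
    char *out, *p;
    int k;

    assert(first >= 0);
    for (k = first; k < argc; k++)
        len += strlen(argv[k]) + 1;   /* the item plus one separator */

    out = malloc(len);
    if (!out)
        return NULL;

    p = out;
    for (k = first; k < argc; k++) {
        size_t n = strlen(argv[k]);

        if (k > first)
            *p++ = ' ';               /* separator between items */
        memcpy(p, argv[k], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```

    With the arguments "addr=0x3f" and "speed=250" this gives the single string "addr=0x3f speed=250", which is the form that is then handed over together with the module's binary.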

    The prospects for options 1 and 2 become “very poorly visible” (a charming phrase from a recent article on the development prospects of domestic high-speed ADCs), since it would be very strange to load a module into memory using a kernel function and then hand control over to functions built into the module's body. And sure enough, in the text of the load_module function we quickly find a call to parse_args; it looks like we are on the right track. Then we quickly go through the call chain (as always, we see both wrapper functions and wrapper macros, but we have grown used to turning a blind eye to such cute pranks of the developers) and find the parse_one function, which puts the required parameter in the right place.

    Note that there is no check of whether the parameter values are admissible, as expected, because the kernel, unlike the module itself, knows nothing about their purpose. There are checks of the syntax and of the number of elements in an array (yes, a parameter can be an array of integers), and when errors of this kind are detected, loading of the module stops, but that is all. However, all is not lost, because after loading control is passed to the module's init_module function, which can carry out the necessary validation of the parameters that have been set and, if the saving throw fails, terminate the loading process.
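
    Such a check in the initialization function might look roughly like this. It is a user-space sketch, not a real module: lcd_addr, demo_init and demo_selfcheck are hypothetical names, and I assume the parameter is a 7-bit I2C address, for which 0x08..0x77 is the non-reserved range.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parameter: 7-bit I2C address of the display. By the time
 * the init function runs, the loader has already stored the value here. */
static int lcd_addr = 0x27;

/* Sketch of an init function that validates parameters after they are set.
 * Addresses below 0x08 and above 0x77 are reserved on the I2C bus. */
static int demo_init(void)
{
    if (lcd_addr < 0x08 || lcd_addr > 0x77)
        return -EINVAL;               /* a non-zero result aborts loading */
    return 0;
}

/* Usage example: the default passes, an out-of-range address is rejected. */
static void demo_selfcheck(void)
{
    assert(demo_init() == 0);
    lcd_addr = 0x80;
    assert(demo_init() == -EINVAL);
    lcd_addr = 0x27;                  /* restore the default */
}
```

    The syntax of the value has been checked by then, so the only thing left for the module to decide is whether the value makes sense for the device.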

    However, we have completely overlooked the question of how the parsing functions get access to the array of parameter templates, because without it parsing is somewhat difficult. A quick look at the code shows that a dirty hack (pardon, an obvious technique) is used: the find_module_sections function looks in the binary file for the section named __param, divides its size by the size of a record (and does a lot of other things besides), and returns the necessary data through a structure. I would still put the letter p in front of the names of this function's parameters, but that is a matter of taste.
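
    The same idea can be tried in user space. The sketch below assumes GCC or Clang on an ELF system with a GNU-compatible linker, which defines __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier; the kernel, when loading a module, reads the section headers of the file instead. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's parameter record (an assumption). */
struct demo_kparam {
    const char *name;   /* external name of the parameter */
    void *arg;          /* address of the variable to modify */
};

/* Place one record into the ELF section "demo_param". The explicit
 * alignment keeps the records packed like elements of an array. */
#define DEMO_PARAM(name, var)                                          \
    static struct demo_kparam demo_param_##name                        \
        __attribute__((used, section("demo_param"),                    \
                       aligned(sizeof(void *)))) = { #name, &var }

static int demo_speed = 100;
static int demo_addr = 0x27;

DEMO_PARAM(speed, demo_speed);
DEMO_PARAM(addr, demo_addr);

/* Section bounds, provided by the linker (toolchain assumption above). */
extern struct demo_kparam __start_demo_param[];
extern struct demo_kparam __stop_demo_param[];

/* Number of records = size of the section / size of one record. */
static size_t demo_param_count(void)
{
    return (size_t)((const char *)__stop_demo_param -
                    (const char *)__start_demo_param) /
           sizeof(struct demo_kparam);
}

/* Linear search for a record by its external name; NULL if not found. */
static struct demo_kparam *demo_param_find(const char *name)
{
    struct demo_kparam *p;

    assert(name != NULL);
    for (p = __start_demo_param; p < __stop_demo_param; p++)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

    Nothing in the program mentions the records as a table: they are scattered definitions that the linker gathers into one section, and the code recovers the "array" from the section bounds alone.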

    It seems that everything is clear and understandable; the only alarming thing is the absence of the __initdata attribute in the generated data: do they really remain in memory after initialization? Probably this attribute is specified somewhere in a common part, for example in the linker data, but frankly I am too lazy to look; see the epigraph.

    Summing up: the weekend was well spent, it was interesting to dig into the source code of L, I recalled some things and learned others, and knowledge is never superfluous.
    As for my assumptions, I did not guess right: the variant implemented in L turned out to fall within the remaining 7 percent, but it was not too painful.

    And in conclusion, Yaroslavna's lament (how could we do without it): why does the necessary information (I do not mean the inner workings, but the external presentation) have to be gathered from various sources that have no official status? Where is a document similar to the book
    “Software of SM computers. The RAFOS operating system with separation of functions. System programmer’s guide”? Or are such documents no longer written?
