Chromium Sixth Check, Afterword

    strict unicorn

    At the beginning of 2018, a series of articles appeared in our blog dedicated to the sixth verification of the Chromium project source code. The cycle includes 8 articles on errors and recommendations for their prevention. Two articles provoked a heated discussion, and until now I have received occasional comments on the topics covered in them. Perhaps we should give some additional explanations and, as they say, dot the i.

    A year has passed since the writing of the series of articles devoted to the next verification of the source code of the Chromium project:

    1. Chromium: the sixth project check and 250 bugs
    2. Beautiful Chromium and gnarled memset
    3. break and fallthrough
    4. Chromium: memory leaks
    5. Chromium typos
    6. Chromium: using unreliable data
    7. Why is it important to check that the malloc function returned
    8. Chromium: other errors

    Memset and malloc articles provoked and continue to provoke discussions that seem strange to me. Apparently there was some misunderstanding due to the fact that I did not clearly formulated my thoughts. I decided to return to these articles and make some clarifications.

    memset


    Let's start with the article about memset , since everything is simple here. There were disputes about how best to initialize the structure. Quite a lot of programmers have written that it is better to give a recommendation to write not:

    HDHITTESTINFO hhti = {};

    And so:

    HDHITTESTINFO hhti = { 0 };

    Arguments:

    1. The {0} construct is easier to see when reading a code than {}.
    2. The {0} construct is more intuitive than {}. That is, 0 suggests that the structure is filled with zeros.

    Accordingly, I am offered to change this example of initialization in the article. I do not agree with the arguments and do not plan to make changes to the article. Now I justify my position.

    About visibility. I think this is a matter of taste and habit. I do not think that the presence of 0 inside the curly brackets fundamentally changes the situation.

    But with the second argument, I do not agree. Writing the form {0} gives reason to misinterpret the code. For example, you can calculate that if you replace 0 with 1, all fields will be initialized to ones. Therefore, this style of writing is more harmful than useful.

    In the PVS-Studio analyzer, there is even a related diagnosis V1009 on this topic , which I will describe now.

    V1009. Check the array initialization. Only the first element is initialized explicitly.

    The analyzer detected a possible error related to the fact that when declaring an array, a value was specified for only one element. Thus, the remaining elements will be initialized implicitly by zero or the default constructor.

    Consider an example of a suspicious code:

    int arr[3] = {1};

    Perhaps the programmer expected that arr would consist of one units, but this is not the case. The array will consist of the values ​​1, 0, 0. The

    correct code is:

    int arr[3] = {1, 1, 1};

    Such confusion may occur due to the similarity with the arr = {0} construct , which initializes the entire array to zero.

    If in your project such constructions are actively used, you can disable this diagnostics.

    It is also not recommended to neglect the visibility of the code.

    For example, the code for coding color values ​​is written as follows:

    int White[3] = { 0xff, 0xff, 0xff };
    int Black[3] = { 0x00 };
    int Green[3] = { 0x00, 0xff };

    Due to the implicit initialization, all colors are set correctly, but it is better to rewrite the code more clearly:

    int White[3] = { 0xff, 0xff, 0xff };
    int Black[3] = { 0x00, 0x00, 0x00 };
    int Green[3] = { 0x00, 0xff, 0x00 };

    malloc


    Before reading further, I ask you to refresh the contents of the article “ Why it is important to check that the malloc function returned ”. This article spawned a lot of discussion and criticism. Here are some of the discussions: reddit.com/r/cpp , reddit.com/r/C_Programming , habr.com . Occasionally they write to me about this article in the mail and now.

    The article is criticized by readers for the following points:

    1. If malloc returns NULL , then it is better to immediately terminate the program operation than to write a bunch of if -s and try to somehow handle the lack of memory, which often makes it impossible to execute the program.

    I didn’t at all urge to fight the consequences of the lack of memory until recently, by sending the error higher and higher. If the application is allowed to complete the work without warning, then let it be. All that is needed is a single check immediately after malloc or using xmalloc (see the next paragraph).

    I objected and warned about the absence of checks, because of which the program continues to work "as if nothing had happened." This is completely different. This is dangerous because it leads to undefined behavior, data corruption, and so on.

    2. Not told about the solution, which is to write a wrapper function for memory allocation, followed by testing or using existing functions, such as xmalloc .

    I agree, I missed this moment. When writing an article, I just did not think about how to correct the situation. It was more important for me to convey to the reader what the danger of a lack of verification is. How to correct a mistake is already a question of taste and implementation details.

    The xmalloc function is not part of the standard C library (see " What is the difference between xmalloc and malloc? "). However, this function may be declared in other libraries, for example, in the GNU utils library ( GNU libiberty ).

    The essence of the function is that the program terminates if memory cannot be allocated. The implementation of this function may look like this:

    void* xmalloc(size_t s)
    {
      void* p = malloc(s);
      if (!p) {
        fprintf (stderr, "fatal: out of memory (xmalloc(%zu)).\n", s);
        exit(EXIT_FAILURE);
      }
      return p;
    }

    Accordingly, by calling xmalloc everywhere, instead of malloc, you can be sure that there will be no undefined behavior in the program due to any use of the null pointer.

    Unfortunately, xmalloc is also not a panacea for all ills. It must be remembered that the use of xmalloc is unacceptable when it comes to writing code libraries. I will talk about this later.

    3. The most comments were of the following form: “in practice, malloc never returns NULL ”.

    Fortunately, I am not the only one who understands that this is the wrong approach. I really liked this comment in my support:

    From the experience of discussing such a topic, it seems that there are two sects on the Internet. The adherents of the first are firmly convinced that under Linux, malloc never returns NULL. The adherents of the second are firmly convinced that if the memory in the program could not be allocated, but nothing can be done in principle, you just need to fall. It is impossible to convince them in any way. Especially when these two sects intersect. One can only take it for granted. And it doesn’t matter which core resource the discussion is on.

    I thought and decided to follow the advice and will not try to persuade :). Let's hope that these development teams write only non-critical programs. If, for example, some data gets messed up in the game or the game crashes, it's not scary.

    The only thing that is important is that the developers of libraries, databases, etc. do not do this.

    Воззвание к разработчикам библиотек и ответственного кода


    If you are developing a library or other responsible code, then always check the value of the pointer returned by the malloc / realloc function and return the error code to the outside if memory could not be allocated.

    In libraries, you cannot call the exit function if memory allocation failed. For the same reason, xmalloc cannot be used . For many applications, it is unacceptable to just take them and crash them. Because of this, for example, the database may be corrupted. Data that has been considered for many hours may be lost. Because of this, the program may be exposed to the denial of service vulnerabilities, when instead of somehow correctly processing the increasing load, the work of a multi-threaded application simply terminates.

    It is impossible to assume how and in which projects the library will be used. Therefore, it should be assumed that the application can solve very important tasks. And just “kill” him by calling exit , is no good. Most likely, such a program is written taking into account the possibility of a lack of memory and can do something in this case. For example, a certain CAD system cannot allocate sufficient memory buffer for the next operation due to severe memory fragmentation. But this is not a reason for it to end in emergency mode with data loss. The program can give the opportunity to save the project and restart itself in the normal mode.

    In no case can not rely on the fact that malloccan always allocate memory. It is not known on which platform and how the library will be used. If on one platform the lack of memory is exotic, then on the other it can be a very common situation.

    You can’t hope that if malloc returns NULL , the program will crash. Anything can happen. As I described in the article , the program can write data not at zero address at all. As a result, some data may be corrupted, leading to unpredictable consequences. Even memset is dangerous. If the data is filled in the reverse order, then at first some data is corrupted, and only then the program will fall. But the fall may happen too late. If at the time of the memset functioncorrupted data will be used in parallel streams, then the consequences can be fatal. You can get a corrupt transaction in the database or send a command to delete "unnecessary" files. Anything can happen. I suggest the reader to fantasize on his own what the use of garbage in memory may lead to.

    Thus, the library has only one correct way of working with malloc functions . It is necessary DIRECTLY to check that the function has returned, and if this is NULL, then return the error status.

    Additional links


    1. OOM handling .
    2. Fun with NULL pointers: part 1 , part 2 .
    3. What Every C Programmer Should Know About Undefined Behavior: part 1 , part 2 , part 3 .



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Sixth Chromium Check, Afterword .

    Also popular now: