Why it is important to check what the malloc function returned


    We present a series of articles devoted to recommendations for writing high-quality code, based on errors found in the Chromium project. This sixth part is devoted to the malloc function, or rather, to why you must always check the pointer this function returns. Chances are you don't suspect what the catch with malloc is, so we recommend reading this article.

    Note. Throughout the article, the malloc function will often stand not only for malloc itself but also for calloc, realloc, _aligned_malloc, _recalloc, strdup, and so on. I do not want to clutter the text by constantly repeating the names of all these functions. What they have in common is that they can return a null pointer.

    malloc


    If the malloc function is unable to allocate the memory buffer, it returns NULL. Any normal program must check the pointer returned by the malloc function and properly handle the situation where the memory allocation failed.
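
    What such handling can look like, as a minimal sketch (the function and its error convention here are illustrative assumptions, not a prescription):

    #include <stdlib.h>
    #include <string.h>

    /* A sketch: allocate a buffer, check the result, and report the
       failure to the caller instead of dereferencing a possible NULL. */
    static int make_copy(const char *src, size_t len, char **out)
    {
      char *buf = (char *)malloc(len + 1);
      if (buf == NULL)
        return -1;            /* propagate the failure to the caller */
      memcpy(buf, src, len);
      buf[len] = '\0';
      *out = buf;
      return 0;
    }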

    Unfortunately, many programmers are careless about checking pointers, and sometimes they deliberately do not check whether memory was allocated or not. Their logic goes like this:
    If the malloc function could not allocate memory, it is unlikely that my program will keep working properly. Most likely there won't be enough memory for other operations either, so there is no point bothering with memory allocation errors at all. The very first memory access through the null pointer will raise a structured exception on Windows, or the process will receive a SIGSEGV signal on Unix-like systems. As a result, the program will crash, and that suits me. Since there is no memory, there is nothing to suffer over. Alternatively, you can catch the structured exception/signal and handle the null pointer dereference in one central place, which is more convenient than writing thousands of checks.
    I am not making this up; I have talked more than once with people who consider this approach appropriate and who deliberately never check the result returned by the malloc function.

    By the way, developers have another excuse for not checking what the malloc function returned: malloc only reserves memory, and there is no guarantee that there will be enough physical memory when we actually start using the allocated buffer. So, since there is no guarantee anyway, there is no point checking. For example, this is exactly how Carsten Haitzler, one of the developers of the EFL Core libraries, explained why I counted more than 500 places in the library code where checks are missing. Here is his comment on the article:
    OK so this is a general acceptance that at least on Linux which was always our primary focus and for a long time was our only target, returns from malloc / calloc / realloc can't be trusted especially for small amounts. Linux overcommits memory by default. That means you get new memory but the kernel has not actually assigned real physical memory pages to it yet. Only virtual space. Not until you touch it. If the kernel cannot service this request your program crashes anyway trying to access memory in what looks like a valid pointer. So all in all the value of checking returns of allocs that are small at least on Linux is low. Sometimes we do it ... sometimes not. But the returns cannot be trusted in general UNLESS its for very large amounts of memory and your alloc is never going to be serviced - eg your alloc cannot fit in virtual address space at all (happens sometimes on 32bit). Yes overcommit can be tuned but it comes at a cost that most people never want to pay or no one even knows they can tune. Secondly, fi an alloc fails for a small chunk of memory - eg a linked list node ... realistically if NULL is returned ... crashing is about as good as anything you can do. Your memory is so low that you can crash, call abort () like glib does with g_malloc because if you can't allocate 20-40 bytes ... your system is going to fall over anyway as you have no working memory left anyway. I'm not talking about tiny embedded systems here, but large machines with virtual memory and a few megabytes of memory etc. which has been our target. I can see why PVS-Studio doesn't like this. Strictly it is actually correct, but in reality code spent on handling this stuff is kind of a waste of code given the reality of the situation. I'll get more into that later.
    The arguments programmers give are incorrect, and below I will explain in detail why. But first we must answer the question: what does Chromium have to do with all this?

    Chromium


    Chromium comes into it because the libraries it uses contain at least 70 errors related to a missing check after calling functions such as malloc, calloc, realloc. Yes, in Chromium itself these functions are almost never used; only containers or operator new are employed there. However, since the errors live in the libraries Chromium uses, we can say that they are in Chromium. Of course, some parts of those libraries may never execute when Chromium runs, but finding that out is difficult and unnecessary. All of the errors have to be fixed anyway.

    I will not quote many faulty code fragments in this article, since they are all of the same kind. Here is just one error, found in the Yasm library:

    static SubStr *
    SubStr_new_u(unsigned char *s, unsigned int l)
    {
        SubStr *r = malloc(sizeof(SubStr));
        r->str = (char*)s;
        r->len = l;
        return r;
    }

    PVS-Studio Warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'r'. Check lines: 52, 51. substr.h 52

    There is no protection against a null pointer in this code. I gathered other similar errors from Chromium and its libraries into a file and posted them here: chromium_malloc.txt. That file mentions 72 errors, but in reality there may be more; as I wrote in the introductory article, I only skimmed the report.
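
    A minimal fix might look like this (a sketch; how the callers should react to a NULL result is a separate decision for the Yasm developers):

    static SubStr *
    SubStr_new_u(unsigned char *s, unsigned int l)
    {
        SubStr *r = malloc(sizeof(SubStr));
        if (r == NULL)
            return NULL;    /* let the caller handle the failure */
        r->str = (char*)s;
        r->len = l;
        return r;
    }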

    According to the Common Weakness Enumeration, PVS-Studio classifies the detected errors as follows:

    1. CWE-690: Unchecked Return Value to NULL Pointer Dereference.
    2. CWE-628: Function Call with Incorrectly Specified Arguments.
    3. CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer.

    As you can see, even in a project as high-quality as Chromium you can find many defects caused by missing checks. Now for the most interesting part: why those checks are necessary.

    Why is verification necessary?


    There are four reasons at once, each of which is by itself enough to write a check after calling the malloc function. If someone on your team does not write such checks, make them read this article.

    Before we begin, a short theoretical digression on why structured exceptions or signals occur when a null pointer is dereferenced. It will matter for the rest of the story.

    At the beginning of the address space, one or more pages of memory are protected by the operating system from access. This makes it possible to detect errors of addressing memory through a null pointer, or through a pointer whose value is close to 0.

    Different operating systems reserve different amounts of memory for this purpose, and in some systems that amount is configurable, so it makes no sense to quote a specific number of reserved bytes. But to give the reader some sense of scale: on Linux systems the typical value is 64 KB.

    What matters is that by adding a sufficiently large number to a null pointer, you can "shoot past" the guard pages and accidentally land in pages that are not write-protected. That way you can corrupt some data somewhere, yet the operating system will not notice and will raise no signal/exception.
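
    To make this "miss" concrete, here is a small sketch; the 64 KB guard size is just the typical Linux value mentioned above:

    #include <stdio.h>
    #include <stddef.h>

    /* A sketch: where would ptr[i] land if ptr were NULL? We only compute
       the address; actually writing there is undefined behavior. */
    int main(void)
    {
      const size_t guard = 64 * 1024;    /* typical Linux guard region  */
      const size_t i = 10000;            /* a "large enough" index      */
      size_t addr = i * sizeof(size_t);  /* address of ((size_t *)0)[i] */
      printf("ptr[%zu] would touch address %zu; the guard ends at %zu\n",
             i, addr, guard);
      /* On a 64-bit system, 10000 * 8 = 80000 > 65536: the write lands
         past the protected pages, no signal is raised, and whatever data
         happens to live there is silently corrupted. */
      return 0;
    }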

    Brew yourself some coffee, and let's begin!

    Null pointer dereferencing is undefined behavior


    From the point of view of C and C++, dereferencing a null pointer causes undefined behavior. Undefined behavior means anything can happen; do not assume you know how the program will behave if a nullptr dereference occurs. Modern compilers perform aggressive optimizations, which makes it impossible to predict how a given error will show up in the code.

    Undefined behavior in a program is very bad. You must not allow it in your code.

    Do not think that you can cope with a null pointer dereference using structured exception handlers (SEH on Windows) or signals (on UNIX-like systems). Once the null pointer dereference has happened, the program is already broken, and anything can happen. Let's look at an abstract example of why you cannot rely on SEH handlers and the like.

    size_t *ptr = (size_t *)malloc(sizeof(size_t) * N * 2);
    for (size_t i = 0; i != N; ++i)
    {
      ptr[i] = i;
      ptr[N * 2 - i - 1] = i;
    }

    This code fills the array from the edges toward the center, with element values growing toward the middle. I made this example up in a minute, so do not try to guess why anyone would need such an array; I don't know myself. What mattered to me was that consecutive lines of the program write to the beginning of the array and somewhere near its end. This is sometimes needed in practical tasks, and we will look at real code when we get to reason N4.

    Once again, look carefully at these two lines:

    ptr[i] = i;
    ptr[N * 2 - i - 1] = i;

    From the programmer's point of view, at the start of the loop the write to the element ptr[0] happens first, a structured exception/signal is raised, it gets handled, and everything is fine.

    However, the compiler may rearrange the assignments for optimization purposes, and it has every right to do so. From the compiler's point of view, if a pointer is dereferenced, it cannot be nullptr; and if the pointer is null, that is undefined behavior, so the compiler is not obliged to think about the consequences of its optimizations.

    So the compiler may decide that, for optimization purposes, it is more profitable to perform the assignments in this order:

    ptr[N * 2 - i - 1] = i;
    ptr[i] = i;

    As a result, the first write goes to the address ((size_t *)nullptr)[N * 2 - 0 - 1]. If the value of N is large enough, the guard page at the start of memory is "jumped over" and the value of the variable i gets written into some cell that happens to be writable. In other words, some data gets corrupted.

    Only after that is the assignment at the address ((size_t *)nullptr)[0] performed. The operating system notices the attempt to write into the area it controls and raises a signal/exception.

    The program can handle this structured exception/signal, but it is too late. Somewhere in memory there is already corrupted data. Moreover, it is unclear which data got corrupted and what consequences that may have!

    Is the compiler to blame for swapping the assignments? No. The programmer allowed a null pointer dereference and thereby put the program into a state of undefined behavior. In this particular case, the program's undefined behavior consists in data somewhere in memory getting corrupted.
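
    The fix is the good old check; once the pointer is verified before the loop, the whole reordering question disappears (a sketch):

    size_t *ptr = (size_t *)malloc(sizeof(size_t) * N * 2);
    if (ptr == NULL)
    {
      /* handle the failure here: report it and abort the operation,
         according to whatever policy the application follows */
      return;
    }
    for (size_t i = 0; i != N; ++i)
    {
      ptr[i] = i;
      ptr[N * 2 - i - 1] = i;
    }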

    Conclusion

    Proceed from the axiom: any null pointer dereference is undefined behavior of the program. There is no such thing as "harmless" undefined behavior. Any undefined behavior is unacceptable.

    Do not dereference pointers returned by the malloc function and its analogues without checking them first. Do not rely on any other means of intercepting a null pointer dereference. Use only the good old if statement.

    Null pointer dereferencing is a vulnerability


    What some developers do not consider an error at all, others perceive as a vulnerability. That is exactly the case with null pointer dereferences.

    Some find it acceptable for the program to crash on a null pointer dereference, or for the error to be handled in some generic way by intercepting the signal/structured exception.

    Others believe that a null pointer dereference causes a denial of service and is thus a vulnerability. Instead of handling the lack of memory properly, the program, or one of its threads, terminates. This can lead to data loss, violation of data integrity, and so on. In other words, the CAD system will simply shut down if it cannot allocate memory for some complex operation, without even offering the user a chance to save their work.

    I won't make unfounded claims. There is a program called Ytnef, designed to decode TNEF streams, such as those created in Outlook. Its developers treated a missing check after a calloc call as the vulnerability CVE-2017-6298.

    All the fixed fragments in which a null pointer dereference could occur looked roughly the same:

    vl->data = calloc(vl->size, sizeof(WORD));
    temp_word = SwapWord((BYTE*)d, sizeof(WORD));
    memcpy(vl->data, &temp_word, vl->size);
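
    The fix in such places is straightforward; here is a sketch (the actual Ytnef patches may report the failure differently):

    vl->data = calloc(vl->size, sizeof(WORD));
    if (vl->data == NULL)
      return;    /* sketched: report the failure instead of writing */
    temp_word = SwapWord((BYTE*)d, sizeof(WORD));
    memcpy(vl->data, &temp_word, vl->size);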

    Conclusions

    If you are developing a non-critical application for which a crash during operation is not a disaster, then yes, writing the checks is optional.

    However, if you are developing a library, the absence of checks is unacceptable! Your library may be used not only by lazy programmers writing non-critical applications such as a Tetris game. You have to take care of normal programmers and normal programs as well.

    That is why I ideologically disagree, for example, with Carsten Haitzler about the absence of checks in the EFL Core libraries (details in the article). Such an approach does not allow building reliable applications on top of those libraries.

    In general, if you are creating a library, remember: in some applications, a null pointer dereference is a vulnerability. You must handle memory allocation errors and duly report the failure to the caller.

    Where is the guarantee that exactly a null pointer will be dereferenced?


    Those who are too lazy to write checks somehow assume that the dereference will hit exactly the null pointer. Yes, it often happens that way. But can a programmer vouch for that across the code of an entire application? Surely not.

    Let me show with practical examples what I mean. Take, for instance, code from the LLVM-subzero library, which is used in Chromium. Honestly, I can only wonder at the connection between the Chromium project and LLVM, but there it is.

    void StringMapImpl::init(unsigned InitSize) {
      assert((InitSize & (InitSize-1)) == 0 &&
             "Init Size must be a power of 2 or zero!");
      NumBuckets = InitSize ? InitSize : 16;
      NumItems = 0;
      NumTombstones = 0;
      TheTable = (StringMapEntryBase **)
                 calloc(NumBuckets+1,
                        sizeof(StringMapEntryBase **) + 
                        sizeof(unsigned));
      // Allocate one extra bucket, set it to look filled
      // so the iterators stop at end.
      TheTable[NumBuckets] = (StringMapEntryBase*)2;
    }

    PVS-Studio Warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'TheTable'. Check lines: 65, 59. stringmap.cpp 65

    Immediately after the memory buffer is allocated, a write to the cell TheTable[NumBuckets] takes place. If the value of the NumBuckets variable is large enough, we corrupt some data with unpredictable consequences. After such corruption there is no point in speculating how the program will run; the most unexpected outcomes may follow.
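
    A minimal guard might look like the sketch below (the helper called on failure is hypothetical; the actual error-reporting policy is up to the LLVM developers):

    TheTable = (StringMapEntryBase **)
               calloc(NumBuckets+1,
                      sizeof(StringMapEntryBase **) +
                      sizeof(unsigned));
    if (TheTable == NULL)
    {
      ReportAllocationFailure();  /* hypothetical helper, not LLVM API */
      return;
    }
    TheTable[NumBuckets] = (StringMapEntryBase*)2;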

    I see similarly dangerous assignments in two more places in this project:

    • V522 CWE-690 There might be dereferencing of a potential null pointer 'Buckets'. Check lines: 219, 217. foldingset.cpp 219
    • V769 CWE-119 The 'NewTableArray' pointer in the 'NewTableArray + NewSize' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 218, 216. stringmap.cpp 218

    So this is not a unique case but a typical situation where data is written not exactly at the null pointer, but at some arbitrary offset from it.

    Let me continue the long-distance debate with Carsten Haitzler. He claims that the developers understand what they are doing when they don't check the result of a malloc call. No, they don't. Take, for example, this code fragment from the EFL library:

    static void
    st_collections_group_parts_part_description_filter_data(void)
    {
      ....
       filter->data_count++;
       array = realloc(filter->data,
         sizeof(Edje_Part_Description_Spec_Filter_Data) *
         filter->data_count);
       array[filter->data_count - 1].name = name;
       array[filter->data_count - 1].value = value;
       filter->data = array;
    }

    PVS-Studio Warning: V522 There might be dereferencing of a potential null pointer 'array'. edje_cc_handlers.c 14249

    Note. I am using an old copy of the EFL Core Libraries source that I have kept since writing the article about that library, so the code or line numbers may no longer match the current state. However, that is not important for the story.

    This is a typical situation: there is not enough room in the buffer to store the data, so the buffer should be enlarged. To grow the buffer, the realloc function is used, and it can return NULL.

    If that happens, a structured exception/signal will not necessarily be raised by the null pointer dereference. Look at these lines:

    array[filter->data_count - 1].name = name;
    array[filter->data_count - 1].value = value;

    If the value of the filter->data_count variable is large enough, the values will be written to some strange address.

    Some data in memory gets corrupted, and the program keeps running. The consequences are, again, unpredictable, and nothing good will come of it.

    I did not study the old EFL Core Libraries report carefully, but this is definitely not the only error of this kind; I noticed at least two more similar places where data is written at some index right after a realloc.
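
    By the way, note that the EFL code above already stores the realloc result in a temporary variable, array, so the old filter->data block is not lost on failure; only the check is missing. The usual safe realloc idiom, as a sketch:

    Edje_Part_Description_Spec_Filter_Data *tmp =
      realloc(filter->data,
              sizeof(Edje_Part_Description_Spec_Filter_Data) *
              filter->data_count);
    if (tmp == NULL)
    {
      /* filter->data is still valid here; report the failure
         and keep the previous contents */
      return;
    }
    tmp[filter->data_count - 1].name = name;
    tmp[filter->data_count - 1].value = value;
    filter->data = tmp;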

    Conclusion

    I ask the question once more: where is the guarantee that exactly the null pointer will be dereferenced? There is no such guarantee. While developing or modifying code, it is impossible to keep the nuance just discussed in mind at all times. You can easily corrupt something in memory while the program keeps running as if nothing had happened.

    The only way to write reliable and correct code is to always check the result returned by the malloc function. Check it and live in peace.

    Where is the guarantee that memset fills memory from beginning to end?


    Someone will probably argue along these lines:
    I understand perfectly well about realloc and everything else written in the article. But I am a professional: when I allocate memory, I immediately fill it with zeros using memset. I use checks where they are really needed. But I will not write an extra check after every malloc.
    In general, filling memory right away after allocating a buffer is a rather strange idea, strange because there is the calloc function. However, people do it very often. You don't have to look far for an example; here is code from the WebRTC library used in Chromium:

    int Resampler::Reset(int inFreq, int outFreq, size_t num_channels) {
      ....
      state1_ = malloc(8 * sizeof(int32_t));
      memset(state1_, 0, 8 * sizeof(int32_t));
      ....
    }

    Memory is allocated, then the buffer is filled with zeros. A very common practice, even though these two lines could be reduced to one using calloc. But that is beside the point.
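
    For reference, the collapsed version, together with the check it should carry, could look like this (the error convention in the sketch is an assumption, not this function's actual contract):

      state1_ = calloc(8, sizeof(int32_t));
      if (state1_ == NULL)
        return -1;    /* assumed error convention for this sketch */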

    The main thing is that even this code is not safe! The memset function is not obliged to start filling memory from the beginning and thereby trigger the null pointer dereference.

    The memset function has the right to start filling the buffer from its end, and if a large buffer was allocated, a lot of useful data may get wiped. Yes, while filling memory, memset will sooner or later reach the write-protected pages, and the operating system will raise a structured exception/signal. But handling them at that point no longer makes sense: a large fragment of memory will already be corrupted, and the program's further behavior will be unpredictable.

    The reader may object that all this is purely theoretical: yes, the memset function could in theory fill the buffer starting from its end, but nobody would actually implement it that way.

    I agree that such an implementation of memset is indeed exotic, and I even asked a question about it on StackOverflow. The answer says:

    The Linux kernel's memset for the SuperH architecture has this property: link.

    Unfortunately, that is code in an assembler dialect I do not know, so I will not presume to discuss it. But there is also an interesting implementation in the C language. Here is the beginning of that function:

    void *memset(void *dest, int c, size_t n)
    {
      unsigned char *s = dest;
      size_t k;
      if (!n) return dest;
      s[0] = c;
      s[n-1] = c;
      ....
    }

    Pay attention to:

    s[0] = c;
    s[n-1] = c;

    Here we return to reason N1: "dereferencing a null pointer is undefined behavior." There is no guarantee that the compiler will not swap the assignments for optimization purposes. If it does, and the argument n has a large value, some byte of memory will be corrupted first, and only then will the null pointer dereference happen.

    Still not convinced? Then how about this implementation:

    void *memset(void *dest, int c, size_t n)
    {
      unsigned char *s = dest;
      size_t k;
      if (!n) return dest;
      s[0] = s[n-1] = c;
      if (n <= 2) return dest;
      ....
    }

    Conclusion

    Even the memset function cannot be trusted. Yes, this is a largely artificial, far-fetched problem; I just wanted to show how many nuances arise if you do not check the pointer. It is simply impossible to keep all of them in mind. So do not try to be clever: carefully check every pointer returned by the malloc function and its kin. That is what will make you a professional.

    Conclusion


    Always check, right away, the pointer returned by the malloc function or a similar function.

    As you can see, the PVS-Studio analyzer does not warn in vain about a missing pointer check after a malloc call. It is impossible to write reliable code without the checks. This is especially important and relevant for library developers.

    I hope you will now take a fresh look at the malloc function, at pointer checks in code, and at the PVS-Studio analyzer's warnings. Don't forget to show this article to your colleagues and start using PVS-Studio. I wish you all fewer bugs.


