Null pointer dereferencing leads to undefined behavior

    I inadvertently started a big debate about whether the expression &P->m_foo is permissible in C/C++ when P is a null pointer. Programmers split into two camps: some argued confidently that you can't write this, others argued just as confidently that you can. Various arguments and references were offered, and I realized the question had to be settled once and for all. To do that, I turned to Microsoft MVP experts and Visual C++ developers who communicate through a private mailing list. They helped prepare this article, and I now present it to everyone. For the impatient: this code is not correct.

    A recap of the discussion


    It all started with an article about checking the Linux kernel with the PVS-Studio analyzer. The kernel check itself is beside the point, though. What matters is that in that article I cited the following fragment of Linux code:
    static int podhd_try_init(struct usb_interface *interface,
            struct usb_line6_podhd *podhd)
    {
      int err;
      struct usb_line6 *line6 = &podhd->line6;
      if ((interface == NULL) || (podhd == NULL))
        return -ENODEV;
      ....
    }

    I called this code dangerous because I believed it contained undefined behavior.

    This drew many objections from readers, and at one point I was even ready to give in to the persuasive arguments in their emails and comments. For example, as proof that the code is correct, readers pointed to the offsetof macro, which is often implemented as follows:
    #define offsetof(st, m) ((size_t)(&((st *)0)->m))

    Here a null pointer is dereferenced too, yet the code works. Other emails argued that since nothing is actually accessed through the null pointer, there is no problem.

    I may be gullible, but I still try to verify what I am told. I started digging into the topic and ended up writing a short article: “Reflections on the dereferencing of a null pointer”.

    It turned out that I was right: you can't write code like that. However, I could not back up my position conclusively with the necessary references to the standard.

    After that article the objections kept coming, and I realized I had to get to the bottom of the matter. I put the question to the experts to get their opinion. This article is a summary of their answers.

    About the C language


    The expression '&podhd->line6' is undefined behavior in C if 'podhd' is a null pointer.

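    Here is what the C99 standard says about the address-of operator '&' (Section 6.5.3.2, “Address and indirection operators”):

    The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.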

    The expression 'podhd->line6' is clearly neither a function designator nor the result of a [] or * operator. It is simply an lvalue expression. However, when the 'podhd' pointer is null, the expression does not designate an object, because Section 6.3.2 says:

    If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

    If an lvalue does not designate an object when it is evaluated, the behavior is undefined (C99 standard, Section 6.3.2.1, “Lvalues, arrays, and function designators”):

    An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

    In short:

    When the -> operator is applied to a null pointer, the result is an lvalue that designates no object, and so the behavior is undefined.
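
    To make the distinction concrete, here is a minimal sketch. It assumes cut-down stand-in structures (the real kernel types are not shown in this article, and the `flags` and `ifnum` members and both function names are invented for illustration). With a valid pointer, 'podhd->line6' designates a sub-object and '&' simply yields its address. The null case is described only in a comment and is never executed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; every member name
   except line6 is invented for illustration. */
struct usb_line6 { int ifnum; };
struct usb_line6_podhd {
    int flags;                  /* invented member placed before line6 */
    struct usb_line6 line6;
};

/* Returns 1 when &podhd->line6 lies offsetof(..., line6) bytes past the
   start of the object. podhd must point to a real object. */
static int member_address_matches(struct usb_line6_podhd *podhd)
{
    struct usb_line6 *line6 = &podhd->line6;  /* fine: an object exists */
    return (char *)line6 ==
           (char *)podhd + offsetof(struct usb_line6_podhd, line6);
}

/* Exercises the valid case only. Passing NULL to the function above
   would evaluate an lvalue that designates no object, which is
   undefined behavior, so it is deliberately not done. */
static void demo_valid_pointer(void)
{
    struct usb_line6_podhd dev = { 0, { 0 } };
    assert(member_address_matches(&dev));
}
```

    The point of the sketch is that the address computation is only meaningful relative to an existing object; nothing in the standard assigns a meaning to it when the base pointer is null.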

    About C++


    In C++ the situation is exactly the same. The expression '&podhd->line6' is undefined behavior in C++ if 'podhd' is a null pointer.

    The WG21 discussion (232. Is indirection through a null pointer undefined behavior?), which I referred to in the previous article, causes some confusion. The participants insist that such an expression is not undefined behavior. However, no one has found any rule in the C++ standard that would permit 'podhd->line6' when 'podhd' is a null pointer.

    The 'podhd' pointer violates the basic requirement (Section 5.2.5/4, second list item) that it must point to an object. No object in C++ can have nullptr as its address.

    Summary


    struct usb_line6 *line6 = &podhd->line6;

    This code is incorrect in both C and C++ if the podhd pointer is null: in that case the behavior is undefined.

    If the program happens to work, that is pure luck. Undefined behavior can manifest itself in any way at all, including the program doing exactly what the programmer intended. That is just one of the possible outcomes, nothing more.

    You can't write code like that. The pointer must be checked before it is dereferenced.
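
    One way to follow this advice is simply to move the address computation below the check. The sketch below is an illustration under assumptions, not the actual kernel fix: the structures are cut-down stand-ins, the function name podhd_try_init_checked is hypothetical, and the body after the check is invented because the original snippet elides it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel types are much larger and
   the minor and ifnum members are invented for illustration. */
struct usb_interface { int minor; };
struct usb_line6 { int ifnum; };
struct usb_line6_podhd { struct usb_line6 line6; };

/* Check-first ordering: no expression involving podhd-> is evaluated
   until podhd is known to be non-null. */
static int podhd_try_init_checked(struct usb_interface *interface,
                                  struct usb_line6_podhd *podhd)
{
    struct usb_line6 *line6;

    if (interface == NULL || podhd == NULL)
        return -ENODEV;

    line6 = &podhd->line6;            /* safe: podhd points to an object */
    line6->ifnum = interface->minor;  /* invented body for illustration */
    return 0;
}

/* Exercises both the rejected and the accepted paths. */
static void demo_checked_init(void)
{
    struct usb_interface intf = { 3 };
    struct usb_line6_podhd dev = { { 0 } };

    assert(podhd_try_init_checked(NULL, &dev) == -ENODEV);
    assert(podhd_try_init_checked(&intf, NULL) == -ENODEV);
    assert(podhd_try_init_checked(&intf, &dev) == 0);
    assert(dev.line6.ifnum == 3);
}
```

    With this ordering the null check can never be treated as redundant, because nothing before it relies on podhd being valid.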

    Additional notes


    • When considering the idiomatic implementation of offsetof(), keep in mind that a compiler implementation is allowed to use what would otherwise be non-portable techniques to implement this functionality. The fact that a compiler's library uses a null pointer constant in its implementation of offsetof() does not mean that user code can safely use '&podhd->line6' when podhd is a null pointer.
    • GCC can, and does, optimize on the assumption that undefined behavior never occurs, and in such cases it removes null pointer checks. This is why the kernel is compiled with a set of options that tell the compiler not to do this. The experts point to the article “What Every C Programmer Should Know About Undefined Behavior #2/3” as an example.
    • You may also be interested to know that a null pointer was used in a similar way in a kernel exploit involving the TUN/TAP driver. Details are in the article “Fun with NULL pointers”. Some may decide that the two examples have little in common because of a significant difference: in the TUN/TAP driver bug, rather than merely taking the address of a structure field through a null pointer, the code explicitly read the field's value to initialize a variable. Nevertheless, from the point of view of the C standard, taking the address of a field through a null pointer is also undefined behavior.
    • Is there any situation where P == nullptr, we write &P->m_foo, and everything is fine? Yes, for example, when the expression is the operand of the sizeof operator: sizeof(&P->m_foo).
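
    Two of the notes above can be illustrated with a short sketch in C. It assumes a hypothetical structure foo_holder and hypothetical helper names. The first helper relies on sizeof not evaluating its operand (the operand here is not a variable-length array), so the null pointer P is never used at run time. The second shows that user code can get a member offset from the standard offsetof macro in <stddef.h> instead of a hand-written ((st *)0)->m expression.

```c
#include <assert.h>
#include <stddef.h>

struct foo_holder {             /* hypothetical type for illustration */
    char tag;
    double m_foo;
};

/* sizeof only inspects the type of its operand here, so &P->m_foo is
   never evaluated; the result is the size of a pointer to double. */
static size_t size_of_member_address(void)
{
    struct foo_holder *P = NULL;
    return sizeof(&P->m_foo);
}

/* The standard macro is the supported way to obtain a member offset;
   how the implementation computes it is its own business. */
static size_t offset_of_m_foo(void)
{
    return offsetof(struct foo_holder, m_foo);
}

/* Checks the properties that hold on any conforming implementation. */
static void demo_sizeof_and_offsetof(void)
{
    assert(size_of_member_address() == sizeof(double *));
    assert(offsetof(struct foo_holder, tag) == 0);  /* first member */
    assert(offset_of_m_foo() >= sizeof(char));      /* m_foo follows tag */
}
```

    Neither helper ever forms an address through a null pointer at run time, which is what separates these cases from the '&podhd->line6' code discussed earlier.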

    Acknowledgments


    In preparing this article I was helped by experts whose competence there is no reason to doubt. I am grateful to the following people for their help:
    • Michael Burr is an ardent C/C++ enthusiast who specializes in systems-level and embedded software, including Windows services, networking, and device drivers. He is active in the StackOverflow community, answering questions about C and C++ (and occasionally some simple questions about C#). He has received 6 Microsoft MVP awards for Visual C++.
    • Billy O'Neal is (primarily) a C++ software developer and an active member of the StackOverflow community. He is a software development engineer on the Trustworthy Computing team at Microsoft. Before that, he worked at several security software companies, including Malwarebytes and PreEmptive Solutions.
    • Giovanni Dicanio is a programmer specializing in Windows development. He has written articles for programmers on C++, OpenGL, and other topics for a number of Italian computer magazines, and has contributed code to several open source projects. Giovanni helps colleagues with C and C++ programming problems on the Microsoft MSDN forums and, more recently, on StackOverflow. He has received 8 Microsoft MVP awards for Visual C++.
    • Gabriel Dos Reis is a principal software engineer at Microsoft. He is also a researcher and a longtime member of the C++ community. His research interests include tools for developing reliable software. Before joining Microsoft, he was a senior lecturer at Texas A&M University. In 2012 Dr. Dos Reis received the National Science Foundation CAREER Award for his research on compilers for dependable computational mathematics and for his educational activities. He is a member of the C++ standardization committee.

    References


    1. Wikipedia. Undefined behavior.
    2. A Guide to Undefined Behavior in C and C++. Parts 1, 2, 3.
    3. Wikipedia. offsetof.
    4. LLVM Blog. What Every C Programmer Should Know About Undefined Behavior #2/3.
    5. LWN. Fun with NULL pointers. Parts 1, 2.
