Reflections on Null Pointer Dereferencing
It turns out that the question of whether code such as &((T *)0)->x is correct is a surprisingly difficult one. I decided to write a short note about it.
In a recent article about checking the Linux kernel with the PVS-Studio analyzer, I wrote that I had found this code fragment:
static int podhd_try_init(struct usb_interface *interface,
                          struct usb_line6_podhd *podhd)
{
  int err;
  struct usb_line6 *line6 = &podhd->line6;
  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;
  ....
}
In that article I also wrote that, in my opinion, this code is incorrect. The details can be found in the article.
After that, I received emails saying that I was wrong and that this code is completely correct. Many pointed out that if podhd == 0, the code essentially implements the “offsetof” idiom, so nothing bad can happen. Rather than write many individual replies, I decided to answer in a short blog post.
Naturally, I decided to study the topic in more detail. To be honest, I only ended up more confused. So I will not give a definitive answer on whether such code is permissible. I will only provide some links and share my opinion.
When I wrote the article about checking Linux, my reasoning was as follows.
Any dereferencing of a null pointer is undefined behavior. One possible manifestation of undefined behavior is an optimization in which the check (podhd == NULL) is removed. This is the scenario I described in the article.
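The reasoning above can be annotated step by step in a small sketch. The types `struct inner` and `struct outer` and the function `risky_init` are hypothetical stand-ins, not kernel code, and the comments describe what an optimizer would be permitted to do under the reading that forming the member address through a null pointer is undefined; they are not a claim about any particular compiler.

```c
#include <assert.h>  /* assert, used only by the demo function */
#include <errno.h>   /* ENODEV */
#include <stddef.h>  /* NULL */

/* Hypothetical stand-in types; not the kernel's definitions. */
struct inner { int data; };
struct outer { struct inner member; };

int risky_init(struct outer *p)
{
    /* Step 1: the member address is formed before any check.
     * Under the reading that this is undefined for a null p,
     * a compiler is allowed to assume p != NULL from here on. */
    struct inner *in = &p->member;

    /* Step 2: with that assumption the test below is always false,
     * so an optimizer may remove it together with the return. */
    if (p == NULL)
        return -ENODEV;

    in->data = 1;
    return 0;
}

/* Demo with a valid object only; a null argument is deliberately not passed. */
void risky_init_demo(void)
{
    struct outer o = { { 0 } };

    assert(risky_init(&o) == 0);
    assert(o.member.data == 1);
}
```

With a valid pointer the function behaves as expected, which is exactly why testing on a given compiler cannot settle the question.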
In their emails, some developers wrote that they could not reproduce this behavior with their compilers. However, that proves nothing: a program that works as expected is just one of the possible outcomes of undefined behavior.
Some also wrote that the offsetof() macro works in exactly this way:
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
However, this proves nothing either. Such macros are written specifically to work correctly with the compiler they ship with. If we write similar code ourselves, there is no guarantee that it will work.
Moreover, in the macro the compiler clearly sees the literal 0 and can guess what the programmer wants. When 0 is stored in a variable, it is a completely different matter, and the compiler may behave in an unexpected way.
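For comparison, here is a minimal sketch of getting a member offset without writing the null-pointer expression by hand: once through the standard offsetof from <stddef.h>, and once by subtracting addresses inside a real object. The structure `struct sample` and the function names are made up for illustration.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical structure, used only for illustration. */
struct sample {
    char   tag;
    int    value;
    double weight;
};

/* Offset of 'value' computed by the implementation-provided macro. */
size_t offset_of_value(void)
{
    return offsetof(struct sample, value);
}

/* The same offset derived from a real object: no null pointer is involved. */
size_t offset_of_value_from_object(void)
{
    struct sample s;

    return (size_t)((char *)&s.value - (char *)&s);
}

/* Both approaches are expected to agree. */
void check_offsets(void)
{
    assert(offset_of_value() == offset_of_value_from_object());
}
```

Either way, the question of what the compiler does with a null pointer held in a variable never comes up.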
Here's what Wikipedia says about offsetof:
The "traditional" implementation of the macro relied on the compiler being not especially picky about pointers; it obtained the offset of a member by specifying a hypothetical structure that begins at address zero:
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
This works by casting a null pointer into a pointer to structure st, and then obtaining the address of member m within said structure. While this works correctly in many compilers, it has undefined behavior according to the C standard, since it involves a dereference of a null pointer (although, one might argue that no dereferencing takes place, because the whole expression is calculated at compile time). It also tends to produce confusing compiler diagnostics if one of the arguments is misspelled. Some modern compilers (such as GCC) define the macro using a special form instead, eg
#define offsetof(st, m) __builtin_offsetof(st, m)
As you can see, according to Wikipedia, I am right: you cannot write code like this, because it is undefined behavior. Some people on Stack Overflow think so too: "Address of members of a struct via NULL pointer".
However, what confuses me is that, although everyone talks about undefined behavior, nobody gives a precise explanation. For example, the Wikipedia statement itself is marked [citation needed].
Similar questions have been discussed on forums many times, but nowhere have I seen an unambiguous explanation backed by references to the C or C++ standard.
There is also an old discussion of the standard, which did not add clarity either: "232. Is indirection through a null pointer undefined behavior?"
So, for now, I do not have a final answer to this question. Nevertheless, I still think this code is bad and should be refactored.
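One possible refactoring, given only as a sketch, is to move the check ahead of the member access so that the question does not arise at all. The structure definitions below are simplified stand-ins for the kernel types, and `podhd_try_init_checked`, the `ready` field and the demo function are hypothetical names introduced for illustration.

```c
#include <assert.h>  /* assert, used only by the demo function */
#include <errno.h>   /* ENODEV */
#include <stddef.h>  /* NULL */

/* Simplified stand-ins for the kernel types; the real definitions differ. */
struct usb_interface   { int unused; };
struct usb_line6       { int ready; };
struct usb_line6_podhd { struct usb_line6 line6; };

int podhd_try_init_checked(struct usb_interface *interface,
                           struct usb_line6_podhd *podhd)
{
    struct usb_line6 *line6;

    /* Validate both pointers before touching either of them. */
    if ((interface == NULL) || (podhd == NULL))
        return -ENODEV;

    /* podhd is known to be non-null here. */
    line6 = &podhd->line6;
    line6->ready = 1;  /* placeholder for the elided initialization */
    return 0;
}

void podhd_try_init_demo(void)
{
    struct usb_interface intf = { 0 };
    struct usb_line6_podhd podhd = { { 0 } };

    assert(podhd_try_init_checked(NULL, &podhd) == -ENODEV);
    assert(podhd_try_init_checked(&intf, NULL) == -ENODEV);
    assert(podhd_try_init_checked(&intf, &podhd) == 0);
    assert(podhd.line6.ready == 1);
}
```

The change costs one extra line and makes the code correct under either interpretation of the standard.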
If anyone sends me good notes on this topic, I will add them at the end of this article.
UPDATE: The discussion continues here: habrahabr.ru/company/pvs-studio/blog/250701