I don't know C
The purpose of this article is to get everyone, especially C programmers, to say “I don't know C”.
I want to show that the dark corners of C are much closer than they seem, and that even trivial lines of code can carry undefined behavior.
The article is organized as a set of questions. Answers are written in white. Each example is a separate source file.
1.
int i;
int i = 10;
Q: Is this code correct? (Will the fact that the variable is defined twice cause an error? Remember that this is a separate source file, not the body of a function or a compound statement.)
A: Yes, this code is correct. The first line is a tentative definition, which acts as a plain declaration once the compiler has processed the actual definition on the second line.
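As a rough illustration of the rule, here is a sketch of a file-scope object that is declared several times. The names counter and read_counter are invented for this example and are not part of the article.

```c
/* File scope: a declaration without an initializer is a tentative definition. */
int counter;
int counter;        /* repeating a tentative definition is still allowed */
int counter = 10;   /* the single actual definition */

/* Hypothetical helper, present only so the value can be observed. */
int read_counter(void)
{
    return counter;
}
```

Adding a second initialized line such as int counter = 20; would be a redefinition error, and so would repeating the declaration inside a function body, where the object has no linkage.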
2.
extern void bar(void);
void foo(int *x)
{
    int y = *x; /* (1) */
    if(!x) /* (2) */
    {
        return; /* (3) */
    }
    bar();
    return;
}
Q: It turned out that bar() is called even when x is a null pointer (and the program does not crash). Is this an optimizer bug, or is everything correct?
A: Everything is correct. If x is a null pointer, line (1) has undefined behavior, and from then on nothing is owed to the programmer: the program is obliged neither to crash at line (1) nor to return at line (2) if it happens to get past line (1). As for the rules the compiler followed, it went like this. After analyzing line (1), the compiler concludes that x cannot be a null pointer and removes (2) and (3) as unreachable code (dead code elimination). The variable y is removed as unused, and since the type of *x is not volatile-qualified, the read from memory is removed as well.
This is how an unused variable removed the null pointer check.
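One possible repair, sketched under the assumption that the intent was "do nothing for a null pointer", is to test the pointer before it is dereferenced. The name foo_checked and the counting stand-in for bar() are hypothetical; the article only declares bar() as an external function.

```c
#include <stddef.h>

/* Hypothetical stand-in for the external bar(): it only counts its calls. */
int bar_calls = 0;

void bar(void)
{
    bar_calls++;
}

/* Same logic as foo(), but the null check comes before the dereference. */
void foo_checked(int *x)
{
    int y;

    if(!x)
    {
        return;
    }
    y = *x;     /* x is known to be non-null here */
    (void)y;    /* y is still unused, as in the original */
    bar();
}
```

With this ordering the compiler has no dereference from which to infer that x is non-null, so the check cannot be removed as unreachable.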
3.
Consider this function:
#define ZP_COUNT 10
void func_original(int *xp, int *yp, int *zp)
{
    int i;
    for(i = 0; i < ZP_COUNT; i++)
    {
        *zp++ = *xp + *yp;
    }
}
Someone wanted to optimize it like this:
void func_optimized(int *xp, int *yp, int *zp)
{
    int tmp = *xp + *yp;
    int i;
    for(i = 0; i < ZP_COUNT; i++)
    {
        *zp++ = tmp;
    }
}
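Q: Is it possible to call the original and the optimized function so that they leave different results in zp?
A: Yes, let yp == zp. The first store through zp then changes *yp, so the original function adds a different value on later iterations, while the optimized one keeps using the sum computed before the loop.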
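The difference can be observed with a small sketch. The two functions are repeated from the article so the example is self-contained; the helper aliased_results_differ and the input values are assumptions made for this illustration.

```c
#define ZP_COUNT 10

/* Repeated from the article: recomputes the sum on every iteration. */
void func_original(int *xp, int *yp, int *zp)
{
    int i;
    for(i = 0; i < ZP_COUNT; i++)
    {
        *zp++ = *xp + *yp;
    }
}

/* Repeated from the article: computes the sum once, before the loop. */
void func_optimized(int *xp, int *yp, int *zp)
{
    int tmp = *xp + *yp;
    int i;
    for(i = 0; i < ZP_COUNT; i++)
    {
        *zp++ = tmp;
    }
}

/* Hypothetical helper: calls both versions with yp == zp and returns 1
   if the output arrays differ, 0 otherwise. */
int aliased_results_differ(void)
{
    int x = 1;
    int a[ZP_COUNT] = { 1 };    /* a[0] is 1, the remaining elements are 0 */
    int b[ZP_COUNT] = { 1 };
    int i;

    func_original(&x, a, a);    /* yp and zp point to the same array */
    func_optimized(&x, b, b);

    for(i = 0; i < ZP_COUNT; i++)
    {
        if(a[i] != b[i])
        {
            return 1;
        }
    }
    return 0;
}
```

With these inputs the original version stores 2 in the first element and 3 in the rest, while the optimized version stores 2 everywhere.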
4.
#include <assert.h>
double f(double x)
{
    assert(x != 0.);
    return 1. / x;
}
Q: Can this function return inf (infinity)? Assume that floating-point numbers follow IEEE 754 (as on the vast majority of machines) and that assert is enabled (NDEBUG is not defined).
A: Yes. It is enough to pass a denormalized x, for example 1e-309: it is not equal to zero, but 1. / x exceeds the largest finite double and overflows to infinity.
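If a finite result is required, one option is to check the quotient rather than the divisor. The following is only a sketch; checked_reciprocal is a hypothetical name, and it assumes the result is really rounded to double (no excess precision).

```c
#include <float.h>

/* Hypothetical helper: stores 1/x in *out and returns 1 when the quotient
   is a finite double; returns 0 for x == 0, for an infinite quotient and
   for NaN. */
int checked_reciprocal(double x, double *out)
{
    double r;

    if(x == 0.)
    {
        return 0;
    }
    r = 1. / x;
    /* The range test is false for +inf, -inf and NaN. */
    if(!(r >= -DBL_MAX && r <= DBL_MAX))
    {
        return 0;
    }
    *out = r;
    return 1;
}
```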
5.
int my_strlen(const char *x)
{
    int res = 0;
    while(*x)
    {
        res++;
        x++;
    }
    return res;
}
Q: The function above should return the length of a null-terminated string. Find the mistake.
A: Using int to store the size of an object is a mistake: int is not guaranteed to be able to hold the size of every object. Use size_t instead.
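A corrected version might look like the sketch below. The name my_strlen_fixed is made up here so that it does not clash with the function above.

```c
#include <stddef.h>

/* Same loop as my_strlen(), but the counter and the return type are size_t. */
size_t my_strlen_fixed(const char *x)
{
    size_t res = 0;
    while(*x)
    {
        res++;
        x++;
    }
    return res;
}
```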
6.
#include <stdio.h>
#include <string.h>
int main(void)
{
    const char *str = "hello";
    size_t length = strlen(str);
    size_t i;
    for(i = length - 1; i >= 0; i--)
    {
        putchar(str[i]);
    }
    putchar('\n');
    return 0;
}
Q: The loop never terminates. Why?
A: size_t is an unsigned type. Since i is unsigned, i >= 0 is always true.
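One common way to count down with an unsigned index is to compare before decrementing. The sketch below writes the reversed string into a buffer instead of printing it; reverse_into is a hypothetical helper, and the caller is assumed to provide enough room.

```c
#include <string.h>

/* Hypothetical helper: writes str reversed into out, which must have room
   for strlen(str) + 1 characters. */
void reverse_into(const char *str, char *out)
{
    size_t length = strlen(str);
    size_t i;
    size_t n = 0;

    /* i is compared with 0 before it is decremented, so the body only
       ever sees the values length - 1 down to 0. */
    for(i = length; i-- > 0; )
    {
        out[n++] = str[i];
    }
    out[n] = '\0';
}
```

This form also handles the empty string, for which length - 1 in the original loop would already wrap around.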
7.
#include <stdio.h>
void f(int *i, long *l)
{
    printf("1. v=%ld\n", *l); /* (1) */
    *i = 11; /* (2) */
    printf("2. v=%ld\n", *l); /* (3) */
}
int main(void)
{
    long a = 10;
    f((int *) &a, &a);
    printf("3. v=%ld\n", a);
    return 0;
}
This program was compiled with two different compilers and run on a little-endian machine. It produced two different results:
First result:
1. v=10
2. v=11
3. v=11
Second result:
1. v=10
2. v=10
3. v=11
Q: How can the second result be explained?
A: The program has undefined behavior: it violates the strict aliasing rules. Line (2) modifies an int, so the compiler may assume that no long has changed. (An object must not be accessed through a pointer to an incompatible type.) The compiler can therefore pass to printf in line (3) the same long value that it read for line (1).
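When the bytes of one object really have to be overwritten through another type, copying them with memcpy is a commonly used alternative that does not break the aliasing rules. The helper names below are hypothetical, and the sketch assumes that long has no padding bits.

```c
#include <string.h>

/* Hypothetical helper: overwrites the first sizeof(int) bytes of *l
   with the bytes of v. */
void store_int_bytes(long *l, int v)
{
    memcpy(l, &v, sizeof v);
}

/* Hypothetical helper: reads the first sizeof(int) bytes of *l back
   as an int. */
int load_int_bytes(const long *l)
{
    int v;
    memcpy(&v, l, sizeof v);
    return v;
}
```

Because memcpy accesses the object as bytes, the compiler has to assume that the long may have changed and must reload it afterwards.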
8.
#include <stdio.h>
int main(void)
{
    int array[] = { 0, 1, 2 };
    printf("%d %d %d\n", 10, (5, array[1, 2]), 10);
}
Q: Is this code correct? If there is no undefined behavior, what does it print?
A: Yes, it is correct; the comma operator is used here. The left operand of the comma operator is evaluated first and its value discarded, then the right operand is evaluated and becomes the value of the whole expression. Output: 10 2 10.
Note that the comma in a function call (for example, f(a(), b())) is not the comma operator and therefore does not guarantee the order of evaluation: a() and b() can be called in either order.
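The ordering guarantee of the comma operator can be made visible with a small sketch. The call log and the functions call_a, call_b and comma_value are invented for this illustration.

```c
/* Hypothetical call log: records the order in which the functions run. */
int call_log[2];
int call_count = 0;

int call_a(void)
{
    call_log[call_count++] = 1;
    return 10;
}

int call_b(void)
{
    call_log[call_count++] = 2;
    return 20;
}

/* Comma operator: call_a() is evaluated first and its value discarded,
   then call_b() is evaluated and its value returned. */
int comma_value(void)
{
    call_count = 0;
    return (call_a(), call_b());
}
```

When function arguments must be evaluated in a particular order, one option is to store each result in a local variable in a separate statement and pass the variables.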
9.
unsigned int add(unsigned int a, unsigned int b)
{
    return a + b;
}
Q: What is the result of add(UINT_MAX, 1)?
A: Unsigned overflow is well defined: the result is computed modulo UINT_MAX + 1, which is 2 ^ (CHAR_BIT * sizeof (unsigned int)) when unsigned int has no padding bits. The result is 0.
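Since the wrapped result is well defined, a caller that cares about wraparound has to detect it explicitly. A minimal sketch, with the hypothetical name wrapping_add:

```c
#include <limits.h>

/* Hypothetical helper: returns a + b reduced modulo UINT_MAX + 1 and sets
   *wrapped to 1 if the mathematical sum did not fit, 0 otherwise. */
unsigned int wrapping_add(unsigned int a, unsigned int b, int *wrapped)
{
    *wrapped = b > UINT_MAX - a;    /* this subtraction cannot wrap */
    return a + b;                   /* well defined even when it wraps */
}
```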
10.
int add(int a, int b)
{
    return a + b;
}
Q: What is the result of add(INT_MAX, 1)?
A: Signed integer overflow is undefined behavior.
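For signed operands the overflow has to be ruled out before the addition is performed, because the addition itself is what triggers the undefined behavior. One way to do this is sketched below; checked_add is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *out and returns 1 when the sum is
   representable as int; returns 0 without adding otherwise. */
int checked_add(int a, int b, int *out)
{
    if((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    {
        return 0;
    }
    *out = a + b;
    return 1;
}
```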
11.
int neg(int a)
{
    return -a;
}
Q: Is undefined behavior possible here? If so, with what arguments?
A: Yes: neg(INT_MIN). If the machine represents negative numbers in two's complement (as the vast majority of machines do), the absolute value of INT_MIN is one greater than INT_MAX. In that case -INT_MIN overflows a signed integer, which is undefined behavior.
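A guarded negation can be sketched without assuming a particular representation, by comparing against -INT_MAX. The name checked_neg is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: stores -a in *out and returns 1 when the result is
   representable as int; returns 0 otherwise. */
int checked_neg(int a, int *out)
{
    /* a < -INT_MAX can only be true when INT_MIN == -INT_MAX - 1,
       which is the two's complement case. */
    if(a < -INT_MAX)
    {
        return 0;
    }
    *out = -a;
    return 1;
}
```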
12.
#include <assert.h>
int div(int a, int b)
{
    assert(b != 0);
    return a / b;
}
Q: Is undefined behavior possible here? If so, with what arguments?
A: Yes. If the machine represents negative numbers in two's complement, div(INT_MIN, -1) overflows; see the previous question.
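Both problematic cases, a zero divisor and INT_MIN divided by -1, can be rejected up front. The following is a sketch only; checked_div is a hypothetical name chosen to avoid the article's div.

```c
#include <limits.h>

/* Hypothetical helper: stores a / b in *out and returns 1 when the quotient
   is defined and representable as int; returns 0 otherwise. */
int checked_div(int a, int b, int *out)
{
    if(b == 0)
    {
        return 0;   /* division by zero */
    }
    if(b == -1 && a < -INT_MAX)
    {
        return 0;   /* INT_MIN / -1 does not fit in int */
    }
    *out = a / b;
    return 1;
}
```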
- Dmitri Gribenko

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.