gribozavr January 15, 2012 at 23:28

I don't know C

The purpose of this article is to get everyone, especially C programmers, to say “I don't know C”.
I want to show that dark corners in C are much closer than it seems and even trivial lines of code carry undefined behavior.

The article is organized as a set of questions, each followed by its answer. Every example is a separate source code file.

1.

int i;
int i = 10;

Q: Is this code correct? (Will the compiler report an error because the variable is defined twice? Remember that this is a separate source code file, not the body of a function or a compound statement.)

A: Yes, this code is correct. The first line is a tentative definition, which is treated as a declaration once the compiler has processed the definition (the second line).
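
A minimal sketch of the same rule at file scope. The function get_i is a hypothetical helper added only so the value can be observed, and the third line is an extra tentative definition that is not part of the original example.

```c
/* File scope: tentative definitions may surround one real definition. */
int i;        /* tentative definition */
int i = 10;   /* the actual definition, with an initializer */
int i;        /* another tentative definition, still accepted in C */

/* Hypothetical helper, added only to observe the value of i. */
int get_i(void)
{
  return i;
}
```

This applies to C only: C++ has no tentative definitions and rejects the second line as a redefinition.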

2.

extern void bar(void);
void foo(int *x)
{
  int y = *x;  /* (1) */
  if(!x)       /* (2) */
  {
    return;    /* (3) */
  }
  bar();
  return;
}

Q: It turned out that bar() is called even when x is a null pointer (and the program does not crash). Is this an optimizer bug, or is everything correct?

A: Everything is correct. If x is a null pointer, line (1) has undefined behavior, and from then on the compiler owes the programmer nothing: the program is obliged neither to crash at line (1) nor to return at line (3) once line (1) has been executed. As for the rules the compiler followed, it went like this. After analyzing line (1), the compiler concludes that x cannot be a null pointer and removes (2) and (3) as unreachable code (dead code elimination). The variable y is removed as unused, and since the type of *x is not volatile-qualified, the read from memory is removed as well.

This is how an unused variable removed the check for a null pointer.
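
One way to keep the check alive is to test the pointer before the first dereference. The sketch below assumes a stand-in bar() that only counts its calls; foo_checked and bar_calls are hypothetical names, not part of the original code.

```c
#include <stddef.h>

/* Stand-in for the external bar() from the question; it only counts calls. */
int bar_calls = 0;
void bar(void)
{
  bar_calls++;
}

void foo_checked(const int *x)
{
  if(x == NULL)    /* test the pointer before any dereference */
  {
    return;
  }
  int y = *x;      /* x is known to be non-null here */
  (void)y;         /* y is still unused, as in the original */
  bar();
}
```

With this ordering the compiler has no earlier dereference from which to infer that x is non-null, so the early return has to stay.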

3.
There was a function like this:

#define ZP_COUNT 10
void func_original(int *xp, int *yp, int *zp)
{
  int i;
  for(i = 0; i < ZP_COUNT; i++)
  {
    *zp++ = *xp + *yp;
  }
}

The plan was to optimize it like this:

void func_optimized(int *xp, int *yp, int *zp)
{
  int tmp = *xp + *yp;
  int i;
  for(i = 0; i < ZP_COUNT; i++)
  {
    *zp++ = tmp;
  }
}

Q: Is it possible to call the original and the optimized function so that they produce different results in zp?

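A: Yes, let yp == zp. The first store through zp then changes *yp, so the original function reads a new *yp on the next iteration, while the optimized one keeps using the sum computed before the loop.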
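
The difference can be observed with a small driver. This is only a sketch: second_element_when_aliased is a hypothetical helper, and the input values (x = 1, z[0] = 1) are chosen arbitrarily for illustration.

```c
#define ZP_COUNT 10

void func_original(int *xp, int *yp, int *zp)
{
  int i;
  for(i = 0; i < ZP_COUNT; i++)
  {
    *zp++ = *xp + *yp;   /* re-reads *yp on every iteration */
  }
}

void func_optimized(int *xp, int *yp, int *zp)
{
  int tmp = *xp + *yp;   /* reads *yp only once */
  int i;
  for(i = 0; i < ZP_COUNT; i++)
  {
    *zp++ = tmp;
  }
}

/* Hypothetical helper: calls fn with yp == zp and returns z[1]. */
int second_element_when_aliased(void (*fn)(int *, int *, int *))
{
  int x = 1;
  int z[ZP_COUNT] = { 1 };   /* z[0] is 1, the rest are 0 */
  fn(&x, z, z);              /* yp and zp point to the same int */
  return z[1];
}
```

With func_original the first iteration sets z[0] to 2, so z[1] becomes 1 + 2 = 3; with func_optimized every element is 2. This is why a compiler cannot perform this transformation on its own unless it can prove that the pointers do not overlap.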

4.

#include <assert.h>
double f(double x)
{
  assert(x != 0.);
  return 1. / x;
}

Q: Can this function return inf (infinity)? Assume that floating-point numbers follow IEEE 754 (true for the vast majority of machines) and that assert is enabled (NDEBUG is not defined).

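A: Yes. It is enough to pass a denormalized (subnormal) x, for example 1e-309. Such an x is not equal to zero, so the assert passes, but 1 / x is larger than the largest finite double and the result is infinity.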
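
A short sketch that makes the effect observable, assuming double is the IEEE 754 64-bit format and subnormal numbers are not flushed to zero. The helper reciprocal_overflows is a hypothetical name introduced here.

```c
#include <assert.h>
#include <float.h>

double f(double x)
{
  assert(x != 0.);
  return 1. / x;
}

/* Hypothetical helper: nonzero if f(x) is larger than every finite double. */
int reciprocal_overflows(double x)
{
  return f(x) > DBL_MAX;
}
```

Under those assumptions reciprocal_overflows(1e-309) is nonzero, while reciprocal_overflows(1e-300) is zero because 1e300 is still representable.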

5.

int my_strlen(const char *x)
{
  int res = 0;
  while(*x)
  {
    res++;
    x++;
  }
  return res;
}

Q: The above function should return the length of a null-terminated string. Find the mistake.

A: Using int to store the size of an object is a mistake: int is not guaranteed to be able to hold the size of any object. Use size_t instead.
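
A corrected version might look like the sketch below; the name my_strlen_fixed is hypothetical and is used only to distinguish it from the original.

```c
#include <stddef.h>

size_t my_strlen_fixed(const char *x)
{
  size_t res = 0;   /* size_t can hold the size of any object */
  while(*x)
  {
    res++;
    x++;
  }
  return res;
}
```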

6.

#include <stdio.h>
#include <string.h>
int main()
{
  const char *str = "hello";
  size_t length = strlen(str);
  size_t i;
  for(i = length - 1; i >= 0; i--)
  {
    putchar(str[i]);
  }
  putchar('\n');
  return 0;
}

Q: This loop never terminates. Why?

A: size_t is an unsigned type. Since i is unsigned, i >= 0 is always true.
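
One common way to count down with an unsigned index is to test before decrementing. The sketch below writes the reversed string into a buffer instead of printing it; reverse_into is a hypothetical helper, and the caller is assumed to provide at least strlen(str) + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: writes str reversed into out. */
void reverse_into(const char *str, char *out)
{
  size_t length = strlen(str);
  size_t i;
  size_t n = 0;
  for(i = length; i-- > 0; )   /* compares the old i with 0, then decrements */
  {
    out[n++] = str[i];         /* i runs from length - 1 down to 0 */
  }
  out[n] = '\0';
}
```

This form also handles the empty string, where the original length - 1 would already have wrapped around.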

7.

#include <stdio.h>
void f(int *i, long *l)
{
  printf("1. v=%ld\n", *l); /* (1) */
  *i = 11;                  /* (2) */
  printf("2. v=%ld\n", *l); /* (3) */
}
int main()
{
  long a = 10;
  f((int *) &a, &a);
  printf("3. v=%ld\n", a);
  return 0;
}

This program was compiled with two different compilers and run on a little-endian machine. The two builds produced different results:

1. v=10    2. v=11    3. v=11
1. v=10    2. v=10    3. v=11

Q: How to explain the second result?

A: This program has undefined behavior: it violates the strict aliasing rules. Line (2) modifies an int, so the compiler may assume that no long has changed. (An object must not be accessed through a pointer to an incompatible type.) The compiler is therefore free to pass to printf in line (3) the same long value that it read for line (1).
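
If the intent really is to overwrite part of a long with the bytes of an int, copying the bytes with memcpy avoids the aliasing violation. This is a sketch of that alternative, not the author's code; store_int_into_long is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: overwrites the first sizeof(int) bytes of *l with v. */
void store_int_into_long(long *l, int v)
{
  memcpy(l, &v, sizeof v);   /* byte copy: the long is never accessed as an int */
}
```

Which numeric value the long holds afterwards still depends on byte order and on the sizes of int and long, but the compiler can no longer assume that the long is unchanged.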

8.

#include <stdio.h>
int main()
{
  int array[] = { 0, 1, 2 };
  printf("%d %d %d\n", 10, (5, array[1, 2]), 10);
}

Q: Is this code correct? If there is no undefined behavior, what does it print?

A: Yes, the comma operator is used here. The left operand of the comma is evaluated first and its value is discarded; then the right operand is evaluated and becomes the value of the whole expression. Output: 10 2 10.

Note that the comma in a function call (for example, f(a(), b())) is not the comma operator and therefore does not guarantee evaluation order: a() and b() can be called in any order.
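
The left-to-right guarantee of the comma operator can be made visible with a side effect. In this sketch calls, next_call, comma_value and comma_index are all hypothetical names introduced for illustration.

```c
/* Hypothetical counter used to make evaluation order visible. */
int calls = 0;
int next_call(void)
{
  return ++calls;
}

int comma_value(void)
{
  /* Comma operator: the left call runs first and its result is discarded. */
  return (next_call(), next_call());
}

int comma_index(const int *array)
{
  /* Inside [] the comma is also the operator, so the index is 2. */
  return array[next_call(), 2];
}
```

Starting from calls == 0, comma_value() returns 2 because the second call is the one whose value is kept.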

9.

unsigned int add(unsigned int a, unsigned int b)
{
  return a + b;
}

Q: What is the result of add(UINT_MAX, 1)?

A: Unsigned arithmetic is well defined: the result is computed modulo UINT_MAX + 1, which equals 2^(CHAR_BIT * sizeof(unsigned int)) when unsigned int has no padding bits. The result is 0.
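
Because the wraparound is defined, it can also be detected after the fact. A minimal sketch, where wrap_demo and add_wrapped are hypothetical helpers:

```c
#include <limits.h>

unsigned int add(unsigned int a, unsigned int b)
{
  return a + b;
}

/* The call from the question. */
unsigned int wrap_demo(void)
{
  return add(UINT_MAX, 1u);
}

/* Hypothetical helper: nonzero if a + b wrapped around. */
int add_wrapped(unsigned int a, unsigned int b)
{
  return a + b < a;   /* a wrapped sum is smaller than either operand */
}
```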

10.

int add(int a, int b)
{
  return a + b;
}

Q: What is the result of add(INT_MAX, 1)?

A: Signed integer overflow is undefined behavior.
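
Unlike the unsigned case, the check has to happen before the addition, since the overflowing operation itself is what is undefined. One possible sketch, with checked_add as a hypothetical helper:

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *out and returns 1,
   or returns 0 without adding if the sum would not fit in int. */
int checked_add(int a, int b, int *out)
{
  if((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
  {
    return 0;
  }
  *out = a + b;   /* the range was checked above */
  return 1;
}
```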

11.

int neg(int a)
{
  return -a;
}

Q: Is undefined behavior possible here? If so, with what arguments?

A: Yes, neg(INT_MIN). If the machine represents negative numbers in two's complement (as the vast majority of machines do), the absolute value of INT_MIN is one greater than INT_MAX. In that case -INT_MIN causes signed overflow, which is undefined behavior.
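
A guarded negation could be sketched as follows; checked_neg is a hypothetical helper. The comparison against -INT_MAX is written so that it does not itself depend on the representation of negative numbers.

```c
#include <limits.h>

/* Hypothetical helper: stores -a in *out and returns 1,
   or returns 0 if -a is not representable. */
int checked_neg(int a, int *out)
{
  if(a < -INT_MAX)   /* true only for INT_MIN on two's complement */
  {
    return 0;
  }
  *out = -a;
  return 1;
}
```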

12.

#include <assert.h>
int div(int a, int b)
{
  assert(b != 0);
  return a / b;
}

Q: Is undefined behavior possible here? If so, with what arguments?

A: If the machine represents negative numbers in two's complement, then div(INT_MIN, -1): the quotient would be -INT_MIN, which overflows (see the previous question).
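
Both problematic cases can be rejected up front. This is a sketch only; checked_div is a hypothetical helper that returns a status instead of asserting.

```c
#include <limits.h>

/* Hypothetical helper: stores a / b in *out and returns 1,
   or returns 0 if the division is undefined. */
int checked_div(int a, int b, int *out)
{
  if(b == 0 || (b == -1 && a < -INT_MAX))
  {
    return 0;      /* division by zero, or INT_MIN / -1 on two's complement */
  }
  *out = a / b;
  return 1;
}
```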

- Dmitri Gribenko

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
