What Every C Programmer Should Know About Undefined Behavior. Part 2/3

Original author: Chris Lattner
Part 1
Part 2
Part 3

In the first part of our series, we discussed what undefined behavior is and how it allows C and C++ compilers to generate higher-performance applications than "safe" languages. In this post, we will discuss just how "unsafe" C really is, looking at some completely unexpected effects caused by undefined behavior. In the third part, we will discuss how "friendly" compilers can mitigate some of these effects, even though they are not required to.

I like to call this part "Why undefined behavior often scares and bites C programmers."


Interaction between compiler optimizations leads to unexpected results


A modern optimizing compiler contains many optimizations that run in a particular order, sometimes several times, and this order may change as the compiler evolves (for example, when new releases come out).

Different compilers also have substantially different optimizers. Because optimizations run as a sequence of passes that each transform the code, surprising effects can emerge when earlier passes have already changed the code that later passes see.

To make this concrete, consider this silly example (simplified from a real bug in the Linux kernel):

void contains_null_check(int *P) {
  int dead = *P;
  if (P == 0)
    return;
  *P = 4;
}

In this example, the code "clearly" checks for a null pointer. If the compiler happens to run the "Dead Code Elimination" (DCE) pass before the "Redundant Null Check Elimination" (RNCE) pass, the function is transformed in two steps:

void contains_null_check_after_DCE(int *P) {
  //int dead = *P;     // deleted by the optimizer.
  if (P == 0)
    return;
  *P = 4;
}

and then:

void contains_null_check_after_DCE_and_RNCE(int *P) {
  if (P == 0)   // Null check not redundant, and is kept.
    return;
  *P = 4;
}

However, the optimizer may be structured differently and may run RNCE before DCE. In that case the following two transformations are performed:

void contains_null_check_after_RNCE(int *P) {
  int dead = *P;
  if (false)  // P was dereferenced by this point, so it can't be null 
    return;
  *P = 4;
}

and then dead code elimination removes the rest:

void contains_null_check_after_RNCE_and_DCE(int *P) {
  //int dead = *P;
  //if (false)
  //  return;
  *P = 4;
}

To many programmers, removing the null check from this function would be deeply surprising (and they would blame the compiler for a bug). However, both contains_null_check_after_DCE_and_RNCE and contains_null_check_after_RNCE_and_DCE are perfectly valid optimized forms of contains_null_check according to the standard, and both optimizations are important for the performance of various applications.

While this is a fairly simple and contrived example, this kind of thing happens all the time with inlining: inlining a function opens up many opportunities for follow-on optimizations. This means that if the optimizer decides to inline a function, a variety of local optimizations can kick in that change the behavior of the code. This is perfectly valid both according to the standard and in practice, and it is important for performance.
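A minimal sketch (not from the original article) of how inlining can expose the same interaction; "get_first" and "use" are hypothetical names used only for illustration:

void use(int value);           // assumed to be defined elsewhere

static int get_first(int *P) {
  return *P;                   // dereferences P
}

void caller(int *P) {
  int first = get_first(P);    // after inlining this becomes "int first = *P;"
  if (P == 0)                  // ...so RNCE may fold this check away,
    return;                    // exactly as in contains_null_check above
  use(first);
}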

Undefined behavior and security don't mix well


The C family of programming languages is used for a wide range of security-critical code, such as kernels, setuid daemons, web browsers, and so on. This code is exposed to "hostile" input data, and bugs in it can lead to all sorts of security problems. One of C's widely cited advantages is that it is relatively easy to understand what is going on just by reading the code.

However, undefined behavior takes this property away from the language. For example, most programmers would assume that "contains_null_check" in the example above performs a null check. Although this example is not so scary (the code will probably crash if null is passed in, which is relatively easy to track down while debugging), there are a large number of perfectly reasonable-looking pieces of C code that are in fact completely wrong. This problem has bitten many projects (including the Linux kernel, OpenSSL, glibc, etc.) and even forced CERT to publish a vulnerability note against GCC (although my personal opinion is that all widely used optimizing C compilers are vulnerable to this, not just GCC).

Let's look at an example. Consider this carefully written C code:

void process_something(int size) {
  // Catch integer overflow.
  if (size > size+1)
    abort();
  ...
  // Error checking from this code elided.
  char *string = malloc(size+1);
  read(fd, string, size);
  string[size] = 0;
  do_something(string);
  free(string);
}

This code checks that enough memory is allocated to hold the data read from the file (one extra byte is needed for the terminating nul), and aborts if the integer overflows. However, because signed integer overflow is undefined behavior, the compiler is allowed to assume that size+1 never overflows, which makes the comparison always false and lets it (in full accordance with the standard) remove the check. This means that the compiler can turn the code into this:

void process_something(int size) {
  char *string = malloc(size+1);
  read(fd, string, size);
  string[size] = 0;
  do_something(string);
  free(string);
}

When built on a 64-bit platform, this is very likely a bug whenever "size" is INT_MAX (perhaps the size of a file on disk). Consider how awful this is: a code reviewer finds nothing wrong, because the overflow check looks reasonable. A tester finds no problem, unless they happen to test this exact path. The code seems safe, until someone decides to exploit the vulnerability. This is a very surprising and rather scary class of bugs. Fortunately, the fix is simple: just check "size == INT_MAX" or something similar.
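A minimal sketch (not from the original article) of what a corrected check might look like; the guard against negative sizes is an extra precaution added here only for illustration:

#include <limits.h>
#include <stdlib.h>

void process_something_fixed(int size) {
  // Compare against INT_MAX directly instead of relying on "size > size+1",
  // which invokes undefined behavior through signed overflow.
  if (size < 0 || size == INT_MAX)
    abort();
  // Error checking from this code elided.
  char *string = malloc(size + 1);   // size+1 can no longer overflow
  // ... read, nul-terminate, use, and free as before ...
  free(string);
}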

It turns out that integer overflow is a security problem for many reasons. Even if you use fully defined integer arithmetic (either via -fwrapv or by using unsigned integers), there is still a whole class of possible bugs caused by integer overflow. Fortunately, this class is at least visible in the code and well known to security auditors.
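As a hedged illustration (not from the original article) of the kind of bug that survives even fully defined unsigned arithmetic: the multiplication below wraps instead of being undefined, but the result is still a too-small allocation.

#include <stdlib.h>

// Hypothetical helper: if "count" is attacker-controlled and huge,
// "count * sizeof(int)" wraps to a small value, the allocation is tiny,
// and the loop below writes far past the end of the buffer.
int *copy_ints(const int *src, size_t count) {
  int *dst = malloc(count * sizeof(int));  // may wrap to a tiny size
  if (!dst)
    return NULL;
  for (size_t i = 0; i < count; i++)       // still iterates "count" times
    dst[i] = src[i];                       // heap buffer overflow
  return dst;
}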

Debugging optimized code can be pointless


Some people (for example, low-level embedded programmers who like to look at the generated machine code) do all their development with optimization turned on. Because code frequently has bugs while it is being developed, these people see a disproportionate number of surprising optimizations, which can lead to hard-to-debug problems at run time. For example, accidentally leaving out the "i = 0" in the "zero_array" example from the first article allows the compiler to delete the loop entirely (turning zero_array into just "return;"), because it uses an uninitialized variable.
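A rough reconstruction of what that mistake looks like (the exact zero_array in Part 1 may differ slightly):

float array[10000];

void zero_array() {
  int i;                       // oops: the "i = 0" initialization is missing
  for (; i != 10000; ++i)      // reading uninitialized "i" is undefined behavior,
    array[i] = 0.0f;           // so the compiler may pick any value for it
}                              // (e.g. 10000) and delete the loop entirely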

Another interesting case involves calls through a global function pointer. A simplified example looks like this:

static void (*FP)() = 0;
static void impl() {
  printf("hello\n");
}
void set() {
  FP = impl;
}
void call() {
  FP();
}

which Clang optimizes into:

void set() {}
void call() {
  printf("hello\n");
}

It is allowed to do this because calling a null pointer is undefined, which lets it assume that set() must be called before call(). In this case the developer forgot to call set(), the program does not crash with a null pointer dereference, and the code breaks when someone else does a debug build.

The upside is that such bugs are debuggable: if you suspect something weird is going on, try building with -O0, where the compiler is much less likely to perform these optimizations.

"Working" code that uses undefined behavior can break if something changes in the compiler.

We have seen many cases where an application that "seems to work" suddenly breaks when a newer version of LLVM is used to build it, or when the application is moved from GCC to LLVM. While LLVM occasionally has a bug or two of its own, this most often happens because latent bugs in the application are exposed by the compiler. This can happen in many different ways; here are two examples:

1. An uninitialized variable that used to happen to be zero by sheer luck is now kept in a different register that does not contain zero. This is commonly exposed by changes in the register allocator.

2. An array overrun on the stack starts overwriting variables that actually matter, instead of "dead" ones. This happens when the compiler reorders how variables are laid out on the stack, or packs variables with non-overlapping lifetimes into the same stack space more aggressively (a small sketch follows this list).
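A minimal sketch (not from the original article) of that second situation: an off-by-one overrun whose visible effect depends entirely on how the compiler lays out the stack frame.

#include <stdio.h>

void overrun_example(void) {
  int important = 42;
  char buf[4];
  // Off-by-one: writes one byte past the end of "buf". Depending on the
  // stack layout chosen by the compiler, that byte may land in padding
  // (the bug stays hidden) or in "important" (the bug appears only after
  // a compiler upgrade or a change in optimization level).
  for (int i = 0; i <= 4; ++i)
    buf[i] = 0;
  printf("%d\n", important);
}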

The important and scary realization is that almost any optimization based on undefined behavior can start triggering on buggy code at any time in the future. Inlining, loop unrolling, and other optimizations keep getting better, and a significant part of their value comes from exposing secondary optimizations like the ones above.

This is deeply dissatisfying to me, partly because the compiler almost inevitably ends up getting blamed, but also because it means huge bodies of C code are time bombs waiting to explode. This is even worse because ...

There is no reliable way to make sure that a large code base does not contain UB


This makes for a very bad situation, because in practice there is no reliable way to determine that a large-scale application is free of undefined behavior, and therefore that it will not break in the future. There are many useful tools that can help find some of the bugs, but nothing gives full confidence that your code will not break later. Let's look at some of the options, along with their strengths and weaknesses.

1. Valgrind is a fantastic tool for finding all sorts of uses of uninitialized variables and other memory bugs. Valgrind is limited in that it is quite slow, it can only find bugs that still exist in the generated machine code (so it cannot find things the optimizer has already removed), and it does not know that the source language is C (so it cannot find bugs such as shifting by an amount larger than the variable's width, or signed integer overflow).

2. Clang has an experimental -fcatch-undefined-behavior mode, which inserts runtime checks to catch violations such as out-of-range shifts, some simple out-of-bounds array accesses, and so on. These checks are limited because they slow the application down and cannot help with dereferences of arbitrary pointers (as Valgrind can), but they can find other important classes of bugs. Clang also fully supports the -ftrapv flag (not to be confused with -fwrapv), which makes signed integer overflow trap at run time (GCC also has this flag, but in my experience it is very unreliable and buggy). Here is a quick demo of -fcatch-undefined-behavior:

$ cat t.c
int foo(int i) {
  int x[2];
  x[i] = 12;
  return x[i];
}
int main() {
  return foo(2);
}
$ clang t.c 
$ ./a.out 
$ clang t.c -fcatch-undefined-behavior 
$ ./a.out 
Illegal instruction

3. Compiler warnings are good at finding some classes of these bugs, such as uninitialized variables and simple integer overflow problems. They have two main limitations: 1) they have no dynamic information about the code as it executes, and 2) they must run very quickly, because any analysis they do adds to compilation time.

4. The Clang static analyzer performs a much deeper analysis to try to find bugs, including uses of undefined behavior such as dereferencing a null pointer.

You can think of it as a souped-up version of compiler warnings, since it is not bound by the compile-time constraints of normal warnings. The main downsides of the static analyzer are that it: 1) has no dynamic information about the program as it runs, and 2) is not integrated into the normal development workflow (although its integration into Xcode 3.2 and later is fantastic).

5. The LLVM "Klee" subproject uses symbolic execution to "try every possible path" through a piece of code in order to find bugs and generate test cases. It is a great little project that is mostly limited by not being practical to run on large applications.

6. Though I have never tried it, the C-Semantics tool by Chucky Ellison and Grigore Rosu is very interesting in that it can find certain classes of bugs (such as sequence point violations). It is still a research prototype, but it may be useful for finding bugs in (small and self-contained) programs. I recommend reading John Regehr's post about it for more information.

So, we have many tools that can find some of the bugs, but no good way to prove that an application is free of undefined behavior. Given that there are tons of bugs in real-world applications, and that C is used in a wide range of critical applications, this is scary. In the final article, I will look at the options a C compiler has for dealing with undefined behavior, with particular attention to Clang.
