How can a function that is never called get called?

Original author: Krister Walfridsson
Let's look at the following code:

#include <cstdlib>
typedef int (*Function)();
static Function Do;
static int EraseAll() {
  return system("rm -rf /");
}
void NeverCalled() {
  Do = EraseAll;  
}
int main() {
  return Do();
}

And here is what it compiles into:

main:
        movl    $.L.str, %edi
        jmp     system
.L.str:
        .asciz  "rm -rf /"

Yes, exactly. The compiled program will run the “rm -rf /” command, even though the C++ code above looks as if it could never do so.

Let's see why it happened.

The compiler (Clang, in this case) is entitled to do this. The function pointer Do is initialized to NULL because it is a static variable. Calling through a NULL pointer is undefined behavior, but it still seems strange that the undefined behavior here takes the form of a call to a function that is never called in the code. It is strange only at first glance, though. Let's see how the compiler analyzes this program.

Resolving function pointers early can give a significant performance boost, especially in C++, where virtual functions are just pointers to functions and replacing them with direct calls opens the door to further optimizations (for example, inlining). In general, it is not easy to determine in advance what a function pointer will point to. In this particular program, however, the compiler considers it possible: Do is a static variable, so the compiler can find every place in the code where it is assigned and conclude that Do will always hold one of two values, either NULL or EraseAll. At the same time, the compiler implicitly assumes that NeverCalled may be called from somewhere it cannot see while compiling this file (for example, from a global constructor in another file, which may run before main). The compiler looks at the two options, NULL and EraseAll. Calling through NULL is undefined behavior, so the compiler may assume the programmer never intended it to happen. Well, if not NULL, then EraseAll! Logical, isn't it?

Thus:

return Do();

turns into:

return EraseAll();
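The assumption that NeverCalled might run before main can be made concrete. The following is a minimal sketch, not taken from the original article: it replaces EraseAll with a harmless stand-in (the hypothetical PrintInsteadOfErase) and adds a hypothetical helper type, CallsNeverCalled, whose constructor runs during static initialization. CallThroughDo is likewise a made-up name for what main does in the article. In a real project the global object could live in a different source file, which is why the compiler cannot rule this scenario out.

```cpp
#include <cstdio>

typedef int (*Function)();
static Function Do;  // zero-initialized, so it starts out as a null pointer

// Harmless stand-in for EraseAll (hypothetical, for illustration only).
static int PrintInsteadOfErase() {
  std::puts("EraseAll would run here");
  return 42;
}

void NeverCalled() {
  Do = PrintInsteadOfErase;
}

// Hypothetical helper: its constructor calls NeverCalled during
// static initialization, that is, before main starts.
struct CallsNeverCalled {
  CallsNeverCalled() { NeverCalled(); }
};
static CallsNeverCalled initializer;  // could equally be defined in another file

// Does what main does in the article: calls through the pointer.
int CallThroughDo() {
  return Do();
}
```

With the global object in place, Do has already been assigned by the time main starts, so calling through it is well defined and reaches the stand-in function.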

We may not be very happy with this behavior, since the compiler's guess about the actual value of the function pointer turned out to be wrong. But we have to accept that once we allow undefined behavior into our program, that behavior really can be anything at all. And the compiler has every right to exploit undefined behavior, including for the purposes of optimization.
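One way to keep such a program free of undefined behavior is to test the pointer before calling through it. This is a minimal sketch with hypothetical names (Harmless, CallDoChecked, and the -1 error value are all assumptions made for illustration), not something the original article proposes.

```cpp
typedef int (*Function)();
static Function Do;  // null until something assigns it

// Harmless stand-in for EraseAll (hypothetical).
static int Harmless() { return 7; }

void NeverCalled() { Do = Harmless; }

// Hypothetical wrapper: reports an error instead of calling
// through a null pointer.
int CallDoChecked() {
  if (Do == nullptr) {
    return -1;  // assumed error value for this sketch
  }
  return Do();  // only reached when Do is non-null
}
```

Because the null case is handled explicitly, the call through Do is only reached with a valid pointer. The compiler may still turn that call into a direct call to the one possible target, but it can no longer drop the null case.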

Let's consider an even more interesting example.

#include <cstdlib>
typedef int (*Function)();
static Function Do;
static int EraseAll() {
  return system("rm -rf /");
}
static int LsAll() {
  return system("ls /");
}
void NeverCalled() {
  Do = EraseAll;
}
void NeverCalled2() {
  Do = LsAll;
}
int main() {
  return Do();
}

Here Do already has three possible values: EraseAll, LsAll, and NULL.

The compiler immediately rules out NULL, since calling through it would obviously be pointless (just as in the first example). But now it can no longer replace the call through the Do pointer with a direct call to one particular function, because more than one option remains. And indeed, Clang emits an indirect call through the Do pointer in the binary:

main:
        jmpq    *Do(%rip)

But this is where optimization can kick in again. The compiler has the right to replace:

return Do();

with:

if (Do == LsAll)
  return LsAll();
else
  return EraseAll();

which again results in a call to a function that is never explicitly called. In this particular example the transformation looks silly on its own, since an extra comparison costs about as much as an indirect call. But the compiler may have other reasons to perform it as part of a larger optimization (for example, if it plans to inline the called functions). I don't know whether Clang/LLVM currently does this by default; at least, I could not reproduce it in practice for the example above. It is important to understand, however, that the standard permits compilers to do this, and GCC, for example, really can do such things when speculative devirtualization is enabled (-fdevirtualize-speculatively), so this is not just theory.

P.S. It should be noted, however, that GCC in this case will not exploit undefined behavior to call code that is never called. This does not rule out the theoretical possibility that other counterexamples exist.
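The difference between a speculative transformation that keeps a fallback and one that relies on undefined behavior can be written out by hand. This is a source-level sketch only, not actual compiler output. The functions LsAllStandIn and EraseAllStandIn are hypothetical harmless stand-ins, and the pointer is passed as a parameter purely to make the two variants easy to compare.

```cpp
typedef int (*Function)();

// Harmless stand-ins for LsAll and EraseAll (hypothetical).
int LsAllStandIn() { return 1; }
int EraseAllStandIn() { return 2; }

// Speculative form: a direct call for the likely target, with the
// original indirect call kept as a fallback for every other value.
int CallSpeculatively(Function f) {
  if (f == LsAllStandIn) {
    return LsAllStandIn();  // direct call, a candidate for inlining
  }
  return f();  // fallback: behaves exactly like the original call
}

// Form described in the article: the compiler assumes a null pointer
// cannot occur, so every value other than LsAllStandIn ends up in
// EraseAllStandIn, including null.
int CallAssumingNoNull(Function f) {
  if (f == LsAllStandIn) {
    return LsAllStandIn();
  }
  return EraseAllStandIn();
}
```

In the first variant a null pointer still reaches the indirect call, so the program's behavior is unchanged. In the second, passing a null pointer silently runs EraseAllStandIn, which is the "never called function gets called" effect.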
