main(){printf(&unix["\021%six\012\0"], (unix)["have"]+"fun"-0x60);}

Original author: eric (translated article)

Having fun unraveling C code

Challenge: before reading on, compile the article's headline in your head. What does it print?


While leafing through Expert C Programming once again, I stumbled upon its "Light Relief" section about the International Obfuscated C Code Contest (IOCCC). This is a contest to write the most unreadable code possible. The fact that such contests are held for C probably says something about the language. I wanted to look at the entries themselves. Finding no explanations online, I decided to work them out myself.

The IOCCC was inspired by Stephen Bourne, who used the C preprocessor to write the Unix shell in C that looked more like Algol-68, with its explicit block terminators, for example:

if
  ...
fi 

He achieved this with the following macros:

#define IF if(
#define THEN ){
#define ELSE } else {
#define FI ;}

This allowed him to write code like this:

IF *s2++ == 0
THEN return(0);
FI



Expert C Programming has this to say:

Avoid any use of the C preprocessor that changes the base language.



One of the early winners, in 1987, was David Korn, creator of the Korn shell (what is it with these shell writers?), who wrote just one line:

main(){printf(&unix["\021%six\012\0"], (unix)["have"]+"fun"-0x60);}

That's all. Try compiling it: what does it print?

This code will not compile with Microsoft's compiler (hint!), but here is a link to an online compiler that handles it. A few lines were added there to make it work, but otherwise it is the same.

The code only outputs:

unix

But why? There is something in the code that looks like an array named unix, but it was never declared. Is unix a keyword, then? Does it somehow print its own name?

On a hunch, I tried to check this by adding:

printf(unix);

The compiler rejected it: printf expects a char *, not an int.

When I printed it as an int, it turned out to be 1. This suggested that unix is predefined whenever the code is compiled on a Unix system. Searching the GCC source code, I found that it comes from the run-time target specification: it is a macro the compiler predefines for the target system. This explains why the code does not compile on Windows.

So unix is just 1. Substituting it, we get:

main(){printf(&1["\021%six\012\0"], (1)["have"]+"fun"-0x60);}

So unix was not a variable name after all. But then how does 1[...] work? I had seen this before, and it is one of my favorite facts about C.


C descends from BCPL. Its creator, Dr. Martin Richards, wrote:

The indirection operator ! takes a pointer as an argument and returns the contents of the cell it points to. If v is a pointer, then !(v + i) accesses the cell at address v + i. The binary version of ! is defined so that v!i = !(v + i). v!i behaves like an indexed reference, where v is a one-dimensional array and i is an integer index. Note that in BCPL, v!5 = !(v + 5) = !(5 + v) = 5!v. The same happens in C: v[5] = 5[v].

In other words, an index is simply added to a pointer: the C standard defines a[i] as *(a + i). Since addition is commutative, so is the subscript operator. Let's try it:

int x[] = {1, 2, 3};
printf("%d\n%d\n", x[1], 1[x]);

So what is 1["\021%six\012\0"]? Written in the usual order, it is an ordinary subscript into an array: "\021%six\012\0"[1]. It is still unusual, since string literals are rarely indexed directly, but it is clearly array[index]. And it works; try this:

printf("%c\n", "hello, world"[1]); 

Let's rewrite only the first array, while we deal with this.

main() {
  char str[] = "\021%six\012\0";
  printf(&str[1], (1)["have"]+"fun"-0x60);
}

It still works the same. Looking at str, I noticed the \0, which is the null character (or is it NUL?). I thought string literals in C are null-terminated automatically. Let's see what happens if we remove it:

printf("%s", "\021%six\012");

Outputs:

%six

I use the "%s" format because the string I am printing contains the formatting character %. (A small tip: do not print strings with printf(myStr) when they may contain format characters; print them through "%s" as shown above.)

It still seems to work without \0. Maybe in some pre-ANSI C you had to add the null character to string literals yourself? I doubt it, since the other strings in the program do not have one. Or is it just there to look more confusing? Let's leave the \0 alone for now.

Since we have stopped at this string, let's look at the rest of it. \ooo writes a character as an octal number (up to three digits): \021 is a control character (decimal 17, DC1 in ASCII), and \012 is the line feed, the \n we are used to seeing at the end of output lines.

Knowing that \021 is just one character, we see that str[1] is %. So &str[1] is the string that starts at %. The format string is therefore simply "%six\n"; the leading control character is skipped, and it is not clear why it is there at all.

main() {
  char str[] = "%six\n";
  printf(str, (1)["have"]+"fun"-0x60);
}

The first string passed to printf is the format string, and %s means "insert the next argument, a string, here". Since the format continues with ix, we can guess that the next argument passed to printf must somehow evaluate to "un". Getting rid of the character array that held the format string, we get:

main() {
  printf("%six\n", (1)["have"]+"fun"-0x60);
}

The second argument is (1)["have"]+"fun"-0x60. The un we are looking for is contained in the word fun, so let's analyze it.

Again we see the indexing trick: (1)["have"]. The parentheses around 1 are not needed; were they required in old C, or are they there for extra unreadability? "have"[1] is 'a', which is 0x61 in ASCII. Subtracting 0x60 leaves 1+"fun".

As before, "fun" is used as a char *. Adding 1 gives the string starting at its second character, that is, "un". The whole thing becomes:

main() {
  printf("%six\n", "un");
}

Here is the readable code.

I like it when semantics play a big role in obfuscation: for example, using the word unix to make you think it is some special identifier that somehow prints its own name. The \021 looks like a mirror image of \012 and may make you believe it is needed, although it is never actually printed. And there is the format string %six, which contains the word "six", apparently so that you would not read %s as a format specifier.

Translation: Alena Karnaukhova
