Arrays, pointers, and other quantum phenomena around us

    I don't want to say that we all live in the Matrix, but the neighbors are suspiciously simulated with one and the same sound of a rolling ball.



    This post is fully consistent with its title. To begin with, it will be shown that, contrary to the assertion of the standard, as well as of the C classics Kernighan and Ritchie, using array indices is by no means equivalent to using the corresponding pointers; the choice of epigraph will become clear only at the very end. And yes, the middle of the post is not empty either.

    It all started with the "refutation of Fermat's theorem" and the "number whose cosine is greater than one" from a thought-provoking post about undefined behavior (UB) of the compiler. Actually, there were several such posts, but in this one (translated on habrahabr) the following simple code was given, along with the far-from-obvious result of running it:

    int table[4];

    bool exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return true;
        }
        return false;
    }

    There is an error in the loop condition: "<= 4" instead of "<4".
    Now, if you fill the table with zeros, and even make sure that the illegal fifth element of this array is also 0, and then carefully look for 42 in the table, i.e. call exists_in_table(42), it will certainly be found there: with a well-optimizing C compiler this function returns true, in full accordance with the language standard!

    The explanation is simple: exists_in_table() must either return true on one of the first four iterations, or it will read table[4], which is UB, and in that case exists_in_table() may, according to the C standard, do anything at all - including returning true. That is, the compiler is allowed to optimize the entire body of exists_in_table() down to return true;
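    In other words, under -O2 the function above effectively collapses to something like this (a sketch of the observed behavior, not literal compiler output):

    #include <stdbool.h>

    /* What the optimizer is entitled to (and, as reported below, actually does)
       turn the function into: every execution either returns true or hits UB,
       so "always return true" is a legal outcome. */
    bool exists_in_table(int v)
    {
        (void)v;        /* the argument no longer matters */
        return true;
    }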

    And this is not abstract theory but concrete practice. This is exactly how the code is optimized, i.e. exists_in_table(42) returns true when compiled, for example, with gcc 6.3 (via MinGW) using the -O2 or -O3 options. All the results below refer to this particular configuration.

    Now let's, at least once in our lives, actually read the C standard.
    Yes, in the section on arrays it honestly warns that access to elements outside the boundaries of an array is UB. But, as everyone who has ever studied C knows, array elements can also be accessed through pointers, or, in the language of the ISO/IEC 9899 standard: "The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))".
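    Just to spell out what that equivalence promises, here is a trivial in-bounds illustration (my own, not from the standard or the original post):

    #include <stdio.h>

    int main(void)
    {
        int a[4] = {10, 20, 30, 40};
        /* By the standard's definition of [], these two expressions are identical: */
        printf("%d %d\n", a[2], *(a + 2));   /* prints "30 30" */
        return 0;
    }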

    And here something interesting is discovered. "Identical" - that is, identically, interchangeably, under any circumstances. Even with UB? Let's find out what happens if, in the example above, pointers are used to access the array elements.
    But first, let's make the code genuinely safe: we wrap the table array in a structure, so that the memory right after it is guaranteed to be ours, and going beyond the boundary of the array bothers no one - on the contrary, it even helps, for example, to check the contents of the neighboring array in the same loop.

    #include <stdio.h>
    #include <stdbool.h>

    struct taddummy {
        int table[4];
        int dummy[4];
    } tadam;

    int exists_in_table(int v)
    {
        for (int i = 0; i < 8; i++) {
            if (tadam.table[i] == v) return true;
        }
        return false;
    }

    int main()
    {
        for (int i = 0; i < 8; i++) {
            tadam.table[i] = 0;
        }
        printf("%d\n", exists_in_table(42));
    }
    

    You can take my word for it or check it yourself: this code also returns true - from the point of view of UB, nothing in it has changed.

    And now, in accordance with the standard, replace tadam.table[i] with *(tadam.table + i).

    int exists_in_table(int v)
    {
        for (int i = 0; i < 8; i++) {
            if (*(tadam.table + i) == v) return true;
        }
        return false;
    }

    int main()
    {
        for (int i = 0; i < 8; i++) {
            *(tadam.table + i) = 0;
        }
        printf("%d\n", exists_in_table(42));
    }
    

    Run the program and make sure that exists_in_table now returns 0, that is, the number 42 really is not in our arrays. In reality, everything is not the way it actually is.

    And this is explained simply: according to the C standard, the use and dereferencing of pointers has its own UB cases - accessing unallocated memory, dereferencing a null pointer, and so on - but "going beyond the boundaries of an array" is not among them: pointers know nothing about arrays, and they behave in the expected way.

    That is, contrary to a common misconception, an array index and the corresponding pointer are by no means the same thing for the compiler; it is rather like an electron that sometimes exhibits the properties of a particle (within the array) and sometimes of a wave (in the UB region).

    Moreover, this analogy can be developed further. If, surprised by the presence of 42 in the array, we ask exists_in_table to report the exact index of the element whose value is 42, changing its code accordingly to

    if (tadam.table[i] == v) return i;

    or, leaving this line unchanged, simply print the value of the corresponding tadam.table[i] element right before it, then the UB effect of going beyond the array boundary disappears completely - the function returns false, which is exactly equivalent to adding an observer to the experiment with electrons: in his presence there are no waves, only hardcore, only particles.
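    For illustration, here is a minimal sketch of the second variant (the printf is my addition, not code from the original post); with this "observer" in place, gcc -O2 no longer finds 42 beyond the array boundary:

    int exists_in_table(int v)
    {
        for (int i = 0; i < 8; i++) {
            printf("%d\n", tadam.table[i]);      /* the observer: an externally visible effect */
            if (tadam.table[i] == v) return true;
        }
        return false;
    }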

    Now, if we move to a higher energy level - step back from arrays, pointers, and even UB, and look at the behavior of the compiler as a whole - this effect persists.

    For example, part of my work at Intel is optimizing the performance of critical components in third-party code. Since that code is a trade secret, they usually kill me after the project is completed; all I get from them is a function a few dozen lines long that I am asked to speed up. And since the function itself usually runs for such a negligible amount of time that no profiler can measure it with sufficient accuracy - it becomes a bottleneck only because of millions of calls - to evaluate the result of the optimization I have to use something like

    i1 = __rdtsc();
    for (i = 0; i < 100000; i++) {
        performance_test();
    }
    i2 = __rdtsc();

    But it is "something like", not this exact code. If we feed the fragment above to an optimizing compiler, it will simply reduce the whole loop to a single iteration, throwing out all the others, and the measured time will be far from what we expect. To make it honestly execute the whole loop, we must either output something or at least access a volatile variable, which is also a form of interaction with the outside world:

    volatile int res = performance_test();   // the invited observer
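    To make the recipe concrete, here is a minimal self-contained sketch of such a measurement. Everything in it is my illustration rather than the real code: performance_test() is a made-up stand-in for the secret function, the volatile seed it reads is there only so the compiler cannot constant-fold the workload itself, and __rdtsc() comes from <x86intrin.h> with gcc/MinGW (it lives in <intrin.h> with MSVC):

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>              /* __rdtsc() with gcc/MinGW; <intrin.h> on MSVC */

    /* A made-up stand-in for the real (secret) function under test. */
    volatile int seed = 1;              /* opaque input: the call cannot be folded to a constant */

    static int performance_test(void)
    {
        int s = seed;
        for (int k = 0; k < 1000; k++)
            s += k * s;                 /* stand-in workload */
        return s;
    }

    int main(void)
    {
        enum { N = 100000 };

        uint64_t i1 = __rdtsc();
        for (int i = 0; i < N; i++) {
            volatile int res = performance_test();  /* the invited observer: the result must be stored */
            (void)res;                              /* discard the value explicitly */
        }
        uint64_t i2 = __rdtsc();

        printf("~%llu cycles per call\n", (unsigned long long)((i2 - i1) / N));
        return 0;
    }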

    That is, in the optimizing compiler we have a system that behaves according to strange, counterintuitive laws (but laws!), for which only external manifestations matter, while the internal state can be considered anything at all, even undefined; and most importantly, at any attempt to observe this internal state the magic disappears and the system begins to behave boringly and predictably, as in the textbook.



    But this is quantum mechanics through and through: from Schrödinger's cat (number 42), which we look for beyond the array boundary and, until we actually look inside, may assume to be anything at all, to entangled particles and quantum teleportation (for example, a call to a function that is never actually called). And, finally, the appearance of a quantum-mechanical "observer": as soon as the code interacts with something external whose behavior is unknown to the optimizer, decoherence occurs and the system collapses to the classical one - deep optimization of that spot, and with it the "magic", becomes impossible.

    It was precisely these two previous paragraphs that I began to tell dibr when we met. Dima and I are connected by a stormy youth, so he immediately tried to interrupt me, saying that he was in the know - he, too, had thought about it, completely independently and more than once. But stopping Dima is not so easy, and sometimes that is a very good thing.

    So, dibr continued: "this means that quantum mechanics in our universe turned out to be possible only because our world..." - and then came the end of the phrase, which I recited with him in chorus: "...was built with deep optimization. That is, we live in the release build, not in the debug version!"

    Which, on the one hand, is of course good: although the behavior of the debug version is completely predictable, it usually contains an order of magnitude more bugs while it runs. But, on the other hand, it is perplexing. Firstly, optimization is needed where hardware resources are barely sufficient, without any reserve, and that makes our future, given the possible development of civilization, rather vague. And secondly, it turns out that according to the design documentation of the Universe only classical mechanics was supposed to be available to us - that is, nothing more than mechanical calculators - and the modern rapid development of electronics in general and computers in particular came about because we discovered internal "features of the Universe" associated with deep optimization, managed to understand their system, and created, in essence, exploits that use them to our advantage.

    Within this picture it becomes clear why quantum mechanics is so non-obvious and "inhuman": it has no reason to be otherwise, because it is a "hack" of the universe's internals, never intended for our eyes, while the standard interface to the world is classical mechanics, in which everything is simple and logical.

    And, in conclusion, I will quote Dmitry once more: "And now the Developer of the Universe is probably sitting at the terminal, stunned by what is happening (he was expecting a sluggishly developing civilization with classical physics, not this explosive development and 'breaking of the universe'), but he does not interrupt the process: firstly, a lot of computing resources have already been invested, and it would be a pity to throw them away; and secondly, the development branch is interesting, worth observing - look, a non-trivial result for a paper on universology will come out of it, and then, when compiling the next universe, it will be clear what to avoid."
