Data Compression Paradoxes

    The task of data compression in its simplest form can relate to numbers and their notation. Numbers can be denoted by numerals ( “eleven” for the number 11), mathematical expressions ( “two in the twentieth” for 1048576), string expressions ( “five nines” for 99999), proper names ( “the number of the beast” for 666, “Turing's death year” " For 1954), or their arbitrary combinations. Any designation by which the interlocutor can unambiguously determine what kind of speech is suitable is suitable. Obviously, informing the interlocutor of “factorial eight” is more effective than the equivalent designation “forty thousand three hundred twenty”. This raises the logical question: what is the shortest designation for a given number?

    The philosopher Bertrand Russell in 1908 published the “Berry Paradox,” which addresses the issue of notation of numbers on the opposite side: what is the smallest number to indicate which eighty letters are not enough?
    Such a number must exist: out of eighty Russian letters and spaces you can make a total of 34 80 designations, which means that using eighty letters you can designate no more than 34 80 numbers. Therefore, a certain number, not more than 34 80 , cannot be designated in this way.

    So, the number will correspond to this number"The smallest number to indicate which eighty letters is not enough" , in which there are only 78 letters! On the one hand, this number must exist; on the other hand, if this number exists, then its designation does not correspond to it. Paradox!

    The easiest way to dismiss this paradox is to refer to the informality of verbal designations. Like, if only a specific set of expressions were allowed in the notation, then “the smallest number for which eighty letters are not enough” would not be a valid designation, while practically useful designations of the “factorial eight” type would remain valid.

    Are there formal ways to describe the sequence (algorithm) of actions on numbers? There are, and in abundance - they are called programming languages. Instead of verbal notation, we will use programs (for example, in Python) that print the necessary numbers. For example, for five nines, a program is suitable print("9"*5). We will continue to be interested in the shortest program for a given number. The length of such a program is called the Kolmogorov complexity of the number; This is the theoretical limit to which a given number can be compressed.

    Instead of the Berry paradox, one can now consider a similar one: what is the smallest number for which a kilobyte program is not enough?

    We will argue the same way as before: there are 256 1024kilobyte texts, so kilobyte programs can output no more than 256 1024 numbers. This means that a certain number, no greater than 256 1024 , cannot be deduced in this way.

    But we write in Python a program that generates all possible kilobyte texts, launches them for execution, and if they print a certain number, it adds this number to the achievable dictionary. After checking all 256 1024 possibilities, no matter how long it takes - the program looks for the smallest number that is not in the dictionary and displays this number. It seems obvious that such a program will fit in a kilobyte of code - and will output the very number that cannot be displayed in a kilobyte program!

    What is the catch now? It can no longer be attributed to informality of notation!

    If you are confused that our program will require an astronomical amount of memory to work - a dictionary (or bitmap) of 256 1024 elements - then you can do the same without it: for each of 256 1024 numbers, iterate through all 256 1024 possible programs until a suitable one is found. It doesn’t matter that such an enumeration will last a very long time: after checking less than (256 1024 ) 2 pairs from the number and the program, it will end and find that very cherished number.

    Or will it not end? Indeed, among all the programs that will be tried, there will bewhile True: pass(and its functional counterparts) - the matter will not go further than checking such a program!

    Unlike the Berry paradox, where the catch was in informality of notation, in the second case we have a well-disguised reformulation of the “stopping problem” . The fact is that according to the program it is impossible to determine its conclusion in a finite time. In particular, Kolmogorov complexity is not computable : there is no algorithm that would allow for a given number to find the length of the shortest program that outputs this number; which means that there is no solution for the Berry problem - to find for a given number the length of the shortest word designation.

    Also popular now: