
18th-century Copiale Cipher decrypted using statistical machine translation
More than 60 years ago, Warren Weaver, a pioneer of machine translation, first proposed applying cryptanalysis techniques to the interpretation of foreign-language texts.
In a famous 1947 letter to the mathematician Norbert Wiener, he wrote: “It is quite natural to ask whether the problem of translation can be treated as a problem of cryptography. When I see a text in Russian, I say: ‘This is actually written in English, but it has been encoded in some strange symbols. I will now try to decipher it.’”
This conjecture ultimately led to a whole generation of statistical machine translation programs, such as Google Translate, and, not by coincidence, to new tools for analyzing historical ciphers.
Now a group of Swedish and American linguists has used statistical machine translation techniques to crack one of the most difficult ciphers: the Copiale Cipher, a handwritten 105-page manuscript from the late 18th century. The researchers published their work on the eve of the conference of the Association for Computational Linguistics in Portland.

Discovered in an academic archive in the former East Germany, the elaborately bound volume in gold and green brocade contains 75,000 characters of text in an inscrutable mix of mysterious symbols and Latin letters. The manuscript takes its name, Copiale Cipher, from one of only two unencrypted inscriptions in the document.
Kevin Knight, a specialist at the Information Sciences Institute of the University of Southern California, together with colleagues Beata Megyesi and Christiane Schaefer of Uppsala University (Sweden), managed to decrypt the first 16 pages. They contain a detailed description of the ritual of a secret society that was interested in eye surgery and ophthalmology.
[Image: the first page of the manuscript]
[Image: the second and third pages of the manuscript]
The work began this year as a weekend hobby, Dr. Knight said in an interview, adding: “I do not have much experience in cryptography. My research is mainly in computational linguistics and machine translation.”


Not knowing the original language, the researchers made several blind assumptions and then tested them. First, they assumed that all the information was carried by the Latin characters alone; that is, they simply ignored the abstract symbols. They isolated the Latin characters and tested the text against 80 of the world's languages.
When this approach failed, the researchers concluded that the text was actually produced by a substitution cipher, in which each character of the original is replaced with a different character. They also guessed that the original language was German, since the manuscript was found in Germany.
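To make the idea of a substitution cipher concrete, here is a minimal Python sketch. It is an illustration only: lowercase Latin letters stand in for the manuscript's symbols, and `make_key`, `encrypt`, and `decrypt` are hypothetical helper names, not anything taken from the researchers' work.

```python
import random
import string

def make_key(seed: int) -> dict[str, str]:
    """Build a random one-to-one mapping from plaintext letters to cipher letters."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    # A random shuffle may leave some letter mapped to itself; fine for a toy example.
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encrypt(plaintext: str, key: dict[str, str]) -> str:
    # Characters outside the key (spaces, punctuation) pass through unchanged.
    return "".join(key.get(ch, ch) for ch in plaintext.lower())

def decrypt(ciphertext: str, key: dict[str, str]) -> str:
    # Invert the mapping: because it is one-to-one, decryption is unambiguous.
    inverse = {c: p for p, c in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

key = make_key(seed=1)
secret = encrypt("geheime gesellschaft", key)
assert decrypt(secret, key) == "geheime gesellschaft"
```

Because the mapping is one-to-one, letter and letter-pair frequencies of the original language survive encryption, which is what makes this kind of cipher open to statistical attack.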
Eventually they concluded that the Latin characters are in fact so-called nulls, meant to mislead the codebreaker, and that some of the special symbols mark the spaces between words. A second breakthrough was the discovery that a colon means the previous consonant is doubled.
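These three observations amount to a cleanup pass over the symbol stream before any letters are guessed. The sketch below shows one way such a pass could look in Python. The token names (`"S1"`, `"_"`, and so on) and the `preprocess` function are invented for illustration; the article does not describe how the researchers actually represented the transcription.

```python
def preprocess(tokens, nulls, space_marks):
    """Drop nulls, turn space markers into ' ', and expand ':' into a repeat."""
    out = []
    for tok in tokens:
        if tok in nulls:
            continue                   # nulls carry no information
        if tok in space_marks:
            out.append(" ")            # word boundary
        elif tok == ":":
            # A colon doubles the previous symbol; a stray colon is dropped.
            if out and out[-1] != " ":
                out.append(out[-1])
        else:
            out.append(tok)
    return out

# Hypothetical transcription: "a" and "b" are Latin-letter nulls, "_" marks a space.
tokens = ["S1", "a", "S2", ":", "_", "b", "S3"]
cleaned = preprocess(tokens, nulls={"a", "b"}, space_marks={"_"})
assert cleaned == ["S1", "S2", "S2", " ", "S3"]
```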
After that, the researchers used well-known machine translation techniques, such as analyzing expected character frequencies, to guess which symbols correspond to which letters of the German alphabet. First they worked out which pair of symbols corresponds to ch, a combination that is common in German.

Once that was established, frequency analysis suggested which symbol corresponds to the letter t, which in German most often follows ch. And so on, step by step, all the other symbols were matched. The only symbols the researchers could not decipher were the large ones, which are probably code designations for secret names and organizations.
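The ch-then-t reasoning can be sketched in a few lines of Python. This is a toy under stated assumptions: the German sample sentence is made up for the example, the three-symbol "cipher" is invented, and `ngram_counts` and `most_common_follower` are hypothetical helpers rather than the researchers' tools.

```python
from collections import Counter

def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def most_common_follower(symbols, bigram):
    """Return the symbol that most often comes directly after the given bigram."""
    followers = Counter(
        symbols[i + 2]
        for i in range(len(symbols) - 2)
        if tuple(symbols[i:i + 2]) == tuple(bigram)
    )
    return followers.most_common(1)[0][0] if followers else None

# Invented sample, rich in "ch"; the toy cipher hides only c, h and t.
sample = "ich mache nicht nach acht licht"
cipher = sample.translate(str.maketrans({"c": "#", "h": "%", "t": "@"}))

# The most frequent cipher bigram is the candidate for "ch" ...
top_bigram = ngram_counts(cipher, 2).most_common(1)[0][0]
# ... and the symbol that most often follows it is the candidate for "t".
guess_for_t = most_common_follower(cipher, top_bigram)

assert top_bigram == ("#", "%")
assert guess_for_t == "@"
```

On a real manuscript the counts would be compared against statistics from a large German corpus, and each guess would be checked against the partial plaintext it produces.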

“It turned out that we can apply many linguistic methods to cryptanalysis,” says Dr. Knight.
Other experts praised the result: “The decoding of the Copiale Cipher is an elegant piece of work by Kevin Knight and his colleagues,” said Nick Pelling, a British software developer and security specialist who runs Cipher Mysteries, a blog on cryptography news.
But although cracking this cipher is a notable success, Dr. Knight and his colleagues cannot rest on their laurels. They note with regret that many ancient books, and entire languages of great historical value, remain undeciphered.
The Copiale Cipher is of interest mainly to historians who study the spread of political ideas. Secret societies were in vogue in the 18th century, Dr. Knight says, and to some extent influenced the events of the French Revolution and the American War of Independence. Dr. Knight recently sent the decrypted Copiale text to Andreas Önnerfors, a historian at Lund University (Sweden) and an expert on secret societies.
“When he saw the book and the decrypted version, he was extremely excited,” Dr. Knight says. “He found a political commentary at the end of the text that spoke of inalienable human rights. It’s quite interesting that such things turned up in such an early document.”
Recent examples of ciphers that remain unsolved include the letters that a serial killer nicknamed the Zodiac sent to police in California in the 1960s and 1970s, and Kryptos, a sculpture bearing encrypted text that stands in front of CIA headquarters in Langley and has been only partially decrypted.
But the main mystery for the cryptographic community, its real Holy Grail, remains the Voynich manuscript: a mysterious book written about 600 years ago by an unknown author in an unknown language using an unknown alphabet. It consists of 240 richly illustrated pages whose text has defied the world's best cryptographers. For a long time experts considered it a hoax, but a recent radiocarbon analysis confirmed that the document was created in the early 15th century.
Together with a colleague from the University of Chicago, Dr. Knight this year published a detailed analysis of the manuscript. It does not settle the question of whether the manuscript is a hoax, but it presents evidence that the text contains some of the structures of natural language.
“This is the most mysterious manuscript in the world,” says Kevin Knight. “It is chock-full of patterns, and whoever created such a thing spent a huge amount of time on it. So it seems to me that it is probably a cipher.”