DNA stored an operating system and a video, which were then read back without errors

    Soon, humanity will generate so much data that conventional storage will no longer cope. To solve this problem, scientists have turned to a virtually unlimited natural repository of information: DNA. According to researchers, DNA is an ideal storage medium because it is ultra-compact and, kept under appropriate conditions, can last for hundreds of thousands of years. This is demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

    In a new study, scientists from Columbia University and the New York Genome Center (NYGC) demonstrated that an algorithm designed for streaming video on a smartphone can unlock nearly the full storage potential of DNA by packing more information into its four nucleotide bases.


    The idea of recording, storing, and searching for information in DNA molecules, along with the first general considerations about its feasibility, belongs to Mikhail Neiman, a Soviet physicist. In 1964, the journal "Radio Engineering" published an article describing the technology of this process and a data storage device based on Neiman oligonucleotides (MNeimONs).

    In 2012, geneticists from Harvard University managed to encode in DNA a draft of a book of 53.4 thousand words, 11 images, and one program. They found that 5.5 petabits of data can be stored in each cubic millimeter of DNA. A year later, researchers at the European Bioinformatics Institute (EBI) succeeded in storing, and then fully retrieving and reproducing, about 0.6 megabytes of text, audio, and image files: all 154 of Shakespeare's sonnets, a 26-second fragment of Martin Luther King Jr.'s famous “I Have a Dream” speech, James Watson and Francis Crick's paper on the structure of DNA, a photograph of the EBI headquarters in Hinxton, and a file describing the data conversion methods. All files were recovered from DNA with an accuracy of between 99.99% and 100%.

    Yaniv Erlich and his colleague Dina Zielinski, a researcher at NYGC, selected six files to encode and write to DNA: the KolibriOS computer operating system, the 1896 French film Arrival of a Train at La Ciotat, a $50 Amazon gift card, a computer virus, an image of the Pioneer plaque, and Claude Shannon's 1948 study in information theory.

    The scientists combined these files into one and then split the data into short strings of binary code. Using fountain codes, they randomly packaged the strings into so-called "droplets" and converted the bit pairs 00, 01, 10, and 11 into the four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). So that the droplets could later be reassembled into the original file, the team added a label to each one.
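    The encoding step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' DNA Fountain software: the 4-byte seed used as each droplet's label, the 4-byte segment size, and the uniform choice of 1 to 3 segments per droplet are all hypothetical (Luby transform fountain codes normally draw that number from a robust soliton distribution). Each droplet is the XOR of a few randomly chosen segments, and every pair of bits is then mapped to a base as described above.

```python
import random

# Two bits per base, as in the text: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_bases(data: bytes) -> str:
    """Map every byte to four nucleotides, two bits per base."""
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def choose_segments(seed: int, n_segments: int) -> list:
    """Hypothetical neighbour selection: the seed drives a PRNG that picks how many
    segments to combine (1 to 3, a simplification) and which ones."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def make_droplet(segments: list, seed: int) -> bytes:
    """XOR the chosen segments together and prepend the seed as the droplet's label."""
    payload = bytearray(len(segments[0]))
    for idx in choose_segments(seed, len(segments)):
        for i, byte in enumerate(segments[idx]):
            payload[i] ^= byte
    return seed.to_bytes(4, "big") + bytes(payload)


def encode(data: bytes, seg_len: int = 4, n_droplets: int = 12) -> list:
    """Split data into segments and emit n_droplets DNA strands."""
    data += b"\x00" * (-len(data) % seg_len)  # pad to a whole number of segments
    segments = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
    return [bytes_to_bases(make_droplet(segments, seed)) for seed in range(n_droplets)]


if __name__ == "__main__":
    for strand in encode(b"KolibriOS", n_droplets=3):
        print(strand)
```

    Because the segment choice is derived entirely from the seed, a decoder only needs the label to work out which segments a droplet combines, so droplets can be read back in any order.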

    In total, the researchers generated about 72 thousand such DNA strands, each about 200 bases long. They collected the sequences in a text file and sent it to San Francisco, where the DNA synthesis startup Twist Bioscience turned the digital data into biological molecules. Two weeks later, Erlich's team received a tube of DNA.

    Using sequencing to read the DNA strands and special software to translate the sequences back into binary, they successfully restored the files. The scientists have not yet said how long reading and writing take.
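    Decoding runs the process in reverse. The sketch below is again a hypothetical simplification that reuses the same assumed seed scheme, not the published software: it converts each strand back to bytes, regenerates the segment list from the seed, and runs a peeling decoder. Any droplet that still depends on only one unknown segment reveals that segment, which is then XORed out of every other droplet. A fountain code does not care which droplets arrive, only that enough of them do.

```python
import random
from typing import Optional

# Same two-bits-per-base mapping as in the text.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}
SEED_BYTES = 4  # hypothetical label size


def bytes_to_bases(data: bytes) -> str:
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(strand: str) -> bytes:
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def choose_segments(seed: int, n_segments: int) -> list:
    # Hypothetical: the seed alone determines which segments a droplet combines.
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def make_strand(segments: list, seed: int) -> str:
    payload = bytearray(len(segments[0]))
    for idx in choose_segments(seed, len(segments)):
        for i, byte in enumerate(segments[idx]):
            payload[i] ^= byte
    return bytes_to_bases(seed.to_bytes(SEED_BYTES, "big") + bytes(payload))


def decode(strands: list, n_segments: int) -> Optional[bytes]:
    # Each pending droplet: [set of still-unknown segment indices, XOR payload].
    pending = []
    for strand in strands:
        raw = bases_to_bytes(strand)
        seed = int.from_bytes(raw[:SEED_BYTES], "big")
        pending.append([set(choose_segments(seed, n_segments)), bytearray(raw[SEED_BYTES:])])

    recovered = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for idxs, payload in pending:
            # XOR out every segment in this droplet that is already known.
            for known in [k for k in idxs if k in recovered]:
                for i, byte in enumerate(recovered[known]):
                    payload[i] ^= byte
                idxs.discard(known)
            # A droplet left with a single unknown segment now equals that segment.
            if len(idxs) == 1:
                recovered[idxs.pop()] = bytes(payload)
                progress = True
    if len(recovered) < n_segments:
        return None  # too few droplets survived: sequence or synthesize more
    return b"".join(recovered[i] for i in range(n_segments))


if __name__ == "__main__":
    seg_len = 4
    data = b"Arrival of a Train at La Ciotat"
    data += b"\x00" * (-len(data) % seg_len)  # pad to whole segments
    segments = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
    strands = [make_strand(segments, seed) for seed in range(60)]
    survivors = strands[::2]  # simulate losing every other strand
    result = decode(survivors, len(segments))
    print(result.rstrip(b"\x00") if result else "not enough droplets, sequence more")
```

    Dropping every other strand in the demo imitates molecules lost during synthesis or sequencing; if too few droplets survive, `decode` returns `None` rather than a corrupted file.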

    Erlich's team also demonstrated that, after the DNA sample is amplified with the polymerase chain reaction (PCR), its algorithm can accurately recover the data from a practically unlimited number of copies of the sample, and even from copies of those copies.


    Erlich runs the recovered operating system in a virtual machine and plays Minesweeper.

    However, the algorithm's most impressive feature is its ability to pack 215 petabytes of data into one gram of DNA, 100 times more than other methods and algorithms have achieved.

    Theoretically, DNA storage capacity is limited to two binary digits per nucleotide. However, the biological constraints of DNA itself, together with the need to include redundant information for reassembling and reading the recorded fragments, reduce that capacity to 1.8 bits per nucleotide. The DNA Fountain algorithm packs an average of 1.6 bits into each nucleotide, at least 60% more than previously published methods and close to the 1.8-bit limit.
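    The arithmetic behind these figures is simple: with four possible bases, each nucleotide carries at most log2(4) = 2 bits. The short sketch below uses only the densities quoted in the text, plus an illustrative 2-megabyte archive (taken as 2,000,000 bytes, an assumption), to show how many payload nucleotides the same data needs at each density and how close 1.6 bits comes to the 1.8-bit limit. It counts payload only and ignores primers and labels.

```python
import math

THEORETICAL_BITS = math.log2(4)  # four bases -> at most 2 bits per nucleotide
PRACTICAL_LIMIT = 1.8            # bits/nt after biological constraints and redundancy (from the text)
DNA_FOUNTAIN = 1.6               # average bits/nt achieved by DNA Fountain (from the text)
ARCHIVE_BYTES = 2_000_000        # illustrative: a 2-megabyte archive


def nucleotides_needed(n_bytes: int, bits_per_nt: float) -> int:
    """Smallest number of nucleotides that can hold n_bytes at the given density."""
    return math.ceil(n_bytes * 8 / bits_per_nt)


if __name__ == "__main__":
    for label, density in [("theoretical", THEORETICAL_BITS),
                           ("practical limit", PRACTICAL_LIMIT),
                           ("DNA Fountain", DNA_FOUNTAIN)]:
        print(f"{label:>15}: {density:.1f} bits/nt -> {nucleotides_needed(ARCHIVE_BYTES, density):,} nt")
    print(f"DNA Fountain reaches {DNA_FOUNTAIN / PRACTICAL_LIMIT:.0%} of the 1.8-bit limit")
```

    The last line works out to about 89%, which is what "close to the limit" means in practice.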

    The main obstacle to the technology's widespread adoption is its cost. The researchers spent $7,000 to synthesize the DNA archiving 2 megabytes of data, and another $2,000 to decode it. And although DNA sequencing is gradually getting cheaper, synthesis still costs a considerable sum, and investors are not ready to pour large amounts of money into it just to bring the price down.
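    Expressed per megabyte, the costs quoted above work out as follows. The sketch simply divides the reported totals by the 2-megabyte archive size and ignores any fixed overheads, so it is a rough average rather than a price list.

```python
SYNTHESIS_USD = 7_000   # writing: DNA synthesis cost reported in the text
SEQUENCING_USD = 2_000  # reading: decoding cost reported in the text
ARCHIVE_MB = 2          # archive size reported in the text


def cost_per_mb(total_usd: float, megabytes: float) -> float:
    """Average cost per megabyte, ignoring any fixed overheads."""
    return total_usd / megabytes


if __name__ == "__main__":
    print(f"write: ${cost_per_mb(SYNTHESIS_USD, ARCHIVE_MB):,.0f} per MB")
    print(f"read:  ${cost_per_mb(SEQUENCING_USD, ARCHIVE_MB):,.0f} per MB")
    print(f"total: ${cost_per_mb(SYNTHESIS_USD + SEQUENCING_USD, ARCHIVE_MB):,.0f} per MB")
```

    That is roughly $3,500 per megabyte to write and $1,000 per megabyte to read, which makes clear why synthesis is the bottleneck.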

    Erlich and his team propose another way to solve the problem: DNA synthesis could become cheaper if manufacturers produced lower-quality molecules and then relied on the DNA Fountain coding strategy to correct the resulting molecular errors.
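    The idea of tolerating cheaper, error-prone synthesis can be illustrated with a simple screen. The published method reportedly protects each droplet with a Reed-Solomon code; this sketch swaps in a CRC-32 checksum from Python's `zlib`, which can only detect errors, not correct them. A strand whose checksum no longer matches is discarded, and the fountain code's redundancy compensates, because a discarded droplet behaves exactly like one that was never read. The error model (a single random base substitution) and the placeholder droplets are hypothetical simplifications.

```python
import random
import zlib
from typing import Optional

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def to_bases(data: bytes) -> str:
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def to_bytes(strand: str) -> bytes:
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def protect(droplet: bytes) -> str:
    """Append a CRC-32 checksum (stand-in for a per-droplet code), then convert to bases."""
    return to_bases(droplet + zlib.crc32(droplet).to_bytes(4, "big"))


def screen(strand: str) -> Optional[bytes]:
    """Return the droplet if its checksum still matches; otherwise None (discard it)."""
    raw = to_bytes(strand)
    droplet, checksum = raw[:-4], raw[-4:]
    return droplet if zlib.crc32(droplet).to_bytes(4, "big") == checksum else None


def mutate(strand: str, rng: random.Random) -> str:
    """Hypothetical synthesis error: substitute one base at a random position.
    This flips at most 2 bits inside one byte, a short burst CRC-32 always detects."""
    pos = rng.randrange(len(strand))
    new_base = rng.choice([b for b in "ACGT" if b != strand[pos]])
    return strand[:pos] + new_base + strand[pos + 1:]


if __name__ == "__main__":
    rng = random.Random(1)
    droplets = [bytes([n]) * 8 for n in range(10)]  # placeholder droplet payloads
    noisy = [mutate(protect(d), rng) if rng.random() < 0.3 else protect(d) for d in droplets]
    kept = [d for d in map(screen, noisy) if d is not None]
    print(f"kept {len(kept)} of {len(noisy)} strands; the rest count as lost droplets")
```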

    The study was published in the journal Science on March 3, 2017.
    DOI: 10.1126/science.aaj2038
