Hello! The world's first automated DNA data storage system



    Researchers at Microsoft and the University of Washington have demonstrated the first fully automated system for storing and retrieving data in manufactured DNA. This is a key step towards moving the new technology from research laboratories into commercial data centers.


    The team proved the concept with a simple test: they encoded the word “hello” in fragments of synthetic DNA and converted it back into digital data using a fully automated end-to-end system, described in a paper published March 21 in Nature Scientific Reports.



    DNA molecules can store digital information at very high density, in a physical space many orders of magnitude smaller than modern data centers occupy. This makes DNA one promising way to store the huge amount of data the world generates every day, from business records and cute animal videos to medical scans and images from space.


    Microsoft is exploring ways to close the potential gap between the amount of data we produce and want to keep and our ability to store it. These efforts include developing molecular computing algorithms and technologies for encoding data in artificial DNA. Such technologies would allow all the information stored in a large modern data center to fit into a space roughly the size of a few dice.


    “Our ultimate goal is to put into production a system that, to the end user, looks very much like any other cloud storage service: information is sent to a data center and stored there, and then it simply appears when the customer wants it,” says senior Microsoft researcher Karin Strauss. “To do that, we needed to prove that this makes practical sense from an automation perspective.”


    The information is stored in synthetic DNA molecules created in the laboratory, not in the DNA of humans or other living things, and it can be encrypted before it is sent to the system. Although sophisticated machines such as synthesizers and sequencers already perform key parts of the process, many of the intermediate steps have until now required manual labor in the research lab. “That wouldn't be viable in a commercial setting,” said Chris Takahashi, a senior research scientist at the University of Washington's Paul G. Allen School of Computer Science & Engineering.


    “You can't have people running around a data center with pipettes: the chance of human error is too high, it's too expensive, and it takes up too much space,” Takahashi explained.




    For this method of data storage to make commercial sense, the costs of both DNA synthesis (manufacturing the fundamental building blocks with meaningful sequences) and sequencing (which is needed to read the stored information) have to come down. The researchers say both are improving rapidly.


    According to the Microsoft researchers, automation is another key piece of the puzzle, because it would make storage at a commercial scale possible and more accessible.


    Under the right conditions, DNA can last far longer than current archival storage technologies, which degrade within decades. Some DNA has survived in far-from-ideal conditions for tens of thousands of years, in mammoth tusks and in the bones of early humans. Data stored this way could therefore last for as long as humanity exists.


    The automated DNA storage system uses software developed by Microsoft and the University of Washington (UW). The software converts the ones and zeros of digital data into sequences of nucleotides (A, T, C and G), the “building blocks” of DNA. The system then uses inexpensive, largely off-the-shelf laboratory equipment to deliver the necessary fluids and reagents to a synthesizer, which assembles the designed DNA fragments and places them in a storage vessel.
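    The article does not describe the encoding scheme the Microsoft/UW software actually uses. As a minimal sketch, assuming the simplest possible mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T), converting bytes into a base sequence and back might look like this. Real DNA storage codes are more elaborate (they typically add error correction and avoid troublesome patterns such as long runs of the same base), so the mapping and the `encode` and `decode` function names are illustrative assumptions.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only; production
# DNA storage codes add error correction and sequence constraints.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)  # each byte as 8 binary digits
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"hello")
    print(strand)
    print(decode(strand))
```

    Under this mapping the five bytes of “hello” become a 20-base strand, CGGACGCCCGTACGTACGTT, and decoding that strand restores the original bytes.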


    When the system needs to retrieve information, it adds other chemicals to prepare the DNA properly and uses microfluidic pumps to push the liquids into the parts of the system that read the DNA sequences and convert them back into computer-readable information. The researchers say the goal of the project was not to prove that the system can work quickly or cheaply, but simply to show that automation is possible.


    One of the most obvious benefits of an automated DNA storage system is that it frees scientists to focus on complex problems instead of wasting time hunting for bottles of reagents or monotonously adding drops of liquid to test tubes.


    “Having an automated system take care of the repetitive work lets the people in the lab focus on the research itself and develop new strategies to innovate faster,” said Microsoft researcher Bichlien Nguyen.


    The team from the Molecular Information Systems Lab (MISL) has already shown that it can store cat photos, great works of literature, videos and archival records in DNA and retrieve these files without errors. To date, they have stored 1 gigabyte of data in DNA, beating the previous world record of 200 MB.


    The researchers have also developed methods for performing meaningful computation, such as searching for and retrieving only the images that contain an apple or a green bicycle, using the molecules themselves, without converting the files back into digital form.


    “We are definitely seeing a new kind of computer system being born, in which molecules are used to store data and electronics are used for control and processing. This combination opens up very interesting opportunities for the future,” said Luis Ceze, a professor at the University of Washington's Allen School.


    Unlike silicon-based computing systems, DNA-based storage and computing systems have to use liquids to move molecules around. Liquids are fundamentally different from electrons and call for entirely new engineering solutions.


    The University of Washington team, in collaboration with Microsoft, is also developing a programmable system that automates laboratory experiments by using electricity and the properties of water to move droplets around a grid of electrodes. The full stack of software and hardware, known as Puddle and PurpleDrop, can mix, separate, heat or cool various fluids and run laboratory protocols.
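    The article does not explain how Puddle and PurpleDrop plan droplet movement. As a hypothetical sketch, assuming each electrode is one cell of a rectangular grid and a droplet moves one cell at a time onto whichever adjacent electrode is switched on, choosing the sequence of electrodes to activate becomes a shortest-path search. The `route_droplet` function below (a hypothetical name, not part of the Puddle API) uses breadth-first search and treats cells occupied by other droplets as blocked; it ignores real-world constraints such as keeping droplets far enough apart that they do not merge.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (column, row) of one electrode in the grid


def route_droplet(
    width: int, height: int, start: Cell, goal: Cell, blocked: Set[Cell]
) -> Optional[List[Cell]]:
    """Return the electrodes to activate, in order, to move a droplet from
    start to goal, or None if the goal is unreachable.

    Hypothetical model: a droplet moves one cell per step onto an
    orthogonally adjacent electrode; blocked cells cannot be entered.
    """
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_grid = 0 <= nx < width and 0 <= ny < height
            if in_grid and nxt not in blocked and nxt not in came_from:
                came_from[nxt] = cell  # remember how we reached this electrode
                queue.append(nxt)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, then reverse into travel order.
    path: List[Cell] = []
    step: Optional[Cell] = goal
    while step is not None:
        path.append(step)
        step = came_from[step]
    return path[::-1]


if __name__ == "__main__":
    # 4-by-3 grid; another droplet occupies cells (1, 0) and (1, 1).
    print(route_droplet(4, 3, (0, 0), (3, 0), {(1, 0), (1, 1)}))
```

    In the example grid the droplet has to detour around the blocked cells, so the returned route visits eight electrodes. Breadth-first search is used because every move costs the same single step, which guarantees the shortest route on an unweighted grid.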


    The goal is to automate laboratory experiments that are currently carried out by hand or by expensive liquid-handling robots, and to reduce their cost.


    Next steps for the MISL team include integrating the simple end-to-end automated system with technologies such as PurpleDrop, as well as with technologies that make it possible to search within DNA molecules. The researchers deliberately made their automated system modular so that it can evolve as new technologies for DNA synthesis, sequencing and manipulation emerge.


    “One of the advantages of this system is that if we want to replace one of the parts with something new, better or faster, we can just connect a new part,” said Nguyen. “It gives us great flexibility for the future.”


    Top image: Researchers at Microsoft and the University of Washington wrote and read back the word “hello” using the first fully automated DNA data storage system.

