NEW YORK – A team at North Carolina State University has developed a new architecture for storing information in DNA that allows for easier access and increases the storage density.
The system, called DORIS (Dynamic Operations and Reusable Information Storage), relies on single-stranded DNA overhangs, or "toeholds," that serve as addresses for specific double-stranded DNA molecules, or files, that encode the information. Most other DNA data storage systems use PCR for data retrieval. With DORIS, the information is instead read out by transcription, after which the coding DNA can be returned to the database for repeated use. A description of the method appeared earlier this month in Nature Communications.
Albert Keung, an assistant professor of chemical and biomolecular engineering at NC State and a corresponding author on the paper, said the goal of the project was to develop a scalable and practical DNA information storage system. "One of the challenges is that DNA is this mixture of molecules, and it's not organized physically in the same way that a hard drive or data storage center is," he said.
Traditional DNA data storage systems rely on PCR to find a specific piece of information, he said, which requires the double-stranded DNA to be melted. However, the PCR primers can also bind non-specifically. That doesn't happen with the DORIS architecture, where the DNA molecules have a single-stranded "handle" that allows them to be pulled out without melting the DNA, using an oligonucleotide and magnetic separation. "The oligo that will bind the address doesn't have the chance to interact with the data because the data part is all double stranded," Keung explained.
As a result, the information density in the DNA can be higher because no extra sequences need to be inserted to prevent primers from binding non-specifically.
Another advantage of the single-stranded overhangs, he said, is that they can be used to manipulate the DNA files, for example, renaming, locking, unlocking, or deleting them.
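The address-based retrieval described above can be pictured with a toy model. The Python sketch below is only an illustration, not the researchers' method: the file names and toehold sequences are hypothetical, and binding is reduced to exact reverse-complement matching, whereas real hybridization depends on thermodynamics and tolerates some mismatches.

```python
# Toy model of toehold-addressed file retrieval.
# All sequences and file names are hypothetical, not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def retrieve(database: dict[str, str], probe: str) -> list[str]:
    """Return the names of files whose single-stranded toehold is exactly
    complementary to the probe oligo. The double-stranded data region is
    never inspected, mirroring the idea that the probe cannot bind it."""
    target = reverse_complement(probe)
    return sorted(name for name, toehold in database.items() if toehold == target)


# Each file is represented only by its single-stranded address (toehold).
database = {
    "file_a": "ACGTTGCAAGGCTTAC",
    "file_b": "TTGACCGATAGCCTGA",
    "file_c": "GGATCCTTAGCAAGTC",
}

# A probe designed against file_b's address pulls out only that file.
probe = reverse_complement(database["file_b"])
selected = retrieve(database, probe)
print(selected)
```

In this simplified picture, the probe is matched against the addresses alone, which is why no spacer sequences are needed in the data region to keep the probe from binding there.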
After pulling out a desired DNA fragment, the information can be read out by transcribing it off a T7 promoter – similar to the way that DNA in a cell gets read by transcribing it into RNA. This is followed by reverse transcription and sequencing, a process that doesn't consume the original DNA molecule.
In a proof of concept described in their paper, the researchers constructed a three-file database, using DNA molecules up to about 200 nucleotides in size, and showed that they could access each of the strands specifically. They also scaled up the system to more than 2,000 distinct strands and explored the distribution of the data they retrieved.
According to Emily Leproust, CEO and cofounder of synthetic DNA company Twist Bioscience, which sees DNA data storage as a long-term growth area, the approach is "a very smart use of molecular biology and synthetic biology techniques to enable DNA manipulation for data storage purposes."
Because DORIS uses linear amplification instead of PCR to access the information, "this method has the potential to be a cleaner system with a lower degree of sequence constraints, opening up more sequence space and increasing the information density and maximum capacity," she said in an email.
Twist is not wedded to a particular DNA encoding scheme for data storage, she added, and is "happy to see different alternatives to the predominant PCR-based method." For example, Twist provides DNA synthesis services for a government-funded consortium to develop digital data storage.
Dina Zielinski, a bioinformatician and a senior scientist at Cibiltech, a transplant monitoring company in France, said that the NC State researchers "showed how several weaknesses of DNA storage can be overcome with some elegant molecular hacks." Three years ago, Zielinski and Yaniv Erlich, a researcher at Columbia University at that time, published their own paper about a new DNA storage architecture in Science.
"While [the method] has some advantages over PCR-based approaches, it's a bit more tedious and molecules are still lost after repeated access (e.g. ~50 percent drop after 5 times)," she said in an email. "However, the targeted access and in-file storage operations can reduce the computational challenges of encoding and accessing data."
She also noted the use of the single-strand overhangs for renaming and deleting files, and for changing access permissions. "These are the kinds of features that will allow DNA to compete with man-made devices," she said.
Going forward, the NC State researchers plan to scale DORIS further and see how it performs. In theory, a megabyte of information could be encoded by four megabases, explained James Tuck, a professor of electrical and computer engineering at NC State and the other corresponding author of the paper, but in reality, that's not possible because the promoter regions and toeholds, as well as a needed error correction system, add extra bases. As a result, "you probably end up with at least 30 percent more bases to represent that one megabyte," he said.
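The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope illustration that assumes the textbook mapping of two bits per nucleotide (four bases per byte) and treats the 30 percent figure as a flat overhead. The function names and the A/C/G/T bit assignment are hypothetical, and the actual DORIS encoding is not described here.

```python
# Back-of-the-envelope base count for DNA storage.
# Assumption: 2 bits per base (A=00, C=01, G=10, T=11); this mapping is
# illustrative and is not the encoding used by DORIS.

BASES = "ACGT"


def encode_bytes(data: bytes) -> str:
    """Encode bytes as DNA bases, 2 bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_needed(num_bytes: int, overhead_percent: int = 0) -> int:
    """Bases required for num_bytes of payload, rounded up, with a flat
    percentage overhead for promoters, toeholds, and error correction."""
    payload = 4 * num_bytes  # 8 bits per byte / 2 bits per base
    return -(-payload * (100 + overhead_percent) // 100)  # ceiling division


one_megabyte = 1_000_000
print(bases_needed(one_megabyte))      # 4000000 bases in theory
print(bases_needed(one_megabyte, 30))  # 5200000 bases with 30 percent overhead
```

Under these assumptions, one megabyte needs four million bases in theory and at least about 5.2 million once a 30 percent overhead is added.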
"From a practical perspective, we'd love to be able to keep scaling," Keung said. "Can we make a 100-megabyte-size system, can we go up to a gigabyte, a terabyte, and can we continue to push this technology to be able to handle those increasing sizes?"
Scaling every aspect of DORIS is going to be the greatest challenge for its practical use. "Scaling synthesis, scaling sequencing, scaling access with the requisite robustness and error rates — choosing which of these is the bottleneck is really difficult now because there is exciting progress in all three areas," he said.
How much information can be stored in a single DNA molecule, he added, is limited by the ability to synthesize long DNA strands, as well as by their stability, since long DNA fragments often get sheared.
Another goal of the team is to implement the system on an automated device. Because information can be retrieved under isothermal conditions, the technology is compatible with many types of liquid handling or microfluidic systems, Keung said. So far, the researchers have done some work with microfluidic channels and have had some success in pulling out and accessing DNA files with them.
They have also filed a patent application for the DORIS system and are thinking about founding a company to drive its commercialization. "We are certainly interested in exploring that and learning about what customers would want, where DNA storage would fit within the landscape of storage in general," Keung said. They might also be interested in finding partners who specialize in synthesizing large amounts of DNA inexpensively, he added. "Right now, we order the DNA, and the cost of that is quite high."
"One of the reasons scaling is so important is that in order to put a product like this on the market, it needs to be, on day one, competitive with the existing technologies," such as flash drives, tape drives, and optical drives, Tuck said, and it needs to be continuously advancing.
DNA data storage is often touted as a means of archiving information for the long term, with the tradeoff that the data cannot be accessed readily, but potential users often don't like that idea. "Anytime you tell someone 'you're not going to access your data very frequently,' they're not happy about it," he said, "but for big enough cost savings, and for being able to store enough data, they probably would eventually go along."