DNA Extraction Remains Bottleneck for Long-Read Techs But Solutions Begin to Emerge


NEW YORK (GenomeWeb) – As long-read sequencing and genome mapping technologies from companies like Pacific Biosciences, Oxford Nanopore Technologies, Bionano Genomics, and 10x Genomics are gaining in popularity, a bottleneck has emerged around the ability to quickly and efficiently extract high-quality high molecular weight DNA.

Several companies have taken on the challenge, including Circulomics, Bionano, Sage Science, and RevoluGen, with the goal of developing high-throughput methods for extracting DNA hundreds of kilobases and up to megabases in length from different sample types. In addition, academic groups have been developing their own protocols or optimizing commercial ones to meet their specific needs.

"A lot of the standard methods to [extract DNA], like columns and magnetic beads, aren't really that great for getting really big DNAs," said Kelvin Liu, founder and CEO of sample prep startup Circulomics. "What most of the users end up doing is using very old-school methods," such as agarose plug purification, which protects the DNA from shear forces in solution by embedding it in a gel, or phenol-chloroform extraction and precipitation. These deliver clean and long DNA fragments but are slow and difficult to scale.

In the meantime, the need for high molecular weight DNA is growing as de novo genome sequencing projects have started to forgo short-read sequencing entirely. "People who want to sequence new organisms pretty much only do long-read sequencing now," Liu said. "It used to be that people may do some type of Illumina sequencing, but now, people are doing straight long-read technologies. All PacBio, all Oxford Nanopore, coupled with Bionano, coupled with 10x Genomics."

Projects like the Vertebrate Genomes Project (VGP), for example, which plans to generate reference genome assemblies for all 66,000 vertebrate species on Earth, have committed to using genome mapping and long-read sequencing technologies in high throughput.

According to Olivier Fedrigo, director of the Vertebrate Genomics Laboratory at Rockefeller University, which generates data for the VGP, Bionano's mapping platform requires DNA fragments of 200 kilobases or above. While the other technologies used in the VGP — currently, mostly PacBio sequencing and 10x Genomics linked reads — can get away with somewhat shorter pieces of DNA, he said, his team decided to just use one type of DNA extraction that works for all of them.

Not only does the DNA need to be very long but also very pure, he added, with no proteins or other contaminants present, which appears especially important for nanopore sequencing.

For now, his group has settled on an agarose plug extraction kit and protocol from Bionano, which delivers fragments in excess of 250 kilobases, works well for all downstream applications, and is available for different sample types, including animal tissue, blood, cell lines, and plants.

However, the protocol takes seven to 10 days and is difficult to automate, which could become problematic once the VGP starts producing genome assemblies in high throughput. The researchers also had to tweak the Bionano protocol to improve the DNA yield, which was initially not sufficient for PacBio sequencing, Fedrigo said.

In addition, he and his colleagues have been developing protocols for storing and handling different types of tissues. "We're hoping to keep testing other preparation methods, but so far, flash-freezing tissues and blood is the best method across the board," said Jacquelyn Mountcastle, a research associate and head of the sample prep group at Rockefeller.

Other scientists are trying to pull together expertise for organisms other than vertebrates for applications such as nanopore sequencing. "When you start thinking about certain types of gram-negative bacteria, spores, fungi, and things like that, it's irrelevant what type of [DNA extraction] technique you're using because it's so difficult to crack those cells open in the first place," said Josh Quick, a researcher in Nick Loman's lab at the University of Birmingham in the UK.

As part of a recent grant from the Wellcome Trust, Loman and Matt Loose from the University of Nottingham founded a platform called "Long Read Club," which they hope will be used by researchers to share their expertise for getting DNA out of specific organisms. "The main benefit will be to technologies where you can get really long-read information, like Bionano and nanopores, and to some extent PacBio, as well," Quick said. "All of the methods are crucially dependent on this high-molecular weight extraction. We don't really know how to do that for that many species."

For example, he said, researchers have developed protocols to lyse the cell walls of particular microbes enzymatically, or to open those cells by physical means.

To extract DNA for nanopore sequencing from human cells, Quick and his colleagues have been using a phenol-chloroform prep — decades-old technology that keeps the DNA at high concentrations, which appears to protect long fragments. However, that method is difficult to automate. "Really, what we're talking about is scaling up from a few artisanal genomes with very long reads, which is what we have now, to it becoming the norm for population-scale sequencing," he said.

What counts in the end is how much of the DNA is available as long fragments, which is summarized by the read N50: the read length at which reads of that size or longer account for 50 percent of all sequenced bases. "We like to talk about the longest read that we've got but that's really only to keep interest going," Quick said. "Really, the most important thing is the N50."
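To make the N50 idea concrete, here is a minimal Python sketch. It assumes the common definition in which reads are sorted from longest to shortest and the N50 is the length of the read at which the running total first reaches half of all sequenced bases. The function name `read_n50` and the example lengths are hypothetical and for illustration only; they are not drawn from any sequencing run described in this article.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    Assumed definition: sort reads from longest to shortest and return
    the length of the read at which the cumulative length first reaches
    half of the total.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative read lengths in kilobases (made up, not real data).
example_reads_kb = [2, 3, 5, 8, 10, 40, 60, 120]
print(read_n50(example_reads_kb))  # prints 60
```

In this made-up example, the median read length is only 9 kilobases, but the N50 is 60 kilobases because the few longest reads carry most of the bases. That is why the N50 says more about a long-read run than either the median or the single longest read.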

Bionano Genomics

Bionano Genomics has been developing methods for extracting high-quality ultra-high molecular weight DNA, up to a megabase in length, since its inception, according to Mark Borodkin, the company's COO. "Our current chemistries, which are based on agarose plugs, were refined over many years and provide the standard in terms of the length of DNA that comes out of those," he said.

The company has several ongoing efforts, he said, both internally and with third-party collaborators, to develop sample prep methods that are faster and automatable, and plans to release new products over the coming months.

Improving sample prep starts with protocols for collecting and storing samples in a way that keeps the DNA intact. "You can't extract high-quality DNA if you have very poor samples to start off with," Borodkin said. Bionano already provides protocols for freezing and processing mammalian blood samples, he said, and is working on others for handling various tissue samples.

On the DNA extraction side, "we are investing in alternate sample prep protocols and kits that do not use agarose plugs, that are faster, and can be automated, as well," he said. The goal is to deliver very clean DNA with average fragment lengths of at least 250 kilobases. For that, the company is looking into in-solution protocols, as well as new ways to remove impurities.

The aim is to extract high molecular weight DNA within a couple of hours, he said, in line with standard protocols for DNA extraction. "Our goal would be to reach the same type of workflow time and hands-on times that [DNA extraction for] standard next-generation sequencing enjoys," he said.

Many customers would be happy with manual protocols that are fast and don't require capital equipment, he said. "Where the automation helps is then to scale that to more samples at a time."

While Bionano is primarily developing sample prep kits and protocols for use with its own platform, it does not mind if researchers use its kits to power other technologies. "We're fairly agnostic with regards to who uses our sample prep kits," Borodkin said.


Circulomics

One of several companies Bionano has been collaborating with to develop new DNA extraction technology is Circulomics. "We've been impressed with the initial results and we're going to continue to work with them," Borodkin said.

Baltimore-based Circulomics has developed a technology called Nanobind that uses small magnetic disks with a nanostructured silica surface, which binds DNA. The principle is similar to conventional magnetic bead extraction of DNA, "except instead of having millions of little particles, you have one single disk," Liu explained, the surface of which protects the DNA and prevents it from becoming sheared.

Using a bind-wash-elute protocol, this allows the company to obtain DNA fragments hundreds of kilobases or even megabases in length within an hour or so. Typically, Circulomics, a spinout from Johns Hopkins University, uses a manual protocol but it has also developed an automated version that runs on a Thermo Scientific KingFisher platform. Depending on the instrument type, this allows the firm to process between 12 and 96 samples within about half an hour, Liu said.

At the moment, the company sells two Nanobind DNA extraction kits — early versions that might still be tweaked in response to customer feedback: one for cultured cells, cultured bacteria, or blood, and the other for plant nuclei. In addition, it offers custom DNA extractions from uncommon sample types as a service, using chemistries that are still under development. However, Liu said that some customers with unusual samples, such as insects, "just buy lots of kits and start experimenting."

In the meantime, Circulomics is developing a commercial kit for tissue DNA extraction, which "can likely be adapted for many organisms and tissues," he said. This will involve some special homogenization steps to get rid of extracellular material, which would be difficult to automate.

Liu said the company is working with several sequencing or mapping vendors and has published papers with Pacific Biosciences and Bionano Genomics. It has also tested DNA from its extractions on PacBio, Oxford Nanopore, and Bionano platforms and is working with some groups to use megabase-size DNA for nanopore sequencing "to see if we can get super-duper long reads."

Circulomics uses two processes, either of which can be used with any of its kits: one resulting in DNA up to a few hundred kilobases in length that is "generally good enough for long-read sequencing," the other yielding megabase-sized DNA for applications like Bionano. "When you get to super-big DNA, it's definitely more complicated to work with because it will be a lot more viscous, and also more heterogeneous. So if you have an application where you don't really need DNA that big, it's better to use standard high molecular weight DNA," Liu said. "Oftentimes, you'll have to shear it down to the appropriate size for it to run well."

In addition to DNA extraction kits, Circulomics is working on library prep DNA purification methods. This will involve the same magnetic disks as its extraction kits, Liu said, but with different chemistries, for example to get rid of short DNA fragments, salts, or proteins.

Sage Science

Sage Science, based in Beverly, Massachusetts, launched an instrument for high molecular weight DNA extraction called SageHLS in 2017. The system, which uses electrophoresis, can process up to four samples per run (suspensions of cells, nuclei, or spheroplasts) that are loaded onto two gel cassettes.

The entire process takes two to six hours. It starts by moving a plug of SDS from a reagent well into the sample well by electrophoresis. This lyses the cells and strips proteins, membranes, RNA, and other cellular components off the DNA and moves them into the agarose gel column. The DNA remains in large fragments of several megabases and becomes entangled in the agarose wall of the sample well.

Next, enzymes are added to the sample well that fragment the DNA. These can be either non-specific nucleases or CRISPR/Cas9 complexes that cleave out specific DNA regions. After that, gel electrophoresis moves the DNA into the gel, where it is eluted into six size bins.

"A key advantage of the SageHLS extraction process is that the DNA is not exposed to viscous shear from pipetting techniques, so very high molecular weight DNA can be isolated," up to 2 megabases in size, said Chris Boles, Sage Science's CSO, in an email. Researchers have used the Cas-mediated process, called CATCH, to isolate fragments between 50 kilobases and 400 kilobases in size, he added, and the company is working on extending that size range to 1 megabase.

Most customers evaluate the SageHLS system for CATCH applications, he said, in particular for genes that have pseudogenes or segmental duplications. For downstream analysis, they often use 10x Genomics and Illumina sequencing, which have low input requirements, though some use PacBio sequencing. Other customers employ the SageHLS for preparing DNA for whole-genome sequencing, usually with 10x Genomics and Illumina.


RevoluGen

UK-based RevoluGen is another company that has been working on new high molecular weight DNA extraction methods. The firm has developed a spin column kit, called Fire Monkey, that takes about an hour and uses patent-pending technology to keep the DNA intact.

"The chemistry of the solutions and the matrix together are so different that we don't break the DNA anywhere nearly as much as with other spin column kits," said Georgios Patsos, RevoluGen's CSO. He added that the kit performed favorably in an independent comparison with several established spin column DNA extraction kits. The company does not disclose details of its chemistry but Patsos said it relies on "the concept of not adding too much stress to the DNA."

The DNA size distribution Fire Monkey achieves goes up to 500 kilobases, he said, and if the sample is fresh, average fragment sizes can be greater than 100 kilobases. In addition, the DNA has fewer nicks than DNA extracted with other methods; such nicks could interfere with long-read sequencing.

Like other spin column methods, Fire Monkey can be multiplexed, Patsos said, with the size of the centrifuge determining how many samples can be processed in parallel. Alternatively, the method could be automated using magnetic bead technology but the company is not working on that at the moment.

One challenge is that the DNA is often a mix of short and very long fragments at the end, which "will overwhelm both the nanopore technology and the PacBio — you would never see the long fragments," he said. To remedy this, RevoluGen has developed a DNA size exclusion protocol, called Fire Flower, for removing DNA fragments up to 10 kilobases in size during the library preparation process. Rival technologies can only deplete DNA up to about 3 kilobases in size, he added.
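One way to see why short fragments can crowd out long ones is to compare their share of molecules with their share of bases. The Python sketch below is a rough illustration only. It assumes an idealized depletion step that removes every fragment at or below a cutoff, which no real protocol achieves, and it uses invented fragment lengths. The function names `long_fraction` and `deplete_short` and the 50-kilobase "long" threshold are hypothetical and are not part of any RevoluGen product.

```python
def long_fraction(lengths_kb, long_cutoff_kb=50):
    """Return (share of molecules, share of bases) that are long fragments."""
    long_ones = [n for n in lengths_kb if n >= long_cutoff_kb]
    by_count = len(long_ones) / len(lengths_kb)
    by_bases = sum(long_ones) / sum(lengths_kb)
    return by_count, by_bases


def deplete_short(lengths_kb, max_removed_kb=10):
    """Idealized size exclusion: drop every fragment up to max_removed_kb."""
    return [n for n in lengths_kb if n > max_removed_kb]


# Invented sample: 90 fragments of 2 kb plus 10 fragments of 100 kb.
sample_kb = [2] * 90 + [100] * 10

print(long_fraction(sample_kb))                 # long share before depletion
print(long_fraction(deplete_short(sample_kb)))  # long share after depletion
```

In this invented sample, long fragments make up about 85 percent of the bases but only 10 percent of the molecules, so most of the molecules offered to the sequencer are short. After the idealized depletion step, every remaining molecule is long.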

RevoluGen has made the Fire Monkey technology available to a number of academic groups, who have used it to extract DNA from blood and bacteria, Patsos said.

According to Erling Refsum, secretary and a director of the company, RevoluGen is in talks with several companies about marketing and distributing its products.

Opinions differ somewhat as to how to handle long DNA fragments once they have been extracted. "We have to be very careful about how we handle the DNA, how it's stored, how it's pipetted," said Rockefeller's Mountcastle. "Any pipetting in general will cause fragmentation," which is why the lab uses special, wide pipette tips. "The other thing is, once you have extracted it, it's in kind of a clump of tangled DNA, so we have to let it sit for about two to five days and it kind of detangles on its own at room temperature before we can work with it," she said.

The lab also never freezes high molecular weight DNA, Fedrigo said, but keeps it either at room temperature or in the fridge. "That was surprising to us at the beginning."

Liu said that long DNA should generally be treated gently, but "you don't have to be too crazy." Some of Circulomics' protocols, for example, involve a lot of vortex mixing but still result in DNA fragments that are hundreds of kilobases long.

However, getting DNA from unbroken human chromosomes in one piece, tens of megabases in size, may be difficult. "We have some methods we're thinking about that might be able to get a whole chromosome," he said. "I think it's possible but it would be challenging."
