Sanger Team Sequences DNA Directly on PacBio RS without Library Prep


Researchers at the Wellcome Trust Sanger Institute have demonstrated that it is possible to sequence DNA on the Pacific Biosciences RS without first doing library preparation. They achieved the best results when the DNA was circular, but found that they could directly sequence linear DNA as well.

Senior author Harold Swerdlow told In Sequence that because the method requires no sample prep and works with limited DNA it could have applications in outbreak scenarios, acute disease detection, metagenomics, and forensics.

In the study, published in BioTechniques this month, the Sanger team demonstrated that the technique needed as little as one nanogram of input DNA and could generate sequence data within eight hours of receiving the sample.

However, Paul Coupland, the lead author of the study, said that a potentially larger benefit than cutting down turnaround time is that there is no bias from the library preparation. "You're sequencing exactly what you have," he said.

There are, however, tradeoffs, particularly in throughput. In the study, direct sequencing generated as many as 3,000 reads per SMRT cell, compared with the 35,000 to 50,000 reads the instrument would have generated using library prep, the authors wrote.
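To put that throughput tradeoff in proportion, the read counts quoted above can be compared directly. The short Python sketch below is an illustration rather than anything taken from the study; the function name `yield_fraction` is hypothetical, and the only inputs are the figures reported in the article.

```python
def yield_fraction(direct_reads: int, standard_reads: int) -> float:
    """Return direct-sequencing reads as a fraction of standard-prep reads."""
    if standard_reads <= 0:
        raise ValueError("standard_reads must be positive")
    return direct_reads / standard_reads


# Figures quoted in the article (reads per SMRT cell).
DIRECT_READS = 3_000
STANDARD_RANGE = (35_000, 50_000)

for standard in STANDARD_RANGE:
    pct = 100 * yield_fraction(DIRECT_READS, standard)
    print(f"{DIRECT_READS:,} vs {standard:,} reads: {pct:.1f}% of standard yield")
```

On these figures, the best direct-sequencing run delivers roughly 6 to 9 percent of the reads expected from a library-prepped run.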

Meni Wanunu, who was not involved with the study but is working with PacBio under a grant from the National Human Genome Research Institute to reduce the DNA input requirements and cost of PacBio sequencing, told In Sequence that the study represents an important advance.

"It's a positive step forward," said Wanunu, a physicist and chemical biology professor at Northeastern University, citing both the fact that the team was able to directly sequence genomes without library prep, and also that they were able to use just nanograms of DNA, as opposed to the micrograms typically required by the PacBio machine.

He said that the method could be applied to bacterial, viral, and other small genomes and would also have uses in epigenomics.

Coupland said he tested DNA samples that naturally mimicked the PacBio SMRT bell — the company's proprietary template design for circularizing DNA. The SMRT bell, which is ultimately what is loaded onto the sequencer, consists of a double-stranded portion containing the insert and a single-stranded hairpin loop on either end for primer binding.

As such, he started with single-stranded circularized viral DNA. Aside from priming the DNA before loading it onto the sequencer, no other step was required.

Using viral DNA, the Sanger team first showed that from 25 nanograms of single-stranded DNA and 100-fold molar excess of primer, they could generate sequence data that mapped to the reference genome and called 100 percent of the bases with 100 percent consensus accuracy.

Next, they tested various concentrations of double-stranded viral DNA, from 100 nanograms down to 0.8 nanograms, using a single SMRT cell and running two 45-minute movies for each sample.

With 100 nanograms of DNA, they generated 1,917 filtered reads with a mean mapped read length of 1,559 base pairs, resulting in 329-fold coverage. Consensus accuracy was 100 percent, and 100 percent of the bases were called.

As less DNA was used, fewer reads were generated. With 1.6 nanograms of DNA, the team generated 224 filtered reads with a mean mapped read length of 1,046 base pairs, resulting in 18-fold depth of coverage. Consensus accuracy was 99.8 percent, and 99.9 percent of the bases were called.
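The link between read count, read length, and depth of coverage in these figures follows the usual back-of-the-envelope relation: mean depth is roughly total sequenced bases divided by the length of the target genome. The Python sketch below illustrates that relation under stated assumptions. The function names are hypothetical, the article does not give the viral genome length, so `PLACEHOLDER_GENOME_BP` is an arbitrary stand-in, and the article reports filtered reads rather than mapped reads, so the reported coverage values will not be reproduced exactly.

```python
def total_bases(num_reads: int, mean_read_length_bp: float) -> float:
    """Approximate sequenced bases: read count times mean read length."""
    return num_reads * mean_read_length_bp


def mean_depth(num_reads: int, mean_read_length_bp: float, genome_length_bp: int) -> float:
    """Rough mean depth of coverage: total bases divided by genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return total_bases(num_reads, mean_read_length_bp) / genome_length_bp


# Filtered reads and mean mapped read lengths quoted in the article.
runs = {
    "100 ng": (1_917, 1_559),
    "1.6 ng": (224, 1_046),
}

# ASSUMPTION: placeholder only; the article does not state the genome length.
PLACEHOLDER_GENOME_BP = 10_000

for label, (reads, mean_len) in runs.items():
    bases = total_bases(reads, mean_len)
    depth = mean_depth(reads, mean_len, PLACEHOLDER_GENOME_BP)
    print(f"{label}: about {bases:,} bases, "
          f"~{depth:.0f}x over a {PLACEHOLDER_GENOME_BP:,} bp placeholder genome")
```

Because only mapped reads contribute to the published coverage, this product of filtered reads and mean length is best read as an upper bound on the bases behind the 329-fold and 18-fold figures.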

The technique is "not applicable to projects where you need a lot of sequence," Swerdlow acknowledged. "But there are projects where you don't need a lot of sequence, so that's a key concept."

"This is a niche application that will hopefully be incredibly useful to somebody," he added.

The Sanger team also tested the protocol with bacterial plasmids, plasmid vector models, and linear DNA fragments covering an entire bacterial genome.

The researchers are still working on optimizing the method by figuring out the optimal amounts of input DNA and primers. For instance, over the course of the study, they demonstrated that using random hexamer primers as opposed to sequence-specific primers improves coverage.

Additionally, when testing varying concentrations of primers while sequencing Staphylococcus aureus plasmids, they found that increasing the concentration of primers reduced the number of mapped reads.

Starting with 50 nanograms of DNA, the team varied the hexamer primers from 10 times to 600 times the amount of DNA and saw mapped reads drop from 3,240 to 2,011. The researchers attributed this to "the proximity of annealed primers on the DNA strand at higher concentrations, leading to polymerases colliding with one another," or to a reduction in signal-to-noise ratio.

Coupland added that the team is now working on techniques to get rid of excess primers that are left over after they've annealed and bound to the DNA.

Additionally, the smaller, 3-kilobase plasmid generated more data than the 30-kilobase plasmid — mean coverage was 35-fold and 5-fold, respectively.

Coupland said the reason for this was that at the time the team was using an instrument that did not have the updated MagBead loading system, which increases loading efficiency for larger fragments. At the time, loading was based on diffusion, so longer fragments did not load as efficiently as smaller ones.

In terms of read length and accuracy, the direct sequencing method is comparable to the standard sequencing protocol on the PacBio, Coupland said.

"There are no drawbacks in terms of read length and accuracy because PacBio is already single molecule sequencing, so it's just skipping the library prep and going straight into the sequencing part," Coupland said.

Direct sequencing of linear DNA molecules still needs a lot of optimization. When the team tested it on a linear molecule of Candidatus Phytoplasma mali, a plant-pathogenic mycoplasma with a genome of around 600 kilobases that is characterized by inverted repeats, only 870 post-filter reads were generated, of which 63 mapped with a mean consensus accuracy of 84.4 percent. The reads covered only 0.08 percent of the genome.

While the yield was poor, when the team searched the National Center for Biotechnology Information's RefSeq database, the correct pathogen was called as the most likely hit.

"This evidence suggests it is possible to draw information from the sequence data and begin to identify the genomes present in a sample even from 63 mapped reads … However, comparing the difference in data yield between the S. aureus and Ca. phytoplasma mali, it is clear that further optimization of the method is required to improve the number of reads that can be mapped when sequencing linear molecules from a variety of genomic samples," the authors wrote.

Wanunu added that he thought the method would be restricted to circularized DNA, since it mimics the template typically used on the PacBio machine. For the time being, the method will be most applicable to small genomes. "But if there is a way to take larger genomes and in a straightforward way to circularize them, it could be used for human genomes," he said.

Coupland said that he would like to see the technique used in real-world, clinical samples. For instance, "you could use this method to very quickly take a swab from a patient and find out what antibiotic resistance genes are present," he said.

Additionally, it could be used to quickly identify an organism, Swerdlow said. In that case, it wouldn't be necessary to sequence the whole genome to get an answer, as the team demonstrated when it correctly identified the plant pathogen despite covering only 0.08 percent of the genome.

And because there is no library preparation, it could be used in cases where speed is important, such as in outbreaks or acute infections.

Forensics, where the "chain of evidence is important," is another area that could benefit, Swerdlow said. "In the lab, samples could easily get contaminated or mixed up, but if you sequence directly what you have, there's the potential for keeping the chain of evidence more tightly controlled," he said.

While Swerdlow's team is not involved in doing clinical sequencing, he said the goal is to publish the method to enable others to apply it to such applications.
