Hybrid selection, originally developed for the capture of sequencing targets in the human genome, also works well as a tool to isolate pathogen DNA from clinical samples for more efficient and affordable sequencing of infectious disease, a group of Broad Institute researchers has demonstrated.
Their study, published this month in Genome Biology, examined two types of bait design for hybrid selection of the malaria parasite Plasmodium falciparum in host DNA: one synthetic and the other generated from pure pathogen DNA. Both approaches achieved significant enrichment of pathogen DNA, the authors wrote, to a level that allowed them to do whole-genome sequencing on samples "which would otherwise have been prohibitively expensive to sequence."
When tested on a clinical sample — a blood spot collected on filter paper and stored for over a year at room temperature — hybrid selection (following whole-genome amplification) yielded a 70-fold increase in malaria DNA representation, and sequencing data from a run on an Illumina HiSeq confirmed that about 5.9 percent of mappable reads in the selected sample corresponded to P. falciparum.
Dan Neafsey, one of the study's authors and the leader of the Broad Institute's Malaria Genome Sequencing and Analysis Group, told Clinical Sequencing News that the researchers were interested in tackling one of the "primary challenges" in infectious disease sequencing: "Simply getting enough samples of sufficient quality."
Hybrid selection, he said, expands the types of samples that can be sequenced efficiently and affordably, allowing researchers to identify "the genes and pathways that underlie traits like virulence, drug resistance, and other things that have an impact on public health."
While the group used malaria in their method research, Neafsey said hybrid selection could be suitable for "any kind of pathogen that is difficult to isolate and separate from host tissue." Malaria, he said, was an interesting test subject because, while P. falciparum can be cultured in vitro, "adaptation to life in a Petri dish can take as much as two or three months and lots of hours of tech time and doesn’t always work."
"So if you have some particularly important clinical samples, say a patient is showing resistance to an important drug, and you want to sequence those samples, … culture might not be an option," he said.
The group's hybrid selection approach, developed by two of the study's authors, Alexandre Melnikov and Andreas Gnirke, was originally intended for capturing the human exome, and Agilent holds a license to the technology for use in its SureSelect target-capture product (IS 2/24/2009). "We thought it might work equally well to recover a clinical sample where typically only one percent of the DNA in the clinical sample from an infectious disease patient might actually be the DNA of the pathogen," Neafsey said.
In their study, the researchers "applied two different versions of hybrid selection" to a mock clinical sample consisting of 99 percent human DNA and one percent Plasmodium DNA by mass, Neafsey said. For each version, the group tested both unamplified samples and whole-genome amplified DNA generated from ten nanograms of the mock clinical sample. (WGA, they wrote, did not "significantly alter the fraction of malaria DNA present in the sample.")
One version of the hybrid selection used synthetic 140-base-pair oligos generated by Agilent and designed to correspond to regions of the P. falciparum genome the group thought "would be good to capture."
The other employed a new approach, Neafsey said, using "so-called whole genome baits," created from pure DNA of malaria parasites cultured in vitro. "Instead of having baits synthetically generated in a machine that are complementary to the parasite genome, we actually use the parasite genome itself as a means of generating baits," he explained. The researchers fragment pure malaria DNA into approximately 250-base-pair fragments that they then use to generate baits covering the entire genome.
"It’s a very efficient cost-effective way of designing baits that specifically target the whole genome of the pathogen you're interested in," he said.
According to Neafsey, both methods performed well, though each may have strengths for different applications. "If you want the entire pathogen genome, then whole-genome baits are a very cost-effective way of getting the whole pathogen genome. If you want to target heavy sequencing coverage on certain regions, a synthetic bait approach might be a better way," he said. For example, researchers interested in sequencing just the exons of the pathogen genome could design synthetic oligos to capture only exonic regions.
"In fact," Neafsey said, "we did that. We designed our oligos to especially concentrate on exonic regions of the malaria genome, and when we performed capture we got dimensionally higher coverage in the exonic regions."
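The exon-focused design Neafsey describes amounts to laying fixed-length oligos over a set of target intervals. The sketch below shows one simple way to tile 140-base-pair baits across exon coordinates. It is an illustration only: the article does not describe how the Broad group placed its oligos, so the step size, the handling of short exons, the `tile_baits` function name, and the example coordinates are all assumptions.

```python
from typing import List, Tuple

# Assumed bait length, matching the 140-bp synthetic oligos in the article.
BAIT_LEN = 140


def tile_baits(intervals: List[Tuple[int, int]],
               bait_len: int = BAIT_LEN,
               step: int = BAIT_LEN) -> List[Tuple[int, int]]:
    """Tile fixed-length baits across half-open (start, end) target intervals.

    Hypothetical design rules (not the authors' published method):
      * baits start every `step` bases (step == bait_len gives end-to-end tiling);
      * a final bait is shifted so it ends exactly at the interval end;
      * an interval shorter than one bait gets a single bait centered on it.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    baits: List[Tuple[int, int]] = []
    for start, end in intervals:
        if end - start <= bait_len:
            # Center one bait on a short target, without running past position 0.
            s = max(0, (start + end) // 2 - bait_len // 2)
            baits.append((s, s + bait_len))
            continue
        pos = start
        while pos + bait_len < end:
            baits.append((pos, pos + bait_len))
            pos += step
        # Last bait flush with the interval end so no target bases are missed.
        baits.append((end - bait_len, end))
    return baits


# Hypothetical exon coordinates on one chromosome.
exons = [(0, 300), (1000, 1100)]
print(tile_baits(exons))  # [(0, 140), (140, 280), (160, 300), (980, 1120)]
```

Setting `step` below 140 (for example, 70) would give overlapping 2x tiling, one conceivable way a design could put more bait density on regions where extra coverage is wanted.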
Synthetic baits yielded average parasite DNA enrichment of 41-fold and 44-fold for unamplified and WGA simulated samples, respectively, while whole-genome baits yielded 37-fold and 40-fold genome-wide enrichment for the same two sample types, the authors reported.
Sequencing on a single Illumina Genome Analyzer lane gave approximately 67-fold coverage of the hybrid-selected sample DNA. With synthetic baits, coverage levels in baited regions were "significantly higher than the levels observed from comparable sequencing of pure P. falciparum DNA," the authors wrote, with mean coverage of 143.8-fold in the hybrid-selected sample versus 92-fold for pure P. falciparum DNA. This indicated that hybrid selection with synthetic baits may be useful for "strategically augmenting coverage levels in regions of pathogen genomes where heightened sequence coverage could be informative, such as highly polymorphic antigenic regions subject to host immune pressure."
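Fold-coverage figures like these follow from a simple relationship: mean depth is roughly the number of reads times the read length, divided by the size of the genome or target region the reads map to. The sketch below applies that relationship. The read length and the resulting read count are hypothetical, and the genome size is only the approximate (~23 Mb) size of the P. falciparum nuclear genome, not a value reported in the study.

```python
import math

# Approximate P. falciparum nuclear genome size (~23 Mb); illustrative only.
PF_GENOME_BP = 23_300_000


def mean_coverage(num_reads: int, read_len: int, target_bp: int) -> float:
    """Expected mean depth: total sequenced bases spread over the target."""
    return num_reads * read_len / target_bp


def reads_needed(target_cov: float, read_len: int, target_bp: int) -> int:
    """Reads required (rounded up) to reach a given mean depth."""
    return math.ceil(target_cov * target_bp / read_len)


# Hypothetical: how many 76-bp reads would 67-fold mean coverage require,
# assuming every read maps to the parasite genome?
print(reads_needed(67, 76, PF_GENOME_BP))  # 20540790 (about 20.5 million reads)
```

In a real hybrid-selected library only a fraction of reads map to the parasite, so under this model the read count would have to be scaled up by the inverse of that fraction.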
In their report, the authors wrote that though effective sequencing coverage levels were reduced relative to pure pathogen DNA in their study, the reduction was "small compared to the 100-fold reduction in coverage expected without hybrid selection." They observed no reduction in coverage uniformity, they reported.
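The expected 100-fold penalty follows from how sequencing capacity is shared among the DNA in a sample: if reads are drawn roughly in proportion to DNA mass, a sample that is 1 percent pathogen devotes about 1 percent of its reads, and therefore its coverage, to the pathogen. The sketch below expresses that proportional model, which is a simplifying assumption (it ignores mapping differences and sequence-composition bias), and shows how 40-fold enrichment shrinks the penalty. The function names are illustrative.

```python
def enriched_fraction(pathogen_fraction: float, fold_enrichment: float) -> float:
    """Pathogen fraction after enrichment, capped at 100 percent."""
    return min(1.0, pathogen_fraction * fold_enrichment)


def coverage_reduction(pathogen_fraction: float) -> float:
    """Fold loss in pathogen coverage versus sequencing pure pathogen DNA,
    assuming reads are sampled in proportion to DNA mass."""
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return 1.0 / pathogen_fraction


mock_sample = 0.01  # 1 percent Plasmodium DNA by mass, as in the mock sample
print(coverage_reduction(mock_sample))                           # about 100-fold
print(coverage_reduction(enriched_fraction(mock_sample, 40)))    # about 2.5-fold
```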
According to Neafsey, hybrid selection could be very useful for studying archival samples; for example, researchers might want to compare pathogen populations before and after an outbreak of a drug-resistant strain.
Samples collected in the field and stored for long periods as dried blood spots on filter paper contain little pathogen DNA compared with host DNA and aren't suitable for methods like white blood cell depletion to isolate pathogen material, he said.
"If your sample is fresh or has been collected and stored appropriately … you can perform white blood cell depletion … to selectively remove the human white cells and leave just red cells that contain the malaria parasites inside them," Neafsey said. "But if your sample has been frozen and you've lost cellular integrity, if the host and malaria cells have lysed, the DNA is mixed together and you can't separate by any means apart from a protocol like hybrid selection."
To evaluate hybrid selection as an option for sequencing difficult samples, Neafsey said, the group experimented on DNA extracted from a dried blood spot on filter paper taken from a malaria patient in Thies, Senegal, in 2008 and stored at room temperature for approximately a year.
The researchers used whole-genome amplification to generate enough material — a few micrograms — for hybrid selection. Plasmodium DNA in the original sample was measured by qPCR at approximately 0.1 percent of total DNA by mass. After WGA and hybrid selection, malaria DNA represented 7.7 percent — a 70-fold increase.
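Fold enrichment here is simply the ratio of the pathogen's share of total DNA after selection to its share before. A minimal sketch using the rounded figures quoted above follows. Note that 7.7 / 0.1 gives about 77, so the reported figure of about 70-fold likely reflects rounding in the 0.1 percent starting value (the article does not give the unrounded number). The function names are illustrative, and the upper bound simply reflects that a fraction cannot exceed 100 percent, which limits how much any further round of selection can add.

```python
def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Ratio of pathogen DNA fraction after selection to the fraction before."""
    if not 0 < fraction_before <= 1 or not 0 < fraction_after <= 1:
        raise ValueError("fractions must be in (0, 1]")
    return fraction_after / fraction_before


def max_fold_enrichment(fraction_before: float) -> float:
    """Upper bound: enrichment cannot push the pathogen past 100 percent."""
    return 1.0 / fraction_before


before = 0.001  # ~0.1 percent P. falciparum DNA by qPCR (rounded)
after = 0.077   # 7.7 percent after WGA and hybrid selection
print(round(fold_enrichment(before, after), 1))  # ratio of the rounded inputs
print(max_fold_enrichment(before))               # ceiling for any further rounds
```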
According to Neafsey, a second round of hybrid selection achieved enrichment at an even higher level. "We were able to rescue this sample that would have been extraordinarily expensive to sequence … and raised the malaria DNA to a fractional level that was affordable and efficient to sequence," he said.
The researchers evaluated the accuracy and utility of the sequencing data by calling SNPs against the P. falciparum reference assembly, identifying a total of 26,366 SNPs. While the depth of coverage obtained would not be "sufficient for de novo genome assembly," they wrote, "SNP calling against a reference assembly is the end-stage analysis for most Illumina data, and therefore a good indication of a dataset's potential utility."
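SNP calling against a reference comes down to comparing, at each position, the bases in aligned reads with the reference base. The toy caller below illustrates the idea for a haploid organism (P. falciparum blood-stage parasites are haploid, so a single dominant allele per site is expected). It is not the authors' pipeline, which the article does not describe: the pileup dictionary format, the depth and allele-fraction thresholds, and the example data are all assumptions, and production callers also weigh base and mapping qualities.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(reference: str,
              pileup: Dict[int, List[str]],
              min_depth: int = 5,
              min_alt_frac: float = 0.8) -> List[Tuple[int, str, str, int]]:
    """Toy haploid SNP caller.

    pileup maps a 0-based reference position to the read bases aligned there
    (a hypothetical input format). Returns (pos, ref, alt, depth) tuples.
    """
    calls = []
    for pos in sorted(pileup):
        # Keep only unambiguous bases; N and other symbols are ignored.
        bases = [b.upper() for b in pileup[pos] if b.upper() in "ACGT"]
        depth = len(bases)
        if depth < min_depth:
            continue  # too little evidence to call anything
        ref = reference[pos].upper()
        allele, count = Counter(bases).most_common(1)[0]
        # Call a SNP only if the dominant allele differs from the reference
        # and accounts for most reads (crude tolerance for errors/mixtures).
        if allele != ref and count / depth >= min_alt_frac:
            calls.append((pos, ref, allele, depth))
    return calls


reference = "ACGTACGTAC"  # hypothetical reference fragment
pileup = {
    2: ["T"] * 9 + ["G"],  # 9 of 10 reads disagree with reference G
    5: ["C"] * 6,          # all reads match reference C
    7: ["A"] * 3,          # below the minimum depth
}
print(call_snps(reference, pileup))  # [(2, 'G', 'T', 10)]
```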
A Complementary Approach
According to Neafsey, although the group's results have been promising, hybrid selection will not replace other methods of pathogen DNA enrichment in every case.
"One thing I'll say," he said, "is that this is a method that is complementary to white cell depletion as far as malaria goes, because if you have the sample fresh, and the technicians and resources available to do white cell depletion, that's a cost-effective way of preparing samples that would be efficiently sequence-able and [would remove] the human DNA straight away."
In other situations, however, such as field work where it's easier to store samples as dried blood spots, or in cases where samples were archived years ago, "hybrid selection is probably one of the only ways to do efficient sequencing on samples of that nature," he said.
Additionally, groups with the resources can alternatively "throw some extra sequencing coverage at the problem" to reach adequate coverage, Neafsey said, though he acknowledged that "not every person doing sequencing might have the resources to use that additional coverage to deal with host contamination, and there might be some samples that are not amenable to that."
The authors wrote in their paper that for augmented coverage to be an affordable strategy relative to hybrid selection for a target coverage level of 40-fold, samples must contain at least 50 percent pathogen DNA, which is "rarely found in clinical samples unless white cell depletion is performed."
"For a more typical clinical sample," they wrote, "hybrid selection resulting in 40-fold enrichment enables 40x coverage depth for a dramatically lower total price … than deeper sequencing of the unpurified sample" — approximately $1,000 vs. $40,000 respectively.
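The cost argument rests on a simple proportional model: the amount of sequencing needed to reach a target depth scales with the inverse of the pathogen fraction. The sketch below compares deep sequencing of an unpurified sample with sequencing after hybrid selection. All prices are hypothetical: the per-fold cost of $10 was picked only so the outputs land on the article's round numbers, and the capture cost is left as a parameter because the article does not state it. With capture cost set to zero, the cost ratio equals the enrichment fold, which is consistent with the roughly 40-to-1 ratio between the $40,000 and $1,000 figures.

```python
def sequencing_cost(target_cov: float, pathogen_fraction: float,
                    cost_per_fold: float) -> float:
    """Cost to reach target_cov on the pathogen genome.

    cost_per_fold: hypothetical price of 1x coverage of *pure* pathogen DNA.
    Assumes pathogen reads make up `pathogen_fraction` of all reads.
    """
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return target_cov / pathogen_fraction * cost_per_fold


def hybrid_selection_cost(target_cov: float, pathogen_fraction: float,
                          fold_enrichment: float, cost_per_fold: float,
                          capture_cost: float = 0.0) -> float:
    """Hypothetical capture cost plus sequencing of the enriched library."""
    enriched = min(1.0, pathogen_fraction * fold_enrichment)
    return capture_cost + sequencing_cost(target_cov, enriched, cost_per_fold)


COST_PER_FOLD = 10.0  # hypothetical dollars per 1x of pure pathogen genome
deep = sequencing_cost(40, 0.01, COST_PER_FOLD)
selected = hybrid_selection_cost(40, 0.01, 40, COST_PER_FOLD)
print(round(deep), round(selected))  # 40000 1000
```

A nonzero `capture_cost` narrows the gap. At high starting pathogen fractions, where enrichment has little headroom, that fixed cost is what can make deeper sequencing of the unpurified sample the cheaper route, which is consistent with the authors' observation that samples would need at least 50 percent pathogen DNA for augmented coverage to be affordable.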
Another team of researchers has already started using the Broad group's hybrid selection method in their investigation of another malaria parasite, P. vivax. Taylor Bright, a graduate student in Elizabeth Winzeler's lab at the Scripps Research Institute, is currently using the whole-genome capture technique to investigate the genetic diversity of P. vivax in an area of Peru.
Bright told Clinical Sequencing News in an e-mail that the method "will be critical in obtaining sequencing data from field samples and other samples that are contaminated with foreign DNA," and "seems to be able to analyze the genomic content of low levels of pathogen DNA without introducing any substantial bias."
The Scripps group has previously sequenced samples using white blood cell filtration, but this requires a field lab, limiting the area where the team can collect samples. "Our main interest in the whole-genome capture technique is therefore to allow us to perform sequencing analysis of P. vivax parasites from a much more diverse geographic area. In addition, collecting samples via blood spot will increase the number of samples we can obtain versus a two- to three-hour on-site filtration protocol," Bright wrote.
So far, the technique "has been working well for us," he said. "To date we have performed the capture on three frozen blood samples and two blood spots and in each case we were able to increase the P. vivax DNA content from less than 1.5 percent to greater than 20 percent P. vivax DNA, which then allows us to conduct efficient whole-genome sequencing analysis on the parasite samples … thereby giving a more complete picture of the genetic diversity in and around our field site."
Neafsey said that the Broad team is also interested in P. vivax and has been adapting its protocol to sequence that species.
"This has been a neglected malaria parasite, in part because it can't be cultured in a Petri dish. It doesn't like to grow outside of primate hosts, so instead they have to be grown inside of monkeys with their spleens removed in order to let the infections get to a high level," he said.
As a result, only two P. vivax genomes have been sequenced so far, compared with hundreds for P. falciparum, he noted, adding that his group has had a project proposal approved by the National Institute of Allergy and Infectious Diseases to adapt hybrid selection to P. vivax in order to "rectify this imbalance in sequencing resources between these two species."
According to Neafsey, the project plan is to sequence approximately 40 P. vivax isolates from around the world in order to assess genomic diversity in the neglected species.
"The protocol is pretty well hammered out at this point," he said. "We had to do surprisingly little adaptation of it to transform the basic selection protocol from human exome capture to pathogen genome capture."
"We have very high confidence it will work well for vivax, it's just a matter of getting the bait set and then once we have that we can just turn the crank."
In their paper, the authors disclose that they are seeking to patent their whole-genome bait preparation and the application of hybrid selection to clinical infectious disease samples.
Neafsey said the group has not yet discussed commercialization with any companies, but the researchers have initiated their patent application "to facilitate commercialization if there turns out to be a sufficient market."