High-throughput sequencing-based studies of plant genomes and populations could benefit from the expanding repertoire of targeted enrichment strategies developed for human genetics research, according to a study in the February issue of the American Journal of Botany.
Researchers based in Oregon and Utah profiled the strengths and weaknesses of four leading targeted enrichment methods based on the use of PCR amplification, hybridization probes, restriction enzymes, or expressed transcript isolation, respectively.
"These methods each have their place," the study's first author Richard Cronn, with the US Department of Agriculture Forest Service's Pacific Northwest Research Station, told In Sequence.
Even so, Cronn and his co-authors argued that while many plant researchers are still using PCR-based enrichment strategies, which are a cost-effective way to target smaller regions of plant genomes, other available methods are better suited for tackling very long stretches of DNA or fully utilizing the sequencing capacity offered by next-generation sequencing platforms.
"PCR really was a tool for focusing on small genomic targets," Cronn explained. "If a person really wanted to scale up to large targets … many of the other methods that were available were much better at pulling out targets of that size."
Plants have proven challenging from a genomics perspective because many plant genomes are extremely large, polyploid, and contain numerous repeat sequences.
While there have been successful efforts to sequence — and re-sequence — model organisms such as Arabidopsis (IS 8/30/2011) or important crop plants such as rice (GWDN 12/12/2011) and soybean (IS 1/17/2012), the complexities of many plant genomes have made it difficult to sequence even individual genomes, much less the number of individuals needed to do genomics-based population studies.
"[F]or non-model plants and plants possessing large genomes, we are at a crossroads where complete genomes can be sequenced but not readily assembled and where comparative genome-scale analysis of a large number of individuals is not cost effective for most studies," Cronn and his co-authors explained.
Sequencing stretches of DNA that have been isolated by targeted enrichment makes it possible to get the depth of coverage needed for finding SNPs, sorting out gene structure, assembling sequences, and comparing individuals within or between plant populations, they noted.
And with an ever-growing repertoire of enrichment methods, many based on strategies developed for human genomics, the emphasis is now on selecting targeted enrichment approaches that are not only appropriate for the research question at hand, but also cost-effective and compatible with high-throughput sequencing platforms.
For their recent analysis, Cronn and his colleagues considered the utility of PCR-based, hybridization-based, restriction enzyme-based, and transcriptome-based enrichment strategies for high-throughput plant sequencing projects.
The comparison was based not only on criteria such as specificity, level of enrichment, and uniformity of coverage depth, but also on the ability to scale these enrichment strategies up to match the capacity of existing sequencing platforms.
"The major departure of these methods from their historical roots is in modifications to accommodate large targets (kilobases to megabases) to capitalize on the high capacity afforded by [next-generation sequencing] platforms," authors of the AJB study explained.
Pros and Cons
Based on their analyses, for example, the team concluded that PCR-based enrichment remains feasible for targeting small to medium-sized regions of the genome.
For high-throughput sequencing of larger or more numerous regions, though, the researchers suggested that the efficiency of PCR falls off, since it can take tens of thousands of PCR amplicons to approach the capacity of high-throughput sequencing instruments.
"If you're not using the capacity, then these become much more expensive platforms," Cronn noted.
For example, the team estimated that the cost per sample to sequence 50,000 bases of DNA from 96 samples with 500 base pair amplicons would be $118. That jumped to more than $1,800 per sample when targeting 500,000 bases of DNA.
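The scaling problem can be illustrated with some back-of-the-envelope arithmetic. The following is a minimal sketch, not the authors' cost model: it assumes the amplicons tile the target end to end with no overlap, and the function names are hypothetical. The per-sample prices are the ones quoted above, with $1,800 treated as a lower bound.

```python
import math


def amplicons_needed(target_bases: int, amplicon_length: int) -> int:
    """Amplicons required to tile a target, assuming no overlap (a simplification)."""
    return math.ceil(target_bases / amplicon_length)


def total_reactions(target_bases: int, amplicon_length: int, samples: int) -> int:
    """Total PCR reactions if every amplicon is amplified separately in every sample."""
    return amplicons_needed(target_bases, amplicon_length) * samples


def cost_per_kilobase(cost_per_sample: float, target_bases: int) -> float:
    """Per-sample cost divided by the target size in kilobases."""
    return cost_per_sample / (target_bases / 1000)


# Figures quoted in the article: 96 samples, 500 bp amplicons.
for target, cost in ((50_000, 118.0), (500_000, 1800.0)):
    print(
        f"{target:>7} bases: "
        f"{amplicons_needed(target, 500)} amplicons per sample, "
        f"{total_reactions(target, 500, 96)} reactions in total, "
        f"about ${cost_per_kilobase(cost, target):.2f} per kb per sample"
    )
```

Under these assumptions, a tenfold increase in target size means ten times as many reactions per sample, while the quoted per-sample cost rises by more than tenfold.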
Microfluidics-based multiplexing of short PCR amplicons or the use of longer amplicons can bring the price of enriching long stretches of DNA by PCR down somewhat, they reported, but these approaches are still predicted to be more expensive on a per-sample basis than other enrichment methods when targeting half a million or more bases of sequence.
And beyond the price of reagents alone, Cronn noted that researchers ought also to consider the frequency of failure for each approach as well as the time needed to complete experiments using each method.
"It's very easy to do a simple experiment and pull down a large number of targets by hybridization, whereas acquiring that same number of targets by PCR really takes a lot of time," he said. "It really becomes an issue of managing missing data."
In that respect, targeted enrichment via hybridization to arrays or custom probes offers an advantage, Cronn said, calling hybridization an "efficient way to scale up a very large number of targets and to use the capacity of our current generation of sequencers."
"The confluence of high-density oligonucleotide synthesis and [next generation sequencing] technologies has set the stage for transforming hybridization into a capture method with broad potential in the plant sciences," he and his colleagues noted, "and one that is likely to displace PCR from a starring role in targeted enrichment."
Hybridization-based enrichment methods also seem to have the potential to simultaneously pull down multiple copies of the same gene or genome region, which may help in dealing with polyploid plant species.
"We have great evidence right now that these enrichment techniques — primarily the hybridization-based techniques — do pull down all the members of a gene family that have been duplicated by polyploidy," said Cronn, who collaborates with researchers working on octoploid strawberry plants. "So they're going to be very powerful in that regard."
Meanwhile, the team noted that restriction enzyme-based enrichment methods appear to be especially amenable to tracking down SNPs and doing comparative studies between different plant populations or species.
These include restriction-site-associated DNA, or RAD, tagging; genomic reduction based on restriction site conservation, also known as GR-RSC; and genotyping-by-sequencing, or GBS.
"These methods rely on the discriminatory power of the restriction endonucleases to produce homologous restriction fragments among the individual samples being assayed," authors of the AJB study explained. "When paired with [next-generation sequencing] platforms, these methods provide a cost-effective means to identify large numbers of high confidence SNPs with broad applications across diverse genomes."
The team cautioned that there may be complications associated with trying to apply restriction enzyme-based strategies to polyploid plants and noted that the necessary depth of coverage and price per data point for restriction enzyme approaches can vary depending on the amount of genetic diversity present within a given plant population.
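The idea behind these restriction-based methods, that one enzyme cutting at the same recognition sites yields comparable fragments across samples, can be illustrated with a toy in silico digest. This is only a sketch under stated assumptions: the two sequences are made up, the EcoRI recognition site (GAATTC, cut after the first G) is chosen purely for illustration and is not taken from the study, and the helper names are hypothetical. Real pipelines also deal with both strands, size selection, and sequencing error.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset is the cut position within the site (1 for EcoRI, G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def snp_positions(fragment_a: str, fragment_b: str) -> list[int]:
    """0-based positions at which two equal-length fragments differ."""
    if len(fragment_a) != len(fragment_b):
        raise ValueError("fragments must be the same length to compare")
    return [i for i, (a, b) in enumerate(zip(fragment_a, fragment_b)) if a != b]


# Two made-up samples that differ by one base between the restriction sites.
sample_a = "AAAGAATTCCCCGAATTCTTT"
sample_b = "AAAGAATTCCTCGAATTCTTT"

fragments_a = digest(sample_a)
fragments_b = digest(sample_b)

# Fragments are paired by position because both samples share the same sites.
for frag_a, frag_b in zip(fragments_a, fragments_b):
    print(frag_a, frag_b, snp_positions(frag_a, frag_b))
```

Because both toy samples carry the same two recognition sites, their fragments line up one to one and the single-base difference falls inside the middle fragment. A mutation in the site itself would change the fragment pattern instead.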
Finally, as in other fields, the group explained, transcriptome sequencing is finding favor among plant researchers who want to become more familiar with an organism's coding sequences and SNPs, and with its repertoire of expressed transcripts, without necessarily taking on the entire genome.
While transcriptome sequencing can be pricey owing to the amount of sequence needed to cover transcript sequences at an adequate depth, the cost is going down rapidly with the advent of newer platforms and more multiplexed experiments, Cronn explained.
It also offers an advantage over other enrichment methods, since it can be tailored to a range of applications, from studies comparing tissues within the same plant to research on multiple plants from the same species or population, and even to inter-species comparisons.
Though it's possible to nab targets in the genome without knowing specific sequences with "anonymous" methods such as transcriptome and restriction enzyme-based enrichment, there are instances in which such prior knowledge is crucial before researchers can attempt targeted enrichment — for instance, when designing specific PCR primers or hybridization probes.
That's where a complementary method called "genome skimming" comes in, Cronn explained. The approach, which some members of the same team outline in another paper in the same issue of AJB, involves doing very low coverage sequencing of plant genomes to find targets for subsequent enrichment experiments and more in-depth sequence analysis.
"It's not uncommon for people who are studying plants to be working on something where there simply are no genetic resources available for the species, the genus, or sometimes the plant family," Cronn said. "In order to be able to fill that information gap — in order to get the resources to get started — the genome skimming approach is a great idea."
Even coverage of one-fold or lower can reveal targets in the genome for subsequent enrichment and interrogation using any of the approaches, he explained.
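To put "one-fold" in concrete terms, average coverage is commonly estimated as the number of reads times the read length, divided by the genome size. The sketch below applies that standard relationship. The genome size and read length are hypothetical values chosen for illustration, not figures from the study, and the function names are assumptions.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(genome_size: int, read_length: int, coverage: float) -> int:
    """Reads needed to reach a target average coverage, rounded up."""
    return math.ceil(genome_size * coverage / read_length)


# Hypothetical example: a 20-gigabase genome sequenced with 100-base reads.
genome_size = 20_000_000_000
read_length = 100

for target in (0.1, 0.5, 1.0):
    reads = reads_for_coverage(genome_size, read_length, target)
    print(f"{target}x coverage needs roughly {reads:,} reads")
```

This is an average only. At one-fold coverage much of the genome is left unsequenced by chance, while high-copy regions are sampled far more deeply than single-copy ones.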
Genome skimming seems to be most effective when it's possible to get good genomic depth at a very small price — an area where massively parallel platforms such as Illumina and SOLiD stand out.
These short-read platforms are also compatible with all of the targeted enrichment methods that are currently being used for plants, Cronn said, though he noted that the longer read lengths offered by the Roche 454 instrument can be advantageous for those doing amplicon sequencing.
"At least historically, there's been an advantage for amplicon sequencing on the 454 because of the potential for longer read lengths," he said. "For the other methods — for the hybridization-based enrichments, for the restriction fragments, for transcriptomes — the short-read sequencers are more than adequate for taking advantage of those kinds of enrichment methods."
Learning from Experience
Cronn predicted that as newer and newer sequencing platforms become available, targeted enrichment methods that don't scale up accordingly will become less appealing for those doing plant genomics.
In his own lab, which is involved in sequencing studies on a number of tree species, including Douglas fir, aspen, and tan oak, Cronn said researchers are using all four of the targeted enrichment methods discussed in the paper.
"We started where many people did, which was with PCR," he explained. "We quickly saw that PCR did not scale with growth of the [sequencing] machines, so enrichment through hybridization really became our focus."
The team began developing its own hybridization-based enrichment methods in 2008, he noted, and by the following year had come up with a low-cost protocol for doing hybridization-based enrichment using reagents that are either on hand in the lab or easy to obtain, such as PCR primers.
They are now focusing more energy on transcriptome sequencing, including methods that combine transcriptome and hybridization-based enrichment strategies, Cronn said. "We're actually taking hybridization probes and pulling them out of transcriptomes so that we can look at a specific slice of the transcriptome."
The group has no plans to commercialize its own hybridization method, though Cronn predicted that there may be opportunities for companies interested in tailoring some of the existing targeted enrichment products to specifically suit the needs of plant researchers, particularly in the area of probe development for hybridization enrichment experiments.
"Instead of looking at a small number of individuals for a few million bases of targets, [plant researchers] might prefer to look at a thousand individuals for a few hundred thousand bases of targets," Cronn noted. "There may be a real opportunity, kind of a niche market, for the people who synthesize these probes to serve the plant biology community."