Targeted Enrichment Scheme Enables Parallel Sequencing of Multiple Chloroplast Genomes


A University of Florida team is touting a targeted sequencing strategy that uses custom RNA probes to enrich for chloroplast DNA from multiple flowering plants so that the chloroplast genomes can subsequently be sequenced in parallel.

The researchers described the approach in a protocol note published in BioOne's Applications in Plant Sciences. There, they demonstrated that they could successfully select for chloroplast genomes from two dozen flowering plant species using enrichment probes targeting 22 known plastid sequences. The chloroplast genomes were sequenced to a depth of more than 700-fold each, on average, when the investigators multiplexed all 24 samples on a single lane of the Illumina GAIIx instrument.

Having determined that the existing probe set has what it takes to nab chloroplast sequences from a wide range of flowering plants, the group is getting set to use the targeted enrichment scheme to explore relationships within this branch of the plant tree. The method also appears to have promise as a means of assessing chloroplast variability within a plant species or population.

"There are a whole bunch of applications that this could be used for," senior author Matthew Gitzendanner, a biology and genetics researcher affiliated with the University of Florida and the Florida Museum of Natural History, told In Sequence. "We're just starting to explore where we can use it."

Chloroplast sequences are proving useful in studies of everything from phylogeny and population genetics to phylogeography, Gitzendanner explained, noting that these are "studies where you're not really after the whole genome, you just need some markers for looking at evolutionary relationships or patterns of genetic diversity.

"In many of those cases, it would be nice to have some of the nuclear genes," he noted. "But part of the problem with plants is that they have such large genomes that if you wanted to try to get a consistent set of nuclear genes you kind of need to do some enrichment or sequence at very high depth in order to get a consistent set of nuclear genes across multiple samples."

In the past, researchers have largely relied on approaches such as PCR amplification or chloroplast isolation to look specifically at plastid DNA. But each approach has drawbacks when researchers want to examine the organelle's sequence in many plant samples or when the amount of plant tissue available is limited, as it is for some herbarium samples.

"Prior to this we typically would just use PCR and amplify a handful of genes and sequence those with Sanger sequencing," Gitzendanner explained.

"Another way would be to do a chloroplast isolation from the plant tissue, but that would require grams, usually, of fresh tissue. That kind of limited the number of samples that you could get that much DNA from and process through those methods," he added.

With the advent of high-throughput sequencing, it's also become possible to discern chloroplast sequences using reads from whole-genome sequencing experiments on a given plant.

For instance, the team estimated that they could sequence roughly 12 to 16 chloroplast genomes at once on the GAIIx using non-enriched genomic DNA from plant samples. That number rises to between three and four dozen chloroplast genomes on the HiSeq 2000 or 2500.

That's useful in some situations, Gitzendanner noted, though it can add considerable time, cost, and analytical investment to experiments that could otherwise be done using plastid sequences alone.

"Usually about 5 to 10 percent of reads from a genomic DNA prep will be from the chloroplast," he said. "So if you sequence a lane of Illumina for a sample, you'll probably get enough to get the chloroplast sequence out of it. But you're wasting 90 to 95 percent of the sequence — the nuclear DNA — if you're mostly interested in the chloroplast."
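To put that trade-off in rough numbers, the short Python sketch below estimates how many total reads an unenriched genomic DNA prep would need so that the chloroplast share alone reaches a chosen depth. Only the 5 to 10 percent chloroplast fraction comes from Gitzendanner's comment; the 150-kilobase plastome size, the 100-base read length, and the 50-fold target are illustrative assumptions, and the function name is hypothetical.

```python
import math

def total_reads_needed(target_coverage, plastome_bp, read_bp, chloroplast_percent):
    """Total reads to sequence so the chloroplast share alone reaches target_coverage."""
    # Reads that must land on the chloroplast genome to hit the target depth.
    chloroplast_reads = math.ceil(target_coverage * plastome_bp / read_bp)
    # Scale up, because only chloroplast_percent of all reads are chloroplast-derived.
    return math.ceil(chloroplast_reads * 100 / chloroplast_percent)

PLASTOME_BP = 150_000   # assumption: typical angiosperm plastome size
READ_BP = 100           # assumption: read length, not stated in the article
TARGET_COVERAGE = 50    # assumption: illustrative depth

for percent in (5, 10):  # chloroplast share of reads quoted in the article
    reads = total_reads_needed(TARGET_COVERAGE, PLASTOME_BP, READ_BP, percent)
    print(f"{percent}% chloroplast reads: {reads:,} total reads, "
          f"{100 - percent}% of them not chloroplast")
```

Under these assumptions, the non-chloroplast reads account for 90 to 95 percent of the sequencing effort, which is the overhead the enrichment step is meant to reduce.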

In their new enrichment scheme, Gitzendanner and his co-authors targeted a variety of flowering plant chloroplast sequences at once, using 120-base-pair custom Agilent SureSelect Target Enrichment probes designed by a company called Genotypic Technology in India.

The probe set was designed to correspond to 22 known plastid sequences representing eudicot plants, which comprise one of the two main flowering plant groups.

A similar strategy has been used to enrich for individual chloroplast sequences in the past, explained first author Gregory Stull, a graduate student in Gitzendanner's University of Florida lab.

But the latest iteration of the RNA probe-based method differs in that researchers designed probes corresponding to almost two-dozen "different, phylogenetically diverse, chloroplast genomes," making it possible to select for a range of chloroplast sequences and then sequence them simultaneously.

Gitzendanner noted that, in a modification of the method recommended by Agilent, he and his colleagues have been barcoding and pooling their Illumina libraries before the selection step rather than after it.

Authors of the protocol paper also noted that a newer version of the Agilent SureSelect kit has been released since their own experiments were performed.

For their proof-of-principle study, the researchers tested the RNA probe enrichment strategy using not only some of the eudicot plant species against which the RNA probes were originally designed, but also more distantly related eudicot species and even a few representatives from the other flowering plant group, the monocots.

The monocots were selected to "test how far you could get from the probe set [sequences] and still get good selection," Gitzendanner explained.

Indeed, the probe set, built from 22 plastid sequences, seemed to be well suited for pulling out chloroplast DNA from all 22 eudicots tested, as well as the two monocot plants, prompting speculation that it would be useful for chloroplast enrichment across much of the flowering plant lineage.

"The success of this experiment illustrates the utility of the capture method in general and the broad applicability of the probe set in particular," authors of the study concluded.

"Furthermore," they added, "the broad phylogenetic utility of the probe set employed here makes this method applicable for plastome-based evolutionary studies across not only eudicots, but also monocots and potentially all angiosperms."

Given their success in enriching for DNA from both eudicot and monocot chloroplasts using the existing probe set, Gitzendanner explained that he and his team are now gearing up to do parallel chloroplast genome sequencing on a wide variety of plant samples that they have on hand.

The researchers have not attempted the approach in combination with another platform such as Illumina's HiSeq 2000, mainly because they already have enough — or more than enough — capacity to work with on the GAIIx.

For instance, the researchers estimated that they could get some 50-fold coverage per chloroplast genome, on average, by multiplexing 300 or more chloroplast genomes in a single Illumina GAIIx lane.

Even so, there is currently something of a bottleneck related to the limited number of distinct barcodes they're able to get their hands on, according to Gitzendanner. "Right now, as far as commercially available indexes go, it does seem that 96 is the most that any company is marketing."

To multiplex more samples, he speculated that a group would likely have to design its own custom barcodes or perhaps hold out for vendors to begin selling larger barcode sets.

In the meantime, Gitzendanner argued that high coverage of the chloroplast genomes sequenced in a 96-sample-per-lane scheme — expected to average out at around 180x — could be useful in some instances.
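The coverage figures quoted for different multiplexing levels follow from simple proportional scaling, which can be sketched in a few lines of Python. This is a back-of-the-envelope check rather than the authors' own calculation: it assumes that lane yield and enrichment efficiency stay the same as in the 24-sample run and that reads are split evenly among samples. The function name is hypothetical.

```python
def coverage_per_sample(ref_samples, ref_coverage, n_samples):
    """Scale average per-sample coverage from a reference run to n_samples per lane."""
    # Total coverage "budget" of the lane, in sample-fold units, from the reference run.
    lane_budget = ref_samples * ref_coverage
    # Assumption: the budget is shared evenly across all multiplexed samples.
    return lane_budget / n_samples

REF_SAMPLES = 24     # samples multiplexed in the published run
REF_COVERAGE = 700   # approximate average depth reported for that run

for n in (24, 96, 300):
    depth = coverage_per_sample(REF_SAMPLES, REF_COVERAGE, n)
    print(f"{n} samples per lane: roughly {depth:.0f}x each")
```

With these inputs the estimate is about 175x at 96 samples and about 56x at 300 samples, in line with the roughly 180x and 50-fold figures cited by the researchers.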

In particular, he pointed to so-called spacer regions, which tend to diverge more from probe sequences than do coding sequences in the chloroplast genome. Because those spacers may provide information on subtle chloroplast differences within or between closely related plant species, high coverage may prove important in finding "every last little bit that's different," Gitzendanner explained.

So far the authors of the paper have not done any phylogenetic analyses with the sequence data generated with the RNA probe enrichment approach. But going forward they're keen to tackle a range of research questions, from family-level phylogenetic analyses to analyses that stretch across the flowering plant lineage.

"We have various taxa that either we don't have very good data for or that are in phylogenetically difficult areas of the tree, where we're still a little bit unsure of the relationships," Gitzendanner said. "We're trying to get more data to throw at those questions and resolve some of the branches of the tree of life that are still not quite as well resolved."

The team may eventually explore the feasibility of incorporating some nuclear probes into the enrichment system, Gitzendanner noted. "If we can make some nuclear gene probes we could add those to the chloroplast ones and then have a probe set that would select both the nuclear and the chloroplast."
