Researchers in Germany have streamlined a sequencing-based strategy for mapping plant genes by folding in an additional deep-sequencing step.
In a study published online in the journal Plant Physiology in July, researchers from the Max Planck Institute for Plant Breeding Research and the Cologne Center for Genomics began with Illumina-based mapping-by-sequencing of Arabidopsis thaliana plants from a single genetic background. They were looking for genes that could compensate for the loss of lhp1, a chromatin-related gene in the Polycomb repressive pathway.
After narrowing their search to a few candidate sites suspected of suppressing the lhp1 mutant phenotype in this first screening stage, the researchers used Life Technologies' Ion Torrent PGM instrument to find the causative allele, applying a newly developed method they dubbed "deep candidate re-sequencing," or dCARE.
The idea came from the realization that it was possible to get ever closer to a causal mutation in such a screen by sequencing affected plants at increasing depth, explained senior author Franziska Turck, a plant developmental biology researcher at the Max Planck Institute for Plant Breeding Research.
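Why sequencing deeper sharpens the picture can be illustrated with the binomial standard error of an allele-frequency estimate, which shrinks only with the square root of read depth. This is a simplified sketch that assumes each read samples an allele independently and ignores PCR and sequencing bias; the 0.8 allele frequency is hypothetical, while the depths match figures from the study (about 40-fold coverage in the first stage, 5,000 to 20,000 reads per amplicon in the second).

```python
import math


def allele_freq_se(freq: float, depth: int) -> float:
    """Binomial standard error of an allele frequency estimated from `depth` reads."""
    return math.sqrt(freq * (1.0 - freq) / depth)


TRUE_FREQ = 0.8  # hypothetical mutant-allele frequency at a candidate SNP

for depth in (40, 5_000, 20_000):
    # Half-width of an approximate 95% normal-approximation interval.
    half_width = 1.96 * allele_freq_se(TRUE_FREQ, depth)
    print(f"{depth:>6} reads: {TRUE_FREQ:.2f} +/- {half_width:.3f}")
```

At 40 reads the interval spans roughly plus or minus 0.12, while at 5,000 reads it narrows to about plus or minus 0.01, the kind of resolution that could help separate a causal allele from a closely linked neighbor.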
"One bottleneck, of course, is the price of sequencing," Turck told In Sequence. "So why not do it in two steps: go for lots of deep sequencing in the first step to identify a region, but then map all of the SNPs in that region and go for those SNPs by more selective deep sequencing, so you get more precise quantification."
Many conventional gene-mapping studies in the model plant A. thaliana involve crossing plants from different genetic backgrounds. Because each background carries distinct, background-specific markers, researchers can then follow the segregation of genes and phenotypes of interest over generations.
While that approach is often useful, its reliance on multiple genetic backgrounds can be a hindrance, Turck explained. For instance, researchers who are interested in screening for genetic changes that compensate for or enhance the phenotype caused by a specific mutation must first find or make plants that carry both the mutation of interest and appropriate genetic markers.
"You had to have the mutant available in the marker line for enhancer-suppressor screens," she said. "If you didn't have that, you had to integrate [the mutation] quite tediously into a marker line."
And even when such lines are available, Turck added, natural variation between plants from different genetic backgrounds can sometimes obscure traits of interest.
"When you used these distinct accessions belonging to the same species for marker crosses, you had a lot of other phenotypes that are segregating because of the natural variation between the accessions," she explained. "Especially if you had a very weak phenotype, you wouldn't find your phenotype anymore — it was lost in the morphologic variation that you observe in the F2 [second generation]."
To get around this problem, Turck and her team turned to a method known as mapping-by-sequencing using "isogenic" plants from a single A. thaliana genetic background.
Rather than following markers through plant crosses, the approach involves introducing random alterations into a parental plant line — for instance, using the mutagen ethyl methanesulfonate, or EMS. Plants that come out of this mutation step with interesting phenotypes can then be resequenced and compared to unmutagenized parental plants to find genetic variants and genes suspected of contributing to traits of interest.
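The comparison step can be sketched as simple set logic: keep variants seen in the mutagenized plants but absent from the unmutagenized parent, then favor the G:C to A:T transitions that EMS predominantly induces. This is a minimal illustration, not the team's pipeline; the Variant type and all coordinates are hypothetical, and real analyses call variants with dedicated tools and filter on read support and quality.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position on the reference
    ref: str
    alt: str


# EMS alkylates guanine, so most induced changes are G:C -> A:T transitions,
# which show up as G>A or C>T relative to the reference strand.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def ems_candidates(mutant: set[Variant], parental: set[Variant]) -> list[Variant]:
    """Return EMS-type variants present in the mutant plants but not in the parent."""
    novel = mutant - parental  # background variants shared with the parent drop out
    return sorted(v for v in novel if (v.ref, v.alt) in EMS_CHANGES)


# Hypothetical example data.
parental = {Variant("Chr1", 1_200, "A", "G")}
mutant = {
    Variant("Chr1", 1_200, "A", "G"),  # also in the parent: background variant
    Variant("Chr3", 5_400, "G", "A"),  # novel and EMS-type
    Variant("Chr5", 880, "T", "C"),    # novel, but not an EMS-type change
}
print(ems_candidates(mutant, parental))
```

Because the parent and the mutagenized plants share one genetic background, the subtraction removes nearly all pre-existing variation, which is the advantage Turck describes.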
"The fact that we use EMS-induced changes to map allows us to stay in exactly the same parental line," Turck said. "We can use the same parental line to map the mutagenized offspring."
"That gets rid of the problem of not having the mutation in another accession and it gets rid of all the phenotypic variation," she added.
Using this approach, for example, it's possible to quickly screen for genetic changes that exacerbate or minimize the phenotype associated with a given mutation.
It was that type of enhancer/suppressor screen that Turck and her colleagues were keen to try in Arabidopsis plants carrying mutations in lhp1, a gene believed to have a central role in the Polycomb pathway they study.
A few groups have published similar isogenic mapping studies, Turck noted. Where her group's strategy differs, though, is in its use of a second, deep sequencing step to narrow in on causal mutations within the set of candidate changes that fall out of the original mapping-by-sequencing screen.
Typically, such candidate mutations are tested through in-depth analyses that look at how mutations segregate across generations and/or with experiments aimed at trying to complement potentially causal mutations in transgenic plant lines.
Because such approaches are time-consuming and labor-intensive, though, Turck and her colleagues wanted to find a way to simplify this process. And that's where the deep-sequencing step came in.
For the current study, the researchers selected nearly 300 EMS-mutagenized plants in the lhp1 mutant background that no longer showed the physical features of plants carrying the lhp1 mutation alone.
Genetic material from these plants was sequenced to around 40-fold coverage on the Illumina GAIIx platform and compared with resequenced DNA from the leaves of 48 plants from the parental lhp1 mutant line.
That initial screen yielded five candidate regions, and for the subsequent dCARE analyses the team focused on the three candidate sites that fell within protein-coding genes.
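Restricting candidates to protein-coding genes amounts to an interval-overlap check between candidate positions and gene annotations. A minimal sketch, assuming annotations given as (chromosome, start, end, name) tuples with 1-based inclusive coordinates; the gene names and positions are hypothetical, chosen only to echo a five-to-three reduction.

```python
Candidate = tuple[str, int]       # (chromosome, 1-based position)
Gene = tuple[str, int, int, str]  # (chromosome, start, end, name), inclusive


def coding_hits(candidates: list[Candidate], genes: list[Gene]) -> list[tuple[str, int, str]]:
    """Return (chromosome, position, gene name) for candidates that fall inside a gene."""
    hits = []
    for chrom, pos in candidates:
        for g_chrom, start, end, name in genes:
            if g_chrom == chrom and start <= pos <= end:
                hits.append((chrom, pos, name))
                break  # one overlapping gene is enough for this filter
    return hits


# Hypothetical candidates and annotations.
candidates = [("Chr1", 1_000), ("Chr1", 5_000), ("Chr2", 300), ("Chr3", 750), ("Chr4", 2_200)]
genes = [
    ("Chr1", 900, 1_500, "GENE_A"),
    ("Chr2", 100, 400, "GENE_B"),
    ("Chr4", 2_000, 2_600, "GENE_C"),
]
print(coding_hits(candidates, genes))
```

A linear scan is adequate for a handful of candidates; a genome-wide pass would more typically use sorted intervals or an interval tree.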
After using PCR to amplify short pieces of DNA surrounding each of these variants, the researchers generated between 5,000 and 20,000 reads per amplicon with the Ion Torrent PGM platform using the 316 chip.
Based on the allele frequency data for these SNPs, the team identified mutations in a transposase-related gene called alp1 that suppressed the effects of the lhp1 mutation.
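The article does not spell out the team's exact scoring rule, but the general logic of scoring candidates from amplicon data can be sketched: count reads carrying the mutant allele at each candidate, estimate its frequency with a confidence interval, and rank the candidates. The sketch assumes, as is typical when mapping a recessive suppressor in a pool of suppressed plants, that the causal SNP should approach 100 percent mutant reads while linked but non-causal SNPs fall short because of recombination. The read counts are hypothetical, and the Wilson score interval is a standard choice assumed here rather than taken from the study.

```python
import math


def wilson_interval(alt: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the fraction of reads carrying the mutant allele."""
    p = alt / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical amplicon counts: candidate -> (mutant-allele reads, total reads).
counts = {
    "candidate_1": (6_100, 9_800),
    "candidate_2": (14_950, 15_000),
    "candidate_3": (4_300, 7_200),
}

# Highest mutant-allele frequency first: under the assumption above,
# the causal SNP should sit at the top, close to 1.0.
ranked = sorted(counts, key=lambda c: counts[c][0] / counts[c][1], reverse=True)
for name in ranked:
    alt, total = counts[name]
    low, high = wilson_interval(alt, total)
    print(f"{name}: {alt / total:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With thousands of reads per amplicon the intervals are narrow enough that small differences in frequency between neighboring candidates become interpretable.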
The two-step method would probably work with as little as 20-fold coverage in the first step, Turck noted. But, she explained, having lower coverage during that screening stage of the analysis would likely leave more regions to test by dCARE in the second step, meaning more primer design, PCR, and amplicon sequencing.
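One way to see why lower first-stage coverage leaves more ambiguity is to ask how often a site is covered by only a few reads. Under an idealized Poisson model of coverage (real coverage is more uneven, so this understates the problem), the sketch below compares mean depths of 20 and 40; the 10-read threshold is an arbitrary illustrative cutoff, not one from the study.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) computed from P(X = i - 1)
        total += term
    return total


MIN_READS = 10  # hypothetical minimum depth for a confident frequency estimate

for mean_depth in (20, 40):
    frac_low = poisson_cdf(MIN_READS - 1, mean_depth)
    print(f"mean {mean_depth}x: {frac_low:.2e} of sites have fewer than {MIN_READS} reads")
```

Fewer well-covered sites and noisier frequency estimates both tend to widen the candidate interval, which is consistent with Turck's expectation of more regions to test in the second step.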
For its part, the Max Planck team is continuing to search for enhancers and suppressors that contribute to Polycomb repressive pathway activity.
As this type of mapping reveals new enhancers and suppressors of the lhp1 mutation, Turck explained, each newly identified gene offers an opportunity to extend the team's search for genetic and functional interactions ever outward, by screening for genetic changes that influence the phenotype associated with mutations in that gene.
While the team is continuing to work with Arabidopsis plants for its own studies, Turck noted that the mapping-by-sequencing plus dCARE strategy should be useful for genetic mapping studies of any plant with an adequately sequenced and assembled reference genome.
The team has not tested its method on plants with very large or complicated genomes, which may pose a challenge during the population sequencing step used to find candidate mutations. But Turck speculated that there may be ways around such problems, if and when they arise, by adding in even more sequencing-based screening steps prior to the dCARE analysis.
At the moment, the depth of coverage that's possible on the Illumina instruments offers an edge during the initial pooled plant DNA-screening step. But that coverage can be overkill during the amplicon sequencing step, making sequencing speed the main priority there.
"The [Illumina] HiSeq is great, but now it's actually, for many applications, delivering too much [data]," Turck said. "Then it's a matter of what is fastest and what is the cheapest because, in a way, it doesn't matter what you use."
For the current study, the Ion Torrent PGM's quick turnaround proved especially useful, according to Turck. "One advantage I see is that it's so fast," she said. "PCR one day, making the library the next day, [a sequencing] run overnight, and the third day or fourth day you analyze."
One consideration when alternating between different platforms for the deep sequencing step is the length of the amplicons used to look at candidate regions, which may have to be adjusted somewhat to remain compatible with the read length and sequencing strategies available for each platform.
In the case of the PGM platform, for instance, the team had to place the primers for each candidate region so that the targeted SNP fell within 50 base pairs of the primer, where it would be captured in a single-end read on the platform.
"We had to design the primers so that the SNP that we wanted to read was, of course, after the primer but before the limit of the read length from the Ion Torrent," Turck said.
"We don't need to read the entire amplicon," she added, "we just need to score the SNP."
Otherwise, though, the dCARE step is expected to be compatible with any of the available sequencing technologies, including the Illumina MiSeq, the company's speedy, lower-capacity "personal sequencer."
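Since moving the dCARE step between platforms mainly means respecting each platform's read length, the primer-placement rule for the PGM can be written as a small check with read length as a parameter. A minimal sketch, assuming a single-end read that starts at the 5' end of the forward primer and 0-based amplicon coordinates; the function names and example values are hypothetical, and usable read length varies by chip and run.

```python
def snp_in_read(primer_start: int, primer_len: int, snp_pos: int, read_length: int) -> bool:
    """True if the SNP lies after the primer and within the single-end read.

    Assumes the read begins at the 5' end of the forward primer, so it covers
    [primer_start, primer_start + read_length) in amplicon coordinates.
    """
    after_primer = snp_pos >= primer_start + primer_len
    before_read_end = snp_pos < primer_start + read_length
    return after_primer and before_read_end


def allowed_primer_starts(snp_pos: int, primer_len: int, read_length: int) -> range:
    """All 0-based forward-primer start positions that keep the SNP readable."""
    first = max(0, snp_pos - read_length + 1)  # read must still reach the SNP
    last = snp_pos - primer_len                # primer must end before the SNP
    return range(first, last + 1)              # empty if no placement works


starts = allowed_primer_starts(snp_pos=150, primer_len=20, read_length=100)
print(starts.start, starts.stop - 1)  # first and last usable primer start
```

Because only the SNP needs to be scored, the far end of the amplicon can extend beyond the read; a longer read simply widens the window of acceptable primer positions.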
Turck and her team have not had access to a MiSeq instrument yet. But since completing research for the Plant Physiology study, they have done additional mutant identification studies using the Illumina platform for screening and either the Illumina GAIIx/HiSeq or Ion Torrent PGM platforms for the dCARE step.
"We take the [platform] that is the fastest available," Turck noted. "So either 50- or 100-base pair runs, either paired-end or not, it doesn't really matter, you still get nice coverage."
By barcoding samples on the Illumina HiSeq instrument during the first stage of the mapping-by-sequencing experiment, meanwhile, the researchers are currently able to screen four mutant Arabidopsis lines at a time for around €800 ($1,020) per mutant line.
"It seems expensive when you have to pay the bill for the sequencing," Turck said. "But if you think about how much time you gain, it's amazingly cheap, actually."