Monsanto researchers have used Life Technologies' Ion Torrent Personal Genome Machine to sequence the genome of a succulent plant called Sedum album, Todd Michael, head of Monsanto's Genome Analysis Center, reported during a Life Technologies workshop at the Plant and Animal Genomes conference in San Diego earlier this month.
The work is expected to provide a resource for studying succulent plant genomes in general — and provide clues about stress response mechanisms in S. album, which uses a standard photosynthesis strategy known as C3 photosynthesis when it has enough water but transitions to a water-saving photosynthesis method known as Crassulacean acid metabolism, or CAM, in times of drought.
The S. album project is also laying the foundation for an extensive comparison of how useful existing platforms are for sequencing plant genomes, Michael told In Sequence.
While his PAG presentation focused primarily on findings that could be gleaned from using the Ion Torrent instrument to sequence the plant's genome and transcriptome, Michael said he and his colleagues have already sequenced the S. album genome on the SOLiD 5500XL, Roche 454, Pacific Biosciences RS, and Illumina's HiSeq 2000 and MiSeq.
"We wanted to sort of do a 'bake-off' of all the platforms, in terms of which ones would provide the best tools to sequence plant genomes, and, in addition, to look at whether we saw any difference in error profiles that would allow us some power in plant genomes," Michael explained.
The researchers are still in the process of analyzing and comparing that data, but plan to present a poster that should offer more information on the platform comparison at the upcoming Advances in Genome Biology and Technology meeting in Marco Island, Fla.
One-Month Turnaround
Sedum album, commonly known as white stonecrop, is a succulent plant that uses CAM to fix carbon dioxide at night when water is scarce. Because the leaf pores known as stomata then open in the dark to admit the carbon dioxide used for photosynthesis, the plant loses less water to evaporation.
"Sedum album was originally identified as a facultative CAM plant, which means, basically, that under favorable conditions it fixes carbon dioxide through the C3 pathway during all times of the day," Michael said. "However, during times of drought or decreased water availability, it basically switches its carbon fixation to the dark period of the day."
The plant also has the smallest estimated genome size among the succulent plants included in the Royal Botanic Gardens, Kew, database, he added, making it an appealing candidate for sequencing-based studies.
In an effort to explore the C3-to-CAM drought response and its relationship to the plant's diurnal and circadian gene networks, researchers decided to use the Ion Torrent PGM to quickly generate both genome and transcriptome data for the plant.
Given the speed and throughput of the PGM instrument, Michael noted, the team was able to complete most of the experimental steps of the study — from getting the plant all the way through to doing genome and transcriptome sequencing — within about a month.
"In that amount of time, I was able to ask the question: what are the transcriptional mechanisms that are happening in the [drought-grown] versus the well-watered plant?" Michael said.
"There are lots of tools that make this pretty straightforward," he added. "The software associated with the PGM is really easy to use."
Using a white stonecrop plant obtained from a local store, the researchers made libraries from the plant tissues and sequenced these libraries over about five days using two PGM instruments.
In 20 PGM sequencing runs (about four runs per day on each instrument), they generated enough sequence to cover the 121-million-base S. album genome an average of 45 times.
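The coverage figure can be checked with simple arithmetic: average coverage is total sequenced bases divided by genome size. The short Python sketch below works backward from the numbers reported above (121 million bases, 45-fold coverage, 20 runs) to estimate the total yield and the average yield per run. The function names are illustrative, and the per-run figure is only an average implied by those numbers, not a yield reported by Michael; actual yields would have varied with the chip used.

```python
# Back-of-the-envelope coverage arithmetic using figures quoted in the article.
# Function names are illustrative; per-run yield is an implied average only.

def bases_needed(genome_size: int, coverage: int) -> int:
    """Total bases required to reach a given average coverage."""
    return genome_size * coverage


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


GENOME_SIZE = 121_000_000   # S. album genome size reported in the article
TARGET_COVERAGE = 45        # average coverage reported in the article
RUNS = 20                   # number of PGM runs reported in the article

total_bases = bases_needed(GENOME_SIZE, TARGET_COVERAGE)
avg_bases_per_run = total_bases / RUNS

print(f"Total bases sequenced: {total_bases:,}")
print(f"Implied average per run: {avg_bases_per_run:,.0f}")
print(f"Coverage check: {mean_coverage(total_bases, GENOME_SIZE):.0f}x")
```

On these numbers, the 20 runs together would have produced roughly 5.4 billion bases, or about 270 million bases per run on average.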
When they started the project, the group was running the Ion 314 chip on the PGM, Michael noted, though they switched to the higher-throughput 316 chip and, eventually, the 318 chip for subsequent stages of the study.
At the moment, they are able to get roughly a billion bases of sequence per run using the PGM and the 318 chip — throughput that is most comparable to Illumina's MiSeq, Michael said. At around $1,000 per chip, the price per base is also similar, he added, though preliminary analyses suggest that the MiSeq data may be slightly higher quality.
There are some differences in run time, as well, with a PGM run taking a modest five hours or so compared to about a day per run for the MiSeq, though the sample preparation time is typically shorter for MiSeq than for PGM.
Over the course of their white stonecrop project, they also went from using a 100 base pair protocol for PGM to a 200 base protocol that seems to produce not only longer, but also better quality reads, Michael noted.
After doing error correction with the SOLiD Accuracy Enhancement Tool, or SAET, a Life Tech tool originally developed for the SOLiD platform that has more recently been updated for use with PGM data, the team removed adaptor sequences from the reads.
They then assembled the S. album genome using the CLC Bio assembler (IS 3/30/2010), which Michael called "an all-purpose kind of assembler" that offers "the best results across the board in terms of mixing data types in plants."
With contig N50 scores of around 1,400 nucleotides, the genome assembly is still quite fragmented, Michael said, though comparisons with transcriptome data suggest it adequately represents gene-coding regions.
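For readers unfamiliar with the metric, contig N50 is the length of the contig at which, counting from the longest contig down, the running total first reaches half of the assembly's total length. The Python sketch below shows one common way to compute it. The contig lengths are made-up toy values chosen only to illustrate the calculation; they are not data from the S. album assembly.

```python
# Minimal N50 calculation. The contig lengths below are hypothetical toy
# values for illustration, not data from the S. album assembly.

def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches total")


toy_contigs = [2000, 1400, 1400, 900, 600, 300]  # hypothetical lengths
print(n50(toy_contigs))
```

A low N50 relative to genome size, as Michael described, means that much of the assembly sits in short contigs, even if the total assembled length is close to the expected genome size.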
"We're right around the right genome size," he explained, "but we're really just assembling the genic region of the genome."
For the gene-calling and annotation steps of the study, researchers relied on the program SNAP to predict gene models before running a BLAST search for these predicted genes against the genome of the model plant Arabidopsis thaliana.
"It's sort of a quick and dirty annotation method to get an idea of what gene families there are," Michael said. "Then, when we did the expression analysis, I could see which of the genes were up- and down-regulated."
For these expression studies, researchers took cuttings from the original S. album plant, rooted them, and grew them under well-watered or drought conditions.
After at least 10 days of growth under each condition — the timeframe previously reported to induce C3-to-CAM switching — they then used the Ion Torrent instrument to sequence transcripts from each, tracking down nearly 1,200 genes that were differentially expressed between watered and drought-grown S. album plants.
Analyses of the transcript data are ongoing. But, as Michael reported at PAG, the search has unearthed at least one gene of interest: a gene known as FRIGIDA in Arabidopsis that controls flowering time and is more highly expressed when those plants endure cold stress.
Likewise, expression levels of the gene jumped in S. album during drought stress. While FRIGIDA showed little to no expression in the well-watered plants, it was one of the most highly expressed genes in S. album plants facing drought.
Platform Comparison
The transcriptome-sequencing arm of the white stonecrop project relied exclusively on the Ion Torrent platform, allowing for rapid comparisons of gene expression during drought adaptation.
The genome, on the other hand, has been sequenced with several different instruments at Monsanto — work that is expected to inform the sequencing strategies the center uses to tackle plant genomes in the future.
While he did not comment on how many sequencing instruments Monsanto currently has in house, Michael noted that the S. album genome has been sequenced at the genome center using the Ion Torrent PGM, Illumina's HiSeq 2000 and MiSeq platforms, the SOLiD 5500, and Roche 454 and PacBio instruments.
"It is a bit early to say how the other platforms performed," Michael said. In addition to presenting initial data at AGBT later this month, the team is also aiming to publish results from the study sometime this spring.
Michael said that researchers at Monsanto most frequently use the Illumina MiSeq and HiSeq instruments for genome sequencing. They also rely on hybrid assembly approaches that incorporate data from both short- and long-read platforms.
In the case of white stonecrop, for instance, the team expects that adding in other read types, such as SOLiD reads or PacBio reads that have been "polished" with Illumina sequence data, will produce a more contiguous, less fragmented assembly.
"We have all the other data types," Michael said. "We haven't put them all together yet, but that's the plan."