Researchers have started to use next-generation sequencing platforms in a wide variety of projects, such as gene expression studies, chromatin characterization, genome resequencing, and structural variation analyses, according to presentations at last week’s Biology of Genomes meeting at Cold Spring Harbor Laboratory.
“I think you can see from the session, a year makes a lot of a difference,” said Mike Snyder, a professor of molecular, cellular, and developmental biology at Yale University, in his introductory remarks for a meeting session dedicated to high-throughput genomics and genetics.
“From what we were doing a year ago to what’s going to show up in today’s talks is pretty amazing. Many of the talks will actually use the new sequencing technologies, and a lot of them will focus on variation and also mapping regulatory sequences,” he said.
At the time of last year’s meeting, only 454’s platform was in the hands of users, but this year, several researchers presented projects involving Illumina’s platform, and one presented data generated on ABI’s SOLiD system at the session.
The session kicked off with a talk by Elaine Mardis, co-director of the Genome Sequencing Center at Washington University School of Medicine, whose group has used both 454’s and Illumina’s systems in a variety of projects. “I hope this session leaves you with the same excitement that I feel about the power of next-generation sequencing technology,” she said.
For example, researchers in her lab have used 454’s platform to sequence full-length cDNAs from different organisms. This can help not only to discover novel expressed sequences and splice variants, but also to annotate the genomes of organisms that have been sequenced but have no close relatives sequenced yet. One example is platypus, an egg-laying, duck-billed mammal, which the Wash U group is currently studying.
As a test run for resequencing entire human genomes, Mardis’ group has also resequenced C. elegans strains on Illumina’s sequencer, using both fragment reads and paired-end reads.
The researchers have now also used the platform to sequence a sample from a patient with acute myelogenous leukemia at 3.5-fold coverage and validated mutations in that genome that were already known from directed Sanger resequencing.
Finally, the group has used both 454’s and Illumina’s platforms for targeted resequencing of selected regions of the human genome. Mardis said that the results “look basically just the same” with both techniques.
Gabor Marth, in his presentation, focused on new bioinformatic tools to analyze next-generation sequence data. Marth, an assistant professor of biology at Boston College, noted that the increased throughput of the new technologies “is a blessing, but it also poses a number of formidable informatics challenges for a bioinformatician.”
To overcome these challenges, his group has developed several new software tools: a new base-calling program called PyroBayes; a new sequence aligner and assembler called Mosaik that works for a large spectrum of read lengths; a genome viewer called EagleView; and an updated version of the group’s PolyBayes SNP-calling program.
He and his colleagues have tested these tools in a number of projects, for example a SNP discovery project in collaboration with a group at Cornell University and Mardis’ group that used low-coverage 454 data to study 10 different Drosophila strains.
Other projects that used the software tools include a collaboration with Agencourt Bioscience that generated deep 454 coverage of a Pichia stipitis mutant (see In Sequence 3/6/2007), and the C. elegans strain resequencing project of the Mardis group that used deep Illumina coverage to discover SNPs and indels.
The C. elegans project “required a tremendous scale-up of our tools,” Marth remarked. The researchers also had to mask repeat regions of the genome to obtain the “resequenceable” part of the genome to which short reads can be uniquely aligned.
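To illustrate what a “resequenceable” fraction means in practice, the following is a minimal Python sketch, not the Marth group’s method. It assumes a toy genome held in memory as a string, considers the forward strand only, and requires exact matches; the function names `unique_kmer_starts` and `resequenceable_fraction` are hypothetical. A position counts as resequenceable if the read-length sequence starting there occurs exactly once in the genome.

```python
from collections import Counter


def unique_kmer_starts(genome: str, read_len: int) -> list[bool]:
    """For each start position, report whether the read_len-mer beginning
    there occurs exactly once in the genome (forward strand, exact match)."""
    n = len(genome) - read_len + 1
    if n <= 0:
        return []
    # Count every read-length substring of the genome.
    counts = Counter(genome[i:i + read_len] for i in range(n))
    return [counts[genome[i:i + read_len]] == 1 for i in range(n)]


def resequenceable_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions to which a read of read_len maps uniquely."""
    flags = unique_kmer_starts(genome, read_len)
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    toy_genome = "ACGTACGTTTGA"  # contains the repeat ACGT twice
    for k in (4, 8):
        print(k, resequenceable_fraction(toy_genome, k))
```

In this toy example, 4-base reads that fall on the repeated ACGT cannot be placed uniquely, while 8-base reads span the repeat and map uniquely everywhere. A production tool would also have to handle both strands, sequencing errors, and genome-scale memory use.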
Both 454’s and Illumina’s platforms are very accurate for SNP discovery, he said. “The 454 [system] makes very few substitution errors, and Solexa data is great for SNP discovery and indel discovery.”
Several groups used Illumina’s Genome Analyzer to characterize DNA from chromatin immunoprecipitation, or ChIP, experiments.
“I also share Elaine’s excitement that the new sequencing technologies are giving us views into biology that just were not possible as of last year,” said Martin Hirst, a researcher at the British Columbia Cancer Agency Genome Sciences Center in Vancouver.
He and his team studied DNA sequences that are associated with certain histone modifications using ChIP followed by Illumina sequencing, in some cases starting with less than a nanogram of DNA. The researchers profiled six types of methylation in a model cell line.
The scientists also measured transcription in the same cell line using Illumina’s sequencer, and overlaid their data with the ChIP sequence data.
Hirst recommended validating the sequence data from the new platforms with other techniques, such as NimbleGen tiling arrays. “It’s a new data type, and I think it’s very important that we validate these data types,” he said.
Brad Bernstein, a professor of pathology at Harvard Medical School, also mapped the sites of methylated histones by coupling ChIP with sequencing on Illumina’s Genome Analyzer. His results correlated well with array-based readout methods, or ChIP-chip, but sequencing provided higher throughput and was cheaper, according to the abstract for his presentation.
In another study that coupled ChIP with Illumina sequencing, researchers from Rick Myers’ group at Stanford University and Barbara Wold’s lab at Caltech characterized the DNA binding sites of the neuronal repressor NRSF/REST, identifying new binding motifs for the protein. Their work, presented by Myers last week, is slated for publication in Science (see In Sequence 5/1/2007).
In the only presentation of the session that involved Applied Biosystems’ new SOLiD sequencer, Anton Valouev, a researcher in Arend Sidow’s group at the department of pathology at Stanford University School of Medicine, discussed the analysis of nucleosome-DNA binding sites in C. elegans.
The researchers, collaborating with scientists at ABI, analyzed DNA fragments of about 150 base pairs by SOLiD sequencing, producing “multiple millions of reads” to generate a detailed nucleosomal map across the genome, according to the abstract.
Mike Snyder spoke about using 454 sequencing to map structural variations in the human genome, a collaboration between his lab and researchers at 454 Life Sciences. Array-based methods, he said, can only determine deletions or insertions, but not inversions or balanced translocations, which sequencing-based analyses should be able to detect as well.
His work involved a new method for generating paired-end reads from 3-kilobase fragments of genomic DNA on 454’s platform (see In Sequence 3/6/2007). Compared to fosmid paired-end mapping, an approach pursued by a large-scale NHGRI project (see other feature in this issue), Snyder’s approach requires no cloning and has a resolution of 3 kilobases instead of 40 kilobases. But the 454 reads, at 110 bases on either side, are “often just barely long enough for unique mapping,” according to his presentation.
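The general logic of paired-end mapping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used by Snyder’s group: the `ReadPair` record, the `classify_pair` function, the 500-base tolerance, and the convention that a concordant pair maps to opposite strands are all hypothetical. Only the 3-kilobase expected span comes from the talk. A pair whose ends map farther apart on the reference than expected suggests a deletion in the sample, a pair that maps closer together suggests an insertion, and unexpected orientation or chromosome placement points to inversions or translocations.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two ends of one paired-end read (hypothetical record)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, expected_span: int = 3000, tolerance: int = 500) -> str:
    """Label a pair by comparing its mapped span and orientation with expectation.

    Assumes concordant pairs map to the same chromosome on opposite strands,
    roughly expected_span bases apart.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    if pair.strand1 == pair.strand2:
        return "inversion_candidate"
    span = abs(pair.pos2 - pair.pos1)
    if span > expected_span + tolerance:
        # Ends lie farther apart on the reference: sample is missing sequence.
        return "deletion_candidate"
    if span < expected_span - tolerance:
        # Ends lie closer on the reference: sample carries extra sequence.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 1000, "+", "chr1", 4000, "-"),
        ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
        ReadPair("chr1", 1000, "+", "chr1", 4000, "+"),
    ]
    for p in pairs:
        print(classify_pair(p))
```

Real analyses would typically require several independent pairs supporting the same event before calling a variant, and would estimate the expected span and its spread from the data rather than fixing them in advance.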