Three vendors of next-generation sequencers provided updates on technical improvements and new applications for their platforms at last week’s Genomes, Medicine, and the Environment Conference in San Diego.
454’s Michael Egholm, vice president of research and development, predicted that with the currently available Genome Sequencer FLX, scientists will soon start sequencing and assembling genomes of eukaryotes like fruit, vegetables, and insects.
What will be helpful in these projects is a new strategy for long, “Sanger-like” paired-end reads of 110 bases on each side, which 454, in collaboration with researchers at Yale University, recently published in Science (see In Sequence 10/2/2007).
Egholm also provided an update on “Project Jim,” 454’s project to sequence Jim Watson’s genome. Though the company and colleagues at Baylor College of Medicine completed the project in May, handing Watson a copy of his DNA sequence on a disk (see In Sequence 6/5/2007), publication of the results in a high-profile journal has been held up by the politics of a split review panel.
While half of the referees said the project was “really great,” the other half characterized it as a “cheap publicity stunt,” according to Egholm.
In the meantime, Lincoln Stein, a researcher at Cold Spring Harbor Laboratory, has created a genome browser to view Watson’s genome, which is available here.
“It’s hard to say anything meaningful about a single human genome,” Egholm admitted. But a new method to select and enrich parts of the human genome, for example all exons, means that “routine human sequencing will be getting feasible” soon, he said (see other feature in this issue).
Researchers at Baylor, working with Nimblegen, another Roche subsidiary, developed the method and published a description in Nature Methods this week. They used 454’s platform for sequencing the enriched DNA.
Improvements to this method, as well as 454’s platform, will allow researchers to scale this approach to “thousands of genomes,” Egholm predicted.
454’s research and development team is currently working on increasing the output, read length, and accuracy of its FLX platform, improvements that the company plans to launch next summer as a system upgrade (see In Sequence 10/9/2007). The scientists are already reaching a high-quality read length of 380 base pairs and are working to “push it out further,” he said.
They are also working on doubling the number of wells per picotiter plate to 3.4 million and hope to fill and recover about three-quarters of these, instead of half, Egholm said.
These improvements would increase the output of the instrument from 400,000 reads per run to almost 2 million, he said. Assuming a read length of 380 base pairs, this would yield about 730 megabases of DNA per run.
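The yield figure follows from multiplying the number of reads by the read length. A minimal sketch of that arithmetic, using only the numbers quoted above; the function name `yield_megabases` is illustrative and not from 454:

```python
def yield_megabases(reads, read_length_bp):
    """Per-run output in megabases: number of reads times bases per read."""
    return reads * read_length_bp / 1e6

READ_LENGTH_BP = 380  # high-quality read length cited by Egholm

# Upper bound if the run produced a full 2 million reads.
upper_bound_mb = yield_megabases(2_000_000, READ_LENGTH_BP)

# Read count implied by the quoted figure of about 730 megabases per run.
implied_reads = 730e6 / READ_LENGTH_BP

print(f"2,000,000 reads x {READ_LENGTH_BP} bp = {upper_bound_mb:.0f} Mb")
print(f"730 Mb / {READ_LENGTH_BP} bp = {implied_reads:,.0f} reads")
```

A full 2 million reads at 380 base pairs would give 760 megabases, so the quoted 730 megabases corresponds to roughly 1.92 million reads, consistent with “almost 2 million.”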
Egholm also said that the company identified crosstalk between and within individual wells (and not, as previously assumed, the fact that the platform’s chemistry is non-terminating) as the “root cause” for homopolymer errors that have been plaguing the 454 system. He said the company has developed “new chemistries” that eliminate this crosstalk and “achieve the goal of an order of magnitude improvement” in errors in homopolymer regions.
According to Kevin McKernan, Applied Biosystems’ senior director for scientific operations for high throughput discovery, the launch of the company’s SOLiD sequencing system this month is going as planned. “Boxes are on airplanes,” he said during a talk at last week’s conference.
In Sequence has learned that one of the first recipients is the J. Craig Venter Institute, which is expecting a unit shortly. JCVI scientists plan to use the instrument to generate another 12X coverage of Venter’s genome (see Short Reads in this issue).
McKernan reported an increase in the commercial platform’s data output, to up to 4 gigabases per run. The company will officially launch the system at the American Society of Human Genetics annual meeting in San Diego later this month.
According to ABI’s latest specification sheet (available here), the first commercial version of the system will generate between 2 and 3 gigabases of single reads per run, and between 3 gigabases and 4 gigabases of mate-pair reads per run.
On an alpha-instrument in house, ABI researchers have even obtained up to 6 gigabases in a single run, using single 35-mer reads, McKernan reported.
He pointed out that the company’s ability to generate mate-paired libraries with a variety of insert sizes (between 600 base pairs and 10 kilobase pairs, according to the spec sheet) makes the SOLiD system especially suitable for detecting structural genome variants, such as insertions, deletions, or copy number variations.
The company has been using mate pairs to analyze structural variants in a HapMap individual, an ongoing project in collaboration with Agencourt Bioscience. Agencourt, working with Evan Eichler at the University of Washington in Seattle, has deposited fosmid reads in GenBank. ABI has also used mate-pair reads in a collaboration with Andy Fire at Stanford University to characterize nucleosomes, according to McKernan.
The company is also collaborating with Shinichi Hashimoto from the University of Tokyo on using SOLiD for serial analysis of gene expression. The aim is to profile transcription start sites that may play a role in colon cancer. Hashimoto and John Edwards at Columbia University (see Transcript in this issue) are scheduled to speak during an ABI workshop on SOLiD at the American Society of Human Genetics annual meeting in San Diego next week.
At present, ABI is working on “instrument-agnostic upgrades” to the platform that will improve its raw accuracy, which is currently 97 percent, and increase the number of “productive beads” in the emulsion PCR process. ABI scientists have also obtained read lengths of up to 65 base pairs, McKernan said.
He said that a higher coverage of GC-rich regions that some scientists have observed was not due to a GC-bias of the system but to the shearing method used to make some fragment libraries. The same bias was not seen in mate-pair protocols, which the company recommends for most applications.
McKernan also alluded to further possible improvements of the company’s 2-base encoding error correction method, which improves the raw base accuracy from 97 percent to 99.94 percent. In the future, 3-base, 4-base, and 5-base encoding schemes could reduce the error rate even further, he said.
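The article does not spell out how 2-base encoding catches errors. As the scheme is commonly described, each color call reports on a pair of adjacent bases, so a true single-base change alters two adjacent colors while an isolated measurement error alters only one. The sketch below illustrates that idea under stated assumptions: the XOR mapping (A=0, C=1, G=2, T=3), the function names, and the example sequences are illustrative and are not taken from ABI’s software.

```python
# Assumed mapping: the color for two adjacent bases is the XOR of their codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_colors(seq):
    """Return one color (0-3) per adjacent base pair in seq."""
    codes = [BASE_CODE[base] for base in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]

def color_differences(ref_colors, read_colors):
    """Indices at which two equal-length color lists disagree."""
    return [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]

reference = "ATGGCA"
variant = "ATGTCA"  # hypothetical single-base change (G to T) at index 3

ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)

# A real single-base change shifts the two colors that overlap the changed base.
snp_positions = color_differences(ref_colors, var_colors)

# An isolated miscall changes exactly one color and leaves its neighbors intact.
miscalled = list(ref_colors)
miscalled[3] = (miscalled[3] + 1) % 4
error_positions = color_differences(ref_colors, miscalled)

print("reference colors:", ref_colors)
print("variant colors:  ", var_colors)
print("variant changes colors at:", snp_positions)
print("miscall changes colors at:", error_positions)
```

On the figures McKernan cited, moving from 97 percent to 99.94 percent accuracy means cutting the error rate from 3 percent to 0.06 percent, a 50-fold reduction.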
It is “very possible” that the system will reach an output of up to 8 gigabases within the next three months, he told In Sequence after his presentation.
According to Geoff Smith, senior director of enzymology at Illumina, the “vast majority” of applications for the company’s Genome Analyzer will move on to paired-end sequencing within the next six months.
In a talk at last week’s conference, he described the company’s paired-end approach, which Illumina plans to commercialize broadly by the end of this year. The system can use libraries with small gap sizes, ranging from 200 base pairs to 400 base pairs, he said, and generate 36-base reads on each end. It also requires only a microgram or less of DNA to start with, he said. In addition, Illumina is working on a paired-end library with a 2-kilobase-pair insert, he added.
In a collaborative project with researchers from the National Reference Center for Mycobacteria at the Research Center Borstel in Germany and the University of Cambridge in England, Illumina researchers have already used the small-gap paired-end reads to sequence and analyze strains of M. tuberculosis, including clinical isolates from an outbreak of multi-drug-resistant tuberculosis in Uzbekistan. They were able to detect strain-specific variations, Smith said.
In a collaboration with the Wellcome Trust Sanger Institute and the National Human Genome Research Institute, they have also used paired reads to sequence purified X chromosomes and identify SNPs and structural variants.
But the company also sees interest in its platform for human resequencing. For example, a consortium of researchers in China has used a Genome Analyzer to sequence a genome of an Asian individual, which was a first (see Short Reads in this issue).
Also this week, researchers led by George Church published in Nature Methods a new method to selectively enrich human exons and sequence them on Illumina’s Genome Analyzer (see related article in this issue). Yuan Gao, an assistant professor at Virginia Commonwealth University, who contributed the sequencing, presented results from the project during Illumina’s workshop last week. He has owned a Genome Analyzer since this spring (see In Sequence 3/8/2007).
In addition to genome sequencing, Illumina sees strong applications of its system in transcriptome analysis. For example, small RNA analysis is a “very popular application,” according to Gary Schroth, Illumina’s senior director for gene expression applications R&D, who presented during a company workshop at the conference.
Illumina, working with Dave Bartel’s group at the Massachusetts Institute of Technology, has sequenced miRNA libraries from various developmental stages of C. elegans.
The company is also working on shotgun sequencing of full-length cDNA, a new application, Schroth said. This could provide new information on splice sites, for example.
At present, Illumina’s system generates more than 1 gigabase of data with single reads, and more than 2 gigabases with paired reads, but the company plans to increase this number to 4 gigabases in the “near term,” Smith said.