Life Technologies said last week at the Advances in Genome Biology and Technology conference in Marco Island, Fla., that it is planning to release its Ion Torrent Proton II chip mid-year, along with an upgraded version of its PI chip that will yield 16 gigabases of data with 200 base pair reads.
As it said at its Ion World meeting last year, the company is also planning a PIII chip that will initially deliver 64 gigabases of data with 100 base reads, but will eventually be able to deliver 256 gigabases of data with 400 base reads (IS 9/18/2012). Jonathan Rothberg, founder of Ion Torrent, said at AGBT last week that the company has yet to announce a launch date for that chip.
Aside from development of its Proton system, the company is working to improve the sample prep process, Rothberg said. As previously announced, it is developing an emulsion PCR-free sample prep system called Avalanche.
The company is working on several versions of the Avalanche system, including one in which the library construction could be done directly on the Proton chip. Rothberg said that company researchers have tested the protocol and were able to demonstrate a perfect read of nearly 600 bases.
Andy Felton, Life's senior director of product marketing, clarified that the company is still deciding how it will commercialize Avalanche — specifically whether the process will use a bead configuration or be done simply on the chip itself.
He did not provide an update on when Avalanche will be available for customers.
Rothberg said that more than 12,000 runs of the Proton have been completed so far, both by customers and in the company's own lab.
Customer Data
One such customer, Joe Boland, said during an Ion Torrent workshop at AGBT that his lab at the National Cancer Institute has completed more than 200 runs on its six Protons and has sequenced more than 100 exomes. He said he has also tested transcriptome sequencing on the Proton, and is currently preparing a manuscript comparing exome sequencing on the Proton to exome sequencing on the HiSeq 2000.
Boland said the lab averages around 12 gigabases of data per 3.5-hour run and covers around 80 percent of the exome at 20-fold coverage. This is an improvement over the 8 gigabases the lab was routinely generating as of last November, when Boland presented at the American Society of Human Genetics meeting, two months after initially installing the systems (IS 11/13/2012).
Currently, the lab is able to sequence between four and eight exomes per day, running one exome on a chip. Sequencing runtime is around three and a half hours, with analysis taking an additional eight to 11 hours.
The lab recently completed an exome sequencing study of cervical cancer. From a cohort of 110 women, 24 were chosen for exome sequencing on the Proton and the remainder were screened with a custom AmpliSeq panel for the Ion PGM that included all known genes related to cervical cancer.
The exomes have now been processed, and the team is building a larger AmpliSeq panel that includes the new variants identified from the exome sequencing.
Regarding the comparison with the HiSeq, Boland said that while both platforms "performed well in detecting and calling SNPs," with 99 percent concordance between the two, there were still some differences and both platforms "had issues with indel detection," which will be the subject of a second manuscript.
For exome sequencing on the HiSeq, the NCI lab used Nimblegen's SeqCap v3 for exome capture, and on the Proton, they used Life's TargetSeq v2.
The Proton called around 27,000 SNPs versus around 33,000 on the HiSeq. After removing variants from segmental duplicate regions and simple repeat variants, and filtering for high-quality variants, they were left with 488 Illumina-specific variants and 290 Proton-specific variants. Boland said that most of the Proton-specific variants were misalignments due to homopolymers and false positives that are sometimes created when an adjacent site is a non-reference homozygote.
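As a rough illustration of the kind of cross-platform comparison described here, the following Python sketch splits two sets of SNP calls into shared and platform-specific sites and reports genotype agreement at the shared sites. It is not the NCI lab's pipeline: the function name `compare_calls`, the data layout, and the definition of concordance as genotype agreement at sites called on both platforms are all assumptions, and the calls shown are invented toy data.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]      # (chromosome, 1-based position); hypothetical layout
Calls = Dict[Site, str]     # site -> genotype string such as "A/G"


def compare_calls(proton: Calls, hiseq: Calls) -> dict:
    """Summarize shared and platform-specific SNP sites for two call sets."""
    shared = proton.keys() & hiseq.keys()
    # Assumption: "concordance" here means identical genotypes at shared sites.
    agree = sum(1 for site in shared if proton[site] == hiseq[site])
    concordance: Optional[float] = agree / len(shared) if shared else None
    return {
        "proton_only": sorted(proton.keys() - hiseq.keys()),
        "hiseq_only": sorted(hiseq.keys() - proton.keys()),
        "shared": len(shared),
        "genotype_concordance": concordance,
    }


# Toy, made-up calls used only to exercise the function.
proton_calls = {("chr1", 100): "A/G", ("chr1", 250): "C/C", ("chr2", 40): "T/T"}
hiseq_calls = {("chr1", 100): "A/G", ("chr1", 250): "C/T", ("chr3", 7): "G/A"}

summary = compare_calls(proton_calls, hiseq_calls)
print(summary)
```

In practice, the filtering steps Boland described (removing repeat-associated variants and keeping only high-quality calls) would be applied to both call sets before a comparison of this kind.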
Additionally, he said the lab is looking to perform transcriptome sequencing on its Protons. For the first transcriptome that the lab sequenced on the system, the Proton generated 54 million reads, but ribosomal RNA levels were "through the roof," because the lab used a polyA enrichment step rather than ribosomal RNA depletion. Since then, the group has switched to rRNA depletion.
The lab is now in the process of doing an eight-sample study, for which it generated 300 gigabases of data, and is comparing bioinformatics pipelines from three companies — Seven Bridges Genomics, Station X, and Partek.
Agnes Viale, who runs the genomics core lab at Memorial Sloan Kettering Cancer Center, is taking advantage of the four Proton systems from the New York Genome Center that are currently being housed in her lab, primarily for transcriptome sequencing.
Viale also has one Roche 454 GS FLX; one 454 GS Junior; two of Life's SOLiD 5500 systems, one of which has the Wildfire upgrade; two Illumina HiSeqs, one of which has been upgraded to the 2500; one Illumina MiSeq; and one PGM.
The four Protons were installed in the lab last September, and in December the first transcriptome sequencing study was started, she said during a presentation at AGBT.
Ross Levine, an oncologist and researcher within Memorial's Human Oncology and Pathogenesis Program, was planning a transcriptome sequencing study to examine gene expression in mouse models of acute myelogenous leukemia with FLT3/TET2 mutations. The double mutations had previously been linked to very poor prognosis, and the lab wanted to design a mouse model that it could use to test drugs and gain biological insight.
Levine's group wanted the 12 transcriptomes for the study to be completed quickly, so Viale suggested the newly installed Protons.
Libraries were constructed from 100 nanograms of total RNA. Next came rRNA depletion, followed by adapter ligation, reverse transcription, amplification, and size selection. Then the samples were sequenced.
Viale said the workflow, from sample prep through sequencing, could be completed in three days. If a library was started at 9 am on Monday, the One Touch could be started by 4 pm on Tuesday and run overnight. Bead enrichment could be done Wednesday morning with sequencing starting around noon, and raw data obtained by 4 pm, Viale said.
The 12 samples produced an average of 62 million reads per sample with an average read length of 104 bases, and about 5 percent of reads came from rRNA. Between 85 percent and 90 percent of the reads mapped to the genome and between 68 percent and 75 percent mapped to transcripts. Sixty percent of the data was at a quality score of 20 or above, Viale said.
The RNA-seq data showed that there was a "specific genetic signature" for the FLT3/TET2 mutant AML cases.
When the lab compared RNA-seq on the Proton to the HiSeq, Viale said the results were very similar, with a concordance of 0.92 for expressed genes.
Viale said the team is now testing RNA-seq protocols on formalin-fixed paraffin-embedded samples. The smaller library size of the Proton compared to Illumina might make the system well-suited for such samples, she said. The team tested samples in-house and also sent samples to Life Tech. The two groups had concordant results, with turnaround times of three days.
Viale said the next application she plans to test on the Proton is ChIP-seq, which she said is similar to RNA-seq in that it is a tag-counting application. For such applications, Q scores are less important, she said.
For instance, in the case of a 100 base read, if there are three bases that don't match, the read will still map, so it can still be used to look at expression, she said. But with something like exome sequencing, you want the Q scores to be "as close to perfect as possible."
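To put the Q scores in context: under the standard Phred scale, a quality score Q corresponds to a per-base error probability of 10^(-Q/10), so Q20 means roughly one error in 100 bases. The short Python sketch below, with hypothetical function names and uniform-quality reads chosen purely for illustration, shows the expected number of miscalled bases in a 100-base read at Q20 and at Q30.

```python
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(qualities: Iterable[float]) -> float:
    """Expected number of miscalled bases in a read, given per-base Q scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Illustrative reads with a uniform quality score at every base (an assumption;
# real reads have varying per-base qualities).
read_q20 = [20] * 100
read_q30 = [30] * 100

errors_q20 = expected_errors(read_q20)   # about 1 expected error per read
errors_q30 = expected_errors(read_q30)   # about 0.1 expected errors per read
print(round(errors_q20, 3), round(errors_q30, 3))
```

A read with about one expected error will generally still align, which fits Viale's point about tag-counting applications, whereas variant calling depends on each individual base being right.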
For ChIP-seq on the Proton, Viale said she would start with samples that have been run on the lab's SOLiD and HiSeq systems to validate the protocol on the Proton, but she then wants to look at the epigenetics of the FLT3/TET2 mutant samples and see how the epigenome relates to expression.
The New York Genome Center will spearhead the validation of the Proton for exome sequencing applications, using Agilent's SureSelect for exome enrichment, she said.
Viale said that she would ideally like to have different sequencing instruments dedicated to specific applications. For instance, the lab currently uses its HiSeqs primarily for exome sequencing and then does targeted resequencing on the SOLiD to validate those results. The 454 machines are mainly used for metagenomics, the MiSeq for library quality control, and the PGM to run targeted AmpliSeq gene panels. If the continuing Proton experiments go well, she will likely use that machine for RNA-seq and ChIP-seq projects.