At the Advances in Genome Biology and Technology conference in Marco Island, Fla., last week, Illumina highlighted new applications for its HiSeq and MiSeq sequencers that are enabled by technical improvements and provided some insights into the future development of the two platforms.
The company also featured the Moleculo long-read technology that it acquired last month (IS 1/8/2013) and previewed several sample-prep kits that it has in development.
A year ago, Illumina introduced the HiSeq 2500 and presented data at AGBT showing that the instrument's rapid run mode could be used to sequence and analyze the genomes of newborns with life-threatening diseases of unknown origin within 50 hours (CSN 2/22/2012).
At this year's meeting, Illumina provided proof of principle that this timeline can be further reduced, generating results in less than 24 hours on the existing HiSeq 2500, including sample preparation, sequencing, alignment, and variant calling. "We think this is very significant and going to further drive the application of sequencing into the clinic," said Geoff Smith, Illumina's senior director of scientific research, during a company workshop.
At a conference user meeting, Jason Betley, director of technology development at Illumina's UK site, said his team recently sequenced and analyzed a HapMap sample, CEPH NA12878, on the HiSeq 2500 at greater than 40x coverage in just under 24 hours. Sample prep, which was PCR-free and used 100 nanograms of DNA, required 2.5 hours, followed by an 18.5-hour sequencing run with 2x100 base pair reads, which generated about 140 gigabases of data. The analysis, which used Illumina's iSAAC high-speed aligner, generated BAM files and VCF variant files in just under 2.5 hours.
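The figures Betley quoted can be checked with simple arithmetic. The sketch below adds up the three stages and estimates raw coverage from the reported yield. The genome size of about 3.1 gigabases is an assumption made here, not a number from the talk, and raw coverage ignores duplicates and unaligned reads.

```python
# Stage durations in hours, as quoted for the NA12878 run.
stages = {"sample_prep": 2.5, "sequencing": 18.5, "analysis": 2.5}
total_hours = sum(stages.values())

# Assumption (not from the talk): a haploid human genome of roughly 3.1 gigabases.
GENOME_GB = 3.1
YIELD_GB = 140  # reported sequencing yield in gigabases

# Raw coverage is total sequenced bases divided by genome size.
raw_coverage = YIELD_GB / GENOME_GB

print(f"total turnaround: {total_hours} hours")
print(f"raw coverage: about {raw_coverage:.0f}x")
```

Under these assumptions the stages sum to 23.5 hours and the yield corresponds to roughly 45x raw coverage, in line with the "just under 24 hours" and "greater than 40x" figures.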
As an example of an application for the 2500's rapid-run mode in non-invasive prenatal testing, Betley mentioned a collaboration between Illumina and the Great Ormond Street Hospital in London. The hospital provided Illumina with 48 samples of circulating DNA from maternal plasma from pregnancies suspected of aneuploidies, which company researchers sequenced within 8 hours using 36 base pair and index reads. Including post-sequencing analysis, the experiment took 9 hours, and the results completely matched both the known outcomes and results generated on the HiSeq 2000.
Illumina also tested its recently launched TruSeq PCR-free sample prep on the 48 samples, starting with DNA from only 5 milliliters of plasma, and generated sufficient library material for 10 HiSeq lanes. Company researchers sequenced the samples in a single HiSeq 2500 rapid run, using 2x100 base pair reads and generating 120 gigabases of data.
Betley did not mention whether Illumina plans to introduce HiSeq rapid-run testing at Verinata, the prenatal testing firm that Illumina recently acquired.
As previously mentioned, Illumina plans to increase the output of the HiSeq by the second half of this year, through improvements in reagents and software, to as much as 300 gigabases in 60 hours using 2x250 base pair reads. This, Betley said, will open additional applications for the platform.
One new area is metagenomics, a field that long-read platforms like the 454 GS FLX have dominated. As an example, he cited a collaboration between Illumina and Radboud University Nijmegen in the Netherlands to study the microbiome in an in vitro model of the proximal large intestine, using 2x250 base pair HiSeq reads. The goal was not only to identify bacterial species but also to identify genes within specific bacteria for functional analyses. He said that starting with read lengths of about 200 bases, the scientists were able to see "a very large number of those functional genes that [we] were not observing before."
Stephan Schuster, a professor at Penn State University and Nanyang Technological University in Singapore, provided another example of the use of long HiSeq reads for metagenomics during the user meeting. Illumina sequenced four libraries from environmental slush samples for him on the HiSeq 2500, using tightly size-controlled inserts of 480 base pairs.
From the paired-end reads, they generated contiguous reads of up to 480 base pairs, with an average read length of 400 to 440 base pairs and a sequencing error of no more than 1 percent at the junction.
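The contiguous reads depend on the two mates of each pair overlapping in the middle of the insert. The sketch below shows that arithmetic, assuming the libraries were run with 2x250 base pair reads, which the talk implied but did not state outright. The function name is hypothetical.

```python
def overlap_bases(read_length: int, insert_size: int) -> int:
    """Bases covered by both mates of a pair; a negative value is a gap between them."""
    return 2 * read_length - insert_size

# Assumed 2x250 reads on the 480 base pair inserts described above.
print(overlap_bases(250, 480))  # 20 bases of overlap, so the mates can be joined

# For comparison, 2x100 reads on the same insert leave a 280-base gap.
print(overlap_bases(100, 480))  # -280
```

With only about 20 bases shared between mates, the junction is the region where merging errors would be expected to concentrate, which is presumably why the error rate was quoted there.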
The majority of the reads could be assigned to the species level, Schuster reported. While moving from 100-base pair to 250-base pair reads did not make much of a difference in mappability, he said, doubling the read length to 480 base pairs improved things "a lot."
He estimated the cost of generating the HiSeq data – about 180 million reads, or 90 gigabases – to be about $5,000. The same data would have required 30 runs on the MiSeq, costing about $45,000, he said, and the same amount of data on the 454 GS FLX+, which generates reads up to 1,000 base pairs in length, would have cost about $774,000.
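Schuster's estimates are easier to compare as cost per gigabase. The sketch below uses only the numbers he quoted. The per-read and per-run figures are derived here and are not values he reported.

```python
DATA_GB = 90          # gigabases in the data set, as quoted
READS_MILLIONS = 180  # reads in the data set, as quoted
MISEQ_RUNS = 30       # MiSeq runs quoted for the same amount of data

# Quoted cost in US dollars to generate the same 90 gigabases on each platform.
quoted_cost = {"HiSeq 2500": 5_000, "MiSeq": 45_000, "454 GS FLX+": 774_000}

cost_per_gb = {platform: cost / DATA_GB for platform, cost in quoted_cost.items()}

# Derived figures, not reported in the talk.
bases_per_read = DATA_GB * 1e9 / (READS_MILLIONS * 1e6)
miseq_gb_per_run = DATA_GB / MISEQ_RUNS

for platform, dollars in cost_per_gb.items():
    print(f"{platform}: about ${dollars:,.0f} per gigabase")
print(f"bases per counted read: {bases_per_read:.0f}")
print(f"implied MiSeq output per run: {miseq_gb_per_run:.0f} gigabases")
```

This works out to roughly $56, $500, and $8,600 per gigabase, respectively. The 500 bases per counted read would be consistent with each "read" being a 2x250 pair, though that interpretation is an inference.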
Illumina intends to increase the read length for the HiSeq even further and has already conducted experiments with 2x400 base pair reads in its R&D laboratory. Betley showed one example of that, where Illumina researchers used a PCR-free library with a median insert size of 650 base pairs, which they loaded onto a single HiSeq rapid run flow cell, generating more than 180 gigabases of data from two lanes, with 70 percent of the bases reaching a quality score of Q30.
Illumina bioinformaticians then generated "mini-contigs" of up to 800 base pairs from the data, which they aligned to the human reference genome, accepting only perfectly aligned reads. Of those that aligned, more than 14 million were longer than 600 base pairs, and 128,000 were between 700 and almost 800 base pairs long.
As mentioned earlier, Illumina also plans to increase the output for the MiSeq later this year to 15 gigabases per run with 2x300 base pair reads and up to 25 million clusters per experiment. The increase in clusters will allow users for the first time to do exome sequencing on the MiSeq, using 2x75 base pairs and taking less than 13 hours. Even longer 2x400 base pair reads for MiSeq that would increase its output to 20 gigabases per run are currently in R&D.
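The quoted MiSeq outputs follow from multiplying cluster count by read length. The sketch below assumes every cluster yields two full-length reads and that the 2x400 configuration would use the same 25 million clusters. Both are simplifications made here, and the function name is hypothetical.

```python
def run_output_gb(clusters: int, read_length: int, paired: bool = True) -> float:
    """Idealized run output in gigabases, assuming every cluster gives full-length reads."""
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_length / 1e9

CLUSTERS = 25_000_000  # upper bound quoted for the upgraded MiSeq

print(run_output_gb(CLUSTERS, 300))  # 15.0, the planned 2x300 output
print(run_output_gb(CLUSTERS, 400))  # 20.0, the 2x400 output in R&D
print(run_output_gb(CLUSTERS, 75))   # 3.75, the 2x75 exome configuration
```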
Smith mentioned during the workshop that Illumina has also been exploring the use of ordered arrays, which would increase the cluster density further and reduce turnaround time. He said that the company has "seen good progress" on this.
Moleculo
Many conference attendees expressed an interest in Illumina's new Moleculo technology, which the firm acquired earlier this year. Moleculo provides accurate long reads up to 10 kilobases in length that are assembled from short Illumina reads.
During the company's workshop, Smith explained that the process starts with about 1 microgram of DNA, which is fragmented into 10-kilobase pieces, to which adaptors are ligated. After the DNA is clonally amplified by long PCR, the amplicons are fragmented again using the Nextera technology, a process during which each amplicon and its fragments receive a barcode. He did not mention whether or how the long fragments are separated prior to this step. After that, the Nextera fragments are pooled and a single library is generated for sequencing on a single HiSeq flow cell lane using 2x100 base pair reads, generating about 30 gigabases of data.
Those data are processed in one of two ways: either Moleculo's algorithm assembles the short reads derived from each 10-kilobase fragment into long reads, or a phasing algorithm uses Moleculo reads in conjunction with standard 30x coverage of a genome to produce a phased genome.
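Illumina did not describe Moleculo's algorithm in detail, but the barcoding step suggests how pooled short reads can be traced back to their source fragment. The following is a minimal sketch of that idea only, not the company's method. The barcode labels, the toy sequences, and the function name are all hypothetical.

```python
from collections import defaultdict

def group_by_barcode(reads):
    """Bucket (barcode, sequence) pairs so each bucket can be assembled on its own."""
    buckets = defaultdict(list)
    for barcode, sequence in reads:
        buckets[barcode].append(sequence)
    return dict(buckets)

# Toy pooled library: reads from two long fragments are interleaved after pooling.
pooled_reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "TACGGA"),
    ("BC02", "ACAGGT"),
]

groups = group_by_barcode(pooled_reads)
for barcode, sequences in groups.items():
    print(barcode, sequences)
```

Each bucket would then be passed to a separate assembly step, so that repeats elsewhere in the genome cannot confuse the reconstruction of any one long fragment.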
Illumina has generated Moleculo libraries for a variety of organisms, including humans, flies, plants, and metagenomic samples, and has found the median length of the assembled reads to be about 8 kilobases. It is still assessing the accuracy of the long reads, Smith said, and plans to provide data on that soon. He said the company has seen a representation bias "away from AT DNA" in some libraries, but no such bias in others, and is still studying these effects.
He showed data for chromosome 12 of the CEPH NA12878 sample, where Illumina researchers were able to anchor an 8-kilobase long read in its correct position despite repeat elements in the chromosome, because the long read contains unique flanking sequences. "Using the Moleculo technology, we will be able to bridge what is effectively a large complex repeat," he said.
To obtain phasing information for the same sample, Illumina combined 120 gigabases of paired-end short-read sequence data with about 30 gigabases of Moleculo reads. The same information could be gained by assembling Moleculo reads only, Smith said, but combining the two data types is "more economical" because it requires lower depth of coverage with Moleculo reads.
As an example of phasing, he showed two Moleculo reads covering 4 kilobases in total, noting that "with those two reads, it's very easy to distinguish the two different haplotypes and to be able to identify the phased heterozygous SNPs." In another example, Moleculo reads allowed the researchers to phase two HLA genes.
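The logic behind that example can be illustrated with a toy case. The sketch below assumes two error-free long reads that come from opposite haplotypes and are already aligned over the same interval. The sequences and the function name are invented for illustration and do not reflect Illumina's phasing algorithm.

```python
def phased_snps(read_a: str, read_b: str) -> dict:
    """Map each position where two aligned reads differ to (allele on a, allele on b)."""
    return {
        position: (allele_a, allele_b)
        for position, (allele_a, allele_b) in enumerate(zip(read_a, read_b))
        if allele_a != allele_b
    }

# Toy reads, one per haplotype, covering the same eight positions.
haplotype_1 = "ACGTTAGC"
haplotype_2 = "ACCTTAGA"

print(phased_snps(haplotype_1, haplotype_2))
# {2: ('G', 'C'), 7: ('C', 'A')}
```

Because each long read spans both variant positions, the alleles G and C are known to sit on one chromosome copy and C and A on the other, which short reads covering only one site at a time could not establish.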
Preview of New Sample Prep Kits
Illumina's previously announced Nextera rapid exome capture kit will be available in April, allowing users to prep up to 96 exomes in parallel within 1.5 days. The kit, which was designed in collaboration with the Broad Institute, will only require 50 nanograms of input DNA and will cover 37 megabases of content. An expanded version will also be available that contains additional functional regions. The new kits will enable users to sequence 110 exomes in 8 days on the HiSeq in standard mode, and 20 exomes in 20 hours on the HiSeq 2500 in rapid-run mode.
Smith said that Illumina has also been exploring internal use of the Nextera technology, which fragments DNA enzymatically, for bacterial sequencing, where it has become "a very standard method." The technology has also simplified mate-pair production. "For the first time, you can do de novo assembly of bacteria into a single scaffold from a single prep, starting with just 1 microgram of DNA material," he said.
Also as previously mentioned, Illumina plans to release a TruSeq targeted RNA expression kit this quarter for the quantitative analysis of specific RNA targets on both HiSeq and MiSeq. The kit will allow users to analyze between 12 and 1,000 targets across tens to hundreds of samples, depending on the number of targets. Panels can be user-designed, but Illumina will also offer fixed-content panels. The cost per sample will be "considerably lower" than equivalent qPCR or TaqMan assays, Betley pointed out, and the entire workflow will take less than 2 days and require only 50 nanograms or less of total RNA as input.