Last month Illumina announced two new sequencing platforms, the HiSeq X Ten and the NextSeq 500. Last week, at the Advances in Genome Biology and Technology (AGBT) meeting in Marco Island, the company and one of its early-access customers presented performance data for the new systems. Illumina also described several developments it has planned for the platforms.
The NextSeq 500 is a compact sequencing system with redesigned optics and fluidics and a new sequencing chemistry that allows two-channel readout, in which only two images, rather than four, are taken per cycle. It uses two types of flow cells: one produces up to 40 gigabases of sequence data, and the other produces up to 120 gigabases from 400 million reads, with read lengths of up to 2x150 base pairs. Run times are about 30 hours.
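The flow cell yields follow from simple arithmetic: total bases equal the number of clusters times the bases read per cluster. The sketch below assumes that the 400 million reads correspond to clusters that are each sequenced from both ends at 150 bases; the function name is hypothetical.

```python
def run_yield_gigabases(clusters: int, read_length: int, paired: bool = True) -> float:
    """Return the sequence yield of one run in gigabases.

    Assumes every cluster is read to full length on one end, or on both
    ends when paired is True.
    """
    reads_per_cluster = 2 if paired else 1
    total_bases = clusters * reads_per_cluster * read_length
    return total_bases / 1e9


# 400 million clusters read as 2x150 base pairs
print(run_yield_gigabases(400_000_000, 150))  # 120.0
```

Shorter reads or single-end runs scale the yield down proportionally, which is why the quoted maximums apply to 2x150 base pair runs.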
At launch, at least 75 percent of the data from 2x150 base pair runs will have a quality score of Q30 or higher, according to Vince Smith, senior director of scientific research and consumables product development at Illumina, who spoke at a user meeting that preceded the AGBT conference.
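Q30 refers to the Phred quality scale, on which a score Q corresponds to an estimated error probability of 10^(-Q/10), so Q30 means roughly one wrong base call in 1,000. A minimal sketch of how a "percent of data at Q30 or higher" figure can be computed, assuming per-base quality scores are already available as integers; the scores below are made up.

```python
def error_probability(q: int) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities: list[int], threshold: int = 30) -> float:
    """Fraction of base calls whose quality score is at least the threshold."""
    if not qualities:
        return 0.0
    passing = sum(1 for q in qualities if q >= threshold)
    return passing / len(qualities)


# Hypothetical quality scores for eight base calls
scores = [38, 35, 30, 29, 12, 34, 37, 30]
print(error_probability(30))         # about 0.001
print(fraction_at_or_above(scores))  # 0.75
```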
The NextSeq's new chemistry still uses four different dyes, he explained, but rather than four colors it uses two shades of red and two shades of green. T is labeled with a green dye and C with a red dye; half of the A nucleotides carry a different red dye and the other half a different green dye; and G is unlabeled. The optical system records one red image and one green image per cycle.
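Under this labeling scheme, two images are enough to tell the four bases apart: a cluster that lights up in both images is A, red only is C, green only is T, and no signal is G. The sketch below illustrates that decision logic with one fixed intensity threshold, which is a simplifying assumption; Illumina's actual base caller was not described and is certainly more sophisticated. The function name, threshold, and intensities are hypothetical.

```python
def call_base(red: float, green: float, threshold: float = 0.25) -> str:
    """Call one base from normalized red and green channel intensities.

    Assumes intensities are scaled so a fully labeled cluster reads near 1.0
    in its channel. A, whose label is split between a red and a green dye,
    is expected at roughly half strength in each channel.
    """
    in_red = red >= threshold
    in_green = green >= threshold
    if in_red and in_green:
        return "A"  # mix of red- and green-labeled A nucleotides
    if in_red:
        return "C"  # red image only
    if in_green:
        return "T"  # green image only
    return "G"      # G carries no dye, so neither image shows signal


# Hypothetical (red, green) intensities for four consecutive cycles
cycles = [(0.9, 0.1), (0.45, 0.5), (0.05, 0.02), (0.1, 0.95)]
print("".join(call_base(r, g) for r, g in cycles))  # CAGT
```

One consequence of this design is that a dim A cluster can drift toward the single-channel calls, which is consistent with the stated goal of brightening the A dyes.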
Smith said that company researchers tested different combinations of dyes and nucleotides, and the combination they chose performed best in terms of data quality, sequencing speed, and ease of manufacturing. In addition to the new dyes, the company developed a polymerase optimized to incorporate the dye-labeled nucleotides.
Clusters are "quite a bit" larger and brighter than on the HiSeq or MiSeq, he said, and for the first time sequencing and image acquisition take place isothermally, meaning no temperature changes are required.
Apart from the two-channel readout, NextSeq uses "fundamentally the same chemistry" as the HiSeq and MiSeq, he added.
Illumina has tested the NextSeq's performance for several applications. In about 60 test runs sequencing a bacteriophage genome with 2x150 base pair reads, more than 75 percent of the data exceeded Q30, meeting the anticipated launch specification.
Illumina also tested the NextSeq for human genome sequencing, comparing the results with data from the HiSeq 2500. Smith said that the NextSeq is "very much in line" with the HiSeq in terms of sensitivity and precision for calling SNPs.
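Sensitivity and precision are usually measured by comparing a call set against a trusted truth set: sensitivity is the fraction of true variants that were called, and precision is the fraction of calls that are true. A minimal sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples; the calls are invented and this is not necessarily how Illumina scored its comparison.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def sensitivity_precision(called: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Return (sensitivity, precision) of a call set measured against a truth set."""
    true_positives = len(called & truth)   # calls that match the truth set
    false_negatives = len(truth - called)  # true variants that were missed
    false_positives = len(called - truth)  # calls absent from the truth set
    found = true_positives + false_negatives
    made = true_positives + false_positives
    sensitivity = true_positives / found if found else 0.0
    precision = true_positives / made if made else 0.0
    return sensitivity, precision


truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
         ("chr2", 500, "G", "A"), ("chr2", 900, "T", "C")}
called = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
          ("chr2", 500, "G", "A"), ("chr3", 42, "A", "C")}
print(sensitivity_precision(called, truth))  # (0.75, 0.75)
```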
In terms of genome coverage, the two platforms are "very much equivalent" in many regions, including cancer genes, exomes, and regions of high and low GC content, he said, and in some areas the NextSeq is "slightly better" than the HiSeq.
Illumina also compared exome sequencing on the NextSeq and HiSeq and found the results to be "very similar." RNA-seq data on the NextSeq correlated closely with both HiSeq and MiSeq data, and coverage between the three platforms is "almost indistinguishable" for this application, he said.
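Cross-platform RNA-seq agreement is commonly summarized as a correlation of per-gene expression values. The sketch below computes the Pearson correlation of log-transformed read counts, assuming the two count lists refer to the same genes in the same order; the counts are invented, and this is not necessarily the metric Illumina reported.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log_expression_correlation(counts_a: list[int], counts_b: list[int]) -> float:
    """Correlate log2(count + 1) values so highly expressed genes do not dominate."""
    log_a = [math.log2(c + 1) for c in counts_a]
    log_b = [math.log2(c + 1) for c in counts_b]
    return pearson(log_a, log_b)


# Hypothetical read counts for the same five genes on two instruments
nextseq_counts = [0, 15, 120, 1023, 40]
hiseq_counts = [1, 14, 130, 990, 37]
print(round(log_expression_correlation(nextseq_counts, hiseq_counts), 3))
```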
The Broad Institute currently has two NextSeq 500 systems installed. During an Illumina-sponsored workshop at AGBT, Sheila Fisher, director of operations and development of the genomics platform at the Broad Institute, said that the system has cost advantages over the HiSeq 2500 for small numbers of samples, and that the institute is considering it for use in its clinical laboratory.
To assess the NextSeq's data quality, Broad researchers sequenced a well-characterized human sample for which the institute already had HiSeq 2500 data. The run took about 22 hours and generated 115 gigabases. Seventy-eight percent of the data had a quality score of Q30 or higher, and the overall error rate was 0.8 percent.
About 96 percent of the genome was covered well, and the GC bias was "exactly the same" as for the HiSeq 2500. Coverage across the genome was also "very similar" between the two instruments.
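GC bias is typically assessed by splitting the genome into windows, computing each window's GC content, and comparing the mean coverage in each GC bin with the genome-wide mean; an unbiased run gives a normalized value near 1.0 in every bin. A simplified sketch, assuming per-window sequences and read depths are already in hand; the window contents, depths, and 10 percent bin width are hypothetical.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A, C, G, T bases (N and other codes ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(1 for b in acgt if b in "GC") / len(acgt)


def normalized_coverage_by_gc(windows: list[tuple[str, float]],
                              bin_percent: int = 10) -> dict[int, float]:
    """Map each GC bin (lower edge, in percent) to mean depth / overall mean depth."""
    overall_mean = sum(depth for _, depth in windows) / len(windows)
    depths_by_bin: dict[int, list[float]] = defaultdict(list)
    for seq, depth in windows:
        gc_percent = round(gc_fraction(seq) * 100)
        gc_bin = (gc_percent // bin_percent) * bin_percent
        depths_by_bin[gc_bin].append(depth)
    return {gc_bin: (sum(d) / len(d)) / overall_mean
            for gc_bin, d in sorted(depths_by_bin.items())}


# Hypothetical 10-base windows with their mean read depth
windows = [("ATATATATAT", 20.0), ("ATGCATGCAT", 30.0),
           ("GCGCATGCAT", 30.0), ("GCGCGCGCGC", 40.0)]
print(normalized_coverage_by_gc(windows))
```

Comparing these per-bin ratios for two instruments on the same sample is one way to check a claim such as identical GC bias.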
In another experiment, Broad researchers tested the NextSeq 500 for exome sequencing, multiplexing 12 samples per run, including cell lines and tumor/normal pairs. The samples were sequenced in two 18-hour runs with 2x76 base pair reads. About 84 percent of bases had a quality score of Q30 or higher, and the error rate was 0.6 percent.
The SNP and indel calls "essentially matched" those from the HiSeq 2500 data. "We're very excited to see this data right out of the box," Fisher said.
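Multiplexing works by attaching a short, sample-specific index sequence to each library so that pooled reads can be sorted back to their samples after the run. A minimal demultiplexing sketch, assuming 8-base indexes and tolerating one mismatch; the index sequences and sample names are hypothetical, and production tools handle many more cases, such as dual indexes.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_sample(index_read: str, sample_indexes: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the one sample whose index is within max_mismatches, or None.

    Reads that match no index, or match more than one, are left unassigned.
    """
    hits = [name for name, index in sample_indexes.items()
            if hamming(index_read, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-base indexes for three of the multiplexed libraries
samples = {"tumor_01": "ACGTACGT", "normal_01": "TGCATGCA", "cellline_01": "GATCGATC"}
print(assign_sample("ACGTACGA", samples))  # tumor_01 (one mismatch)
print(assign_sample("AAAAAAAA", samples))  # None (no index close enough)
```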
Smith said that the NextSeq system has been "designed with scalability in mind," implying that there might be higher-throughput versions in the future.
Illumina is "already working on the next generation of chemistry" for the platform, he said, screening new dyes and nucleotides to improve the data quality. One goal, he said, is to improve the intensity of the red and green dyes on the A nucleotide.
HiSeq X Ten
The HiSeq X Ten, which Illumina sells in sets of 10 and has not yet installed at a customer site, is optimized for human whole-genome sequencing, at an all-in cost of about $1,000 per genome.
It uses ordered arrays, a chemistry that is four times faster, and a camera that scans six times faster than those of current HiSeq instruments. It also has a more powerful computer to manage the increased data output.
Each HiSeq X instrument produces up to 1.8 terabases of data per run with 2x150 base pair reads, and a run takes less than three days.
The platform uses patterned flow cells with nanowells on both the top and bottom surfaces that define the locations of the DNA clusters. A new cluster amplification chemistry prevents more than one DNA fragment from being amplified in each well, which increases the fraction of clusters suitable for sequencing, Johanna Whitacre, associate director of consumables product development at Illumina, explained during the user meeting.
She said the patterned flow cells are more robust to variation in DNA input than the flow cells of other Illumina platforms, an "improvement customers will really like."
Illumina tested the performance of the HiSeq X Ten by running the same sequencing library on the HiSeq X and on the HiSeq 2500 in both high-output and rapid modes. The HiSeq X had more than twice as many high-quality clusters as the HiSeq 2500, and its error rates were "very much on par" with those of the HiSeq 2000, she said.
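One way to see why such an exclusion chemistry matters: if fragments landed in wells independently at random, the number per well would follow a Poisson distribution, and the fraction of wells seeded by exactly one fragment, λ·exp(-λ) for a mean loading of λ fragments per well, could never exceed 1/e, about 37 percent. The sketch below illustrates that limit; the Poisson model is an idealized assumption used for illustration, not Illumina's description of its chemistry.

```python
import math


def fraction_single_occupancy(mean_fragments_per_well: float) -> float:
    """Fraction of wells seeded by exactly one fragment under Poisson loading."""
    lam = mean_fragments_per_well
    return lam * math.exp(-lam)  # Poisson P(k = 1) = lambda * e^(-lambda)


# Scan mean loadings from 0.01 to 4.00 fragments per well
best_fraction, best_loading = max(
    (fraction_single_occupancy(step / 100), step / 100) for step in range(1, 401)
)
print(f"best single-fragment fraction {best_fraction:.3f} at mean loading {best_loading:.2f}")
# best single-fragment fraction 0.368 at mean loading 1.00
```

Under random loading, raising the density only trades empty wells for wells with mixed fragments; a chemistry that lets only one fragment take over each well removes that ceiling.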
Illumina also compared HiSeq X runs from four different instruments. Each run generated about 2 terabases of data, with more than 80 percent at a quality score of Q30 or higher. Whitacre said the company is currently developing new Q-tables, which calibrate the quality scores, for patterned flow cells because the existing tables underestimate the Q scores, so the reported scores are expected to rise.
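Whether reported Q scores are underestimated can be checked empirically: group bases by their reported score, count how often they disagree with a known reference, and convert that observed error rate back to the Phred scale. If the empirical score is higher than the reported one, the reported scores are conservative. A minimal sketch with invented counts; this is a generic calibration check, not Illumina's Q-table procedure.

```python
import math


def empirical_q(errors: int, total: int) -> float:
    """Phred-scaled quality actually observed for a group of bases.

    Uses add-one smoothing so a group with zero errors does not give log(0).
    """
    error_rate = (errors + 1) / (total + 2)
    return -10 * math.log10(error_rate)


# Hypothetical counts: reported Q -> (bases disagreeing with the reference, total bases)
observed = {20: (500, 100_000), 30: (40, 100_000), 35: (9, 100_000)}
for reported_q, (errors, total) in observed.items():
    actual_q = empirical_q(errors, total)
    status = "underestimated" if actual_q > reported_q else "not underestimated"
    print(f"reported Q{reported_q}: empirical Q{actual_q:.1f} ({status})")
```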
Each of the 64 lanes on the eight flow cells generated 124 gigabases of data on average, which is equivalent to 35x genome coverage. At least 75 percent of the data in each lane had Q30 quality scores.
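Per-lane yield converts to coverage by dividing by the size of the human genome. The sketch assumes a haploid genome size of about 3.1 gigabases, an approximation not given in the talk, which yields a raw figure of roughly 40x per lane; the article does not say how the 35x figure was derived, so the difference may reflect reads filtered out before analysis or a different genome size. It also assumes each of the four instruments ran two of the eight flow cells, and checks that 16 lanes at 124 gigabases each match the roughly 2 terabases per run.

```python
HUMAN_GENOME_GB = 3.1  # approximate haploid human genome size (assumption)


def raw_coverage(yield_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Mean fold coverage if every sequenced base mapped uniquely to the genome."""
    return yield_gb / genome_gb


lanes_per_flow_cell = 64 // 8   # 64 lanes across 8 flow cells
flow_cells_per_run = 8 // 4     # 8 flow cells across 4 instruments (assumption)
per_run_tb = 124 * lanes_per_flow_cell * flow_cells_per_run / 1000

print(f"raw coverage per lane: {raw_coverage(124):.1f}x")  # 40.0x
print(f"yield per instrument run: {per_run_tb:.2f} Tb")     # 1.98 Tb
```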
Overall, the HiSeq X Ten has "equivalent" genome coverage, uniformity, and SNP calling to the HiSeq 2000 or 2500, she said. Genes and regions of high GC content were covered by the HiSeq X "as well or equivalent" to the HiSeq 2000, though the HiSeq X had a "small shortfall" in coverage of extremely AT-rich regions. "The development team is actively working to improve this coverage right now," she said.
The Broad Institute also tested the HiSeq X Ten by shipping a well-characterized human test sample to Illumina for sequencing. After receiving the data, Broad researchers analyzed it using the institute's analytical pipeline.
The first run, conducted with 2x150 base pair reads, produced more than a terabase of data from a single flow cell in less than three days, with about 138 gigabases per lane. About 85 percent of the data had a quality score of Q30 or higher, and the error rate was 0.5 percent.
Broad researchers also compared the data for one genome with data they had previously generated on the HiSeq 2500. Approximately 96 percent of the genome was covered well, at a mean coverage of 30x. SNP and indel calls, including their sensitivity and precision, were "essentially the same" as for the HiSeq 2500, Fisher said.
The HiSeq X Ten has "slightly better" GC coverage and slightly worse coverage of very AT-rich regions, she said, adding that the Broad is "not particularly concerned" about those differences.
Notably, the data quality of the HiSeq X Ten at 2x150 base pairs is as good as that of the HiSeq 2500 at 2x100 base pairs, she said.
According to Illumina's Whitacre, the HiSeq X Ten is "extremely scalable," allowing the company to decrease the size and spacing of the nanowells in the future.
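The scalability argument comes down to geometry: in a regular array, the number of wells per unit area grows with the inverse square of the center-to-center spacing (pitch), so halving the pitch roughly quadruples the potential number of clusters. The sketch assumes a hexagonal well layout and uses hypothetical pitch values; neither the layout nor any dimensions come from the talk.

```python
import math


def wells_per_mm2(pitch_um: float) -> float:
    """Well density for a hexagonal array with the given center-to-center pitch."""
    cell_area_um2 = (math.sqrt(3) / 2) * pitch_um ** 2  # area occupied by one well
    return 1e6 / cell_area_um2  # 1 mm^2 = 1e6 square micrometres


for pitch in (1.0, 0.5):  # hypothetical pitches in micrometres
    print(f"pitch {pitch} um: {wells_per_mm2(pitch):,.0f} wells per mm^2")
```

Smaller wells matter too: each well must stay small enough to hold a single cluster as the spacing shrinks.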
The platform currently supports only TruSeq Nano library prep, but Illumina will enable the TruSeq PCR-free library prep in the future. The company has already tested it and achieved greater than 30x coverage per genome per lane, with all difficult regions of the genome, including extremely AT-rich regions, covered well.
Illumina has "no plans" to introduce patterned flow cells for its current sequencing platforms this year, she said, but continues to assess this option for the future.