Study Reports Incorrect Assignment of Reads to Samples in Multiplexed Illumina HiSeq 4000 Sequencing

SAN FRANCISCO (GenomeWeb) – A Stanford University team has found that multiplexing samples on Illumina's HiSeq 4000 instrument leads to some reads being assigned to incorrect samples, an issue resulting from index switching.

The team reported its findings earlier this month in a manuscript posted to the bioRxiv preprint server, writing that the problem affected between 5 percent and 10 percent of its sequencing reads.

Rahul Sinha, co-lead author of the paper, said that the team had to scrap all of the experiments it had done on the platform. In the bioRxiv paper, the researchers describe a single-cell RNA-seq experiment they performed to examine gene expression patterns in subpopulations of mouse hematopoietic stem cells. The researchers used the Smart-seq2 protocol and Illumina's Nextera XT library prep, and noted in the paper that there was no PCR step after library prep and before pooling. Libraries were sequenced on either the HiSeq 4000 or the NextSeq 500 at Stanford.

Sinha noted that Geoff Stanley, co-lead author of the paper and a graduate student in Stephen Quake's lab at Stanford, was the first to notice something amiss with the data.

"All 41 subpopulations of our mouse stem cells shared the same index," Sinha said. "We knew it wasn't biological, but didn't know what was causing the artifact."

After discussing possible causes of the artifact with the director of the core lab, the team decided to test whether index switching could be responsible. To test this, Sinha said, the researchers used an ongoing single-cell RNA-seq experiment on mouse fetal heart cells in which more than half of the wells in a 384-well plate were empty. In some of those empty wells, the researchers added the sequencing reagents, including index primers, but no cDNA.

After sequencing, the wells that contained only reagents had reads assigned to them that mapped to the mouse genome. Empty wells that contained neither cDNA nor reagents had very few reads assigned to them, and only about half of those reads mapped to the mouse genome. The team observed the same phenomenon in more than 50 experiments conducted by more than eight laboratories at Stanford. They also found that approximately 5 percent to 7 percent of reads were incorrectly assigned.

To probe the issue further, the researchers added an excess of index primers that had not been used during library preparation. They then split the sample, sequencing one portion on the HiSeq 4000 and the other on the NextSeq 500. After demultiplexing, they found a large number of reads assigned to the free index primers on the HiSeq 4000, but very few on the NextSeq 500.
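The readout of this spike-in test boils down to a simple ratio: the share of demultiplexed reads that land on index sequences that were only ever present as free primers. The Python sketch below illustrates that calculation, assuming per-index read counts are already available (for example, from a demultiplexing report). The function name, index sequences and counts are hypothetical toy values, not data from the study.

```python
def unused_index_fraction(index_counts, used_indexes):
    """Return the fraction of demultiplexed reads assigned to indexes that were
    never used during library preparation (e.g., spiked-in free index primers).

    index_counts: mapping of index sequence -> number of reads assigned to it.
    used_indexes: set of index sequences actually incorporated into libraries.
    """
    total = sum(index_counts.values())
    if total == 0:
        return 0.0
    # Reads on indexes that no library carried can only come from misassignment.
    unused = sum(n for index, n in index_counts.items() if index not in used_indexes)
    return unused / total


# Hypothetical toy counts per index sequence; not data from the study.
counts = {"ACGTACGT": 9_500, "TTGGCCAA": 480, "GGAACCTT": 20}
used = {"ACGTACGT"}  # only this index was incorporated into the libraries
print(f"{unused_index_fraction(counts, used):.1%}")  # 5.0%
```

Running the same calculation separately on each instrument's counts gives the comparison the researchers describe: a high fraction on the HiSeq 4000 and a very low one on the NextSeq 500.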

The authors also observed that the signal spread among cells in the same row or column of the 384-well plate, which they wrote was likely due to the specific conditions under which isothermal amplification is performed.
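The geometry of such a pattern follows from how reads are mapped back to wells. If a plate is indexed combinatorially, with one index identifying the row and the other identifying the column, then a read that picks up a wrong copy of just one of its two indexes is reassigned to another well in the same row or the same column. The Python sketch below assumes that layout (i5 for the 16 rows, i7 for the 24 columns); the article does not say how the Stanford plates were indexed, so the mapping and the function name are hypothetical.

```python
import string

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
COLS = range(1, 25)                 # columns 1..24


def single_swap_destinations(well):
    """Return the wells a read from `well` could be assigned to if exactly one
    of its two indexes is replaced by another index from the same plate.

    Hypothetical layout: the i5 index encodes the row and the i7 index the column.
    """
    row, col = well
    # A wrong i7 (column) index keeps the row but changes the column.
    same_row = {(row, c) for c in COLS if c != col}
    # A wrong i5 (row) index keeps the column but changes the row.
    same_col = {(r, col) for r in ROWS if r != row}
    return same_row | same_col


dests = single_swap_destinations(("C", 5))
print(len(dests))  # 38: 23 other wells in row C plus 15 other wells in column 5
print(all(r == "C" or c == 5 for r, c in dests))  # True
```

Under this assumed layout, no single switched index can move a read to a well that shares neither its row nor its column.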

Sinha also noted that the researchers identified the index switching phenomenon not just in their single-cell RNA-seq experiments, but also in bulk RNA-seq and ATAC-seq experiments.

Sinha said he thinks index switching occurs on the HiSeq 4000 but not on the NextSeq because of the HiSeq 4000's newer patterned flow cells and exclusion amplification (ExAmp) chemistry.

"We don't fully understand what is happening with the ExAmp chemistry," Sinha said, noting that the chemistry is proprietary. But, he said, his team was able to piece together some of the basics of the chemistry by examining Illumina's patents on the method.

The ExAmp procedure is fundamentally different from bridge amplification, the method used on Illumina's older systems, the authors wrote in the bioRxiv paper. In bridge amplification, single-stranded library molecules bind to oligonucleotides immobilized on the flow cell, and any library molecules that do not bind are washed away along with leftover primer. The ExAmp procedure, they wrote, omits these bind-and-wash steps before cluster generation. Instead, the library molecules are mixed with the ExAmp reagents and loaded onto a patterned flow cell, and a rapid isothermal amplification step then generates the clusters.

Sinha's group is not the only one to note the index switching issue. James Hadfield, head of genomics at Cancer Research UK, wrote about the issue last December in a blog post on Enseqlopedia. Like Sinha, Hadfield gleaned information about the ExAmp chemistry from Illumina's patents.

In his post, he wrote that multiplexed sequencing with standard dual-indexed adapters, in which one of the two indexes is shared across samples, is likely to be affected at low levels. His group identified the problem while looking for cancer mutations in an exome sequencing experiment, and he said that the issue, which seemed to occur at a frequency of 0.1 percent to 0.2 percent, "may stop us calling variants below 1 percent" allele frequency.
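To see why a switching rate this low matters for low-frequency variant calling, consider a deliberately simplified worst case: all of the reads switched into a sample come from a single other sample that carries a variant the first sample lacks. The Python sketch below is a back-of-envelope illustration under that assumption, not a description of Hadfield's analysis; the function name is hypothetical.

```python
def spurious_vaf(switch_rate, donor_vaf):
    """Approximate allele fraction that index switching alone can create in a
    sample that does not carry the variant.

    Simplifying worst-case assumptions (illustrative only): a fraction
    `switch_rate` of the reads assigned to this sample were switched in from
    other samples, and all of them come from one sample carrying the variant
    at allele fraction `donor_vaf`.
    """
    return switch_rate * donor_vaf


for rate in (0.001, 0.002):   # the 0.1 to 0.2 percent range Hadfield reported
    for donor in (0.5, 1.0):  # heterozygous and homozygous variant in the donor
        print(f"switch {rate:.1%}, donor VAF {donor:.0%} -> "
              f"spurious VAF {spurious_vaf(rate, donor):.2%}")
```

Under these assumptions, switching alone can produce an apparent variant at up to about 0.2 percent allele frequency, a background that leaves little headroom for confidently calling real variants at a fraction of a percent.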

Rasmus Nielsen, a professor at the Center for Theoretical Evolutionary Genomics at the University of California, Berkeley, said his group has also experienced the issue of "barcode bleeding," as he described it. He said that the problem occurred at a rate similar to that reported by the Stanford group, affecting between 5 percent and 10 percent of the reads. Nielsen said he noticed it in targeted, multiplexed experiments in which his group was sequencing samples from different species. The problem was immediately apparent, he said, because sequence reads that were clearly from one species were assigned to a different species based on the barcode. "I'm glad this paper got to the bottom of it," he said.

Sinha's bioRxiv paper also generated significant discussion on Twitter, and Illumina has since published a white paper on the issue.

The white paper describes the problem and suggests some steps to mitigate the effect. For instance, it recommends using dual-indexed libraries with unique indexes, so that no index is shared between samples. That way, a read that picks up the wrong index carries an index combination that matches no sample, so it can be flagged and excluded from analysis.
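To see why unique dual indexes help, consider how reads are matched to samples during demultiplexing. The Python sketch below is illustrative only: it assumes a simple sample sheet keyed by (i7, i5) index pairs, the index sequences and sample names are hypothetical, and it is not Illumina's demultiplexing software. With combinatorial indexes, a read that picks up one wrong index can still match another sample's pair; with unique dual indexes, the same read matches nothing and falls into an "Undetermined" bin.

```python
from collections import Counter


def demultiplex(index_pairs, sample_sheet):
    """Count reads per sample; index pairs not in the sheet go to 'Undetermined'."""
    counts = Counter()
    for pair in index_pairs:
        counts[sample_sheet.get(pair, "Undetermined")] += 1
    return counts


def has_unique_dual_indexes(sample_sheet):
    """True if no i7 and no i5 sequence is reused across samples."""
    i7s = [i7 for i7, _ in sample_sheet]
    i5s = [i5 for _, i5 in sample_sheet]
    return len(set(i7s)) == len(i7s) and len(set(i5s)) == len(i5s)


# Hypothetical 4-base indexes, given as (i7, i5) pairs.
combinatorial = {("AAAA", "CCCC"): "S1", ("AAAA", "TTTT"): "S2",
                 ("GGGG", "CCCC"): "S3", ("GGGG", "TTTT"): "S4"}
unique_dual = {("AAAA", "CCCC"): "S1", ("GGGG", "TTTT"): "S2"}

# Three correct S1 reads, plus one S1 read whose i5 switched to TTTT,
# an i5 index that another sample uses.
reads = [("AAAA", "CCCC")] * 3 + [("AAAA", "TTTT")]

print(has_unique_dual_indexes(combinatorial), demultiplex(reads, combinatorial))
print(has_unique_dual_indexes(unique_dual), demultiplex(reads, unique_dual))
```

In the combinatorial case the switched read is silently credited to S2; with unique dual indexes it is set aside. This filtering only catches reads in which one of the two indexes has switched; a read that picked up both indexes of another sample would still be misassigned.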

Tim DeSmet, director of operations and development at the Broad Institute, said in an email that the Broad has used dual-indexed libraries with unique indexes for several years and so has not experienced the index-switching problem. "Our pipeline automatically filters out reads that might have the index-swapping phenomenon described in the paper," he said.

Illumina noted in a statement that while the steps described in the white paper will be effective for the short term, it is also "evaluating long-term corrective actions, and this is one of our highest priorities."

In addition, the company said that it has so far found the problem occurs at a much lower frequency than Sinha's group reported, at less than 2 percent. Gary Schroth, vice president and distinguished scientist of product development at Illumina, said in an email that index switching is not specific to the ExAmp chemistry and that the chemistry itself does not cause the problem. "Index switching has been a known phenomenon since the early days of next-generation sequencing," he said. In the white paper, the company said that index switching occurs at a rate of less than 1 percent on systems that use bridge amplification. However, he noted, the ExAmp chemistry is "more sensitive to some of the variables that lead to index switching."

Regarding the Stanford group's much higher observed rate of index switching, he said that its highly multiplexed, single-cell RNA-seq experiment "brought to bear essentially all of the variables that we now know exacerbate index switching."

The group used a "homebrew set of indexes and adapter," he said, since at the time "well established protocols, kits, and index sets for their application did not exist."

Schroth added that on a few other occasions, customers have documented "greater than expected levels of index switching," and that when those customers brought the problem to Illumina's attention, the firm collaborated with them to fix the issue.

Sinha said that since his team posted its bioRxiv paper, the response from Illumina has been positive. However, he is disappointed that it took so long for the company to acknowledge the issue. Nielsen, too, said that Illumina's slow response was "disappointing." He added that the issue has been known to a number of researchers, who have been discussing it online in blog posts such as Hadfield's. "I'm surprised that it took the BioRxiv paper for Illumina to come out with anything," he said.

Sinha said that he is now updating the bioRxiv paper based on feedback he has received and additional experiments he has run, and that he plans to submit it to a peer-reviewed journal. "We have data that shows [index switching] affects all kinds of libraries," he said. He still thinks the ExAmp chemistry is the cause, he said, because the same libraries run on the NextSeq do not show the effect.

He estimated that approximately $600,000 worth of his experiments have been affected. Fortunately, the group has leftover libraries for many of the experiments, which it plans to run on the NextSeq. In addition, he said that Illumina has reached out to him and his group to help permanently resolve the issue.
