Double-Barcoded Single-Cell Library Prep Solves Index-Hopping Issues, Lowers Sequencing Costs


NEW YORK – Firms making droplet-based sample preparation instruments for single-cell transcriptomics have adopted an important new library prep technology that could halve the sequencing costs per sample.

These so-called "dual-indexed libraries" use a double barcode scheme that enables multiplexed sequencing on Illumina's highest-throughput platforms: NovaSeq and HiSeq. The libraries not only solve problems associated with the well-documented issue of index hopping inherent to those instruments, but also allow more multiplexing of samples and appear to improve base calling accuracy by adding to the diversity of sequences.

Both 10x Genomics and 1Cellbio recently introduced products with this technology. 10x Genomics announced its dual-indexed Single-Cell Gene Expression kit in February at the Advances in Genome Biology and Technology conference and released it earlier this month. Its new Single-Cell Immune Profiling kit, also released this month, will soon be dual-index compatible, the firm said in an email. The kits are priced the same as single-index kits and "cost savings can be as much as 70 percent on sequencing per cell, compared to smaller benchtop sequencing platforms and lower capacity flow cells," 10x said.

1Cellbio, which is commercializing the inDrop method, announced its TruDrops dual-indexed libraries earlier this month, following a publication about the solution in collaboration with Ken Lau's lab at Vanderbilt University and RootPath Genomics, a 1Cellbio customer. The firm's PrimeRead kits will also be dual-indexable; however, the current business environment has pushed the launch of both products to the second half of 2020, 1Cellbio said.

The paper, published July 2 in BMC Genomics, also suggested that the new library design improves base calling by introducing sequence diversity. A cost breakdown in the supplementary materials suggests a 53 percent decrease in sequencing costs per sample when taking advantage of a maxed-out NovaSeq run.

Assuming a read depth of 100 million reads per sample, sequencing 96 samples on a NovaSeq S4 flow cell costs approximately $360 per sample, the authors said, compared to about $765 per sample when sequencing four samples per high-throughput flow cell on an Illumina NextSeq instrument. Even with the NovaSeq S2 flow cell, which holds 37 samples per run and uses only two lanes, costs are approximately $530 per sample, about 30 percent less than on the NextSeq.
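The percentages follow directly from the per-sample figures quoted above. The short Python sketch below only reproduces that arithmetic; the dictionary keys and the `percent_savings` helper are illustrative names, not taken from the paper's supplementary materials.

```python
# Per-sample sequencing costs quoted in the article (USD, ~100 million reads per sample).
# The labels are illustrative; only the dollar figures come from the text.
COST_PER_SAMPLE = {
    "NextSeq (4 samples per flow cell)": 765,
    "NovaSeq S2 (37 samples per run)": 530,
    "NovaSeq S4 (96 samples per run)": 360,
}


def percent_savings(baseline: float, cost: float) -> float:
    """Return how much cheaper `cost` is than `baseline`, as a percentage."""
    return 100.0 * (baseline - cost) / baseline


baseline = COST_PER_SAMPLE["NextSeq (4 samples per flow cell)"]
for label, cost in COST_PER_SAMPLE.items():
    if cost != baseline:
        print(f"{label}: {percent_savings(baseline, cost):.0f} percent cheaper per sample")
```

Run as written, this gives roughly 31 percent for the S2 flow cell and 53 percent for the S4 flow cell, in line with the figures the authors reported.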

For these reasons, Lau said, he saw little downside to using dual-indexed libraries, though he acknowledged there are upfront costs of making primers and devising library schemes. Data generated from dual-indexed libraries may be incompatible with previous results, he said, but noted that was more of an issue of experimental design.

Illumina's highest-throughput sequencers were previously incompatible with droplet-based, single-indexed, single-cell libraries because of index hopping, which leads to the misassignment of reads to the wrong sample. "A lot of researchers are not aware of this [index hopping] problem," Lau said. "They just use the data 'as is' for the most part. I think these people should be aware of this problem. If you're aware of the problem, you can prepare for it and mitigate it."

Though many labs may not even notice the problem, single-cell genomics can magnify the adverse effects, a recent paper published in June in Nature Communications suggested. "Just a few reads hopping can generate a large proportion of phantom molecules, especially in low-complexity samples," said Rick Farouni, a postdoc at Canada's McGill University Genome Center and the first author of the paper. He noted that "low-complexity samples would have a relatively higher ratio of reads to molecules, since in high-complexity samples the same number of sequencing reads would then need to be distributed over a larger number of molecules."

The general problem, while easily overlooked, is well-documented for Illumina's HiSeq and NovaSeq instruments, which use the ExAmp chemistry that causes it. In April 2017, researchers at Stanford University reported in a bioRxiv preprint that between 5 and 10 percent of reads from multiplexed samples were assigned the wrong barcodes. Soon after, Illumina published a white paper that pinned the problem on an increased concentration of free-floating barcodes that attach to complementary DNA fragments. Bead- or gel-based cleanup can help remove these adapters, and Illumina has advised using dual indices for multiplexed sequencing libraries for those platforms.

In droplet-based single-cell RNA sequencing, where each read has both cell- and sample-identifying barcodes, swapped indices can have several effects. According to a 2018 paper published in Nature Communications by the lab of John Marioni of the European Molecular Biology Laboratory – European Bioinformatics Institute, cell-specific barcodes could get used twice but assigned to different samples, where index misassignment leads to mixed transcriptome profiles. Alternatively, a cell barcode could be introduced into a sample, creating an artifact.

Farouni's paper described phantom molecules that confound downstream analyses, and even the misclassification of empty droplets as cells. Using 10x-generated libraries, he and his colleagues determined the index-hopping probability to be between 0.003 and 0.009, which "counter-intuitively gives rise to a large fraction of phantom molecules — the fraction of phantom molecules exceeds 8 percent in more than 25 percent of samples and reaches as high as 85 percent in low-complexity samples," they wrote.

These results mean that some researchers will want to reanalyze their samples to make sure their datasets weren't corrupted by index hopping, Farouni said, especially "if you think that your sample is of lower complexity relative to other samples that were sequenced in the same run." But reanalysis requires all the original samples, which may be impossible for some projects.

One way to address the problem is computationally, by purging reads that have barcodes shared across multiplexed samples. But sometimes these algorithms mistake wheat for chaff and end up tossing out valid reads presumed to be artifacts. In their paper, Farouni and his collaborators proposed a statistical framework for modeling index hopping that enables researchers to estimate the index hopping rate, which "allows you to reassign the sample of origin for hopped reads," Farouni said. They also developed a classification procedure to optimize the purging of phantom molecules. "Then you don't end up throwing away a portion of your data," he said.
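To make the "wheat for chaff" trade-off concrete, here is a minimal Python sketch of the naive purging idea: any molecule whose cell barcode, UMI, and gene all appear in more than one multiplexed sample is discarded everywhere. This is not the statistical framework from Farouni's paper; the function name, the input layout, and the toy barcodes are all assumptions made for illustration.

```python
from collections import defaultdict


def purge_shared_molecules(reads_by_sample):
    """Drop every molecule that shows up in more than one sample.

    reads_by_sample maps a sample name to a dict of
    (cell_barcode, umi, gene) -> read count. This layout is hypothetical.
    """
    # Record which samples each molecule key was observed in.
    samples_seen = defaultdict(set)
    for sample, molecules in reads_by_sample.items():
        for key in molecules:
            samples_seen[key].add(sample)

    # Keep only molecules that are unique to a single sample.
    return {
        sample: {
            key: count
            for key, count in molecules.items()
            if len(samples_seen[key]) == 1
        }
        for sample, molecules in reads_by_sample.items()
    }


# Toy data: one molecule has 12 reads in sample_A and a single read in sample_B.
toy = {
    "sample_A": {("ACGT", "UMI1", "GENE1"): 12, ("ACGT", "UMI2", "GENE2"): 5},
    "sample_B": {("ACGT", "UMI1", "GENE1"): 1, ("TTGA", "UMI9", "GENE3"): 7},
}
cleaned = purge_shared_molecules(toy)
```

In the toy data, the shared molecule is removed from both samples, including the 12 reads in `sample_A` that are most plausibly genuine. That loss of valid data is what a model-based approach, which estimates the hopping rate and reassigns reads to a sample of origin, is meant to avoid.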

Lau said that addressing the problem in the sample preparation is important "because you don't know read quality metrics a priori."

Farouni added that dual-indexed libraries were "an advancement, for sure." He said he hopes dual-indexed libraries will "solve the problem once and for all and obviate the need to remedy the problem using computational strategies."

10x's solution combines both sample prep and computational methods. The libraries use only specific combinations of indices, so if index hopping occurs, the firm's Cell Ranger software can identify it and purge the read.
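The general idea of restricting libraries to known index combinations can be sketched in a few lines of Python. This is only an illustration of the principle, not Cell Ranger's implementation; the index sequences, sample names, and the `assign_sample` function are hypothetical.

```python
# Hypothetical table of the (i7, i5) index pairs actually used in a run.
# Each sample gets a unique pair, and no index is reused across pairs.
EXPECTED_PAIRS = {
    ("ATTACTCG", "AGGCTATA"): "sample_1",
    ("TCCGGAGA", "GCCTCTAT"): "sample_2",
}


def assign_sample(i7, i5):
    """Return the sample for a read's index pair, or None if the pair is unexpected.

    A read whose two indices come from different samples (for example after
    index hopping) matches no expected pair and can be discarded.
    """
    return EXPECTED_PAIRS.get((i7, i5))


# A read with a matching pair is assigned; a read with a swapped i5 is not.
print(assign_sample("ATTACTCG", "AGGCTATA"))
print(assign_sample("ATTACTCG", "GCCTCTAT"))
```

With a single index, a hopped read looks identical to a legitimate read from another sample. With two indices used only in fixed pairs, a single hop produces a combination that was never loaded, so it can be recognized and removed.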

Researchers wanting to switch to dual-indexed libraries are likely to see other benefits besides lower costs. Ambiguous base calls for libraries generated with 1Cellbio's instruments disappeared, according to Lauren Quigley, a senior scientist at the firm and a senior author on the BMC Genomics paper. "I was expecting we would see an improvement [in the number of ambiguous calls] but I didn't know it would drop all the way to zero," she said. The firm has also seen an increase in uniquely aligned reads and fewer discarded reads.

Lau said this was due to increased complexity in the barcode regions of the reads. "Sequencers work by having a diversity of sequence compositions," he explained. "If you up the diversity of the regions, you get better-quality base calls" and more data passing the quality-control threshold set by the sequencing instrument.

"We are seeing customers updating to the new release that was launched in early July and are seeing a number of new customers as well," a 10x Genomics spokesperson said in an email. "Core facilities in particular welcome the addition of the dual index kits as it facilitates the combining of libraries." The firm noted it has tested the dual-indexed kits across several Illumina instruments that aren't associated with index hopping, including the MiSeq and NextSeq.

Also, along with the imminent ability to do targeted single-cell transcriptomics, decreasing costs "makes more clinical applications tractable that weren't economically possible before," said 1Cellbio CEO Colin Brenan.