SAN FRANCISCO (GenomeWeb) – As the number of single-cell RNA sequencing technologies has proliferated over the last several years, researchers have sought to understand the advantages and limitations of the various methods. A team from the Broad Institute has now compared seven single-cell RNA sequencing methods in a study described in a preprint on BioRxiv, and hopes the project can serve as a guide for other researchers and as a benchmark against which to measure new techniques.
Joshua Levin, a senior group leader and research scientist at the Broad as well as a senior author of the paper, said that the evaluation would help the Broad team select techniques for single-cell RNA-seq studies, including for projects such as the Human Cell Atlas.
In addition, because the researchers used readily available sample types and developed a bioinformatics pipeline that could work for all the methods, it would be straightforward to benchmark new techniques or improved versions of existing technologies, Levin noted.
"We wanted to understand the strengths and weaknesses of each method," he said. "We also wanted to make the comparison in a different way, looking at not only technical measurements but also the biological information that is captured."
Although there have been previous studies comparing genomic methods, single-cell genomics "is a fast-moving, very dynamic field," Levin said, so he felt that it was important to survey the latest technologies.
For instance, earlier this week, a team from the University of Michigan developed yet another method for single-cell RNA-seq called Hydro-seq. Those researchers focused on collecting circulating tumor cells in a microfluidic device and preparing them for expression analysis.
While Hydro-seq was not included in the Broad team's analysis, it could ultimately be benchmarked and compared in the same way.
In the recent study, the researchers evaluated seven methods, comprising two lower-throughput methods, Smart-seq2 and CEL-Seq2, and five high-throughput methods: 10x Genomics' Chromium single-cell gene expression solution, the microfluidics-based Drop-seq approach, the Seq-Well protocol developed by researchers from the Massachusetts Institute of Technology, the inDrop protocol being commercialized by 1CellBio, and the combinatorial indexing strategy known as sci-RNA-seq that was developed by researchers from the University of Washington.
The researchers analyzed the methods on a cell line mixture, peripheral blood mononuclear cells (PBMCs), and cells from mouse brain tissue. They also tested the ability of four of the methods to analyze RNA from nuclei.
In total, they created 36 different libraries, generating expression data from around 92,000 single cells.
Overall, Levin said, all the methods yielded results. "Each method had its strengths, so I would be cautious about saying that there was one method you should use; that wasn't the message. But, there are some methods that perform better in different ways."
For instance, the lower-throughput methods are generally more sensitive, detecting more genes per cell. The Smart-seq2 method captures reads from across the entire transcript, not just the 3' end, so if researchers are interested in RNA splicing variants, for instance, that method would pick them up, Levin said.
Roser Vento-Tormo, who heads a single-cell genomics lab at the Wellcome Sanger Institute and was not involved in the study, said that "these types of studies are needed because they compare the different methods in an unbiased way."
In addition, she said, because these types of benchmarking studies can be really expensive, smaller labs cannot readily conduct them, so having this type of data in the public domain is extremely valuable.
The study is "useful because you can go to this analysis and choose the best method for your own questions," she said.
The results weren't particularly surprising, Vento-Tormo added, noting that they largely confirmed what was suspected about each of the methods, but "it was one of the first times that there was a systematic comparison."
In the study, the researchers analyzed a number of different metrics, including sensitivity, the distribution of reads, and multiplet rates, or how often more than one cell was captured.
As expected, the lower-throughput methods were more sensitive, detecting the most unique molecular identifiers and genes per cell. The 10x Chromium instrument had the highest sensitivity among the high-throughput methods, while inDrop had the lowest.
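As a rough illustration of this kind of sensitivity metric, the sketch below counts detected genes and unique molecular identifiers (UMIs) per cell and summarizes them with a median. The function name, the input layout (one gene-to-UMI-count dictionary per cell), and the choice of the median are assumptions made for illustration; the article does not say how the Broad team summarized these values.

```python
from statistics import median


def per_cell_sensitivity(cells):
    """Summarize sensitivity for one library.

    `cells` is a list of dicts mapping gene name -> UMI count for a
    single cell (hypothetical layout, chosen for this sketch).
    """
    # A gene counts as detected if at least one UMI was assigned to it.
    genes_detected = [sum(1 for n in cell.values() if n > 0) for cell in cells]
    # Total UMIs observed in each cell.
    umis_detected = [sum(cell.values()) for cell in cells]
    return {
        "median_genes": median(genes_detected),
        "median_umis": median(umis_detected),
    }


# Toy example with three cells (made-up counts).
toy_cells = [
    {"A": 3, "B": 0, "C": 1},
    {"A": 1},
    {"A": 2, "B": 2, "C": 2},
]
summary = per_cell_sensitivity(toy_cells)
```

Comparing such summaries across libraries is only meaningful at matched sequencing depth, since deeper sequencing detects more genes regardless of method.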
The researchers also analyzed the distribution of reads from each library type, which provided information about the methods' efficiency. For instance, the team analyzed the proportion of reads from each library that did not have polyT in the expected position. In general, they found that the methods that don't use beads to capture mRNA — CEL-Seq2 and sci-RNA-seq — as well as the 10x Genomics approach, had polyT at the expected positions, while the Drop-seq, inDrop, and Seq-Well methods had higher fractions of reads without polyT.
Looking at the distribution of reads, the team found that Smart-seq2 and inDrop had the highest fraction of reads that corresponded to exons, at over 50 percent, while sci-RNA-seq had the lowest fraction of exonic reads, at below 30 percent. In addition, all methods had lower proportions of exonic reads when analyzing blood cells versus cell lines.
To calculate multiplet rates, the researchers analyzed data from the human and mouse cell line mixture, which allowed them to calculate when multiple cells were captured, since cell barcodes from both species would be present.
All methods except for inDrop had multiplet rates below 3.5 percent. The lower-throughput methods, Smart-seq2 and CEL-Seq2, had multiplet rates below 1 percent, which the authors attributed to the fact that those methods use FACS to sort cells.
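The species-mixing idea can be sketched in a few lines: a cell barcode whose transcripts come almost entirely from one species is treated as a single cell, while a barcode with a substantial share from both is flagged as mixed. The function names and the 90 percent purity cutoff below are assumptions for illustration, not values taken from the preprint.

```python
def classify_barcode(human_umis, mouse_umis, purity=0.9):
    """Label one cell barcode by the species of its transcripts.

    `purity` is a hypothetical cutoff: a barcode is assigned to a
    species only if at least that fraction of its UMIs map to it.
    """
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    human_fraction = human_umis / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    return "mixed"


def mixed_species_rate(counts, purity=0.9):
    """Fraction of non-empty barcodes that contain both species."""
    labels = [classify_barcode(h, m, purity) for h, m in counts]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return called.count("mixed") / len(called)


# Toy (human UMIs, mouse UMIs) pairs, one per barcode (made-up numbers).
toy_counts = [(950, 50), (20, 980), (500, 500), (0, 0)]
rate = mixed_species_rate(toy_counts)
```

This count only sees human-mouse multiplets. Two human cells or two mouse cells sharing a barcode look like a single cell, so the observed mixed-species rate is a lower bound on the true multiplet rate unless it is corrected for the mixing ratio.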
Aside from analyzing the methods' technical specifications, the researchers also wanted to look at how they compared in terms of the biological information captured.
One of the main applications of single-cell RNA-seq is to identify distinct cell types. To analyze the methods' ability to do this, the researchers analyzed data from the blood cells and mouse cortex, since both samples contain diverse cell types.
For these comparisons, the team sampled the same number of reads per cell and separately sampled the same number of cells from each method. In general, the methods had more difficulty separating cells that were transcriptionally similar to each other, although the 10x and inDrop methods outperformed the other methods in this regard. Meanwhile, all methods performed well for recovering abundant cell types.
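A minimal sketch of the two kinds of matching described above, assuming each cell is stored as a gene-to-read-count dictionary: reads are drawn without replacement until every cell has the same depth, and cells are drawn at random until every method contributes the same number. The function names, data layout, and seeds are hypothetical; the article does not describe how the Broad team implemented this step.

```python
import random
from collections import Counter


def downsample_cell(gene_counts, target_reads, seed=0):
    """Randomly keep `target_reads` reads from one cell.

    Cells that already have `target_reads` or fewer are returned
    unchanged (a real analysis might drop them instead).
    """
    total = sum(gene_counts.values())
    if total <= target_reads:
        return dict(gene_counts)
    # Expand counts into one entry per read, then sample without replacement.
    pool = [gene for gene, n in gene_counts.items() for _ in range(n)]
    kept = random.Random(seed).sample(pool, target_reads)
    return dict(Counter(kept))


def sample_cells(barcodes, n_cells, seed=0):
    """Randomly keep `n_cells` barcodes so methods can be compared
    on the same number of cells."""
    barcodes = list(barcodes)
    if len(barcodes) <= n_cells:
        return barcodes
    return random.Random(seed).sample(barcodes, n_cells)


# Toy cell with 100 reads (made-up counts), downsampled to 50.
toy_cell = {"CD3E": 60, "MS4A1": 30, "LYZ": 10}
shallow = downsample_cell(toy_cell, 50)
subset = sample_cells(["bc1", "bc2", "bc3", "bc4"], 2)
```

Equalizing depth and cell number first helps ensure that differences in cell-type recovery reflect the method rather than how much sequencing each library happened to receive.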
For the lower-throughput methods, not enough cells were analyzed to recover the rarer cell types.
Looking at the mouse cortex cells, only DroNc-seq, a version of Drop-seq designed to work on nuclei, was able to capture the rare pericytes. In general, DroNc-seq, 10x, and Smart-seq2 had similar performance but were each slightly different in their ability to detect certain cell types.
Aside from analyzing the methods themselves, Levin said an important component of the study was the development of a bioinformatics pipeline that could work with all of the methods. For that, the Broad team developed a tool called Scumi.
Levin said that although computational tools are available for each of the methods, the team felt that it was important to compare the methods using a single, uniform pipeline to minimize differences that would arise from the bioinformatics alone. "I was surprised at how difficult it was to come up with a fair comparison," Levin said, adding that it was tricky to tune the computational pipeline so that each method performed at its best without biasing it against any other method.
Vento-Tormo agreed that the computational pipeline was an important part of the study and would enable other methods to be compared using the same approach. "These types of analyses are really challenging to do," she added, since the bioinformatics tools used can have such a big impact on the results. But having a tool that worked with the various protocols "put them in a good position to compare the methods," she said.
The study is not the first attempt to compare single-cell sequencing methods. For example, researchers from Beijing's Tsinghua University last year compared the droplet-based methods, and researchers from the Sanger Institute have been comparing various single-cell RNA sequencing methods and sequencing platforms.
Vento-Tormo noted that while the Broad team's results were consistent with those previous studies, the new study also went a step further by analyzing the methods on various sample types, including biological data, and by using a uniform bioinformatics pipeline.