NEW YORK – Algorithms used to marry spatial transcriptomics data sets with those from single cells have multiplied in the last couple of years. A new study has put more than a dozen of them to the test, comparing their ability to predict the spatial distribution of RNA transcripts from single-cell RNA-seq (scRNA-seq) data sets or to deconvolve clusters of RNA transcripts into different cell types.
Researchers led by Kun Qu at the University of Science and Technology of China in Hefei and the Chinese Academy of Sciences compared 16 different integration algorithms on 45 different paired data sets from published studies of mouse brain tissues.
The spatial transcriptomic datasets were produced by 13 spatial transcriptomics approaches including fluorescence in situ hybridization (FISH); ouroboros single-molecule FISH; seqFISH; MERFISH, being commercialized by Vizgen; STARmap (spatially resolved transcript amplicon readout mapping); in situ sequencing; expansion sequencing; BaristaSeq; spatial transcriptomics, now commercialized as 10x Genomics' Visium platform; Slide-seq; Seq-scope; and HD spatial transcriptomics. The scRNA-seq datasets were obtained by Drop-seq, Smart-seq, and the 10x Chromium platform.
The top-performing algorithms for predicting transcript spatial distribution were Tangram, developed by Aviv Regev's group at Genentech and the Broad Institute; gimVI, a spatial algorithm developed as part of the SCVI (single-cell variational inference) tool set at the University of California, Berkeley; and SpaGE. For cell type deconvolution, RCTD, a method initially developed at the Broad for Fei Chen's Slide-seq method; Cell2location, a method from Omer Bayraktar's lab at the Wellcome Sanger Institute; and SpatialDWLS outperformed other methods, the authors said in their paper, published last month in Nature Methods.
"Scientists often focus on developing the next artificial intelligence method, but it is equally important to the scientific community to spend time thoroughly evaluating the pros and cons of existing methods," Tommaso Biancalani, a bioinformatician at Genentech and a developer of Tangram, said in an email. "The work by the Qu lab does exactly this," he said, adding that the study was "thoroughly executed" and the paper "was a pleasure to read."
"In future studies, I would be interested to see a more detailed assessment of the various methods in other tissues besides the mouse brain," he said. "However, it is not always easy to find high-quality ground truth data on other tissues for deconvolution."
The paper may help researchers choose the best tool for each job at a time when commercial spatial transcriptomics technologies — and datasets resulting from them — are poised to explode. Only four of the algorithms can do both transcript mapping and cell type deconvolution: Tangram, Seurat, SpaOTsc, and NovoSpaRc.
"Spatial transcriptomics approaches have substantially advanced our capacity to detect the spatial distribution of RNA transcripts in tissues, yet it remains challenging to characterize whole-transcriptome-level data for single cells in space," the authors wrote. High-resolution methods, such as MERFISH, cannot detect the entire transcriptome in a tissue section, and methods that can, such as Visium, lack single-cell resolution.
To fill in the gaps in spatial data, bioinformaticians have proposed integrating single-cell transcriptomics data, often with the help of machine-learning methods. For example, gimVI and Tangram use neural network-based models.
Predicting transcript location
An important benefit of data integration is that it can help ease concerns about panel design.
"Now, you do not have to worry about which gene to include or exclude from your limited set of genes to be spatially measured," Tamim Abdelaal, a postdoc at the Netherlands' Technical University Delft and a developer of SpaGE, said in an email. "With SpaGE, and similar methods, it is possible to accurately predict the spatial pattern of new genes that were not included in the spatial data. Of course, the quality of this prediction is dependent on the number and relevance of genes spatially profiled already. However, you do not have to run an additional experiment to spatially measure more genes."
The benchmarking study authors compared the algorithms' Pearson correlation coefficient (PCC) for predicting the spatial distribution of known marker genes in the mouse cortex, including Igsf21 — where Tangram performed best — and Rprm — where Seurat and SpaGE were top.
They then came up with a set of additional metrics for prediction accuracy: "structural similarity index (SSIM), which combines mean value, variance, and covariance to measure the similarity between the predicted result and the ground truth; root mean square error (RMSE), the absolute error between the predicted distribution and the ground truth; and Jensen–Shannon divergence (JS), which uses relative information entropy to gauge the difference between two distributions." A metric they dubbed "accuracy score" (AS) rolled all four metrics into one.
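For readers who want to see how such metrics behave, the following is a minimal Python sketch of the four measures, applied to one gene's predicted and measured expression across a handful of spots. It is an illustration under stated assumptions, not the benchmark authors' code: the SSIM here is computed once over the whole vector rather than in sliding windows, the max-scaling step and the stabilizing constants c1 and c2 are assumed choices, the function names are hypothetical, and the toy values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ssim_global(x, y, c1=0.01, c2=0.03):
    """Single-window SSIM built from means, variances, and covariance.

    Assumption: inputs are non-negative with a positive maximum, and each is
    scaled to [0, 1] by its own maximum. c1 and c2 are assumed constants.
    """
    max_x, max_y = max(x), max(y)
    xs = [a / max_x for a in x]
    ys = [b / max_y for b in y]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def rmse(x, y):
    """Root mean square error between prediction and ground truth."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def js_divergence(x, y):
    """Jensen-Shannon divergence (base 2, so it lies between 0 and 1)."""
    sum_x, sum_y = sum(x), sum(y)
    p = [a / sum_x for a in x]  # normalize to probability distributions
    q = [b / sum_y for b in y]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy example: one gene measured in five spots (made-up values).
truth = [0.0, 1.0, 3.0, 2.0, 0.5]
predicted = [0.2, 0.8, 2.5, 2.2, 0.4]
for name, fn in [("PCC", pearson), ("SSIM", ssim_global),
                 ("RMSE", rmse), ("JS", js_divergence)]:
    print(name, round(fn(predicted, truth), 3))
```

Higher PCC and SSIM values indicate a better prediction, while lower RMSE and JS values do, which is why the four numbers cannot simply be averaged as they are.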
The researchers systematically assessed the eight methods for predicting undetected transcripts considering all metrics for all 45 data sets. "The average accuracy scores for the Tangram, gimVI, and SpaGE predictions were 0.96, 0.84, and 0.69, respectively, all of which exceed the AS values for Seurat (0.50), SpaOTsc (0.55), LIGER (0.25), NovoSpaRc (0.47), and stPlus (0.31)," the authors wrote.
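The article does not spell out how the four metrics are combined into a single accuracy score. One plausible way to do it, sketched below in Python, is to rank the methods on each metric and average the relative ranks, so that a method that is best on every metric scores 1.0. This is an assumption for illustration, not a confirmed description of the paper's formula; the function names and the toy metric values for methods A, B, and C are hypothetical.

```python
def relative_ranks(values, higher_is_better):
    """Map each method to its rank divided by the number of methods.

    The best method gets 1.0; tied methods share the same rank.
    """
    n = len(values)
    ranks = {}
    for method, v in values.items():
        if higher_is_better:
            rank = sum(1 for w in values.values() if w <= v)
        else:
            rank = sum(1 for w in values.values() if w >= v)
        ranks[method] = rank / n
    return ranks


def accuracy_score(metrics):
    """Average the relative ranks over PCC, SSIM, RMSE, and JS (assumed scheme)."""
    higher_is_better = {"PCC": True, "SSIM": True, "RMSE": False, "JS": False}
    per_metric = {
        name: relative_ranks(values, higher_is_better[name])
        for name, values in metrics.items()
    }
    methods = next(iter(metrics.values())).keys()
    return {
        m: sum(per_metric[name][m] for name in metrics) / len(metrics)
        for m in methods
    }


# Made-up metric values for three hypothetical methods.
toy_metrics = {
    "PCC": {"A": 0.9, "B": 0.5, "C": 0.1},
    "SSIM": {"A": 0.8, "B": 0.6, "C": 0.2},
    "RMSE": {"A": 0.5, "B": 1.0, "C": 1.5},
    "JS": {"A": 0.1, "B": 0.3, "C": 0.6},
}
print(accuracy_score(toy_metrics))
```

A rank-based score of this kind says how consistently a method beats its competitors, not by how much, so two methods with nearly identical raw metrics can still land far apart.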
They also ran separate evaluations for data based on 10x's Visium, seqFISH, MERFISH, and Slide-seq, as there were more than three data sets available for each platform. "We found that Tangram, gimVI, and SpaGE outperformed other integration methods for data generated from 10x Visium, seqFISH, and MERFISH platforms, and Tangram and gimVI are top-ranked methods in processing Slide-seq datasets," they wrote.
"Concerning the spatial distribution prediction, I admire the fact that the authors did not only evaluate the prediction performance, but they also tested the effect of data sparsity and showed that some methods are more robust than others," Abdelaal said.
One application for this type of data integration is parsing cell communication, Biancalani said. "For example, this technology could be used to understand how the immune system talks to a cancer cell and how cancer cells respond to immunotherapy."
One aspect of gimVI that the paper didn't consider is that it can provide an estimate of confidence for each prediction, said Achille Nazaret, a graduate student at Columbia University who helped develop the algorithm during a stint at Nir Yosef's Berkeley lab. He suggested that gimVI could be supplemented by another method, say, Tangram, when confidence scores are low.
Performance on deconvolution
Deconvoluting cell type is important because "space on the [tissue] slice is confounded with cell type," said Rafael Irizarry, a bioinformatician at Harvard University and the Dana-Farber Cancer Institute whose lab developed RCTD. Simply looking for differential gene expression across the tissue turns up genes that are expressed in one cell type versus another. "It looks like a spatial change, but it's not," he said.
The 12 algorithms evaluated for deconvoluting cell types are designed for "spots," each covering multiple cells, that are generated using the 10x Visium or other spatial transcriptomics platforms. The authors simulated the so-called "multi-cell spot problem" by creating pseudospots from a higher-resolution data set generated with STARmap and Smart-seq that captured 1,549 cells of 15 different cell types. "After gridding, the simulated data had 189 spots, with each spot containing one to 18 cells," the authors noted.
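The gridding idea can be illustrated with a short Python sketch: cells with known positions and cell types are binned into square pseudospots, and the known mix of cell types in each pseudospot becomes the ground truth a deconvolution method is scored against. This is a simplified illustration, not the authors' pipeline; the spot size, the coordinates, the cell type labels, and the function names are all hypothetical, and a full simulation would also sum the expression of the cells in each pseudospot.

```python
from collections import Counter, defaultdict


def grid_cells_into_spots(cells, spot_size):
    """Assign each cell to a square grid bin (a pseudospot).

    cells: iterable of (x, y, cell_type) tuples.
    Returns {(column, row): Counter of cell types in that pseudospot}.
    """
    spots = defaultdict(Counter)
    for x, y, cell_type in cells:
        key = (int(x // spot_size), int(y // spot_size))
        spots[key][cell_type] += 1
    return dict(spots)


def spot_proportions(spots):
    """Convert per-spot cell type counts into ground-truth proportions."""
    proportions = {}
    for key, counts in spots.items():
        total = sum(counts.values())
        proportions[key] = {ct: n / total for ct, n in counts.items()}
    return proportions


# Made-up single-cell positions (arbitrary units) and labels.
toy_cells = [
    (10.0, 10.0, "L4 excitatory"),
    (20.0, 30.0, "L4 excitatory"),
    (15.0, 45.0, "Astrocyte"),
    (60.0, 10.0, "Astrocyte"),
]
toy_spots = grid_cells_into_spots(toy_cells, spot_size=50.0)
for key, props in sorted(spot_proportions(toy_spots).items()):
    print(key, props)
```

Because the true composition of each pseudospot is known by construction, the proportions a deconvolution method reports for a given cell type can be compared directly with these values, for example with a Pearson correlation across spots.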
"We plotted the locations of L4 excitatory neurons and found that RCTD and Stereoscope performed better in terms of the [Pearson coefficient] values (0.87), followed by Tangram (0.85), Cell2location (0.83), STRIDE (0.80), SPOTlight (0.79), Seurat (0.76), SpaOTsc (0.74), and DSTG (0.71)," the authors wrote. Using their accuracy score, RCTD was highest, followed by Stereoscope.
A similar gridding analysis on a separate data set yielded high accuracy scores from SpatialDWLS, Tangram, and RCTD.
Irizarry said he was happy to see the method proven with Visium data. "There has been a misconception that [RCTD] doesn't work on Visium," he said. "I think it's because our first data sets were from Slide-seq."
The authors also compared all methods on their computational intensity. For the spatial distribution prediction test, Seurat and LIGER took less than 10 minutes of CPU time, and Tangram and LIGER consumed less than 32 GB of memory. Tangram and gimVI can run on GPUs, the authors noted; however, they said those methods reported memory errors on their GPU platform, an Nvidia Tesla K80 with 12 GB memory. For cell type deconvolution, Cell2location reported memory errors on the GPU platform. Seurat and Tangram took less than 30 minutes of CPU time, and Stereoscope, Tangram, and DestVI consumed less than 8 GB of memory. "Tangram and Seurat are the top two most-efficient methods for processing cell type deconvolution of spots," the authors said.
One path of development is to increase dual functionality. While not working on gimVI, per se, Nazaret said he's now developing methods for cell type deconvolution that are also based on deep generative modeling.
And Dylan Cable, a doctoral student in Irizarry's lab who worked on RCTD, is designing a version of the software that is unsupervised, meaning it can analyze unlabeled data.
Biancalani's team is working to release Tangram2, "which, in addition to an improved method for data integration, will also account for extracting cell-cell communication mechanisms," he said.
"Integration of transcriptomics data is quickly becoming a standard machine-learning problem, and there is so much to be done," he said. "In particular, once this data is fully integrated, we need to leverage the resulting signal to understand how cells talk to each other. Our opinion is that cell-cell communication cannot transcend data integration — these problems need to be tackled as one."