NEW YORK – A new computational method offers a way to connect spatial and single-cell transcriptomic datasets.
Developed by researchers at the University of Pennsylvania, Cell Location Recovery (Celery) is a deep learning-based algorithm that predicts where in a 2D tissue section a cell most likely originated, given only that cell's transcriptome and a reference spatial transcriptomics dataset.
In a paper published last month in Nature Communications, Mingyao Li, a bioinformatics researcher at the UPenn Perelman School of Medicine, and her team, which also included researchers from Biogen, McGill University, and Emory University, provided benchmarking data showing that Celery works with a variety of spatial transcriptomics platforms. These include 10x Genomics' Visium and Xenium; multiplexed error-robust fluorescence in situ hybridization (MERFISH), developed by Harvard University's Xiaowei Zhuang; and Merscope, Vizgen's commercial MERFISH-based instrument.
"This is an interesting approach that uses deep learning to build a model that relates gene expression to location using spatial transcriptomics data," Aaron Newman, a bioinformatician at Stanford University who has developed CytoSpace, another algorithm for integrating data from the two approaches, and who was not involved in the study, said in an email.
"We wanted to recover the pairwise spatial relationship between every two cells in single-cell RNA-seq data," Li said. "We're trying to predict the location of a single cell in the coordinate space as defined by a spatial reference" dataset.
The authors suggested that their method could be used to provide spatial context for the large amount of existing single-cell data and, going forward, for integrating data in specific tissues, such as the heart and brain.
The complementarity of single-cell and spatial transcriptomics methods has aided the rise of the latter. Seurat, a single-cell bioinformatics suite developed by Rahul Satija of New York University and the New York Genome Center, was early to offer data integration, inferring cell locations through the use of anchor genes. But reliance on anchor genes has limited the application of this feature, the authors wrote in their paper, because such genes are often unavailable.
Other data integration approaches, including Newman's CytoSpace, work in the opposite direction: they use single-cell transcriptomic data to augment spatial datasets, allowing researchers to recover more genes or achieve greater spatial resolution. "Thus, the goals of the two methods are different," Newman said.
Celery outputs either a so-called spatial "domain" or 2D coordinates, along with a confidence score. Spatial domain information is "not as precise as the 2D location," the authors noted, but it "still provides valuable information when studying human disease."
The location data can be used in both single-cell-centric and spatial-centric analyses, the authors wrote. An example of the former is cell-cell communication (CCC) analysis, such as the study of ligand-receptor interactions. "Since CCC is spatially coordinated, knowing the cellular localizations is crucial to understand how different cells interact with each other during disease development and progression," the authors wrote.
Spatial-centric analyses include inferring cell-type composition and cell-type-specific gene expression. "Moreover, they can aid in imputing missing gene expression for genes that are not included in Merscope, Xenium, and [NanoString Technologies'] CosMx," the authors noted.
Newman noted that the coordinates predicted by Celery "are not constrained to the exact 2D coordinates and spatial architecture of the spatial transcriptomics dataset. In addition, cells are probabilistically assigned to spatial locations, meaning a given cell can potentially map to many locations with high probability."
Celery can also "flexibly impute 2D coordinates, even those not within the spatial transcriptomics dataset," Newman said. "This can be used to interpolate coordinates for cells that are not well explained by the spatial transcriptomics dataset." CellTrek, a method that maps cells to locations using data from 10x's Visium and the Broad Institute's Slide-seq spatial platforms, offers a similar capability.
Celery does not yet incorporate the histology information available in some multiomic spatial datasets, but doing so could be an opportunity to improve the algorithm, said Qihuang Zhang, the paper's first author.
Li's lab is planning to use Celery in collaborations with cardiovascular disease researchers, specifically on coronary artery disease. "Spatial omics is just getting started" in that field, she said. "So far, I haven't seen a lot of data generated. It's not as advanced as in cancer [research], but it's coming."
Spatial data could provide insight into how certain cells change state in atherosclerotic lesions, areas where artery walls become thick and stiff. Both smooth muscle cells and macrophages are involved. "We want to know where the transition [from a healthy state to a diseased state] starts, where they are moving to, and the spatial relationship of those cells in the artery," Li said.
"When you only have single-cell data, you have no idea of their location, and you don't know how close they are to each other."