May 21, 2024

NEW YORK – Normalizing spatial transcriptomics library size using conventional single-cell RNA sequencing tools can hinder spatial domain identification, according to a new study by researchers from the University of Adelaide in Australia and their collaborators.

In their study, published in Genome Biology last month, the authors underlined the potential pitfalls of applying tools developed for single-cell data to spatial analyses while highlighting the need to develop new methods tailored for spatial biology.

Library size, the total number of transcripts detected in each region, is commonly corrected during spatial transcriptomics analysis to ensure equal sampling across the tissue, according to Dharmesh Bhuva, a researcher at the University of Adelaide and the corresponding author of the study. However, because dedicated normalization tools are lacking, algorithms developed for single-cell RNA sequencing (scRNA-seq) are routinely applied to spatial data.

"Many people go for the easiest solution of using the single-cell methods for spatial transcriptomics, but they are not acknowledging the fact that the datasets are very different," Bhuva said.

For example, while single-cell workflows dissociate cells before sequencing, most spatial technologies analyze transcripts in cells that remain fixed in the tissue. As a result, tissue architecture can affect how well reagents permeate each region, potentially producing differences in transcript signal across the tissue and, in turn, in library size.
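To make the terms concrete, the sketch below computes library size per location and applies a simple scaling correction of the kind long used for single-cell data. It is an illustration only: the function names and the toy count matrix are hypothetical, and the scaling shown is a generic depth correction, not the method implemented by sctransform, scran, or RUVIII-NB.

```python
import numpy as np

def library_sizes(counts):
    """Total transcript count per location (rows = locations, columns = genes)."""
    return counts.sum(axis=1)

def scale_to_common_depth(counts, target=None):
    """Generic library size scaling (illustrative, not any published tool).

    Each location is divided by its own library size and rescaled to a
    shared target depth, by default the median library size.
    """
    sizes = library_sizes(counts).astype(float)
    if target is None:
        target = np.median(sizes)
    # Avoid division by zero for empty locations; their rows stay all zero.
    safe_sizes = np.where(sizes > 0, sizes, 1.0)
    return counts / safe_sizes[:, None] * target

# Hypothetical toy data: three locations, three genes.
counts = np.array([
    [10, 30, 60],
    [1, 3, 6],
    [0, 0, 0],
])
sizes = library_sizes(counts)
normalized = scale_to_common_depth(counts)
```

After scaling, the first two locations have identical profiles even though one originally held ten times as many transcripts, which is exactly the information the study argues can be biologically meaningful in tissue.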

To evaluate the impact of library size correction on spatial domain identification, the researchers compared three commonly used normalization tools: sctransform, scran, and RUVIII-NB. They used publicly available data from 25 tissue samples analyzed using four different spatial technologies: 10x Genomics’ Visium and Xenium, NanoString Technologies’ CosMx, and BGI’s STOmics. The authors also analyzed the data without any normalization.

Overall, the paper concluded that spatial domain identification was "strongly dependent" on the normalization method used. For instance, library size correction using sctransform, one of the most effective tools for the task, led to poorer domain identification across most datasets regardless of platform, Bhuva said.

Meanwhile, domain identification performance with RUVIII-NB, scran, or no normalization depended primarily on the clustering method used downstream.

Additionally, the researchers noted that library sizes differed significantly across tissue structures, irrespective of the technology and tissue type, reflecting real biology rather than technical artifacts.
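A toy example can show why this matters. The sketch below assumes a hypothetical tissue with two domains that share the same relative gene composition but differ threefold in total transcript abundance. All values and variable names are invented for illustration and do not come from the study.

```python
import numpy as np

# Hypothetical toy tissue: both domains have the same gene proportions,
# but domain A carries about three times as many transcripts as domain B.
proportions = np.array([0.5, 0.3, 0.2])
domain_a = np.outer([300, 320, 280], proportions)
domain_b = np.outer([100, 110, 90], proportions)
counts = np.vstack([domain_a, domain_b])
domain = np.array(["A"] * 3 + ["B"] * 3)

# Library size per location and its mean within each domain.
sizes = counts.sum(axis=1)
mean_size = {d: sizes[domain == d].mean() for d in ("A", "B")}

def max_domain_gap(matrix):
    """Largest per-gene difference between the two domain mean profiles."""
    mean_a = matrix[domain == "A"].mean(axis=0)
    mean_b = matrix[domain == "B"].mean(axis=0)
    return np.abs(mean_a - mean_b).max()

# Generic library size scaling to the median depth (illustrative only).
scaled = counts / sizes[:, None] * np.median(sizes)

gap_before = max_domain_gap(counts)
gap_after = max_domain_gap(scaled)
```

In this constructed case the domains are clearly separated before scaling and indistinguishable afterward, because library size was the only signal that distinguished them. Real tissue is less extreme, but the example illustrates the kind of information a library size correction can remove.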

"If you are seeing this information, removing that information [would] be bad," Bhuva said.

"I do believe what they reported," Joe Yeong Poh Sheng, a spatial immuno-pathologist at Singapore General Hospital who was not involved in the study, wrote in an email. "Library size does affect the scientific data, and our scientific interpretation always needs to factor in biology."

"I think it is an important study to understand the impact of library size and how it may confound the data," Arutha Kulasinghe, scientific director of the Queensland Spatial Biology Center who was not involved in the study, wrote in an email. "The authors show that this common practice, though sensible for single-cell data, is not appropriate for spatial molecular data because these effects confound biology."

Calling the paper "thought-provoking," Kulasinghe also noted that larger studies are still needed to further assess the impact of library size normalization on spatial biology.

Bhuva said one take-home message from the paper is to be extra cautious when applying tools designed for single-cell analysis to spatial data.

"It's fine to use existing [single-cell] methods, but we have to be very careful," he said. "One of the key messages in this paper is that if you're not sure what to do, don't do it."

Accordingly, the paper also highlighted the need to develop new tools for spatial data. To that end, Bhuva's team recently developed a pipeline for normalizing spatial transcriptomics data called SpaNorm, which is available on GitHub.

As spatial technologies continue to evolve and new platforms emerge, the field needs to not only conduct cross-product comparisons but also analyze data across technologies to identify potential pitfalls and best practices, Bhuva said.

"We really should stop thinking about just technology comparisons," he said. "Figuring out what is a common phenomenon across the technologies, I think that's very important, as well."