NEW YORK (GenomeWeb) – Two independent research teams have developed computational methods for spatially mapping individual cells profiled by single-cell RNA sequencing back to their original positions using marker genes measured separately with in situ techniques.
The approaches, described in studies published online this week in Nature Biotechnology, involve integrating information about genes that are co-expressed in individual cells to match these cells up against reference maps with information on a far more limited number of marker genes.
Various in situ approaches such as single-molecule FISH have been used to tally up transcript levels for a handful of genes in individual cells that remain in their original location within a given tissue.
But in most single-cell sequencing protocols, information on the spatial relationships between cells is lost when tissues are shaken apart into their individual component cells as part of the sample preparation process.
Authors of the newly published studies reasoned that there might be computational ways around this problem. By considering single-cell RNA-seq profiles in conjunction with a second data source, they set out to find ways of maintaining the breadth and throughput of single-cell RNA sequencing approaches, while placing cells in a spatial context.
For one of the studies, researchers from the Broad Institute, Harvard University, and elsewhere used their version of this approach, dubbed Seurat in a nod to the pointillist painter of that name, to map more than 850 early-stage zebrafish embryo cells subjected to single-cell RNA sequencing back to within-embryo positions informed by available in situ RNA maps for the same stage of zebrafish development.
Just as pointillist paintings are built up of individual dots, the team found that it could computationally map individual cells back to points in the embryo to get a look at genome-wide transcription for cells at each position.
Likewise, European Molecular Biology Laboratory and Wellcome Trust Sanger Institute researcher John Marioni led a group of investigators from the UK and Germany that used an existing, whole-mount in situ hybridization-based expression atlas on the developing brain of the marine worm Platynereis dumerilii to retrace the origins of more than 150 RNA-sequenced individual brain cells from that organism.
"To really understand what a cell's doing, you need to understand its context. And one important part of its context is its location within the particular tissue that it's come from," Marioni told GenomeWeb.
In the case of the zebrafish, members of the Broad-led team looked at a developmental stage when only 10,000 or so cells are present in an embryo and many have yet to acquire a set identity.
"Early in development, there are extensive signaling gradients that are present in the embryo," Rahul Satija, the study's first author, told GenomeWeb, "and a cell's exact position is paramount in determining what signals it receives and, eventually, determining what type of tissue it's going to differentiate into."
Satija was completing his post-doctoral research at the Broad Institute when the study was done. He is now continuing to develop similar approaches in his own lab at the New York Genome Center and New York University's Center for Genomics and Systems Biology.
Though embryonic cells are largely homogeneous at the zebrafish developmental stage considered, he and his colleagues were able to use Seurat to localize 682 RNA-sequenced single cells from random sites in the embryo with the help of in situ expression information for just a few dozen genes.
"We care a lot about a small number of genes, particularly ones for which we have in situ information," Satija explained. "In our [zebrafish] study, we only had about 40 of those genes. So all of a sudden we place an enormous amount of weight on our measurements for those 40 genes."
To estimate the expression of these genes even when technical noise hindered their direct measurement, that team came up with a strategy for imputing the expression of landmark genes from the in situ map based on sequence data from the complete set of transcripts sequenced in an individual cell.
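The idea of borrowing strength from co-expressed genes can be illustrated with a minimal sketch. This is not the Seurat implementation: the article does not spell out the model, so ordinary least squares on the most correlated genes is used here as a stand-in, and the toy data, the `simulate` and `impute_landmark` names, and every parameter are assumptions made for illustration.

```python
import numpy as np

def simulate(n_cells=200, seed=0):
    """Toy data (hypothetical): one hidden spatial signal drives a landmark
    gene and 10 co-expressed genes; 10 further genes are pure noise."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n_cells)
    coexpressed = signal[:, None] + 0.3 * rng.normal(size=(n_cells, 10))
    unrelated = rng.normal(size=(n_cells, 10))
    # Mimic technical dropout: roughly half the landmark readings are lost.
    detected = rng.random(n_cells) > 0.5
    landmark = signal * detected
    # Column 0 is the noisy landmark gene; the rest are the other transcripts.
    expr = np.column_stack([landmark, coexpressed, unrelated])
    return expr, signal

def impute_landmark(expr, landmark_idx=0, n_predictors=5):
    """Re-estimate one landmark gene from the genes most correlated with it.
    Assumption: plain least squares, chosen for brevity."""
    y = expr[:, landmark_idx]
    others = np.array([g for g in range(expr.shape[1]) if g != landmark_idx])
    corr = np.array([abs(np.corrcoef(expr[:, g], y)[0, 1]) for g in others])
    corr = np.nan_to_num(corr)  # constant genes would give NaN correlations
    top = others[np.argsort(corr)[::-1][:n_predictors]]
    design = np.column_stack([np.ones(len(y)), expr[:, top]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef  # fitted values act as the imputed expression

expr, signal = simulate()
imputed = impute_landmark(expr)
measured_r = np.corrcoef(expr[:, 0], signal)[0, 1]
imputed_r = np.corrcoef(imputed, signal)[0, 1]
print(f"measured vs. true signal: r = {measured_r:.2f}")
print(f"imputed vs. true signal:  r = {imputed_r:.2f}")
```

In this toy setting the imputed values track the hidden signal more closely than the dropout-affected measurement does, which is the effect the imputation step is meant to achieve.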
"We used all of our data to impute back very robust values of the [in situ] genes that we cared about," Satija said. "That's the heart of our computational method: using this imputation strategy to get robust values of those 40 genes."
As a means of double-checking the accuracy of their results, the researchers used a similar approach to test 141 cells from sites near the embryonic margin, which had one of the characteristic expression clusters identified in the study.
They also ran the algorithm on a subset of RNA-sequenced single cells plucked from specific places in zebrafish embryos, which are transparent at that stage of development, to see how well the Seurat-based spatial maps matched the cells' actual starting locations.
Marioni and colleagues were dealing with a more heterogeneous set of cells in the marine worm P. dumerilii.
That team came up with a computational approach for looking at the relationship between single-cell RNA-seq profiles and an in situ-based gene expression atlas representing 169 transcripts in the P. dumerilii brain at a stage of development when the larval brain is made up of just 2,000 or so cells, 48 hours after fertilization.
"We reasoned that we could use this atlas effectively as a barcode, as a reference," Marioni said. "We could take individual cells from the same developmental stage, we could dissociate cells from the brain, not knowing where they come from, look at the expression profiles … profiled in situ that were present in our reference atlas, and use that to map the cells back to a relatively precise location within the tissue of interest."
Using so-called specificity-weighted messenger RNA profiles, the team estimated the specificity of a given marker gene's expression in each cell. It also took into account information across the set of transcripts detected in each cell as a whole.
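A rough sketch of how an atlas can act as a barcode is given below. It is an illustration under stated assumptions rather than the team's published scoring scheme: the atlas is reduced to a binary voxel-by-gene matrix, each gene's weight is simply one minus the fraction of voxels expressing it, and the `score_voxels` name and the four-voxel example are hypothetical.

```python
import numpy as np

def score_voxels(atlas, cell):
    """Score every atlas voxel against one cell's on/off marker profile.

    atlas: voxels x genes binary matrix from in situ data (toy values here).
    cell:  binary vector saying which marker genes the cell expresses.
    Assumption: genes expressed in fewer voxels are more specific, so
    agreement or disagreement on them counts for more.
    """
    atlas = np.asarray(atlas, dtype=float)
    cell = np.asarray(cell, dtype=float)
    weights = 1.0 - atlas.mean(axis=0)       # specificity weight per gene
    agree = atlas == cell                    # broadcast over voxels
    return np.where(agree, weights, -weights).sum(axis=1)

# Four hypothetical voxels profiled for three marker genes.
atlas = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
cell = [0, 1, 1]  # dissociated cell expressing genes 2 and 3 only

scores = score_voxels(atlas, cell)
best_voxel = int(np.argmax(scores))
print(scores, best_voxel)
```

Here the cell matches the third voxel on all three genes, so that voxel receives the top score. With a real atlas of well over a hundred genes, combining many such matches is what narrows a cell down to a small region.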
"By looking at the locations where multiple genes expressed in a given cell are expressed in the atlas, you can begin to really gain power in your mapping," Marioni said. "We're looking, effectively, at patterns of co-variant expression in the method that we have at present."
Through computational comparisons to the in situ P. dumerilii expression atlas, the researchers mapped sequenced marine worm brain cells back to their original locations around 81 percent of the time.
Based on 98 high-quality genes from the in situ atlas, for example, the researchers mapped 69 of 139 quality-filtered, RNA-sequenced single cells back to the P. dumerilii brain with high confidence.
Another 43 cells were mapped with medium confidence, based on the number of "voxels" in a grid of in situ genes that could be linked back to transcripts in sequenced cells. The other RNA-sequenced cells either did not map or mapped with low confidence.
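The figures above can be tallied in a few lines, assuming the "around 81 percent" success rate refers to cells placed with either high or medium confidence (the article does not say so explicitly).

```python
# Counts reported for the P. dumerilii mapping run.
high_confidence = 69
medium_confidence = 43
total_cells = 139

mapped = high_confidence + medium_confidence
mapped_fraction = mapped / total_cells
unmapped_or_low = total_cells - mapped

print(f"{mapped} of {total_cells} cells mapped ({mapped_fraction:.1%})")
print(f"{unmapped_or_low} cells unmapped or low confidence")
```

Under that assumption, 112 of the 139 cells, or roughly 81 percent, were placed, leaving 27 that either did not map or mapped with low confidence.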
When they attempted to map cells back to the worm brain using information from subsets of the in situ atlas, the researchers found that the proportion of medium- or high-confidence cell placements notched upwards with increasing atlas size.
Still, Marioni noted that the choice of genes for in situ testing is perhaps even more important than atlas size when it comes to placing the RNA-sequenced cells.
"If you profile lots of genes that are expected in exactly the same cell type, that's not going to provide you with very much information," he explained. "You really need to select effective genes that are marking a variety of regions within the organ you're studying."
While the precision of both computational approaches described in the studies may vary depending on the in situ method used for measuring markers, the researchers noted that the strategies are compatible with a range of in situ and single-cell RNA-seq protocols and sequencing technologies.
Further, those involved in the studies expect to be able to perform similar spatial reconstructions on sequenced single cells from any tissue type and/or developmental stage with an appropriate in situ gene expression atlas.
For non-model organisms that may lack in situ markers, Marioni noted that it may be possible to produce a more limited in situ expression atlas on the fly through targeted testing of a few dozen genes.
Meanwhile, Satija and his team are looking at the possibility of evaluating non-model organisms with alternative strategies that combine single-cell RNA-seq data with information from experiments in which bulk sequencing is done on sections of tissue coming from well-defined sites in an organism.
He noted that it may also be possible to iteratively establish a set of reference transcripts by analyzing single-cell RNA-seq data to try to find genes that are most variable between datasets to narrow in on markers for pinning individual cells down in space.
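One simple way to shortlist such candidate genes is to rank them by how much they vary from cell to cell. The sketch below uses a variance-to-mean ratio as the variability measure; that choice, the `top_variable_genes` name, and the toy counts are assumptions for illustration, not a description of the team's planned method.

```python
import numpy as np

def top_variable_genes(expr, gene_names, n_top=2):
    """Rank genes by dispersion (variance / mean) across single cells.

    expr: cells x genes matrix of expression values (toy data here).
    Returns the n_top most variable gene names, most variable first.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    var = expr.var(axis=0)
    dispersion = var / (mean + 1e-9)  # small constant avoids division by zero
    order = np.argsort(dispersion)[::-1][:n_top]
    return [gene_names[i] for i in order]

# Four hypothetical cells, three hypothetical genes with the same mean.
genes = ["geneA", "geneB", "geneC"]
expr = [
    [5, 0, 4],
    [5, 10, 6],
    [5, 0, 4],
    [5, 10, 6],
]
candidates = top_variable_genes(expr, genes)
print(candidates)  # ['geneB', 'geneC']
```

Genes that are flat across cells, like geneA here, drop out of the ranking, while strongly fluctuating genes rise to the top as possible spatial markers worth testing in situ.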
"You might be able to get a hint that these are genes that are fluctuating wildly across single cells and, therefore, might be likely to drive spatial differences," Satija said.
Going forward, Marioni and his colleagues are continuing to work on P. dumerilii in the hopes of evaluating cellular heterogeneity at this and other stages of development. They are also looking at ways of better modeling the technical limitations of single-cell RNA sequencing to improve the accuracy of cell placement.
Marioni explained that there may also be benefits to doing spatial reconstruction on RNA-sequenced single cells from other non-model organisms, particularly to learn more about animal development or evolution.
Both teams are making their software open-source so it will be available for use by other researchers. Satija and his team have also developed a web site that includes information on downloading and installing Seurat, as well as tutorials.
Along with its use for positioning cells in space after single-cell RNA-seq experiments, Satija noted that the general approach behind Seurat could serve as a framework for other types of single-cell analyses, including studies that track cells tested by single-cell RNA-seq over time.
"We've focused on spatial information, but there's nothing in the algorithmic strategy or in the intuition that has to be focused on space," Satija said. "You could also think of combining single-cell RNA-seq with flow cytometry data or with RNA FISH data or with any other source."