UC Berkeley's FastProject Improves Visualization, Interpretation of Single-Cell RNA-Seq Data


NEW YORK (GenomeWeb) – A pair of researchers from the University of California at Berkeley have developed software called FastProject for visualizing and interpreting single-cell RNA-sequencing data.

"My lab is interested in studying regulation of transcription using genomic tools, [and] over the past couple of years, we've started working with single-cell data," Nir Yosef, an assistant professor in UC Berkeley's department of electrical engineering and computer science and one of the authors on the paper, told GenomeWeb. "When you work with single-cell data, you are faced with a huge amount of information, it can be thousands of cells [and] thousands of genes. ... FastProject gives a quick and very intuitive way to explore the data and see what the main themes are in an unsupervised way."  

As explained in a recently published BMC Bioinformatics paper, FastProject addresses three challenges associated with visualizing single-cell RNA-seq data in two-dimensional plots. The first is selecting an appropriate data projection method. The second, once a projection has been created, is understanding the biological significance of the projected data, such as which phenotypes are responsible for the observed cellular configuration. The third is controlling for confounding factors such as differences in gene capture rates, which can make single-cell RNA-seq data difficult to interpret, the researchers wrote.

"FastProject fills a niche that hasn't been addressed previously much — the interpretation of projections," Yosef said. "[In] almost any RNA-seq paper, you see two-dimensional plots of cells. People interpret these plots in various ways and every paper has a different thing based on what people know about their system." FastProject provides a "standardized recipe" for interpreting projections and doing so in as unbiased a way as possible, he said.

Specifically, it uses various linear and non-linear projection methods to visualize data, Yosef explained. "You basically get an overview of all of these two-dimensional projections and see what insight each one can give you ... in a way that is easy to view and compare." The software also incorporates gene signature information — collections of genes with a common function — into the created projections so that users can explore the data points in the context of the biological processes that they might represent. It includes tools for scoring cells against gene signatures to minimize the effects of missed transcripts and for ranking gene signature-projection matches to highlight meaningful associations in the data, according to the authors.

The input to FastProject is an expression matrix in a tab-delimited format and gene signatures from standard databases and repositories such as the Broad Institute's Molecular Signatures Database, the Gene Ontology, and the Kyoto Encyclopedia of Genes and Genomes. Users can also create their own gene signatures that reflect particular phenotypes of interest to them and use these as input to the pipeline, according to the paper. 
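
As a rough illustration of what such an input could look like in practice, the sketch below parses a small tab-delimited expression matrix using only the Python standard library. The layout (a header row of cell names, then one row per gene) and the function name `parse_expression_matrix` are assumptions made for this example, not FastProject's documented format or API.

```python
import csv
import io


def parse_expression_matrix(text):
    """Parse tab-delimited text into (cell_names, {gene: [values per cell]}).

    Assumed layout (hypothetical): the first row holds a corner label
    followed by cell names; each later row holds a gene name followed
    by one expression value per cell.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    cells = header[1:]
    expression = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        values = [float(v) for v in row[1:]]
        if len(values) != len(cells):
            raise ValueError(f"row for {gene!r} has {len(values)} values, "
                             f"expected {len(cells)}")
        expression[gene] = values
    return cells, expression


# Made-up example data: two placeholder genes measured in three cells.
example = (
    "gene\tcell_1\tcell_2\tcell_3\n"
    "GENE_A\t5.1\t0.0\t4.2\n"
    "GENE_B\t0.0\t3.3\t2.8\n"
)
cells, expression = parse_expression_matrix(example)
```

Here `cells` holds the three cell names and `expression["GENE_B"]` holds that gene's values across the cells, in column order.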

The software first estimates false negatives in the dataset and uses different criteria to filter out genes that show up in fewer than a preset threshold number of cells; the default threshold is 20 percent of the input cells. It then uses 11 different projection methods to generate two-dimensional coordinates for each cell, uses information from gene-signature databases to score individual cell-signature pairs, and applies a randomization test to identify and rank statistically significant projection-signature associations.
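
Two of these steps, the detection-based gene filter and the randomization test, can be sketched in a few lines of standard-library Python. This is a simplified illustration, not FastProject's implementation: the nearest-neighbor statistic, the function names, and the parameters `k` and `n_permutations` are all assumptions chosen for the example, and the two-dimensional coordinates are assumed to come from some projection method that is not shown.

```python
import math
import random


def filter_genes(expression, min_fraction=0.2):
    """Keep genes detected (value > 0) in at least min_fraction of cells."""
    kept = {}
    for gene, values in expression.items():
        detected = sum(1 for v in values if v > 0)
        if detected >= min_fraction * len(values):
            kept[gene] = values
    return kept


def neighbor_inconsistency(coords, scores, k=2):
    """Mean absolute gap between each cell's score and the mean score of
    its k nearest neighbors in the 2D projection (lower = more consistent).
    Requires k to be smaller than the number of cells."""
    total = 0.0
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        nearest = others[:k]
        neighbor_mean = sum(scores[j] for j in nearest) / len(nearest)
        total += abs(scores[i] - neighbor_mean)
    return total / len(coords)


def permutation_p_value(coords, scores, k=2, n_permutations=500, seed=0):
    """Estimate how often shuffled scores look at least as consistent
    with the projection as the observed scores do."""
    rng = random.Random(seed)
    observed = neighbor_inconsistency(coords, scores, k)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if neighbor_inconsistency(coords, shuffled, k) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up data: two well-separated groups of cells whose signature
# scores follow the grouping.
group = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
coords = group + [(x + 10.0, y + 10.0) for x, y in group]
scores = [1.0] * 5 + [0.0] * 5
p_value = permutation_p_value(coords, scores)
```

A small `p_value` suggests that the signature varies across the plot in a way that random relabeling of cells rarely reproduces. Ranking signatures by such a value is one way to surface the projection-signature pairs worth inspecting.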

The software produces 76 possible projections and their associated functional annotations and provides these in a user-friendly report. It also produces results as text files, which makes it easier for users to inspect relationships between different pathways that are "highly correlated" with the two-dimensional positions as well as highlight new associations in the data, according to the paper.

For example, a researcher studying T-cells could use FastProject to generate two-dimensional plots of the data and combine them with gene signatures from RNA-sequencing studies that compare naïve T-cells and memory T-cells, Yosef explained. By overlaying the gene signatures on the plots, the researcher can then score every cell in the projected data as either a naïve or memory cell. "This signature projection consistency score has all the statistics behind it to identify the associations that are significant," he said. "So for every projection, we [can] say this is the projection of the data and this is the underlying biology that it reflects."
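
To make the idea of scoring cells against a two-sided signature more concrete, here is a minimal sketch in Python. It assumes a signed signature in which each gene carries a weight of +1 (higher in one state, say memory) or -1 (higher in the other, say naïve), and it scores each cell as the signed average of its expression values. The gene names, weights, and the function `signature_scores` are placeholders for illustration; the published tool's scoring also accounts for missed transcripts, which this sketch does not attempt.

```python
def signature_scores(expression, signature):
    """Score each cell as the signed mean expression of signature genes.

    expression: {gene: [value per cell]}
    signature:  {gene: +1 or -1}; genes missing from the data are ignored.
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_cells = len(expression[genes[0]])
    scores = []
    for c in range(n_cells):
        signed_sum = sum(signature[g] * expression[g][c] for g in genes)
        scores.append(signed_sum / len(genes))
    return scores


# Placeholder genes and made-up values for two cells.
example_expression = {"GENE_A": [4.0, 0.0], "GENE_B": [0.0, 2.0]}
example_signature = {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1}
example_scores = signature_scores(example_expression, example_signature)
```

In this toy case the first cell receives a positive score and the second a negative one, so the sign indicates which side of the signature each cell resembles.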

Compared with current methods, FastProject is unique in its combination of methods, according to its developers. For example, ViSNE, a single-cell RNA analysis solution developed in the laboratory of Dana Pe'er, an associate professor in Columbia University's department of biological sciences, does not incorporate gene signatures or offer a method for analyzing biological variation in two-dimensional projections, they wrote. Another method called Pagoda, part of the SCDE software package developed in the lab of Peter Kharchenko in Harvard Medical School's biomedical informatics department, uses gene sets but in the context of heatmaps and not two-dimensional projections, according to the paper.

One commonly used pipeline for interpreting single-cell data "is to cluster the cells ... and then conduct differential expression between clusters, as a way to interpret their biological meaning," said Yosef. "FastProject provides an alternative approach that is applicable in both the scenario where clusters/sub-populations can be clearly inferred from the data as well as when there is no clear partition into sub-populations."

Furthermore, its results are accurate, he said. When the UC Berkeley researchers applied FastProject to 430 tumor cells gleaned from five glioblastoma patients from the Gene Expression Omnibus repository, the results showed that the software correctly stratified cells according to their individual donors, which matched previously published results about the datasets.

"It more or less brings up the same kind of conclusions but [does] it in a more generic and systematic way," Yosef said. "So for us it was a good thing to see that when you [use] such a generic tool ... you get the same or similar insights that you would gain by more manual and laborious types of methods."

Yosef told GenomeWeb that his lab is developing additional tools and applications for analyzing single-cell RNA-seq data that could be bundled with FastProject. One of these is a tool called Single-Cell Overview of Normalized Expression, or SCONE, which provides a way for researchers to normalize their RNA-seq data before moving it into FastProject for analysis. SCONE provides a framework for running multiple normalization workflows in parallel as well as tools for ranking workflows and visualizing trade-offs. It uses common normalization modules used in traditional bulk sequencing and also supports user-specified normalization modules, according to the developers.

The developers are also making improvements to FastProject itself, Yosef said. For example, they are adding methods for clustering gene signatures that behave in a similar way in the plots. They are also developing tools that will allow users to overlay cell trajectories on the two-dimensional plots and study gene signatures that change over time across the trajectories.