Allen Institute Developing Gene Panel Based on Longitudinal scRNA-seq Analytics Platform


NEW YORK – The Allen Institute for Immunology is looking to commercialize a multiomic gene panel to screen for immune disorders based on an internally developed bioinformatics platform for analyzing longitudinal single-cell RNA sequencing data. In doing so, the Seattle-based research institute hopes to make scRNA-seq more accessible as a clinical tool in immune health and beyond.

"Single-cell RNA-seq has always been a very boutique assay, used by academic centers for really deep science on a limited number of samples," said Peter Skene, director of high-resolution translational immunology at the Allen Institute. "We are trying … to develop a signature panel that would allow the chemistry to scale at an attractive price point."

The assay would be based on PALMO, short for Platform for Analyzing Longitudinal Multi-Omics Data. In a recent study published in Nature Communications, Allen Institute scientists described how they pinpointed 220 genes as likely biomarkers for cell types that could indicate immune diseases including influenza and COVID-19.

PALMO features five modules for analyzing longitudinal bulk and single-cell multiomics data: variance decomposition analysis (VDA); coefficient-of-variation profiling (CVP) of proteomics data; stability pattern evaluation across cell types (SPECT); outlier detection analysis (ODA); and time course analysis (TCA). This SPECT acronym differs from the more common usage, for single-photon emission computed tomography imaging.
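As a rough illustration of what coefficient-of-variation profiling means in a longitudinal setting, the sketch below computes a per-donor CV for each feature across time points. It is not PALMO's implementation (PALMO's own code is not shown in this article); the feature names, donor labels, values, and the choice to skip missing time points are all assumptions made for the example.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return CV in percent (100 * sample SD / mean), ignoring missing (None) values.

    Returns None when fewer than two observations remain or the mean is zero.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None
    m = mean(observed)
    if m == 0:
        return None
    return 100.0 * stdev(observed) / m

# Hypothetical longitudinal measurements: feature -> donor -> values over time.
# None marks a missed sample, which is common in longitudinal studies.
measurements = {
    "protein_A": {"donor1": [10.0, 10.5, 9.5, 10.0], "donor2": [20.0, 21.0, None, 19.0]},
    "protein_B": {"donor1": [5.0, 15.0, 2.0, 10.0], "donor2": [8.0, 1.0, 12.0, 3.0]},
}

# CV profile: one CV per feature per donor, computed across that donor's time points.
cv_profile = {
    feature: {donor: coefficient_of_variation(series) for donor, series in donors.items()}
    for feature, donors in measurements.items()
}

for feature, by_donor in cv_profile.items():
    print(feature, by_donor)
```

In this toy data, protein_A varies little within each donor over time while protein_B fluctuates widely, which is the kind of contrast a CV profile is meant to surface.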

The Allen Institute team applied each module to the same datasets, and Skene led data creation and curation.

The research in Nature Communications focused on three cohorts of 50 subjects each, covering children aged 11 to 13 at the start of the study, young adults between 25 and 35 years old, and older adults from 55 to 65. Each donor provided six to 10 samples over several years, and Skene's team generated scRNA-seq data on about 20,000 cells using 10x Genomics' high-throughput technology.

"We tried to analyze the data longitudinally," said corresponding author Xiaojun Li, director of informatics and computational biology at the Allen Institute for Immunology. "I found there's actually not a good software package available to dive deeply into it."

For optimizing development of the "deep immune profiling pipeline," Li and colleagues conducted weekly blood draws and multiomic testing on a control group of six healthy participants for 10 weeks. That data led to the development of PALMO, which the Nature Communications paper ostensibly introduces.

PALMO actually first surfaced last year when the Allen Institute introduced the Human Immune System Explorer (HISE). It is one of two analytics methods included in the initial release of HISE, along with TEA-seq, an assay that measures transcriptomics, epitopes, and chromatin accessibility through scRNA-seq.

TEA-seq is not longitudinal, and thus not part of the recently published study.

Longitudinal omics data is "challenging to analyze due to its many intrinsic types of variations," according to the Allen Institute team.

The investigators noted that single-cell sequencing carries "complications" including dropout, sparseness, and interdependency between cells and unbalanced cell counts within samples.

For this study, they tested PALMO on their own datasets as well as six external ones, covering flow cytometry on peripheral blood mononuclear cells (PBMCs), bulk and single-cell RNA-seq for transcriptomics, mass spectrometry for proteomics, single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) for epigenomics, and the Olink Proximity Extension assay for single-cell proteomics.

They collected "stable across time in cell types" (STATIC) genes as potential biomarkers for cell types or diseases and eventually identified 220 "unique" genes, according to the paper.
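To make the idea of "stable across time" concrete, here is a minimal sketch of one plausible selection rule: keep a gene only if its expression varies little over time in every cell type examined. The article does not give the paper's actual selection criteria, so the cutoff, gene names, cell types, and expression values below are hypothetical, and the rule itself is an assumption rather than the authors' method.

```python
from statistics import mean, stdev

CV_CUTOFF = 10.0  # hypothetical threshold in percent, not taken from the paper

def cv_percent(values):
    """Coefficient of variation in percent; infinite when the mean is zero."""
    m = mean(values)
    return 100.0 * stdev(values) / m if m else float("inf")

# Hypothetical averaged expression: gene -> cell type -> mean expression per time point.
expression = {
    "GENE1": {"T cell": [50.0, 52.0, 49.0], "B cell": [5.0, 5.2, 4.9]},
    "GENE2": {"T cell": [30.0, 60.0, 10.0], "B cell": [8.0, 8.1, 7.9]},
}

def stable_genes(expr, cutoff=CV_CUTOFF):
    """Return genes whose CV across time is below the cutoff in every cell type."""
    return sorted(
        gene
        for gene, by_type in expr.items()
        if all(cv_percent(series) < cutoff for series in by_type.values())
    )

static_like = stable_genes(expression)
print(static_like)
```

Here GENE1 is steady over time in both cell types yet expressed at very different levels in each, which is why a gene of this kind could plausibly serve as a cell type marker; GENE2 is excluded because it swings widely in T cells.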

The researchers tested these genes in four ways. "In all four cases, the 220 STATIC genes separated major cell types and most of their subtypes very well, suggesting that some STATIC genes are potentially good markers for cell types," Li and colleagues wrote.

They further compared these genes against PBMCs of patients with the flu or COVID-19 from earlier studies, as well as the control group of healthy subjects, to find differentially expressed genes. They found that all 220 STATIC genes were "significantly enriched" as differentially expressed genes, which they said illustrates these genes' potential for disease monitoring.

"Longitudinal single-cell omics data is even more complicated than cross-sectional scRNA-seq data and may require new statistical methods to properly handle its many types of variation," the investigators wrote. They added that they struggled to benchmark the performance of PALMO because of "the lack of a well-accepted software package for longitudinal omics data."

There have been bespoke analytical methods developed for longitudinal bulk and single-cell data, producing what the Allen Institute investigators called "mixed performance." Examples include MAST, SCDE, Discrete Distributional Differential Expression (D3E), and Monocle.

Earlier, broadly distributed software includes the Icahn School of Medicine at Mount Sinai's VariancePartition, a Russian-developed tool called tcR, and popular RNA-seq analysis tool Seurat. In the Allen Institute article, PALMO outperformed VariancePartition, the researchers said, because the VDA module can work around missing data that is "almost inevitable" when dealing with longitudinal omics data.

The group also said that Seurat is not suitable for research involving more than two points in time because it works by contrasting data from two groups of differentially expressed genes. Seurat and the PALMO TCA module "generated rather different results on up- or down-regulated genes," with TCA showing "better dynamic changes over time," according to the paper.

Because the Allen Institute did not build PALMO specifically for T-cell receptor data, Li and colleagues were unable to say definitively that VDA is better than tcR.

Both VariancePartition and tcR "can be repurposed to examine longitudinal omics data either from a single perspective and/or collected on a single technical platform," according to the authors, but they said they were unaware of any widely used technology for analyzing longitudinal bulk and single-cell omics data.

Some are in development or testing, however.

Last year, a team from Cornell University published details of a statistical model that can jointly impute cell type composition and cell type-specific gene expression from bulk RNA-seq data using scRNA-seq data from glioblastoma patients as a reference.

Meanwhile, researchers at MD Anderson Cancer Center developed an algorithm to reduce the complexity of single-cell gene expression data that they said improved on Seurat.

More recently, researchers from the Stanford University Institute for Stem Cell Biology and Regenerative Medicine described CytoSPACE — for Cyto Spatial Positioning Analysis via Constrained Expression alignment — a method that goes beyond earlier techniques by mapping individual cells to a reference single-cell RNA-seq atlas.

Those methods all focused on cancer. Additional work on the 220 PALMO-identified genes that does not appear in the Nature Communications paper also looks at leukemia and other cancers. Li said that some future research will focus on this area as the Allen Institute evaluates whether this assay is suitable for wide-scale immune health profiling.

Because PALMO is a platform, the Allen Institute will be looking at building additional modules and applications, according to Li. The technology is available on GitHub, so the open-source programming community is welcome to develop other apps.

Skene said that the Allen Institute has profiled more than 40 million single human immune cells to date. He said he believes the promise of personalized medicine through scRNA-seq is growing as the cost comes down, thanks in part to multiplexed single-cell technologies.

"I think the identification of gene panels like this will ultimately lead to this being used in a clinical setting," he said.

Skene said that it would be a second-line test for situations like choosing cancer or autoimmune therapies because the turnaround time is still days rather than hours. "It would never be used in a clinical setting, for example, for the detection of antibiotic susceptibility testing. It's just not fast enough," he said.

Li envisions the test being ordered when bloodwork at an annual physical seems abnormal, for example.