NEW YORK – Immunologists at the Wellcome Sanger Institute in Cambridge, UK, have modified an emerging bioinformatics pipeline for single-cell adaptive immune receptor sequencing analysis to unlock new knowledge of the origins of human B1 cells and the development of innate lymphocyte/natural killer cells.
In a paper published in Nature Biotechnology this month, the Sanger team added a T-cell receptor-based "pseudotime trajectory analysis" method to the pipeline, which is called Dandelion.
Lead developer Kelvin Tuong created Dandelion initially for analyzing single-cell B-cell receptor sequences, which Sanger researchers discussed in a 2021 Nature Medicine paper that examined single-cell multiomics analysis of human immune response to COVID-19 infection.
In the new paper, Tuong, now leader of computational immunology at the Ian Frazer Centre for Children's Immunotherapy Research at the University of Queensland in Brisbane, Australia, expanded Dandelion to single-cell adaptive immune receptor sequencing analysis, which the authors call scVDJ-seq because it focuses on the recombination of the variable, diversity, and joining gene regions of B-cell and now T-cell receptors.
Co-corresponding author Tuong was formerly a computational immunologist in the Sanger Institute laboratory of Sarah Teichmann, principal leader of the Human Cell Atlas consortium. Teichmann and colleague Menna Clatworthy — of the Cambridge Institute of Therapeutic Immunology & Infectious Disease — are the other corresponding authors of the Nature Biotechnology paper.
Dandelion is "a holistic analysis framework for understanding single-cell lymphocyte biology," according to the paper.
In the newly published work, the authors created what they called an adaptive immune response "feature space" that supports both analysis of differential V(D)J usage and inference of "pseudotime trajectory." It is useful for mutation calling, annotation of gamma delta T cells, and analysis of both productive and nonproductive V(D)J contigs.
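As a rough illustration of what a V(D)J usage feature space might look like, the Python sketch below tabulates, for each group of cells, the fraction of cells using each V and J gene. It is a toy example and not Dandelion's implementation or API: the function name `vdj_feature_matrix`, the record fields, the cell-type labels, and the sample gene calls are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: one entry per cell, carrying a cell-type label
# (assumed to come from gene expression) and the V and J genes called for
# that cell's TCR contig. Labels and gene calls are illustrative only.
cells = [
    {"cell_type": "DN", "v_gene": "TRBV5-1", "j_gene": "TRBJ2-1"},
    {"cell_type": "DN", "v_gene": "TRBV5-1", "j_gene": "TRBJ1-1"},
    {"cell_type": "DP", "v_gene": "TRBV7-2", "j_gene": "TRBJ2-1"},
    {"cell_type": "DP", "v_gene": "TRBV5-1", "j_gene": "TRBJ2-1"},
    {"cell_type": "DP", "v_gene": "TRBV7-2", "j_gene": "TRBJ2-7"},
    {"cell_type": "DP", "v_gene": "TRBV7-2", "j_gene": "TRBJ2-1"},
]


def vdj_feature_matrix(records, gene_keys=("v_gene", "j_gene")):
    """Return (groups, genes, matrix) of per-group gene usage fractions."""
    # counts[group][segment] maps gene name -> number of cells using it.
    counts = defaultdict(lambda: {key: Counter() for key in gene_keys})
    for rec in records:
        for key in gene_keys:
            counts[rec["cell_type"]][key][rec[key]] += 1

    groups = sorted(counts)
    # Columns: all V genes (sorted), then all J genes (sorted).
    columns = [
        (key, gene)
        for key in gene_keys
        for gene in sorted({g for grp in counts.values() for g in grp[key]})
    ]

    matrix = []
    for group in groups:
        row = []
        for key, gene in columns:
            # Normalize within each segment type so V fractions sum to 1
            # and J fractions sum to 1 for every group.
            total = sum(counts[group][key].values())
            row.append(counts[group][key][gene] / total)
        matrix.append(row)
    return groups, [gene for _, gene in columns], matrix


groups, genes, matrix = vdj_feature_matrix(cells)
for group, row in zip(groups, matrix):
    print(group, dict(zip(genes, row)))
```

Each row is a group of cells and each column a receptor gene, so rows can be compared to look for differential usage. Ordering cells along a pseudotime trajectory would require further steps that this sketch does not attempt.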
The researchers said that Dandelion improves contig annotation enough that they discovered multiple J genes can map onto different regions of a single messenger RNA contig.
"[T]he unexpected finding of expression of nonproductive TCR contigs in specific cell types has the potential to shed new light on lymphocyte development," the Sanger immunologists concluded.
Single-cell RNA sequencing is booming, and with the growth has come the emergence of many new sequencing methods and bioinformatics tools, covering such areas as isoform analysis, transcriptomic profiling and alignment, and longitudinal analysis.
Tuong said that scRNA-seq analysis is new enough that there are still many unintegrated tools for different applications such as gene expression, chromatin accessibility, and immune repertoires because data structures vary widely and many tools were built for bulk analysis. "They don't collapse very well down to the single cell, and that's what we had to resolve," Tuong said.
"There's not a lot of efforts in this space to do joint analysis, partly because maybe there's not many immunologists that work on this kind of data together," Tuong said. "There's this gap in the middle [between RNA-seq and immune repertoire] that we decided we could plug in."
Dandelion is optimized for analysis of 10x Genomics Cell Ranger VDJ output files and is meant to support both multiomic and multimodal analysis. Tuong said that Dandelion could theoretically handle longitudinal analysis of multiomics data like the Allen Institute's Platform for Analyzing Longitudinal Multi-Omics Data (PALMO), though that is not part of his current research.
Reanalysis and annotation of AIR data has largely been standardized by the Adaptive Immune Receptor Repertoire (AIRR) community of the Antibody Society. Software that follows AIRR standards includes Scirpy for single-cell immune receptor repertoire analysis, scRepertoire for single-cell immune profiling, and TcellMatch for predicting antigen specificity of T-cell receptors.
However, the Sanger team wrote, "There remain opportunities for new methods to realize the full potential of paired scRNA-seq and scVDJ-seq data."
This new research builds on earlier work led by Clatworthy on B-cell receptors in mouse cells. Tuong decided in 2019 to write the software that became Dandelion to analyze human cells, and development accelerated in the early part of the COVID-19 pandemic so Tuong could apply the Dandelion pipeline to a coronavirus dataset.
The 2021 Nature Medicine paper described a version of Dandelion that "was just a way to quickly look at data," Tuong said.
Teichmann and colleagues last year published a series of papers in Science about the Human Cell Atlas. Chenqu Suo, the lead author of one paper on fetal cells, had excess data that did not make it into that series, so she and Tuong got together to apply Dandelion to that data, according to Tuong. Suo is also lead author of the new Nature Biotechnology article.
Suo wanted to understand how T cells and NK cells develop in a virus, while Tuong said that he wanted to improve multimodal integration for studying immune repertoires, which led to this new work.
"Our main innovation in this current paper is our creation of what we call embedding," Tuong explained. Dandelion creates a matrix of values corresponding to different types of cells, specifically zooming in on immune repertoire genes.
"We anchored it to the gene expression space," Tuong said. "Does the immune repertoire follow or dictate any processes?"
The current workflow is optimized for T cells, but Tuong now wants to adapt Dandelion for B cells and eventually apply the technology to pediatrics to support research at the Frazer Centre. He is particularly interested in whether better understanding of immune repertoires might lead to targeted treatments of autoimmune diseases and cancer in children.
Tuong said that Dandelion, which is freely available on GitHub, was downloaded about 3,000 times in the first few days after the paper's release.