NEW YORK – A team led by scientists at Stanford University has found a way to diagnose multiple infectious and autoimmune diseases using T-cell and B-cell receptor sequencing data and machine learning.
T cells and B cells recognize pathogens or antigens involved in autoimmune disorders via cell surface receptors called T-cell receptors (TCRs) and B-cell receptors (BCRs).
Unique TCR and BCR repertoires are generated through V(D)J recombination. When one of these receptors successfully binds to a pathogen, it triggers rapid clonal expansion of its cell.
"This variability helps the immune system detect virtually anything, but makes it much harder for us to interpret what the immune system is targeting," said Maxim Zaslavsky, a postdoctoral researcher at Stanford and lead author of a study published in Science this week.
In the study, he and his colleagues demonstrated that immune receptor sequencing, aided by machine learning, could be used to distinguish a range of diseases without prior knowledge of antigen-specific receptor patterns.
The researchers hypothesized that decoding these sequences could prove a more accurate and efficient way to diagnose disease than existing methods, particularly in the case of autoimmune disorders, whose diagnosis often requires multiple lab tests coupled with imaging and other clinical information.
"We can sequence thousands, even millions of immune cells' antigen receptors in our bodies," Zaslavsky said, "each able to bind to a different target, but interpreting this complex and unique dataset from each person has remained the key challenge."
To address that challenge, he and his team developed and validated a computational method called Machine Learning for Immunological Diagnosis (Mal-ID).
Immune cell receptor sequencing for disease diagnosis is not new. It is already used, for instance, to aid in the diagnosis of cancers such as lymphomas, where BCRs and TCRs can serve as biomarkers for cancer cells. Zaslavsky and his colleagues pointed out, however, that few studies to date have investigated integrating TCR and BCR data for diagnostic purposes.
They trained Mal-ID using three separate TCR and BCR models, combining their output into a final immune status prediction model. Each of the three models focused on a different aspect of T- and B-cell biology. Together, they combined traditional immunological analyses, such as identifying sequences shared between individuals with the same condition, with more complex features derived from protein language models, a newer artificial intelligence (AI) approach based on protein sequences.
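The idea of merging several model outputs into one prediction can be illustrated with a minimal Python sketch. It is not the Mal-ID implementation: the model names, the choice of three models per receptor type, the probability values, and the use of a plain weighted average in place of a trained final model are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Immune states named in the article; the ordering here is arbitrary.
CLASSES = ["COVID-19", "HIV", "lupus", "type 1 diabetes", "flu vaccination", "healthy"]


def combine_predictions(
    base_outputs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Merge per-model class probabilities into one probability vector.

    Assumption: a weighted average stands in for the final prediction
    model, whose actual form the article does not describe.
    """
    if weights is None:
        weights = {name: 1.0 for name in base_outputs}
    total_weight = sum(weights[name] for name in base_outputs)
    combined = [0.0] * len(CLASSES)
    for name, probs in base_outputs.items():
        if len(probs) != len(CLASSES):
            raise ValueError(f"{name}: expected {len(CLASSES)} probabilities")
        for i, p in enumerate(probs):
            combined[i] += weights[name] * p / total_weight
    return combined


def predict_label(combined: List[float]) -> str:
    """Return the class with the highest combined probability."""
    best_index = max(range(len(combined)), key=combined.__getitem__)
    return CLASSES[best_index]


# Hypothetical outputs for one blood sample: three BCR-based and three
# TCR-based models, each emitting a probability per immune state.
EXAMPLE_OUTPUTS = {
    "bcr_model_a": [0.10, 0.05, 0.60, 0.10, 0.05, 0.10],
    "bcr_model_b": [0.05, 0.05, 0.70, 0.05, 0.05, 0.10],
    "bcr_model_c": [0.20, 0.10, 0.40, 0.10, 0.10, 0.10],
    "tcr_model_a": [0.10, 0.10, 0.30, 0.30, 0.10, 0.10],
    "tcr_model_b": [0.15, 0.05, 0.35, 0.25, 0.10, 0.10],
    "tcr_model_c": [0.10, 0.10, 0.45, 0.15, 0.10, 0.10],
}

if __name__ == "__main__":
    combined = combine_predictions(EXAMPLE_OUTPUTS)
    print(predict_label(combined), [round(p, 3) for p in combined])
```

In this made-up sample the BCR-based models favor lupus more strongly than the TCR-based ones, and the averaged vector still selects lupus. A trained final model could instead learn how much to trust each base model for each disease.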
The Stanford team tested Mal-ID on peripheral blood samples from 593 individuals, comprising 16.2 million BCR heavy chain clones and 23.5 million TCR beta chain clones. Mal-ID correctly identified the immune status of 542 samples from patients with COVID-19, HIV, lupus, type 1 diabetes, recent flu vaccination, and healthy controls.
The team noted that in the case of disorders known to be mediated more by one immune cell type than another, such as T cells in the case of type 1 diabetes, the models specific to those cells outperformed models specific to the other cells. In all cases, however, the models combining both cell types performed the best.
"Previous studies often focused on one or the other cell type, but we saw clear evidence that combining both gives a fuller picture of immune activity," said Scott Boyd, professor of pathology at Stanford and the study's senior author.
Importantly, the team saw little evidence of batch effects and only limited impact from the age, sex, and race of the individuals studied.
As a measure of overall accuracy, Mal-ID achieved an area under the receiver operating characteristic curve (AUROC) of nearly 0.99. AUROC is a widely used measure of diagnostic accuracy.
Zaslavsky said the metric reflects overall sensitivity and specificity over a range of decision thresholds for disease identification. "The threshold to be used will depend on the clinical context, such as which disease we are focusing on, its prevalence, and the importance of minimizing false positives or false negatives," he said.
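The relationship between AUROC and a chosen decision threshold can be shown with a short Python sketch. It assumes a simplified two-class setting (disease versus no disease) with made-up scores; the study itself distinguishes several conditions, and the article does not say how its AUROC was aggregated across them.

```python
from typing import List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve for a binary problem.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for label, s in zip(labels, scores) if label == 1 and s >= threshold)
    fn = sum(1 for label, s in zip(labels, scores) if label == 1 and s < threshold)
    tn = sum(1 for label, s in zip(labels, scores) if label == 0 and s < threshold)
    fp = sum(1 for label, s in zip(labels, scores) if label == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 1 = has the disease, 0 = does not.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUROC:", auroc(LABELS, SCORES))
    for threshold in (0.35, 0.5, 0.65):
        sens, spec = sensitivity_specificity(LABELS, SCORES, threshold)
        print(f"threshold={threshold}: sensitivity={sens}, specificity={spec}")
```

With these toy numbers the AUROC stays fixed while the threshold trades one error type for the other: lowering it to 0.35 catches every positive at the cost of one false positive, and raising it to 0.65 removes the false positive but misses one true case.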
Boyd attributed the accuracy at least in part to having conducted careful controls to ensure that observed immune signatures can be generalized to new datasets, and to the models being primarily driven by immune receptor sequence signatures rather than confounders. Nonetheless, he said his team plans to conduct further studies to confirm the results, replicate them in other cohorts, and integrate them with other diagnostic tools.
Federico Giorgi, an associate professor of pharmacy and biotechnology and head of the computational genetics lab at the University of Bologna, who was not involved in the study, called Mal-ID "a perfect application of old school multivariable regression on a huge dataset of immunological disorders with an innovative AI-influenced addition – protein language models – to implement structural features."
One potential concern is that the protein language models used to derive more complex features from the biological samples could prove too computationally bulky for some labs, he said.
"My fear with AI is that its execution is so computationally intensive that it will be impossible to perform without immense budgets, therefore excluding smaller labs from the race for future discoveries," Giorgi said.
Zaslavsky acknowledged the concern but said that such obstacles are likely to be overcome by technological advances elsewhere. "While our study did require a GPU for the analysis, access to GPUs is becoming more widespread, especially as large language models have exploded in popularity," he said. "Platforms like Google Colab offer free GPU access for small-scale projects."
Stanford has submitted a patent application covering Mal-ID, but Zaslavsky said that any commercial application remains far off.
"We want to emphasize that what we have today is a proof of concept showing there’s information we can extract from immune receptors, but more validation is needed before this can be used in the clinic," he said, adding that even in a clinical setting, this would likely not become a standalone test.
One key remaining question is how and to what degree comorbidities affect Mal-ID's accuracy.
"Many autoimmune disease patients have not one but several autoimmune conditions with overlapping symptoms," Zaslavsky said. "Directly measuring the immune cells that may be driving the diseases could give us deeper insight into the underlying mechanisms of disease."
Boyd added that his lab is interested in the possibility of applying the technique to cancer diagnosis and prognosis, such as evaluating which cancer patients respond to immune therapies.
Such an application would in some ways bring the Mal-ID method full circle, as it builds on work done to develop personalized cancer therapies. "Next-generation genomic sequencing allowed us to understand some types of cancer well enough to guide patients to targeted treatments," Boyd said. "These sequencing advances have only started to trickle into autoimmune disease and other fields of medicine where diagnostic challenges remain."