NEW YORK – A team of collaborators from the University of Oxford and the Wellcome Sanger Institute, among others, has developed a sepsis risk stratification method relying on a combination of gene expression testing and machine learning.
The method, described in a paper published in Science Translational Medicine last week, is based on data from patients hospitalized with sepsis. The researchers focused on a subset of genes whose expression reflects a patient's immune response to severe infection and correlates with immune dysfunction.
Those genes were used to distinguish between subsets of patients that have "very different risks," said Eddie Cano-Gamez, a postdoctoral researcher at the Wellcome Centre for Human Genetics at Oxford and lead author on the paper.
The SepstratifieR algorithm, developed by Cano-Gamez and described in the Science Translational Medicine study, sorts patients into two groups: one at high risk of developing severe disease and one at lower risk. It also provides a quantitative sepsis response signature score (SRSq), a continuous measure of a patient's risk that can show which risk group a patient who falls between the two categories is closer to. The score ranges from zero to one, and the closer a patient's score is to zero, the more their immune response resembles that of a healthy person, he said.
The risk groups and one of the signatures used to build the algorithm were developed by screening several thousand genes using a microarray and narrowing the field to seven, said Emma Davenport, a group leader at the Wellcome Sanger Institute and another author on the Science Translational Medicine paper. That work was described in earlier papers, including a 2016 paper in The Lancet Respiratory Medicine coauthored by Davenport.
The researchers added 12 genes to the signature to make it more "robust," she said, and used both the seven-gene and 19-gene signatures to develop the machine learning models and the SRSq included in the algorithm. According to the paper, the algorithm extracts the expression values of the signature genes, aligns the samples to the sepsis reference maps created by the machine learning models, and predicts each sample's group and SRSq.
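The three steps the paper describes (extract signature genes, align to a reference, predict group and score) can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration and is not SepstratifieR's actual code: the gene names are placeholders rather than the published seven- or 19-gene signatures, the "alignment" is a simple per-gene mean shift standing in for the real reference-mapping step, a linear least-squares fit stands in for the machine learning models, and the risk group comes from a hypothetical 0.5 cutoff on the score.

```python
import numpy as np

# Placeholder signature genes (hypothetical; not the published gene lists).
SIGNATURE = ["GENE_A", "GENE_B"]


def extract_signature(expr, genes, signature=SIGNATURE):
    """Step 1: keep only the signature genes (expr is samples x genes)."""
    idx = [genes.index(g) for g in signature]
    return np.asarray(expr, dtype=float)[:, idx]


def align_to_reference(x, ref):
    """Step 2 (toy stand-in): shift the batch so each gene's mean matches the
    reference mean. This needs several samples to estimate the shift and
    assumes the batch has a patient mix similar to the reference."""
    return x - x.mean(axis=0) + ref.mean(axis=0)


def fit_score_model(ref, ref_scores):
    """Fit a linear map from signature expression to reference 0-1 scores
    (a stand-in for the machine learning models)."""
    design = np.column_stack([ref, np.ones(len(ref))])
    coef, *_ = np.linalg.lstsq(design, ref_scores, rcond=None)
    return coef


def predict(x, coef, cutoff=0.5):
    """Step 3: continuous score clipped to [0, 1] (0 = healthy-like) plus a
    group label from a hypothetical cutoff."""
    design = np.column_stack([x, np.ones(len(x))])
    score = np.clip(design @ coef, 0.0, 1.0)
    group = np.where(score >= cutoff, "high risk", "lower risk")
    return score, group


# Toy reference map and a new two-sample batch (made-up numbers).
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ref_scores = np.array([0.0, 0.5, 0.5, 1.0])
genes = ["GENE_X", "GENE_A", "GENE_B"]
expr = np.array([[7.0, 2.0, 2.0], [9.0, 3.0, 3.0]])

coef = fit_score_model(ref, ref_scores)
aligned = align_to_reference(extract_signature(expr, genes), ref)
score, group = predict(aligned, coef)
print(score.round(2), group.tolist())
```

In this toy version, the alignment step estimates each gene's batch mean from the samples at hand, so with very few samples that estimate becomes noisy. It also assumes the new batch resembles the reference overall; a batch made up entirely of high-risk patients would be wrongly shifted toward the middle.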
The study found that SepstratifieR was accurate with sample sizes as small as 20 but became unreliable with fewer samples. As a result, the researchers developed an additional classification approach that predicts risk for each sample independently and included it in the algorithm as a secondary function. "Although predictive accuracy was reduced for this approach, we observed an overall agreement between predictions derived from both methods," the researchers wrote. "In particular, samples at the extremes of the [sepsis risk signature] continuum were reliably identified by both algorithms."
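The article does not say how the per-sample function works, so the following is only a hypothetical illustration of how a classifier can score one sample without any batch. Each sample is centered on its own mean across the signature genes, which removes a sample-wide offset using no other samples, and is then assigned to the nearest reference group centroid. Nearest-centroid classification is a swapped-in technique chosen for brevity, not the method the researchers used, and the profiles and gene count are made up.

```python
import numpy as np


def center_within_sample(x):
    """Subtract each sample's own mean across signature genes.
    Uses no information from other samples."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return x - x.mean(axis=1, keepdims=True)


def fit_centroids(ref, labels):
    """Average the centered reference profiles of each risk group."""
    ref_c = center_within_sample(ref)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.array([ref_c[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def classify_one(sample, classes, centroids):
    """Assign a single sample to the group with the nearest centroid."""
    s = center_within_sample(sample)[0]
    distances = np.linalg.norm(centroids - s, axis=1)
    return classes[int(np.argmin(distances))]


# Toy reference profiles (made-up numbers, three placeholder genes).
ref = [[3, 1, 1], [4, 2, 2], [1, 1, 3], [2, 2, 4]]
labels = ["high risk", "high risk", "lower risk", "lower risk"]
classes, centroids = fit_centroids(ref, labels)

# A sample shifted by a constant offset is still classified on its own.
print(classify_one([13, 11, 11], classes, centroids))
```

The design trade-off is that within-sample centering throws away each sample's overall expression level, which is one plausible way a per-sample method can lose some accuracy relative to one that sees a whole batch.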
The new study also confirmed that the seven- and 19-gene signatures can be accurately measured with both RNA sequencing and qPCR, and Cano-Gamez noted that the ability to use the algorithm with PCR could make it easier to implement clinically, since hospitals already perform PCR testing routinely.
The original cohort of patient samples was gathered when patients were admitted to an intensive care unit, but the researchers are also looking at data from other time points, Cano-Gamez said. Some additional samples were collected from patients after admission to the ICU, and he said the scores change over time as patients recover and their risk decreases. The algorithm could be used in combination with vital signs as a measurement of risk throughout a patient's stay in the hospital, he noted.
The main challenge for clinical implementation is turnaround time, which needs to be short for patients in the emergency room or ICU, Davenport said. As gene expression technology advances, she said, it will likely become easier to obtain gene measurements quickly, making clinical use more feasible. The team hopes the method will be available for clinical use within the next five to 10 years, she said, noting that it could be used much like other blood tests and run in a hospital laboratory.
Cano-Gamez said that using the algorithm clinically takes slightly longer because there is only one sample, and thus one data point, so separating out background noise is harder: that sample must be compared with other samples already in the system. In a research setting, where many samples are analyzed together, background noise is easier to identify because it is present in all of them.
He added that the measurements remain the same regardless of a patient's age, sex, or the source of the infection, but noted that the researchers haven't thoroughly explored the effect of ethnicity on the measurements. However, the algorithm is publicly available, so other researchers could apply it to broader datasets with other variables, he said.
While the initial gene expression research focused on sepsis, the newest paper describes how the researchers applied the algorithm to other infections, such as COVID-19 and H1N1 influenza, and found that it works for both. The researchers have also used the method in pediatric sepsis and plan to expand to other sources of infection, as well as other reasons for hospital admission such as trauma, since parts of the immune system respond to trauma as if it were an infection, Cano-Gamez said. The team is also looking to apply the algorithm to samples taken slightly earlier in the admission process, such as when someone arrives at the emergency room.
Davenport noted that the method could also be used in clinical trials, hopefully within the next five years. Once gene expression data have been collected, it could be used to sort patients into groups based on which treatment is likely to work best, an application some collaborators are already working on.