Machine Learning Helps ID Genes, Cell Types Contributing to Severe COVID-19

NEW YORK – With the help of machine learning, a team from the US, the UK, Italy, and the Netherlands has narrowed in on genes and cell types that appear to contribute to COVID-19 severity in relatively young individuals.

"Altogether, our study unravels a genomic landscape of COVID-19 severity and provides a better understanding of the disease pathogenesis, with potential for new prevention strategies and therapeutic targets," co-senior author Michael Snyder, chair of genetics at Stanford Medicine, and his colleagues wrote in Cell Systems.

The researchers identified more than 1,000 genes involved in the risk of severe COVID-19, a collection that seems to account for an estimated 77 percent of the genetic heritability of that risk. They also linked severe COVID-19 risk to certain cell types, including the immune system's natural killer (NK) and T cells. Severe COVID-19 has also been linked to advanced age, higher body mass index, prior health conditions such as diabetes, and socioeconomic risk factors.

Using a machine learning method called RefMap, the team analyzed genome-wide association study (GWAS) data for 5,101 COVID-19 cases and nearly 1.4 million controls of European ancestry from the COVID-19 Host Genetics Initiative. They then brought in single-nucleus RNA sequencing, single-nucleus ATAC-seq, and other omics profiles from a prior study of human lung tissue and related cell types to see how severity-associated variants relate to gene regulation across cell types.

"For the first time we combined single-cell data that map regulatory regions in different cell types with GWAS data and machine learning to identify both the genes and cell types responsible for COVID severity in younger folks," Snyder explained in an email.

The results were further shored up by validation analyses based on summary statistics from a large 23andMe COVID-19 GWAS that included more than 15,000 cases and more than 1 million SARS-CoV-2 infection-free controls. The researchers also analyzed relationships between the cell types expressing genes implicated by the RefMap machine learning analysis, providing a glimpse at cell-cell interactions suspected of contributing to more severe forms of SARS-CoV-2 infection in individuals under 60 years old.

All told, the investigators highlighted 1,370 genes with apparent regulatory ties to genetic variants associated with severe COVID-19 across multiple cell types. They noted, for example, that rare and common variants falling in RefMap regions implicated in severe COVID-19 were particularly common in hematopoietic cells relative to epithelial cells, as well as in T cells and a subset of NK cells.

"The specific type of NK cells are involved in cytokine production," Snyder wrote. "Our results suggest the genes we identify are not as well expressed in affected individuals and thus we speculate that susceptible patients might be weakened in their cellular responses."

Although the study's authors cautioned that "genetic discovery data is largely focused on European ancestry, which may limit widespread applicability," they noted that the findings may ultimately yield strategies for boosting NK cell activity in COVID-19 patients or for genetically predicting the risk of severe COVID-19 in adults under 60 who are less likely to have immune responses that are dampened with age.

"Our findings lay the foundation for a genetic test that can predict who is born with an increased risk for severe COVID-19," Snyder said in a statement.