
Coronavirus Pathogenicity Clues Uncovered Using Machine-Learning Approach

NEW YORK – A team from the National Library of Medicine, Broad Institute, and Massachusetts Institute of Technology has started tallying the genetic features that distinguish pathogenic coronaviruses (particularly SARS-CoV-2, the virus behind the ongoing COVID-19 pandemic, and MERS-CoV, which causes Middle East respiratory syndrome) from less dangerous coronaviruses.

"We were able to identify several features that are not found in less virulent coronaviruses and that could be relevant for pathogenicity in humans. The actual demonstration of the relevance of these findings will come from direct experiments that are currently getting under way," senior author Eugene Koonin, a researcher at the National Library of Medicine's National Center for Biotechnology Information, said in a statement.

For a paper published in the Proceedings of the National Academy of Sciences on Wednesday, the researchers used comparative genomics, phylogenetic analyses, and support vector machine (SVM) classifiers to home in on suspicious features shared by SARS-CoV-2 and MERS-CoV, which they classified as "high case fatality rate" (high-CFR) coronaviruses. They noted that the machine-learning strategy they chose could pick up differences between these high-CFR viruses and "low-CFR" human coronaviruses that genome alignment-based comparisons alone might miss.

"[W]e trained multiple support vector machines across a sliding window to detect regions that confer clean separation between high- and low-CFR virus genomes," the authors explained. "We evaluated the performance of each [support vector machine] via cross-validation and filtered for genomic regions that significantly distinguish the high- and low-CFR genomes."
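The sliding-window idea the authors describe can be illustrated with a toy Python sketch. This is not the study's code. It assumes pre-aligned, equal-length sequences, one-hot encodes each window, trains a plain linear SVM by subgradient descent on the hinge loss (in place of whatever SVM implementation the team actually used), scores each window by k-fold cross-validated accuracy, and keeps windows above a hypothetical accuracy cutoff as a stand-in for the paper's significance filter. All function names, parameters, sequences, and labels below are made up for illustration.

```python
# Toy sketch of sliding-window SVM scoring (hypothetical; not the study's code).
# Assumes equal-length, pre-aligned sequences and a two-class labeling
# (+1 = high-CFR, -1 = low-CFR).

ALPHABET = "ACGT-"  # assumed symbol set; other characters encode as all zeros


def one_hot(seq):
    """Flatten a sequence into len(seq) * len(ALPHABET) indicator features."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def cv_accuracy(X, y, k=3):
    """Fraction of samples classified correctly when held out (fold = index % k)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        w, b = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, b, X[i]) == y[i] for i in test)
    return correct / len(X)


def window_scores(seqs, labels, width, step=1, k=3):
    """Train and cross-validate one SVM per window; return (start, accuracy) pairs."""
    scores = []
    for start in range(0, len(seqs[0]) - width + 1, step):
        X = [one_hot(s[start:start + width]) for s in seqs]
        scores.append((start, cv_accuracy(X, labels, k)))
    return scores


def distinguishing_windows(scores, threshold=0.9):
    """Keep windows at or above an accuracy cutoff (a stand-in for a significance test)."""
    return [start for start, acc in scores if acc >= threshold]


if __name__ == "__main__":
    labels = [1, -1, 1, -1, 1, -1]
    # Made-up sequences: only positions 4-7 differ between the two classes.
    seqs = ["ACGTGGGGTTCA" if y == 1 else "ACGTAAAATTCA" for y in labels]
    scores = window_scores(seqs, labels, width=4, step=4, k=3)
    print(scores)                          # [(0, 0.5), (4, 1.0), (8, 0.5)]
    print(distinguishing_windows(scores))  # [4]
```

In this toy run, windows where every sequence is identical can do no better than chance (0.5), while the window covering the class-specific positions separates the two groups perfectly under cross-validation. That is the kind of contrast the sliding-window scan is meant to surface, although real genomes would call for a proper SVM library, stratified folds, and a statistical test rather than a fixed cutoff.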

Based on analyses of more than 900 available coronavirus genomes, the team uncovered 11 genomic sites that appear to set the high-CFR SARS-CoV-2 and MERS-CoV genomes apart, including sequences coding for the nucleocapsid protein and for the spike glycoprotein that interacts with host cell receptors.

When they took a closer look at these changes, the researchers saw signs that the high-CFR viruses produce a version of the nucleocapsid protein with an enhanced nuclear localization signal (NLS), while the spike proteins of the potentially deadly SARS and MERS coronaviruses carry insertions not found in milder, low-CFR coronaviruses.

"The enhancement of the NLS in the high-CFR coronaviruses nucleocapsids implies an important role of the sub-cellular localization of the nucleocapsid protein in coronavirus pathogenicity," the authors suggested, adding that "insertions in the spike protein appear to have been acquired independently by the SARS and MERS clades of the high-CFR coronaviruses, in both the domain involved in virus-cell fusion and the domain mediating receptor recognition."

While functional studies are needed to dig into the potential connections identified in their new analysis, the authors suggested that the features found so far "could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions."

"These features correlate with the high fatality rate of these coronaviruses as well as their ability to switch hosts from animals to humans," Koonin and co-authors explained. "The identified features could represent crucial elements of coronavirus virulence and allow for detecting animal coronaviruses that have the potential to make the jump to humans in the future."
