
NEW YORK – An Italian research team has identified multiple genetic variants that may play a role in determining the severity of COVID-19, the disease caused by the virus SARS-CoV-2.
The data come from an ongoing study called GEN-COVID that involves 35 hospitals across Italy. The researchers presented their findings during the European Society of Human Genetics virtual annual conference, which was webcast this week.
Alessandra Renieri, director of the medical genetics unit at the University Hospital of Siena, is leading the project and provided an overview of the study during her talk. According to Renieri, the GEN-COVID project aims to develop a patient registry and biobank for studying COVID-19. It is focused on understanding the genetic and molecular basis of susceptibility to the virus and on characterizing patients' genetic profiles to guide future therapy decisions.
With these goals in mind, the GEN-COVID project began collecting samples from around Italy on March 16. The researchers collected phenotypic information on patients via questionnaires containing 160 clinical items and generated whole-exome sequencing data on the Illumina NovaSeq 6000 instrument at a mean depth of 200x.
"That way we would be able to capture not only heterozygous and homozygous germline mutations, but also somatic variants and multicopy genes," said Renieri.
A pilot phase involved 35 cases and 150 controls from hospitals in Tuscany. The researchers first set out to test the hypothesis that susceptibility was due to common factors, but a search for common genes that might influence clinical outcome failed to give statistically significant results, with the exception of two genes, OR4C5 and ZNF717. The authors reported the results of the pilot phase in a medRxiv preprint last month.
This result was not unexpected, though. Taking autism spectrum disorder research as a model, the researchers reanalyzed the cohort to look for rarer mutations that may affect clinical outcome. Using this Mendelian-like approach, they identified about three pathogenic mutations per patient that are involved in susceptibility to viral infection.
"We identified rare variants relevant for infection in many patients," said Renieri, noting that many were also reported as pathogenic in the database ClinVar. "Whole-exome sequencing showed that a combination of common and rare, or even private variants, is responsible for COVID-19 susceptibility and severity," she added.
The investigators then conducted a first validation phase that included 131 cases and 250 controls drawn from a network of 35 Italian hospitals and clinics. A variety of machine-learning tools yielded a larger cache of genes of interest. Renieri underscored the involvement of the Siena Artificial Intelligence Lab, which provided a host of machine-learning tools and algorithms for the effort, including logistic regression, Extra Trees, linear support vector machines, neural networks, and a tool called RuleFit.
"New mathematical models and machine learning approaches are necessary to untangle the multisystem presentation of COVID-19," said Renieri. Still, the researchers felt that different variations in different individuals were more likely to underpin disease susceptibility, and using other machine-learning tools they determined that rare variants in genes such as JAK2 and CCDC114 were present in some cases.
Using another tool, semantic-based regularization for learning and inference, which was developed by partners at Siena, they were able to identify a number of genes that were specifically relevant to different organs and to how individual organs responded to SARS-CoV-2.
A second validation phase to follow up on these discoveries is now underway, Renieri said. It will involve 2,000 cases and 3,000 controls. She said about 500 cases in the second validation phase have been run to date and that the researchers expect to have collected the remaining 1,500 case samples by August.
In addition to looking at common and rare variants and how they might affect disease susceptibility and severity, Renieri noted that the researchers will also look at sex in the second validation phase, as men are typically considered to experience more severe reactions to the virus.
"We are analyzing the cohort as a whole, and the next step will be to divide it by sex, to see if the results are the same," Renieri said. "We expect some differences, of course."
Renieri added that the outcome of GEN-COVID could have implications for healthcare decisions, including the possible repurposing of medicines for treating COVID-19, as well as for drug development efforts. She also said that the COVID-19 biobank developed via the project will be made accessible to academic and industry partners.