CHICAGO – Bioinformaticians at the Free University of Brussels have developed a system that predicts gene variant combinations that are pathogenic when present in pairs, giving researchers and, potentially, clinicians and genetic counselors, a computerized way to identify oligogenic diseases.
The platform, called the Oligogenic Resource for Variant Analysis, or ORVAL, combines machine learning and visualization tools to predict the pathogenicity of oligogenic gene combinations. It analyzes networks of pathogenic genes and protein interactions, ranks pathogenic gene pairs, and maps cellular locations and mutation pathways.
"While numerous bioinformatics tools exist that allow the discovery of causal variants in Mendelian diseases, little to no support is provided to do the same for variant combinations, an essential task for the discovery of the causes of oligogenic diseases," the Free University team explained in a recent article in Nucleic Acids Research.
"By identifying and supporting with biological evidence combinations of variants, needing further validation, this tool aims to become an important agent in exploring the relevant molecular and biological patterns underlying oligogenic diseases," the paper said.
ORVAL is a web-based platform that helps clinicians and researchers predict the pathogenicity of oligogenic variant combinations when searching for biomarkers and investigating hard-to-diagnose diseases, according to Alexandre Renaux, a PhD student in bioinformatics at the Free University. Renaux presented the work late last month at the Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB) conference in Basel, Switzerland.
Users submit genetic variant data either manually or from VCF files, plus desired data filters. The system then generates and annotates variant combinations, and predicts and ranks combinations that might be associated with a disease.
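As a rough illustration of the "generate and rank combinations" step, the sketch below pairs up variants that fall in two different genes and sorts the pairs by a score. It is not ORVAL's code: the `Variant` fields, the function names, and the scoring callback are all hypothetical, and the real platform also applies filters and annotations that are omitted here.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (fields loosely mirror a VCF row)."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


Pair = Tuple[Variant, Variant]


def bilocus_combinations(variants: Iterable[Variant]) -> Iterator[Pair]:
    """Yield every pair of variants located in two different genes."""
    for a, b in combinations(list(variants), 2):
        if a.gene != b.gene:
            yield (a, b)


def rank_combinations(pairs: Iterable[Pair],
                      score: Callable[[Pair], float]) -> List[Pair]:
    """Sort pairs from highest to lowest score.

    `score` stands in for a trained predictor; it is an assumption here.
    """
    return sorted(pairs, key=score, reverse=True)
```

With two variants in one gene and one variant in another, this produces two cross-gene pairs; any callable that maps a pair to a number can then be passed to `rank_combinations`.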
Techniques like genome-wide association studies start from a "monogenic base," potentially making them unsuitable for finding oligogenic causes of disease, according to the Nucleic Acids Research article and the poster Renaux presented. "This problem may be overcome when predictive methods immediately identify the cross-gene pathogenic associations as opposed to first identifying single genes and then trying to link them using alternative sources of information," the article said.
Renaux explained that oligogenic prediction has become feasible only in the last few years, once enough scientific literature on digenic and bilocus cases had accumulated to support the Digenic Diseases Database (DIDA), which others at the Free University of Brussels and the related Interuniversity Institute of Bioinformatics created in 2015.
DIDA led the university to develop the Variant Combination Pathogenicity Predictor (VarCoPP), a computational method to predict bilocus variant combinations that may indicate specific diseases.
The VarCoPP machine learning method is described in an article that appeared in May in the Proceedings of the National Academy of Sciences. After training and testing VarCoPP with data from the 1000 Genomes Project and pathogenic variant combinations previously reported in DIDA, the Brussels team validated the tool on almost two dozen pathogenic variant combinations reported in studies that came out after the latest DIDA update.
With the development of ORVAL, VarCoPP has been folded into the new platform.
The ORVAL platform predicts "edges," according to Renaux. "It's a way to see if there are synergies between the variants inside the genes and then aggregate them," he said.
Renaux noted that there can be several variant combinations between two genes, each carrying different mutation burdens. VarCoPP generates a prediction score that estimates the probability that a variant combination causes a disease.
"The link between digenic prediction and this is to say that together we have several synergies happening, and then you will get this kind of leverage [in making a decision]," Renaux said.
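The idea of aggregating several variant-combination predictions into one gene-pair "edge" can be sketched as follows. The article does not say how ORVAL performs this aggregation, so the summary statistics chosen here (count, number above a cutoff, maximum score), the 0.5 cutoff, and the function name are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_gene_pairs(
    scored: Iterable[Tuple[str, str, float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group (gene_a, gene_b, score) records into one summary per gene pair."""
    by_pair: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for gene_a, gene_b, score in scored:
        # Sort the gene names so (A, B) and (B, A) land on the same edge.
        key = (gene_a, gene_b) if gene_a <= gene_b else (gene_b, gene_a)
        by_pair[key].append(score)

    return {
        pair: {
            "n_combinations": len(scores),
            "n_above_threshold": sum(s >= threshold for s in scores),
            "max_score": max(scores),
        }
        for pair, scores in by_pair.items()
    }
```

A gene pair supported by several high-scoring combinations would then stand out from one supported by a single borderline prediction, which is one plausible reading of the "leverage" Renaux describes.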
The 250 oligogenic variant combinations currently in DIDA have been linked to 54 different diseases. Even with a small dataset, "the signal is pretty clear," Renaux said at his poster presentation. The Brussels team, led by machine-learning specialist Tom Lenaerts and Interuniversity Institute of Bioinformatics founder Guillaume Smits, reported low rates of false positives for most of the gene combinations they studied.
However, the system is far from perfect. The researchers admitted in the paper that interpretation of oligogenic characteristics of individual phenotypes still requires some manual inspection.
Renaux said that his team is unsure whether this is the right path in bringing digenic and oligogenic prediction together. "It's kind of a hypothesis that we are just starting with," he said.
Next month, the group will add 100 more oligogenic data combinations to the 250 already in DIDA, based on recently published literature. They will retrain the machine learning models on the new data.
In the future, they are looking at integrating parent-child trio variant data to look for patterns of inheritance. "The integration of the patient's phenotypic information and its relation with other phenotypes could also offer more context to the results," they wrote in the journal article.
They also want to be able to support cohort analyses and are building a data pipeline to do so.
"The predictive quality of the methods used in ORVAL is dependent on the quality of the data in DIDA, which is noisy in the sense that not every instance has the same quality, even though they are all from peer-reviewed publications and curating efforts were made," the researchers wrote.
To improve the quality of DIDA, the researchers aim to expand it and introduce mechanisms for community rating of its content, so that high-quality subsets can be used for training and debated cases can be excluded with clear motivations.
This early iteration of ORVAL is mainly for research, but the two hospitals affiliated with the university occasionally use the platform to match specific patients to variant combinations. "They will not use it directly for counseling because it's still a research and discovery tool, but it will help them with a combination of different evidences to make a decision for counseling," Renaux said.
However, he sees the platform evolving into something that could be used for counseling. Renaux said that the genetic team leader on the project wants ORVAL to become a "kind of diagnosis tool."