NEW YORK (GenomeWeb) – Researchers from Omicia, the University of Utah, and other institutions have developed a method of ranking relevant disease-causing mutations, particularly in genomic samples collected from individuals and family trios, that uses biomedical ontologies to combine phenotype, genotype, and disease information.
In a paper that was published in the April issue of the American Journal of Human Genetics, the developers explain that the Phenotype Driven Variant Ontological Re-ranking tool, or Phevor, works "by combining the outputs of widely used variant-prioritization tools with knowledge resident in diverse biomedical ontologies, such as the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology, the Disease Ontology, and the Gene Ontology (GO)." It does so using an algorithm "that propagates an individual's phenotype information across and between ontologies," a process which enables Phevor to "reprioritize candidates identified by variant-prioritization tools in light of knowledge contained in the ontologies." They also describe three case studies where Phevor helped researchers identify mutations involved in undiagnosed conditions.
Charlene Son Rigby, Omicia's vice president for products, told BioInform that the company is the exclusive commercial licensee of the new software and that it plans to integrate the tool into Opal, its commercial variant data annotation and interpretation platform, in the second half of this year. The company expects the tool to appeal to scientists analyzing disease in both research and clinical contexts, she said, and so it will market Phevor as part of a product portfolio it has created for both market segments based on its platform. The exact details of how Phevor will be offered as part of Opal, such as whether it will be among the platform's free features or its priced pro features, are still being worked out.
Omicia has been selling Omicia Research for a number of years, providing clients with tools to identify, classify, and annotate genomic variants. The company is now beta testing a second offering called Omicia Clinical, which will provide diagnostic labs with standardized workflows to support NGS-based diagnostic tests as well as mechanisms to return results. Omicia is using the proceeds from a $6.8 million round of financing to develop this product. Son Rigby said that the company is currently wrapping up the second phase of the beta and intends to begin selling the product at the end of Q2 2014, though it has yet to determine pricing.
Essentially, Phevor offers a more automated means of selecting the variants that likely play the most pertinent roles in a particular patient's phenotype from lists of candidates that existing software such as the Variant Annotation, Analysis, Search Tool (VAAST) or Annotate Variation (Annovar) identify, Omicia CEO Martin Reese told BioInform. It's an alternative to relying on disease experts to make the connections between phenotypes and genotypes based on their prior knowledge of the disease or by manually searching for information from the literature, he said. With these less standardized approaches to disease-gene identification "genes not previously associated with the phenotype are not considered — often preventing the discovery of [new] associations," the researchers wrote in their paper. It's also resulted in a lack of "general standards, procedures, or validated best practices," the researchers wrote.
This current implementation of Phevor is a more general version of the first iteration of the software, which, according to an abstract that was submitted for last year's American Society of Human Genetics meeting, used the capabilities of VAAST and Phenomizer, both of which were developed by this same team, and incorporated information on gene function, location, and biological processes from the GO. The abstract states that the researchers benchmarked Phevor on 50 known disease-causing variants that were "spiked" into healthy exomes, comparing its performance to that of VAAST and Phenomizer alone.
The updated version of Phevor that's described in the AJHG paper, Reese said, lets users "replace Phenomizer with a gene list, VAAST with Annovar, and GO with other ontologies if you want," although "it still works with all three as well."
The software works by translating patients' phenotype information into terms used by ontologies such as the HPO and GO, Reese explained. It then lets researchers combine this information with the ranked list of genes generated from analyzing the patient's genomic sample, using solutions such as VAAST, and re-rank the genes with the ones most relevant to the observed phenotype now placed higher up on the list.
To use the software, scientists can either translate a patient's phenotype information into the language of HPO or other ontologies themselves, or they can use Phenomizer to describe the phenotype and generate a list of gene candidates — these are the inputs to Phevor. Once the data is in, the software then associates genes with ontology concepts based on their shared gene annotations. It then "propagates" this information across each ontology by assigning a value of one to "seed nodes" and then "each time an edge is crossed to a neighboring node, the current value of the previous node is divided by two," the paper explains. This process — referred to as "ontological propagation" in the paper — continues until a "terminal leaf is encountered."
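The halving rule described in the paper can be illustrated with a small Python sketch. This is not Phevor's code: the toy ontology, the term names, and the function name `propagate` are hypothetical, and the sketch assumes that values only travel from a seed toward its descendants, that each node is reached by its shortest path from a given seed, and that contributions from several seeds are added together.

```python
from collections import deque

def propagate(children, seeds):
    """Spread seed values toward the leaves, halving at every edge crossed.

    children: dict mapping a term to the list of its child terms (assumed layout).
    seeds: terms that describe the patient's phenotype; each starts at 1.0.
    """
    scores = {}
    for seed in seeds:
        # Breadth-first walk, so each node is reached by its shortest path
        # from this seed (an assumption made for this sketch).
        queue = deque([(seed, 1.0)])
        seen = {seed}
        while queue:
            node, value = queue.popleft()
            # Contributions from different seeds are summed (assumption).
            scores[node] = scores.get(node, 0.0) + value
            for child in children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, value / 2))
    return scores

# Hypothetical toy ontology: parent term -> child terms.
toy_ontology = {
    "immune phenotype": ["immunodeficiency", "autoimmunity"],
    "immunodeficiency": ["antibody deficiency"],
    "autoimmunity": [],
    "antibody deficiency": [],
}
toy_scores = propagate(toy_ontology, seeds=["immune phenotype"])
```

In this toy case the seed keeps a value of 1, its two children receive 0.5 each, and the leaf two edges away receives 0.25, at which point the walk stops because no further child terms exist.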
Next, Phevor normalizes the values for each node and then assigns a score to each annotated gene that corresponds "to the maximum score of any node in the ontology to which it is annotated" — this step is repeated for each ontology. "These scores are added to produce a final sum score for each gene and renormalized again," the paper states. It then uses a set of equations to rank genes "by their gene sum scores" and to combine their percentile ranks with variant and gene-prioritization scores.
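The scoring steps quoted above (normalize node values, take each gene's best node per ontology, sum across ontologies, renormalize, rank) can be sketched in Python as follows. The data layout, the gene and term names, and the function names are hypothetical, and the sketch assumes that "normalizing" means dividing by the total so that values sum to one, which the article does not specify. The paper's equations for combining these ranks with variant-prioritization scores are not reproduced here.

```python
def normalize(values):
    """Scale values so they sum to 1 (the normalization method is an assumption)."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()} if total else dict(values)

def gene_sum_scores(node_scores_by_ontology, annotations_by_ontology):
    """Sum, over ontologies, the best normalized node score for each gene."""
    sums = {}
    for name, node_scores in node_scores_by_ontology.items():
        normalized = normalize(node_scores)
        for gene, nodes in annotations_by_ontology[name].items():
            # A gene takes the maximum score of any node it is annotated to.
            best = max((normalized.get(n, 0.0) for n in nodes), default=0.0)
            sums[gene] = sums.get(gene, 0.0) + best
    # Renormalize the summed scores.
    return normalize(sums)

def rank_genes(scores):
    """Order genes from highest to lowest sum score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical propagated node values for two ontologies.
node_scores = {
    "HPO": {"term_a": 1.0, "term_b": 0.5, "term_c": 0.5},
    "GO": {"term_x": 1.0, "term_y": 1.0},
}
# Hypothetical gene-to-term annotations per ontology.
annotations = {
    "HPO": {"GENE_1": ["term_a", "term_b"], "GENE_2": ["term_c"]},
    "GO": {"GENE_1": ["term_y"], "GENE_3": ["term_x"]},
}
final_scores = gene_sum_scores(node_scores, annotations)
ranking = rank_genes(final_scores)
```

Here GENE_1 ranks first because it is annotated to high-scoring terms in both toy ontologies, while GENE_2 and GENE_3 each draw support from only one.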
The paper includes the results of benchmark tests that highlight the improved disease-gene associations that Phevor provides when it is coupled with variant annotation software such as VAAST, SIFT, and Annovar. It also describes the results of three studies where researchers involved in the Utah Genome Project demonstrated the efficacy of the approach in identifying alleles associated with genetic diseases. In one family affected by a type of immunodeficiency syndrome, for example, the software associated the NFKB2 gene with the condition — an association confirmed by an earlier study. In a second case, they used the software to identify a dominant allele of STAT1 as a culprit in a patient with intestinal inflammation — this finding was verified with Sanger sequencing.