HudsonAlpha Software Targets Gene Rankings From Phenotypic Data


CHICAGO (GenomeWeb) – A new software tool from the HudsonAlpha Institute for Biotechnology promises to simplify and improve the accuracy of ranking genes from phenotypic information. The web-based tool, called PyxisMap, also helps to alleviate some of the bottlenecks that bioinformaticians face in secondary analysis of genomic data.

HudsonAlpha software developer Brandon Wilk and informatics postdoctoral fellow Matthew Holt publicly unveiled PyxisMap at the International Society for Computational Biology's Intelligent Systems for Molecular Biology (ISMB) conference in Chicago this month. "We've demonstrated that using the tool to re-rank variants in rare disease cases significantly improves the ranking of clinically reported variants," they said in their presentation.

PyxisMap builds on earlier software from Huntsville, Alabama-based HudsonAlpha called Codi, which itself was an outgrowth of the Medical College of Wisconsin's CarpeNovo genomic analysis software. The common thread in both is Elizabeth Worthey, director of software development and informatics at HudsonAlpha, who formerly worked at MCW.

"Codi is a tool that makes sense of the genomic data," Worthey told GenomeWeb. "You go from variants through the whole annotation and filtering and prioritization process."

However, Codi is not built to handle phenotypic data. PyxisMap fills in that gap.

The new application pulls patient descriptors from tags in electronic medical records, clinical summaries, and indications for genetic testing. "You can put that into the tool and it will, at the end, give you a prioritized list of genes that are associated with those phenotype terms," Worthey said.

PyxisMap generates rankings from a graph-based structure of phenotypic terms, drawing on numerous public databases, including the Human Phenotype Ontology (HPO), Online Mendelian Inheritance in Man (OMIM), PubMed, and PubTator. This helps ensure that the results are based on the most current medical knowledge, according to Worthey.

"It's not just using a single mapping," she said. "If you use just the data from OMIM or ClinVar or [the Human Gene Mutation Database], you're missing the most recent mappings or relationships between genes or variants in phenotype terms."
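The article does not describe PyxisMap's internal algorithm, so the following is only a minimal sketch of the general idea of ranking genes from a phenotype graph built out of several sources. The phenotype terms, gene names (`GENE_A` and so on), source labels, edge weights, and the `rank_genes` function are all hypothetical and chosen for illustration; they are not taken from PyxisMap or from any real database.

```python
from collections import defaultdict

# Hypothetical toy edges: (phenotype term, gene, source, weight).
# Genes and weights are placeholders, not real associations or PyxisMap data.
EDGES = [
    ("seizures", "GENE_A", "HPO", 1.0),
    ("seizures", "GENE_A", "PubTator", 0.5),
    ("seizures", "GENE_B", "OMIM", 1.0),
    ("ataxia", "GENE_A", "PubMed", 0.25),
    ("ataxia", "GENE_C", "HPO", 1.0),
]


def rank_genes(terms, edges=EDGES):
    """Score each gene by summing the weights of edges from the query terms."""
    scores = defaultdict(float)
    for term, gene, _source, weight in edges:
        if term in terms:
            scores[gene] += weight
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_genes({"seizures", "ataxia"})
for gene, score in ranking:
    print(gene, score)
```

In this toy example, a gene supported by several sources accumulates a higher score than one backed by a single mapping, which is the point Worthey makes about not relying on one database alone.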

Automating the ranking also takes some of the human variability out of the decision-making process. "You can actually do a really good job in silico, with no human interaction, of finding the causal variants," Worthey said. That is an important consideration for the many institutions that face a shortage of bioinformatics talent.

"We have to innovate so that we can sequence all of the rare-disease patients," Worthey said. "We try to develop tools that an MD or a PhD lab director can use."

Worthey noted that other tools for organizing phenotype data rely on command lines, à la MS-DOS of several decades ago. "They're not necessarily intuitive, but this tool is super easy. You literally can give it the text and it will give you the genes back," she said.

In the year or so that PyxisMap has been under development, HudsonAlpha tested both it and Codi against variants known to be associated with very rare disorders from Harvard's Undiagnosed Diseases Network. Codi was able to identify the causal variant about half the time from just the indication and the variant call, Worthey said.

Working together, PyxisMap and Codi have been able to put the causal variant in the top 20 about 95 percent of the time.
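A figure like "in the top 20 about 95 percent of the time" is a top-k hit rate. As a rough sketch of how such a number could be computed, assuming each test case records the 1-based rank at which the known causal variant appeared (or `None` if it was not ranked at all), one might write the following. The `top_k_rate` function and the example ranks are hypothetical and are not HudsonAlpha's data or evaluation code.

```python
def top_k_rate(causal_ranks, k=20):
    """Return the fraction of cases whose causal variant ranks within the top k.

    Ranks are 1-based; None means the causal variant was not ranked.
    """
    if not causal_ranks:
        raise ValueError("need at least one case")
    hits = sum(1 for rank in causal_ranks if rank is not None and rank <= k)
    return hits / len(causal_ranks)


# Hypothetical ranks for eight cases, for illustration only.
example_ranks = [1, 3, 18, 20, 42, None, 7, 2]
rate = top_k_rate(example_ranks)
print(f"top-20 rate: {rate:.0%}")
```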

"When you can do that, you're in some cases all the way to being able to, in the future, kick humans out of that process," Worthey said. "Even if all you can do is get it in the top 20, it still saves a hell of a lot of time."

HudsonAlpha has applied the technology to a research project involving whole-genome sequencing of patients with cystic fibrosis.

"On the face of it, it seems like a really crazy idea because you wouldn't have to look at CFTR anymore to know if that patient has got CF. You just do a sweat test," Worthey said.

"But patients with CF, even with the same molecular causal variant, they're all very different. Clearly, some of that may be environmental, but a lot of it, the hypothesis is that it is genetic, and that is what we are finding," she explained.

"When you're looking for modifiers, you want to find variants in the loci that are associated with that same pathway that's gone awry. You can also use something like PyxisMap for that, because once you make that graph, that gives you the ability to map between different phenotype terms and genes or loci," Worthey said.

For now, PyxisMap is only available through the HudsonAlpha website. The institution plans on publishing the source code at a later time.