PathOlogist: An Open Source Tool For a Pathway-Centric View of Data

PathOlogist is a new quantitative pathway analysis tool being developed at the National Cancer Institute to allow researchers to enter data of various types, for example gene expression analysis data, and then automatically determine to which pathways a given gene set may belong. Once complete, scientists will be able to associate clinical data with pathway behavior or to isolate pathways or parts of pathways that might be implicated in disease.
PathOlogist was developed in Kenneth Buetow’s National Cancer Institute Laboratory of Population Genetics. Buetow is also director of the NCI’s Center for Bioinformatics and project leader for caBIG, the Cancer Biomedical Informatics Grid.
“The key feature of PathOlogist is that it evaluates biologic network interactions,” Buetow told BioInform in an e-mail. Because these interactions reflect the structure of the pathway, the quantitative measure of activity or consistency of each interaction is different from signatures obtained by gene state alone, he said. “It therefore allows for the analysis of multiple genes acting through the constraints specified by the network.”
The tool converts quantitative gene expression information into up and down "states," and calculates the probability of a gene’s state. Using the calculated scores “it identifies pathways that classify observations of interest and it can also perform survival analysis for classified groups,” Buetow said.
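As a rough illustration of what turning expression values into probabilistic up and down states can look like, the Python sketch below passes each sample's z-score for a gene through a logistic curve. This is an assumption made for illustration only: the article does not describe the statistical model PathOlogist uses, and the function names, the `steepness` parameter, and the sample values are all hypothetical.

```python
import math
from statistics import mean, pstdev


def up_probabilities(values, steepness=1.0):
    """Map one gene's expression values across samples to P(up) in (0, 1).

    Illustrative stand-in only: a logistic curve over the z-score,
    not the model used by PathOlogist.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No variation across samples: the data carry no evidence either way.
        return [0.5 for _ in values]
    return [1.0 / (1.0 + math.exp(-steepness * (v - mu) / sigma)) for v in values]


def call_state(p_up, threshold=0.5):
    """Collapse a probability into a discrete 'up' or 'down' call."""
    return "up" if p_up >= threshold else "down"


# Hypothetical log-scale expression values for one gene in four samples.
expression = [5.1, 5.3, 9.8, 10.2]
probs = up_probabilities(expression)
states = [call_state(p) for p in probs]
print(list(zip(expression, states)))
```

Keeping the probability rather than only the discrete call lets later steps weigh uncertain genes less heavily than clear-cut ones.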
Basic researchers, translational scientists, clinical investigators, and prevention scientists could all benefit from this tool, he said.
In his view the tool will not just allow scientists to compare results to pathways stored in databases, but it could also enable discovery about the functional roles gene interactions and networks play. “It allows researchers to map phenotypes to processes or networks, understand biologic response, associate disease with pathway, understand coordinated effects of genes, and understand how pathways modulate drug response,” he said. “One can assess if canonical pathways are accurate and explore effects associated with additional or modified interactions.”
Sol Efroni, a post-doctoral fellow in Buetow’s lab, has been focusing his work on signaling pathways, especially those in cancer, and has developed the algorithms in this tool. As Efroni, Buetow, and Carl Schaefer, who curates the NCI Pathway Interaction Database, outlined in a paper published in PLOS One last year, networks spell out the molecular interactions of genes and their products, but mapping disease phenotypes to changes in these networks is currently “accomplished indirectly and non-systematically.”
One way a disease can develop is when a pathway becomes disrupted in more than one spot and in more than one way. With that mechanism in mind, the scientists sought to take tumor traits such as malignancy or stage and link them to pathways in a quantifiable way.
The phenotypic and molecular heterogeneity of cancer gives clinician-scientists much information about potential clinical outcomes. To better parse these data, Efroni and his colleagues developed algorithmic methods to characterize and quantify these tumor differences and link them to the pathways these characteristics represent. The pathway quantification method is a way to obtain a pathway-centric analysis of genome-wide data.
As Efroni explained to BioInform in an e-mail, “The network information was out there. So was a lot of the gene expressions data. So it seemed natural to ask the question ‘How do the two relate?’”
Pathway as the Measuring Unit
The NCI researchers creating PathOlogist are currently testing the tool, which essentially uses pathways as the unit of analysis. Efroni said the tool delivers answers to questions scientists might ask of their gene expression data such as: “How well does this data fit current pathway knowledge and given the pathway knowledge, how active are each of the pathways in a given sample?”
Efroni said PathOlogist uses his algorithms to determine gene states from the experimental data. Genes are called “up” or “down,” and these gene states are then applied in the context of gene and protein networks “to statistically characterize and quantify phenotypic differences,” he said.
PathOlogist involves two descriptive metrics, a consistency score and an activity score. These scores let users “look for disease signatures at the pathway level, instead of the gene level,” Sharon Greenblum, an NIH research fellow and collaborator on the project, told BioInform. She added that the tool helps scientists move beyond quantitatively analyzing single genes to quantitatively and systematically analyzing pathways.
She is integrating the various modules of PathOlogist and “making it easy to access” for researchers who want to run their data through it.
Running Through Pathways
The database is the starting point for the analysis of a given gene set. PathOlogist takes a set of gene expression data along with the interactions of those genes with each other to quantify the entire set of interactions. The tool calculates scores based on the set of more than 500 canonical pathways in the Pathway Interaction Database, a curated resource run jointly by the National Cancer Institute and Nature Publishing Group.
“The idea is to identify out of that set of 500 pathways [those that] play an important role in distinguishing the samples,” she said. The pathway activity score tells scientists whether the pathway is on or off, that is, whether the average likelihood of the pathway’s individual interactions indicates that the pathway itself is active, given the calculated gene states.

The activity score delivers information about “whether the interactions in the pathway are primed and ready to take place, whether all the components that need to be expressed are expressed for the pathway to run its course,” she said.
Consistency scores are a measure of pathway logic, telling users if the gene interactions are occurring as would be anticipated by previous pathway knowledge. By comparing the expected outcome with the actual outcome, the tool calculates for users “if the expression values of genes that interact with each other … are what you’d expect,” Greenblum said.
For example, if two genes are expressed and are expected to interact with each other as well as with a third gene, the algorithms determine whether the expression of that third gene matches the first two.
Activity and consistency values are calculated for each gene interaction within a pathway. The overall pathway activity score, a measure applied to the entire pathway, is then computed as the average of the activity scores for each interaction within the pathway. The same holds true for overall consistency scores.
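The averaging step can be sketched in a few lines of Python. The article confirms only that pathway scores are averages of per-interaction scores. The per-interaction formulas below are assumptions chosen for illustration: activity is taken as the probability that all input genes are up, and consistency as the probability that the output state agrees with that activity. The gene names, probabilities, and function names are hypothetical.

```python
from math import prod
from statistics import mean


def interaction_activity(p_up, inputs):
    """Assumed definition: probability that every input gene is 'up'."""
    return prod(p_up[g] for g in inputs)


def interaction_consistency(p_up, inputs, outputs):
    """Assumed definition: probability that the outputs agree with the inputs.

    The interaction is consistent if it is active and its outputs are up,
    or if it is inactive and its outputs are not up.
    """
    active = interaction_activity(p_up, inputs)
    out_up = prod(p_up[g] for g in outputs)
    return active * out_up + (1 - active) * (1 - out_up)


def pathway_scores(p_up, interactions):
    """Average the per-interaction scores over the whole pathway."""
    activity = mean([interaction_activity(p_up, i["inputs"]) for i in interactions])
    consistency = mean(
        [interaction_consistency(p_up, i["inputs"], i["outputs"]) for i in interactions]
    )
    return activity, consistency


# Hypothetical P(up) values for four genes in one sample.
p_up = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.2}
# Hypothetical two-step pathway: A and B act on C, then C acts on D.
interactions = [
    {"inputs": ["A", "B"], "outputs": ["C"]},
    {"inputs": ["C"], "outputs": ["D"]},
]
activity, consistency = pathway_scores(p_up, interactions)
print(round(activity, 3), round(consistency, 3))
```

In this toy pathway the second interaction pulls the consistency score down, because gene C is probably up while its target D is probably down.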
For example, as Efroni explained, if a clinician-scientist has 50 breast cancer samples with gene expression results and seeks to do pathway analysis on these samples, the software analyzes the CEL files and tells the researcher which pathway is most critical in any phenotypic classification they find interesting in their samples. 
A scientist could probe “inside the pathway, look at specific interactions,” he said. “Such interactions could serve as the basis for further investigation into the reasons [why] the pathway is of critical importance.”  
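One simple way to ask which pathway best separates two groups of samples is to rank pathways by a two-sample statistic computed on their scores. The Python sketch below uses Welch's t-statistic. This choice is an assumption for illustration, since the article does not say how PathOlogist selects classifying pathways, and the pathway names, scores, and labels are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of pathway scores."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0
    return (mean(a) - mean(b)) / se


def rank_pathways(scores, labels, group_a, group_b):
    """Sort pathways by how strongly their scores differ between two groups."""
    ranked = []
    for pathway, values in scores.items():
        a = [v for v, label in zip(values, labels) if label == group_a]
        b = [v for v, label in zip(values, labels) if label == group_b]
        ranked.append((pathway, welch_t(a, b)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical activity scores for two pathways across six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
scores = {
    "pathway_1": [0.80, 0.85, 0.90, 0.20, 0.25, 0.30],
    "pathway_2": [0.50, 0.60, 0.40, 0.55, 0.45, 0.50],
}
ranked = rank_pathways(scores, labels, "tumor", "normal")
for name, t in ranked:
    print(name, round(t, 2))
```

With more than 500 pathways tested at once, a real analysis would also need to correct for multiple comparisons, which this sketch leaves out.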
Finding a Home for the Tool
“Right now [PathOlogist] doesn’t live anywhere, it is just on my computer, it is still a work in progress,” Greenblum said. It is currently in the beta-testing stage and the team is still working on ways to benchmark the tool. Soon, she said without giving a firm date, the scientists will put it on an FTP server to take it to the next round of testing.
“It is all programmed in Matlab and uses a lot of the Matlab tools and functions,” Greenblum said. The team has not yet decided if the tool will be web-based or a piece of software that researchers can download, but they are leaning toward the download model, she said.
PathOlogist is also being tested at other, undisclosed academic labs, where researchers want to see “how well the tool does diagnosis,” he said. He explained that the tool “does outperform any other method we looked at,” although he did not offer details. As soon as the tool runs smoothly, he said, the group will release it publicly.
The tool and algorithm also do well at stratifying patient groups according to survival, he said. “Future benchmarking will concentrate on our ability to drill further into specific interactions within classifying pathways.”
Once the tool has been validated, users will be able not only to load their own gene expression data but also to include custom-made networks in the tool, Efroni said. Researchers will then be able to compare their pathway results with pathways in curated databases.
The scientists say PathOlogist is not like commercially available tools. “I am unaware of commercial tools with comparable capabilities,” Buetow said.
“The idea is not really to find new pathways, which I think a lot of commercial tools do, but to really go through and systematically identify pathways that are implicated,” Greenblum said.
Some feedback the team has received includes requests for adding basic statistical analyses at the back end of the output. Greenblum said she recently added a heat map view of the pathway scores to visualize the PathOlogist data output.
Physician-scientists are also potential users of the tool, and might benefit from a quick first overview of their data in terms of pathway analysis, she said.
As the scientists state in a poster describing their tool, PathOlogist can generate scores for “any number of samples” and for any subset of the entire pathway collection. Pathways can be viewed individually and scores can be visualized in several ways. For example, the tool can display genome alterations by reading copy number data and overlaying it in the pathway context.
These features deliver a comprehensive view of pathway flow to the users and can help identify perturbations from expected pathway behavior, they stated. “We believe the integrated analysis made possible by this tool will prove helpful for pathway-based study of biological information.”
Is it Reality?
As caveats, the scientists pointed out in their paper that the current knowledge of biological pathways is “incomplete and imperfect,” which also means that the identified processes are “almost assuredly not the only factors influencing the phenotypes of interest.”
Also, they noted that “probabilistic classification of genes into alternative states of down and up is a simplification of much greater complexity patterns of gene behavior and action.”
PathOlogist is an addition to the few open source tools available for pathway analysis, including Cytoscape, an open source network analysis workbench to visualize data and integrate it with data from a variety of platforms. Its most recent version is 2.6.0.
GenMAPP is a free visualization software platform scientists can apply to pathways. As an extension of GenMAPP, PathVisio 1.1, released earlier this month, lets researchers draw pathways and include many different types of data. The layout of pathways is not dynamic; PathVisio incorporates pathways as drawings.
PathVisio was developed by BiGCaT Bioinformatics, a collaborative effort between several universities started by bioinformatician Chris Evelo of the University of Maastricht in the Netherlands, where it now has its home.
Then there is the newly introduced WikiPathways, a wiki-based, community-driven resource for annotating pathways.
Whether commercial or open source, pathway analysis methods are set to be energized by efforts that include the Cancer Genome Atlas project, which will deliver “an unprecedented level of molecular detail about tumors,” said Chris Sander, chair of the computational biology center at Memorial Sloan Kettering Cancer Center, in his talk at the BioPathways special interest meeting at the recent Intelligent Systems for Molecular Biology conference in Toronto.
He also struck a cautionary note. While there is generally great interest in creating pathway models motivated by cancer biology, “pathways, in my view, are not a reality but they are models we have,” he said.
As models of biological and cellular goings-on “they are a reduction, beautiful and intellectually rewarding reductions of a lot of experiments,” he said. The models are useful to capture some aspects of cell biology, especially those that cause disease. Looking at these models “we constantly have to ask ourselves how good those models are relative to the underlying reality, relative to what we want to do with those models,” and how to move them forward so that they reflect disease understanding and can be used, for example, in disease prediction or as part of disease therapy.
