NEW YORK (GenomeWeb) – Researchers from Danish informatics firm Intomics have completed a study detailing use of the company's protein-protein interaction network to aid in the interpretation of genomic data.
Detailed in a paper published this week in Nature Methods, the work, which was done in collaboration with scientists at Harvard University, Technical University of Denmark (TUD), and the University of Copenhagen, employed Intomics' InBio Map protein interaction network to interpret large sets of cancer and autism sequencing data and place significantly mutated genes into networks associated with these conditions.
The publication is the first public demonstration of Intomics' InBio Map network, Thomas Jensen, the company's co-founder and CEO, told GenomeWeb. It comes as the Copenhagen-based firm, which launched in 2009 as a spinout from TUD, prepares to open a US office in Cambridge, Massachusetts.
At the time of submission of the Nature Methods paper, the InBio Map tool contained data on 585,843 protein-protein interactions, making it one of the largest such networks. At the time of publication, it contained 625,641 interactions, and the company has continued to add data, Jensen said.
Intomics has developed the network by collecting and integrating protein interaction data from a variety of databases covering not only humans but also a number of model organisms. Of the interactions included in the version of the network used in the recent study, 57 percent came from human experiments, 68 percent came from either mouse or human, and 95 percent came from human, mouse, rat, cow, nematode, fly, or yeast.
Integration of such heterogeneous datasets presents informatics challenges, Jensen noted, but it allows the company to generate a much larger network than it could using human data alone.
"What we do differently from many other resources is we combine interaction data not only from multiple databases but also from multiple species," he said. "Much of the data is transferred from model organisms, because that is where a lot of new [protein-interaction] discoveries can be made."
Beyond improving the breadth of the network, use of data from multiple datasets and organisms also improves the data quality, Jensen said.
"We score the quality and reliability of each interaction based on a number of metrics including how many different publications show the same interactions and how many different model organisms show the same interaction," he said. "These [factors] go into a reliability score that we benchmark against a small set of high-quality interactions, and we can see that by applying this approach we get much better data."
Such scoring is key because of the high rate of false positives, typically between 30 and 50 percent, in many protein interaction experiments, Jensen said.
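The scoring idea Jensen describes can be illustrated with a minimal sketch. Intomics' actual scoring method is not given here, so everything below is an assumption made for illustration: the noisy-OR formula, the per-publication and per-organism weights (`p_pub`, `p_org`), and the names `Evidence`, `reliability`, and `recall_at`. The sketch only shows how counts of supporting publications and organisms could feed a single score that is then checked against a small trusted set of interactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    publications: int  # distinct publications reporting the interaction
    organisms: int     # distinct organisms in which it was observed


def reliability(ev, p_pub=0.5, p_org=0.3):
    """Hypothetical noisy-OR score: more independent support, higher score.

    p_pub and p_org are illustrative weights, not values from the article.
    """
    miss = (1 - p_pub) ** ev.publications * (1 - p_org) ** ev.organisms
    return 1 - miss


def recall_at(scores, gold, threshold):
    """Fraction of trusted (gold-standard) interactions kept at a threshold."""
    kept = {pair for pair, score in scores.items() if score >= threshold}
    return len(kept & gold) / len(gold)


# Toy data with placeholder protein names, not real measurements.
scores = {
    ("A", "B"): reliability(Evidence(publications=3, organisms=2)),
    ("A", "C"): reliability(Evidence(publications=1, organisms=1)),
}
gold = {("A", "B")}
print(recall_at(scores, gold, threshold=0.8))
```

In this toy setup, an interaction reported once in a single organism scores lower than one reported three times across two organisms, so a threshold can drop weakly supported interactions while retaining the trusted ones.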
Protein interaction data can be useful in interpreting a variety of different omics data, allowing researchers to identify relationships and functional characteristics not apparent from looking at one form of molecular data alone. Jensen said that Intomics has had particular success in using the InBio Map tool to aid interpretation of genomics data, where techniques like next-generation sequencing have identified large numbers of mutations and variants whose biological impacts are, in many cases, unclear.
Indeed, distinguishing between mutations with and without major biological significance is a primary goal of proteogenomics, which integrates proteomic and genomic data, allowing researchers to interpret one form of molecular data using insights from the other.
One line of thinking informing this approach holds that because proteins are the functional molecules that mediate many biological processes, a genetic mutation that leads to an actual change at the protein level is more likely to be disease-related than one that doesn't lead to protein changes.
Intomics takes a somewhat different tack, using protein interaction data to identify, for instance, molecular pathways that might be altered by a particular genetic variation or pathways common to different mutated genes.
"With omics data integration, if you do it on a scaffold of protein-protein interactions you will typically do better at filtering out the noise in the data," Jensen said. "Especially in the genetics field, we see such a large number of genetic variants, [and] the complexity in the genetics data is simply overwhelming. You have so many SNPs in an individual genome or exome that it is almost impossible to find the right combination of those variants that are important for, let's say, drug response."
Jensen added that in work with pharmaceutical clients, Intomics has used its protein interaction data to significantly improve identification of likely responders to a given drug. He cited one particular project in which he said incorporation of this data helped a firm take a drug from around a 45 percent patient response rate to a rate of around 70 percent.
In the Nature Methods paper, the researchers used the tool to place into networks 219 "significantly mutated cancer genes" identified in sequencing data from more than 4,700 tumor genomes, providing information on their potential interactions. They similarly used the tool to identify potential connections between 65 recently identified autism-linked genes.
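As a rough illustration of what placing genes into networks can mean in practice, the sketch below takes a list of interactions and a set of genes of interest, keeps only the interactions that connect two genes in the set, and groups the genes into connected clusters. This is not the method used in the paper, which is not described here; the function name `induced_components` and the placeholder gene names are hypothetical.

```python
from collections import defaultdict


def induced_components(edges, genes):
    """Group genes of interest into clusters linked by known interactions."""
    genes = set(genes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep an interaction only if both partners are genes of interest.
        if a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        # Depth-first walk collecting everything reachable from this gene.
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy interaction list with placeholder names, not real data.
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "X")]
clusters = induced_components(edges, ["G1", "G2", "G3", "G4"])
print(clusters)
```

Here G1, G2, and G3 fall into one cluster, while G4 is left on its own because its only interaction partner is outside the gene set.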
Intomics has to date used the InBio Map network as part of its contract research work, from which it derives the bulk of its revenues. Recently, however, the company has decided to offer the network as a standalone product, Jensen said.
He noted that Intomics currently has service agreements with roughly one-third of "all medium and large pharmaceutical companies in Europe," and that it is opening its new Cambridge office in response to growing demand for its services from US pharmaceutical firms. It plans to open the office in May 2017.
Jensen said the company, which counts 20 employees and plans to expand that number in the near future, is profitable and is funding its expansion and product development using the revenues from its services business.