NEW YORK – A team led by researchers at Mount Sinai Hospital's Icahn School of Medicine has developed a software tool for proteogenomic analyses.
In a paper published this month in Molecular & Cellular Proteomics, they described using the tool, named iProFun, to analyze genomic, transcriptomic, proteomic, and phosphoproteomic ovarian cancer data generated by The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), identifying a number of links between genomic alterations and protein and phosphoprotein changes.
The ability to combine multiple levels of omics data provides better statistical power for confidently detecting links between alterations at the DNA level and changes to other molecules like proteins or phosphoproteins, said Pei Wang, professor of Genetics and Genomic Sciences at Mount Sinai and senior author on the study.
Wang is also a CPTAC investigator, and the Icahn School of Medicine is one of the consortium's Proteogenomic Data Analysis Centers. CPTAC has moved heavily into proteogenomics over the last decade, with its current stage — the third iteration of the initiative — exploring how proteogenomic data might help researchers understand patient drug resistance and its development.
Wang noted that analyzing several layers of omics data from the same sample provides a number of potential advantages compared to experiments looking at, for instance, genetic mutations or protein expression in isolation.
"With studies with a genomic focus, people have usually tried to use RNA analysis to see what the effects are of mutations or copy number or methylation in the functional domain," she said.
However, Wang noted, correlation between RNA and protein expression varies widely, and phenomena like protein post-translational modifications aren't reflected at the RNA level.
Additionally, she said, studies looking at multiple levels of omics data have more statistical power, which helps address the problem of false-positive findings.
"I think the major benefit of the integrative analysis framework we proposed is that by modeling all the data together we can nicely reduce false positives," she said.
In much of the previous literature, researchers have pursued pairwise correlation analyses, looking at, for instance, the link between DNA copy number and protein expression or DNA mutations and protein expression.
"We, instead, are trying to model [DNA] copy number, mutations, and methylation altogether in this framework, and that actually gives us a more comprehensive understanding of how the cumulative effect of all these different DNA-level events may impact function [at the protein level]," Wang said.
Xiaoyu Song, assistant professor at Mount Sinai and first author on the study, added that the integrated analysis performed by the iProFun tool allowed the researchers to make fuller use of the samples they had access to.
Using more conventional pairwise analyses, the researchers were limited by the size of the smallest available sample set, she noted.
"If, for the proteomic data, we had only 100 subjects but we had 500 with [genomic] data, then we would only be able to get [proteogenomic] information for 100 samples," she said, noting that using the iProFun software the researchers are able to integrate information from the full set of samples they have available for each data type.
In the MCP study, the researchers looked at mRNA expression in 569 patient samples, DNA copy number alterations (CNAs) in 559 samples, DNA methylation in 550 samples, proteomic data from 206 samples, and phosphoproteomic data from 69 samples.
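The sample-size limitation Song describes can be made concrete with the counts reported for the MCP study. The following is a minimal Python sketch, assuming a hypothetical analysis that keeps only samples for which every data type is available. The article does not report how much the sample sets actually overlap, so the result is only an upper bound, and the function name is illustrative rather than part of iProFun.

```python
# Sample counts per data type, as reported for the MCP study.
sample_counts = {
    "mRNA expression": 569,
    "DNA copy number": 559,
    "DNA methylation": 550,
    "proteomic": 206,
    "phosphoproteomic": 69,
}


def complete_case_upper_bound(counts):
    """Largest possible number of samples that have every data type.

    Assumption: the true overlap is unknown, so this is only a ceiling.
    """
    return min(counts.values())


upper_bound = complete_case_upper_bound(sample_counts)
limiting_type = min(sample_counts, key=sample_counts.get)
print(f"At most {upper_bound} samples, limited by the {limiting_type} data")
```

Under that assumption, an analysis requiring all five data types could use no more samples than the smallest set, whereas the integrated approach described by the researchers draws on the full set available for each data type.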
They found that DNA CNAs had much stronger impacts on RNA, protein, and phosphoprotein levels than did DNA methylation. Of 676 gene CNAs, 117 were significantly associated with RNA, protein, and phosphoprotein levels; 340 were linked to RNA and protein levels but not phosphoprotein levels; and 43 were linked to RNA levels only. In the case of methylation, one of 1,103 sites was linked to RNA, protein, and phosphoprotein levels; 27 were linked to RNA and protein expression; two were linked to RNA and phosphoprotein levels; 90 were linked to RNA levels only; and one was linked to protein levels only.
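To put the CNA and methylation figures side by side, the counts can be tallied as shares of the totals tested. This is a rough Python sketch that counts only the association categories listed in this article; any category the article does not list is not included, and the variable names are illustrative.

```python
# Association counts as reported in the article (listed categories only).
cna_total = 676
cna_links = {
    "RNA, protein, and phosphoprotein": 117,
    "RNA and protein only": 340,
    "RNA only": 43,
}

methylation_total = 1103
methylation_links = {
    "RNA, protein, and phosphoprotein": 1,
    "RNA and protein": 27,
    "RNA and phosphoprotein": 2,
    "RNA only": 90,
    "protein only": 1,
}


def linked_fraction(links, total):
    """Share of tested features with at least one listed association."""
    return sum(links.values()) / total


cna_share = linked_fraction(cna_links, cna_total)
methylation_share = linked_fraction(methylation_links, methylation_total)

print(f"CNAs: {sum(cna_links.values())} of {cna_total} ({cna_share:.1%})")
print(
    f"Methylation: {sum(methylation_links.values())} of "
    f"{methylation_total} ({methylation_share:.1%})"
)
```

On these listed counts, 500 of 676 CNAs (about 74 percent) show an association at some level, compared with 121 of 1,103 methylation sites (about 11 percent).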
Using network analysis to investigate the 117 CNAs that impacted RNA, protein, and phosphoprotein levels, which the researchers termed "cascade" CNAs, they identified the oncogene AKT1 as a key node connected to a number of other cascade CNAs.
"Especially for ovarian cancer, we know that copy number events are a key player [in disease development]," Wang said. "But there are triggering [copy number] events and non-triggering events, and so that is why we are trying to use this kind of analysis to identify the more important events that have large functional consequences."
"AKT1 was one of the key players in that cascade gene set, and that has a lot of support in the literature," she added.
The results also point toward potential drug targets, Wang noted. The researchers identified the genes KRT8 and MAP2 as cascade genes, both of which, like AKT1, "are druggable genes with approved drugs already on the market with indications for other tumors," they wrote.
On the methylation side, the researchers identified the gene BIN2 as impacting RNA, protein, and phosphoprotein levels. Upregulation of this gene, they noted, has been linked to improved outcomes in cervical, endometrial, breast, and ovarian cancer in the TCGA studies. The MCP study found that methylation of the gene led to lower protein levels in a set of the ovarian cancer samples analyzed.
"So, this opens up some hypotheses for directions for ovarian cancer research," Wang said.
She said that she and her colleagues have applied the tool to a variety of other CPTAC datasets including kidney cancers and brain tumors.
"We feel this is a very nice framework to take advantage of this large-scale study that is making multi-omics data available," she said.