NEW YORK — Using multiomic data, researchers from the US National Cancer Institute have developed a comprehensive dataset that links genetic mutations found in cancers to their resulting phenotypes.
According to the researchers, a proteogenomic approach links genomic mutations to their effects on cellular physiology, and the new dataset built on that approach could facilitate pan-cancer studies.
In Cancer Cell on Monday, the NCI's Clinical Proteomic Tumor Analysis Consortium (CPTAC) described their methods for developing harmonized genomic, transcriptomic, proteomic, and clinical data for more than 1,000 tumors from 10 cohorts, which will be publicly available to researchers worldwide.
CPTAC is a long-running and well-funded proteomics initiative that began in 2006 with the launch of the NCI's Clinical Proteomic Technologies for Cancer (CPTC) initiative, a five-year, $104 million effort focused primarily on developing and evaluating proteomic tools and workflows.
Having standardized procedures to generate and process data is important for consistency, as applying different tools to the same dataset may lead to different results and sometimes even different conclusions.
For somatic mutation calling in their dataset, for instance, the researchers integrated results from the Broad Institute and Washington University in St. Louis pipelines, which each include multiple algorithms. They also developed a tool called OmicsEV to compare different proteomic data quantification pipelines. The tool uses more than a dozen metrics to assess data depth, normalization, batch effect, biological signal, platform reproducibility, and multiomics concordance.
Meanwhile, in two accompanying papers, both published in Cell, CPTAC investigators showed how the dataset could be applied. In the first paper, the researchers used a multiomic pan-cancer analysis to identify shared oncogenic driver pathways across 10 cancer types.
They processed and analyzed proteogenomic data from 1,064 participants with 10 cancer types, including information on genetic alterations, DNA methylation, transcriptomics, global proteomics, and phosphoproteomics. They found that genetic changes correlated with altered, tumor-specific protein-protein interactions. Moreover, they unraveled molecular mechanisms of oncogenic mutations, noting that most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles.
"Our findings support the proteome as a missing link between the genotype of oncogenic drivers and their functional states," the authors wrote.
In the second study, the researchers used the new dataset to investigate post-translational modifications (PTMs) across 11 cancer types and identified 33 pan-cancer multiomic signatures. Their findings underscore the contribution of PTMs to processes known to be affected in cancer, such as DNA repair, immune response, metabolism, histone regulation, and kinase regulation.
According to the CPTAC researchers, some investigators have already used their proteogenomic data to find new molecular subtypes, prognostic markers, novel protein variants arising from alternative splicing and RNA editing, and extensive post-translational regulation of protein complexes in various cancers.
They cautioned, however, that researchers working with pan-cancer data must carefully adjust for batch effects across different cancer types.
"Although large-scale DNA-sequencing studies have been vital in identifying cancer driver mutations, proteogenomics further enhances epigenomic, transcriptomic, and proteomic data to reveal functional consequences and therapeutic vulnerabilities," the authors said in the first Cell study.