NEW YORK (GenomeWeb) – Researchers from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium have completed a proteogenomics analysis of human colon and rectal cancer tumors.
Published this week in Nature, the study used mass spec to characterize the proteomes of 95 tumors that had been previously analyzed at the genomic level by the NCI's Cancer Genome Atlas initiative.
The analysis identified several proteomic subtypes of the disease, including subtypes not apparent in the genomic data, Vanderbilt University researcher Daniel Liebler, senior author on the study, told ProteoMonitor.
In addition, Liebler said, the study demonstrated that – as previous studies have similarly suggested – mRNA levels are not reliable predictors of protein expression. It also found that gene copy number variations are not, broadly speaking, predictive of protein expression.
These findings, he said, reinforce the importance of proteomic data, both for better understanding tumor biology and for evaluating the significance of observed genomic variations.
Launched in August 2011, CPTAC is a five-year project slated to cost between $75 million and $120 million that aims to combine protein biomarker discovery and verification studies in tumor tissue samples with genomic characterizations of those same samples done by TCGA.
The effort builds on the initial five-year, $104 million CPTC initiative launched in 2006, which worked to build a foundation of technologies and standards to advance the application of proteomics to cancer research. That project established five multidisciplinary, multi-institution research centers and developed collaborations with more than 60 public and private institutions around the world.
The second phase of the program established research centers at eight institutions including Washington University in St. Louis, the University of North Carolina, Boise State University, Pacific Northwest National Laboratory, the Broad Institute, Fred Hutchinson Cancer Research Center, Johns Hopkins University, and Vanderbilt University.
The groups undertook analysis of three tumor types – breast, colorectal, and ovarian – with the aim of profiling around 100 samples of each. This week's Nature paper marks the first publication stemming from these analyses.
Using a Thermo Fisher Scientific Orbitrap Velos for their analysis, the CPTAC researchers identified a total of 124,823 peptides across the 95 samples, corresponding to 7,526 proteins identified with a false discovery rate of 2.64 percent.
In addition to searching against standard wild-type databases, Liebler and his colleagues also searched their spectra against custom databases derived from the RNA-seq data collected for each tumor by the previous TCGA analysis. This, he noted, allowed them to look for genetic variants specific to the individual tumors, which would not have been possible searching only the standard wild-type databases.
In total, the group identified 796 single amino acid variants across the 86 tumors for which RNA-seq data was available, 64 of which corresponded to somatic variants identified by TCGA and 101 of which have been reported in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database. Meanwhile, 562 of the variants were listed in the Single Nucleotide Polymorphism database and, Liebler said, are likely germline variants unrelated to the patients' cancers. The remaining 162 variants could be novel variants, examples of RNA editing, or perhaps false discoveries, he added.
Liebler said that protein expression of the somatic variants tended to be "considerably lower" than that of the germline variants, suggesting that "there are perhaps some quality control mechanisms that reduce" the expression of these variants.
As Liebler noted, when the CPTAC researchers compared their protein expression data to the mRNA and DNA copy number data generated by TCGA, they found little concordance between the genomic and proteomic data. This, he said, suggests a role for proteomics in helping researchers prioritize the gene copy number alterations most likely to be involved in a disease state.
"Even though there are more copies of some parts of the DNA and the RNA may be elevated, that doesn't necessarily mean the proteins will be elevated," he said. "But some of them are and have dramatic effects."
Which ones will have dramatic effects, however, isn't clear from the genomics information, and "so you really need the proteomics to help you determine which [genetic alterations] are going to be high impact," he said.
In the case of the Nature paper, the CPTAC researchers were able to demonstrate that the chromosome 20q amplicon was linked to the largest changes in protein expression. Of the 79 genes in this region, 40 showed significant correlation between copy number alteration data (as collected by TCGA) and protein expression. Among these 40 were several strong candidates for further investigation, including the genes HNF4A and TOMM34, both of which have been linked to CRC.
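As a rough illustration of what a gene-by-gene comparison of copy number and protein expression can look like, the sketch below computes a rank correlation for each gene across a set of tumors. It is not the CPTAC team's method: the choice of Spearman correlation is an assumption, the gene names are placeholders, and the values are made-up toy numbers rather than data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy values (one entry per tumor), for illustration only.
copy_number = {
    "GENE_A": [2, 3, 4, 5, 6],
    "GENE_B": [2, 3, 4, 5, 6],
}
protein = {
    "GENE_A": [1.0, 1.4, 2.1, 2.9, 3.5],  # rises with copy number
    "GENE_B": [2.0, 1.1, 2.4, 0.9, 1.8],  # no clear relationship
}

correlations = {
    gene: spearman(copy_number[gene], protein[gene]) for gene in copy_number
}

for gene, rho in correlations.items():
    print(f"{gene}: rho = {rho:.2f}")
```

In this toy setup, the first placeholder gene tracks copy number closely while the second does not, which is the kind of distinction the article describes between amplified genes whose protein levels rise and those whose protein levels do not. A real analysis would also need a significance test and a correction for testing many genes at once.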
The Nature study is also the first, Liebler said, to use shotgun mass spec-based profiling to identify subtypes of a particular cancer. In this, the work is similar to proteomic analyses of TCGA tumor samples done by MD Anderson researcher Gordon Mills using reverse phase protein arrays. However, Liebler noted, while RPPA typically looks at only around 200 proteins per sample, shotgun mass spec measures thousands of proteins, making for subtypes based on a much larger portion of the proteome.
The extent to which this is an advantage remains to be seen. Mills told ProteoMonitor that he has begun an analysis to compare the CRC subtypes generated by his RPPA studies with those from the CPTAC mass spec effort. And while Liebler noted the potential advantage of the larger mass spec datasets, he said that "there is no absolute answer to [the] question" of "how much depth of coverage is needed to see the relevant biological features that distinguish [cancer] phenotypes."
Even looking only at mass spec analyses, this is an open question, Liebler said, noting the tradeoffs between factors like analysis time, amount of fractionation, starting sample size, and coverage depth.
"It's an interesting problem, particularly as we go into the future and analyze more tumors – what is the sweet spot between throughput and depth of coverage," he said. "I think we'll learn that as a consequence of the CPTAC studies. We don't have a good answer right now."
For the Nature paper, the CPTAC researchers used a workflow that divided each tumor sample into 15 fractions. Running all 95 tumors took roughly nine months.
With this data they were able to identify five proteomic subtypes among the tumor set, finding that the proteomic subtypes did not mirror the genomic subtypes but, in some cases, further subdivided genomic subtypes in meaningful ways.
Most notably, the proteomic data identified two different subtypes within the TCGA-identified transcriptomic microsatellite instability/CpG island methylator phenotype (MSI/CIMP) subtype. In its comparison of the proteomic and transcriptomic data, the CPTAC team found one proteomic subtype – subtype B – was associated with MSI/CIMP tumors with features including hypermutation, high methylation, and a lack of TP53 mutations and chromosome 18q loss.
However, Liebler said, proteomic subtype C – the other proteomic subtype associated with the TCGA MSI/CIMP classification – displayed protein network features characteristic of the epithelial-mesenchymal transition (EMT), which is associated with rapid metastasis and poor survival. He noted, though, that clear association of subtype C with adverse outcome will require validation studies.
Given these differences and the fact that biological information flows from the genome to transcriptome to proteome, these proteomic subtypes are likely important above and beyond the data offered by genomic subtyping, Liebler said.