NEW YORK — Researchers have generated a pan-cancer proteomic atlas of cancer cell lines that shows the cancer proteome can be interrogated to find vulnerabilities, including drug sensitivities.
Using a data-independent acquisition mass spectrometry (DIA-MS) approach, the team profiled the proteomes of nearly 950 cancer cell lines representing more than 40 different cancer types, producing a resource they have dubbed ProCan-DepMapSanger. By combining this proteomic dataset with other molecular and phenotypic data, the researchers uncovered thousands of protein biomarkers tied to cancer vulnerabilities, including ones not found at the transcriptome level.
"This proteomic dataset is a high-quality resource for mechanistic investigation of network organization and regulatory principles of the proteome, as well as for translational discoveries," senior author Roger Reddel from the University of Sydney and his colleagues wrote in their paper, which was published Thursday in Cancer Cell.
The researchers generated proteomes for each cell line using six replicates, for a total of 6,864 DIA-MS runs. This approach, they noted, enables the reproducible generation of proteomes at scale. The resulting ProCan-DepMapSanger dataset, which spans more than 40 cancer types and 28 different tissue types, covers nearly 8,500 proteins and adds a protein-level layer to the molecular characterization of these cancer cell lines.
By analyzing the dataset, the researchers could further identify the cellular origins of the cell lines, as well as uncover drivers of the protein expression patterns they observed. For instance, using multiomics factor analysis, they combined their dataset with a range of other omics data, including gene expression and promoter methylation data, and found an enrichment of epithelial-to-mesenchymal transition markers across the cell lines.
Additionally, the researchers combined their proteomic dataset with data from drug and CRISPR-Cas9 gene essentiality screens. Through this, they uncovered dozens of drugs whose response was associated with the protein-level abundance of their targets. For instance, they noted a negative association between EGFR abundance and response to its inhibitor gefitinib, and a negative association between MET abundance and response to its inhibitor. For other drugs, they uncovered associations with the abundance of proteins functionally related to their targets.
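To make the idea of a "negative association" concrete, here is a minimal Python sketch that computes a Pearson correlation between a drug target's protein abundance and drug response across cell lines. It assumes, for illustration only, that response is measured as ln(IC50), where lower values mean greater sensitivity; the simple correlation stands in for the study's own statistical models, which this article does not describe, and the numbers are hypothetical toy values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized;
    # the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (ss_x * ss_y)


# Hypothetical toy values for five cell lines (not data from the study):
# log2 protein abundance of a drug target and ln(IC50) of its inhibitor.
target_abundance = [10.2, 12.5, 9.1, 14.0, 11.3]
ln_ic50 = [2.1, 0.4, 2.8, -0.9, 1.2]

r = pearson_r(target_abundance, ln_ic50)
# Under the ln(IC50) assumption, a negative r means cell lines with more of the
# target protein tend to need less drug, i.e. they are more sensitive to it.
print(f"r = {r:.2f}")
```

In practice, an analysis across hundreds of cell lines would also need to account for confounders such as tissue type and correct for the large number of drug-protein pairs tested.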
The researchers also developed a deep learning-based computational pipeline called Deep Proteomic Marker, or DeeProM, to identify protein-level biomarker associations that are not captured at the transcriptomic level. By analyzing more than 4 million drug-protein and 86 million CRISPR-Cas9-protein associations, they homed in on 7,698 drug-protein and 5,823 CRISPR-Cas9-protein associations indicating possible cancer vulnerabilities and protein biomarkers. After tissue-level filtering, they uncovered 108 drug-protein and 1,538 CRISPR-Cas9-protein associations that could not be predicted from gene expression data alone.
These included a link between knockout of the transcription factor FOXA1 and levels of basigin (BSG), a plasma membrane protein that is expressed in breast cancer cells. BSG, the researchers noted, has been found to be involved in breast cancer progression and is linked to poor overall survival in patients with basal-like and triple-negative disease.
The pipeline further identified more than 100 tissue-type-level drug-protein associations, such as a link between sensitivity to an Aurora kinase inhibitor and the protein level of peptidyl-prolyl cis-trans isomerase H in bone-derived cell lines.
Randomly downsampling the proteome to 1,500 proteins had little effect on how well drug response could be predicted, compared with using the full proteome, the researchers added, noting that this finding underscores the connectedness and co-regulation of protein networks.
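The random-downsampling comparison can be pictured with a short Python sketch, assuming the proteome is stored as a cell-line-by-protein matrix (a list of rows). The function names and placeholder protein identifiers are hypothetical, and the predictive model that would be retrained on each subset and compared against the full-proteome model is not shown.

```python
import random


def downsample_features(feature_names, k, seed=None):
    """Return k feature names chosen uniformly at random without replacement."""
    if k > len(feature_names):
        raise ValueError("k cannot exceed the number of features")
    rng = random.Random(seed)  # seeded generator so a subset can be reproduced
    return rng.sample(feature_names, k)


def subset_columns(matrix, feature_names, chosen):
    """Keep only the chosen columns of a row-major matrix, in the chosen order."""
    position = {name: i for i, name in enumerate(feature_names)}
    idx = [position[name] for name in chosen]
    return [[row[i] for i in idx] for row in matrix]


# Usage with placeholder identifiers (not the study's protein list):
proteins = [f"protein_{i}" for i in range(8500)]
chosen = downsample_features(proteins, 1500, seed=0)
print(len(chosen), chosen[:3])
```

Repeating the draw with several seeds would show whether predictive performance holds up across many random subsets rather than one lucky selection.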
"[T]his dataset represents a major resource for the scientific community, for biomarker discovery, and for the study of fundamental aspects of protein regulation that are not evident from existing molecular datasets," Reddel and colleagues wrote. "This will enable the identification of targets (including cell surface proteins) and treatments for validation in cancer tissue cohorts, with applications in precision oncology."