MD Anderson Building Proteomic Database of Cancer Cell Lines


NEW YORK (GenomeWeb) – A team led by researchers at the University of Texas MD Anderson Cancer Center has completed a proteomic analysis of more than 700 cancer cell lines and compiled them in a publicly accessible database.

Published this week in Cancer Cell, the study complements similar tumor characterization work the researchers have done in patient samples and helps establish that cell lines can be effective models for testing hypotheses generated using patient data, said Han Liang, associate professor of bioinformatics and computational biology at MD Anderson and senior author on the paper.

The work is part of a broader effort by Liang and his MD Anderson colleague Gordon Mills to develop a resource containing proteomic characterizations of thousands, and ultimately tens of thousands, of cancer patient tissues, cell lines, and other samples.

In the Cancer Cell work, the researchers used reverse phase protein arrays (RPPAs) to measure levels of roughly 230 cancer-related proteins across 706 cell lines, including 651 independent lines. Of those lines, 246 were unique to the MD Anderson study, while 460 had undergone DNA and RNA profiling as part of other projects. The cell lines spanned 19 different lineages, with six of these lineages — lung, blood, head and neck, breast, ovarian, and skin cancer — each containing more than 50 different lines.

One of the study's major findings was that the researchers were able to use the cell line data to largely recapitulate the groupings and cancer subtypes identified in their previous analyses of patient RPPA data. This is significant, Liang noted, because it suggests that cell lines, which are relatively easy to access compared to patient samples, show relationships and correlations similar to those seen in actual tumors.

This wasn't certain to be the case, given the complexity of patient samples, Liang said. "In patient cohorts you always have concerns like: Is the tissue contaminated? What is the tumor purity?" Additionally, such samples typically contain some cells from the tumor microenvironment, which are thought to affect tumor development and behavior.

On the other hand, "in cell lines you are working on a highly purified environment where the data is presumably only from the cancer cells," he said. "When people want to test their hypotheses, they often really want to use a cell line model as a first line to test if the correlation [they are investigating] really does demonstrate a causal effect. And I think that this [study] gives confidence that we can use a diversity of cell line models to quickly screen hypotheses generated from patient cohorts."

One issue had been that previous cell line studies had not looked at enough different lines to reproduce the kind of diversity present in actual tumor samples, Liang said. George Mason University researcher Emanuel Petricoin and his colleagues have generated RPPA-based proteomic profiles of the National Cancer Institute's NCI-60 collection of 60 cancer cell lines, and Technical University of Munich professor Bernhard Kuster has generated mass spec-based profiles of the same collection, but studies of that size were not sufficient to evaluate the kind of diversity observed in patient samples.

"Our study really expanded that sample size by another order of magnitude so we could capture that diversity across many different cancer types," he said, noting that it took his team several years to collect the cell lines for analysis.

The researchers also looked at the extent to which the cell line protein profiles were correlated with drug response and how predictive protein-level data was compared to genomic and mRNA data.

Comparing protein data to mRNA data, they found that, as many proteomics researchers have suggested, protein-level data was more predictive of drug response.

"Proteins show reasonably good correlation with mRNA in these cell lines, but it varies from protein to protein," Liang said. "Some show relatively high correlation across different cancer lineages, but there are other proteins, especially phosphorylated proteins, that cannot be predicted well from the messenger RNA data."
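To make the protein-by-protein comparison Liang describes concrete, here is a minimal Python sketch that computes a Pearson correlation between mRNA and protein levels for each protein across a set of cell lines. It is an illustration only: the protein names (PROTEIN_A, PHOSPHO_B), the values, and the choice of Pearson correlation are assumptions made for this example and are not taken from the study or the MCLP data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def per_protein_correlation(mrna, protein):
    """Correlate mRNA and protein levels for every protein present in both tables."""
    return {
        name: pearson(mrna[name], protein[name])
        for name in mrna
        if name in protein
    }


# Invented values for five hypothetical cell lines; NOT data from the study.
mrna = {
    "PROTEIN_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PHOSPHO_B": [1.0, 2.0, 3.0, 4.0, 5.0],
}
protein = {
    "PROTEIN_A": [1.1, 2.1, 2.9, 4.2, 4.8],  # tracks its mRNA closely
    "PHOSPHO_B": [3.0, 1.0, 4.0, 1.5, 3.5],  # largely unrelated to its mRNA
}

correlations = per_protein_correlation(mrna, protein)
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

In this toy data the first protein is well predicted by its mRNA while the second is not, the pattern Liang says is common for phosphorylated proteins.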

The study highlighted the link between the protein ARID1A and the MEK inhibitor trametinib as an example of a relationship apparent only at the protein level. ARID1A expression, the authors noted, "was significantly higher in [trametinib]-sensitive cell lines than resistant cell lines," while mRNA expression levels and ARID1A mutation status were not predictive of trametinib sensitivity.
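A comparison of this kind, where a marker separates drug-sensitive from drug-resistant lines at the protein level but not at the mRNA level, could be sketched as follows. The example uses Welch's t statistic as a stand-in, since the article does not say which statistical test the authors applied, and all values are invented for illustration rather than drawn from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Invented expression values for hypothetical cell lines; NOT data from the study.
sensitive_protein = [2.1, 2.4, 1.9, 2.6, 2.2]
resistant_protein = [1.0, 1.3, 0.8, 1.1, 1.2]
sensitive_mrna = [5.0, 4.6, 5.3, 4.9, 5.2]
resistant_mrna = [4.9, 5.2, 4.7, 5.0, 5.1]

# A large statistic indicates the groups differ; one near zero indicates they do not.
t_protein = welch_t(sensitive_protein, resistant_protein)
t_mrna = welch_t(sensitive_mrna, resistant_mrna)

print(f"protein-level t statistic: {t_protein:.2f}")
print(f"mRNA-level t statistic:    {t_mrna:.2f}")
```

With these made-up numbers, the protein-level statistic is large and the mRNA-level one is close to zero, mirroring the qualitative pattern the authors report for ARID1A and trametinib.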

To enable use of the cell line data by the broader community, Liang and his colleagues developed an online platform, the MD Anderson Cell Lines Project (MCLP) platform, that allows researchers to analyze and visualize the RPPA data. The platform consists of four modules: My Protein, which provides detailed information on proteins measured in the RPPA; Analysis, which allows researchers to explore correlations between different proteins, proteins and mutations, and proteins and drug response; Visualization, which allows users to look at protein data either as part of protein interaction networks or heatmaps; and the Data Sets module, which allows researchers to examine and download the different datasets comprising the MCLP.

The MD Anderson team is now integrating this resource with the patient data resource and plans to add another resource in the future that will contain RPPA data on patient-derived xenograft (PDX) samples.

"So people will be able to test their hypothesis moving from the patient cohorts to a cell line model to the PDX data," Liang said.

The MD Anderson team plans to continue adding to the resources, he said, noting that they will soon have data from more than 1,000 cancer cell lines. They currently have patient sample data from more than 8,000 patients, most of which came from work on the NCI's Cancer Genome Atlas project, for which Mills has led the proteomic characterization efforts. This patient data makes up The Cancer Proteome Atlas (TCPA), an MD Anderson database of RPPA-based analyses that Mills launched in 2013.

In 2015, MD Anderson was named the site of one of the NCI's Genome Characterization Centers (GCCs). The center's GCC, which Mills heads, focuses on proteomic analysis, RPPA analyses in particular, and participates in a number of NCI initiatives including the Exceptional Responders Initiative, the ALCHEMIST precision medicine trials, the Cancer Driver Discovery Project, the Cancer Trials Support Unit, and the Cancer Therapy Evaluation Program.

This status, Liang said, provides the center with a steady flow of samples, data from which he and his colleagues plan to continue to add to their databases.

"We are going to receive a lot of samples from these sort of consortium projects, so we will continue to incorporate this new data into this proteomics resource," he said.