NEW YORK (GenomeWeb) – Researchers at the Karolinska Institute's Science for Life Laboratory have developed a protein localization database that provides subcellular mapping for nearly 12,500 proteins across five cell lines.
Detailed in a study published last week in Molecular Cell, the resource provides an in-depth look at protein localization patterns across the proteome as well as their responses to events like alternative splicing and drug perturbation.
The researchers are now looking to expand the database, which they have termed the SubCellBarCode, to include information on the effect of post-translational modifications on protein localization and on early responses to treatments with various targeted cancer therapies, said Janne Lehtiö, head of the cancer proteomics mass spectrometry group at Karolinska and senior author on the paper.
The researchers built the database by combining subcellular fractionation with mass spec-based quantitation of proteins across these different fractions.
Key to the effort was development of a fractionation approach that would provide good subcellular resolution while maintaining the robustness and reproducibility needed to compare protein levels across samples and cell types, Lehtiö said.
While gradient centrifugation is commonly used to separate subcellular compartments from each other, Lehtiö said that it quickly became apparent that this approach would not be reproducible enough for the study's purpose. Instead, the researchers first extracted the soluble proteins from the cell cytosol and then followed that with a series of centrifugations in which they separated the remaining sample into four fractions.
This approach provides less resolution than more extensive fractionation would, Lehtiö said, but he noted that "the robustness is very good, which allowed us to do good comparisons" across samples and cell types.
Keeping the number of subcellular fractions at five also allowed Lehtiö and his colleagues to run them in duplicate in a single 10-plex isobaric tagging mass spec experiment, which further reduced variability.
The researchers looked at five different human cancer cell lines (epidermoid carcinoma A431, glioblastoma U251, breast cancer MCF7, lung cancer NCI-H322, and lung cancer HCC827), identifying and quantifying 12,418 proteins across the full set of samples, with 8,140 proteins quantified in all five cell lines. The authors noted that they observed a high level of correlation of protein localizations between replicates and that principal-component analysis and protein correlation network analysis "showed distinct clustering of samples as well as proteins based on fractionation profiles, indicating that the generated data enable resolution of distinct subcellular compartments."
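To make the design concrete, the following is a minimal Python sketch, not the authors' pipeline, of how five fractions run in duplicate fill the ten channels of a single 10-plex experiment, and how agreement between the two replicate profiles of one protein could be scored with a Pearson correlation. The fraction labels, the channel ordering, and the intensity values are all hypothetical and chosen only for illustration; the article does not specify them.

```python
from math import sqrt

# Hypothetical labels: the article says only that a cytosolic extract is
# followed by four further centrifugation fractions.
FRACTIONS = ["cytosol", "fraction_2", "fraction_3", "fraction_4", "fraction_5"]
REPLICATES = ["A", "B"]

# 5 fractions x 2 replicates = 10 channels, i.e. one 10-plex run.
# The ordering (all of replicate A, then all of replicate B) is an assumption.
CHANNELS = [(fraction, rep) for rep in REPLICATES for fraction in FRACTIONS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def replicate_correlation(intensities):
    """Correlate a protein's replicate A profile with its replicate B profile.

    `intensities` holds 10 values in CHANNELS order.
    """
    n = len(FRACTIONS)
    return pearson(intensities[:n], intensities[n:])


# Made-up intensities for one imaginary, mostly cytosolic protein.
toy_protein = [8.0, 1.0, 0.5, 0.3, 0.2, 7.6, 1.2, 0.4, 0.3, 0.5]
print(len(CHANNELS), round(replicate_correlation(toy_protein), 3))
```

A value near 1 would indicate that the two replicate fractionation profiles agree closely, which is the kind of reproducibility the authors describe.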
The researchers found that relatively few proteins had multiple localizations, which Lehtiö said came as something of a surprise.
"It has been claimed that many proteins have multiple localizations, and we could find some clear examples of that, but it was not really a predominant feature when you look in a proteome-wide analysis," he said. "We made a lot of effort to test for [multiple localizations], but it was very difficult to find evidence that more than maybe 10 or 15 percent of proteins have multiple localizations."
They also examined whether alternative splice forms alter protein localization, combining RNA-seq data with deep mass spec protein coverage enabled by the use of isoelectric focusing in combination with conventional LC separations. Lehtiö noted that although the group initially found several splice variants with localization different from the normal protein form, these were the exceptions, not the rule.
"What we found was that it was not predominantly that different splice variants are in different locations," he said. "We were able to robustly detect splice variants, but they were predominantly in the same cellular location, so our conclusion was the opposite of what we started with. As things look now, we can't find evidence that [alternative splicing] is a predominant way of controlling protein localization."
The researchers also demonstrated the potential of their approach to illuminate the mechanisms of drugs, looking at protein localization in an EGFR-mutated lung cancer cell line after treatment with an EGFR inhibitor.
Lehtiö said the researchers observed that, as expected, "when you inhibit EGFR, the EGFR adaptor proteins immediately lose contact [with EGFR] and are sequestered in the cytosol. We could easily pick up those hallmark protein localization changes."
They were also able to see early changes in the localization of transcription factors moving in and out of the nucleus, he said, noting that the localization information could prove an interesting window into what genes and proteins are involved in a cell's immediate response to a drug or other stimulus.
Lehtiö said that he and his colleagues were now expanding their work to look at the effects of other targeted cancer drugs.
The researchers also plan to study how protein localization is altered by different post-translational modifications, which Lehtiö noted are one of the major ways cells direct proteins to different locations.
"We scratched the surface of this [in the Molecular Cell study], but in order to do it properly we need to generate more data," he said, adding that he and his colleagues are currently looking to collaborate with labs with expertise in working with particular protein PTMs.
Lehtiö said he also plans to expand the analysis to additional cell types and to convince other proteomics labs to use the approach for analyses of their own.
"It's a very robust and easy method, so we hope to get more people within the community to contribute data and help build up the subcellular proteome database," he said.
The researchers are also collaborating with the Stockholm Royal Institute of Technology's Cell Atlas project, which provides intracellular localization data for more than 12,000 proteins.
Lehtiö said he obtained several of the cell lines used in the SubCellBarCode project from the Cell Atlas researchers "so that we could use exactly the same batch of cells they used."
The Cell Atlas is antibody-based, "so our mass spec data offers an orthogonal method to study the same phenomenon," he said.