NEW YORK (GenomeWeb) – Researchers at the University of California, Berkeley have developed an enrichment and tagging workflow for improved analysis of glycoproteins on a proteome-wide scale.
The approach, detailed in a paper published this week in Nature Methods, allows researchers to simultaneously analyze the location and structure of glycan modifications on proteins, said UC Berkeley researcher Carolyn Bertozzi, senior author on the paper and leader of the effort.
Glycoproteins are of considerable interest in protein biomarker research as the majority of clinically approved cancer protein markers are glycosylated proteins. To date, however, it has been difficult verging on impossible to thoroughly analyze glycoproteins on a large scale, Bertozzi noted.
This is largely due to the high degree of complexity and heterogeneity of the glycan forms modifying proteins, she told GenomeWeb.
For instance, in the case of post-translational modifications like phosphorylation or acetylation, researchers are looking for the addition of molecules of known masses and charges, allowing them to search for the presence of these modifications with relative ease using existing mass spec approaches.
Glycoproteins, on the other hand, are modified by a wide variety of glycan structures, which significantly complicates mass spec-based analysis.
"There is so much diversity and structure and heterogeneity [with glycans] that there is no way you can compute all of the possible structures and add them to a database," Bertozzi said. "So there has been no method out there to actually find glycopeptides from a full proteome digest where the full heterogeneity of the glycans remains intact."
Instead, researchers have typically focused on one half of the question or the other: either using treatments to truncate glycans in a consistent manner so the resulting glycopeptides can be identified in much the same way as phosphopeptides, or removing the intact glycan structures from the proteins and analyzing them separately.
Both approaches, though, result in the loss of some information. In the former, researchers are able to analyze the number of glycosites and where on peptides they are located, but they get no information about the structure of the glycans attached to those sites. In the latter, researchers are able to characterize the glycan structures in their full complexity, but at the cost of severing the link to the specific proteins to which they were attached.
Ideally, researchers would like to look at both, Bertozzi noted, particularly given the thinking that changes in glycosylation structure could be as important as, or in some cases more important than, changes in expression of the modified protein.
She cited the example of prostate-specific antigen. "Levels of PSA are modestly useful [prostate cancer] biomarkers," she said. "But dig down into the glycosylation structures and you get better correlation with disease."
"So then the fundamental problems are, how do you enrich the glycoprotein and how do you figure out which ions correspond to glycosylated peptides when you cannot know the mass?" she said.
To do this, Bertozzi and her colleagues developed an enrichment and tagging workflow for mass spec analysis of intact glycopeptides.
Around a decade in the making, the method, named IsoTaG, uses chemical enrichment with an isotopically labeled probe to both enrich for glycopeptides and label them for easier detection by the mass spectrometer.
Glycoproteins are first metabolically labeled at the proteome scale with labels that allow them to be enriched via standard click chemistry. They are then pulled down using isotopically coded probes. After this enrichment and trypsin digestion, the resulting glycopeptides are sent on for mass spec analysis while still attached to the isotopically coded probe.
This isotopic coding allows Bertozzi and her colleagues, using an algorithm developed in their lab, to identify through an initial MS1 scan which analytes are glycopeptides. With this information they are able to build inclusion lists used for selecting only the glycopeptides for MS2 analysis and, through this focused analysis, they are able to characterize the peptide and its glycosite, as well as the structure of the attached glycan.
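The article does not describe the lab's algorithm in detail, but the general idea of flagging tagged analytes in an MS1 scan can be sketched in a few lines of Python. The sketch below assumes, purely for illustration, that the probe carries two bromine atoms, which would give tagged ions a roughly 1:2:1 triplet of peaks spaced by the 81Br/79Br mass difference (about 1.998 Da, divided by the charge). The function names, tolerances, and example peak list are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left

# Mass difference between 81Br and 79Br in daltons (a physical constant).
BR_SPACING = 1.99795


def find_peak(mzs, target, tol):
    """Return the index of the peak closest to `target` within `tol`, else None."""
    i = bisect_left(mzs, target)
    best = None
    for j in (i - 1, i):
        if 0 <= j < len(mzs) and abs(mzs[j] - target) <= tol:
            if best is None or abs(mzs[j] - target) < abs(mzs[best] - target):
                best = j
    return best


def build_inclusion_list(peaks, charges=(1, 2, 3), tol=0.01, ratio_tol=0.3):
    """Flag MS1 peaks that start an approximate 1:2:1 triplet.

    `peaks` is a list of (m/z, intensity) pairs from one MS1 scan.
    ASSUMPTION: a dibrominated tag; a different tag would need a
    different expected spacing and intensity pattern.
    Returns (m/z, charge) pairs that could seed an MS2 inclusion list.
    """
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]
    hits = []
    for mz, inten in peaks:
        if inten <= 0:
            continue
        for z in charges:
            step = BR_SPACING / z
            i1 = find_peak(mzs, mz + step, tol)
            i2 = find_peak(mzs, mz + 2 * step, tol)
            if i1 is None or i2 is None:
                continue
            mid, last = peaks[i1][1], peaks[i2][1]
            # Expected pattern 1:2:1, so mid/first is near 2 and last/first near 1.
            if (abs(mid / inten - 2.0) <= 2.0 * ratio_tol
                    and abs(last / inten - 1.0) <= ratio_tol):
                hits.append((round(mz, 4), z))
                break
    return hits


# Hypothetical scan: one doubly charged and one singly charged tagged ion,
# plus an untagged pair of peaks at m/z 650 that should be ignored.
example_peaks = [
    (500.0, 100), (500.999, 210), (501.998, 95),
    (650.0, 500), (651.003, 300),
    (800.0, 100), (801.998, 200), (803.996, 100),
]
print(build_inclusion_list(example_peaks))  # [(500.0, 2), (800.0, 1)]
```

A real implementation would also have to account for the peptide's own carbon isotope envelope, which distorts the simple 1:2:1 ratio for larger analytes. The loose ratio tolerance here is only a stand-in for that modeling.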
In the Nature Methods paper, the researchers used the approach to look at Jurkat, PC-3, and MCF-7 cell lines and identified 32 N-glycopeptides and more than 500 O-glycopeptides from 250 glycoproteins, of which 220 peptides and 120 proteins had not been previously shown to be glycosylated.
They also observed a number of different glycoforms on the same peptides, including one peptide on which they detected five different forms of glycans and several on which they detected four different forms.
"There is still room for improvement," Bertozzi said. "But even to get to where we have both the structure of the glycan and the site [for a limited number of proteins], we're pretty happy."
She predicted that in future experiments using modified versions of the isotopically coded probes, the researchers would be able to observe 10 times the number of glycopeptides they did in this study. Key to improving the method are modifications to the probe that remove the bromine atoms, which reduced the efficiency of the electron transfer dissociation (ETD) mass spec approach they used in their work.
"That tag is not very good for ETD fragmentation," Bertozzi said. "We have second-generation tags without the bromine atoms and the ETD fragmentation is much better."
She said the group would also like to get its software for identifying the isotopically tagged glycopeptides incorporated into vendor software in order to generate inclusion lists for MS2 on the fly as opposed to having to do two separate mass spec experiments. The researchers used a Thermo Fisher Scientific LTQ Orbitrap XL for the Nature Methods work.
Bertozzi and her group are currently using the technology in collaboration with Stanford University researcher Donna Peehl to profile the glycoproteome of prostate cancer samples, looking for possible biomarkers and developing a better understanding of the glycoproteomics underlying the disease.