The most interesting questions in biology — What happens if this is added to a cell? What about this? What about in this cell line? — move beyond characterization and into quantification. When quantifying proteins, researchers have to choose whether or not to label their proteins. Labeling certainly seems like a solid, well-defined route to take. There are plenty of options — ICAT, cICAT, iTRAQ, and SILAC, among others — and overall, labeling tends to offer more precision in results. If you don’t label, well, to begin with you’ve just lost a trendy acronym in your next paper.
But recent advances in label-free approaches are drawing more interest to the area. And it’s widely known that labels come with a variety of problems, ranging from added complexity and potential bias to limiting experimental approaches to one-to-one comparisons.
“The attraction of the label-free is … you don’t need to do the stable isotope labeling, which reduces complexity, reduces cost, [and] also allows you to directly compare many different experiments,” says Richard Smith, a chief scientist who heads up proteomic research at Pacific Northwest National Laboratory.
Still, scientists who use label-free methods (either precursor ion signal or spectral counting) face other problems, especially since not all peptides have an equal chance of being observed in mass spectra. Certain proteotypic peptides pop up in the data more frequently than other peptides do, and researchers have to wade waist-deep into statistics to sort things out.
A Blessing and a Curse
Labels didn’t just come about because proteomics researchers felt bored one day. “The reason the labeled techniques were developed in the first place was because proteomic experiments are big, complicated, messy experiments,” says Parag Mallick, an assistant professor of biochemistry at the University of California, Los Angeles.
Labeling the samples simplifies the experiment since the labels provide a way to tell case proteins from controls as they sail out of the mass spectrometer. Currently, there are a handful of different protein labeling techniques that tag the samples before they get mixed together and sent through the mass spec. ICAT and its sister method cICAT add either a heavy or a light biotin-bearing tag to a protein’s cysteine residues. Another method, iTRAQ, tags peptides at their N-termini and lysine side chains. But the most common way to label proteins for quantification is through stable isotope labeling, especially with O-18. In isotopically labeled experiments, those samples with the tag are chemically identical to the other samples, but their mass is shifted a little because of the heavy isotope, says Ruedi Aebersold, a professor at the Institute of Molecular Systems Biology in Zurich.
Though labeling may simplify identification of the proteins, it creates other problems. “[Labels] add additional complexity to the mixtures, and that’s always a challenge in proteomics,” says Smith. Some researchers worry about how that added complexity affects the peptide samples. One concern is that labels are not easily applied to low-abundance proteins. Others say protein labeling introduces bias in unknown ways. Also, many labeling experiments can only be run in sets of two, four, or eight. “There is just a host of issues that make isotopic labeling imperfect,” Smith adds.
That additional step of work to add a label to the proteins may systematically change the data by not labeling certain proteins or by introducing side products from the labeling reaction. “I was concerned from the very beginning about the ability of labeling technologies to appropriately label low-abundance proteins and peptides in samples because of the chemical kinetics involved in these complex samples,” says Katheryn Resing, a research associate professor at the University of Colorado. “It just seemed to me there was an inherent limitation in the ability to label low-abundance things in a reasonable length of time.”
Even if most of the proteins in the sample do become labeled, just having that extra chemical reaction can introduce chemical artifacts or unwanted side products. In O-18 labeling, Smith says, trypsin is not completely effective so there are partial reactions left behind. “Any sort of chemical introduces biases. Those biases are very poorly characterized,” adds UCLA’s Mallick.
Furthermore, some researchers are looking to multiplex beyond the eight-sample limit of labeling. For the kinds of comparative work scientists are interested in, they might want to compare as many as 20 samples, Resing says.
Recently, more problems have emerged with running samples in multiplex: there are difficulties in finding and isolating both peaks from a complex sample in parallel comparisons, says Resing.
Using a label-free approach sidesteps many of those problems, including added complexity, chemical artifacts, and the restriction to one-to-one comparisons. Through the two main approaches, precursor ion signal and spectral counting, scientists can compare hundreds of experiments by measuring the peptides directly, although the results are still a relative measure of protein abundance.
“The advantage is that it is easy. It takes advantage of data that you are already collecting in the mass spec. It’s reasonably well-behaved statistically, and it doesn’t require additional treatment of your sample beyond the typical shotgun proteomics experiment,” says Edward Marcotte, an associate professor of chemistry and biochemistry at the University of Texas.
The basic way to quantify proteins without a label is through peak analysis of precursor peptides as they come through the mass spectrometer. By recording the signals of each peptide detected, scientists can use peak heights on the mass spectra to measure intensities. Integrating the peaks to determine the peak area gives a relative indication of protein abundance, and those peaks can be compared. However, other factors such as chemical composition affect how a peptide appears in a mass spectrum; good experimental protocols have to include steps to correct for these different efficiencies.
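As a rough illustration of the peak-area comparison described above, the following Python sketch integrates one peptide's signal over its elution time and compares two runs. The function names, the sample values, and the use of trapezoidal integration are assumptions made for illustration. Real pipelines also handle noise filtering, peak detection, and the efficiency corrections mentioned above, all of which are left out here.

```python
def peak_area(times, intensities):
    """Integrate a peptide's signal over elution time (trapezoidal rule)."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def relative_abundance(area_case, area_control):
    """Ratio of peak areas: a relative, not absolute, measure."""
    if area_control == 0:
        raise ValueError("control peak area is zero")
    return area_case / area_control


# Hypothetical elution profiles for the same peptide in two separate runs.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
control = [0.0, 10.0, 20.0, 10.0, 0.0]
case = [0.0, 20.0, 40.0, 20.0, 0.0]

ratio = relative_abundance(peak_area(times, case), peak_area(times, control))
print(f"case/control peak-area ratio: {ratio}")
```

The ratio only says how much the peptide changed between runs, which is why the result remains a relative measure.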
A more refined approach is spectral counting. This method, steeped in sampling statistics, says that of the pool of peptides that passes through a mass spectrometer, only a random sample is actually sequenced. Thus, the more a particular peptide’s spectrum shows up, the more abundant the protein from which it came must be in the original sample. “If something is very highly expressed, highly abundant, it is more likely to be sampled many times than if it is sampled at a low level, at a lower frequency,” says Aebersold.
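The sampling idea behind spectral counting can be sketched with a small simulation. This is a toy model, not any lab's actual method: it assumes each sequenced spectrum is an independent draw whose probability is proportional to protein abundance, and it ignores differences between peptides. The protein names and abundance values are made up.

```python
import random
from collections import Counter


def simulate_spectral_counts(abundances, n_spectra, seed=0):
    """Draw n_spectra identifications, weighted by (assumed) abundance."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    proteins = list(abundances)
    weights = [abundances[p] for p in proteins]
    sampled = rng.choices(proteins, weights=weights, k=n_spectra)
    return Counter(sampled)


# Hypothetical sample: one protein is nine times as abundant as the other.
abundances = {"abundant_protein": 90.0, "rare_protein": 10.0}
counts = simulate_spectral_counts(abundances, n_spectra=1000)
print(counts)
```

Under these assumptions the abundant protein collects far more spectra, so the counts can stand in for relative abundance.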
Though simpler and less costly to perform, these label-free protein quantification techniques come with their own challenges. First of all, they’re not as precise. According to Smith, stable isotope labeling gives a coefficient of variation of a little more than 10 percent; a label-free approach weighs in at roughly 20 percent. Resing points out, though, that recent data from the Association for Biomolecular Resource Facilities shows that the spectral counting method is as good as, or better than, the labeled approaches in determining protein abundance.
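The precision figures quoted above are coefficients of variation: the standard deviation of repeated measurements divided by their mean. A minimal sketch of that calculation follows; the replicate values are invented for illustration and are not data from any of the researchers quoted.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return statistics.stdev(values) / mean


# Hypothetical replicate measurements of one protein's abundance.
replicates = [90.0, 100.0, 110.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1%}")
```

A lower value means replicate runs agree more closely.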
Meanwhile, the precursor ion signal method has computational limitations. Researchers must have software that can separate a signal cleanly from noise and integrate the peaks, says Aebersold. Spectral counting eliminates that problem but adds problems of its own, many of which can be overcome. The method assumes that every peptide has an equal chance of being ionized, which is known not to be true. Beyond efficiency differences, bigger proteins send more of their peptides through the mass spec, so protein size also has to be taken into consideration. “Once all those corrections are put in, then you get to something that … works pretty well,” Marcotte says.
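One simple way to account for protein size is to divide each protein's spectral count by its length and then rescale so the values sum to one. The sketch below does that. It resembles the normalized spectral abundance factor (NSAF) from the proteomics literature, which this article does not discuss, and it is not necessarily the correction Marcotte has in mind. The names and numbers are hypothetical.

```python
def length_normalized_counts(spectral_counts, protein_lengths):
    """Divide counts by protein length, then rescale to sum to 1."""
    per_residue = {
        protein: spectral_counts[protein] / protein_lengths[protein]
        for protein in spectral_counts
    }
    total = sum(per_residue.values())
    if total == 0:
        raise ValueError("no spectra observed")
    return {protein: value / total for protein, value in per_residue.items()}


# Hypothetical data: the large protein has three times the spectra,
# but it is also three times as long.
spectral_counts = {"large_protein": 30, "small_protein": 10}
protein_lengths = {"large_protein": 900, "small_protein": 300}

normalized = length_normalized_counts(spectral_counts, protein_lengths)
print(normalized)
```

After the correction, the two proteins look equally abundant, which the raw counts alone would not suggest.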
Absolute Quantification: Pipe Dream?
All of these methods are still relative measures of protein abundance. Using label-free quantification to measure absolute protein abundance would be hard to do, says Aebersold. “I think both spectral counting and also the quantification at the precursor ion level has potential to be reasonably quantitatively accurate, provided you have proper calibration at one time,” he says. “Not much work has gone that far.”
To get that far, proteomic researchers need methods to correct and normalize spectral data. This past January, two papers published in Nature Biotechnology investigated ways to make label-free protein quantification more standardized and reproducible. UCLA’s Parag Mallick and his colleagues found a computational way to determine which peptides are more likely to be proteotypic. In the other publication, Edward Marcotte and his group at Texas developed a way to predict how many peptides are likely to show up in a mass spec sample. Merging these predictions with analytical statistics gives a better peek into the actual protein abundance in a sample.
By analyzing the physicochemical characteristics of the amino acids of proteotypic peptides, Mallick and his colleagues created a tool that predicts which of the peptides from a given protein will be proteotypic. This tool does not fully explain why certain peptides are more common, but it does provide a way to correct for the assumption that all peptides are equally likely to appear in mass spec data. “When we started the project, there was a vague observation that there were some peptides that you just kept on seeing over and over and over again. Nobody had any way to explain why those peptides were there,” Mallick says.
Having an a priori expectation of what is going to be in the results can be used as a means of normalizing and refining the statistics surrounding label-free protein quantification. If the prediction says that a certain peptide can be expected nine times out of 10 in the mass spec and it shows up in the data, you can be pretty confident about the experiment. But if there’s little chance that a peptide is going to be in the results and it still pops up, you can assume there may be problems with the data. “The hope was to try to bridge that gap as well so that our models and our probabilities were a little bit more accurate,” Mallick says.
The other paper describes a different yet related tool that predicts the number of peptides that should be seen in mass spectrometry data. The Marcotte lab’s absolute protein expression measurement method, called APEX, takes a quantitative look at protein abundance. By predicting the number of spectra that should be seen for each protein in a mass spec run, it creates a correction factor. The observed number of spectra is then corrected by this expected number to give a quantitative indicator of protein abundance.
Marcotte describes the method by analogy: it’s like SAGE for mRNA, he says. All the proteins from the cell are cut up into peptides, which are equivalent to the SAGE tags. These peptides are then processed through shotgun proteomics (à la sequencing) and the number of spectra produced from a given protein is counted. APEX allows researchers to go back and correct for how many peptides are expected per protein, for protein size, and for different chemical properties to get a better estimate of protein abundance.
“[APEX is] strictly counting spectra or repeat observations of peptides from a given protein and correcting by this expected number, and that gives you the abundance. It’s actually very simple,” Marcotte says. “The goal of the APEX approach was to put things into an absolute standard so you could actually say that protein X was greater in abundance than protein Y.”
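The correction Marcotte describes, counting spectra and dividing by an expected number, can be sketched in a few lines of Python. This is a simplified illustration of the idea as stated here, not the published APEX implementation, which includes details omitted from this sketch. The expected values, the total-abundance scaling, and all names are hypothetical.

```python
def apex_style_abundance(observed_spectra, expected_peptides, total_abundance=1.0):
    """Observed spectral count / expected count, rescaled to a chosen total."""
    ratios = {}
    for protein, n_observed in observed_spectra.items():
        expected = expected_peptides[protein]
        if expected <= 0:
            raise ValueError(f"expected count for {protein} must be positive")
        ratios[protein] = n_observed / expected
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("no spectra observed")
    return {p: total_abundance * r / total for p, r in ratios.items()}


# Hypothetical inputs: protein X yields many readily observed peptides,
# protein Y yields few, so X is expected to produce more spectra per molecule.
observed = {"protein_X": 40, "protein_Y": 10}
expected = {"protein_X": 8.0, "protein_Y": 1.0}

estimates = apex_style_abundance(observed, expected, total_abundance=300.0)
print(estimates)
```

In this made-up case, protein Y comes out as the more abundant of the two even though it produced fewer spectra, which is the kind of cross-protein comparison the correction is meant to allow.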
To be sure, there will be plenty of work as scientists continue to tweak the label-free approaches. Engineering and methods development to help conquer the mass spec’s dynamic range issue and accommodate more variables in data will both be prime areas of focus for the field.
“There’s a lot of technical challenges both for label-free and labeled analysis that require further development. It’s clearly a very important direction for proteomics,” Aebersold says.