Kelleher Lab IDs 5,000-plus Proteoforms in Largest Human Top-Down Analysis to Date


Northwestern University researchers have completed the largest top-down proteomics study of a human cell line to date, identifying 1,220 proteins and more than 5,000 proteoforms in H1299 cells.

The study, detailed in a paper published last week in Molecular & Cellular Proteomics, focused in particular on mitochondrial proteins, identifying 347 of them. That figure represents roughly 23 percent of all annotated mitochondrial proteins and nears the number of IDs made in comparable bottom-up experiments, which have typically identified in the range of 500 to 800 proteins.

The results suggest that – at least for low-mass proteins – top-down proteomics can offer coverage approaching that of bottom-up work, said Adam Catherman, first author on the paper and former graduate student in the lab of Northwestern researcher Neil Kelleher, who led the project.

"In the low-mass [range], we can approach comparable coverage, definitely the same order of magnitude [as bottom-up experiments], and with much more information about [post-translational modifications] and sequence cleavages," he told ProteoMonitor, noting that, "five or 10 years ago," the field "was in a very different place."

Indeed, the MCP paper is the latest in a string of recent high-throughput top-down proteomics studies. For instance, in May, a team led by Pacific Northwest National Laboratory researcher Ljiljana Pasa-Tolic published a paper in Proceedings of the National Academy of Sciences presenting a top-down study of Salmonella typhimurium in which they identified 563 unique proteins and 1,665 proteoforms (PM 5/31/2013). Pasa-Tolic and her colleagues are also preparing for publication of a top-down study of Escherichia coli in which they identified 1,249 unique proteins and more than 4,000 proteoforms.

Kelleher's lab was the first to demonstrate truly high-throughput top-down proteomics, identifying more than 1,000 unique proteins and 3,000 proteoforms in a 2011 Nature paper (PM 11/4/2011).

The recent MCP effort used a workflow similar to that in the Nature study, employing a separations procedure consisting of an initial stage of solution isoelectric focusing, followed by gel-eluted liquid fraction entrapment electrophoresis, then nano-LC and mass spec analysis on a Thermo Fisher Scientific Orbitrap Elite.

The study also employed a mitochondrial purification procedure developed by Kelleher's lab and presented in a paper in the January edition of Analytical Chemistry. That work, said Catherman, who is currently an associate scientist at Genentech, focused primarily on top-down identification of membrane proteins, a class of proteins for which, he noted, the mitochondrial preparation worked particularly well.

Membrane proteins have traditionally proven a challenge for mass spec analysis due to the difficulty of maintaining their solubility during mass spec sample prep and the tendency of solubilizing detergents to interfere with mass spec signals.

With the mitochondrial preparation technique, however, "we think we are able to solubilize membrane proteins fairly well," Catherman said. "So, even without heavy enrichment we are able to see a lot of membrane proteins."

The researchers were aided in their analysis by the fact that membrane proteins "tend to fragment very, very well," he added. "So doing top-down analysis of membrane proteins is interesting because they actually behave quite well ... and are really amenable to identification."

While bottom-up, peptide-based proteomics has traditionally dominated the field, interest in top-down methods has grown as improvements in methods and instrumentation have made analysis of intact proteins easier and researchers have become increasingly aware of the importance of protein isoforms and post-translational modifications.

And, indeed, the Northwestern team identified a large number of protein isoforms, including several novel forms such as varieties of signal peptidase complex subunit 1 and keratinocyte-associated protein 2 that exhibited translational start sites different from those given in their UniProt annotations; a previously unidentified myristoylated plasminogen receptor; and a previously unannotated palmitoylated Golgi vesicular membrane-trafficking protein.

Currently, Catherman said, the researchers are able to do searches on roughly 20 million proteoforms. However, he noted, such searches require significant levels of computing power. For the MCP work, for instance, Catherman and his colleagues used a 168-core computing cluster.

The study demonstrates that, while top-down "is still lagging behind [bottom-up proteomics], it is really getting closer," PNNL's Pasa-Tolic, who was not involved in the project, told ProteoMonitor.

One lingering challenge, however, she noted, is the amount of sample required for top-down workflows, particularly those, like the Northwestern team's, that feature relatively involved separation steps.

"I think the main issue is that if you look at the amount of material that is needed to go through this particular pipeline to pull out this many IDs, it is not nearly as sensitive as bottom-up is at this point," she said.

High-throughput top-down methods face challenges in other areas as well, most notably limitations in terms of protein size. According to Catherman, top-down workflows like that employed by him and his colleagues work best for proteins between 10 kD and 25 kD with "a fairly large drop off at 30 [kD] or 40 [kD]."

This is due to a variety of factors, including poorer fragmentation, more difficulty with chromatographic separation, and more challenging bioinformatics, he said. "Everything becomes more difficult at higher mass."

In addition to extending top-down to higher mass ranges, researchers are also interested in making the technique more quantitative, Catherman noted. For instance, in the MCP paper, the Northwestern team compared normal and senescent H1299 cells, identifying several changes to the senescent proteome, including hyperphosphorylation of the protein HMGA2.

These observed changes, however, did not qualify as "quantitative evidence of a biological difference," the authors wrote, noting that generating such evidence would require "a robust platform for intact protein quantitation that can quantify the relative abundance of proteoforms between treatments."

The move toward quantitation "is an ongoing theme of investigation," Catherman said, adding that it would be particularly important as top-down moved into clinical applications. He said that while there have been some experiments looking into using labeling methods like SILAC and isobaric tagging in top-down research, "none have really been adopted yet."

Quantitation "is something that we really need to bring to the table in order for this technology to become truly applicable for what people want to do," Pasa-Tolic agreed.

The biggest keys to quantitative work, Catherman said, are adapting software for quantitative measurement of intact proteins and, in the case of label-free approaches, achieving reproducible separations. Given the complexity of the separation procedures used for high-throughput top-down work, this latter aspect could prove especially challenging, he noted.

In particular, the isoelectric focusing and gel-free fractionation steps are currently very difficult to perform reproducibly, he said.

"For these multi-stage separations having some sort of internal reference or label would be a much better approach" to quantitation, Pasa-Tolic suggested. The top-down field "has not done these experiments yet, but people have shown in the bottom-up world that using multiple separations stages significantly complicates this label-free quantitation."