Northwestern Team Working to Improve Top-Down Proteomics Data Acquisition Strategies


NEW YORK (GenomeWeb) – Northwestern University researcher Neil Kelleher is working on new data acquisition strategies for top-down proteomics with the aim of improving the performance of top-down techniques while making them more broadly accessible.

In a paper published this month in the Journal of Proteome Research, Kelleher and his colleagues demonstrated the ability of proteoform-dependent data acquisition software to broaden proteome coverage in top-down experiments. They also expanded the mass range accessible to top-down experiments, identifying 439 proteoforms in the 30 to 60 kDa range.

The size and complexity of the proteome make it impossible for a mass spectrometer to analyze all the peptides present in a given experiment, and as a result, researchers use various data acquisition strategies to guide their analysis.

The development and implementation of such strategies is an area of significant attention in traditional bottom-up proteomics, Kelleher noted. "Decision trees, data-dependent acquisition, data-independent acquisition — it has been a large, complex discussion in the field that is still ongoing."

However, there has been less attention paid to data acquisition strategies for top-down work, he said. In 2014, he and his colleagues presented Autopilot, a top-down data acquisition system that uses data from previous mass spec analyses as well as real-time database searching to improve the breadth of top-down analysis while also upping confidence in the accuracy of identifications.

In the recent JPR paper, the researchers used Autopilot with a Thermo Fisher Scientific Q Exactive HF mass spec to profile human fibroblasts, performing a series of experiments in triplicate using different instrument control and data acquisition methods.

In the first runs, the researchers used the standard top-down settings and acquisition methods for the Q Exactive, Kelleher noted. They then followed that up with a pair of runs employing the Autopilot package.

"And you can see [comparing the three runs] pretty substantial differences," he said.

Looking at the protein level, there was fairly significant overlap between the more commonly used data-dependent top-2 approach and Autopilot. The former identified 235 proteins while the latter identified 204, with around 70 percent of these proteins shared.

Looking at the proteoform level, however, the experiments' identifications overlapped much less, around 30 percent. One factor driving this divergence at the proteoform level was the application of Autopilot to a "SIM march" data acquisition strategy in which the proteome is scanned at the MS1 level in a series of fixed-width selected ion monitoring (SIM) scans. After each SIM scan, Autopilot would interpret the resulting MS1 spectra to determine whether they corresponded to previously observed proteoforms or to unobserved or poorly characterized ones. In the former case, the instrument moved on to the next SIM window, while in the latter case, the instrument performed an MS2 scan to identify the proteoform.
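The decision logic of a SIM march can be sketched roughly as follows. This is only an illustration of the idea described above, not Autopilot's actual code: the `PriorRecord` structure, the `needs_ms2` and `sim_march` functions, the mass tolerance, the expectation-value cutoff, and the stand-in `fake_acquire` function are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PriorRecord:
    """What an earlier run recorded about one proteoform (hypothetical structure)."""
    mass_da: float   # intact mass in daltons
    e_value: float   # expectation value of the best earlier identification


def needs_ms2(observed_mass_da, prior_records, mass_tol_da=1.0, e_value_cutoff=1e-4):
    """Return True if an observed intact mass should be sent for an MS2 scan.

    A mass is skipped only when some earlier record matches it within the
    tolerance AND that record was a confident identification. The tolerance
    and cutoff are assumed values, not ones taken from the paper.
    """
    well_characterized = any(
        abs(rec.mass_da - observed_mass_da) <= mass_tol_da
        and rec.e_value <= e_value_cutoff
        for rec in prior_records
    )
    return not well_characterized


def sim_march(start_mz, stop_mz, window_mz, acquire_sim, prior_records):
    """Step through fixed-width SIM windows and collect MS2 targets.

    acquire_sim(low, high) stands in for the instrument: it returns the
    intact masses interpreted from the MS1 spectrum of that window.
    """
    ms2_targets = []
    low = start_mz
    while low < stop_mz:
        high = min(low + window_mz, stop_mz)
        for mass in acquire_sim(low, high):
            if needs_ms2(mass, prior_records):
                ms2_targets.append(((low, high), mass))
        low = high  # move on to the next SIM window
    return ms2_targets


# Made-up data: one confidently identified proteoform, one poorly characterized.
prior = [PriorRecord(15000.0, 1e-12), PriorRecord(22000.0, 0.5)]
spectra = {
    (600.0, 700.0): [15000.3],
    (700.0, 800.0): [22000.2, 31000.0],
    (800.0, 900.0): [],
}


def fake_acquire(low, high):
    return spectra.get((low, high), [])


targets = sim_march(600.0, 900.0, 100.0, fake_acquire, prior)
for window, mass in targets:
    print(f"MS2 on mass {mass} Da seen in SIM window {window}")
```

In this toy run the first window yields only a mass that was confidently identified before, so the march moves on. The second window yields one poorly characterized mass and one never seen before, and both are queued for MS2.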

This process, the authors noted, allowed them to identify a number of lower-abundance proteoforms that went undetected by the conventional approach.

Kelleher said that the goal of the Autopilot work is to incorporate instrument control and acquisition strategies in an essentially automated way where the mass spec instrumentation and its software can make these decisions for researchers based on the goals of their project and data collected on previous samples.

"A lab can achieve the optimization of instrument time," he said. "If last week you did a run with 18 injections, now you have the memory of what was identified and with what metrics and you can say, 'Well, if now I see these proteins with these expectation values or above, then I am fine, they are fully characterized and confidently identified.' [The instrument can] stay away from those and do a deeper dive into the proteome."

"Where we are headed is to really have [Autopilot] run a top-down project," he added. "You say, for instance, 'How much emphasis do I want on quantitative information versus qualitative information? Am I after depth, to get everything I can, or am I after highly differentially expressed proteoforms and that is what I want to characterize?' These are the kinds of questions that underpin our interest in Autopilot. Basically, we want to make the smartest mass spectrometer on the planet."

This sort of built-in capability could help with the larger goal of expanding top-down's reach beyond the realm of specialists, Kelleher noted.

"The challenge is, can we make these instruments for non-experts," he said. "Because the great majority of people are not going to be developing data acquisition logic and real-time decision making. You need a serious informatics crew to pull that off."

Kelleher said that he hopes Thermo Fisher can help with distribution of Autopilot in some capacity, noting that the software is "on their radar." Kelleher's lab developed Thermo Fisher's ProSight software, which the company offers for top-down analysis.

Beyond demonstrating an implementation of the Autopilot package, the JPR study also identified 439 proteoforms in the 30 to 60 kDa mass range, which has typically been out of reach of top-down proteomics experiments, particularly those using benchtop instruments like the Q Exactive.

Key to improving coverage in this higher mass range was use of what Kelleher called "short transient mode," wherein the analysis is based only on the first 10 to 20 milliseconds of data collection. This limits the mass spec's resolution but significantly increases the signal-to-noise ratio, he said.

"When you shoot ions into an Orbitrap, the first 10 or 20 milliseconds you get this high burst of information and it is at its maximum signal at that time," he said. "Then you get dampening and decay of signal after that."

Conducting such a short analysis runs counter to how many might think to approach such an experiment, Kelleher noted. "People say, 'OK, I have this Orbitrap. I'm doing top-down. I need to max out all the settings. I need 400,000 resolving power.'"

But by doing this, "you basically go add a whole bunch of noise," he said, making it difficult to identify proteins in this higher mass range.
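A simple textbook model shows why recording longer can hurt. The sketch below assumes that the ion signal decays exponentially and that the detector noise is white. Neither assumption comes from the paper, and the 15-millisecond decay constant is a made-up value chosen to fall within the 10-to-20-millisecond window Kelleher describes. Under those assumptions, the Fourier-transform peak height grows as decay × (1 − exp(−T/decay)) for a recording of length T, while the noise grows as the square root of T.

```python
import math


def relative_snr(acq_time_ms, decay_ms):
    """Relative signal-to-noise of a Fourier-transform peak (arbitrary units).

    Assumes an exponentially decaying signal and white noise; both are
    modeling assumptions for illustration only.
    """
    signal = decay_ms * (1.0 - math.exp(-acq_time_ms / decay_ms))
    noise = math.sqrt(acq_time_ms)
    return signal / noise


decay_ms = 15.0  # hypothetical signal decay constant
for acq_time_ms in (8, 16, 32, 64, 128, 256, 512):
    snr = relative_snr(acq_time_ms, decay_ms)
    print(f"{acq_time_ms:4d} ms transient: relative S/N = {snr:.2f}")

# Find the integer transient length (in ms) that maximizes S/N in this model.
best_ms = max(range(1, 601), key=lambda t: relative_snr(t, decay_ms))
print(f"best transient length in this model: {best_ms} ms")
```

In this model the signal-to-noise ratio peaks when the recording lasts a little longer than one decay constant, then falls steadily, because every extra millisecond contributes noise but almost no signal.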

The short analysis time limits resolution, making it impossible to resolve different isotopes and limiting the ability to detect certain modifications, Kelleher noted. "But being able to see a protein is the first part of being able to identify it," he said.