Q&A: Neil Kelleher on his Proposal for a Cell-Based Human Proteome Project


Name: Neil Kelleher
Position: Professor of Molecular Biosciences, Chemistry, and Medicine, Northwestern University
Background: PhD, Cornell University; Associate Professor, University of Illinois at Urbana-Champaign

Neil Kelleher is a researcher at Northwestern University and one of the foremost experts on top-down proteomics.

In a keynote presentation at this year's annual meeting of the Human Proteome Organization, Kelleher made the case for top-down proteomics as a technology whose time has come, citing work his lab has done on developing a high-throughput top-down platform (PM 11/4/2011).

Last month, Kelleher followed up this presentation with a paper in the Journal of the American Society for Mass Spectrometry in which he put forth a proposal for what he termed a Cell-Based Human Proteome Project.

While the HUPO-led Chromosome-Centric HPP aims to characterize one representative protein for each gene located on a given chromosome (PM 9/14/2012), Kelleher's CB-HPP hopes to use top-down proteomics to characterize the roughly 250,000 distinct proteoforms in a given human cell and catalog these proteoforms across an estimated 2,500 cell types, ultimately involving "identification, characterization, and quantitation of over 1 billion detectable protein forms," he wrote.

Such a project, Kelleher acknowledges, isn't feasible using current technology. However, he said, by laying out such an ambitious vision he hopes to foster the technical development necessary to eventually bring it about. In the JASMS paper he suggests a timeline under which large-scale proteoform characterization would begin in 2022, with the project completed by 2030.

ProteoMonitor spoke to Kelleher this week about the proposal, potential benefits and challenges, and his near-term plans for pushing it forward.

Below is an edited version of the interview.

As you note in the paper, this is a pretty ambitious proposal. What's your aim in putting it forth?

It is kind of a comparison to the [Human Genome Project] in some general ways and also some specific ways, but one key general way is ... I likened it to throwing the long ball [in football], which is what the genome project did. You won't get the touchdown unless you throw the pass. And, yeah, it's a low probability when you throw it, but you run under it for the next 15 years and make it a reality, and that is an incredibly empowering thing for a field to do. So, it's OK to be bold and say, "We don't have all the technologies today. We can't see a precise, straight line to get it done."

In the article I describe that bottom-up and top-down could work together, of course, but it's clear that the HUPO [initiative] constructions are all in the bottom-up framework, and that now, 10 years in, we see with increasing clarity some of the limitations of the bottom-up technology. So I'm trying, with this article, to add momentum to the following conversation about, if you could catalog proteins using top-down, is it fundamentally better? Take out the practicality of it. And the answer is, for some things, yes. If that's true, then why not throw a long ball? Where other than the [Human] Proteome Project would be the place to describe to society the benefit of measuring proteins better?

Why propose an entirely new HUPO initiative rather than just adding more top-down measurements to existing initiatives like the Chromosome-Centric HPP?

Well, we as analytical and measurement scientists in mass spectrometry-based proteomics are a subset of scientists. If a Big Science project was going to begin to spin up over the next five or 10 years that could bring the kinds of [large-scale] resources and discussions about what the value of such a project might be, you have to get the [wider] biological community to buy in.

That inevitably led to the question of, well, if proteins are context dependent in their expression, what is that primary context? And you can't have it all. You can't say, "I want proteoforms in tissue and cells and organelles and protein complexes." It's too big of a project. So that is the question: of those different things, on the level of hierarchy of the human body, the way that it's organized, what level is the most important biologically?

There are two separate [tenets embedded in the project.] One is a biological organizational tenet, and the other is an analytical one. Biologists have all sorts of reactions when you talk about cell types, so that is provocative in itself. As provocative as top-down [proteomics] is in analytical circles, so too is the notion of cell types in biological circles. So I'm sort of taking up two fronts at once.

There is a continuum of cell types, but we can still put them in bins, and there is also the recent development of mass cytometry, which aligns quite well [with the proposal]. For instance, in the hematopoietic system, you start with a human stem cell in the bone marrow, and it then proceeds to develop into all the differentiated cell types in the blood. So that's a continuum. But in [a 2011 mass cytometry paper in Science (PM 5/13/2011)] [DVS Sciences founder] Scott Tanner and [Stanford University professor] Garry Nolan do unsupervised clustering and report something like 288 distinct nodes which can be used to define types of [hematopoietic] cells.

So, there is a wide range of people's responses to the notion of a cell type, and biology tends to focus on single cell methods, but I thought that single cell proteomics was just too much technology – that would really be a long ball. To be able to really do deep proteomics in a single cell is something that even with five to 10 years of focused effort I'm not sure we'll be there. There is some practicality here, and there are notions [of the number of human cell types] – there are two main ontologies that circle around a notion of 2,000 to 2,500 cell types. And I'm sure each cell is different in some molecule somewhere, but the fact of the matter is that they can be binned into some groups.

How do you imagine now moving this proposal forward?

Well, the Consortium for Top-Down Proteomics, which is a group of folks who are like-minded – at least in terms of the technology – those like-minded folks could help and actually start to execute a project on a pilot basis on highly defined cell types that are easy to get to, like in the blood – so cell-specific proteomics, which you're starting to see reports of, in the literature anyway, using bottom-up. [Cell-specific proteomics] is already starting because you don't want to analyze the whole resected colon tissue. What you want to know is what is going on in specific subsets of cells, and so people are starting to sort cells and then do proteomics on them.

So the notion of cell type as a unit for organizing a proposal like yours already has a certain amount of momentum?

There are hints of it. It's not like everyone is doing it, because it really increases the bar for sampling. You have to do it very carefully, and it also reduces the amount of sample that you have. So it’s a pain from both perspectives, but the value of the information will sharply go up. That's the value proposition.

Do you have any specific projects related to the proposal underway?

Yeah, we're doing just the very beginnings, which is looking at very defined protein mixtures to see what everybody detects, and how we are going to co-report this data. Because it's clear that a lot of other consortia are not paying attention to the issues that top-down faces. It's different from bottom-up. [For instance,] even UniProt can't take in the full richness of top-down data. It's just not a top-down world. So all these tools have to be built, and so we're doing that, and if you go onto the [Consortium for Top-Down Proteomics] web site you can see early elements of this kind of thing.

So the notion is that having the proposal out there and taken seriously will foster the sort of technology development needed to undertake it?

Yes. That is exactly what happened in the genome project. You map your analytical target; you understand it; you get ready to sample it precisely; and at the same time when you're mapping your target, you develop [for instance] a sequencer. That's why I say that the genome project teaches us a lot about how to organize a big science project. In 1989, no one could do high-throughput genome sequencing, but that didn't stop them from talking about the value of such measurements when and if they should become possible.

What is the ultimate goal of the project you've proposed? What technology benchmarks do you need to hit for it to be feasible?

When you can get 250,000 proteoforms detected and quantified for $1 each, and you can do it in [sample sizes of] like 10,000 cells. So far we can get 3,000 proteoforms from roughly 10^7 cells. Obviously we've got our work cut out for us. But [if] you put people on a focused goal and resource it, it's amazing what the world can respond with.

Those who are at the interface of genomics and proteomics, what they see is disruption in genomics every few years. The market is going this way, the trends are going this way, and then, whoa, three years later the applecart has been upset and we're now talking bananas. Proteomics, [on the other hand,] just kind of bumps along. Iteration is the rule of the day. It's just a slower burn, and I'm in no particular hurry because the world changes only slowly.
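
For readers who want a sense of scale, the benchmark figures Kelleher cites can be turned into rough gap estimates. The following is a minimal back-of-the-envelope sketch, not part of his proposal. It assumes the current sample size is 10^7 cells, and the function and variable names are hypothetical, introduced only for illustration.

```python
# Back-of-the-envelope comparison of the benchmarks quoted in the interview.
# All names here are hypothetical; the numbers are those cited by Kelleher,
# with the current sample size assumed to be 10^7 cells.

TARGET_PROTEOFORMS = 250_000    # proteoforms per cell type (stated goal)
TARGET_CELLS = 10_000           # cells per sample (stated goal)
CURRENT_PROTEOFORMS = 3_000     # proteoforms detected today
CURRENT_CELLS = 10**7           # cells per sample today (assumed reading)
COST_PER_PROTEOFORM_USD = 1     # stated cost target


def benchmark_gaps():
    """Return simple ratios between current performance and the stated goals."""
    return {
        # How many times more proteoforms must be detected per sample
        "coverage_gap": TARGET_PROTEOFORMS / CURRENT_PROTEOFORMS,
        # How many times fewer cells the sample must contain
        "input_gap": CURRENT_CELLS / TARGET_CELLS,
        # Cost of one cell type at the stated price point, assuming a
        # single pass over all 250,000 proteoforms (an assumption)
        "cost_per_cell_type_usd": TARGET_PROTEOFORMS * COST_PER_PROTEOFORM_USD,
    }


if __name__ == "__main__":
    for name, value in benchmark_gaps().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, the goal implies detecting roughly 83 times more proteoforms from samples about 1,000 times smaller than those used today.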