NEW YORK – Researchers affiliated with the Human Proteoform Project have produced an initial reference map of the proteoforms present in human hematopoietic cells.
Termed the Blood Proteoform Atlas, the resource was presented last month in a paper published in Science and is the first of what the project's participants expect will be a series of efforts to catalog proteoforms in various human cell types. Ultimately, the project aims to characterize the proteoforms in 5,000 cell types at a depth of 1 million proteoforms per cell, making for a total measurement of roughly 5 billion proteoforms, with an estimated 50 million of those being unique.
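The project's headline totals follow from simple multiplication. As a back-of-the-envelope sketch in Python, using only the figures quoted above (the variable names are illustrative and do not come from the project):

```python
# Figures as quoted in the article; names are illustrative only.
CELL_TYPES = 5_000                 # cell types the project aims to profile
DEPTH_PER_CELL_TYPE = 1_000_000    # target proteoforms measured per cell type
ESTIMATED_UNIQUE = 50_000_000      # project's estimate of unique proteoforms

# Total proteoform measurements across all cell types.
total_measurements = CELL_TYPES * DEPTH_PER_CELL_TYPE

# Share of those measurements expected to be unique proteoforms.
unique_fraction = ESTIMATED_UNIQUE / total_measurements

print(f"Total measurements: {total_measurements:,}")
print(f"Unique share: {unique_fraction:.0%}")
```

On these figures, about 1 percent of the roughly 5 billion measurements would correspond to unique proteoforms.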
In the Science study, the researchers identified a total of 56,813 proteoforms across 21 different human hematopoietic cell types and plasma, 29,620 of which were nonredundant. The number identified in individual cell types ranged from 9,991 in Pan T cells to 303 in naïve B cells.
Most, if not all, proteins exist in a variety of forms, with slightly different amino acid sequences due to splice variants, different lengths due to truncations, or different combinations of post-translational modifications. These different forms are known as proteoforms, and the presence and proportion of different proteoforms within a cell are key to all manner of biological processes, influencing protein localization, protein-protein interactions, and cell signaling, among other things. To fully understand the role proteins play in different aspects of biology and disease, it will likely be necessary to understand not just which proteins are expressed under different conditions, but also which specific proteoforms are present.
This is the rationale driving the Proteoform Project, which its founders formally proposed last fall. Confidently measuring proteoforms at proteome scale is a daunting challenge, however. Proteomics is just now, after more than two decades of research and technical development, reaching the point where experiments can detect proteins corresponding to most protein-coding genes. And while there has been extensive research into certain specific kinds of proteoforms (phosphorylated proteins, for instance), the study of proteoforms has generally lagged behind.
The findings from the recent Science study indicate how wide the gap currently is between existing technical capabilities and what will be necessary to create a human proteoform map with the breadth and depth the Proteoform Project researchers envision.
For instance, top-down mass spectrometry, the technology most commonly used for proteome-scale proteoform research, struggles to measure larger proteins. In the Science study, 93 percent of the identified proteoforms were less than 20 kDa in molecular weight. This molecular weight cutoff leaves out the majority of the human proteome, including, the study authors noted, larger proteins that are key to various cellular processes and that could have more proteoforms per protein than average. Extrapolating from their findings, the researchers calculated that primary hematopoietic cells probably feature another 50,000 proteoforms in the range of 30 kDa and below, and that the average number of proteoforms in a human cell type is around 1.1 million.
In short, the study captured only a small fraction of the proteoforms present in the human body, indicating, as the authors noted, "a clear need to improve technologies" for proteoform research.
"We need a 100-fold jump in the technology, whatever form that takes," said Neil Kelleher, director of the Chemistry of Life Processes Institute at Northwestern University and senior author on the Science paper as well as one of the leaders of the Proteoform Project. "We need to go from 10,000 proteoforms in a given cell type, to a million, and in the same amount of time and for 1/100th of the cost."
At the moment, he said, mass spectrometry remains the obvious technology for proteoform mapping, but he expects the field will see disruption from emerging approaches, including, perhaps, those of recently launched proteomics tools companies like Nautilus Biotechnology or Quantum-Si, both of which have highlighted their platforms' abilities to investigate proteins at the proteoform level.
Kelleher and his collaborators have put the price of the Proteoform Project at around $1.3 billion over 10 years — a substantial investment, but one that he argued will foster an array of new technologies and businesses much as the Human Genome Project has done.
"Everybody will feast from a defined proteome, and all the return on investment of these [recently launched proteomics firms] will go up," he said. "Because they are going to run into proteoform biology. It is fundamental, foundational. Whether you are Nautilus, Seer, Quantum-Si or any of the others, until we define the molecular landscape of the proteome, we will continue to be on a slippery slope climbing uphill."
While some of these newer firms project that their platforms will be capable of de novo protein sequencing, all currently rely to some extent on protein or peptide fingerprinting — matching signals generated from an incomplete set of a molecule's amino acids to a library of expected signals, much as is done in traditional mass spec-based proteomics. This suggests that a comprehensive proteoform library could offer value by providing a defined search space.
Kelleher and his colleagues aimed with their Science paper to give a taste of the value more thorough proteoform profiling could bring.
Among their observations was that, on average, a particular proteoform was found in 2.19 cell types, compared to 6.51 cell types for proteins, indicating that proteoforms provide more finely graded distinctions between cells. In a similar vein, the mean number of nonredundant proteoforms they identified per cell was 1,346, roughly 18-fold the 76 nonredundant proteins they identified per cell.
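The two comparisons in this paragraph can be checked directly from the quoted means. A minimal sketch in Python, assuming nothing beyond the four figures above (the variable names are illustrative):

```python
# Means as quoted in the article; names are illustrative only.
CELL_TYPES_PER_PROTEOFORM = 2.19   # mean cell types in which a proteoform appears
CELL_TYPES_PER_PROTEIN = 6.51      # mean cell types in which a protein appears
PROTEOFORMS_PER_CELL = 1_346       # mean nonredundant proteoforms per cell
PROTEINS_PER_CELL = 76             # mean nonredundant proteins per cell

# How many times more proteoforms than proteins were identified per cell.
fold_more_proteoforms = PROTEOFORMS_PER_CELL / PROTEINS_PER_CELL

# How many times more widely shared a protein is than a proteoform.
sharing_ratio = CELL_TYPES_PER_PROTEIN / CELL_TYPES_PER_PROTEOFORM

print(f"Proteoforms per protein identified: {fold_more_proteoforms:.1f}x")
print(f"Protein vs. proteoform cell-type spread: {sharing_ratio:.1f}x")
```

By this arithmetic, the per-cell difference works out to a little under 18-fold, and a typical protein turned up in about three times as many cell types as a typical proteoform.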
They also provided a demonstration of the potential clinical utility of a proteoform-based approach, identifying a set of blood-based proteoforms that could help identify liver transplant patients showing early signs of rejection.
The ability of these proteoforms to identify these patients persisted "from the pilot cohort, then discovery, [to] the verification," Kelleher said, adding that they are now testing the markers in a set of 40 liver transplant patients from multiple sites being sampled longitudinally across five time points.
"Proteoforms should be more faithful reporting on the underlying biology and serve as better correlates of the clinical measures you are trying to put in place," he said. "Because they are more specific. They are carrying more information than a single peptide."