NEW YORK – Single-molecule protein measurement technologies have attracted great hype and serious investment in recent years, but a new analysis suggests proteome-scale analyses remain far off.
In an arXiv preprint published last month, researchers examined the feasibility of using single-molecule methods for proteomic studies, finding that in terms of both cost and coverage, such techniques still fall short of existing mass spectrometry approaches.
Given that several of the study authors are prominent mass spec-based proteomics researchers, this conclusion is perhaps unsurprising. Nonetheless, it matches trends in the single-molecule protein analysis space, where, despite some talk of proteome-scale capabilities, most companies and researchers are pursuing more targeted applications.
Michael MacCoss, first author on the preprint and professor of genome sciences at the University of Washington, said that while a number of publications have detailed the ability of different single-molecule approaches to determine peptide and protein sequences, few if any have looked at how many of those sequences will have to be analyzed to characterize proteomes with reasonable depth of coverage.
MacCoss noted that in mass spec-based proteomics, each spectrum represents millions of ions, which, he said, meant mass spec approaches "had to be measuring as many or more molecules than a lot of these [single-molecule] methods were proposing to do."
That realization led MacCoss and his coauthors to look more closely at the number of molecules a platform would need to measure were it to analyze the human proteome at the level of individual proteins.
A typical human cell contains about 300,000 mRNA molecules and about 10 billion protein molecules, they noted, suggesting that to achieve a similar depth of coverage, a single-molecule proteomic experiment would need to generate roughly 30,000-fold more reads than a comparable RNA-seq experiment. Given that some guidelines recommend between 5 million and 200 million reads per sample for RNA-seq experiments, that would put requirements for single-molecule proteome sequencing in the hundreds of billions or even trillions of reads per experiment.
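The scaling argument can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the function names are illustrative and are not taken from the preprint, and the simple proportional scaling is an assumption made for this back-of-envelope check rather than the authors' full model.

```python
# Back-of-envelope check of the read-depth scaling described in the article.
# All constants are the figures quoted in the text; the function names are
# hypothetical and the linear scaling is a simplifying assumption.

MRNA_PER_CELL = 300_000               # approx. mRNA molecules per human cell
PROTEINS_PER_CELL = 10_000_000_000    # approx. protein molecules per human cell
RNASEQ_READS_LOW = 5_000_000          # low end of recommended RNA-seq reads per sample
RNASEQ_READS_HIGH = 200_000_000       # high end of recommended RNA-seq reads per sample


def fold_increase(proteins: int = PROTEINS_PER_CELL,
                  mrnas: int = MRNA_PER_CELL) -> float:
    """Ratio of protein molecules to mRNA molecules in a cell."""
    return proteins / mrnas


def required_protein_reads(rnaseq_reads: int) -> float:
    """Scale an RNA-seq read depth by the protein-to-mRNA ratio."""
    return rnaseq_reads * fold_increase()


if __name__ == "__main__":
    print(f"fold increase: {fold_increase():,.0f}")
    print(f"low estimate:  {required_protein_reads(RNASEQ_READS_LOW):.2e} reads")
    print(f"high estimate: {required_protein_reads(RNASEQ_READS_HIGH):.2e} reads")
```

The exact ratio comes out slightly above 33,000, which the article rounds to about 30,000. The resulting range runs from the low hundreds of billions to several trillion reads, in line with the estimate above.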
That, they noted, exceeds the capabilities of existing next-generation sequencing platforms and is well beyond the throughput of single-molecule protein analysis technologies. Quantum-Si President and COO Patrick Schneider, for instance, said that his company expects its platform will, at launch, be able to make 200,000 peptide reads per experiment. Single-molecule firm Nautilus Biotechnology has shown it can affix up to 10 billion individual proteins to its chips (though it has not yet shown it can identify these proteins), but even this figure is well below the number of reads that MacCoss and his coauthors project as required for whole proteome analysis.
A typical mass spec experiment also falls well short, with a 90-minute run measuring around 1 billion peptides, the researchers noted. MacCoss said, however, that mass spec experiments achieve an effective dynamic range that is much larger than would be assumed based on the number of peptides measured. This is because mass spec proteomic experiments use upfront separation, typically by liquid chromatography, to simplify the sample before analysis. This upfront separation means that different peptides enter the mass spec at different times, ensuring that signal from the highest-abundance peptides doesn't dominate the analysis at the expense of lower-abundance peptides.
This upfront separation also means mass spectrometry is able to decouple identification and quantification of molecules, with, as MacCoss noted, a mass spectrum matched to a particular peptide representing hundreds of thousands or millions of individual molecules.
Javier Alfaro, data science group leader at the University of Gdansk and coauthor on the preprint, said that he believes that upfront separation and sorting of molecules and the consequent decoupling of identification and quantification could be a way forward for single-molecule proteomic technologies.
Alfaro added that he is perhaps more optimistic about single-molecule approaches than some of his coauthors, but said that he would like to see more work in the field around the use of upfront sample separation.
"For me, the reason why I came onto this paper is because I don't see that as an aspect of single-molecule [research] that is being pursued," he said. "I think it is maybe because the scale of the problem is misunderstood or hasn't been stated explicitly in the literature. This paper really talks about what is the scale of the dynamic range problem and the sheer size of the human proteome just to try to encourage innovation in the field."
Parag Mallick, Nautilus's cofounder and chief scientist, said in an email that the paper "makes some great points about the challenges of proteomics analysis, particularly with regard to dynamic range."
He suggested, though, that the authors did not address one of the main advantages of single-molecule approaches — the fact that the identifiability of molecules is not linked to their abundance.
"In [mass spec] low-abundance species are less likely to have peptides ionize, and then when ionized, they generate less ions, which can make for poorer MS/MS spectra that are harder to identify," he said. "In single-molecule methods, abundance and identifiability are decoupled as each molecule is studied independently."
Single-molecule methods also offer the possibility of more finely grained analyses, allowing researchers to more accurately measure specific protein forms, including rare variants, that might go undetected in a typical mass spec experiment.
Of course, this requires sufficient dynamic range to ensure the instrument gets a chance to study these low-abundance single molecules.
Henry Brinkerhoff, a postdoctoral researcher at the University of Washington whose work focuses on nanopore-based protein sequencing, acknowledged that current single-molecule technologies lack the scale to tackle proteome-wide experiments.
While the feasibility of large-scale nanopore-based protein sequencing has yet to be demonstrated, assuming it does prove possible, a chip with 1,000 nanopores on it could plausibly sequence perhaps a million proteins per hour "if you're really lucky and everything is going nicely," Brinkerhoff said. "If you stack 100 of those chips, you're still like a factor of 100 away from what you really need."
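Brinkerhoff's estimate can be unpacked with simple arithmetic. The sketch below uses the numbers from his quote; the implied target throughput is inferred from his "factor of 100" remark and is an assumption, as is treating that target as an hourly rate. The function names are illustrative only.

```python
# Rough unpacking of the nanopore throughput estimate quoted above.
# The implied target is an inference from the "factor of 100" remark,
# not a figure stated in the article.

PORES_PER_CHIP = 1_000                  # nanopores on one hypothetical chip
PROTEINS_PER_CHIP_PER_HOUR = 1_000_000  # optimistic per-chip rate from the quote
CHIPS_STACKED = 100                     # number of chips in the hypothetical stack
REMAINING_SHORTFALL = 100               # "a factor of 100 away"


def per_pore_rate(per_chip: int = PROTEINS_PER_CHIP_PER_HOUR,
                  pores: int = PORES_PER_CHIP) -> float:
    """Proteins sequenced per pore per hour implied by the per-chip rate."""
    return per_chip / pores


def stacked_throughput(chips: int = CHIPS_STACKED,
                       per_chip: int = PROTEINS_PER_CHIP_PER_HOUR) -> int:
    """Proteins per hour across a stack of identical chips."""
    return chips * per_chip


def implied_target(shortfall: int = REMAINING_SHORTFALL) -> int:
    """Throughput needed if the stack is still `shortfall`-fold short (assumed hourly)."""
    return stacked_throughput() * shortfall


if __name__ == "__main__":
    print(f"per pore:       {per_pore_rate():,.0f} proteins/hour")
    print(f"100-chip stack: {stacked_throughput():,} proteins/hour")
    print(f"implied target: {implied_target():,} proteins/hour")
```

Under these assumptions the implied target is about 10 billion proteins, the same order of magnitude as the protein content of a single human cell cited earlier in the article.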
He is optimistic, though, that the density of such arrays can be increased significantly.
"There's no reason why the density of nanopore experiments can't be increased by factors of 10, 100, 1,000," he said. "The fundamental limitation is how many individually addressable electrodes can you cram onto a chip, and that is a question that is well answered just by looking at a [charge-coupled device] for a camera. You can get a chip that has gigapixels."
Amit Meller, professor of biomedical engineering at Technion – Israel Institute of Technology, likewise said he believes that massively scaling nanopore-based measurements is possible, at least in theory.
"I think that we have all the components that we need to build a chip like this," he said. "In terms of basic science and what we understand about nanopores, I think we do have what we need."
However, he added, "there is a gap between what people have done in labs [and] building a system that is working."
For nanopores, the issue is largely one of "taking the fabrication process of nanopores from the laboratory to the mainstream fabrication facilities, [and] adapting these processes to the tools that are used in advanced nanofabrication," he said. "If that is done, you could have a huge breakthrough."
Meller is also exploring protein separation approaches, including chip-based chromatography and electrophoresis, that could be coupled to single-molecule techniques. In 2020, he and his colleagues published a paper in Scientific Reports in which they presented a device for on-chip gel electrophoresis and used it to separate proteins isolated from a human cancer cell line.
"You can now produce nanochips that would separate millions of proteins by mass-to-charge ratio and do it very effectively," he said. "I think this would be very useful because it basically takes the whole proteome, separates it by mass or mass-to-charge-ratio, and then you can use single-molecule identification downstream."
"This is definitely something that I am very interested in," he said.