NEW YORK (GenomeWeb) – University of Texas researchers have developed an Edman degradation-based approach to single-molecule protein measurements.
Detailed in a paper published this week in Nature Biotechnology, the method could allow researchers to analyze complex protein mixtures with the high sensitivity and digital quantitation that technologies like next-generation sequencing have brought to nucleic acid experiments, said Edward Marcotte, professor of molecular biosciences at UT and senior author on the study.
Marcotte and his colleagues have founded a company, Erisyon, to commercialize the technology.
Their work is part of a small but growing effort by a number of labs to develop single-molecule protein analysis techniques. According to Chirlmin Joo, director of the Kavli Institute of Nanoscience at Delft University of Technology, more than a dozen research groups globally are currently focused on this pursuit.
Joo, who was not involved in the UT work, said that perhaps half of those groups are working on nanopore sequencing-based approaches, while the rest are exploring a variety of other methods.
Marcotte's group falls into this latter half. Their method, which Marcotte described as blending elements of NGS and mass spectrometry, uses fluorescent labeling of specific amino acid residues on target peptides followed by Edman degradation of those peptides. By immobilizing the peptides on glass slides and using microscopy to measure decreases in fluorescence as the labeled amino acid residues are removed from these peptides via Edman degradation, they are able to obtain partial sequences of these molecules.
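The readout step can be sketched in a few lines of Python. This is a minimal illustration under idealized assumptions (each Edman cycle removes exactly one N-terminal residue, no dye is lost, and each labeled amino acid type has its own color channel); the dot notation for unlabeled positions and the function names are hypothetical, not taken from the UT group's software.

```python
from typing import Dict, List, Set


def fluorosequence(peptide: str, labels: Set[str], n_cycles: int) -> str:
    """Partial sequence visible after n_cycles of Edman degradation:
    labeled residues keep their letter, all others become '.'."""
    return "".join(aa if aa in labels else "." for aa in peptide[:n_cycles])


def intensity_trace(peptide: str, labels: Set[str], n_cycles: int) -> List[Dict[str, int]]:
    """Dyes remaining in each color channel, imaged before the first
    cycle and after every Edman cycle (idealized: no losses)."""
    trace = []
    for removed in range(n_cycles + 1):
        remaining = peptide[removed:]  # N-terminal residues already cleaved off
        trace.append({aa: remaining.count(aa) for aa in sorted(labels)})
    return trace


def decode_trace(trace: List[Dict[str, int]]) -> str:
    """Call a labeled residue wherever a channel's intensity drops."""
    calls = []
    for before, after in zip(trace, trace[1:]):
        dropped = [aa for aa in before if after[aa] < before[aa]]
        # Exactly one residue is removed per cycle here, so at most one channel drops.
        calls.append(dropped[0] if dropped else ".")
    return "".join(calls)


trace = intensity_trace("ACDKCK", {"C", "K"}, n_cycles=6)
print(trace[0])                                  # {'C': 2, 'K': 2}
print(decode_trace(trace))                       # .C.KCK
print(fluorosequence("ACDKCK", {"C", "K"}, 6))   # .C.KCK
```

In this toy case the drops in the C and K channels recover the positions of the labeled residues, while the unlabeled A and D show up only as blanks, which is why the result is a partial rather than a full sequence.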
They can then match these partial sequences to a reference database to make peptide and protein identifications.
The first portion of the workflow is analogous to some NGS workflows, Marcotte noted, in that it is a sequencing-based approach that monitors changes in fluorescence to analyze millions of molecules in parallel.
The database-matching portion of the workflow, on the other hand, is akin to bottom-up mass spec methods wherein researchers compare experimentally generated peptide data to a reference database to make peptide identifications.
"We can interpret the data very similarly [to mass spec data] by taking protein sequences from a reference database, generating [in silico] fluoro-sequences and comparing the [experimentally generated] fluoro-sequences," Marcotte said, adding that the technique should be able to use many of the same approaches for assessing false discovery rates currently used for mass spec analysis.
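A toy version of the database-matching step might look like the following. The reference proteins and their peptides are invented for illustration, the labels are cysteine and lysine as in the study, and a read is assigned to a protein only when its fluorosequence is unique. The function names are hypothetical, and the sketch omits the error modeling and false discovery rate estimation a real pipeline would need.

```python
from collections import defaultdict
from typing import Dict, List, Set


def fluorosequence(peptide: str, labels: Set[str], n_cycles: int) -> str:
    """Labeled residues keep their letter, all others become '.'."""
    return "".join(aa if aa in labels else "." for aa in peptide[:n_cycles])


def build_index(reference: Dict[str, List[str]], labels: Set[str],
                n_cycles: int) -> Dict[str, Set[str]]:
    """Map each in silico fluorosequence to every protein that could yield it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for protein, peptides in reference.items():
        for peptide in peptides:
            index[fluorosequence(peptide, labels, n_cycles)].add(protein)
    return dict(index)


def match(read: str, index: Dict[str, Set[str]]) -> str:
    """Assign a read only when its fluorosequence points to one protein."""
    hits = index.get(read, set())
    if len(hits) == 1:
        return next(iter(hits))
    return "ambiguous" if hits else "unmatched"


# Hypothetical reference: proteins already digested into peptides.
reference = {
    "protA": ["ACDKCK", "GKLLR"],
    "protB": ["MCKAAR", "ACEKCK"],
    "protC": ["PPCDDK"],
}
index = build_index(reference, labels={"C", "K"}, n_cycles=6)
for read in [".CK...", "..C..K", ".C.KCK", "C....."]:
    print(read, "->", match(read, index))
# .CK... -> protB
# ..C..K -> protC
# .C.KCK -> ambiguous
# C..... -> unmatched
```

The ".C.KCK" read is ambiguous because ACDKCK and ACEKCK differ only at an unlabeled position, the kind of collision that additional labels or longer reads would resolve.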
After making peptide or protein identifications, however, the approach once again becomes analogous to an NGS workflow, he said, where "quantification is digital."
"We are directly counting the molecules," he said. "So, it looks just like RNA-seq data, and we use counting-based statistics to figure out the abundances."
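Counting-based quantitation can be sketched as follows, assuming each read has already been assigned to a protein, or flagged as ambiguous or unmatched, during database matching. The read counts are invented, and the square-root-of-count error bar is a simple Poisson approximation chosen for illustration, not necessarily the statistics used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

UNASSIGNED = {"ambiguous", "unmatched"}


def abundances(assignments: Iterable[str]) -> Dict[str, Tuple[int, float, float]]:
    """Per protein: read count, fraction of assigned reads, and the
    Poisson standard error of the count (sqrt(n))."""
    counts = Counter(a for a in assignments if a not in UNASSIGNED)
    total = sum(counts.values())
    return {
        protein: (n, n / total, math.sqrt(n))
        for protein, n in counts.most_common()
    }


# Hypothetical read assignments from one run.
reads = ["protB"] * 300 + ["protC"] * 100 + ["ambiguous"] * 50 + ["unmatched"] * 50
for protein, (n, frac, se) in abundances(reads).items():
    print(f"{protein}: {n} ± {se:.1f} reads, {frac:.0%} of assigned reads")
# protB: 300 ± 17.3 reads, 75% of assigned reads
# protC: 100 ± 10.0 reads, 25% of assigned reads
```

Because each molecule contributes one read, abundance estimates come straight from the counts, much as transcript counts do in RNA-seq.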
Such a single-molecule counting approach could enable far more sensitive assays than mass spec-based approaches, Marcotte said. In the Nature Biotech paper, the researchers identified peptides in simple mixtures at zeptomole levels.
The researchers are currently able to measure around one million molecules in parallel, but Marcotte said that scaling up to a billion molecules would be fairly simple.
"We already know how to do that based on the next-gen sequencing field," he said. "It's mainly just more real estate on the slide. The lessons are pretty straightforward."
More challenging are issues around fluorescent labeling of target amino acids. While NGS typically uses a synthesis-based approach that tracks the addition of nucleotides to nucleic acids, the UT method is degradation-based, tracking the removal of amino acids from the peptides being analyzed.
This means that the fluorescent dyes used for labeling must withstand the relatively harsh chemical conditions used in Edman degradation, and identifying dyes that continue to work under these conditions has proved a challenge, Marcotte said.
"We screened around 25 dyes to get two that are able to survive the chemistry," he said. "We only have two dyes at the moment, but presumably we will identify others."
He said that with labeling of two amino acids (cysteine and lysine in the Nature Biotech study), the technique can accurately identify proteins in mixtures containing on the order of 1,000 different proteins. Labeling of four amino acids would allow it to accurately identify proteins in mixtures with the complexity of the human proteome, he said.
The dyes are linked to reactive handles that selectively label the desired amino acids, and thus far Marcotte and his colleagues have managed to specifically tag five of the 20 standard amino acids with those handles.
"I don't see any reason why we couldn't get up to half of the [20 amino acids]," he said, adding that the number of amino acids labeled could depend on the goal of a given experiment.
"It gets into the information content and how rich you want your reads to be," he said, comparing it to tuning read lengths in DNA sequencing. "There may be cases where short reads are perfectly adequate if you want to do quantitation, and sometimes you may want longer reads because you want to do something that involves knowing the structure of the gene. In our case we can choose the number of labels according to the application."
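The trade-off between the number of labels and read length can be illustrated on a hypothetical set of six peptides by asking what fraction of them produce a unique fluorosequence. Cysteine and lysine are the labels used in the study; adding aspartate and glutamate as third and fourth labels is purely an assumption for this sketch, and proteome-scale behavior depends on far more than a toy example captures.

```python
from collections import Counter
from typing import List, Set


def fluorosequence(peptide: str, labels: Set[str], n_cycles: int) -> str:
    """Labeled residues keep their letter, all others become '.'."""
    return "".join(aa if aa in labels else "." for aa in peptide[:n_cycles])


def fraction_unique(peptides: List[str], labels: Set[str], n_cycles: int) -> float:
    """Share of peptides whose fluorosequence no other peptide shares."""
    seqs = [fluorosequence(p, labels, n_cycles) for p in peptides]
    counts = Counter(seqs)
    return sum(1 for s in seqs if counts[s] == 1) / len(seqs)


# Hypothetical peptides; D and E as extra labels are an assumption.
peptides = ["ACDKCK", "ACEKCK", "GCYKAK", "MCKWAR", "SCKDAR", "PKDCEK"]
two_labels = {"C", "K"}
four_labels = {"C", "K", "D", "E"}

print(f"{fraction_unique(peptides, two_labels, 6):.2f}")   # 0.33
print(f"{fraction_unique(peptides, four_labels, 6):.2f}")  # 1.00
print(f"{fraction_unique(peptides, four_labels, 3):.2f}")  # 0.67
```

In this toy set, going from two to four labels makes every peptide distinguishable, while cutting the read to three cycles reintroduces a collision, which echoes Marcotte's comparison to read length in DNA sequencing.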
Marcotte said the technique's major sources of error are more similar to those encountered in NGS than mass spec.
"Unlike mass spec where you make errors based on mass — like if [a peptide] is isobaric you can make attribution errors — here we make sequencing errors," he said. "We have insertions and deletions and substitutions. If the chemistry for cutting off one amino acid fails on one round and we pick it up the next round, that causes an apparent insertion in the sequence. If we destroy a dye, that looks like a substitution."
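A deterministic sketch shows how the two failure modes Marcotte describes distort a read. It assumes a simplified model in which a failed Edman cycle removes nothing and a destroyed dye is lost before imaging begins, so its residue simply looks unlabeled; the function and parameter names are hypothetical.

```python
from typing import AbstractSet


def observed_fluorosequence(peptide: str, labels: AbstractSet[str], n_cycles: int,
                            failed_cycles: AbstractSet[int] = frozenset(),
                            lost_dyes: AbstractSet[int] = frozenset()) -> str:
    """Simulated read with injected errors (all indices 1-based).

    failed_cycles: Edman cycles in which no residue is cleaved off.
    lost_dyes: residue positions whose dye was destroyed before imaging began.
    """
    calls = []
    position = 1  # 1-based position of the current N-terminal residue
    for cycle in range(1, n_cycles + 1):
        if cycle in failed_cycles or position > len(peptide):
            calls.append(".")  # nothing cleaved, so no intensity drop is seen
            continue
        aa = peptide[position - 1]
        visible = aa in labels and position not in lost_dyes
        calls.append(aa if visible else ".")
        position += 1
    return "".join(calls)


labels = {"C", "K"}
print(observed_fluorosequence("ACDKCK", labels, 6))                     # .C.KCK
print(observed_fluorosequence("ACDKCK", labels, 6, failed_cycles={2}))  # ..C.KC
print(observed_fluorosequence("ACDKCK", labels, 6, lost_dyes={4}))      # .C..CK
```

Here a single failed cycle shifts every later label one position to the right (an apparent insertion), while a lost dye turns a K into a blank (an apparent substitution).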
"Right now, just like the early days of DNA sequencing, [the technology] still has a lot of errors," he said. "Improving our dyes and extending our read lengths and improving the efficiency of all the processes and adding more colors are the obvious areas for improvement."
Like other proteomic approaches, the method will require upfront sample enrichment, depletion, or fractionation to deal with the proteome's high dynamic range, Marcotte said. He added that he and his colleagues are also working on improved sample prep methods for the extremely low-abundance samples the system is able to work with.
Talli Somekh, CEO of Austin, Texas-based Erisyon, which launched earlier this year to commercialize the technology, said that the company didn't believe the approach would displace mass spec and affinity-based proteomic methods but that it could open up new areas of research that "are kind of underexplored because the techniques for evaluating them are quite poor."
He noted as one example the analysis of post-translational modifications, where he said Erisyon believes its ability to label specific PTMs will allow researchers to "evaluate multiple post-translational modifications on single proteins and at single-molecule resolution."
He also cited immuno-oncology and protein-protein interaction research as areas where he believed the technology could find early users.
Somekh declined to discuss how the company has been funded so far or to provide a timeline for bringing an initial product to market, but he said he believed that "the development pathway should be quite fast."
He noted that the Marcotte lab has three working versions of the device in use.
"It's not as though we are taking a proof of concept and trying to develop it to a point where we are ready to productize it," he said. "We are ready to productize it."
He added that the company is currently involved in collaborations with several academic labs and industrial partners on developing a commercial version of the system.
Delft University's Joo said that the UT method looks promising and noted that its ability to analyze protein PTMs was particularly interesting. He added, though, that it is still relatively early days for single-molecule protein analysis and that it is difficult to say which technology or technologies will ultimately gain wide adoption.
Joo has also developed a fluorescence-based protein analysis approach. His approach uses the ClpXP protease, which denatures and degrades target proteins. By monitoring this process using FRET, Joo and his colleagues are able to detect the removal of labeled amino acids much like Marcotte's group, though Joo's approach is focused on full-length proteins rather than peptides. In August, he launched a company, Bluemics, to commercialize the method.
As Joo noted, a significant proportion of single-molecule protein research centers around nanopores, which have been used successfully for nucleic acid sequencing.
Hagan Bayley, professor of chemical biology at the University of Oxford and co-founder of nanopore-based sequencing firm Oxford Nanopore Technologies, has demonstrated the ability of nanopore sensors to distinguish between differentially phosphorylated protein forms.
Mark Akeson, a nanopore researcher and professor of biomolecular engineering at the University of California, Santa Cruz, has devised a method for driving unfolded proteins through a model α-hemolysin nanopore and using it to distinguish between different sequence-dependent features on the proteins.
Last year, researchers at the University of Groningen demonstrated the ability of Fragaceatoxin C (FraC) nanopores to identify peptide and protein biomarkers in simple mixtures and to distinguish between polypeptides differing by as little as a single amino acid.
One of the major advantages of nanopore-based approaches is the fact that, unlike fluorescence-based methods, they don't require any modification of the protein substrate, Joo said.
"So, in that sense nanopores are attractive," he said. "But, nanopores also have their limitations, and so people are looking at many different possibilities."
Another apparent entrant in the space is Encodia, a San Diego-based start-up that according to its website is "developing new technologies to create scalable and parallelized approaches to protein analysis."
The firm includes several veterans of the NGS world, including its CEO Mark Chee, a co-founder of Illumina; Kevin Gunderson, formerly senior director of advanced research at Illumina as well as that firm's founding scientist; and Michael Weiner, one of the inventors of 454 sequencing.
Chee declined to speak about the company's plans, noting that it was still an early-stage venture, but a grant Encodia received this year for $222,959 from the National Institute of General Medical Sciences (NIGMS) provides some insight into its approach to tackling the space.
According to the grant, the method uses DNA-tagged peptides on beads combined with an "Edmanase" enzyme capable of gently degrading peptides one amino acid at a time. Upon removal, the DNA tags can be sequenced using conventional NGS, allowing for read-out of the tagged peptide sequences.
In addition to the NIGMS grant, Encodia received a grant this year from the National Cancer Institute for $718,340 and a 2016 NCI grant for $211,400.