
Chinese Team Develops Tools for Improved De Novo Peptide Sequencing


NEW YORK (GenomeWeb) – A team led by researchers at the Beijing Proteome Research Center has developed new reagents and software for improved de novo peptide sequencing.

In a study published this month in Molecular & Cellular Proteomics, the scientists presented a new enzyme for protein digestion and a software tool, both meant to optimize so-called "mirror protease" approaches to mass spec-based peptide sequencing.

Traditional proteomic workflows use mass spectrometers to identify peptides in a sample by matching experimentally generated mass spectra against reference databases, either of known protein sequences or of previously acquired spectra.

Recently, technological advances have made de novo peptide sequencing more feasible. In this approach, researchers instead identify a peptide's amino acid sequence directly from the mass spec data.

De novo sequencing is attractive because it could improve discovery of peptide sequence variants that are absent from reference databases, many of them possibly disease-linked. It has also drawn significant interest from the biopharma industry, which sees it as a potential tool for characterizing antibody-based drugs.

"If you can do [de novo peptide sequencing], then the world is yours," said Albert Heck, professor of biomolecular mass spectrometry and proteomics at Utrecht University. "Because all [biopharma companies] want to do this."

Were it possible, Heck said, "it would allow us to identify almost any auto-antibody against any pathogen — for instance, a new antibody against HIV or a new antibody against cancer or a new antibody against really any antigen."

Such analyses have proved difficult, though, due to the challenges of obtaining sufficient coverage across the entire length of target peptides.

Researchers have primarily worked to improve this coverage by combining different peptide fragmentation techniques. Heck's lab, for instance, has developed an approach called electron-transfer/higher-energy collision dissociation (EThcD) to improve peptide sequence coverage and has used it for applications including the identification of human leukocyte antigen splice forms not present in reference databases.

Heck and others have also looked into using so-called "mirror proteases" to improve peptide sequence coverage. Trypsin, which is commonly used to digest proteins into peptides for mass spec analysis, cleaves proteins on the C-terminal side of lysine and arginine residues. Mirror proteases cleave on the N-terminal side of those same residues. By analyzing samples digested with each enzyme, researchers are able to generate more complete sets of ions, improving de novo sequencing efforts.
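The cleavage rules above can be illustrated with a toy in silico digest. This is a simplified sketch, not the Beijing team's code: it assumes every K and R is cut, ignores missed cleavages and trypsin's reluctance to cut before proline, and the protein sequence and helper names (`trypsin_digest`, `lysarginase_digest`, `mirror_pairs`) are hypothetical. It shows how the two enzymes yield "mirror" peptides that share the same internal sequence, with the basic residue at the C-terminus for trypsin and at the N-terminus for LysargiNase.

```python
import re


def trypsin_digest(protein: str) -> list[str]:
    """Simplified trypsin: cut on the C-terminal side of every K or R."""
    # Lookbehind gives a zero-width split point *after* each K/R.
    return [p for p in re.split(r"(?<=[KR])", protein) if p]


def lysarginase_digest(protein: str) -> list[str]:
    """Simplified LysargiNase: cut on the N-terminal side of every K or R."""
    # Lookahead gives a zero-width split point *before* each K/R.
    return [p for p in re.split(r"(?=[KR])", protein) if p]


def mirror_pairs(tryptic: list[str], lysarginase: list[str]) -> list[tuple[str, str]]:
    """Pair peptides sharing a core: trypsin gives core+K/R, LysargiNase gives K/R+core."""
    by_core = {p[1:]: p for p in lysarginase if p[0] in "KR"}
    return [(t, by_core[t[:-1]]) for t in tryptic
            if t[-1] in "KR" and t[:-1] in by_core]


protein = "MAEGLKTEVIDRSAQWK"  # hypothetical sequence
print(mirror_pairs(trypsin_digest(protein), lysarginase_digest(protein)))
# [('TEVIDR', 'KTEVID'), ('SAQWK', 'RSAQW')]
```

Because the residue before a core and the one after it may differ (K versus R), mirror partners do not always have identical masses, and peptides at the protein termini have no partner.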

One of the most promising proteases for this kind of work is LysargiNase, which a team led by researchers at the University of British Columbia described in a 2015 Nature Methods paper.

That paper along with work by Heck and his colleagues demonstrated "that if you use a combination of fragment spectra generated by LysargiNase and trypsin you get complementary fragment series, with one starting from the [peptide] C-terminus and the other starting from the N-terminus," Heck said. "And if you have spectra of the highest quality from the C-terminus and spectra of the highest quality from the N-terminus… you have the best of both worlds, and then you may be able to get complete amino acid sequences of the whole peptide."
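Heck's point about complementary series can be made concrete with b ions (N-terminal fragments) and y ions (C-terminal fragments). Tryptic peptides, with K or R at the C-terminus, tend to give prominent y-ion series, while LysargiNase peptides, with the basic residue at the N-terminus, tend to favor b ions. The sketch below assumes ideal, noise-free, singly charged ladders for a hypothetical mirror pair, KTEVID (LysargiNase-like) and TEVIDR (tryptic), and calls each residue from the mass gap between neighboring ions. The functions `b_ladder`, `y_ladder` and `read_gaps` and the 0.02 Da tolerance are illustrative choices, not taken from the paper.

```python
# Monoisotopic residue masses in daltons (I and L are isobaric)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # H2O retained by C-terminal (y) fragments


def b_ladder(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) m/z values, growing from the N-terminus."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        masses.append(total)
    return masses


def y_ladder(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1) m/z values, growing from the C-terminus."""
    masses, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        masses.append(total)
    return masses


def read_gaps(start: float, ladder: list[float], tol: float = 0.02) -> list[str]:
    """Call one residue per gap between consecutive ions ('?' if nothing fits)."""
    calls = []
    for lo, hi in zip([start] + ladder, ladder):
        gap = hi - lo
        hits = sorted(aa for aa, m in RESIDUE.items() if abs(m - gap) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


# Hypothetical mirror pair: K at the N-terminus vs. R at the C-terminus
from_b = read_gaps(PROTON, b_ladder("KTEVID"))                 # reads N -> C
from_y = read_gaps(WATER + PROTON, y_ladder("TEVIDR"))[::-1]   # reversed to N -> C
print(from_b)  # ['K', 'T', 'E', 'V', 'I/L']
print(from_y)  # ['E', 'V', 'I/L', 'D', 'R']
```

Neither ladder alone reads all of the shared core TEVID: the b series misses the final D and the y series misses the first T, but together they cover it. Isoleucine and leucine have identical masses, so the `I/L` call stays ambiguous either way, and real spectra have missing and noisy peaks, so this only illustrates the principle.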

In their MCP paper, the Beijing researchers described a new, acetylated form of LysargiNase called Ac-LysargiNase that is more stable and has higher proteolytic activity than the non-acetylated version of the enzyme. They also presented a software tool, called pNovoM, that allowed them to effectively combine data from trypsin- and Ac-LysargiNase-digested samples and sequence the peptides de novo.
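The article does not describe how pNovoM works internally. As a hypothetical illustration of one step a tool combining the two datasets plausibly needs, the sketch below pairs trypsin and Ac-LysargiNase precursors whose masses are consistent with mirror peptides. Since a tryptic peptide is a core plus a C-terminal K or R, and a LysargiNase peptide is an N-terminal K or R plus the same core, the two masses either match or differ by the R minus K residue-mass difference (about 28.006 Da). The names `peptide_mass` and `find_mirror_pairs`, the 0.01 Da tolerance and the peptide list are assumptions for illustration.

```python
# Monoisotopic residue masses in daltons
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
R_MINUS_K = RESIDUE["R"] - RESIDUE["K"]  # about 28.006 Da


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: sum of residues plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def find_mirror_pairs(tryptic: list[float], lysarginase: list[float],
                      tol: float = 0.01) -> list[tuple[int, int]]:
    """Index pairs (i, j) whose precursor masses differ by 0 or +/-(R - K)."""
    allowed = (0.0, R_MINUS_K, -R_MINUS_K)
    return [(i, j)
            for i, mt in enumerate(tryptic)
            for j, ml in enumerate(lysarginase)
            if any(abs((mt - ml) - d) <= tol for d in allowed)]


tryptic = [peptide_mass(p) for p in ["TEVIDR", "SAQWK", "MAEGLK"]]
lys = [peptide_mass(p) for p in ["KTEVID", "RSAQW", "MAEGL"]]
print(find_mirror_pairs(tryptic, lys))  # [(0, 0), (1, 1)]
```

In a complex digest such as a yeast proteome, many unrelated peptides will match on precursor mass alone, so a practical tool would also need to check that the fragment ladders of a candidate pair agree.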

Using the enzyme and software to analyze purified proteins, including antibodies, they achieved sequence accuracy of almost 100 percent. In a yeast proteome digest, roughly half of the mass spectra yielded full sets of fragment ions, allowing for de novo sequencing. These results, they suggested, indicate the approach's potential to improve proteomic analyses, particularly of features such as post-translational modifications or variants not covered in reference databases.

Heck said that while the ideas underlying the Beijing team's approach are not novel, the Ac-LysargiNase does appear to offer improved performance and the pNovoM software "looks pretty good."

He noted, though, that even with the levels of coverage demonstrated in the paper, the method is likely still not sufficient for the sort of antibody characterization work biopharma firms are exploring.

"What you see is that their best results are around 95 percent to 97 percent sequence coverage of the antibodies, which is pretty amazing," he said. "But if you really want to analyze an unknown antibody, you really need to get as close to 100 percent as possible, which is pretty hard."
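Sequence coverage, the figure Heck cites, is the share of a protein's residues supported by at least one sequenced peptide. A minimal sketch, assuming exact string matches and a hypothetical protein and peptide list (`sequence_coverage` is an illustrative name), also reports which positions remain uncovered, since a percentage alone does not say where the gaps fall.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> tuple[float, list[int]]:
    """Percent of residues covered by at least one peptide, plus uncovered positions."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every exact occurrence of the peptide
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    gaps = [i for i, c in enumerate(covered) if not c]
    return 100.0 * (len(protein) - len(gaps)) / len(protein), gaps


pct, gaps = sequence_coverage("MAEGLKTEVIDRSAQWK", ["TEVIDR", "SAQWK"])
print(f"{pct:.1f}% covered; uncovered positions {gaps}")
# 64.7% covered; uncovered positions [0, 1, 2, 3, 4, 5]
```

For antibodies, where the uncovered residues fall can matter as much as how many there are.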

"The hard thing about [sequencing] an antibody is that for 90 percent [of its sequence] it is identical to all other antibodies, but there are one or two small regions that are hypervariable," Heck said. "This hypervariability means that there are thousands or maybe millions of different mutations that all lead to a slightly different antibody. And to distinguish between all of these millions of different potential antibodies that are in your body is much harder than to just identify a protein."

Heck suggested that for antibody sequencing applications, additional proteases may be needed to get even better sequence coverage.

"You also need very good software to stitch all these different fragments back together into the full sequence of the protein," he said, adding that his lab was currently "putting a lot of effort into that."
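Stitching peptides back into a full sequence is, at its simplest, an overlap-assembly problem. The sketch below is a hypothetical greedy assembler, not the software Heck's lab is developing: it repeatedly merges the two sequences with the longest exact suffix-prefix overlap of at least `min_len` residues. The peptide reads are invented for illustration, as if produced by additional proteases with different cleavage sites.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # no qualifying overlap left
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["MAEGLKTE", "KTEVIDRS", "DRSAQWK"]))
# ['MAEGLKTEVIDRSAQWK']
```

Greedy merging is fast but can join fragments wrongly around repeated motifs, and it assumes error-free reads; real de novo reads carry mistakes and I/L ambiguity, so practical assemblers must tolerate mismatches.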

Heck said that in addition to peptide-level data, researchers are exploring the use of intact protein (top-down) analyses and middle-down proteomic approaches to characterize antibodies.

"Then you could combine [peptide-level data] with top-down and middle-down data and ideally software that could integrate all of this data," he said. "There is fierce competition to do this. It's sort of the next thing in proteomics. It's now easy to identify 10,000 proteins in a cell, but to identify and distinguish between the 10,000 antibodies that are swimming around in our plasma, that is really very hard."