UCSD Team Develops De Novo Sequencing, Dereplication Method to Decipher Nonribosomal Peptides


By Tony Fong

Nonribosomal peptides are the basis for some of the most successful drugs in history, but developing NRP-based therapeutics has been an expensive, time-consuming, and often fruitless effort because it has been nearly impossible to sequence them.

A study published on Monday in Nature Methods, however, presents a new approach to sequencing NRPs that combines mass spectrometry with computational methods. According to its developers, the method could provide the pharmaceutical industry with a tool for decoding such molecules and ultimately result in new breakthrough drugs.

The approach, the study's senior author added, could be used to sequence any cyclic or nonlinear peptides.

NRPs are a class of peptide secondary metabolites synthesized by nonribosomal peptide synthetases. Produced mostly by microorganisms such as bacteria and fungi, NRPs are of particular interest to the pharmaceutical industry because of the role they play in the defense systems of microorganisms.

"Essentially, [NRP's are] chemical bullets … optimized over millions of years [through evolution] to kill bacteria … or anything for that matter," Pavel Pevzner, the senior author of the study and a professor of computer science at the University of California, San Diego, told ProteoMonitor this week.

NRP-derived or "NRP-inspired" drugs include penicillin and other antibiotics, cyclosporine, and many anti-cancer and antimicrobial agents. But nonribosomal peptide synthetases, unlike ribosomes, work independently of messenger RNA, so the peptides they produce are not directly inscribed in genomes. As a result, NRP sequences cannot be inferred by traditional DNA sequencing.

For ribosomal peptides, either Edman sequencing or tandem mass spectrometry is used when DNA sequencing is not available. But for nonribosomal peptides, both approaches are essentially useless: NRPs are nonlinear in structure, which makes them unamenable to either method; they contain non-standard, modified amino acids, which raises the number of possible building blocks to several hundred; and they often have non-standard backbones.

Nuclear magnetic resonance spectroscopy is the traditional method for NRP characterization, but currently no software exists for the automatic interpretation of NRPs from NMR data, making such work time-consuming and prone to errors, according to Pevzner and his co-researchers.

"Therefore, there is a need for the efficient structure elucidation of NRPs," they said in the study.

Breaking the Cycle

Their method is a two-part approach that starts with mass spectrometry to break the cyclic NRPs into smaller linear pieces, and then uses algorithms developed by the researchers to make sense of the peptide pieces.

Normally, using mass spectrometry on cyclic structures is extremely difficult, Pevzner said, because "they are designed by nature to resist protein digestion. These rings are practically impossible to break" by usual tryptic digestion.

The researchers solved this problem by first using tandem mass spectrometry to break the cyclic peptide into linear peptides with the same parent mass, a method developed by Thaiya Krishnamurthy and others in 1989. Pevzner and his colleagues then broke the linearized peptides further in the next mass spec stage, the MS3 stage.

But that created another problem — what to do with the MS3 spectrum of different but related peptides.
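The overlap of related peptides in one spectrum can be illustrated with a toy calculation. The sketch below is not the researchers' code. It assumes a hypothetical four-residue ring with integer (nominal) residue masses and considers only prefix fragments; the function names are invented for illustration. Opening the ring at each bond yields linear peptides that all share the parent mass, and the fragments of every one of them land in the same spectrum.

```python
from collections import Counter

def linearizations(cyclic_masses):
    """Return the n linear peptides obtained by opening the ring at each bond."""
    n = len(cyclic_masses)
    return [cyclic_masses[i:] + cyclic_masses[:i] for i in range(n)]

def prefix_masses(linear_masses):
    """Prefix fragment masses of one linear peptide (the full peptide is excluded)."""
    total, out = 0, []
    for m in linear_masses[:-1]:
        total += m
        out.append(total)
    return out

def cyclic_fragment_spectrum(cyclic_masses):
    """Superpose the prefix masses of every linearization into one peak counter."""
    spectrum = Counter()
    for linear in linearizations(cyclic_masses):
        spectrum.update(prefix_masses(linear))
    return spectrum

# Hypothetical ring; the values are nominal residue masses chosen for illustration.
ring = [71, 99, 113, 147]
spectrum = cyclic_fragment_spectrum(ring)

for linear in linearizations(ring):
    print(linear, "parent mass:", sum(linear))
print(sorted(spectrum))
```

Every linearization has the same parent mass (430 in this toy case), yet each contributes its own set of fragment peaks, which is why the combined spectrum cannot be read the way a single linear peptide's spectrum would be.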

"The major challenge we faced is how to interpret this data, because nobody tried before to interpret mass spectrometry data of NRPs or any cyclic peptides," and no extensive database of such compounds exist, Pevzner said.

To tackle this, the researchers developed a new algorithm called NRP-sequencing to allow the peptide fragments to be pieced back together so that the chemical structure of the cyclic NRP can be determined without having to know the amino acid masses in the compound.

NRP-sequencing uses the concepts of autoconvolution and autoalignment, in this case in the MS3 phase, to first derive a set of possible amino acid masses for the NRP, and then to construct a consensus spectrum for each mass.
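As a rough illustration of the autoconvolution idea, and not the published NRP-sequencing algorithm, the sketch below counts how often each mass difference occurs between pairs of peaks. Differences that recur many times within a plausible residue-mass window are kept as candidate amino acid masses. The integer peak list, the 57 to 200 window, the `min_count` threshold, and the function names are all assumptions made for this example; real spectra would need a mass tolerance rather than exact matching.

```python
from collections import Counter
from itertools import combinations

def autoconvolution(peaks):
    """For every offset x, count the peak pairs (a, b) with b - a == x."""
    counts = Counter()
    for a, b in combinations(sorted(peaks), 2):
        counts[b - a] += 1
    return counts

def candidate_masses(peaks, lo=57, hi=200, min_count=3):
    """Offsets inside [lo, hi] that recur at least min_count times."""
    conv = autoconvolution(peaks)
    return sorted(x for x, c in conv.items() if lo <= x <= hi and c >= min_count)

# Toy spectrum: prefix fragments of every linearization of a hypothetical
# ring with nominal residue masses 71, 99, 113, and 147.
peaks = [71, 99, 113, 147, 170, 212, 218, 260, 283, 317, 331, 359]

print(candidate_masses(peaks))
```

In this toy spectrum the four residue masses each recur four times as a peak-to-peak difference, while every other offset in the window recurs at most twice, so the threshold separates them cleanly. Noisy experimental data would not be this tidy.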

NRP-sequencing generates all possible reconstructions for each consensus spectrum "and reranks all generated cyclic peptides according to the matches to the MSn spectra (for n=3, 4, and 5)," the researchers wrote.

However, the ability of NRP-sequencing to recover amino acid masses is limited by the ability of autoconvolution to do so, they said, and because some positions are less prone to breakage than others, "recovering all amino acid masses in an NRP using autoconvolution may be an unattainable goal."

For those cases, they developed NRP-tagging, an approach that uses frequently occurring amino acid tags for peptide reconstruction.

The spectra of cyclic peptides are "superpositions of related (cyclically shifted) linear peptides that tend to have the same tags repeated in the spectrum," the authors said. "As gapped peptides often contain masses representing combined masses of adjacent amino acids … NRP-tagging attempts to partition each mass in the gapped peptide into smaller masses. Similar to algorithms for sequencing linear peptides, NRP-tagging typically brings the correct peptide close to the top of the list of the high-scoring peptides."
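One step in that description, partitioning a combined mass into smaller masses, can be sketched in a few lines. This is only a toy under stated assumptions: the set of building-block masses is taken as already known, masses are integers, and the names `split_mass` and `expand_gapped` are hypothetical. The actual NRP-tagging approach draws on tags observed in the spectra, which this sketch does not model.

```python
from itertools import product

def split_mass(mass, blocks):
    """All ordered ways to write `mass` as a sum of values taken from `blocks`."""
    if mass == 0:
        return [[]]
    results = []
    for b in blocks:
        if b <= mass:
            for rest in split_mass(mass - b, blocks):
                results.append([b] + rest)
    return results

def expand_gapped(gapped, blocks):
    """Expand each mass of a gapped peptide into every possible partition."""
    options = [split_mass(m, blocks) for m in gapped]
    return [sum(choice, []) for choice in product(*options)]

# Hypothetical building blocks and a gapped peptide whose first mass (170)
# is the combined mass of two adjacent residues.
blocks = [71, 99, 113, 147]
gapped = [170, 113, 147]

print(split_mass(170, blocks))
print(expand_gapped(gapped, blocks))
```

The mass 170 splits into 71 + 99 in either order, so the gapped peptide expands into two candidate sequences that would then have to be scored against the spectra to pick the better one.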

Another important element of their method is an algorithm called NRP-dereplication, which allowed the researchers to perform comparative dereplication to screen for active compounds in a mixture and discard those that had been previously studied.

Dereplication is a longstanding biochemistry technique, but unlike the classical method, which compares only identical compounds, the comparative dereplication approach used by Pevzner and his colleagues considers NRPs that are similar but not identical, shortening the screening process considerably.

"Because many NRPs are produced as related analogs … comparative dereplication can reduce NRP characterization efforts from weeks to minutes," the researchers said in the study.

As an example of an application of comparative dereplication, they cited the case of a compound dubbed "compound 879," which was thought to be novel when it was initially isolated, but, in fact, had been already described in an earlier study. This was discovered only during the patent application process.

But using their NRP-dereplication method, Pevzner and his team were able to quickly discover that compound 879 is the antibiotic neoviridogrisein.
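The difference between classical and comparative dereplication can be shown with a minimal sketch, which should not be read as the NRP-dereplication algorithm itself. It assumes integer masses, a tiny made-up database, prefix fragments only, and at most one residue differing between the unknown and a known compound; all names are hypothetical. For each database entry, the gap between the observed and known parent masses is placed on each residue in turn, and the variant whose predicted peaks best overlap the observed peaks wins.

```python
def theoretical_peaks(ring):
    """Prefix masses over every linearization of a cyclic peptide, as a set."""
    n = len(ring)
    peaks = set()
    for start in range(n):
        total = 0
        for k in range(n - 1):
            total += ring[(start + k) % n]
            peaks.add(total)
    return peaks

def variants(ring, delta):
    """The ring itself if delta is 0, else every single-residue shift by delta."""
    if delta == 0:
        return [list(ring)]
    out = []
    for i, m in enumerate(ring):
        if m + delta > 0:
            v = list(ring)
            v[i] = m + delta
            out.append(v)
    return out

def dereplicate(observed_peaks, parent_mass, database):
    """Return (name, variant, score) for the best-matching database variant."""
    observed = set(observed_peaks)
    best = None
    for name, ring in database.items():
        delta = parent_mass - sum(ring)
        for v in variants(ring, delta):
            score = len(theoretical_peaks(v) & observed)
            if best is None or score > best[2]:
                best = (name, v, score)
    return best

# Made-up database and an "unknown" that is a +14 analog of known_A.
database = {"known_A": [71, 99, 113, 147], "known_B": [57, 87, 128, 163]}
unknown = [71, 113, 113, 147]
result = dereplicate(theoretical_peaks(unknown), sum(unknown), database)
print(result)
```

A classical lookup would miss this unknown because its parent mass matches nothing in the database. Allowing one shifted residue recovers the relationship to the known compound, which is the sense in which similar but not identical NRPs are considered.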

'Future of Modern Drug Discovery'

For drug companies, the new method could significantly hasten the process of developing NRP-based therapeutics and lower the cost, according to the researchers. In a statement, Pieter Dorrestein, a co-author on the study, said that though natural products have a long history in therapeutic development, many were discovered before the digital recording of mass spectrometry data.

"Therefore we do not have an extensive mass spectrometry database for natural products," he said.

Dorrestein, an assistant professor of pharmacology, chemistry, and biochemistry in the UCSD Skaggs School of Pharmacy and Pharmaceutical Sciences, noted that the new methods "enable dereplication without an experimental database to compare to. … As long as the structure of a therapeutic or a related therapeutic or natural product is in the library, we can accurately dereplicate the molecule.

"This is the first generation of algorithms that can accomplish this, and is a glimpse into the future of modern drug discovery," he added.

Collaborators on the study included researchers from the Scripps Institution of Oceanography at UCSD and the University of California, Santa Cruz.

In continuing work, they are doing de novo sequencing for "interesting cyclic compounds," mainly from cyanobacteria, Pevzner said, with a goal of creating a pipeline in which NRPs can be automatically and quickly sequenced. "This really could be a modern way to search for new antibiotics and other useful things."

Mass spec data resulting from their work will be deposited into Norine, a public database of NRPs, he said.

Pevzner said that a drug company has contacted the researchers to perform de novo sequencing of NRPs it is interested in. He declined to name the firm.

He added that the method he and his colleagues developed has applications beyond drug discovery. Many proteins in plants and primates are cyclic, including theta-defensin proteins, which play an important antimicrobial role in non-human primates. In humans, theta-defensin peptides are not known to be expressed.

In addition, many ribosomal peptides are cyclic, Pevzner said, and "everything that is cyclic currently is completely off the radar of proteomics. They are absolutely not identifiable by any existing [method].

"And hopefully now we will be able to identify cyclic peptides or any non-linear peptides," he said.

The algorithms developed by the researchers are free to other scientists and are available here.
