For de novo protein sequencing, Edman sequencing is still seen by some as the gold standard, but researchers from the University of California, San Diego, and Genentech have developed a mass spectrometry-based approach that they said generates data nearly as accurate as Edman sequencing in a fraction of the time.
In addition, their technology, described in a study in the December issue of Nature Biotechnology, offers significantly higher coverage than Edman sequencing and could open new ways for making higher-quality antibodies, one of the study’s authors told ProteoMonitor.
“We believe that [our technology] can do most of what Edman sequencing does and more,” said Nuno Bandeira, executive director at the Center for Computational Mass Spectrometry at UCSD, and first author of the Nature Biotechnology study. “In principle, for most applications, I think it’s very capable of replacing Edman sequencing as the main technology” for protein sequencing.
Other mass spec-based techniques have been used to perform de novo protein sequencing, but have generated less reliable data than Edman sequencing, making their utility questionable, according to some.
In their study, Bandeira and his co-researchers said that though antibodies have been “indispensable reagents for biomedical research and as diagnostic and therapeutic agents,” sequencing unknown proteins in general and antibodies in particular “remains a challenge.”
Antibodies are not directly inscribed in the genome and are constantly being created, making approaches based on tandem mass-spec database searches inapplicable. Edman sequencing, meanwhile, is low throughput, and a hybrid MS/MS-Edman approach, while applicable to de novo sequencing of antibodies, has low accuracy.
“Bridging this MS/MS sequencing gap not only significantly reduces the total sequencing time but also considerably reduces the sequencing costs and required expertise,” the researchers wrote in their study.
Among the advantages of their technology, which Bandeira and his colleagues call comparative shotgun protein sequencing, or CSPS, is the ability to sequence a single antibody in less than 72 hours, compared with months for Edman sequencing. In addition, the technique allows for greater detection of post-translational modifications and eliminates the need to purify proteins, Bandeira said.
Edman sequencing, Bandeira added, has superior accuracy at the beginning of the protein sequence, though that “will quickly degrade.” Depending on the protocols, after about 40 amino acids the accuracy with Edman sequencing drops “dramatically … whereas we have very high sequencing accuracy throughout the whole protein,” he said.
Currently, CSPS has about one error for every 20 predicted amino acids, though the error rate is expected to decrease with newer mass specs with higher mass accuracy.
His technique, he said, also covers about 95 percent of the protein while Edman sequencing goes as far as 60 to 80 amino acids of the protein from the N-terminus. After that, digestion protocols would need to be added for further sequencing, but that step makes purification difficult, he said.
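To put the two figures Bandeira cites side by side, the short calculation below converts "one error for every 20 predicted amino acids" into a per-residue accuracy and a rough error count. It is a back-of-the-envelope sketch only: the function name `expected_errors`, the 450-residue chain length, and the assumption that errors are spread evenly over the covered region are illustrative and do not come from the study.

```python
def expected_errors(protein_length, coverage=0.95, residues_per_error=20):
    """Rough number of miscalled residues in the sequenced part of a protein.

    Assumes (hypothetically) that errors are spread evenly over the
    covered residues, which the study does not claim.
    """
    covered_residues = protein_length * coverage
    return covered_residues / residues_per_error


# One error per 20 predicted residues is about 95 percent per-residue accuracy.
per_residue_accuracy = 1 - 1 / 20

# Hypothetical 450-residue chain, chosen only for illustration.
print(f"per-residue accuracy: {per_residue_accuracy:.2f}")
print(f"expected miscalls:    {expected_errors(450):.1f}")
```

Under these assumptions a 450-residue chain would carry roughly 21 miscalled residues, which is why a confidence value for each individual amino acid would matter to users of the method.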
The one area where CSPS does not compare to Edman sequencing is in its ability to distinguish leucine from isoleucine, amino acids that have identical atomic compositions but different functions. While some mass spec-based techniques for differentiating leucine from isoleucine have been demonstrated, it is unclear how effective they are, Bandeira said.
“Thus distinguishing leucine from isoleucine and sequencing regions not covered by MS/MS spectra represent the directions where CSPS and Edman degradation may complement each other,” he and his colleagues wrote.
Bandeira and his colleagues’ method comes amid waning interest in Edman sequencing. During the summer, Applied Biosystems, now called Life Technologies, discontinued its Edman sequencing business, leaving the US market without a manufacturer of Edman sequencers.
ABI said at the time that the decision to shut down its Procise instrument operations was based on a lack of demand for the instruments and newer, alternative technologies for protein sequencing that were making Edman sequencers obsolete. Nonetheless, the move drew fire from a small community of scientists that remains devoted to Edman sequencing [See PM 06/12/08].
Later in the summer, Thermo Fisher Scientific signaled that it may be interested in entering the Edman sequencing market, though this week a spokeswoman said the company has not yet made a decision on the matter [See PM 08/07/08].
According to Bandeira, CSPS works much like genomic sequencing, in which many overlapping reads are aligned against one another and the genome sequence is then recovered from the alignment.
The key concept of CSPS is that “we do not interpret one spectra in isolation,” Bandeira said. “We first find a group of spectra that are related in the sense that the peptides overlap. After doing that where we find these pairs, we build a network of all related spectra.
“If you think that Spectrum A has a number of other spectra that it overlaps with, and then B also overlaps with a number of other spectra, so on and so forth, there’s this network that’s constructed with lots of overlapping spectra,” he added. “Those are then all interpreted at the same time, so there’s this composition. You put all of them together and dramatically reduce the noise and then find the consensus interpretation.”
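The network idea Bandeira describes can be illustrated with a toy sketch. It is not the CSPS implementation: CSPS works on MS/MS spectra, whereas this sketch uses short, noisy peptide strings as stand-ins for them. The function names, the example reads, and the `min_len` and `max_mismatch` thresholds are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque


def best_overlap(a, b, min_len=4, max_mismatch=1):
    """Return the offset at which b starts inside a, or None.

    Tries the longest overlap first and tolerates a few mismatches,
    a crude stand-in for matching noisy spectra.
    """
    for offset in range(len(a) - min_len + 1):
        n = min(len(a) - offset, len(b))
        if n < min_len:
            break
        mismatches = sum(x != y for x, y in zip(a[offset:offset + n], b[:n]))
        if mismatches <= max_mismatch:
            return offset
    return None


def build_network(reads):
    """Link every pair of reads that overlap; edges carry relative offsets."""
    edges = defaultdict(list)
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            offset = best_overlap(reads[i], reads[j])
            if offset is None:
                back = best_overlap(reads[j], reads[i])
                if back is None:
                    continue
                offset = -back
            edges[i].append((j, offset))
            edges[j].append((i, -offset))
    return edges


def layout(edges, start=0):
    """Walk the network breadth-first to place each read on a common axis."""
    position = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, offset in edges[i]:
            if j not in position:
                position[j] = position[i] + offset
                queue.append(j)
    return position


def consensus(reads, position):
    """Interpret all placed reads together: majority vote per column."""
    columns = defaultdict(Counter)
    for i, start in position.items():
        for k, residue in enumerate(reads[i]):
            columns[start + k][residue] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))


# Hypothetical overlapping reads; the third has one wrong residue (A for V).
reads = ["EVQLVESGGG", "LVESGGGLVQ", "ESGGGLAQPG", "GGLVQPGGSL", "VQPGGSLRLS"]
assembled = consensus(reads, layout(build_network(reads)))
print(assembled)
```

In this example the erroneous residue in the third read is outvoted by the three other reads that cover the same position, which is the sense in which interpreting related reads together can reduce noise compared with interpreting each one in isolation.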
While other mass spec-based techniques have been used as a strategy for de novo protein sequencing, they have not been well received for a variety of reasons, such as the inability to distinguish between isobaric amino acids. Also, mass spec-based approaches result in ambiguous sequences, which researchers must then interpret, increasing the chance for errors.
Protein sequences can also be determined indirectly from mRNA and DNA, but such approaches can also result in wrong inferences.
Problems have arisen from mass spec-based de novo sequencing, Bandeira said, because “there’s just not enough signal in relation to noise, so it becomes hard to get large enough sequences to be interesting.” In contrast, CSPS “dramatically increases the signal-to-noise ratio and the length of the sequences that you can reconstruct.”
To develop their methodology, the researchers used Thermo Fisher Scientific mass specs (including the LTQ ion traps, Orbitraps, and models with ETD capability) but said that any instrument can be used with CSPS.
“It’s a matter of the fragmentation,” said Bandeira. “As long as you get good fragmentation in the spectra, it doesn’t really matter how you generate the spectra.”
In ongoing work, he and his colleagues are exploring ways to bring the sequence coverage closer to 100 percent and to devise ways to put “an error number or confidence number on every single amino acid on the protein. Then you could know that there is high accuracy throughout on average, but you would know which regions are more trustworthy or less trustworthy,” he said.
In an e-mail to ProteoMonitor, Satya Yadav, director of the Molecular Biotechnology Core Laboratory at the Cleveland Clinic Lerner Research Institute, said that rather than replacing Edman sequencing, as Bandeira suggested, CSPS may be best used as a complementary technology.
“It will certainly expedite the complete sequencing of [monoclonal antibodies] for the biotech industry but will require complementary help from traditional method of Edman degradation,” he said. He added that the technology’s inability to distinguish isobaric amino acids will always limit its usefulness.
“Comparative shotgun protein sequencing will never be able to distinguish isoleucine and leucine residues and even Lys and Gln residues,” Yadav said. “CSPS will require Edman degradation sequencing to resolve such issues. To assemble a complete [monoclonal] sequence contig map, a combination of CSPS and Edman sequencing will be a more realistic way at this point of time.”
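A quick calculation with standard monoisotopic residue masses shows why the two residue pairs Yadav mentions are hard to separate by mass. The sketch below is illustrative only: the function names and the two tolerance values are assumptions chosen as examples, not instrument specifications or figures from the study.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "L": 113.08406,  # leucine
    "I": 113.08406,  # isoleucine, same atomic composition as leucine
    "K": 128.09496,  # lysine
    "Q": 128.05858,  # glutamine
}


def mass_difference(a, b):
    """Absolute difference between two residue masses, in daltons."""
    return abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])


def distinguishable_by_mass(a, b, tolerance_da):
    """True if the mass gap exceeds the assumed measurement tolerance."""
    return mass_difference(a, b) > tolerance_da


# Hypothetical tolerances: a coarse one and a fine one.
for tolerance in (0.5, 0.01):
    for pair in (("L", "I"), ("K", "Q")):
        verdict = distinguishable_by_mass(*pair, tolerance)
        print(f"tolerance {tolerance} Da, {pair[0]}/{pair[1]}: {verdict}")
```

Leucine and isoleucine have exactly the same mass, so no mass tolerance separates them and any mass spec-based distinction has to rely on something other than residue mass. Lysine and glutamine differ by about 0.036 Da, so whether they can be told apart by mass depends on the mass accuracy of the measurement.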
While Bandeira said that the isoleucine/leucine issue is a hindrance, he added that it can be worked around, especially if previously developed mass spec techniques prove effective.
The researchers used their technology to sequence monoclonal antibodies, but it can be used for polyclonal antibodies as well, and in principle should be applicable to any mixture of proteins, Bandeira said.
He added that CSPS would also address one of the most pressing issues surrounding antibody-based research, the quality of antibodies. Genentech became involved in the project due to such concerns.
Genentech was “very interested in having a high-throughput way of finding whether the antibody has mutated and how it has mutated and how those changes may affect its function,” Bandeira said.
CSPS opens up avenues for finding and characterizing new antibodies, he added, “so if one has a new high-throughput way in which to sequence new antibodies, maybe there are different ways in which one can look for new antibodies that would be better than the existing ones.”
Funding for the project was provided by the National Institute of General Medical Sciences. Genentech did not supply funding for the project and has no agreement for the rights to commercialize the technology, whose IP is held by UCSD, Bandeira said.