By Aaron J. Sender
Michael Giddings likes to think of himself as an explorer. When he was five years old, his father, renowned chemist J. Calvin Giddings, bought him a kayak for Christmas. The younger Giddings made his first river run at age six, and 29 years later he still relishes expeditions through remote waterways. These days, however, the University of North Carolina researcher focuses his attention on exploring another of nature’s mysteries: the genome.
Giddings’ interest is in the rule breakers: events in the genome that defy the typical rules of alternative splicing and produce elusive, yet often functionally important, proteins that slip under the radar of standard detection methods.
Known collectively as recoding, these phenomena occur when ribosomes read the bases of the genome in unconventional ways. For example, the ribosome is reading along, translating, and then suddenly bypasses a section of the RNA. “It will partially release from the RNA, stop coding, scan downstream, and then resume coding some more downstream,” Giddings says. Other examples include frameshifting and even redefining a stop codon to specify an amino acid.
“One of the challenges is that proteins produced by recoding are often minor products,” Giddings says. “But they are often minor products of large significance.”
As a postdoc in recoding guru Ray Gesteland’s lab at the University of Utah, Giddings and his colleagues began developing a computational approach they hoped would get at the hidden proteins. Gesteland’s group has also built RECODE, a database of gene sequences that use recoding for their expression.
“We were looking for ways to do proteomics to detect these kinds of alternatively coded proteins,” says Giddings. “And we looked at existing proteomics technologies and decided one of the biggest limitations is that the approaches all assume you have some fixed database of genes or proteins that you are going to match against.”
Giddings’ solution: go to the source. “I said we really need a way to go directly to the genome that can account for any kind of product that comes out of this genome,” he recalls. “So we’ll directly match protein data with the potential of this given genome to code for this protein.”
The result: a new application called Genome Fingerprint Scanning.
The first step is to take the entire genome sequence and chop it up into all the possible peptides that a proteolytic enzyme can theoretically produce. The key is that this digestion doesn’t take open reading frames into account.
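For readers who want to see the idea in concrete terms, here is a minimal Python sketch of an in silico genome digestion. It is an illustration only, not the GFS code: the function names are hypothetical, and it assumes the standard genetic code, translation of all six reading frames, and a simplified trypsin-like rule that cleaves after every lysine (K) or arginine (R).

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(dna):
    """Yield (strand, offset, protein) for all six reading frames.

    No open reading frames are searched for: every frame is translated
    end to end, with '*' marking stop codons and 'X' unknown codons.
    """
    dna = dna.upper()
    revcomp = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", revcomp)):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            yield strand, offset, "".join(CODON_TABLE.get(c, "X") for c in codons)


def digest(protein, cleave_after="KR", min_len=2):
    """Split a translated frame at stops, then after each K or R.

    Assumption: a simplified trypsin-like rule with no missed cleavages.
    """
    peptides = []
    for segment in protein.split("*"):
        start = 0
        for i, aa in enumerate(segment):
            if aa in cleave_after:
                peptides.append(segment[start:i + 1])
                start = i + 1
        if start < len(segment):
            peptides.append(segment[start:])
    return [p for p in peptides if len(p) >= min_len]


def digest_genome(dna):
    """Return (strand, offset, peptide) for every hypothetical peptide."""
    return [
        (strand, offset, pep)
        for strand, offset, prot in six_frame_translations(dna)
        for pep in digest(prot)
    ]
```

Calling `digest_genome` on a sequence returns every hypothetical peptide together with the strand and frame it came from, which is the information needed later to point back to a location on the genome.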
Then, peptide masses measured by running protein samples through a mass spectrometer are scanned against this database. “This is a fundamentally different approach. It takes observed proteins and says, ‘Where are they on the genome? What part of the genome encoded this?’” Giddings says.
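The matching step can be sketched in the same spirit. The Python below is a hedged illustration, not the scoring GFS actually uses: it assumes unmodified peptides, standard monoisotopic residue masses, and a fixed mass tolerance in daltons, and the function names and the 0.01 Da default are hypothetical choices made for the example.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the peptide's free termini


def peptide_mass(peptide):
    """Return the neutral monoisotopic mass, or None for unknown residues."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    except KeyError:
        return None


def match_masses(observed, candidates, tolerance=0.01):
    """Map each observed mass to the genome peptides within the tolerance.

    `candidates` is a list of (strand, offset, peptide) tuples, so every
    hit carries the reading frame that could have encoded it.
    """
    hits = {}
    for mass in observed:
        hits[mass] = [
            cand for cand in candidates
            if (m := peptide_mass(cand[2])) is not None
            and abs(m - mass) <= tolerance
        ]
    return hits
```

A linear scan like this is fine for a toy example. With millions of hypothetical peptides, sorting the candidate masses once and using binary search would be the more practical choice.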
But even a small genome such as yeast yields as many as 8.9 million hypothetical peptides. The researchers wanted to make sure the process could be done by anyone, even with modest computer resources. “So we implemented this genome digestion to see if it worked on just an average desktop machine,” Giddings says. “And the answer for yeast or other small organisms is definitely yes.” At first, generating the database took more than 20 minutes. “Now on a dual-CPU G4 running OS X, it takes approximately two minutes,” says Giddings.
Giddings has made significant progress with the application. He is collaborating with Washington University’s Michael Brent to use GFS for gene finding. “It’s a completely new approach to gene finding which has the potential to really revolutionize and strengthen the results of the computer predictions,” he says. GFS has also found proteins that other widely used protein identification software, such as Matrix Science’s Mascot, misses. And postdoc Michael Wisz is working on applying the approach to human data.
But how about the new recoding proteins Giddings set out to find with GFS? “Unfortunately, we haven’t found any specific examples of that in our work. That’s the one downside to this story,” he says. “But we’re still hopeful.”