Senior Research Scientist, Pacific Northwest National Laboratory
Recommended by Pavel Pevzner, University of California, San Diego
NEW YORK (GenomeWeb) – Sangtae Kim didn't know he was going to work in biology; he didn't take any biology classes as an undergraduate. Instead, he majored in computer science. His master's degree lab, though, was working on a project developing and analyzing algorithms for protein fragment assembly, a problem that, Kim said, is similar to some traditional computer science problems. That was his first taste.
For his military service in Korea, Kim taught computer science courses at a military university, and while there one of his friends suggested that he try his hand at analyzing mass spectrometry data to find post-translational modifications.
"It was more fun than just doing fragment assembly because it's more dynamic," Kim told GenomeWeb.
When he decided to pursue a PhD, Kim hunted for a lab doing computational proteomics.
While in Pavel Pevzner's lab at the University of California, San Diego, Kim developed a database search tool to identify peptides from mass spectrometry data. One of the labs that snatched the tool up and incorporated it into its analysis pipeline was Dick Smith's lab at Pacific Northwest National Laboratory.
That's where Kim is these days. He's focusing on how to analyze top-down proteomic data — data from proteins that, unlike those analyzed in bottom-up approaches, haven't been digested — and developing new algorithms. He's also working on techniques to analyze proteomic data obtained through data-independent acquisition in collaboration with researchers at the University of Washington.
The main issue with working on top-down proteomics, Kim said, is that the field hasn't yet come to an agreement on the best way to generate data, so it's always changing. On top of that, he said, the spectra themselves are much more complex than those in bottom-up proteomics.
The situation is similar, he noted, for data-independent acquisition.
Paper of note
This past fall, Kim and Pevzner had a paper in Nature Communications describing a database search tool they dubbed MS-GF+ that they said is both sensitive and universal.
In it, they said that MS-GF+ could handle a variety of spectral types as well as various configurations of MS instruments and experimental protocols. Additionally, they reported that this tool uncovered the identities of more peptides than other peptide identification approaches. They also noted that it has already been incorporated into pipelines like Galaxy-P, Skyline, Percolator, and more.
Top-down proteomics will become bigger and bigger in the coming years, Kim said. "If it works well, then it is absolutely better than bottom-up proteomics because there is no loss of information," he said. "The minute people digest the proteins, very valuable information just goes away."
Additionally, more people are beginning to focus on quantification and targeted proteomics, and people are starting to integrate data from the various omics techniques, he added. More and more groups are using genomics and transcriptomics sequencing to bolster their study of the proteome. "Genomics data will be helpful to better analyze proteomics data and vice versa, so that field is growing," Kim said.
And the Nobel goes to…
Kim said that if he and his colleagues could figure out how to overcome the challenges of top-down proteomics and perform it with the same degree of sensitivity as bottom-up proteomics, that would be a great contribution to the field.
This is the tenth in a series of Young Investigator Profiles for 2015 that will appear on GenomeWeb over the next few months.