Proteomics, Genomics Done Simultaneously Leads to More Complete Genome of Cyanothece


Name: Jon Jacobs
Position: Senior research scientist, Pacific Northwest National Laboratory, 2004 to present; adjunct faculty, Washington State University Tri-Cities, 2003 to present
Background: PhD, biochemistry, Montana State University, 2002; Postdoc, Pacific Northwest National Laboratory, 2002 to 2004
Certain strains of cyanobacteria are unique for being plant-like in their ability to use photosynthesis to manufacture sugar and bacteria-like for their ability to “doctor” atmospheric nitrogen for use by other species.
One such cyanobacterium, Cyanothece, also makes ethanol and hydrogen, drawing attention from the US Department of Energy and other organizations looking for alternative methods of producing fuel.
To better understand these organisms, a team of scientists from DOE’s Pacific Northwest National Laboratory, Washington University’s Genome Sequencing Center, St. Louis University, and Purdue University set out to sequence Cyanothece 51142.
As part of their study design, the researchers incorporated proteomics methods into their sequencing work, resulting in 38 additional open reading frames in the final genome annotation.
In an article published in the Sept. 23 edition of the Proceedings of the National Academy of Sciences Early Edition, the authors said, “The combined analysis of proteome and genome data is an important new approach that resulted in the inclusion or reclassification of nearly 550 genes (10 percent) and lent an additional and valuable level of validation to the genome annotation.” 
Jon Jacobs, a protein chemist at PNNL who oversaw the proteomics work, recently spoke with ProteoMonitor about his research into Cyanothece 51142.
Below is an edited version of the conversation.

What was the overall goal of this research and how did proteomics fit into that?
What was published here is really kind of a subset of the total project that was going on. What was represented here … was the genomic work.
But the project in whole included us doing a lot of the proteomic work for this organism of interest, characterizing the organism and all of its interesting pieces. And so because of that, we had the opportunity to, at the same time they were working on the genome, [to do] all the proteomic work simultaneously.
Initially for us, that pretty much entailed doing the tandem MS/MS analysis, building up a database where we were identifying as many peptides and as many proteins under different growth conditions [as possible].
For this organism, it’s on a day-night cycle, so we’re trying to hit all the different time points across this cycle, potentially different growth conditions in the presence of various nutrients and things like this, essentially to create the largest peptide or protein database that we can and to help complement the genomic work that was going on at the same time.
Were you already doing the proteomics work or did you do this specifically as part of the genomic-based project?
For this organism, we started specifically for this project. The collaborator that we were working with, Himadri Pakrasi … we had worked with him previously with similar organisms.
He wanted us to do some proteomic work before and we had done that, but when this program started, it was kind of … ‘Well, we’ve got this new organism, we’re going to sequence it, and we want you to do this complementary proteomics.’
He understood the power of what proteomics can do because we had worked with him before. He said, ‘We want you to start this right now and when the time comes, we can merge the two together…’ and we can have a much more powerful impact if we combine these two technologies in the end.
Did they design the proteomics into the study because they had already done some of the genomic work and had missing parts?
I don’t know that I can say that. Most of the other organisms that we had dealt with … already had a defined genome, and it was just something … early on in the discussions … that we had thought about and we realized that it was powerful and it was something that he could potentially use. … It was just more of an opportunity. This is an opportunity where we are actually doing the sequencing and we’ll have proteomic data during this time frame, and by having that, he said, ‘Let’s do these comparisons, let’s broaden this out and see exactly what the proteomic information can tell us.’
So what did you find using the proteomics methods?
There were two main things … which I thought were important. … During the annotation is where the proteomics comes in, and the two aspects of the annotation were: first … what genes are genes? What’s going to be the final call for the number of genes for this organism?
They were able to provide us at a very early stage all potential possibilities, pretty much taking all of their algorithms and just giving us all the possible genes that could potentially be in this genome.
[Pakrasi said,] ‘Now, use your MS/MS identification data and just blast that against this database that we’re giving you.’ I think it probably contained close to about 20,000 entries all over the place, and they said, ‘You just tell us what you hit, and your confidence in those and how that falls. Then we’ll look at that data, and … if we do our normal annotation and call out normally what we do for genes, we would see how many genes you have identified that potentially could be new that we normally would not have called that a gene.’
So we did that. It was one of the first analyses we did once we completed the proteomic work on our side. And it came up there was this significant number. In the end, I think we came up with 38 actual sequences where we had multiple identifications, or multiple peptides that identified these, that we feel very confident about, that normally would not have been called out if they had just gone through the normal route of annotation without using the proteomic data.
That was the initial work, and it helped them to say, ‘Maybe we should switch the parameters here of what we call genes.’ So they went through and said, ‘Obviously these are genes that we haven’t called. Why weren’t we calling these out earlier?’
And it started some discussions on their side: ‘At least for this specific organism, maybe we do need to tweak how we call out genes.’
And it brought it back to an organism-specific approach. In a lot of cases, they do the same things for all different organisms. It kind of assisted in their annotation, as far as that aspect goes.
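The comparison Jacobs describes, checking peptide identifications against every candidate open reading frame and flagging those with multiple peptide hits that the standard annotation had left out, can be illustrated with a minimal Python sketch. It is not the PNNL pipeline: the function name, the toy sequences, and the two-peptide threshold are all hypothetical, and plain substring matching stands in for the scored database search a real MS/MS workflow would use.

```python
from collections import defaultdict


def find_novel_orfs(candidate_orfs, annotated_ids, peptides, min_peptides=2):
    """Return unannotated candidate ORFs supported by several distinct peptides.

    candidate_orfs: dict of ORF id -> translated protein sequence
    annotated_ids:  ORF ids already called as genes by the standard annotation
    peptides:       peptide sequences identified by MS/MS
    min_peptides:   hypothetical confidence threshold (distinct peptides per ORF)
    """
    hits = defaultdict(set)
    for orf_id, protein_seq in candidate_orfs.items():
        for peptide in peptides:
            # Exact substring match; an assumption that replaces real search scoring.
            if peptide in protein_seq:
                hits[orf_id].add(peptide)

    return {
        orf_id: sorted(matched)
        for orf_id, matched in hits.items()
        if orf_id not in annotated_ids and len(matched) >= min_peptides
    }


# Toy data, invented purely for illustration.
candidate_orfs = {
    "orf_a": "MKTLLVAGGRSEQWERTYK",  # already annotated as a gene
    "orf_b": "MSTNPKPQRAAAGHILK",    # not annotated, two peptide hits
    "orf_c": "MGGHHWWPLK",           # not annotated, only one peptide hit
}
annotated_ids = {"orf_a"}
peptides = {"TLLVAGGR", "SEQWER", "PQRAAAGH", "MSTNPK", "GGHHWW"}

novel = find_novel_orfs(candidate_orfs, annotated_ids, peptides)
print(novel)
```

In this toy run only `orf_b` is reported: `orf_a` is already annotated, and `orf_c` has a single supporting peptide, which falls below the assumed threshold.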
Did your findings shed any new light on different genomic techniques that could’ve been used to identify them, or were these genes able to be identified only specifically through the proteomics techniques you used?
In this instance, I can’t see how they would’ve been able to distinguish [the genes] using just genomics. To have that orthogonal information, to have that peptide identified in the mass spec, it’s such an orthogonal type of analysis compared to what happens on the genome side, I can’t see how they could’ve come up with calling those out.
Does this work go toward mapping out the entire proteome of this bacterium?
That’s essentially what we were trying to do, not to claim that we can ever get a complete proteome of any organism, but that’s quite a hot debate in the proteomics field right now.
How common is this strategy, to use proteomics to do some of the genome sequencing?
It has happened before, but looking back, in almost all instances, what I’ve seen is pretty much you taking an existing genome and then you go back and you say, ‘OK, how much more information can I potentially get out of that proteome.’
And that, I think, in almost all instances is the case.
I think the point we’re trying to make here is that by doing the proteomics and genomics at the same time … and [reporting] the genome of an organism and also to be able to report the proteome of an organism … makes [for] a much more powerful characterization of an organism.
It’s a highly orthogonal approach in order to do that, to add that type of information in there.
Do you have any idea why this isn’t done more often, especially when people keep saying that genomics and proteomics are complementary? It seems this kind of approach would be obvious.
Yes, it would. I think the trick is … it’s a timing thing. You’ve got the sequencing centers and the sequencing individuals, and then obviously you’ve got those who are experts in proteomics. In our case, we had a collaborator who brought us two together.
[Pakrasi] said, ‘I know the proteomics, and I believe in it, and I like it, and I know that it can add benefits.’ And at the same time he’s working with his sequencing center in St. Louis. And I think it’s more a matter of timing and being able to bring the two areas of expertise together at the right time.
There’s a lot of sequencing going on and there’s a lot of proteomics going on, and they’re happening in the same organisms. It’s just that when you’re doing the sequencing, to have the proteomic expertise at the exact time of the sequencing and being able to have a collaborator who understands both and brings both together, I think that’s probably what’s not common.
And I’m hoping it becomes more common … and we’ve demonstrated that this is a very good technique. If someone is sequencing a genome, I would highly recommend that competent proteomic data is accompanying that and I would expect that it would become more common in the future if not right now or some time currently.
Would this represent a new model for genomic sequencing, one that incorporates proteomics on a more active and earlier stage?
Potentially yes. I don’t know if you can call it a model … I would call it more of a new emphasis or new approach that really helps augment genomic information when it first comes out to help benefit anyone who’s interested in looking at that genome.
Now, there’s so much proteomic work that’s going on. By saying, ‘OK, we now have the genome sequence,’ if you have proteomic data already incorporated into that, that saves a lot of researchers a lot of time.
Is there a cultural divide between the genomic world and the proteomic world? And if so, does that account for why this approach has not been used more often?
I could see how that could be a part of it. I mean, there is, they’re different technologies, they’re both cutting edge technologies, but genomics is a little more mature, it’s a little more standardized.
You are bringing in two different kinds of expertise from different fields … everyone loves to talk about proteomics and genomics and they always seem to say those two [terms] in the same breath, but being able to bring those together [is rare].
[But] I think it’s becoming more common … there is a lot more crosstalk in science now than there was just a few years ago, and I think this is part of that trend.
Are you continuing to build off this work? Are you setting out to map the entire proteome of the Cyanothece bacteria, for example?
What we’ve kind of provided and what we’ve augmented with the genomics is a good part of the proteomic map of this organism. We do have quite a bit actually in the pipeline right now for this organism.
This really is a very qualitative study and now we’re focusing on getting all the quantitative effects. This organism has a very defined night-day cycle, there are different events that are occurring … and we are really working on quantitating that, getting the protein values and the abundance values, getting a more dynamic view of what actually is occurring across the life cycle of this organism, and going down that road right now.
Describe the quantitative work you’re doing.
When we run the tandem MS experiments that gives us the peptide identifications to build the database … what we call our AMT database, or accurate mass and time database where we have an accurate mass for the peptide and liquid chromatography elution time.
Once we have that built, for quantitative reasons what we do is we like to go into our high mass accuracy instruments, our FTICRs, our Orbitrap instruments, where we go and we essentially pick the time points that we want to study and we go and we pound those to determine the abundance information from these high mass accuracy instruments.
That’s essentially what we’ve done, and so we’ve gone in and we’ve got the MS/MS information and now we have very complementary quantitative information from our high mass accuracy instruments. And that’s the information that we’re now using to look at the quantitative aspects of the life cycle [of this organism].
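The accurate mass and time idea in this answer can be sketched in a few lines of Python: features from a high-mass-accuracy run are matched to database entries by mass and elution time, and their intensities are summed as a rough abundance. This is an illustration under stated assumptions, not PNNL's software. The class names, the use of a normalized elution time between 0 and 1, the 5 ppm and 0.02 tolerances, and the toy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    peptide: str  # peptide previously identified by MS/MS
    mass: float   # accurate monoisotopic mass in Da
    net: float    # elution time, normalized to 0..1 (an assumption)


@dataclass(frozen=True)
class Feature:
    mass: float       # observed mass in Da
    net: float        # observed normalized elution time
    intensity: float  # observed signal intensity


def match_features(tags, features, ppm_tol=5.0, net_tol=0.02):
    """Sum feature intensities per peptide when mass and elution time both match."""
    abundances = {}
    for feature in features:
        for tag in tags:
            ppm_error = abs(feature.mass - tag.mass) / tag.mass * 1e6
            if ppm_error <= ppm_tol and abs(feature.net - tag.net) <= net_tol:
                abundances[tag.peptide] = abundances.get(tag.peptide, 0.0) + feature.intensity
    return abundances


# Toy database and toy features, invented purely for illustration.
tags = [
    AmtTag("PEPTIDEA", 1000.0000, 0.30),
    AmtTag("PEPTIDEB", 1500.0000, 0.55),
]
features = [
    Feature(1000.0030, 0.31, 2.0e6),  # about 3 ppm from PEPTIDEA, time agrees
    Feature(1000.0020, 0.29, 1.0e6),  # about 2 ppm from PEPTIDEA, time agrees
    Feature(1500.0300, 0.55, 5.0e5),  # about 20 ppm from PEPTIDEB, mass too far off
    Feature(1500.0015, 0.70, 4.0e5),  # mass agrees with PEPTIDEB, elution time does not
]

abundances = match_features(tags, features)
print(abundances)
```

Requiring agreement on both mass and elution time is what lets a feature be assigned to a peptide without repeating the MS/MS step; in this toy run only PEPTIDEA receives an abundance.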
The paper said that other Cyanothece strains will be sequenced. Are you involved in doing the proteomics for that work?
Yes … we’ve received some additional funding to kind of further this to move beyond just this one organism and to start looking at different strains and different groups of organisms.
