New Genomic Functions Fleshed Out By Analyzing Combined Gene, Protein Data

By analyzing the overlap between gene-expression and protein-expression experiments, scientists at BIATECH and the Pacific Northwest National Laboratory have ascribed functions to a significant portion of the previously uncharacterized sequences in the organism Shewanella oneidensis.

According to a paper published in the Feb. 8 issue of the Proceedings of the National Academy of Sciences, about 40 percent of the genetic sequences in S. oneidensis were “hypothetical,” or without any known function, before Eugene Kolker, president and director of BIATECH, and James Fredrickson, a PNNL chief scientist, analyzed them closely.

The analysis found that about one-third, or 538, of the hypothetical genes expressed functional proteins and messenger RNA. Of these, the researchers were able to ascribe functional information to 256, or 48 percent. However, they could confidently assign exact biochemical functions to only 16 proteins, or 3 percent of the expressed hypothetical genes.
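The percentages in the paragraph above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the three counts come from the article, while the helper name `percent_of` and the variable names are hypothetical and not taken from the paper.

```python
def percent_of(part: int, whole: int) -> int:
    """Return part/whole as a percentage, rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts reported in the article; variable names are illustrative only.
expressed_hypothetical = 538  # hypothetical genes with detected mRNA and protein
with_functional_info = 256    # genes given some functional information
exact_function = 16           # proteins with a confident exact biochemical function

print(percent_of(with_functional_info, expressed_hypothetical))  # 48
print(percent_of(exact_function, expressed_hypothetical))        # 3
```

Both figures are fractions of the 538 expressed hypothetical genes, not of the full set of hypothetical genes or of the whole genome.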

The researchers could identify protein homologues for 97 percent of the hypothetical proteins.

The methods used in this study are important because they are a way of “fleshing out,” or filling in the blanks of any organism’s genome, the authors noted.

“In a lot of cases, it was not known from the gene sequence if a protein was even expressed,” said Fredrickson. “Now that we have high confidence that many of these hypothetical genes are expressing proteins, we can look for what role these proteins play.”

S. oneidensis was used as a model organism in this study because it is of key interest to the US Department of Energy for its potential to break down nuclear and heavy-metal wastes. Kolker noted that S. oneidensis is a “higher IQ” model organism than E. coli because it has 2.2 times more signaling genes, despite having a genome that is only 4 percent larger.

When Kolker and his colleagues analyzed the functions of the hypothetical genes, they found that these genes differed somewhat from the rest of the S. oneidensis genome. In particular, there were fewer DNA replication, recombination, and repair proteins among the hypothetical genes.

“This was expected because these proteins [for replication, recombination, and repair] are among the most conserved ones, and are therefore easier to characterize by standard similarity approaches,” said Kolker.

In contrast, proteins involved in secondary metabolism and post-translational modification, as well as outer membrane proteins, were more common among the hypothetical genes.

“You have more genes being responsible for regulation [and] interactions,” said Kolker.

Jerry Bergman, a professor at Northwest State College in Archbold, Ohio, who studies introns, said that examining the function of so-called junk DNA is exciting, and has revealed that much of that portion of a genome has a use. “To me, it’s like living in a house for 30 years, and you open the fireplace and you find there’s a room in the house behind the fireplace,” said Bergman.

Bergman said that his research and the work of other intron researchers show that many introns, like Kolker’s hypothetical genes, contain sequences that have a regulatory function.

“It’s intriguing,” he said. “I have a feeling that regulatory function is going to be critical for a lot of this.”

Kolker said the current study did not look at introns because the microarray data that were available came only from coding regions. Non-coding regions may be examined in the future, however, once Kolker’s research team receives new microarrays made by Affymetrix that include non-coding regions, he said.

A big part of the current study was the protein-analysis work, Kolker noted. “At least six different, very experienced people looked at each of the proteins and came up with an annotation,” said Kolker.

The team of protein analysts, which included Kolker, Carol Giometti from Argonne National Laboratory, John Yates from the Scripps Research Institute, and Richard Smith from the Pacific Northwest National Laboratory, was needed in order to identify proteins with certainty, Kolker explained.

“When you’re working with proteins you understand and know well, you don’t need different people,” said Kolker. “When you are looking deeper, you are not in a black and white situation and you need a tier-based approach with multiple opinions.”

Fred Winston, a professor in the department of genetics at Harvard Medical School, said he was impressed by the way the researchers in the current study combined experimental and computational approaches to understand what is encoded in an organism’s genome.

“[The researchers’] comprehensive and systematic set of approaches have been remarkably productive in the identification and initial characterization of genes that might have otherwise been missed by a narrower set of approaches,” said Winston.

Kolker said that, as a next step, his research team would like to develop a reasonably high-throughput way to express proteins, purify them, and verify their function.

“After all this work, only 3 percent of the 500 or so [hypothetical genes] were confidently assigned an exact biochemical function through experimental validation — this is one of the major bottlenecks of current biology,” said Kolker. “We need to engage now in a new way to experiment and to put some kind of high-throughput method for verifying protein function.”

—TSL
