Integrated Genomics Bioinformatics Team Makes the Most out of Genomic Data Boom

There’s no such thing as too much information, according to Ross Overbeek, vice president of bioinformatics at comparative genomics company Integrated Genomics.

“There’s a lot that you can learn from a set of genomes that you can’t learn from a single one,” he said, referring to the integrated database of around 300 genomes his bioinformatics staff uses for functional annotation and metabolic reconstruction.

“The point that we made early on is that the value goes up with square of the number of genomes,” Overbeek said. “That’s why having the most carefully analyzed and integrated genomes is critical. And that’s why we think we’re way ahead. It’s solely based on the number of genomes.”
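One way to read the "square of the number of genomes" remark is as a count of pairwise comparisons: every genome added to the collection can be compared against every genome already in it. The sketch below is only an illustration of that arithmetic under this assumed interpretation; the function name is hypothetical and nothing here describes how ERGO actually measures value.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Count the distinct genome pairs in a collection of n_genomes.

    Assumption: "value" is proxied by the number of possible
    genome-to-genome comparisons, which is n * (n - 1) / 2.
    """
    return n_genomes * (n_genomes - 1) // 2


if __name__ == "__main__":
    # 30 and 300 are the proprietary and total genome counts cited in the article.
    for n in (1, 30, 300):
        print(f"{n} genomes -> {pairwise_comparisons(n)} possible pairs")
```

Under this reading, a single genome offers no comparisons at all, while a tenfold increase in genomes yields roughly a hundredfold increase in pairs.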

Overbeek’s philosophy that quantity could lead to quality in the case of functional genomics began with the idea that “it was probably easier to annotate 1000 simultaneously than to do a single one,” he said. “That’s obviously wrong in some sense,” he conceded, “but the essence of it is important — that is it offers a framework within which you can test and reject hypotheses that you just don’t get from studying a single genome in isolation.”

While Integrated Genomics has focused so far on microbial genomes, this is largely due to their wider availability, Overbeek said. The company plans to include eukaryotic genomes in its integration as they become available, and already uses those available in the public domain.

“The eukaryotes are where prokaryotes were in 1996,” he said. “Then, there was a small number of genomes and there really wasn’t that much to compare. But then as genomes poured in, everything became much easier. Your ability to predict genes, your ability to identify functions, your ability to assign functions to hypothetical genes, has gone up tremendously with the number of prokaryotic genomes sequenced. I believe the same thing will be true with eukaryotes.”

The company’s software toolkit, ERGO, was designed as a self-learning environment in which the power of analysis grows exponentially as new genomic sequences are incorporated into the system. Of the 300 integrated genomes in the system, approximately 110 to 115 are complete, Overbeek estimated. Thirty genomes are proprietary.

The 30 employees in Overbeek’s bioinformatics department include both computational staff and biologists focused on curation and annotation. He said the bulk of the annotation is done through an automated process, with a small number of hypotheses sent to the company’s wet lab for confirmation.

Overbeek considers the integrated database of diverse genomes to be the company’s key asset and the source of an almost unlimited number of business opportunities. “We’ll market it in different ways,” he said. “Whether we can do a better annotation of a person’s genome because we have a better integration to use as a tool or whether we believe it should be marketed as a standalone version, different products will emerge from that but it’s the integration as a whole that’s important.”

Integrated Genomics’ customers include Roche Vitamin, Maxygen, Genencor, Dow Chemical, Cargill, Dow AgroSciences, BASF, Archer Daniels Midland, the University of Scranton, the Department of Defense, the National Institutes of Health, and the Department of Energy.

While the majority of Integrated Genomics’ income has come from selling sequenced and annotated genomes, Overbeek said the company is ready to branch out. The ERGO suite is now available in a standalone implementation, either as a browser over the database or with an additional capability that lets users load and analyze their own genome sequences. In addition, Integrated Genomics has increased its sequencing capacity “substantially” over the last six months and is developing application projects in both drug target discovery and strain development, Overbeek said.

But the company’s primary task for the time being is keeping up with a flood of genomic sequence data that is doubling every 18 months — good news for Overbeek and his staff. “The more data the better,” he said.

— BT
