Define ‘Verified’: AnVil Tackles Tricky Task of Validating Celera’s Genes


Computational gene prediction is known to be an uncertain process, but it turns out that deciding when — and if — a gene is indeed verified is a bit of a black art itself.

This June, Burlington, Mass.-based AnVil entered a collaboration with Applied Biosystems to experimentally validate genes in the Celera Discovery System that were predicted by computational approaches. John McCarthy, director of discovery informatics at AnVil, said that the company has so far gotten through around 8,000 of the predicted genes, and is about one-third of the way through the total set.

How’s it going? Well, it turns out that answering that question is trickier than it may seem.

AnVil is verifying the predicted genes through microarray and Taqman expression experiments, using probes and PCR primers supplied by ABI. But the success rate so far is difficult to assess because “verification is questionable,” said Pat Hoffman, an AnVil scientist working on the project. “Probes can be verified, but you don’t know if the whole gene is verified. If you have three to four probes per gene and 50 percent are verified, it doesn’t mean the whole gene is present.”

In addition, he said, “it’s not an apples-to-apples” comparison between the microarray-based and Taqman-based methods, which don’t check the same regions of the gene. The correlation between the two techniques is around 70 percent, Hoffman said.

As a ballpark estimate, Hoffman guessed that AnVil has seen a validation rate of around 30 percent for the genes studied so far.

The best way to validate the computationally predicted genes would be to examine them one gene at a time, but that is not feasible for a set this large. The AnVil/ABI approach, as uncertain as it may seem, is the best high-throughput method available for confirming the existence of genes with no biological evidence, McCarthy said.

ABI estimated that unconfirmed genes with almost no supporting evidence make up between 10 percent and 20 percent of the human genome in the Celera Discovery System (CDS).

“A lot of work still needs to be done in trying to predict genes from sequence. It’s not a solved problem,” said Hoffman.

— BT
