I cringed when I heard the announcement of the H-Invitational Database: a new “human gene database” featuring “integrative annotation” of human genes … blah, blah, blah. Just what the world needs: another human gene database! I waited a decent interval to let them work out the kinks, then braced myself for an exciting visit to what the PLoS paper introduced as a “substantial contribution [for the] exploration of human biology and pathology.”
H-InvDB was produced by a large consortium (the paper has 158 authors from 68 institutions) who analyzed more than 40,000 full-length cDNA sequences from seven large sequencing projects. They aligned the cDNA sequences to the human genome, yielding about 20,000 unique genes. Then they looked at gene structure and alternative splicing, followed by a battery of functional annotation analyses: Blast, motif scanning, and so on. This is pretty routine work nowadays, but gene annotation is so complex that it can’t hurt to have another group add its two cents. The database is open access and can be easily downloaded.
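To give a feel for how 40,000 aligned cDNAs can collapse to about 20,000 genes, here is a minimal Python sketch of one common approach: treat alignments on the same chromosome and strand whose genomic spans overlap as a single locus. This is an illustration under my own assumptions, not the consortium's actual pipeline. The function name `cluster_alignments`, the tuple layout, and the toy coordinates are all hypothetical, and a real pipeline would also have to weigh exon structure, not just overall span.

```python
from collections import defaultdict

def cluster_alignments(alignments):
    """Group cDNA alignments into loci by genomic overlap.

    Each alignment is a tuple (cdna_id, chrom, strand, start, end),
    with start <= end in inclusive genome coordinates (an assumption
    made for this sketch). Alignments on the same chromosome and strand
    whose spans overlap end up in the same locus.
    """
    # Bucket alignments by (chromosome, strand) so that only alignments
    # that could belong to the same gene are compared.
    by_region = defaultdict(list)
    for cdna_id, chrom, strand, start, end in alignments:
        by_region[(chrom, strand)].append((start, end, cdna_id))

    loci = []
    for (chrom, strand), spans in sorted(by_region.items()):
        spans.sort()  # sweep left to right along the chromosome
        ids, locus_start, locus_end = [], None, None
        for start, end, cdna_id in spans:
            if ids and start <= locus_end:
                # Overlaps the current locus: extend it.
                ids.append(cdna_id)
                locus_end = max(locus_end, end)
            else:
                # No overlap: close the current locus and start a new one.
                if ids:
                    loci.append((chrom, strand, locus_start, locus_end, ids))
                ids, locus_start, locus_end = [cdna_id], start, end
        if ids:
            loci.append((chrom, strand, locus_start, locus_end, ids))
    return loci

# Toy data, invented purely for illustration.
toy = [
    ("cdna1", "chr1", "+", 100, 500),
    ("cdna2", "chr1", "+", 400, 900),    # overlaps cdna1, same locus
    ("cdna3", "chr1", "+", 2000, 2500),  # separate locus
    ("cdna4", "chr1", "-", 450, 800),    # opposite strand, separate locus
    ("cdna5", "chr2", "+", 100, 300),
]
for locus in cluster_alignments(toy):
    print(locus)
```

Five toy cDNAs collapse to four loci here. Clustering by span alone will wrongly merge nested or overlapping genes on the same strand, which is one reason the real annotation work is harder than this sketch suggests.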
It’s not clear whether H-InvDB is intended to be just a project database reporting the analysis of these 40,000 sequences, or a comprehensive human gene database. The details in their paper and on the website suggest the former, while the hype suggests the latter.
Testing It Out
After a quick jaunt around the site, I did my usual first test, which is to look up the Huntington’s Disease gene. Such a famous gene should be an easy case. Guess what: I couldn’t find it. A search for “HD” (the gene’s official HUGO symbol) pulled up 21 hits, all wrong. I moved on to my second favorite gene, caspase 1, with gene symbol CASP1. Nada. I tried several more. Some worked, some didn’t.
A possible explanation for the missing genes is that they aren’t in the starting set of 40,000 sequences. I tested this by Blasting HD and CASP1 on the H-InvDB site. Indeed, the genes are not there. This really surprised me, since HD and CASP1 are widely expressed, and supports the hypothesis that H-InvDB is just a project database.
When a gene is found, the website presents a page of stored analyses: name, definition, motifs, GO annotations, predicted structure, and predicted subcellular localization. From here you can get to other views: G-integra, a genome browser that shows the gene in its genomic context; DiseaseInfo, which shows genetic diseases lacking a known gene that map closely to the gene in question; and H-ANGEL, which presents gene-by-tissue expression data from multiple data sources. Of these, H-ANGEL seems most useful.
While hardly a unique resource, H-InvDB is a good addition to the roster of integrated gene databases (see box). Its incomplete coverage will not be a problem if you use it to supplement other, more comprehensive resources, such as NCBI Entrez Gene or the databases listed in the box.
REFERENCES and URLs USED FOR THIS MONTH’S ARTICLE
H-Invitational Database http://h-invitational.jp
US (NIH) mirror http://madb.nci.nih.gov/cards
More Gene Sources
Integrated gene databases present information about genes collected from multiple sites or pre-computed locally. It’s a classic one-stop shopping model.
• GeneCards is the premier example of this class. A lot of laboratory scientists like this site, and I agree. My major objection to GeneCards, as noted in previous columns, is that it’s not open access.
• Harvester is similar to GeneCards but does not reformat the information it gets from other websites. Its output consists of verbatim pages from the underlying data sources, which it fetches in advance so you don’t have to wait.
• GeneLynx is even simpler than Harvester and provides a succinct page of links to other websites.
• Stanford Source has greater focus on gene expression and provides easy access to microarray gene expression data from the Stanford Microarray Database and elsewhere.
Nat Goodman, PhD, is a senior research scientist at the Institute for Systems Biology and is co-founder of HD Drug Works, which tests treatments for Huntington’s Disease. Send your comments to Nat at [email protected]