By Nat Goodman
Biologists are forever asking, “Where is my gene?” — by which they mean, “Is my favorite gene expressed in my favorite tissue?” This month, I’m taking a look at several websites that answer this question, or that give you the raw materials to do so yourself.
First off, three websites present Affymetrix data from a series of normal tissues.
My favorite is the Gene Expression Atlas from the Genomics Institute of the Novartis Research Foundation (GNF). This site presents human data from Affy’s U95A chip (with about 10,000 genes) run on 47 tissues or cell lines, and mouse data from Affy’s U74 chipset on 45 samples. The human samples are mostly from vendors (Clontech, ATCC, Research Genetics); the mouse samples were produced internally. The website design emphasizes function over form and is a pleasure to use: you can specify one or more genes of interest in a text box, or upload a file of identifiers, press go, and your results come back in a single, easy-to-peruse Web page.
GeneNote from the Weizmann Institute of Science has human data from the full U95 chipset (A-E) run on 12 samples from Clontech. The site also presents expression estimates based on EST and SAGE data. The website is prettier than GNF’s Atlas, but harder to use. You can only ask for one gene at a time, you have to specify the type of identifier (gene symbol, LocusLink ID, and so on), and you have to click through an intermediate results page to get to the actual data.
HuGEIndex from Harvard has data from Affy’s older Hu6800 chip run on 19 tissues from 49 different people. It sounds like they obtained these samples from Harvard patients. The website is less polished than the others. You can query by gene (one at a time) or organ. The gene queries seem to require exact match, including case, which can make it tough for a newcomer.
GNF Atlas and GeneNote have kindly deposited their data in NCBI’s GEO as accessions GSE96 (GNF human), GSE97 (GNF mouse), and GSE803 (GeneNote). The GNF data can also be downloaded directly from their site. I couldn’t find HuGEIndex in GEO, but the data is available on their site.
Go to the library
An older and simpler approach, still in use, is to see what genes are present in tissue-specific cDNA libraries. This is very attractive in light of all the EST sequencing that’s been done: just go to UniGene, find some libraries for your favorite tissue, and look at the ESTs sequenced from those libraries. A fancier version of the idea is to use SAGE. You can also look for expression differences by counting how often your gene is seen in libraries from different tissues.
Many websites offer this approach, including UniGene’s Digital Differential Display, Cornell’s TissueInfo, CGAP’s SAGE Genie, NCBI’s SAGEmap, and TIGR’s Library Expression Search. GeneNote also reports this kind of data.
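To make the counting idea concrete, here is a minimal Python sketch. It is not how any of the sites above compute their numbers. It assumes you already have one (tissue, gene) record per sequenced EST, and it scales each count by the total number of ESTs from that tissue so that heavily sequenced tissues don’t automatically look more expressed. The function name and the toy records are invented for illustration.

```python
from collections import defaultdict


def counts_per_million(est_records, gene):
    """Return {tissue: ESTs for `gene` per million ESTs sequenced from that tissue}.

    est_records is an iterable of (tissue, gene_id) pairs, one per EST.
    This record layout is an assumption, not the format of any real download.
    """
    total = defaultdict(int)  # all ESTs seen per tissue
    hits = defaultdict(int)   # ESTs matching the gene of interest per tissue
    for tissue, gene_id in est_records:
        total[tissue] += 1
        if gene_id == gene:
            hits[tissue] += 1
    return {tissue: 1_000_000 * hits[tissue] / total[tissue] for tissue in total}


# Toy records, made up purely to exercise the function.
ests = [
    ("liver", "ALB"), ("liver", "ALB"), ("liver", "APOA1"), ("liver", "ACTB"),
    ("brain", "GFAP"), ("brain", "ACTB"), ("brain", "ALB"),
    ("brain", "SNAP25"), ("brain", "MBP"),
]
result = counts_per_million(ests, "ALB")
print(result)
```

With real libraries the counts are small and noisy, so treat differences like these as a hint rather than a measurement.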
One pragmatic problem is that library names are idiosyncratic, to say the least. It’s not always obvious from the name what tissue was used to create the library and whether the source was normal or diseased. One solution is to use SANBI’s eVOC ontology, which classifies libraries by anatomy, disease state, and other features. Unfortunately, there doesn’t seem to be a Web interface for eVOC (this is unusual for SANBI; maybe one is coming soon), so you have to download the data and manipulate it locally.
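As a rough idea of what that local manipulation might look like, the Python sketch below picks out libraries by tissue and disease state from a tab-delimited annotation table. The column names (library, anatomy, state), the library names, and the rows are all hypothetical; the actual eVOC download will be laid out differently, so you would need to adapt the loading step.

```python
import csv
import io

# Hypothetical annotation table. Library names and annotations are invented.
ANNOTATION_TSV = """library\tanatomy\tstate
XYZ_Kid3\tkidney\tnormal
XYZ_Kid7\tkidney\ttumor
Lib_A12\tliver\tnormal
Lib_B04\tliver\tnormal
"""


def load_annotations(text):
    """Parse the table into {library_name: (anatomy, state)}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["library"]: (row["anatomy"], row["state"]) for row in reader}


def select_libraries(annotations, anatomy, state="normal"):
    """Return the sorted names of libraries matching the tissue and state."""
    return sorted(
        name
        for name, (lib_anatomy, lib_state) in annotations.items()
        if lib_anatomy == anatomy and lib_state == state
    )


annotations = load_annotations(ANNOTATION_TSV)
print(select_libraries(annotations, "liver"))
print(select_libraries(annotations, "kidney", state="tumor"))
```

Once you have the list of library names for a tissue, you can look up the ESTs from just those libraries instead of guessing from the names.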
An even older but gooder approach is to curate information from the literature. The Jackson Laboratory’s Gene Expression Database does so for mouse with a focus on developmental data.
Personally, I gravitate to the microarray sites, which may be techno bias on my part. I’ve never really used SAGE, but I suspect this data is pretty good, too. I worry about using ESTs for quantitative purposes because of the unknown bias introduced by library normalization.
Nat Goodman, PhD, is a senior research scientist at the Institute for Systems Biology and is co-founder of HD Drug Works, which tests treatments for Huntington’s Disease.