Once a critical region for a genetic disease is identified on a chromosome, the work has just begun for the researcher who is charged with finding the candidate genes. Manually searching across numerous rapidly changing expression and phenotype databases is error-prone and time-consuming, so Marc van Driel and his colleagues at the Center for Molecular and Biomolecular Informatics (CMBI) at the University of Nijmegen in the Netherlands decided to automate the process.
The result, a web-based software tool called GeneSeeker, can do in minutes what would otherwise take hours or even days, said van Driel. In addition, because the software searches across nine key bioinformatics databases in real time, the results are likely to be more up to date and accurate than those of a manual search, he said.
GeneSeeker, available at www.cmbi.kun.nl/GeneSeeker/, “gives a quick overview of the candidate genes for the disorders in the region you’re interested in,” van Driel said. Users enter genetic mapping information (a chromosome, a chromosome arm, or a range) along with the site of gene expression or of the phenotype (such as a tissue type or body part). The software then searches nine databases in two categories: genetic localization (MimMap, MGD, and GDB), and gene expression and phenotype (Medline, OMIM, SwissProt/TrEMBL, GxD, Tbase, and MLC). It returns the names of any genes that map to the specified location and are also expressed in the specified tissue.
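The two-category search described above can be pictured as a set intersection. The following Python sketch is only an illustration, not GeneSeeker's actual code: the function name, the placeholder gene names, and the rule for combining results (pool the hits within each category, then keep genes found in both) are assumptions, since the article does not say how results from the individual databases are merged.

```python
def candidate_genes(location_hits, expression_hits):
    """Return genes found by at least one localization source AND
    at least one expression/phenotype source (assumed combination rule)."""
    # Pool the gene names reported within each category.
    located = set().union(*location_hits.values())
    expressed = set().union(*expression_hits.values())
    # Keep only genes that satisfy both criteria.
    return sorted(located & expressed)


# Hypothetical query results, keyed by database name (gene names are placeholders).
location_hits = {
    "MimMap": {"GENE_A", "GENE_B"},
    "MGD": {"GENE_B", "GENE_C"},
    "GDB": {"GENE_D"},
}
expression_hits = {
    "OMIM": {"GENE_B", "GENE_X"},
    "SwissProt/TrEMBL": {"GENE_D"},
    "Medline": {"GENE_Y"},
}

print(candidate_genes(location_hits, expression_hits))
```

In this toy example only GENE_B and GENE_D survive, because they are the only names that appear in both a localization result and an expression or phenotype result.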
To overcome the discrepancies between gene names in the different resources, van Driel and his colleagues created a list of synonyms that combines the gene name information in SwissProt and the GDB. This synonym list is updated weekly.
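A synonym list of this kind can be thought of as a lookup table that maps every known alias to one canonical name before results are compared. The Python sketch below shows the idea under stated assumptions: the gene names are placeholders, the first name in each group is arbitrarily treated as canonical, and matching is case-insensitive. None of these details is taken from the CMBI implementation.

```python
def build_synonym_map(synonym_groups):
    """Map every alias (upper-cased) to the first name in its group."""
    canonical = {}
    for group in synonym_groups:
        primary = group[0].upper()
        for name in group:
            canonical[name.upper()] = primary
    return canonical


def normalize(names, canonical):
    """Replace each name with its canonical form; unknown names pass through."""
    return {canonical.get(name.upper(), name.upper()) for name in names}


# Hypothetical synonym groups, as might be merged from two naming resources.
groups = [["GENE1", "G1", "geneOne"], ["GENE2", "G2"]]
canonical = build_synonym_map(groups)

# The same genes reported under different names by two databases.
from_db_a = normalize({"G1", "GENE2"}, canonical)
from_db_b = normalize({"geneone", "G2", "GENE3"}, canonical)

print(sorted(from_db_a & from_db_b))
```

Without the normalization step, the two result sets here would share no names at all, even though they describe the same two genes.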
The CMBI team recently tested the software on 10 diseases with known localization regions, each with between 49 and 322 positional candidate genes (165 on average). The results of their evaluation, published in a recent issue of the European Journal of Human Genetics, indicate that the software is not only fast, but effective: The number of candidate genes that matched both location and expression or phenotype was reduced from that average of 165 to an average of 22.
The software will work best for researchers looking for a quick overview of the current status of a specific region, “but if you’ve already studied a lot of genes in the region and know the region by heart, then it’s not that useful,” van Driel said. However, the CMBI team plans to continually update and improve the system to enhance its effectiveness. Van Driel said his team is currently adding datasets to the software’s search route, including UniGene, EST databases, and SAGE (serial analysis of gene expression) data.
The team is also “experimenting” with some metabolic pathway databases in order to expand the capabilities of the system into metabolic diseases, van Driel said.