Israel’s Weizmann Institute of Science has signed DoubleTwist as the first commercial licensee for its GeneCards Encyclopedia, a Web-based interface through which researchers can access biomedical information about the structure and function of human genes, a source close to the project said.
A spokesman from Yeda, the technology transfer arm of the Weizmann Institute, confirmed that a commercial entity is licensing GeneCards, although he declined to disclose the identity of the company or the financial details of the licensing agreement.
Catherine Collier, a DoubleTwist spokeswoman, declined to comment, saying the company was in a quiet period in the run-up to its planned initial public offering.
GeneCards began as an add-on to the institute’s Unified Database, a mapping database for all of the chromosomes. UDB integrates sequence tagged site and gene maps for each chromosome using a scaling algorithm developed at the institute.
The Weizmann researchers, including Michael Rebhan, Jaime Prilusky, Vered Chalifa-Caspi, and Doron Lancet, soon recognized that UDB’s underlying architecture could serve as a resource for a wide range of genomic information beyond mapping data. Proteins, diseases, sequences, and other related information could also be provided, simply by drawing from a wider range of databases.
UDB is now one of 22 databases from which the GeneCards encyclopedia extracts data.
“We believe that GeneCards presents the right combination of descriptive text and links,” explained Marilyn Safran, a researcher at the institute who is currently responsible for the GeneCards project. “Users get a nice overall picture of many aspects about a gene, organized in clear functional categories as well as a variety of links if they wish to go deeper into a particular area.”
A typical GeneCard lists the official name of the gene (from the HUGO Human Gene Nomenclature Committee), a list of synonyms (from the Genome Database), homologous genes in the mouse genome (from the Mouse Genome Database), the names of its protein products (from SwissProt), and the UniGene cluster of sequences related to the gene (from the National Center for Biotechnology Information).
The card also lists disorders in which the gene appears to be involved, the coordinates of the gene, titles of related research articles, and medical applications based on knowledge about the gene.
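For readers who think in data structures, the card layout described above can be pictured as a simple record. The following Python sketch is an illustration only: the class name, field names, and sample values are assumptions made for this example and do not reflect the actual GeneCards schema or any real gene.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class GeneCard:
    """Hypothetical record mirroring the card sections named in the article."""
    symbol: str                                                # official name (HUGO nomenclature committee)
    synonyms: List[str] = field(default_factory=list)          # Genome Database
    mouse_homologs: List[str] = field(default_factory=list)    # Mouse Genome Database
    protein_products: List[str] = field(default_factory=list)  # SwissProt
    unigene_cluster: Optional[str] = None                      # NCBI UniGene
    disorders: List[str] = field(default_factory=list)
    coordinates: Optional[str] = None
    article_titles: List[str] = field(default_factory=list)
    medical_applications: List[str] = field(default_factory=list)

    def sections(self) -> Dict[str, object]:
        """Return only the populated sections, keyed by field name."""
        return {k: v for k, v in asdict(self).items() if v not in (None, [], "")}


# Placeholder values, not real gene data.
card = GeneCard(symbol="GENE1", synonyms=["G1"])
populated = card.sections()
```

Keeping each section as its own field makes it straightforward to show empty sections as absent rather than blank, which is one plausible way to present cards whose source databases have no entry for a gene.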
There are currently 19,175 GeneCards in the system, more than double the 9,445 available last January. Safran said the team typically releases a new version every other month. The most recent release boosted the GeneCard count by 4,990.
The Weizmann researchers wrote the GeneCards search engine using Glimpse software originally developed at the University of Arizona. A user can search the GeneCards database using gene names, protein names, disease or symptom names, GenBank accession numbers, UniGene clusters, clone identifiers, or map regions. A spell corrector is included in the navigation support system, which also generates tips for query reformulation.
The information included in the GeneCards is collected in an internal database through an automated process developed by the researchers. The GeneCard generation script, written in Perl, starts with the list of approved gene names published by the HUGO/GDB nomenclature committee. Several algorithms and heuristics mine local FTP data and the websites of relevant databases.
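The general shape of such a process, starting from a list of approved symbols and merging what each source knows about them, might look like the sketch below. It is written in Python rather than the Perl the researchers used, and it is not their script: the function name `build_cards`, the in-memory lookup tables standing in for FTP files and websites, and all sample values are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: each source is modeled as a function from a gene symbol
# to the fields (and values) that source can supply for it.
Source = Callable[[str], Dict[str, List[str]]]


def build_cards(
    approved_symbols: Iterable[str],
    sources: Dict[str, Source],
) -> Dict[str, Dict[str, List[str]]]:
    """Build one card per approved symbol by merging fields from every source."""
    cards: Dict[str, Dict[str, List[str]]] = {}
    for symbol in approved_symbols:
        card: Dict[str, List[str]] = {"symbol": [symbol]}
        for lookup in sources.values():
            for field_name, values in lookup(symbol).items():
                merged = card.setdefault(field_name, [])
                for value in values:
                    if value not in merged:  # skip duplicates across sources
                        merged.append(value)
        cards[symbol] = card
    return cards


# Placeholder tables standing in for real databases; not real gene data.
synonym_table = {"GENE1": {"synonyms": ["G1", "GENE-1"]}}
protein_table = {"GENE1": {"proteins": ["Protein one"], "synonyms": ["G1"]}}

sources = {
    "synonym_db": lambda s: synonym_table.get(s, {}),
    "protein_db": lambda s: protein_table.get(s, {}),
}
cards = build_cards(["GENE1", "GENE2"], sources)
```

Driving the loop from the approved-symbol list means a gene that no source recognizes still gets a card containing only its symbol, while values reported by more than one source appear once.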
Features in the works include integrating new data, providing GeneCards for genes that have not yet been assigned a symbol, upgrading the infrastructure to enable easier incorporation of additional data sources via an application programming interface, and migrating the cards to the XML data format.
The main GeneCards site is available on the Web (http://bioinfo.weizmann.ac.il/cards), and academic entities may obtain a local mirror version that offers high-speed access. A Weizmann spokesperson declined to comment on how the commercial licensing deal would affect academic users’ access to the GeneCards website.
Funding for the GeneCards project has come from the Minerva Stiftung für die Forschung, the Israeli Academy of Science, the Israeli Ministry of Science, the Crown Human Genome Center at the Weizmann Institute, and other sources.
According to Safran, funding for the near future looks fine and there has been “tremendous interest” from the private sector regarding commercialization of the technology.