It’s almost 2001, but Genome Database (GDB) 2000 is still not finished.
In this latest version, the curated, map-based gene database was supposed to have been combined with the human genome reference sequence from the US National Center for Biotechnology Information (NCBI).
The new database, which would be free and publicly available like the previous versions released during its 11-year history, was to be designed so that a disease query would yield not only the chromosomal location of the genes associated with the disease and the related information entered by the database’s volunteer staff of 100 expert editors, but also the genes’ sequences, according to Jamie Cuticchia, director of GDB.
But the database ran out of funding for its new update several months ago, with six months of work yet to be done. Currently, an anonymous private donor and donations from software companies are keeping the existing form of GDB going, while Cuticchia searches for a way to raise the funds to complete this project.
“We have $3 million worth of hardware from IBM and $2 million in software from Oracle,” Cuticchia said. “We have 10 people working on the project on an ongoing basis. What we are looking for in terms of additional funding is to increase the amount of curatorial staff to keep pace with the discoveries coming on, and to have the resources we need to complete GDB 2000.”
But in the era of private annotated databases marketed by Celera Genomics, Incyte Genomics, and others, a public database such as GDB faces questions about whether it is obsolete and unnecessary.
“It’s an interesting question” whether GDB can still remain vital in the current climate, said Peter White, an assistant professor of pediatrics at the University of Pennsylvania School of Medicine and an editor of GDB.
But White and others still think GDB has unique offerings. It is the only public database that is curated by a worldwide network of editors who, like White, are experts in their field, as well as by a full-time staff of curators at Johns Hopkins University. And unlike GenBank, GDB maps its information onto chromosomes, a useful feature for scientists working in medical cytogenetics.
The curation of GDB is in fact Cuticchia’s principal argument for maintaining GDB’s relevance.
“If you look at what you get from Celera or Incyte or even NCBI human annotation, you have a range of curation from nothing to in essence one man’s opinion,” said Cuticchia. “We operate the database as one would a high-quality peer-reviewed journal. The information coming into GDB is passed on to a team of 100 editors.”
These editors are chosen from GDB’s human gene mapping committee. But their involvement level varies widely, even Cuticchia admits. “Some spend a lot of time, but others you never hear from after the day they are appointed,” he said.
This inconsistency leads to GDB’s downside, the lack of systematic error checking, said White. But, he added, “its strength is that data are more carefully looked at on an individual basis than something industrially collected like a Celera or some of the other human genome databases.”
To keep GDB going, Cuticchia plans to go commercial. While he maintains that GDB will always be publicly available, he plans to develop and license software to pharmaceutical companies that will allow them to operate their own internal versions of the database behind their own firewalls and to add their own internal annotation.
But even if the update gets completed, some are not sure that GDB can remain relevant. “While the information is there, it’s sometimes not so easy to get to it,” said White.
White is using an R01 grant from the National Institutes of Health to develop his own chromosomal map-based gene database that he hopes will do what GDB originally sought to do. “We want to create an online genomic catalog where someone can go to a single website and collect all of the information available for a given gene, genetic marker, region, or chromosome without having to sort through different databases,” he said.
This database, provisionally titled Compview, though White hopes to call it E-Genome once domain name issues are resolved, includes high-resolution maps of each chromosome with genes and DNA markers. At first it will contain just structural information, but functional information will be added later, he said.
This new database is tiny by any contemporary standard. It runs on a high-performance Apple Macintosh using Fourth Dimension, a high-end Mac relational database. Eventually, if the database is successful, White said he could see it running on an Oracle or Unix platform. “We’re trying to take the lowest common denominator for small labs not experienced in bioinformatics,” he said. “We want to keep the database as small as possible.”
—Marian Moser Jones