BAR HARBOR, Maine--The Jackson Laboratory is forging ahead with an ambitious $11 million effort to expand its already-prominent bioinformatics program.
"Scientists need new bioinformatics resources and they need them now," said Kenneth Paigen, director of the research facility, which is developing a new generation of bioinformatics software and databases. The lab's expansion effort got a boost earlier this month with the announcement of a $1.2 million federal award to create a new mouse tumor database and the unveiling of two new pieces of software designed to help scientists share genetic data.
At the same time, the lab dramatically expanded a mouse genome database used by researchers worldwide. It also continued to develop an innovative database that will link three-dimensional images of mouse embryos to information on the genes that guide fetal development.
Founded in 1929, the lab has a budget of almost $60 million and a staff of 750. It is best known for breeding and studying the genetics of mice. Its bioinformatics programs, which have a staff of 40 and are growing, have two major emphases. The first is community-scale informatics, which involves developing the software and databases that allow large groups of scientists to share information. The second is laboratory-scale informatics, which involves developing software that will enable small labs and individual researchers to manipulate and mine genetic data for new discoveries. In addition to U.S. government support, both areas have attracted funds from pharmaceutical companies Hoffmann-La Roche and Astra AB.
The lab's bioinformatics programs have three centers of activity: the Mouse Genome Database (MGD), the Gene Expression Database for Mouse Development (GXD), and the Goodman Laboratory.
Mouse Genome Database
The flagship of the lab's informatics program, the MGD is a Web-accessible archive that the Goodman Laboratory's Nathan Goodman called "one of the most important and prominent information resources in the genetics community." Led by Janan Eppig, it is funded by a $7.4 million grant from the U.S. Department of Health and Human Services' National Center for Human Genome Research. The MGD links data on more than 20,000 genetic markers, 40,000 molecular reagents, and 40,000 literature citations with a wide array of other information. It first appeared on the Web in mid-1994 and is updated daily. Researchers access it up to 5,000 times a day.
The MGD is the official repository for mouse genome information produced by the Human Genome Project. It also includes the annual reports of the Chromosome Committees, groups of researchers who periodically review knowledge about each mouse chromosome. Joel Richardson, who leads the MGD's software development group, noted that the lab's "role in the bioinformatics world is changing. Early on, we were data collectors, converting information in the published literature into electronic form. Now, as more data are readily available in electronic form, the emphasis is on data integration: making both the raw data and the interpretations of that data available to researchers." That task has become increasingly challenging now that high-throughput labs are producing large volumes of data. For example, earlier this month the MGD expanded dramatically with the addition of 170,000 cDNA clones and related expressed sequence tags from Washington University.
To keep up with the growth, a couple of weeks ago the lab unveiled a new version of the MGD's Web software, written in Python. Among other improvements, it now allows researchers to look for genes using any accession number, to see physical maps of gene locations produced by researchers at the Massachusetts Institute of Technology (MIT), and to identify more easily the molecular reagents used in research.
The lab's experience with streamlining the MGD is expected to help with a new effort to develop a Web-accessible database on mouse tumors, funded by a $1.2 million award from the National Institutes of Health's National Cancer Institute. The lab hopes to launch the new database, also directed by Eppig, in late 1998. It is looking to hire a software engineer and an information curator to staff the project.
Gene Expression Database for Mouse Development
The GXD project involves the lab's Martin Ringwald, the Medical Research Council Human Genetics Unit in Edinburgh, the University of Edinburgh, and labs supported by the European Science Foundation. Ringwald is developing software to allow researchers to link three-dimensional images of mouse embryos to information on the genes that guide development. His portion of the project, a database of gene expression that is already partly available on the Web, is supported by a $1.2 million grant from the National Institutes of Health. The lab is currently looking for master's- or doctoral-level biologists to support the project.
The Goodman Laboratory
The Goodman Laboratory has a half-dozen staff and is focused on how to make both community- and lab-oriented bioinformatics easier and more productive. "One of my goals is to make it possible for biologists with modest computer training to create useful software themselves from fundamental building blocks," said Goodman, who recently relocated from MIT's Whitehead Institute. "We also need to create both the culture and the tools that will encourage scientists to share software. The goal is to avoid having researchers forever reinventing the wheel."
As part of the effort to reach those goals, Goodman's lab recently released a new version of its laboratory informatics software, LabBase, which researchers can download from the Web. In addition, the lab serves as the quality assurance group and software repository for the bioWidget Consortium, a group using Java to create easily shared software. "Java's arrival has created a fertile opportunity for cooperative software development," Goodman commented. His work is supported by $1.2 million grants from both the National Institutes of Health and the U.S. Department of Energy's human genome project.
--David A. Malakoff