Bioinformatics consulting firm 3rd Millennium recently completed development of a distributed web-based gene expression repository for Huntington’s disease research supported by the Hereditary Disease Foundation. Over 60 researchers at 20 different universities across the US will be able to search across data in the NeuMetrix data repository, which currently contains more than 15,000 data files from microarray experiments conducted by the HDF’s Hereditary Disease Array Group.
The database represents the first fruits of a project that Cambridge, Mass.-based 3rd Millennium has been working on since October 2000 — the Pathway Information Management System (PIMS), a bioinformatics platform for semantically integrating biological data within the context of pathway information. The project, funded by a $1.8 million US government Advanced Technology Program grant, has been under wraps until now. Development of the NeuMetrix repository was supported by the grant, but represents only “a small part” of the complete PIMS project, said Jack Pollard, principal investigator, bioinformatics, at 3rd Millennium.
Pollard said the NeuMetrix database is “the first indication of what the PIMS technology is capable of,” but acts primarily “as another data source for results that can be fed into PIMS.” However, he said, much of the effort that went into developing NeuMetrix shaped the company’s thinking for a broader information management system for biological research; in particular, “how to model data in a way that gives researchers access to the kind of information they want.”
Noting that the term “knowledge management” is quickly becoming overused, Pollard described PIMS as a way to represent the “results of genomic experiments, rather than the data.” PIMS models biological objects as well as the interactions in which they can participate. Ontologies, either created by 3rd Millennium or customized for the user, define the relationships between the objects and serve as a basis for semantic integration. The resulting system integrates and provides access to sequence and annotations, results of gene expression experiments, physical interactions, pathway and disease models, and other types of biological information.
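To illustrate the general idea of modeling biological objects, their interactions, and an ontology that relates them, here is a minimal Python sketch. It is not 3rd Millennium's implementation: the class names (BioObject, Interaction, Ontology), the function find_interactions, and every term in the sample hierarchy are hypothetical, chosen only to show how a query on a broad ontology term could pick up more specific interaction types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BioObject:
    """A biological entity such as a gene or protein (hypothetical model)."""
    name: str
    kind: str  # ontology term, e.g. "gene" or "protein"


@dataclass(frozen=True)
class Interaction:
    """A typed relationship between two biological objects."""
    source: BioObject
    target: BioObject
    relation: str  # ontology term, e.g. "binds"


@dataclass
class Ontology:
    """Maps each term to its parent term (a simple is-a hierarchy)."""
    parents: dict = field(default_factory=dict)

    def is_a(self, term, ancestor):
        # Walk up the hierarchy until we reach the ancestor or run out of parents.
        while term is not None:
            if term == ancestor:
                return True
            term = self.parents.get(term)
        return False


def find_interactions(interactions, ontology, relation):
    """Return interactions whose relation is `relation` or a subtype of it."""
    return [i for i in interactions if ontology.is_a(i.relation, relation)]


# Illustrative data only; the terms and the hierarchy are invented.
ontology = Ontology(parents={
    "binds": "physical_interaction",
    "phosphorylates": "physical_interaction",
    "upregulates": "regulatory_interaction",
})
gene_a = BioObject("GeneA", "gene")
protein_b = BioObject("ProteinB", "protein")
protein_c = BioObject("ProteinC", "protein")
interactions = [
    Interaction(protein_b, protein_c, "binds"),
    Interaction(protein_b, gene_a, "upregulates"),
]

# Querying on the broad term also matches the more specific "binds" relation.
physical = find_interactions(interactions, ontology, "physical_interaction")
```

In this sketch the ontology does the integrating work: data sources that label an interaction with different but related terms can still be queried together through a shared parent term.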
Pollard said the underlying PIMS technology is largely in place, and the company is now concentrating on user-facing features.
For the NeuMetrix database, the 3rd Millennium team took on a task that the HDAG had already tried and failed to pull off on its own. Jim Olson, assistant member of the Fred Hutchinson Cancer Research Center and coordinator of the HDAG, said the team tried for “over a year” to build a similar system within the institute. However, it was unable to get over several hurdles — namely the scale of the data involved (Olson estimated HDAG generates up to a gigabyte of data per day), the distributed nature of the repository, and the different levels of security access that needed to be integrated into the system so that researchers could access the database to work on projects that are not yet publicly available.
The resulting repository “allows us to do very complex Boolean searches and achieve data analysis with far better results than anything else we looked at,” said Olson.
The NeuMetrix database will be available to the general public in the fall, following the publication of an upcoming research paper based on the data.