In an effort to spice up its new Gene Expression Data Portal and attract new users, the National Cancer Institute has enlisted First Genetic Trust to add a wide selection of analytical tools to the repository’s submission and search capabilities.
First Genetic Trust will develop an open source, extendable framework for gene expression software and add it to the Gene Expression Data Portal (GEDP, gedp.nci.nih.gov), a searchable repository of cancer-related gene expression experimental data.
The company was selected from a number of candidates who submitted proposals for the project, said Mervi Heiskanen, NCI Center for Bioinformatics project director and director of the NCI’s Director’s Challenge program, an initiative to apply new technologies to further the understanding of cancer at the molecular level.
Heiskanen said that the goal of FGT’s work is “not to build tools, but to build a space where all are welcome to analyze their own data or to reanalyze other data sets that are stored in the Gene Expression Data Portal.”
FGT’s development work is supported by the Director’s Challenge program, but Heiskanen stressed that the GEDP project will remain open to the whole community. The final platform will be freely available through the NCI for academic as well as commercial users, and source code and APIs will be available to all users as well.
FGT and the NCICB will develop some new analytical tools, but will draw from currently available public domain and open source software as much as possible, Heiskanen added.
Aris Floratos, director of the computational genetics group at FGT, said the company would focus its efforts on the object model that will underlie the framework. On the data side it will build on the MAGE object model (MAGE-OM).
FGT will add components to support the building of applications, including “objects for describing computational analyses and their results, objects for describing analysis parameters, an extensive collection of visualization widgets for displaying dendrograms or gene clusters, and other objects commonly used in gene expression analysis,” Floratos said.
The result, according to both Floratos and Heiskanen, will be a flexible, modular framework that researchers can use to build their own gene expression applications from available components. The framework will support all MIAME and MAGE standards, and will be integrated with NCICB’s caBIO (Cancer Bioinformatics Infrastructure Objects), which provides access to a variety of biological data models and data sources at the NCI, including the Cancer Genome Anatomy Project and Genetic Annotation Initiative data repositories.
NCICB and FGT are currently defining the requirements for the project. While the particular analytical tools that will be included have yet to be identified, Heiskanen said the main goal is to ensure the framework is extendable so that new analytical tools can be added as necessary.
This aspect is a bit more challenging than it sounds, Floratos noted. “It’s one thing to say I want to make the software extendable. It’s another thing to actually make it extendable. We’re putting quite some effort into thinking what the designs will be to make this easy.”
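FGT has not published a design, so the following is only a rough sketch of what an extendable analysis framework of this kind might look like, not a description of the actual GEDP software. Every name in it (AnalysisParameters, AnalysisResult, ToolRegistry, filter_low_expression) is hypothetical, and the choice of Python is an assumption made for illustration. It shows one common approach: analysis tools register themselves under a name, and parameters and results travel in their own objects, so a new tool can be added without changing the framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Gene name -> expression values across samples (illustrative data shape only).
Expression = Dict[str, List[float]]


@dataclass
class AnalysisParameters:
    """Hypothetical object describing the settings for one analysis run."""
    values: Dict[str, float] = field(default_factory=dict)


@dataclass
class AnalysisResult:
    """Hypothetical object recording which tool ran, with what settings, and its output."""
    tool_name: str
    parameters: AnalysisParameters
    data: Expression


# A tool is any callable that takes expression data plus parameters.
AnalysisTool = Callable[[Expression, AnalysisParameters], Expression]


class ToolRegistry:
    """Keeps a name -> tool mapping so tools can be plugged in at run time."""

    def __init__(self) -> None:
        self._tools: Dict[str, AnalysisTool] = {}

    def register(self, name: str, tool: AnalysisTool) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = tool

    def names(self) -> List[str]:
        return sorted(self._tools)

    def run(self, name: str, expression: Expression,
            params: AnalysisParameters) -> AnalysisResult:
        tool = self._tools[name]  # raises KeyError for an unknown tool
        return AnalysisResult(name, params, tool(expression, params))


def filter_low_expression(expression: Expression,
                          params: AnalysisParameters) -> Expression:
    """Example plug-in: keep genes whose mean expression meets a threshold."""
    threshold = params.values.get("min_mean", 0.0)
    return {
        gene: vals
        for gene, vals in expression.items()
        if vals and sum(vals) / len(vals) >= threshold
    }
```

In this sketch, a contributor would add a tool by calling registry.register("filter", filter_low_expression) on a ToolRegistry instance and then registry.run("filter", data, AnalysisParameters({"min_mean": 1.0})). The registry itself never needs to know which tools exist, which is the property Floratos describes as hard to get right.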
The open source nature of the project is new territory for First Genetic Trust, which has built its business around genetic banking and population genetics — “a secure infrastructure that cannot, by definition, be open source,” according to Floratos. However, he noted, “for this particular engagement, we wanted it to be open source because we plan to use the same platform ourselves and having people contribute their own ideas by developing modules for software is for the benefit of everyone.”
A pre-development prototype should be ready before the end of the year, and the first release of the architecture is expected in June 2003. The first version will offer a range of “basic analysis and visualization tools,” Heiskanen said, including data filtering and normalization algorithms, but “we haven’t determined what they will be yet.”