The European Union has awarded €1.3 million ($1,853,670) over three years for a worldwide effort to integrate disparate mouse genome databases.
Called CASIMIR – for Coordination and Sustainability of International Mouse Informatics Resources – the project brings together a range of international partners who plan to consolidate a plethora of complementary data into a cohesive network.
While details on the project are still being ironed out, the group held its first, invitation-only meeting in Corfu, Greece, Oct. 3-6.
Cambridge University is coordinating the consortium. Partners include the Karolinska Institute in Sweden; the Alexander Fleming Biomedical Sciences Research Center in Greece; the Helmholtz Centre for Infection Research and the GSF National Research Center for Environment and Health in Germany; the UK’s Medical Research Council, European Bioinformatics Institute, and Geneservice; and France’s Institut Clinique de la Souris.
While not members of the funded consortium, the Wellcome Trust Sanger Institute, the Riken Genome Science Center, the US National Institutes of Health, and the Jackson Laboratory have all pledged to support the effort.
In an interview with BioInform this week, Janan Eppig, a senior staff scientist at the Jackson Lab, said that one of the project’s aims is to avoid duplication of efforts worldwide. This will have the net effect of cutting down on the number of animals used in clinical research.
Echoing that, Paul Schofield, senior lecturer at the University of Cambridge, told BioInform that the group aims to create a distributed network of mouse genome resources, among other goals.
“We are not talking about one central database that has everything in it,” Schofield said. “[Actually] what we are doing by networking all the databases is giving the opportunity to do different types of data mining, to ask different questions, and to pull all that information together.”
This network is expected to complement the information in the Jackson Lab’s mouse genetic information resources, such as its Mouse Genome Database, which Schofield said is “without question, the most important mouse database that we have available.”
Niels Adams, mouse genetics program manager at the Sanger Institute, told BioInform via e-mail that while only the formal consortium partners are receiving funding from the EU, Sanger, Jackson, and others, “given their contribution to the field,” are welcome to integrate their individual mouse genome databases.
Sanger is supporting the project “on several different levels,” he said. “One example is to develop automatic annotation systems like Ensembl … Other groups at the Sanger Institute are generating biological resources like genetically modified [embryonic stem] cells and have databases that track both their pipeline and inventory.”
He said that his group is developing tools to aid the acquisition and analysis of phenotyping data from genetically altered mice with the goal of associating genes with disease indications. “The intention is to connect all these databases along with the resources developed worldwide to help facilitate both mouse and medical research,” Adams said.
Database Sustainability, IP Concerns
While the primary goal of CASIMIR is building a network of mouse genome databases, the consortium also plans to tackle some much broader informatics challenges.
For example, CASIMIR plans to look at funding models from around the world “to see if we can put together some recommendations for the European Commission and other European funding agencies, which take into account the special nature of databases in terms of the — hopefully — medium or long-term sustainability of those databases,” Schofield added.
“I think most people [at the first meeting] agreed that, certainly in Europe, there are not good funding models that are aimed at maintaining databases in the long term,” Schofield said.
CASIMIR is also looking at issues of intellectual property and data access, and how they might affect the consortium’s plans for networking the mouse genome databases.
Schofield said that researchers don’t always publish everything they find – for various reasons.
“[B]y doing so, are you jeopardizing future intellectual property exploitation? Are you jeopardizing future publication? And of course, one has to balance that against the usefulness of this publicly funded research,” Schofield said.
Despite the scientific community’s emphasis on releasing data into the public domain, Schofield said “there is a significant degree of hesitation putting things into public databases” in some European countries.
He said the consortium plans to consult experts on intellectual property rights from the UK’s Medical Research Council and from the European Molecular Biology Laboratory to address these issues.
The Sanger Institute’s Adams agreed that intellectual property issues are indeed among the challenges that lie ahead, including “how to address the question of who owns the data and how the funding agencies will get credit for the data once it’s distributed to other databases.”
Further, Adams said that databases “tend to be built to help with a specific problem with limited funds,” which is another challenge the group will explore.
Laurent Vasseur, head of the IT team at the Institut Clinique de la Souris, told BioInform that before any of these goals can be addressed, the project will first take a broad overview of the field. Consortium members will survey other kinds of databases, the technology used to run predictions and queries against them, and whether they rely on web services, for example.
CASIMIR comprises eight so-called “workpackages” that cover a range of tasks, such as logistical issues, data representation, technical issues, data acquisition, user interaction, and other aspects of the project.
The next meeting, to be held Nov. 27-28 in Rome, will address the nature of these workpackages – who does what among the consortium members, Eppig said.
She said that the Jackson Lab plans to take information “about what genes have mouse models or mutant alleles [for example], and display them on the global genome browsers.”
Eppig added, “Each group has one component of the project they are responsible [for], so there will be more discussion among all groups as to how they will function.” She also said that the Rome meeting will be open to all.
Whatever it decides as to work groups, the consortium does not foresee replacing animal testing with data – no matter how well-integrated it may be by the project’s completion.
“Animal studies will remain important because, particularly for disease research, the whole animal ‘in context’ is how the disease is manifested and how causes and therapies can be explored. These studies cannot be done in cell lines or isolated tissues,” said Eppig.
“Having improved data integration or linking among data resources will allow researchers to better know about the full breadth of experiments that have been done and their results, thus to not repeat studies,” Eppig said.
Further information about the CASIMIR project is available here.