NEW YORK (GenomeWeb) – One of the talks at the American Medical Informatics Association's Summit on Translational Bioinformatics, held last month in San Francisco, focused on the efforts of the International Mouse Phenotyping Consortium (IMPC) to characterize knockout strains for every protein-coding gene in mice.
At the summit, Terry Meehan, project lead for the mouse informatics group at the EMBL European Bioinformatics Institute, one of the research centers involved in the IMPC project, discussed the consortium's broader activities, some of its early fruits, and the infrastructure and resources developed by the informatics subgroup that handles the aggregation, processing, and distribution of data produced by the participating centers.
The IMPC, which officially launched in 2011 with $110 million in funding from the National Institutes of Health's Common Fund, grew out of the International Knockout Mouse Consortium (IKMC), a body that was set up to create a collection of knockout embryonic stem cells for mice.
Its goal is to characterize and provide data and metadata on more than 20,000 knockout mouse strains (typically seven male and seven female mice per knockout line) within 10 years. It also seeks to integrate these data with information from mutation and disease repositories, thus providing useful models for exploring and understanding human disease. By the end of phase I, which is expected to wrap up in 2016, the researchers expect to have characterized more than 5,000 knockout strains. Most recently, the IMPC published phenotype datasets for about 1,175 new mouse knockout strains.
Meehan's group at the EBI is one of three teams, comprising more than 20 researchers in total, that make up the IMPC's Mouse Phenotyping Informatics Infrastructure (MPI2). These teams are responsible for providing high-quality data on the mouse strains, running statistical analysis on the aggregated data, and maintaining the main IMPC web portal. This portal, described in a paper published last year in Nucleic Acids Research, provides a single access point for information on all the mutant mouse strains and embryonic stem cells, as well as collections of mouse phenotype data, including genomic, genotypic, and phenotypic context provided by biomedical ontologies, published literature, and other sources.
Meehan's group handles data archiving and distribution, and also works to integrate the data with existing human disease resources and repositories, he told GenomeWeb. A team at the Wellcome Trust Sanger Institute (WTSI) tracks and reports on mouse production, and MRC Harwell is responsible for creating standardized phenotyping protocols that the contributing centers use for experimentation, data collection, and handling.
The phenotyping protocols used by the consortium, as well as the more than 200 measured parameters, such as blood glucose response and blood urea concentration, are provided through the IMPC's International Mouse Phenotyping Resource of Standardized Screens (IMPReSS). There are separate protocols provided for phenotyping adult, embryonic, and sick mice. Each parameter is associated with standardized vocabulary terms in existing biomedical ontologies, which makes it easier to integrate the consortium's information with existing resources on phenotype data available in the literature, Meehan said.
Also, so-called data wranglers within the informatics core validate and process submitted data. They run quality control checks, highlight issues, and work with contributing centers to remove errors and identify missing data. Data wranglers also run statistical tests on the data using an R-based pipeline called PhenStat that was developed specifically for the IMPC data. The researchers are also using a software tool called Exomiser, developed by researchers at Charité in Berlin and WTSI, to semantically match mouse phenotypes to human disease. This would be useful for clinical researchers working with rare diseases, for example, because it would help them prioritize variant candidates based on phenotype and orthologous mutant strains, Meehan said.
So far, there are more than 300 published references in the scientific literature to mouse resources provided by the IMPC, Meehan told GenomeWeb. Also, the consortium's efforts have resulted in several new mouse disease models that have been added to the public domain. Moreover, 428 diseases have been associated with orthologous genes phenotyped by the IMPC. Of these, 122 currently have no published mouse model, according to the consortium, and phenotypes similar to the clinical ones are starting to be detected. The consortium expects to add up to 1,000 more potential disease models in the next 18 months, and members are also conducting preliminary studies to assess the impact of gender on genotype.
Furthermore, 33 IMPC strains with previously published mouse models have phenotypes in common with some rare inherited diseases in humans. For example, features of Gordon Holmes syndrome and Bernard-Soulier syndrome (infertility, and increased mean platelet volume with decreased platelet cell number, respectively) were seen in IMPC mouse strains. Also, cerebellar ataxia in humans has a decreased fertility phenotype, which is seen in IMPC mice, and arthrogryposis renal dysfunction cholestasis syndrome has been associated with abnormal red blood cell morphology in mouse models.
The consortium's efforts have also identified some new mouse strains as potential disease models. One strain, Frrs1l, is a potential model for autonomic neuropathy. At least one publication has linked phenotypes associated with this mouse strain, including abnormal locomotor behavior and gait, decreased startle reflex, and limb grasping, to hereditary sensory neuropathy.
Over the next 18 months, the IMPC will continue adding new strains to its web portal. As part of those efforts, it will work on characterizing embryonic phenotypes, including by using various imaging techniques to explore associations between knocked-out genes and abnormal morphology in mouse embryos at different stages of development, Meehan said.