With £3 million ($4.6 million) in new funding, the Wellcome Trust Sanger Institute and collaborators have embarked on a project that aims to use second-generation sequencing technologies to sequence and assemble the genomes of 17 mouse strains that are widely used in research.
Over the next three years, scientists at the Sanger Institute plan to use Illumina’s Genome Analyzer to sequence the strains and discover genetic variants between them. The ultimate goal is to generate de novo assemblies of their genomes.
New sequencing technologies are enabling the project, according to David Adams, an investigator in experimental cancer genetics at the Sanger Institute who is leading the project. “It would have been inconceivable three years ago to even think about this, just because of cost,” he said.
The results are expected to help researchers understand the genetics of disease. Mouse strains display different phenotypes, some of which mirror human disease, Adams explained. For example, some mouse strains get heart disease, while others are prone to cancer. “By sequencing the background strains, we can uncover that diversity and use that as the basis to finding disease genes and disease pathways,” he told In Sequence last week.
In particular, the genome sequences will help mouse researchers link a phenotype to a genotype, because many of them perform genetic studies in strains other than the sequenced reference strain, C57BL/6J. Right now, “people are doing the genetics to find regions that are associated with a particular disease, but they don’t actually know where the causal nucleotide variants are,” Adams said. “By sequencing the strains, we will be able to go from having a candidate region to knowing the variants, without all the hard work involved in these experiments at the moment.”
David Beier, a mouse geneticist and a professor of medicine at Harvard Medical School, who is not participating in the project, agreed. “Having complete sequence will facilitate comparisons that will not be confounded by incorrect assumptions regarding allele-sharing” between strains, he said in an e-mail message. “In addition, the likelihood of discovering new and unexpected strain-specific mutations will make this analysis highly informative in its own right. Finally, the particular sequence variants found are likely to shed light on the mechanism of genomic alteration.”
The project is funded with £2.3 million from the UK’s Medical Research Council and £600,000 from the Juvenile Diabetes Research Foundation. The Sanger Institute itself will provide unspecified “additional in-house support” for the study.
Besides the Sanger Institute, which will generate and analyze the sequence data, the project involves the Jackson Laboratory in Bar Harbor, Maine, and the MRC Mammalian Genetics Unit in Harwell, UK, which will provide the mouse DNA. The European Bioinformatics Institute, the MRC Human Genetics Unit in Edinburgh, and the Wellcome Trust Centre for Human Genetics in Oxford will take part in the data analysis.
The 17 strains selected for the project include a number that were used for the Collaborative Cross, a large-scale genetic breeding project begun in 2005 that aims to generate 1,000 new mouse strains from eight founder strains. Other strains are cancer-resistant, and still others are used “for a lot of genetic manipulation,” according to Adams.
Most of the mouse DNA comes from the Jackson Lab, which has maintained and characterized the strains for many generations, and has “enough mice to supply the world for 25 years,” Adams said. “Unlike humans, where you sequence an individual and that individual you never really can go back to again, with these mouse strains, we are generating data that people will use again and again for generations,” he said, “so it will be a permanent reference.”
Initially, the Sanger researchers are using Illumina’s Genome Analyzer platform to sequence the strains, and they have already embarked on the first genome, which they are sequencing “very deeply,” according to Adams. “That’s giving us some idea of what we are actually going to be able to get,” he said.
The current plan is to sequence several DNA libraries for every strain, using paired-end reads with a read length of more than 70 base pairs. Libraries will have inserts of 200 base pairs, 500 base pairs, and longer. “We can currently generate 3-kilobase libraries, but there is a lot of effort to generate longer sizes, which will help with closely related segmentally duplicated regions and things like that,” Adams said.
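To illustrate why a mix of short- and long-insert libraries is useful, the standard coverage arithmetic can be sketched as below: sequence depth is total sequenced bases divided by genome size, while physical coverage counts the full insert spanned by each read pair. This is a minimal sketch under stated assumptions; the `Library` structure, the read-pair counts, the 76-base read length, and the roughly 2.7-gigabase mouse genome size in the usage example are hypothetical illustrative values, not figures from the project.

```python
from dataclasses import dataclass


@dataclass
class Library:
    """One paired-end library (hypothetical structure for illustration)."""
    insert_size_bp: int   # length of the DNA fragment spanned by a read pair
    read_pairs: int       # number of read pairs sequenced from this library
    read_length_bp: int   # length of each read in a pair

    def sequenced_bases(self) -> int:
        # Each pair contributes two reads of read_length_bp bases.
        return 2 * self.read_pairs * self.read_length_bp

    def physical_bases(self) -> int:
        # Physical coverage counts the whole insert each pair spans,
        # including the unsequenced stretch between the two reads.
        return self.read_pairs * self.insert_size_bp


def depth(libraries: list[Library], genome_size_bp: int) -> tuple[float, float]:
    """Return (sequence depth, physical coverage) summed over all libraries."""
    seq = sum(lib.sequenced_bases() for lib in libraries) / genome_size_bp
    phys = sum(lib.physical_bases() for lib in libraries) / genome_size_bp
    return seq, phys


# Hypothetical plan: 200 bp, 500 bp, and 3 kb inserts, 76 bp reads.
libs = [
    Library(200, 100_000_000, 76),
    Library(500, 100_000_000, 76),
    Library(3_000, 20_000_000, 76),
]
seq_depth, physical = depth(libs, 2_700_000_000)  # assumed ~2.7 Gb genome
print(f"sequence depth {seq_depth:.1f}x, physical coverage {physical:.1f}x")
```

In this made-up plan the 3-kilobase library adds comparatively little sequence depth but contributes the largest share of physical coverage, which is consistent with the point that longer inserts help bridge duplicated and repetitive regions that short reads alone cannot span.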
Based on a pilot project, in which they sequenced regions of mouse chromosome 17, the researchers know they are “doing an extremely good job” at calling SNPs and small indels. However, “how well we do with some of the complex repeats and the copy number variants is something we are still working on,” he said.
Within 18 months, the scientists want to report “most of the nucleotide variation” in all 17 strains. Within three years, they would like to “push towards generating de novo assemblies,” but this will require improved assembly algorithms and longer reads to cover repeats.
At the moment, the project is using the Maq software to call variants, and the researchers are trying out a number of assembly algorithms, including Velvet, ABySS, and Fuzzy, a new algorithm developed by Zemin Ning at the Sanger Institute.
What makes Adams confident that de novo assemblies will be possible is the fact that the mouse strains are homozygous and inbred. “Based on that, the assemblies and variant calling in mouse is considerably easier than it is in human,” he said.
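Adams's point about inbred strains can be illustrated with a toy genotype caller. Because an inbred strain is expected to be homozygous at nearly every site, a site can be called by asking whether one base dominates the aligned reads, without the heterozygous genotypes a diploid human caller must weigh. The function below is a hypothetical simplification for illustration only; it is not how Maq works, and the `min_depth` and `min_fraction` thresholds are assumed values.

```python
from collections import Counter
from typing import Optional


def call_homozygous_site(ref_base: str, read_bases: str,
                         min_depth: int = 5,
                         min_fraction: float = 0.9) -> Optional[str]:
    """Return the alternate base if the site looks like a homozygous SNP,
    or None for a reference call or an uncertain site.

    Hypothetical thresholds: at least `min_depth` reads must cover the
    site, and the most frequent base must make up at least `min_fraction`
    of them.
    """
    if len(read_bases) < min_depth:
        return None  # too little coverage to make a call
    base, count = Counter(read_bases.upper()).most_common(1)[0]
    if count / len(read_bases) < min_fraction:
        return None  # mixed evidence: sequencing error, mismapped repeat, etc.
    return base if base != ref_base.upper() else None


print(call_homozygous_site("A", "GGGGGGGG"))  # homozygous SNP: alternate base G
print(call_homozygous_site("A", "AAAAAAA"))   # matches reference: None
print(call_homozygous_site("A", "AAAAGGGG"))  # 50/50 split: None
```

In a human sample, a 50/50 split like the last example would usually be called as a heterozygous genotype; in an inbred strain it is more plausibly a sign of error or of reads from a duplicated region mapping to the wrong copy, which is one reason calling is simpler in these strains.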
Past projects have used microarray-based technologies to resequence the genomes of different mouse strains. For example, the National Institute of Environmental Health Sciences contracted with Perlegen Sciences several years ago to resequence 15 commonly used mouse strains using its oligo array technology. That project discovered almost 8.3 million SNPs across the mouse genome.
Though the results of that project were very valuable, it did not address copy number variations or repetitive regions, according to Adams. The new project “will add to what they are doing, and it will give a more complete picture of variations between mouse strains,” he said.