NEW YORK (GenomeWeb News) – Researchers from the Mouse Genome Sequencing Consortium and their collaborators have created a finished, high-quality assembly of the mouse genome.
In a paper that appeared online last night in PLoS Biology, the team described using clone-based sequencing and assembly to generate a high-quality mouse genome assembly called "Build 36." In doing so, they filled in thousands of gaps in the draft version of the mouse genome, published several years ago, and added millions of bases of sequence to it. By comparing the finished mouse and human genomes, the researchers were able to more fully appreciate conserved regions in the genomes as well as those specific to mice or rodents.
"With the benefit of hindsight, we now see how incomplete our initial summary of the mouse genome was," co-lead author Deanna Church, a staff scientist at the US National Institutes of Health's National Center for Biotechnology Information, said in a statement. "The new findings will allow us to dismiss some commonly held misconceptions and, more importantly, to reveal many hidden secrets of mouse biology."
The Mouse Genome Sequencing Consortium and Mouse Genome Analysis Group published a draft version of the mouse genome in Nature in 2002. That draft assembly, called MGSCv3, was generated using whole genome shotgun sequencing.
But the team has also been working in parallel to create a more refined mouse genome assembly using clone-based sequencing and mapping. To do this, the researchers sequenced bacterial artificial chromosome (BAC) clones covering the entire mouse genome, incorporating information from the already available draft genome sequence. Nearly all of the sequencing was done at Washington University's Genome Center, the Broad Institute, Baylor College of Medicine's Genome Center, and the Wellcome Trust Sanger Institute.
Using these data, the researchers assembled Build 36, closing more than 175,000 gaps in the mouse draft genome. The assembly contains 139 million bases of new sequence and corrects millions more bases that appear to have been misassembled in the draft genome.
"The mouse genome assembly shows marked improvements over the MGSCv3," the authors noted, "with an increased amount of ordered and oriented sequence placed on a chromosome … and increased base level accuracy due to the addition of clone-based finished sequence."
Based on this assembly, the team concluded that the mouse genome contains an estimated 20,210 protein-coding genes, nearly 1,200 more than the human genome. In particular, the researchers noted, Build 36 contains 1,259 mouse-specific genes that were previously missing or misrepresented.
The researchers identified repetitive elements and segmentally duplicated regions in the mouse genome that appear to harbor mouse- or rodent-specific sequence. They also uncovered long non-coding RNAs (ncRNAs) shared by the mouse and human genomes, as well as ncRNAs present in mice but missing in humans.
"These new findings are extremely important in helping us to separate genes that underpin biology that is the same across all mammals, from genes that make humans and mice so different from one another," co-senior author Chris Ponting, a group leader at the University of Oxford's MRC Functional Genomics Unit, said in a statement.
The researchers are continuing to refine the mouse genome. A currently available assembly, Build 37, reportedly offers further improvements over Build 36, though the authors noted that some regions of the mouse genome "remain under review and will be addressed in forthcoming assemblies."
The researchers conceded that clone-based sequencing and assembly are more expensive and time-consuming than whole genome shotgun methods, but they argued that the extra investment is warranted in situations where researchers require a more refined view of the genome.
"[I]t's clear from our analysis of the finished mouse genome assembly that draft [whole genome sequence and assemblies] will always poorly reflect lineage-specific biology," the authors noted. "Finished genome sequence has proved essential to understanding the full range of biology for both the human and the mouse genome, and will no doubt prove similarly informative for other vertebrate species."