By Andrea Anderson
NEW YORK (GenomeWeb News) - Researchers from the UK, US, and Germany reported in Nature that they have sequenced and compared the genomes of mice from 17 strains.
In the process, the researchers found some 56.7 million SNPs, nearly nine million small insertions and deletions, and hundreds of thousands of structural variants in mice. By bringing this information together with gene expression and other data, they were also able to track down hundreds of quantitative trait loci and variants contributing to tissue-specific gene expression.
"[U]sing our catalogue, and the genome sequences reported here, we have begun to identify the molecular basis for this complex pattern of gene regulation," co-corresponding author David Adams, an experimental cancer genetics researcher at the Wellcome Trust Sanger Institute, and co-authors wrote. "Further analysis and functional studies will allow us to identify the exact sequence differences responsible for these allelic expression differences."
The mouse sequencing study was initially motivated by an interest in getting a better handle on the genetics of mouse strains being used in cancer genetics studies, Adams told GenomeWeb Daily News. But when they started looking at this problem in more detail, he explained, the group realized that researchers working with other mouse models were facing similar challenges.
"The issue of lacking [mouse] sequence is not just ours," Adams said, "but is an issue for all mouse geneticists who work in all corners of biology."
To learn more about genetic patterns in mice, the researchers sequenced the genomes of mice from 17 strains in the Jackson Laboratory collection (13 standard laboratory strains and four strains generated by inbreeding wild mice), mainly using the Illumina GAII.
"Collectively the sequences of these strains capture the genomes of the most commonly used strains of mice and their progenitors," they explained.
The researchers aimed for at least 20-fold coverage of each genome and, on average, achieved around 25-fold coverage, co-first author Thomas Keane, also from the Sanger Institute, told GWDN. They also used RNA sequencing to assess gene expression patterns in brain tissue from mice representing 15 of the strains.
By comparing the genomes with the mouse reference genome, the team found 8.8 million small insertions and deletions and 56.7 million SNPs, far more than had been detected in the past.
When they sorted through the SNPs, the researchers identified 120,000 substitutions that are predicted to be non-synonymous. That corresponded to roughly one amino acid-altering SNP for every 1,454 codons, though some coding sequences were more variable than others.
Using the newly available genome sequence data, researchers are gaining an improved understanding of how mouse strains are related to one another phylogenetically. They are also starting to unravel some of the interactions between gene expression and genetic variation in the genomes.
Quality sequence data from inbred lines is necessary for exploring the influence that DNA sequence has on allele-specific expression, Adams explained. "This is really the first time that that's been possible in a vertebrate," he said.
For example, when the researchers assessed allele-specific expression patterns using data from six tissues of mice from a cross between the C57BL/6J and DBA/2J strains, they found that around 12 percent of gene transcripts show allele-specific bias.
"The interesting thing to do now is to take that data and to look very closely at the sequence differences between alleles to look for patterns," Adams explained. For instance, in genes that are over-expressed, it may be possible to track down sequence motifs corresponding to certain transcription factors.
"It really opens up the systems biology approach to understanding how genes are regulated," he added.
The team also found evidence that the effects of quantitative trait loci in the mouse varied depending on where these loci fall in the genome. QTLs that cropped up between genes typically showed smaller effects than those in introns, the non-protein-coding stretches within genes, which tended to show pronounced functional effects.
"In a lot of human genetic studies we're currently fixated with the exome when we go hunting for variants causing disease," Adams noted. "But our data suggests that the whole-genome sequencing approaches where we can take all variants into consideration are going to be very important."
The team was unable to look at about 13 to 23 percent of each mouse genome owing to the complexity of these regions. Still, the investigators identified 712,000 structural variants at 280,000 sites in the genome, along with 70,000 transposable element insertion sites.
In another study, also appearing online today in Nature, some of the same authors looked in more detail at these structural variants, which make up an average of 48.4 million bases of sequence per mouse genome.
They found, for example, that structural variants were more common in wild-derived mouse strains, which also had higher levels of overall genetic variation than their laboratory-derived counterparts.
"We found an order of magnitude higher number of structural variants, SNP, and short indels in the wild strains [relative to the reference genome]," Keane said.
An average of 98.2 million bases of sequence per genome were structurally variant in the wild-derived strains compared to 33 million bases, on average, in the 13 laboratory strains.
The group plans to create new reference-guided genome assemblies for the strains sequenced so far, Keane said. They also hope to sequence even more mouse laboratory strains.