NEW YORK (GenomeWeb) – A team of researchers has generated de novo genome assemblies for 16 laboratory mouse strains, enabling them to tease out strain-specific diversity and fill in gaps in the mouse reference genome.
Lab mice generally fall into two groups: classical inbred strains and wild-derived strains, which are more genetically diverse. The mouse reference genome was first reported in 2002, but the authors of the new study said that the practice of mapping sequencing data from other strains to that reference has caused strain-specific variation to be overlooked.
For their study, researchers led by corresponding author Thomas Keane at the European Bioinformatics Institute generated draft de novo genome assemblies for 16 commonly used mouse strains. As he and his colleagues reported today in Nature Genetics, they uncovered more than 2,500 regions of the mouse genome that exhibit high sequence diversity, including regions tied to pathogen defense and immunity, and added 62 new coding loci to the C57BL/6J reference genome annotation.
"Here we generate the first chromosome-scale genome assemblies for 12 classical and four wild-derived inbred strains, thus revealing at unprecedented resolution the striking strain-specific allelic diversity that encompasses 0.5 [to] 2.8 [percent] of the mouse genome," the researchers wrote.
Using a combination of Illumina paired-end, mate-pair, fosmid, and BAC-end sequencing along with Dovetail Genomics proximity ligation data, the investigators generated chromosome-scale assemblies for the 16 mouse strains. These assemblies ranged in size from 2.25 to 2.33 gigabases and had an estimated combined SNP and indel error rate of 0.09 to 0.1 errors per kilobase.
With the help of the Gencode annotation of the C57BL/6J reference and strain-specific RNA sequencing, they identified strain-specific consensus gene sets of more than 20,000 protein-coding genes and more than 18,000 non-coding genes. On average, 37 genes in the wild-derived strains and 22 in the classical strains were candidate novel loci.
Even though lab mice are highly inbred, the researchers unearthed regions harboring high levels of allelic variation. In particular, some 2,567 regions showed high diversity and were enriched for genes encoding proteins involved in immunity and defense. One such gene family is apolipoprotein L, in which variants are thought to provide resistance to Trypanosoma brucei, which causes sleeping sickness in humans. Another, NAIP, is associated with cell death upon Legionella pneumophila infection.
The researchers also examined differences across the strains, linking some to phenotypic variation. For instance, in the CAST/EiJ strain, the researchers identified 1,249 candidate olfactory receptor genes. Compared with the C57BL/6J reference, that strain lost 20 olfactory receptors but gained 37 other members of the gene family.
They also noticed differences in Raet/H60 haplotypes among the newly assembled strains. Raet and H60 are key ligands for NKG2D, an activating receptor found on natural killer cells and some T cells; these ligands are expressed on the surface of infected and tumor cells and may be involved in allograft and autoimmune responses. Among the eight Collaborative Cross founder strains, the researchers found six Raet/H60 haplotypes, while the wild-derived strains had three different haplotypes. The CAST/EiJ strain had a single Raet1 family member and no H60 alleles, while the classical NOD/ShiLtJ strain had four H60 and three Raet1 alleles.
The new genome assemblies also helped the researchers fill in lingering gaps in the C57BL/6J reference assembly. For example, they reported a novel 188-exon gene on chromosome 11, which they dubbed Efcab3-like because it extends the existing Efcab3 gene, a gene expressed in numerous tissues during development. Using CRISPR, they generated Efcab3-like mutant mice and found that the gene appears to play a role in regulating brain development. It also appears to be conserved across mammals, they added.