NEW YORK – Researchers from the University of Washington, the University of Utah, Pacific Biosciences, and elsewhere have tallied de novo mutations in the human genome across four generations, revealing an overall higher rate than previously estimated and relatively rapid change in specific parts of the genome.
"Our multiplatform and multigenerational, assembly-based approach provides access to some of the most repetitive regions, such as centromeres and heterochromatic regions on the Y chromosome," senior and corresponding author Evan Eichler, a researcher at the University of Washington, and his colleagues wrote in a paper published in Nature on Wednesday, noting that the "use of parental references in addition to the standard references and the ability to confirm transmissions across subsequent generations improves both sensitivity and specificity."
For their study, the investigators combined short-read whole-genome sequencing data from Illumina and Element Biosciences platforms with long-read and ultralong-read data generated with PacBio and Oxford Nanopore Technologies sequencers as well as single-cell Strand-seq data to put together phased genome assemblies for 28 individuals from four generations of a Utah family, collected by the Centre d’Etude du Polymorphisme Humain (CEPH) consortium.
With these data, the team characterized de novo mutations in each generation, ranging from single-nucleotide variants (SNVs) and small insertions and deletions (indels) to structural variants, across the entire genome, Eichler explained in an email. He noted that the work builds on prior parent-child trio studies that focused on de novo mutations in parts of the genome that are reliably mapped using short-read sequencing.
"The [long-read sequence] data allowed us to assemble entire chromosomes including complex regions and study them across the generations," he noted. "The use of multiple generations and multiple sequencing technologies essentially allowed [us] to validate almost all de novo mutations."
The multigenerational sequence data also made it possible to assemble 288 centromere sequences and half a dozen Y chromosomes, the researchers reported. They found that the Y chromosome appeared to be subject to 12.4 de novo mutations per generation.
On average, they saw around 150 de novo mutation events per generation, Eichler said, up from the 90 to 100 previously reported based on short-read sequence data alone. But these new variants were not uniformly dispersed across the genome. Instead, their occurrence differed by more than 10-fold depending on the part of the genome.
New mutations were particularly common in short tandem repeats or variable-number tandem repeat sequences, where the team unearthed 32 recurrent mutation sites. Intergenerational, recurrent de novo mutation events turned up at 27 of those loci, while the remaining five sites showed intragenerational recurrence.
"The most repetitive portions, like tandem repeats on the Y chromosome or the centromeres, are the most mutable regions," Eichler said, noting that an estimated three to five new structural variations turned up per transmission in the genome's repeat sequences.
Although paternal bias was prevalent among the de novo mutations, the team found that 16 percent of the SNVs identified stemmed from germline mosaic mutations or other postzygotic changes that were not overrepresented in paternal sequences.
When they compared de novo mutation patterns to a high-resolution map of recombination events that was based on the family's sequence data, the investigators found that the presence of de novo structural variants did not appear to coincide with sites where meiotic crossover tends to occur.
Together, their analyses shed light on the probability of new mutations arising and where in the genome they are most likely to occur, Eichler explained, which helps in understanding not only evolutionary processes, but also disease risk variants and biological processes such as chromosome nondisjunction.
The extensive sequencing dataset from the multigenerational pedigree also provides an opportunity to benchmark new sequencing technologies, since most family members have consented to having their genetic data and DNA samples released.
To expand on their findings, the researchers are interested in doing similar analyses on other families. "We know that there are differences between families, so we are keen to analyze more families in the future using the same or improved strategies," Eichler said. They are also attempting to assay very repetitive sequences found on the short arms of acrocentric chromosomes that could not be assessed in the current study, he added.