NEW YORK – New research highlights the additional insights into germline de novo mutations (DNMs) that can be gained by adding long-read sequence data from families to the analysis.
"[W]e investigate and quantify the difference in DNM detection that can be reliably identified between short- and long-read data as well as the effect of a more complete reference genome for variant discovery," senior author Evan Eichler, a genome sciences researcher at the University of Washington School of Medicine, and his colleagues explained. "The use of multiple orthogonal sequencing technologies allows all events to be validated, producing a rigorous truth set with the potential to improve DNM detection and estimates of DNM rates."
As they reported in the American Journal of Human Genetics on Monday, the researchers used Illumina, Oxford Nanopore, Pacific Biosciences, Strand-seq, Bionano Genomics optical mapping, and 10X Genomics technologies or approaches for deep genome sequencing of blood samples from four members of a family affected by autism spectrum disorder (ASD) and enrolled in the Simons Simplex Collection. They analyzed the data using a more complete human genome reference assembly known as T2T-CHM13.
By bringing in the long-read sequence data, the team tracked down and verified 20 percent more DNMs than were found with short-read sequence data alone. The set included 171 de novo single-base changes and small insertions and deletions (indels) found in the female sibling who had been diagnosed with ASD and in her nonidentical twin brother.
"[W]hile the overall numbers appear similar, the long-read data are discovering a new subset of DNMs traditionally excluded or filtered by the short-read data," the authors reported, noting that the work "widens the gap in the number of DNMs present in the proband and sibling."
With the help of the more complete reference genome sequence data from T2T-CHM13, meanwhile, the team identified 195 de novo indel or single base changes in the ASD-affected and unaffected siblings, including variants that were specific to each child. The analyses highlighted 88 DNMs in the child with ASD, for example, compared to the 107 DNMs detected in her unaffected brother's germline.
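As a quick arithmetic check on the figures above, the short Python sketch below confirms that the two per-child counts add up to the reported total and shows each child's share of it. It assumes the two counts do not overlap, and the variable names are illustrative rather than taken from the study.

```python
# Counts as reported in the article for the T2T-CHM13-based analysis.
proband_dnms = 88      # child with ASD
sibling_dnms = 107     # unaffected brother
reported_total = 195   # de novo indels or single base changes across both siblings

# Assumption: the per-child counts are non-overlapping, so they simply sum.
total = proband_dnms + sibling_dnms
assert total == reported_total

# Share of the combined DNM set carried by each child.
proband_share = proband_dnms / total
sibling_share = sibling_dnms / total

print(f"total={total}, proband={proband_share:.1%}, sibling={sibling_share:.1%}")
```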
The analyses also revealed nearly two dozen possible mosaic mutations that appear to have arisen post-zygotically, including 20 mosaic DNMs that could be traced back to either maternal or paternal haplotypes.
Germline DNMs appeared to be overrepresented on the paternal haplotype, with 2.59 paternal DNMs for every maternal DNM. For the apparent post-zygotic mosaic changes, by contrast, the paternal-to-maternal ratio was far more balanced, at 0.66-to-one.
"Although the mosaic sample is small, this observation is consistent with the expectation that there is no parent-of-origin bias for post-zygotic mutations," they suggested.
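One way to read the paternal-to-maternal ratios reported above is as the fraction of phased DNMs that fall on the paternal haplotype. The sketch below does that conversion in Python. The function name is hypothetical, and only the two ratios come from the article; the underlying per-parent counts are not given there and are not assumed here.

```python
def paternal_share(paternal_to_maternal_ratio: float) -> float:
    """Convert a paternal:maternal ratio r (that is, r-to-one) into the
    fraction of DNMs assigned to the paternal haplotype, r / (1 + r)."""
    return paternal_to_maternal_ratio / (1.0 + paternal_to_maternal_ratio)

# Ratios as reported in the article.
germline_share = paternal_share(2.59)  # germline DNMs
mosaic_share = paternal_share(0.66)    # apparent post-zygotic mosaic DNMs

print(f"germline paternal share: {germline_share:.1%}")
print(f"mosaic paternal share:   {mosaic_share:.1%}")
```

A share of exactly 50 percent would correspond to a one-to-one ratio, which is the expectation the authors describe for mutations with no parent-of-origin bias.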
Finally, the team used the new sequence data to analyze DNMs falling in tandem repeat portions of the genome, flagging half a dozen notable de novo indels in tandem repeat sequences. The set included two tandem repeat structural variants, while a broader structural variant analysis identified more than 200 de novo structural variants in the ASD-affected proband, the unaffected child, or both.
"Despite their importance in neurodevelopmental disease, this more comprehensive DNM analysis did not reveal any new candidate mutations to better explain the proband's autism status," the authors noted, suggesting that the case may involve inherited or polygenic variants, or de novo variants that are still being missed despite improved detection.
"An important aspect of this work was that all DNM candidates were obtained from primary tissue, in this case peripheral lymphocytes from blood," they cautioned, adding that "apparent de novo [structural variants] were initially identified with other technologies including Strand-seq, 10X Genomics, and Bionano Genomics where lymphoblastoid cell culture instead of primary blood were used to obtain larger amounts of DNA or actively dividing cells for the assay."