NEW YORK (GenomeWeb) – A Canadian team has harnessed genealogical data from Quebec to retrace the history of a rare recessive disease called "chronic atrial and intestinal dysrhythmia" (CAID), using a computational approach for inferring rare allele transmission history.
Researchers from McGill University and elsewhere used their software package, known as ISGen, to analyze past transmission of CAID alleles with the help of high-quality genealogical data for more than 3.4 million individuals of European ancestry in the Canadian province. The approach traced the rare heart and digestive condition back to French settlers who arrived in the region in the early 17th century, the team reported yesterday in the American Journal of Human Genetics.
"[W]e have shown that inferring population-scale allele transmission histories is computationally feasible, even in genealogies containing millions of individuals," senior author Simon Gravel, a researcher with the McGill University and Genome Quebec Innovation Centre, and their colleagues wrote, noting that the ISGen software package used for the analysis is open source and freely available to other researchers.
Moreover, the ISGen analysis made it possible to take a glance forward as well as backward: by teasing out the estimated allele frequency for CAID-related variants in nearly two dozen regions in present-day Quebec, the team was able to predict which parts of the province may be most prone to new cases of the rare condition.
"The work presented here aims to provide a more accurate and rigorous statistical framework for generating regional estimates, and more generally performing inference in very large genealogies that are being generated on academic, private, and participatory platforms," the authors wrote.
The ISGen approach is built on backward-time Monte Carlo simulations of rare allele inheritance, the investigators explained. The simulations consider possible genotypes across a genealogical dataset in relation to allele inheritance patterns, ancestral allele frequency, and the observed genotypes of certain individuals within the genealogy.
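To give a rough sense of what a backward-time Monte Carlo simulation looks like, the Python sketch below traces a single copy of a rare allele from one known carrier up through a toy genealogy, picking one parent at random at each generation until it reaches a founder. It is an illustration only, not the ISGen implementation: the pedigree, the individual names, and the functions `climb_once` and `founder_probabilities` are all hypothetical, and the sketch ignores the ancestral allele frequency, the genotypes of other individuals, and any weighting scheme the authors may use.

```python
import random
from collections import Counter

# Hypothetical toy genealogy: child -> (parent, parent).
# Individuals with no entry are treated as founders.
PEDIGREE = {
    "proband": ("p1", "p2"),
    "p1": ("founderA", "founderB"),
    "p2": ("founderC", "p3"),
    "p3": ("founderA", "founderD"),
}


def climb_once(pedigree, carrier, rng):
    """Trace one allele copy backward in time from a carrier to a founder.

    Assumption: with no other information, either parent is equally
    likely to have transmitted the rare allele.
    """
    current = carrier
    while current in pedigree:
        current = rng.choice(pedigree[current])
    return current


def founder_probabilities(pedigree, carrier, n_sims=20000, seed=0):
    """Estimate how often each founder is the source of the carrier's allele."""
    rng = random.Random(seed)
    counts = Counter(climb_once(pedigree, carrier, rng) for _ in range(n_sims))
    return {founder: n / n_sims for founder, n in counts.items()}


if __name__ == "__main__":
    estimates = founder_probabilities(PEDIGREE, "proband")
    for founder, prob in sorted(estimates.items()):
        print(f"{founder}: {prob:.3f}")
```

In this toy pedigree, founderA can be reached along two lineages, so its exact probability is 0.25 + 0.125 = 0.375, and the simulated estimate should land close to that value. A real analysis would have to reconcile many observed carriers at once, which is where the more elaborate sampling described by the authors comes in.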
They applied the method to a database called the Balsac Population Register, which was established more than four decades ago at the University of Quebec at Chicoutimi, in partnership with McGill and other Quebec institutions. The register is used to interpret relationships and family structures across Quebec since the 17th century, drawing on digitized marriage certificates, birth and death records, and other vital event documents.
From some 3 million records, the team had access to genealogical data for as many as 17 generations and around 3.4 million individuals, including 2.7 million individuals with associated geographical information. The analysis also relied on genotyping data from a Quebec regional population sampling project and on Illumina array-based genotypes from 11 individuals with CAID and one individual carrying one copy of a CAID-causing mutation, found in the SGO1 gene.
With this approach, the researchers tracked the CAID allele back to two families containing as few as five founder individuals. Based on estimated CAID allele frequencies in the current Quebecois population, which varied by region, they predicted that as many as one in every 24,025 individuals in the province are asymptomatic carriers of the CAID mutation.
"By identifying regions with high predicted carrier rate, ISGen provides useful information for the most efficient extension of [rare genetic disease] screening programs," the authors wrote. "Where genealogies are available, the … sampling scheme presented here represents a simple way to estimate regional carrier rates, without going through the time- and resource-consuming process of recruiting and genotyping individuals in each region."