Large-scale Genome Study Catalogs Variation in Dutch Population

NEW YORK (GenomeWeb) – By sequencing individuals from hundreds of Dutch families, the Genome of the Netherlands (GoNL) Consortium has started unraveling genetic variation and haplotype patterns present in the Netherlands.

As they reported in Nature Genetics this weekend, the researchers did whole-genome sequencing on individuals from 250 families with Dutch ancestry as part of the GoNL project. The millions of SNPs and small insertions or deletions detected in these sequences provided fodder for haplotype mapping and offered a look at variants with possible disease roles or functional effects in the Dutch population.

The analysis also made it possible to see population structure and signs of past migrations in the Netherlands, the study authors explained, noting that the genetic data point to "multiple ancient migrations, consistent with historical changes in sea level and flooding."

Members of the GoNL consortium used Illumina's HiSeq 2000 instrument to do whole-genome sequencing on samples from 769 Dutch individuals, generating 13.3-fold coverage of each genome, on average. The study participants came from five biobanks in the Netherlands and included individuals from 231 parent-child trios and 19 quartets made up of parents and their identical or non-identical twin children.
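The family and sample counts reported here can be cross-checked with simple arithmetic. The short Python sketch below assumes each trio contributes three sequenced individuals and each quartet four; the variable names are illustrative and do not come from the study.

```python
# Counts as reported in the article.
TRIOS = 231      # parent-child trios: two parents plus one child
QUARTETS = 19    # quartets: two parents plus twin children

# Assumption: every member of every family was sequenced.
families = TRIOS + QUARTETS
individuals = TRIOS * 3 + QUARTETS * 4

print(f"families: {families}")
print(f"individuals: {individuals}")
```

Under that assumption, the totals agree with the 250 families and 769 individuals cited in the study.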

The team's analysis of these sequences uncovered 20.4 million SNPs, found at frequencies ranging from less than 0.5 percent to more than 5 percent, and 1.2 million indels.

Many of the more common variants appear to be shared with individuals from other European populations. For example, the team detected some 98 percent of European HapMap2 variants in the Dutch genomes and around 70 percent of variants that were identified in European samples sequenced for the first phase of the 1000 Genomes Project.

On the other hand, just over one-third of the rare variants identified for GoNL coincided with those described in the 1000 Genomes assessments of European individuals. Moreover, the genomes contained 7.6 million SNPs not previously included in the dbSNP variant database.

The average Dutch genome contained a few dozen rare SNPs, small indels, or larger deletions that were expected to interfere with the function of the resulting protein, the researchers reported, along with around 20 variants implicated in disease risk through past association studies.

By focusing on members of individual Dutch families, meanwhile, the team got a look at de novo mutations in children that weren't found in either of their parents, a set that includes more than 11,000 high-confidence de novo mutations so far. As in past studies, the likelihood of seeing de novo mutations in a child's genome appeared to ratchet up as his or her father's age at conception increased.

The genomes offered a broader look at the Dutch population, too. By looking at rare and common variant clustering and identity-by-descent patterns in individuals sampled in each of the Netherlands' 12 provinces, the study's authors saw signs of a genetic gradient representing movement of an ancestral population from the southern part of the country to the north. Though "different demographic scenarios remain plausible," they explained, "all support a model of substantial region migration."

Beyond the information already gleaned from the Dutch genomes, members of the GoNL consortium noted, the sequence data should also serve as a resource for imputing variants present in the genomes of other individuals from the Netherlands.

Indeed, their preliminary experiments using information from 81 Dutch individuals sequenced by Complete Genomics indicated that the existing GoNL panel can pick up a significant proportion of masked variants not directly tested by the Illumina Human 1M genotyping array.

"As long as the cost of genotyping continues to be competitive with whole-genome sequencing, imputation will remain important," the GoNL study authors concluded. "The consolidation of available whole-genome data sets into a single cosmopolitan panel, including low-frequency, structural, and other complex types of variation, should therefore be considered a top priority."