NEW YORK (GenomeWeb News) – An international group led by researchers in China, the US, and Denmark reported in the early, online edition of Nature Genetics yesterday that they have sequenced the exomes of hundreds of individuals from a European population and analyzed the low frequency genetic variants within these coding sequences.
Using targeted sequence capture and high-throughput sequencing technology, the team generated sequence data representing the coding portions of 200 Danish individuals' genomes. In the process, they found that low frequency genetic variants, those present in two to four percent of the population, were more likely to be non-synonymous than synonymous changes, particularly when they looked at sequence data for the X chromosome.
"Even more mutations are segregating at very low frequencies than what we might expect," co-corresponding author Rasmus Nielsen, an integrative biology and statistics researcher affiliated with the University of California at Berkeley and the University of Copenhagen, told GenomeWeb Daily News.
"There's an even stronger excess than what we expected of rare, non-synonymous mutations — mutations that change the protein," he added, explaining that such changes are less commonly detected amongst higher frequency variants because they've been culled from coding sequences through negative selection.
As part of a larger exome sequencing study looking at metabolic disease and diabetes, Nielsen and his colleagues did exome sequencing at intermediate coverage — a strategy intended to complement lower coverage, whole-genome sequence data generated by members of the 1000 Genomes Project and others, as well as high coverage sequence data from studies of individual genomes.
"With this intermediate design between low-pass population sequencing and deep individual sequencing, we aimed to derive a high-resolution allele frequency spectrum of [coding SNPs] … to characterize the distribution of allele frequencies in a human population and to use this distribution to make inferences about the effect of natural selection in the human genome," Nielsen and his co-authors wrote.
The team used NimbleGen 2.1M exon capture arrays to collect around 34 million bases of DNA from each of 200 Danish individuals — corresponding to 18,654 coding sequences, along with bits of related untranslated and intronic sequence.
Collaborators at BGI-Shenzhen sequenced the exomes to about 12 times coverage, on average, using the Illumina Genome Analyzer II.
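For a rough sense of scale, the capture size and mean coverage quoted above can be multiplied out. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes every sample reached exactly the reported average of 12 times coverage over the full 34 million bases, ignoring off-target and duplicate reads.

```python
# Figures reported in the article
target_bases = 34_000_000   # ~34 million captured bases per individual
mean_coverage = 12          # ~12x average sequencing depth
individuals = 200           # Danish individuals sequenced

# Assumption: uniform 12x depth across the whole target in every sample
bases_per_sample = target_bases * mean_coverage
bases_total = bases_per_sample * individuals

print(f"On-target bases per sample: ~{bases_per_sample / 1e6:.0f} million")
print(f"On-target bases across the cohort: ~{bases_total / 1e9:.1f} billion")
```

Under those assumptions, the study would amount to roughly 408 million on-target bases per person and about 81.6 billion across all 200 exomes.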
When they began parsing this exome sequence data, the team found 121,870 SNPs, including nearly 54,000 previously unreported variants.
Of the 53,081 coding SNPs they detected, 25,275 were synonymous, meaning they do not change the encoded amino acid, while 27,806 were non-synonymous.
Among the low frequency alleles, those found in two to four percent of the population, the excess of non-synonymous changes was even greater: they showed up 1.8 times as often as synonymous changes.
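The reported counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up, and the conversion from a two to four percent allele frequency into chromosome copies assumes 200 diploid individuals sampled at an autosomal site, which the article does not spell out (the X chromosome would differ, since males carry a single copy).

```python
# Counts reported in the article
synonymous = 25_275
non_synonymous = 27_806
coding_total = 53_081

# The two classes should add up to the reported total of coding SNPs
assert synonymous + non_synonymous == coding_total

# Overall non-synonymous to synonymous ratio across all coding SNPs
overall_ratio = non_synonymous / synonymous

# Ratio reported for alleles at two to four percent frequency
low_freq_ratio = 1.8

# Convert each ratio into the share of SNPs that are non-synonymous
overall_share = non_synonymous / coding_total
low_freq_share = low_freq_ratio / (1 + low_freq_ratio)

# Assumption: 200 diploid individuals, so 400 chromosomes at an autosomal site
chromosomes = 2 * 200
min_copies = chromosomes * 2 // 100   # copies at 2 percent frequency
max_copies = chromosomes * 4 // 100   # copies at 4 percent frequency

print(f"Overall ratio: {overall_ratio:.2f} ({overall_share:.0%} non-synonymous)")
print(f"Low frequency ratio: {low_freq_ratio:.2f} ({low_freq_share:.0%} non-synonymous)")
print(f"2-4% frequency is about {min_copies} to {max_copies} copies out of {chromosomes}")
```

In other words, non-synonymous changes make up a little over half of all coding SNPs (a ratio of about 1.1), but close to two-thirds of those in the low frequency bin.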
This pattern likely reflects the fact that non-synonymous mutations have been largely weeded out of coding sequences through purifying selection, Nielsen explained, but are still found at low levels in the population, showing up more and more often as allele frequency declines. This was especially true of mutations expected to have the largest effects on amino acid sequence, he added.
The researchers' findings also indicate that the difference in allele frequency between non-synonymous and synonymous mutations is more pronounced among the low frequency variants found on the X chromosome, consistent with recessive inheritance of these alleles.
"There's a bigger difference in allele frequency on the X chromosome than on the autosomes," Nielsen said. "So we see a much stronger effect of selection on the X chromosome than what you see in the autosomes — that's our hypothesis for how we can explain this difference in allele frequencies."
The team is continuing to do exome sequencing within the Danish population and is starting to look at even larger datasets, Nielsen noted. They plan to generate exome data on some 2,000 individuals within this population, he said, which can be integrated with information on genetic variation patterns being detected by other research groups.
"Future analyses of non-coding regions and ethnically diverse samples will help build a complete picture of human genomic variation and an understanding of the interaction between genetic drift, mutation, recombination, and selection in the human genome," the team concluded.