By Monica Heger
In one of the largest exome sequencing studies to date, Danish and Chinese researchers have sequenced the exomes of 200 individuals of Danish ancestry. The scale of the study allowed the team to uncover a somewhat surprising finding: an excess of low-frequency, non-synonymous variants relative to synonymous variants. If confirmed, the results, published this week in Nature Genetics, would lend support to the rare variant theory of disease.
The study is part of a larger collaboration between the University of Copenhagen and BGI (IS 8/31/2010). This particular project falls under the umbrella of LuCamp, a five-year project funded by the Lundbeck Foundation that will use exome sequencing to study metabolic diseases such as diabetes, hypertension, and obesity, sequencing a total of 2,000 exomes. The project will also sequence the human gut microbiome in an effort to characterize its effect on metabolic and cardiovascular health.
The Nature Genetics study represents the first results of the LuCamp project, and also acts as a proof of principle for whole-exome sequencing studies of that scale.
"This kind of study has never been done before," said Stephan Züchner, director of the Center for Human Molecular Genomics at the University of Miami. "It seems to be very good news for the rare variant hypothesis, which argues that there are many different rare changes in the population that confer risk for certain diseases."
The results imply that "we all have much less fitness than we may have thought — that we're carrying around more deleterious mutations than previously believed," added Rasmus Nielsen, an associate professor of evolutionary genomics at the University of California, Berkeley, as well as a biology professor at the University of Copenhagen and a senior author of the paper.
The sheer scale of the study is what enabled the researchers to identify the rare mutations, Nielsen said. Other exome studies typically do not include more than 20 to 30 individuals, making it hard to detect rare mutations with high confidence.
The group used NimbleGen's capture array and the Illumina Genome Analyzer. For each individual, the group achieved, on average, 14-fold coverage of 95 percent of the exome.
They detected 25,275 synonymous SNPs and 27,806 non-synonymous SNPs. Just over 40 percent of the SNPs were novel.
For rare mutations, defined here as those with a frequency between 2 percent and 5 percent, the researchers identified a 1.8-fold "excess" of deleterious, non-synonymous cSNPs over synonymous cSNPs.
"Although this excess is not incompatible with findings in previous studies, our study included a larger sample size and suggests that the excess of low frequency non-synonymous mutations predominantly comes from very rare mutations … and not from higher-frequency mutations," the authors wrote.
Because of their large sample size, the researchers were also able to characterize the impact of natural selection. To do this, they compared the distribution of allele frequencies among non-synonymous and synonymous SNPs. Synonymous SNPs followed an even distribution curve, which is what would be expected in the absence of natural selection, while non-synonymous SNPs showed a larger proportion of low-frequency alleles, suggesting that the alleles are likely deleterious and indicating a "strong purifying selection," the authors wrote.
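The comparison described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the bin edges, the function names, and the toy frequencies are all assumptions made for illustration, and the ratio computed here is only one simple way to express an excess in the lowest-frequency bin.

```python
import bisect

# Hypothetical minor allele frequency (MAF) bin edges: 2-5%, 5-10%, 10-20%, 20-50%.
BIN_EDGES = [0.02, 0.05, 0.10, 0.20, 0.50]


def frequency_spectrum(mafs, edges=BIN_EDGES):
    """Return the fraction of SNPs that fall in each MAF bin."""
    counts = [0] * (len(edges) - 1)
    for maf in mafs:
        if maf < edges[0] or maf > edges[-1]:
            continue  # skip SNPs outside the frequency range considered
        # bisect_right locates the bin; clamp so MAF == 0.50 lands in the last bin
        idx = min(bisect.bisect_right(edges, maf) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def low_frequency_excess(nonsyn_mafs, syn_mafs, edges=BIN_EDGES):
    """Ratio of the non-synonymous to the synonymous share in the lowest bin."""
    nonsyn = frequency_spectrum(nonsyn_mafs, edges)
    syn = frequency_spectrum(syn_mafs, edges)
    return nonsyn[0] / syn[0] if syn[0] else float("nan")


# Toy data, invented for this example; these are not values from the study.
syn_mafs = [0.03, 0.08, 0.12, 0.25, 0.40, 0.45, 0.15, 0.30]
nonsyn_mafs = [0.02, 0.03, 0.04, 0.03, 0.07, 0.12, 0.25, 0.40]

if __name__ == "__main__":
    print("synonymous:    ", frequency_spectrum(syn_mafs))
    print("non-synonymous:", frequency_spectrum(nonsyn_mafs))
    print("excess in lowest bin:", low_frequency_excess(nonsyn_mafs, syn_mafs))
```

In the toy data, half of the non-synonymous SNPs sit in the lowest bin compared with one-eighth of the synonymous SNPs. A skew of that kind, with non-synonymous variants concentrated at low frequency, is the pattern the article associates with purifying selection.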
"The fact that we find so many mutations at low frequency lends support to the idea that these rare mutations are important," Nielsen said. "It also suggests that there is more selection acting on the human genome than previously believed."
One potential limitation of the method is that the NimbleGen array tends to yield more variant calls than other platforms, Nielsen said. To reduce the number of false positives in their analysis, the researchers only included SNPs with a minor allele frequency greater than 2 percent.
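To give a sense of what that cutoff means at this sample size, the following Python sketch applies a 2 percent minor allele frequency filter to allele counts. It assumes autosomal sites genotyped in all 200 individuals (400 chromosomes); the function names are hypothetical, and the study's actual filtering procedure is not described here.

```python
from fractions import Fraction

# The 2 percent cutoff, held as an exact fraction to avoid floating-point edge cases.
MAF_THRESHOLD = Fraction(2, 100)


def minor_allele_frequency(alt_count, n_chromosomes):
    """Frequency of the rarer allele at a biallelic site, as an exact fraction."""
    minor_count = min(alt_count, n_chromosomes - alt_count)
    return Fraction(minor_count, n_chromosomes)


def passes_maf_filter(alt_count, n_individuals=200, threshold=MAF_THRESHOLD):
    """True if the site's minor allele frequency is strictly above the threshold."""
    n_chromosomes = 2 * n_individuals  # assumes diploid, fully genotyped sites
    return minor_allele_frequency(alt_count, n_chromosomes) > threshold


if __name__ == "__main__":
    for alt_count in (1, 8, 9, 200, 392):
        print(alt_count, passes_maf_filter(alt_count))
```

Under these assumptions, a variant has to appear on at least nine of the 400 chromosomes to pass, so alleles seen eight or fewer times are excluded.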
Züchner added that because of this, the researchers may have missed additional rare variants. "One would expect, from the number of low variants that they found, that there would be even more, less-frequent variants," he said.
He added that the 1000 Genomes Project, because of its scale, would eventually be able to provide support for this study.
The researchers will now try to validate their findings in case-control studies and "see whether these variations could explain any part of the 'missing heritability' in complex diseases, which is currently confusing the medical genetics field," Jun Wang, BGI's executive director and a senior author of the study, wrote in an e-mail.
He added that, for now, exome sequencing is preferable to whole-genome sequencing for large-scale studies "due to a significant lower cost, and because exonic variations are the most explainable genetic variations for now." Eventually, though, "whole-genome sequencing will be the final approach in medical genomics studies."
Additionally, as part of the LuCamp project, the team will conduct exome sequencing of cohorts with metabolic diseases such as obesity, diabetes, and hypertension, sequencing a total of 2,000 exomes, Nielsen said. All cohort members will be of Danish ancestry and will have well-characterized medical histories. The project also includes sequencing of the gut microbiome. All the sequencing will be done at BGI, and Nielsen said the researchers plan to continue using the NimbleGen capture array and the Illumina platform.
BGI is also working on a number of other large-scale sequencing projects, including the 1,000 Plant and Animal Reference Genomes, the 10,000 Microbial Genomes, and the 1,000 Plant Transcriptomes projects.
It is also collaborating with "clinicians and scientists worldwide to study various types of complex diseases and tumors," Wang added, including a recently announced collaboration with pharmaceutical company Merck (IS 9/21/2010). Exome sequencing will be the primary method used for the disease sequencing projects, but in some cases the researchers are using whole-genome sequencing, Wang said.