Sequence first, genotype second. That is the message of a new study published last week in the American Journal of Human Genetics.
Led by David Goldstein, director of the Duke University Center for Human Genome Variation, the scientists sequenced the complete genomes of 29 people of European origin to assess the relationship between the functional properties of variants and their population allele frequencies.
The Duke researchers determined that the most common genetic variants in the human genome aren't the ones most likely to cause disease and posited that rare genetic variants are, in fact, more likely to be linked to disease. The findings prompted Goldstein to argue that sequencing-based approaches, rather than array-based genome-wide association studies, are more likely to identify the rare variants behind common diseases.
"I am entirely convinced that sequencing, which is becoming less expensive every month, will unlock a lot of the causes of genetic disease," Goldstein said. "What we can do clinically with that information will become the primary challenge," he said. "It may take sequencing thousands of patient genomes to track down the responsible mutations, but they will be found."
Lead author Qianqian Zhu told BioArray News this week that the study's findings suggest it would be "more reasonable to sequence a group of cases and controls first to identify potential causal variants, then genotype a large cohort using arrays to verify those findings."
According to Zhu, disease-causing variants are likely to be rare and not likely to be seen in many control samples. She noted that "most" genotyping arrays contain common variants. Therefore, causal variants are not likely to be on current chips. "You have to know the variants first," said Zhu, a postdoc in Goldstein's lab.
The Duke researchers' recommended approach differs from others' opinions on how such studies should be conducted. For instance, Stephen Chanock, chief of the Laboratory of Translational Genomics at the National Cancer Institute, suggested in an interview with BioArray News last month that targeted, sequencing-based studies were more likely to follow the current round of array-based association studies (BAN 3/1/2011).
A common argument from array vendors is that sequencing is too expensive and analytically burdensome to be adopted en masse by researchers seeking to identify causal variants in a statistically significant cohort of cases. This has prompted some, like Illumina CEO Jay Flatley, to declare that the next round of studies will be based on new chips containing rare variant content from sources like the 1000 Genomes Project.
The Duke researchers have long argued that array-based approaches are limited because they do not represent rare variants well, and that many common diseases would require sequencing-based approaches. In a PLoS Biology paper last year, they warned that such array-based studies would "hit a wall" because of their inability to identify rare causal variants (BAN 2/2/2010). The same group made similar arguments in a paper in the New England Journal of Medicine in 2009 (BAN 4/21/2009).
"Because the first round of association studies didn't find many causal variants, we began to realize that the real causal variants may be rare," Zhu said this week. "Overall, this latest study just gives more genome-wide evidence that rare variants are more likely to be in functional regions of the genome, that they are more likely to do something," she said. "That is our major point."
Zhu recognized that efforts like the 1000 Genomes Project have greatly expanded the number of rare variants that can be made available to researchers via current array platforms. But there is still a risk that the causal variants of some diseases will not be found by the 1000 Genomes Project. She cited a recently identified rare variant that is very likely to be causal for sick sinus syndrome in Icelanders but was not in the 1000 Genomes database.
Despite the shortcomings of array-based approaches in identifying potential causal variants, Zhu acknowledged that sequencing is still expensive compared to arrays, and that for now initial sequencing studies have to be done in a "small set" of patients and controls. Because of this, she advocates following up on those initial findings with targeted, array-based genotyping in larger cohorts. "I think array technology still has value," she said.
As for what vendors hope will be a second round of association studies based on new chips with expanded content, Zhu said that these arrays may do a better job of uncovering disease-associated variants than first-generation GWAS. "For now, I think it's also a valid approach," she said.
Common Disease, Rare Variant
As Zhu and fellow researchers noted in the AJHG paper, the more common a variant is, the less likely it is to be found in a functional region of the genome.
"Scientists have reported this observation before, but this study is the most comprehensive effort to date using annotations of the functional regions of the human genome and fully sequenced genomes," Goldstein said in a statement.
Goldstein also said that "the magnitude of the effect is dramatic and is consistent across all frequencies of variants" his team looked at in the study.
"It's not just that the [rarest] variants are different from the most common, it's that at every increase in frequency, a variant is less and less likely to be found in a functional region of the DNA," Goldstein said. "This analysis is consistent with what appears to be a growing consensus that common variants are less important in common diseases than many had originally thought."
To learn which genetic variants were functional, the researchers looked at the protein-coding regions of genes and at functional regions that influence the expression of proteins. "We also asked whether we could identify any patterns by examining variants in which the derived form was the most common form," said Goldstein. "That is the unusual case, because when there is a mutation that changes an allele from the ancestral allele, this is usually the more rare form in the population," he said. "If you do have a mutation that is beneficial, then the variant can increase in frequency and become the common one."
The researchers observed that there was no connection between the frequency of these particular common variants in a population and whether they were in functional regions of the genome or not.
According to Goldstein, the interpretation is that whenever a variant does become common, it does so precisely because it has no impact. "The bottom line is that we see common variants, as a rule, being neutral and not having effects," Goldstein said in the statement.
"There are some big exceptions to this rule, in particular in the HLA gene region where selection for resistance to infectious disease has resulted in many common variants with major effects," he said. "But in general in the genome, we see this as very much the exception."