NEW YORK (GenomeWeb News) – Common copy number variants alone cannot explain the missing heritability in the human genome, according to a new study that mapped and characterized nearly all common CNVs in the human genome.
In a paper appearing online in Nature today, an international team led by investigators at Wellcome Trust Sanger Institute used tiling oligonucleotide arrays to uncover nearly 12,000 common CNV candidates from dozens of HapMap samples. From there, they developed arrays to assess CNVs in hundreds of other individuals from three ancestral backgrounds, generating genotype information for roughly 5,000 of the CNVs.
In the discovery phase of the study, the researchers used tiling oligonucleotide arrays to detect CNVs that were 500 bases or larger in the genomes of 40 HapMap individuals of European and West African ancestry and a European reference sample.
In so doing, the team identified a median of 1,117 CNVs in the European ancestry genomes and 1,488 CNVs in the Yoruban genomes. After bringing together data on all 41 individuals, the researchers noted, they found 11,700 potential CNVs. Of these, just under half were present in any given individual tested.
The team then worked with the Wellcome Trust Case Control Consortium to design an Agilent CNV-array for typing the variable regions of the genome. Using this array, they assessed CNVs in 450 HapMap individuals from three ancestral groups, identifying 4,978 CNVs for which they could generate high confidence genotype information.
Genotyping the CNVs should help other members of the research community better assess relationships between CNVs and other types of genetic variation, senior author Matthew Hurles, a researcher with the Wellcome Trust Sanger Institute, told GenomeWeb Daily News. And, he added, the resource should also aid in interrogating CNVs in disease association studies.
The CNV data is being made available through the researchers' web sites and through files accompanying the publication on Nature's website, Hurles noted. The team is also working with up-and-coming CNV databases to include this data.
Overall, the team believes they identified about 80 percent to 90 percent of common human CNVs larger than about 1,000 bases.
By looking at how and where CNVs form — as well as CNV population genetics — the researchers also came up with several conclusions about CNV biology. For instance, they identified dispersed duplications that appear to play a larger role in CNV formation than previously believed.
In addition, their results allowed them to estimate the mutation rate leading to common CNVs, Hurles noted, suggesting approximately one in 15 children has CNVs not carried by either of their parents.
Finally, the common CNV patterns suggest many of these variants are under strong negative selection in the genome, Hurles said, although certain CNVs in non-gene-coding regions vary by population and may be under positive selection.
The team also concluded that common CNVs are not a major contributor to the complex disease and trait heritability not explained by genome-wide association studies. On the other hand, rare variants, including rare CNVs, may account for some of this missing heritability.
"Rarer variants seem like a productive way forward for us," Hurles said.
The team plans to interrogate even smaller variants in the genome down the road, though Hurles noted that identifying small and/or rare variants will likely require sequencing.
Meanwhile, in a review article in the same issue of Nature, an international group of genomics researchers discussed missing heritability in the genome more broadly, describing potential sources of complex disease heritability and methods for assessing them.
Although genome-wide association studies have so far turned up a slew of disease-associated variants, they noted, most explain only a fraction of the complex disease heritability believed to exist.
The group discussed a range of related issues, including strategies for doing future GWAS as well as the role of rare and structural variants and environmental factors in complex diseases.
For her part, lead author Teri Manolio, director of the National Human Genome Research Institute's office of population genomics, told GenomeWeb Daily News that she believes researchers need to use all of the genetic approaches at their disposal — from GWAS of common variants to sequencing studies aimed at identifying rare variants — to understand complex diseases and their heritability.
"GWAS were initially designed to focus on the higher end of the frequency-effect size spectrum, so much work remains to be done, both in finding other variants in the lower frequency and larger effect domains … and in understanding their functional and pathophysiological properties," Manolio and her co-authors wrote. "The modest size of genetic effects detected so far … suggests that complex diseases will require substantially greater research effort to detect additional genetic influences."