NEW YORK (GenomeWeb) – Copy numbers of certain elements in the human genome vary over wide ranges and exist in more states than can be accounted for with two-allele models. These multi-allelic copy number variation loci, or mCNVs, have proven challenging to study, and past work has yielded non-integer estimates of population-wide distributions of copy numbers.
A technical report published this week in Nature Genetics now describes a method to pinpoint the number of copy number states at mCNVs in a population using a next-generation sequencing-based computational approach validated with Bio-Rad's Droplet Digital PCR.
By analyzing 849 sequenced human genomes from the 1000 Genomes Project, the authors discovered that about one-third of the approximately 8,500 copy number variations in the human genome are multiallelic.
"mCNVs impact 231 genes, positively correlate with gene expression, and account for 88 percent of gene dosage variation between humans," Jen Berman, co-author of the study and a staff scientist at Bio-Rad Laboratories' Digital Biology Center, told GenomeWeb in an email.
The research was part of ongoing studies in Steve McCarroll's lab at the Broad Institute and Harvard Medical School previously described by GenomeWeb. It may prove relevant to understanding human diversity as well as the genetic underpinnings of certain CNV-related disorders. For example, the human amylase gene, or AMY, has five variants with similar sequences, which has made it a challenge to study. Low copy numbers of AMY1, which ranges between two and 14 copies, have recently been correlated with both increased BMI and fat mass in a study that also relied in part on ddPCR.
Other genes contained in mCNVs, such as HPR and ORM1, have disease associations, and thus properly documented mCNVs could enable studies of how these regions impact human phenotypes and disease, Berman said.
High copy numbers have been more difficult to measure genome-wide because intra-individual variability in DNA content at mCNVs is within the experimental noise of many approaches, according to the Nature Genetics study authors. As a result, mCNV copy numbers have previously been reported as continuously distributed.
To get at the bare-bones integers, the researchers adapted a Broad Institute-developed algorithm called Genome Structure in Populations, or Genome STRiP, to carefully normalized sequence measurements.
The authors found this approach accurate enough in preliminary studies to inspire a search of the entire human genome using overlapping windows of analysis to find regions where read depth was not unimodally distributed. Candidate CNVs were then mapped at higher resolution. "Such population-scale approaches became more powerful in the simultaneous analysis of many genomes," the authors claimed.
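The windowed search described above can be illustrated with a minimal Python sketch. It is not Genome STRiP and does not reproduce the authors' statistics: the function names, the bin layout, the assumption that the median sample carries two copies, and the 5 percent threshold are all hypothetical choices made for illustration. Each window's normalized read depth is converted to a rounded copy-number call per sample, and a window is flagged when more than one copy-number state is common in the population.

```python
from collections import Counter
from statistics import median


def call_copy_numbers(depths, reference_copies=2):
    """Round each sample's normalized depth to an integer copy number.

    Assumption: the median sample carries `reference_copies` copies.
    """
    per_copy = median(depths) / reference_copies
    if per_copy == 0:
        return [0] * len(depths)
    return [round(d / per_copy) for d in depths]


def is_multimodal(depths, min_fraction=0.05):
    """Flag a window when at least two copy-number states are each seen
    in at least `min_fraction` of samples (a hypothetical threshold)."""
    counts = Counter(call_copy_numbers(depths))
    n = len(depths)
    common_states = [cn for cn, c in counts.items() if c / n >= min_fraction]
    return len(common_states) > 1


def scan_windows(depth_matrix, window=3, step=1):
    """Slide overlapping windows over bins and return flagged (start, end) spans.

    depth_matrix[s][b] is the normalized read depth of sample s in bin b.
    """
    n_bins = len(depth_matrix[0])
    hits = []
    for start in range(0, n_bins - window + 1, step):
        # Average each sample's depth over the bins in this window.
        window_depths = [sum(row[start:start + window]) / window
                         for row in depth_matrix]
        if is_multimodal(window_depths):
            hits.append((start, start + window))
    return hits
```

Because the call for each window draws on every sample at once, a sketch like this also shows why analyzing many genomes together helps: rare copy-number states only stand out as a separate cluster when enough samples are present.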
After determining copy numbers, the authors attempted to validate them by generating a false discovery rate via an intensity rank-sum test using data from Illumina Omni 2.5 and Affymetrix 6.0 SNP arrays. They also compared results to a previously published array-based analysis of 995 CNVs, and showed 99.9 percent concordance.
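As a rough illustration of what a rank-sum test on array intensities involves, the following Python sketch compares probe intensities from samples called at a lower copy number with those called at a higher one. The article does not say how the authors derived their false discovery rate, so this is only a generic one-sided Mann-Whitney U test with a normal approximation and no tie correction; the function name and inputs are hypothetical.

```python
import math


def rank_sum_test(low_group, high_group):
    """One-sided Mann-Whitney U test (normal approximation, no tie correction).

    Tests whether intensities in `high_group` tend to exceed those in
    `low_group`. Returns (U, p_value).
    """
    n, m = len(low_group), len(high_group)
    if n == 0 or m == 0:
        raise ValueError("both groups must be non-empty")
    # U counts pairs where the high-copy sample has the larger intensity;
    # tied pairs contribute one half.
    u = 0.0
    for h in high_group:
        for l in low_group:
            if h > l:
                u += 1.0
            elif h == l:
                u += 0.5
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p_value
```

If sequencing-based calls are correct, samples called at the higher copy number should show consistently higher array intensity and a small p-value; calls that fail this check at many sites would point to false discoveries.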
This latter validation, however, was limited by the array study, which examined CNVs with copy numbers on the lower end of the spectrum.
To validate mCNVs with high copy number, the authors turned to droplet-based digital PCR. They selected 22 examples with high dynamic range, used ddPCR to type them in 90 HapMap samples, and again found 99.9 percent concordance.
"Differentiating consecutive high copy number states, [for example] six versus seven copies, requires high technical precision not possible with arrays or qPCR," Berman said. "The only two methods available to robustly call consecutive high copy number states are NGS with these new algorithms, and ddPCR."
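For readers unfamiliar with how digital PCR arrives at an integer copy number, here is a minimal Python sketch of the standard Poisson calculation. It is a generic illustration and not the study's protocol: the function names, the droplet counts in the usage note, and the assumption of a two-copy reference locus measured in the same set of droplets are all hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean template copies per droplet.

    A droplet is negative only if it received zero templates, so the
    fraction of negatives estimates exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1 - positive / total)


def estimate_copy_number(target_positive, reference_positive, total,
                         reference_copies=2):
    """Copy number of the target locus relative to a reference locus.

    Assumption: both assays are read from the same `total` droplets and
    the reference is present at `reference_copies` per genome.
    """
    target = copies_per_droplet(target_positive, total)
    reference = copies_per_droplet(reference_positive, total)
    return reference_copies * target / reference
```

With 15,000 droplets and 1,427 reference-positive droplets in this hypothetical setup, 3,888 target-positive droplets round to six copies and 4,430 round to seven, which shows how closely spaced the signals for consecutive high copy number states are.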
In terms of the added value of ddPCR, George Karlin-Neumann, director of scientific affairs at Bio-Rad's Digital Biology Center, noted in an email that NGS association studies of tens of thousands of patient samples would be costly and time consuming, adding, "Other methods previously used for association studies, [for example] qPCR and microarrays, have been indecisive and controversial in their conclusions due to lack of precision."
Berman said she believes ddPCR will continue to be a valuable tool for researchers interested in complementing their NGS studies of mCNVs with "an orthogonal, low-cost, ultra-precise technique like ddPCR." Researchers focusing on a single mCNV locus might also imagine high-throughput disease association studies that would be enabled by the company's newly launched automated droplet generator, Berman added. Droplet-based digital PCR is also yielding an increasing number of published studies, which may signal growing acceptance of this relatively new technology.