NEW YORK (GenomeWeb News) – In a pair of papers appearing in the early online edition of Nature Genetics this week, two research teams report new details on copy number variant (CNV) patterns in the human genome.
In the first of these papers, an international research team identified CNVs in dozens of individuals from three Asian populations. Using array comparative genomic hybridization and massively parallel sequencing, the researchers found nearly 5,200 CNV groups in the genomes of 30 individuals from Korea, China, and Japan — and among those more than 3,500 CNVs appear to be specific to Asian populations.
In the process, the team also came up with a strategy for finding absolute copy number values based on relative CNV data generated by array CGH and whole-genome sequence information.
"These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations," co-senior authors Charles Lee and Jeong-Sun Seo, based at Boston's Brigham and Women's Hospital and Seoul National University, respectively, and colleagues wrote, "and the new method for calling absolute CNVs will be essential for applying CNV data to personalized medicine."
In an effort to catalog CNVs in Asian populations, the researchers used array CGH with custom Agilent 24M arrays to assess genomic DNA from 10 Korean, 10 Japanese, and 10 Chinese individuals.
After filtering the CNV data and converting the relative calls to absolute copy numbers through comparisons with read depth data from sequenced human genomes (one European genome and two Korean genomes), the researchers detected almost 21,000 CNV segments.
These CNVs included 5,177 CNV element groups for which the researchers determined absolute copy number. Of these, 3,547 CNV elements have been detected only in Asian populations so far.
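The difference between relative and absolute copy number can be illustrated with a short Python sketch. It is not the method described in the paper. It assumes that array CGH reports a log2 ratio of the test sample to a reference sample, and that the reference sample's absolute copy number in a region can be estimated from sequencing read depth relative to a genome-wide diploid baseline. The function names and the depth values are hypothetical.

```python
from statistics import median


def reference_copy_number(region_depth, genome_depths, ploidy=2):
    """Estimate the reference sample's absolute copy number in a region.

    Assumption: the median depth across the genome corresponds to the
    normal diploid state (ploidy copies).
    """
    baseline = median(genome_depths)
    return ploidy * region_depth / baseline


def absolute_copy_number(log2_ratio, ref_copy_number):
    """Convert a relative array CGH log2 ratio (test vs. reference)
    into an integer copy number for the test sample."""
    estimate = ref_copy_number * 2 ** log2_ratio
    return max(0, round(estimate))


# Hypothetical read depths for windows across the reference genome.
genome_depths = [29, 30, 31, 30, 30]

# A region where the reference is diploid (depth 30): a log2 ratio of -1
# means the test sample has half as many copies, i.e. one copy.
ref_cn = reference_copy_number(30, genome_depths)
one_copy = absolute_copy_number(-1.0, ref_cn)

# A region where the reference itself carries four copies (depth 60):
# a log2 ratio of 0 looks like "no change" but means four copies.
ref_cn_dup = reference_copy_number(60, genome_depths)
four_copies = absolute_copy_number(0.0, ref_cn_dup)
```

The second case shows why relative data alone can mislead: a ratio indicating no difference from the reference says nothing about how many copies either individual actually carries.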
The team found that copy number gains tended to affect genes involved in processes such as nucleic acid metabolism and development, while losses frequently involved cell adhesion and other genes. Both gains and losses were associated with genes in immunity and sensory perception pathways.
In their follow-up experiments, the team used a custom CNV genotyping array to evaluate CNV patterns in 13 individuals from three generations of the same Asian family. They also compared their results with previously published CNV data generated using NimbleGen arrays.
"To more accurately apply CNV research to personalized medicine, copy number genotyping must not rely on relative copy number data, but should be able to identify the absolute copy number state in any given individual," the researchers concluded. "Our results also provide guidance for future studies in genomic medicine in the Asian population, especially those that identify ethnic differences in predisposition to disease and drug response."
Meanwhile, Wellcome Trust Sanger Institute researcher Matthew Hurles led a group of researchers who used high-throughput sequencing of DNA captured by targeted hybridization to pinpoint hundreds of CNV breakpoints in genomic DNA from three individuals.
"Mapping CNVs to base-pair resolution allows precise annotation of function, including whether each CNV overlaps functional sequences and the likely impact on those sequences," Hurles and his co-authors wrote.
"[B]ase-pair resolution enables the development of breakpoint-specific genotyping assays, which, by virtue of their qualitative nature, are likely to be more robust than quantitative assays for the same variants and thus more useful in locus-specific population surveys, such as association studies," they added.
For that study, researchers captured pooled DNA from three individual genomes using a custom NimbleGen 385k array targeting 1,785 CNVs larger than 400 base pairs.
After weeding out false positive CNVs and targeted regions not containing breakpoints, the researchers were left with 1,067 CNVs with breakpoints that could be sequenced using the Roche 454 FLX platform.
By mapping hundreds of thousands of sequence reads to the human genome, the team tracked down breakpoints for 205 CNVs. To home in on even more breakpoints, they developed a mapping method that relied on the BLAT alignment algorithm.
Together, the approaches yielded information on 324 CNV breakpoints, including 315 deletion breakpoints.
From there, the researchers began tracking down characteristic sequence signatures associated with the CNVs, identifying four different CNV breakpoint groups.
For example, the researchers found microhomology at most deletion breakpoints. And about a third of the deletion breakpoints contained inserted sequences ranging in size from one to 367 bases.
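Microhomology at a deletion breakpoint means that a few bases at one edge of the deleted segment are identical to the bases flanking the opposite edge, so the exact junction position is ambiguous. The Python sketch below shows one simple way to measure it once a breakpoint is known to base-pair resolution. It is an illustration under stated assumptions, not the study's pipeline: the function name, the coordinate convention (0-based, end-exclusive), and the example sequences are hypothetical.

```python
def microhomology_length(ref, start, end):
    """Count microhomology bases around a deletion of ref[start:end].

    Forward: bases at the start of the deleted segment that match the
    bases immediately after the deletion.
    Backward: bases at the end of the deleted segment that match the
    bases immediately before the deletion.
    Each direction is capped at the deletion length.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    ref = ref.upper()
    del_len = end - start

    forward = 0
    while (forward < del_len and end + forward < len(ref)
           and ref[start + forward] == ref[end + forward]):
        forward += 1

    backward = 0
    while (backward < del_len and start - 1 - backward >= 0
           and ref[start - 1 - backward] == ref[end - 1 - backward]):
        backward += 1

    return forward + backward


# Hypothetical reference: left flank TTGC, deleted segment CGTAAAAA,
# right flank CGTCC. The deleted segment starts with CGT and the right
# flank also starts with CGT, giving 3 bases of microhomology.
with_homology = microhomology_length("TTGCCGTAAAAACGTCC", 4, 12)

# Hypothetical reference with no shared bases at either edge.
without_homology = microhomology_length("AAAACCCCGGGG", 4, 8)
```

A breakpoint with inserted sequence would need a different treatment, since the junction then contains bases that are absent from the reference.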
Nearly five percent of the breakpoints they found were more complicated, leading the team to speculate that these may have arisen through a process such as replication-based strand switching.
The researchers cautioned that other mapping strategies may turn up even more breakpoint sequences and classes. And, they explained, even with the breakpoint data available now, some questions remain about the causes of some CNVs — particularly complicated rearrangements.
Even so, those involved in the study noted that the targeted approach appears to be a useful alternative to more expensive analyses of CNVs based on whole-genome resequencing data.
"As the technology matures, targeted resequencing could be the gold standard for validation in CNV studies," the team wrote.