Over the past year, a slew of papers emerged linking copy number variation to a host of diseases. It's no wonder that studies linking insertions and deletions to disease etiology continue to multiply. New systems biology tools, from high-density SNP and CNV chips to next-gen sequencing, are moving scientists not only into genome-wide CNV analysis but also into CNV association studies that probe how previously undetectable CNVs might contribute to disease susceptibility.
"In 2008 there was a lot of progress finding CNVs in neuropsychiatric disorders," says Steve Scherer at the Center for Applied Genomics in Toronto, which hosts the Database for Genomic Variants. He led work in early 2008 that discovered a CNV at chromosome 16p11.2 to be associated with autism. While last year was a "watershed year" for schizophrenia, he says, more and more studies show that CNVs are involved in diseases as varied as cancer, macular degeneration, obesity, and others. "People are trying now to sort out what all this means. That will require looking at more samples and doing more phenotyping — looking at families and things like that," Scherer says.
In the lab
Variants that arise de novo, as opposed to inherited ones, made headlines recently in linkage studies involving schizophrenia, with a series of papers published in Nature showing that rare, large CNVs are implicated in the disease. The studies were the first to take a genome-wide look at CNVs for schizophrenia. In one, led by the International Schizophrenia Consortium, scientists used Affymetrix's Genome-Wide Human SNP 5.0 and 6.0 arrays to scan for SNPs and CNVs across the genomes of almost 3,500 patients and reported large deletions on chromosomes 15q13.3 and 1q21.1. A study from Decode Genetics also looked genome-wide across thousands of samples, and genotyping was performed using Illumina's HumanHap300 and HumanCNV370 chips to reveal three deletions at 15q13.3, 1q21.1, and 15q11.2. The first study also found that people with schizophrenia have more rare copy number variants, both duplications and deletions, than people without the disease.
The move to a genome-wide perspective is not confined to one disease; it is now a priority for the Wellcome Trust Case Control Consortium, a collaboration of 50 research groups across the UK set up in 2005 to perform GWAS across a spectrum of diseases, including hypertension, type 1 diabetes, multiple sclerosis, and others. It began phase two of the GWAS portion last April and is currently carrying out CNV association studies on the nearly 20,000 samples tested in the phase one GWAS. In phase two, a further round of GWAS will look at SNPs and CNVs in almost 30 studies spanning 120,000 samples and 13 diseases.
CNV discovery has been given a boost by new, high-resolution arrays on the market. In fact, both Illumina and Agilent have recently launched high-resolution arrays for CNV discovery. Agilent launched its series of SurePrint G3 microarrays for CGH/CNV, which come in four standard formats: a single million-feature array per slide, a two array by 400,000 feature format, a four array by 180,000 feature format, and an eight array by 60,000 feature format. The company also offers a catalog CGH microarray for each format as well as custom formats. The goal of the single million-feature array is "to offer high-density coverage across the entire genome" for high-resolution CNV discovery, says Dione Bailey, Agilent's product manager of CGH and CNV microarrays. "The goal is to provide an unbiased, whole-genome coverage to allow customers to do CNV discovery and get a good idea what the CNVs are, what size they are, [and] where they are in the genome."
Agilent also introduced the 2x400K CNV catalog array, which is designed to cover the known CNV regions from the Center for Applied Genomics' Database of Genomic Variants. The array offers high-density coverage of coding and non-coding regions, and is good for genome-wide, high-resolution association studies as well as for discovery. "Many customers are interested [in] looking at those particular sets of CNVs to understand the full spectrum of genetic variation across their particular cohort, for example, as a first pass before [they] can start to understand the significance of these CNVs in certain diseases," Bailey says.
Illumina's Infinium HD Human660W-Quad BeadChip features 2.6 million genetic markers and targets more than 5,000 CNV regions in the human genome. It's based on the work of Scherer, the Sanger Institute, and Harvard Medical School/Brigham and Women's Hospital. The chip covers all the known common CNVs, so it could be used to screen for variants and to perform GWAS. "The idea is to incorporate these into association studies to see if they actually do lend susceptibility to various diseases," says Illumina's Dan Peiffer, product manager for the Infinium Genotyping Arrays. "The reason why we think [the chip] is so valuable is the regions aren't available on any other array right now, they're common ... [and] the regions are very small."
While CNV discovery still relies primarily on microarray-based approaches, Scherer foresees a range of technologies being applied, from next-gen sequencing to qPCR. One challenge to using the chips for discovery is optimizing the calling algorithms. "We're still learning how to use these algorithms. I saw a lot of progress in the past year, but there's still quite a way to go," Scherer says. He adds that he will often run the same sample on different platforms and come up with different CNV calls. Issues include the actual probe design, probe quality, and background noise. "It's an imperfect science, that's for sure, but it's gotten a lot better over the last year," he says.
In the clinic
Clinical geneticists have known for a long time that structural changes in chromosomes play a role in disease, but the trick has been identifying them. While karyotyping and PCR are still mainstays in the cytogeneticist's toolset, array-based comparative genomic hybridization is quickly claiming a notch on the belt, thanks to improved high-resolution arrays.
"Over the past five years, especially over the past year, we're getting a lot more copy number variation data in normal individuals at a higher resolution and smaller CNVs," says Charles Lee, a cytogeneticist at Brigham and Women's Hospital and Harvard Medical School. "And that information is critical for anyone doing clinical genetic diagnostics ¬using array-based methods." Lee sees more and more clinical geneticists adopting array CGH for diagnostic purposes "because array CGH provides a better way to look for gains and losses that are pathogenic in an unbiased manner."
According to Ulrich Broeckel at the Medical College of Wisconsin, "What really took this field to the next level are these technological developments and new platforms to analyze much more comprehensively compared to how we did the science just a few years ago."
It's fair to say that cytogenetics has embraced arrays and genotyping chips to do genome-wide scans for CNVs, and many vendors have upgraded their array CGH chips to accommodate the increased interest from clinical diagnostics labs. These higher-resolution chips afford the ability to detect smaller rearrangements and improve the detection rate of clinically significant, or pathological, CNVs. "Where copy number variation has more of a research tone, cytogenetics has much more of a clinical tone," Illumina's Peiffer says. "We really see cytogenetics and copy number variation as two different markets right now; they're kind of starting to converge a little bit, but they're still pretty separate."
Recently, Illumina launched its Infinium HD HumanCytoSNP-12 DNA Analysis BeadChip, which contains nearly 300,000 genetic markers that target all known cytogenetic abnormalities associated with mental retardation, autism, and other disorders. It can screen for disease SNPs, analyze structural variation, and identify copy-neutral loss of heterozygosity events such as uniparental disomy, which are undetectable on current array CGH products. "It's really designed to be an entry-level genotyping chip that provides fairly good coverage of the Caucasian and Asian populations for common SNPs," Peiffer says. "We see a lot of our customers doing their association studies for the first time using this product. At the same time it also provides essentially a picket fence across the genome for cytogenetic screening."
A major challenge for the field is distinguishing between CNVs that are pathogenic, those that are benign, and those that may be pathogenic, which Lee calls CNVs "of unknown clinical significance." Accumulating new data on these imbalances as well as running genome-wide association studies will help to sort out what it all means, in terms of disease etiology and susceptibility. "As we accumulate more CNV knowledge, how is that helping in diagnostics? Basically it's interpretation," Lee says.
Broeckel thinks that all the new CNV information will help cytogeneticists better diagnose syndromes for which a duplication or deletion is known but where the patient might not show all the symptoms; CNVs may also help in cases where the symptom or disease is unknown. For "patients where we don't really know what's going on," Broeckel says, "this [could] become part of a general diagnostic workup." To that end, he's using Affy's 6.0 chip to create a CLIA-certified diagnostic. "I see a tremendous growth potential here," he says. "I think we're really just scratching on the surface," adding that rare diseases and cancer will be next in line for clinical CNV testing.
Navigenics' Dietrich Stephan thinks it's early days for spotting clinically relevant CNVs on chips designed for disease susceptibility testing. Navigenics, like its competitors 23andMe and Decode, tests for common SNPs and bases those susceptibility ratings on SNP association studies. CSO Stephan says that while measuring copy number will be critical in disease testing, "there aren't that many copy number correlations out there yet and the ones that are there are either in cancers — so somatic tissues — or they are things you can't do anything about yet, like autism and schizophrenia." He also thinks that the technology needs to improve to accommodate the higher resolution needed to find what ultimately comprise the majority of CNVs in the human genome. "With only 2 million probes on the array, you don't have the density to find these very small copy number variants that we're seeing emerge now," Stephan says. Instead of chips, he predicts PCR and sequencing will be used to probe for those changes in a clinical setting.
Some of the most promising work to emerge this year has been associating CNVs with disease, and this has been helped along by the high-density chips that cover common CNVs. "The idea is to have very good and high-density probe coverage for all of the common CNVs that are annotated in the genome, so you could try and do genome-wide association studies using CNVs instead of SNPs," Scherer says.
As more CNV data comes in, especially high-resolution and population-specific data, Harvard's Lee says that people will really start to see the potential of copy number variants as risk factors for common diseases. "For these genome-wide association studies, I know a lot of groups that are incorporating analysis of copy number variation. So it's not just SNPs, but also structural variants, copy number variants in their GWAS studies," Lee says. In the last couple of years, he adds, researchers have found 12 different associations for specific CNVs with increased susceptibility to certain diseases, "and I think we're going to see more of that coming out. These benign CNVs could have more subtle effects, and one of the subtle effects could be increased susceptibility to a certain disease."
Lee also sees studies combining SNPs and CNVs, so the technology must advance to accommodate that. "At this moment, if people want to get the most comprehensive amount of information for genetic variation, I tell them the best way is to run these two arrays, the SNP array and the CNV array, and then combine the data. It's expensive, you have to do double the amount of work, and it takes twice the amount of time — but you get so much more data."
Many also see a move from looking at de novo to familial, or inherited, CNVs. "The vast majority of risk is going to be heritable," says Navigenics' Stephan. The first thing Lee does when trying to determine whether a CNV in a patient is pathogenic is to determine if it's a new mutation. But inherited mutations play a role, too, and Lee expects there will be more and more studies linking these to disease susceptibility. "In certain neuropsychiatric disorders, there may be a higher burden of these rare familial copy number variants. It doesn't mean that if you have it, you're going to get the disorder, but if you have this rare variant, you're at increased risk for it."
Stephan believes upcoming research will focus more on smaller, rare variants associated with common diseases. "I think that you're going to find these types of variants attributed to all kinds of different diseases," he says. "I think the trend is going to be using much more high-density arrays to find those tiny copy number variants and then apply those arrays to every common disease."
While most people are still looking for unbalanced changes like CNVs, the horizon is widening. Because none of the arrays out there can find balanced changes like inversions, targeted approaches such as next-gen sequencing have to be used. Scherer thinks the field will see more of that as sequencing applications advance. "They're not as abundant as CNVs and I expect most often will not be as damaging, but they will be important to study," he says.