Counting on Copy Number


Copy number variation broke onto the scene four years ago, when what was once thought to be an occasional genomic glitch turned out to be an incredibly common mechanism of DNA variation.

Charles Lee, an associate professor in the pathology department at Harvard Medical School, worked on one of the earliest studies revealing this phenomenon. A research project involving 39 healthy control patients showed a significant number of "gains and losses which we weren't expecting to find," Lee says. "We thought it was artifacts, [but] when we started validating them, they weren't artifacts. They were true gains and losses."

The real surprise was how much of an impact these variant regions were having. "What we did not anticipate is that copy number variations are so abundant that they affect [a] greater number of bases than single nucleotide polymorphisms do," says Victor Guryev, a member of Edwin Cuppen's lab at the Netherlands Institute for Developmental Biology. Today, scientists posit that variation in copy number has even more of a role in disease association than SNPs do.

Basic biology

Much of the research into CNVs in these early days is simply geared toward understanding what these elements are doing, how they came to be, and how it's possible for organisms to have such different copy numbers and still appear the same.

Matthias Platzer at the Leibniz Institute for Age Research and Fritz Lipmann Institute focuses on the 350-kb region at 8p23.1 in the human genome, a cluster of well-researched defensin genes that shows an extraordinary range of copy number variation. "The range of variation is really huge, from two copies as a normal diploid genome to up to 12 to 14 copies of that entire region," he says. The question is, how can people vary by so many bases of sequence and still show no difference in phenotype?

An answer may lie in understanding how copy number variation happens, but that's elusive at this point, says Guryev. "Finding the molecular mechanisms responsible for CNV formation remains [a] challenge. Change in copy number is caused in various ways, such as non-allelic homologous recombination, non-homologous end joining, instability of tandem repeats, or transposition of mobile elements," he notes. "We still do not have a complete overview on how these mechanisms contribute to the diversity of structural genome alterations."

Some light has been shed on the issue by research such as Noah Rosenberg's, which focuses on CNV in human populations. Rosenberg, an assistant professor at the University of Michigan, has been studying populations globally to determine patterns of variation. "For the most part, the pattern of copy number variation in worldwide populations matches what we expect in terms of SNPs and microsatellites," he says. "That's telling us the history of copy number variants largely matches the human history as a whole."

Rosenberg's work also demonstrated evidence of natural selection at work on these variants. "We noticed that many of the copy number variants were rare," he says, indicating that "there's some negative selection operating against at least a reasonable fraction of these variants. Otherwise, some of these would be a bit more common."

Another piece of the puzzle was supplied by Harvard's Lee, whose involvement with the Structural Genomic Variation Consortium — a partnership with Harvard, Sanger, and Toronto's Hospital for Sick Children — led to research into population differences of copy number in amylase genes, which are involved in starch digestion. As hypothesized, populations with higher levels of starch in their diets had higher numbers of the gene. "That was the first time one of these CNV regions was shown to be under positive selection," Lee says.

Move to models

Of course, biologists are taking advantage of their go-to resource for better understanding bizarre events in the human genome: CNV research into model organisms is taking off. Lee says that the turn to animal models lagged; when he began to study variation in chimps and macaques, "I got a sense … that there was clearly a deficiency of work being done in other animals," he says. A recent paper from his team showed a first-pass look at these primates, demonstrating that copy number variation does indeed affect their genomes. As it turns out, Lee says, "when you have segmental duplication in the genomes of organisms, they do have the ability to foster the creation of copy number variants."

Following that, Lee's group has been delving into zebrafish, which also has segmental duplication. His results aren't yet published, but Lee believes the level of variation he's seen in zebrafish will be "a very eye-opening experience" for the research community. So much variation could play a significant role in the run-of-the-mill genetic experiments done on these organisms and will have to be controlled for, he adds.

Edwin Cuppen's group is using inbred rat strains to try to get a purer view of copy number variation. Evaluations of these regions in rat are still at a low-resolution phase, Cuppen says, but he believes the method of using inbred strains will remove a lot of the background noise that can't be controlled in most organisms. So far, he says, it's clear that copy number changes are responsible for "quite a few expression differences" in the organism.

Complex techs

Cuppen notes that copy number events are more complex than initially suspected. Copies don't appear faithfully and in whole; they can be duplications combined with inversions and small deletions, for instance, making them much more difficult to detect comprehensively. Because of that, Cuppen says a single platform isn't enough to track this kind of variation. For the rat studies, his group combines standard array CGH with paired-end sequencing and optical mapping.

"A promising new technology for detecting structural variants is combination of paired-end mapping and next-gen sequencing," Guryev says. "However, it will require even higher sequencing throughput and price reduction before we can use it for such applications as diagnostics or association studies."

One challenge is that the standard technology, generally speaking array CGH, isn't precise enough to quantify copy number variation, says Leibniz's Platzer. "In these techniques you see just that there are more than two, or maybe four or five, and you have no information about the exact copy number," he adds. His team worked with MRC Holland to develop MLPA, or multiplex ligation-dependent probe amplification, a technology specifically designed for the defensin gene region he studies. MLPA increases probe density to get a high-res view of the region, and Platzer says that "from our point of view, this is the most quantitative approach at the moment."
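One way to see why exact counts are hard to read from a ratio-based signal is a rough sketch like the one below. It assumes an idealized, noise-free array CGH readout in which the signal is the log2 ratio of test copies to a two-copy reference; the function names are hypothetical and no real measurement data is used. The point is only that the gap between neighboring copy numbers shrinks as the count climbs.

```python
import math


def expected_log2_ratio(copies, reference_copies=2):
    """Idealized log2 ratio for a region with `copies` copies,
    measured against a diploid (two-copy) reference. Assumes no noise."""
    return math.log2(copies / reference_copies)


def step_to_next(copies):
    """Signal difference between `copies` and `copies + 1`."""
    return expected_log2_ratio(copies + 1) - expected_log2_ratio(copies)


if __name__ == "__main__":
    # Compare low copy numbers with the high end of the defensin range.
    for n in (2, 3, 4, 12, 13):
        ratio = expected_log2_ratio(n)
        step = step_to_next(n)
        print(f"{n:>2} copies: log2 ratio {ratio:.3f}, step to next {step:.3f}")
```

Under these assumptions, the step from two to three copies is several times larger than the step from 12 to 13, so any fixed amount of measurement noise blurs high copy numbers first.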

Rosenberg says there's still a need for better quality control, especially to reduce false positives, and that technology needs to evolve to account for more complexity. His population study looked for five states of variation — homozygous or heterozygous deletion, normal, and homozygous or heterozygous addition — and he says current tools make it difficult to go beyond those states.
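The five states can be pictured as a simple lookup on total copy count. The sketch below is illustrative only, not Rosenberg's method: it assumes a diploid baseline of two copies and treats a "homozygous addition" as one extra copy on each chromosome (four in total), which the article does not spell out. The names are hypothetical.

```python
# Assumed mapping from total copy count to the five states named above.
FIVE_STATES = {
    0: "homozygous deletion",
    1: "heterozygous deletion",
    2: "normal",
    3: "heterozygous addition",
    4: "homozygous addition",
}


def classify_copy_number(copies):
    """Return the state for an integer copy count, or flag counts
    that fall outside the five-state model."""
    if copies < 0:
        raise ValueError("copy count cannot be negative")
    return FIVE_STATES.get(copies, "outside the five-state model")


if __name__ == "__main__":
    for count in (0, 1, 2, 3, 4, 12):
        print(count, "->", classify_copy_number(count))
```

A region like the defensin cluster, with up to 12 to 14 copies, falls outside such a model entirely, which is the kind of complexity current tools struggle to capture.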

Link to disease

Whether technologies improve or sequencing gets cheap enough to enable whole-genome scans for copy number variation, the ultimate aim is the same: figuring out how these changes contribute to disease. Research has already shown that CNVs are tied to schizophrenia and autism, among other diseases, and scientists expect that trend to pick up steam. This "highlights the importance of copy number changes in disease etiology," Guryev says.

In this sense, CNVs are "like risk factors," says Lee. They may eventually help stratify patients to show which will respond better to one drug than another, for instance.

Lee believes that a connection to cancer is imminent. "I think we're going to find that there are some CNVs" that increase predisposition to cancer, he says. "I think it's coming right around the corner."
