NEW YORK (GenomeWeb News) – New research is revealing — and starting to map — the remarkable structural variation in the human genome.
Collaborators from across the US used clone-based sequencing and microarrays to compare eight human genomes against a standard human reference genome. In so doing, they identified nearly 1,700 sites of structural variation — consisting of everything from copy number variations to inversions — and more than 500 regions containing new, previously unrecognized sequence. The results, generated as part of the Human Genome Structural Variation Project, appeared online today in Nature.
“The sequences we have identified range in size from a few thousand to hundreds of thousands of base pairs, and are not part of the published human genome reference sequence,” senior author Evan Eichler, a genome sciences researcher at the University of Washington, said in a statement. “This represents uncharted territory that can now be examined in more detail to determine the function of these new segments of the human genome with respect to disease and gene activity.”
“[W]e could not have found these differences without sequencing more human genomes from individuals of diverse ancestry to a high quality standard,” Eichler noted.
To date, most studies aimed at addressing genetic and genomic variation have focused on small genetic differences between individuals, such as single-nucleotide changes. But variations involving several or many base pairs actually seem to be more common than SNPs.
“The importance of structural variation to human health and common genetic disease has become increasingly apparent,” Eichler and his colleagues wrote. “However, only a small fraction of copy-number variant (CNV) base pairs have been determined at the sequence level.”
In an effort to get at such CNVs and other structural differences in typical human genomes, the collaborators (researchers from the University of Washington, Agencourt Biosciences, Agilent Technologies, Washington University Genome Sequencing Center in St. Louis, the NIH’s National Human Genome Research Institute, the University of Wisconsin, the Broad Institute, and Illumina) constructed clone-based maps of eight human genomes.
First, they made whole-genome libraries containing roughly a million clones for each of the eight individuals: four of Yoruba Nigerian descent, two of Asian descent, and two of European descent. The samples were originally collected as part of the International HapMap Project. To build the libraries, they broke up the genomic DNA into pieces of about 40,000 base pairs and cloned them into fosmids. They then sequenced 500 or 600 base pairs from each end of every clone using Sanger sequencing.
When the researchers mapped the 6.1 million clones to a human reference genome completed in 2003, they found 1,695 structural variation sites: 747 deletions, 724 insertions, and 224 inversions. Roughly 40 percent of these consisted of copy number variations that had not been reported previously — and about half of the polymorphisms were found in several different libraries.
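The logic behind this kind of end-sequence mapping can be illustrated with a short sketch. Because a fosmid holds an insert of roughly 40,000 base pairs, the two end reads of a clone should land about that far apart on the reference, on opposite strands. A pair that lands much farther apart points to sequence missing from the sampled genome (a deletion), a pair that lands much closer together points to extra sequence (an insertion), and a pair on the same strand points to an inversion. The Python below is only an illustration of that idea, not the authors' pipeline: the `EndPair` record, the `classify` function, and the 8,000 base pair tolerance are all hypothetical.

```python
from dataclasses import dataclass

EXPECTED_INSERT_BP = 40_000  # approximate fosmid insert size cited in the article
TOLERANCE_BP = 8_000         # hypothetical cutoff, NOT taken from the study


@dataclass
class EndPair:
    """Reference placements of the two end reads of one clone (hypothetical record)."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify(pair: EndPair) -> str:
    """Label one clone by how its end reads sit on the reference genome."""
    if pair.chrom_a != pair.chrom_b:
        return "other"  # ends on different chromosomes; not handled in this sketch
    if pair.strand_a == pair.strand_b:
        return "inversion"  # ends should face each other on opposite strands
    span = abs(pair.pos_b - pair.pos_a)
    if span > EXPECTED_INSERT_BP + TOLERANCE_BP:
        return "deletion"   # reference holds sequence the sampled genome lacks
    if span < EXPECTED_INSERT_BP - TOLERANCE_BP:
        return "insertion"  # sampled genome holds sequence the reference lacks
    return "concordant"


# Site counts as reported in the article; they sum to the stated total of 1,695.
reported_sites = {"deletion": 747, "insertion": 724, "inversion": 224}
total_sites = sum(reported_sites.values())
```

A single discordant clone is weak evidence on its own. A real analysis would presumably require several independent clones to agree before calling a site, which this sketch does not model.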
In addition, some 15 percent of the variable regions differed from the reference genome in five of the newly analyzed genomes, suggesting that, in some cases, the reference genome represents a less common form of the genome.
“We all know that there’s not one human genome — that there’s an incredible amount of variation,” lead author Jeffrey Kidd, a graduate student in Eichler’s lab, told GenomeWeb Daily News.
In many cases, structural variation seemed to correlate with high levels of nucleotide variation. For example, the researchers identified 15 regions with unusually high levels of nucleotide variation. One of these, a region on chromosome 8, also had one of the highest concentrations of structural variants.
Certain parts of the genome were also prone to specific types of variation. For instance, inversions seemed to be more common on the X chromosome than on other chromosomes.
In addition, the team identified 525 regions containing sequence insertions not found in the reference genome. Now that the clones containing new sequence have been mapped, the authors noted, these regions could potentially be incorporated into CNV and SNP genotyping platforms.
For this paper, though, the team designed their own oligonucleotide microarrays to look at copy number variation in the newly identified loci. Nearly half of the new sequences seemed to vary in copy number.
Kidd said the group plans to do a similar analysis on ten or twelve more individuals to gather even more information about the extent of structural variation in the human genome. In the meantime, the authors suggested, the eight newly mapped genomes will likely serve as helpful references as more and more genomes are sequenced and analyzed.
“This map is a valuable starting point for researchers studying the normal patterns of structural variation and how differences in those patterns affect human health,” Francis Collins, director of the NIH’s National Human Genome Research Institute, said in a statement.
Sequence data for the eight genomes is available in the NIH trace repository.