NEW YORK – By examining a diverse set of human genomes, researchers have uncovered more than 125,000 structural variants, including ones that may be medically important.
Researchers from the Wellcome Sanger Institute examined structural variation within the Human Genome Diversity Panel and, as they reported in Cell on Thursday, uncovered structural variants specific to some populations as well as ones that appear to have entered the human genome through introgression from archaic human groups. Some of these variants appear to affect immune-related genes and could influence disease susceptibility.
"By analyzing the genomes of understudied populations we've been able to find high-frequency structural variations not uncovered by previous large-scale sequencing projects," first author Mohamed Almarri from the Wellcome Sanger Institute said in a statement. "Several of these are in medically important genes that tell us how a population has evolved to resist a certain disease or why they might be susceptible to others."
Almarri and his colleagues sequenced the whole genomes of 911 samples collected from 54 human populations to an average 36X depth. After mapping the reads to the human reference genome, the researchers identified 126,018 structural variants.
Of these, about 78 percent had not been reported in previous studies, a finding the researchers said underscores the extent of unrecognized human diversity and the need to study underrepresented populations.
They further explored how common these structural variants were among global populations, finding that some were more common in, or even specific to, particular groups. A malaria-associated deletion in HBA2 was nearly fixed among Lowland/Sepik Papuans, found in 86 percent of them, but was not found among Papuan highlanders. Malaria, the researchers noted, is present in the lowlands of Papua New Guinea, but not the highlands.
Meanwhile, a 14-kilobase deletion affecting MGAM was found only among the South American Karitiana population, at a 40 percent frequency. MGAM encodes an enzyme involved in the digestion of dietary starch, and this deletion likely inactivates the gene. The Karitiana underwent a severe population crash, which the researchers said may have enabled this disadvantageous allele to become common.
Other structural variants appear to originate from archaic human groups, such as Denisovans, that interbred with modern humans. Among Oceanians, the researchers noted a duplication at chromosome 16p12 that is thought to be due to Denisovan introgression, as well as a deletion in the AQR gene that is found only among Oceanians and Altai Denisovans. This gene encodes an RNA helicase, a type of enzyme often involved in detecting viral RNAs and mediating immune responses.
Another deletion, found only among the Surui and Pima populations in the Americas and in Neanderthals, eliminates an exon in the MS4A1 gene, which encodes CD20, a B cell differentiation antigen that is involved in T cell-independent antibody responses. CD20 is also the target of a number of therapies for B cell-associated leukemias, lymphomas, and autoimmune diseases, suggesting that therapies for these conditions developed in one population might not translate to others.
The researchers additionally identified instances of runaway duplications. One of these, found at high levels among African populations and to a lesser extent among Middle Eastern populations, boosted the number of copies of the HPR gene, which provides resistance to sleeping sickness.
"Structural variants are complicated yet very important functionally, evolutionarily, and medically," senior author Yali Xue, who recently retired from the Wellcome Sanger Institute, said in a statement. "The discovery of these new structural variations provides one of the richest resources of this kind of variation so far, which not only offers unique insights into population histories and improves the currently used human reference genome but will also substantially benefit future medical studies."