Whole-Genome Data From Multiple Populations Helps Researchers Home in on Causal Variants

COLD SPRING HARBOR, NY (GenomeWeb) – Combining whole-genome data from multiple populations can help pinpoint causal variants within expression quantitative trait loci (eQTLs), Stanford University's Marianne DeGorter said during a presentation Wednesday at the Biology of Genomes meeting.

"It is difficult to distinguish causal variants from multiple tightly linked variants," she noted.

These days, DeGorter added, the wider availability of whole-genome sequences from multiple populations makes it possible to tease out causal variants based on differing linkage disequilibrium patterns among populations. Further, many of the variants she and her colleagues found overlapped with transcription factor binding sites.

Using data from phase three of the 1000 Genomes Project along with gene expression data from a further 520 individuals from seven global populations, DeGorter and her colleagues identified potential causal regulatory variants shared among human populations. She noted that about half of the eQTLs they found in each of those seven populations included blocks of tied variants. Some of those blocks, she added, contained hundreds of variants, all in high, or even perfect, linkage disequilibrium.
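To illustrate why variants in perfect linkage disequilibrium end up tied, here is a minimal Python sketch. It is not the method DeGorter's team used, which the talk did not spell out. The genotypes, expression values, variant names, and the helper functions `r_squared` and `tied_top_variants` are all hypothetical, and squared correlation stands in for whatever association statistic the researchers actually applied.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no association signal
    return cov * cov / (var_x * var_y)


def tied_top_variants(genotypes, expression, tol=1e-12):
    """Return the variants whose association score ties the best score."""
    scores = {name: r_squared(g, expression) for name, g in genotypes.items()}
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if best - s <= tol)


# Hypothetical alternate-allele counts (0, 1, or 2) for six individuals.
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],  # identical to snp_a, so in perfect LD with it
    "snp_c": [0, 1, 1, 2, 0, 2],  # correlated with snp_a, but not perfectly
}
# Hypothetical expression levels for the same six individuals.
expression = [1.0, 2.1, 2.9, 2.0, 1.2, 3.1]

ld_ab = r_squared(genotypes["snp_a"], genotypes["snp_b"])
tied = tied_top_variants(genotypes, expression)
```

Because `snp_a` and `snp_b` have identical genotypes in this toy sample, any statistic computed from genotype and expression gives them the same score, so expression data from this one sample cannot say which of the two is causal.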

To break down these ties between variants and narrow in on the single best causal variant, DeGorter and her colleagues turned to sequence data from six human populations: Utah residents with European ancestry, Han Chinese, Gujarati Indians in Texas, Japanese, Luhya in Kenya, and Yoruban.

In the non-African populations, they found between 7 million and 8 million variants, while in the African populations they uncovered 11 million variants. They then tested these variants for association with gene expression. In each population, they found about 2,800 genes containing eQTLs, or regions associated with gene expression, and 211 genes containing eQTLs in all the populations.

In non-Africans, they noted an average of six tied variants per eQTL, while Africans had about four tied variants per eQTL.

Then, using ENCODE data, they gauged the likelihood that a given variant would have a functional consequence. After fine mapping, the percentage of tied variants per eQTL dropped.

Data from African populations, when combined with data from other populations, did the most to help identify functional variants, DeGorter said, adding that the high genetic diversity of African populations likely made those data more informative. The addition of data from other populations also helped home in on causal variants.

For instance, about half of the eQTLs identified in the Chinese population had tied variants, and that portion decreased slightly when data from the Japanese population were added. By contrast, adding data from the more distantly related Luhya reduced that portion to 22 percent.
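The logic of breaking ties across populations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' fine-mapping procedure: it assumes each population yields a set of tied candidate variants for the same eQTL and that a shared causal variant must appear in every set. The variant IDs, the set contents, and the `shared_candidates` helper are hypothetical.

```python
def shared_candidates(tied_by_population):
    """Intersect the tied-variant sets from each population for one eQTL."""
    sets = [set(variants) for variants in tied_by_population.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical tied-variant sets for a single eQTL in three populations.
tied = {
    "han_chinese": {"rs_1", "rs_2", "rs_3", "rs_4"},
    "japanese": {"rs_1", "rs_2", "rs_3"},  # similar LD pattern, few ties broken
    "luhya": {"rs_2", "rs_7"},             # different LD pattern, most ties broken
}

# Closely related populations share LD structure, so little is gained.
east_asian_only = shared_candidates(
    {name: tied[name] for name in ("han_chinese", "japanese")}
)

# Adding a more distantly related population narrows the candidates further.
all_three = shared_candidates(tied)
```

In this toy case, combining the two East Asian sets removes only one candidate, while adding the Luhya set leaves a single variant, mirroring the pattern DeGorter described.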

Many of these variants are linked to transcription factor binding, DeGorter said. By comparing eQTLs identified in the Utah population with variants in the NHGRI-EBI GWAS Catalog, they uncovered 102 variants associated with those Utah eQTLs, enabling them to zoom in on the single best variants.

For example, an eQTL associated with ORMDL3 in Utah residents with European ancestry had 13 tied variants. Using their fine-mapping approach and applying data from other populations, DeGorter and her colleagues uncovered a single best variant that, they noted, overlaps with transcription factor binding annotation.