Researchers Characterize, Control for Diversity of Hispanic Populations for Genetic Studies

NEW YORK (GenomeWeb) – A University of Washington-led team of researchers has developed a way to characterize and then control for the genetic diversity present in US Hispanic and Latino populations in genome-wide association studies.

Many Hispanic and Latino or Latina individuals have admixed genomes reflecting indigenous American, European, and African ancestors, though the proportion of genetic ancestry from these three ancestor groups varies within and among ethnic groups. To identify the genetic basis of disease within Hispanic and Latino/Latina populations, researchers have to contend with this complex admixture history and genetic diversity.

Using an iterative process to estimate population-structure principal components and pairwise kinship coefficients, UW's Cathy Laurie and her colleagues uncovered genetic differentiation both within and between six Hispanic groups from the Hispanic Community Health Study/Study of Latinos, as they reported today in the American Journal of Human Genetics. They then developed a clustering method to define genetic-analysis groups that reflected both participants' self-identified ethnicity and their genetic identification.

"These genetic-analysis groups are similar to self-identified background groups in that they share cultural and environmental characteristics, but they are more genetically homogeneous and include all study participants," Laurie and her colleagues wrote in their paper.

The researchers then included those genetic-analysis groups as covariates in analyses of 22 biomedical traits and found that the groups accounted for a portion of the observed trait variation.

Some 12,800 participants from the Hispanic Community Health Study/Study of Latinos underwent genotyping. Nearly all of them self-identified as belonging to one of six background groups: Cuban, Dominican, Puerto Rican, Mexican, Central American, and South American.

Using the genotyping data, the researchers jointly estimated principal components (PCs), which reflect population structure, and kinship coefficients, which reflect familial relatedness, by applying the tools PC-AiR and PC-Relate iteratively. Five PCs, the researchers reported, revealed substantial genetic diversity within the six background groups.
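The iterative interplay between the two tools can be illustrated with a simplified sketch. It is not the GENESIS implementation of PC-AiR or PC-Relate: the helper names are hypothetical, unrelated individuals are chosen with a plain greedy rule rather than PC-AiR's own selection procedure, and the kinship cutoff of 2^-5.5 (about 0.022) is an assumed value. What it does show is the loop: ancestry PCs are fit on a provisionally unrelated subset and projected onto everyone, the PCs are used to estimate individual-specific allele frequencies, and the resulting kinship estimates, which are not inflated by population structure, refine the unrelated subset for the next pass.

```python
import numpy as np

# Simplified, hypothetical sketch; not the GENESIS PC-AiR / PC-Relate code.
# G: individuals x SNPs matrix of alternate-allele counts (0, 1, 2).


def ancestry_pcs(G, unrelated, k):
    """Fit k ancestry PCs on the unrelated subset, then project everyone."""
    p = G[unrelated].mean(axis=0) / 2           # allele frequencies in the unrelated subset
    keep = (p > 0) & (p < 1)                    # drop SNPs monomorphic in that subset
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    _, _, Vt = np.linalg.svd(Z[unrelated], full_matrices=False)
    return Z @ Vt[:k].T                         # n x k matrix of PC scores


def pc_adjusted_kinship(G, pcs, eps=0.01):
    """PC-Relate-style kinship from individual-specific allele frequencies."""
    n = G.shape[0]
    X = np.column_stack([np.ones(n), pcs])
    beta, *_ = np.linalg.lstsq(X, G / 2, rcond=None)
    mu = np.clip(X @ beta, eps, 1 - eps)        # allele frequency predicted from each person's PCs
    R = G - 2 * mu                              # genotype residuals after removing ancestry
    S = np.sqrt(mu * (1 - mu))
    return (R @ R.T) / (4 * (S @ S.T))          # about 0.5 on the diagonal for outbred people


def select_unrelated(phi, threshold):
    """Greedy stand-in for PC-AiR's selection: keep i unless related to someone kept."""
    kept = []
    for i in range(phi.shape[0]):
        if all(phi[i, j] < threshold for j in kept):
            kept.append(i)
    mask = np.zeros(phi.shape[0], dtype=bool)
    mask[kept] = True
    return mask


def iterate_pcs_and_kinship(G, k, threshold=2 ** -5.5, n_iter=2):
    """Alternate PC and kinship estimation; the threshold is an assumed cutoff."""
    unrelated = np.ones(G.shape[0], dtype=bool)       # first pass: treat everyone as unrelated
    for _ in range(n_iter):
        pcs = ancestry_pcs(G, unrelated, k)
        phi = pc_adjusted_kinship(G, pcs)
        unrelated = select_unrelated(phi, threshold)  # refined subset for the next pass
    return pcs, phi, unrelated
```

On a real genotype matrix one might call `iterate_pcs_and_kinship(G, k=5)` to mirror the five PCs mentioned above; production analyses would use the tools themselves, which also handle missing genotypes and far larger samples.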

For instance, participants who self-identified as belonging to mainland backgrounds, such as Mexican, Central American, and South American, tended to have more Amerindian and less African ancestry than participants from Dominican or Puerto Rican backgrounds. However, the fraction of Amerindian ancestry in self-identified Mexican participants varied among the project's recruitment centers.

To control for that genetic variation, the researchers developed a multi-dimensional clustering method that defined hyper-ellipsoids capturing about 90 to 96 percent of the unrelated individuals in a given self-identified background group. The resulting genetic-analysis groups lack within-group genetic outliers and include all genotyped study participants, even those whose self-identified background group is missing or non-specific.
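The clustering idea can be pictured as fitting one ellipsoid per self-identified group in PC space. The sketch below is a hypothetical simplification, not the authors' algorithm: each group's ellipsoid is centered on its members' mean PC scores, shaped by their covariance, and sized so that an assumed 93 percent of members (a value inside the reported 90 to 96 percent range) fall within its Mahalanobis radius. Every participant, labeled or not, is then assigned to the group with the smallest scaled distance; that nearest-ellipsoid rule for people outside every ellipsoid is an assumption made here for illustration, and the function names are invented.

```python
import numpy as np

# Hypothetical sketch of ellipsoid-based group definition in PC space.


def mahalanobis(X, center, inv_cov):
    """Mahalanobis distance of each row of X from center."""
    D = X - center
    return np.sqrt(np.einsum("ij,jk,ik->i", D, inv_cov, D))


def fit_ellipsoids(pcs, labels, coverage=0.93):
    """One ellipsoid per labeled group: (center, inverse covariance, radius)."""
    ellipsoids = {}
    for g in sorted({lab for lab in labels if lab is not None}):
        X = pcs[np.array([lab == g for lab in labels])]
        center = X.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
        # radius chosen so that `coverage` of the group's own members fall inside
        radius = np.quantile(mahalanobis(X, center, inv_cov), coverage)
        ellipsoids[g] = (center, inv_cov, radius)
    return ellipsoids


def assign_groups(pcs, ellipsoids):
    """Assign every row (labeled or not) to the group with smallest scaled distance."""
    names = list(ellipsoids)
    scaled = np.column_stack([mahalanobis(pcs, c, ic) / r
                              for c, ic, r in ellipsoids.values()])
    groups = [names[j] for j in scaled.argmin(axis=1)]
    inside = scaled.min(axis=1) <= 1.0    # False: outside every ellipsoid (assumed fallback)
    return groups, inside
```

To echo the article's use of unrelated individuals, the labels of related participants could be set to `None` before calling `fit_ellipsoids`; `assign_groups` still places them, along with anyone whose background is missing.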

Though these genetic-analysis groups are highly concordant with self-identified background groups, the researchers noted that the genetic-analysis groups have greater within-group homogeneity.

"Here, we aimed for concordance that was high enough to retain cultural and environmental information but not so high as to retain extreme genetic outliers," Laurie and her colleagues wrote.

She and her colleagues also examined whether these genetic-analysis groups were associated with the 22 quantitative traits under study in the Hispanic Community Health Study/Study of Latinos.

Using the Akaike information criterion (AIC), they assessed whether including genetic-analysis groups as a fixed effect in regression models affected the fit of those models. Adding the genetic-analysis groups improved model fit, the researchers reported.
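The model comparison can be sketched with ordinary least squares and a Gaussian AIC, computed as 2k minus twice the maximized log-likelihood, where lower values indicate a better trade-off between fit and parameter count. The example is illustrative only: the helper names are hypothetical, the data are simulated rather than HCHS/SOL traits, and plain linear regression stands in for whatever models the authors actually fit. It compares a baseline containing an intercept and a PC against the same model plus indicator columns for the groups.

```python
import numpy as np

# Illustrative AIC comparison with OLS; not the authors' model specification.


def gaussian_aic(y, X):
    """AIC = 2k - 2 ln L for an OLS fit with Gaussian errors (k counts betas + variance)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)   # maximized log-likelihood
    return 2 * (p + 1) - 2 * log_lik


def group_dummies(groups):
    """0/1 indicator columns for each group except the first (reference) level."""
    levels = sorted(set(groups))[1:]
    return np.column_stack([[g == lev for g in groups] for lev in levels]).astype(float)
```

In the article's setting, the baseline design matrix would hold the intercept and PCs, and the extended one would append `group_dummies(...)` for the genetic-analysis groups; a lower `gaussian_aic` for the extended model corresponds to the improved fit the researchers describe.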

This suggested to Laurie and her colleagues that the genetic-analysis groups were capturing either genetic information not encompassed by the first 20 PCs or non-genetic information associated with these traits, such as cultural and environmental factors.

They noted that adding either the self-identified background groups or the genetic-analysis groups improved model fit. They argued, however, that the genetic-analysis groups have the advantage of including individuals whose self-identified background group is missing or non-specific.

According to the researchers, an approach like theirs could be applied to studies of other complex populations. "Our method for defining genetic-analysis groups is generally applicable to multi-ethnic populations in which genetic ancestry is associated with self-identified ethnicity," Laurie and her colleagues said.