Ethnic Identity, Genetics Provide Correlated but Distinct Clinical Information

NEW YORK – Genetically inferred ancestry provides increasingly sharp disease risk estimates but self-reported ethnicity cannot be discounted, according to recently published research from the University of California, Los Angeles.

The first batch of results from the UCLA ATLAS Community Health Initiative, published earlier this month in Genome Medicine, explored how self-identified race or ethnicity and genetic ancestry each relate to clinical phenotypes among 36,736 individuals within the UCLA Health medical system. The initiative has a target cohort size of 150,000.

Electronic medical records routinely include a patient's self-identified race or ethnicity (SIRE), which is used both for public health purposes and as part of an individual's inferred risk for certain diseases.

Using patient blood samples, the researchers genotyped participants on the Illumina Global Screening Array, then imputed single-nucleotide polymorphisms (SNPs) using the TOPMed Freeze5 multi-ancestry imputation panel. The study showed that within the ATLAS biobank, genetic ancestry and self-reported demographic information yielded distinct subpopulations.

For example, while study participants were easily clustered by genotype into the five continental populations designated in the 1000 Genomes reference data (European, African, Admixed American, East Asian, and South Asian), many of the people assigned to these clusters did not identify as belonging to them.
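The comparison described above amounts to a cross-tabulation of genetically inferred cluster against self-identified label. The following is a minimal sketch of that idea, not the study's code; the function names, the cluster abbreviations, and the toy records are all hypothetical and are not ATLAS data.

```python
from collections import Counter, defaultdict


def concordance_by_cluster(records):
    """Cross-tabulate (genetic_cluster, self_identified_label) pairs.

    Returns a dict mapping each cluster to a Counter of self-identified labels.
    """
    table = defaultdict(Counter)
    for cluster, sire in records:
        table[cluster][sire] += 1
    return dict(table)


def share_matching(table, cluster, expected_label):
    """Fraction of people in a cluster who self-identify with expected_label."""
    counts = table[cluster]
    total = sum(counts.values())
    return counts[expected_label] / total if total else 0.0


# Hypothetical toy records for illustration only (not ATLAS data).
toy_records = [
    ("AMR", "Hispanic/Latino"),
    ("AMR", "Hispanic/Latino"),
    ("AMR", "White"),
    ("AMR", "Other"),
    ("EUR", "White"),
    ("EUR", "White"),
    ("EUR", "Other"),
]
toy_table = concordance_by_cluster(toy_records)
```

Calling `share_matching(toy_table, "AMR", "Hispanic/Latino")` on the toy data gives the fraction of the cluster whose self-identification matches the label one might naively expect; a value well below 1 is the kind of mismatch the study reports.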

Using a computational pipeline consisting of principal component analysis followed by identity-by-descent analysis, the UCLA team further subdivided the genetically inferred continental clusters into distinct subcontinental populations, such as West African, East African, and Ethiopian subgroups within the African cluster, and Japanese and Korean subgroups within the East Asian cluster.
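The article does not describe how the pipeline assigns individuals to clusters. One common and simple approach, shown here purely as an illustrative sketch, is to project samples onto principal components and assign each to the nearest reference-population centroid. This is an assumed stand-in technique, not the UCLA team's method, and it omits the identity-by-descent step entirely. The coordinates and function names are hypothetical, and the principal components are assumed to have been computed already.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def fit_centroids(reference):
    """reference maps a population label to a list of PC coordinate tuples."""
    return {label: centroid(pts) for label, pts in reference.items()}


def assign(sample, centroids):
    """Return the label whose centroid is closest (Euclidean) to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Hypothetical 2-D principal component coordinates for reference samples.
toy_reference = {
    "EUR": [(0.0, 0.0), (0.2, 0.1)],
    "AFR": [(5.0, 0.0), (5.2, -0.1)],
    "EAS": [(0.0, 5.0), (0.1, 5.2)],
}
toy_centroids = fit_centroids(toy_reference)
toy_label = assign((4.8, 0.3), toy_centroids)
```

Nearest-centroid assignment is only reasonable when clusters are well separated, as continental groups tend to be in the leading components. Admixed individuals fall between centroids, which is one reason finer-grained methods are needed for subcontinental structure.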

"I think the study makes a good argument that utilizing the genetic information as a primary source, taking into account the different race/ethnic information, will allow for better medical treatment at scale," said Sultan Meghji, a professor of engineering at Duke University, who ran the Broad Institute's first commercial program for GATK and helped build the clinical cancer program for the government of Singapore.

Nonetheless, Bogdan Pasaniuc, vice chair of computational medicine at UCLA and the study's principal investigator, cautioned that moving away from the use of race in medicine and biomedical research entirely may not yet be practical. Despite being a social construct with "a long history of racism" rather than a biologically defined variable, concepts of race and ethnicity carry clinically relevant information concerning social determinants of health.

"They reflect different signals," Pasaniuc said. "Genetic answers reflect what happens at the level of one's genome, and then what individuals self-identify as could be used as a proxy for environment, diet, all sorts of other things."

Social determinants of health are the conditions in which people are born, grow, live, work, and age, and incorporate factors such as an individual's exposure to pollutants, air quality, and availability of dietary options, all of which influence disorders ranging from asthma and type 2 diabetes to Parkinson's disease and some cancers.

Also, social determinants and genetic factors often interact, especially in complex diseases such as Parkinson's and some cancers.

"You still need social determinants of health," Pasaniuc said. "Hopefully [though], they can be removed from the negative connotations of the past."

The degree to which clinicians and researchers use social determinants and genetics in their decision making, Pasaniuc suggested, may come down to a case-by-case basis. "It's a tough question," he said, "and I don't think we have an answer."

Highlighting how ethnicity and genetic ancestry can both affect disease risk stratification, a 2011 study from St. Jude Children's Research Hospital found that people identifying themselves as African American or Hispanic had poorer survival rates in acute lymphoblastic leukemia (ALL). It also found that Hispanic children with ALL and at least 10 percent Native American ancestry, as inferred from their genotypes, were more prone to relapse and needed more intensive therapy, even when they tested negative for minimal residual disease.

The same group has since gone on to identify some of the genetic variants underlying their previous observations and further refined several molecular subtypes of ALL.

While studies such as these have demonstrated the clinical relevance of genetic ancestry, the less quantifiable language of race and its associated social determinants of health shapes healthcare policies that determine community-level interventions, such as better access to social services. These interventions affect people's risk for disorders with strong environmental components, regardless of an individual's genetic susceptibility to them.

Pasaniuc described ATLAS as a way "to understand how genetics modulates disease risk in our patient population and hopefully, in the long run, gets back actionable information to clinicians and to the patients themselves."

Returning genetic information to individuals is one of the ATLAS team's future goals, in addition to increasing the study's diversity as the researchers build toward their target study size.

Within the UCLA Health patient population, approximately 65 percent identify as White, 5 percent as Black or African American, nearly 10 percent as Asian, less than 1 percent each as Native American/Alaska Native or Pacific Islander, and approximately 19 percent as another group.

"The UCLA team has done a terrific job of enrolling participants from the diverse community that seeks care at their institution in this biobank effort," said Michael Murray, a professor of genetics and pathology at Yale University School of Medicine, who has also conducted population genetics studies.

Nonetheless, Pasaniuc explained, this distribution is skewed somewhat by UCLA Health's location within west Los Angeles.

"We're still not there in representing other parts of LA," he said. "[UCLA Health] is just on the west side of LA, and the west side of LA is not as diverse. [For instance], it has fewer Hispanic/Latino individuals than the east side of LA."

Pasaniuc does not foresee any great difficulty in increasing diversity further, given Los Angeles' generally high diversity and lack of an ethnic majority. Nearly 49 percent of Angelenos self-identify as Hispanic or Latino, 11.6 percent as Asian, and 8.9 percent as Black or African American. Furthermore, nearly 37 percent of Los Angeles residents were born outside the US.

He also plans to expand the ATLAS study beyond array genotyping and to eventually perform whole-exome sequencing, although the details of this aspect of the study are not yet finalized.

Finally, the team aims to generate polygenic risk scores for diseases across all five continental ancestry groups.

"One of the interesting bits of this entire study to me is that we can now test and try to calibrate whether the genetic predicting methods would work in our data or not," Pasaniuc said. "We can take a polygenic risk score for, let's say, heart disease, and we can ask in a research setting whether this polygenic risk score would have predicted [heart disease] in our patient population and whether that occurrence would stratify based on genetic ancestry or based on other social demographic factors."
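The kind of check Pasaniuc describes can be sketched in a few lines: compute a polygenic risk score as a weighted sum of allele dosages, then ask how well it separates cases from controls within each ancestry group. The sketch below uses the area under the ROC curve as the measure of separation, which is an assumption made for illustration; the article does not say which metric the team will use. The SNP identifiers, weights, group labels, and people are all hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) over scored SNPs."""
    return sum(weights[snp] * dosages.get(snp, 0.0) for snp in weights)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    if not case_scores or not control_scores:
        return None
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_by_group(people, weights):
    """people: dicts with 'group', 'case' (bool), and 'dosages' keys."""
    grouped = {}
    for person in people:
        cases, controls = grouped.setdefault(person["group"], ([], []))
        score = polygenic_score(person["dosages"], weights)
        (cases if person["case"] else controls).append(score)
    return {g: auc(cases, controls) for g, (cases, controls) in grouped.items()}


# Hypothetical weights and individuals for illustration only.
toy_weights = {"rs_a": 0.5, "rs_b": -0.2}
toy_people = [
    {"group": "A", "case": True, "dosages": {"rs_a": 2, "rs_b": 0}},
    {"group": "A", "case": False, "dosages": {"rs_a": 0, "rs_b": 1}},
    {"group": "B", "case": True, "dosages": {"rs_a": 1, "rs_b": 1}},
    {"group": "B", "case": False, "dosages": {"rs_a": 1, "rs_b": 0}},
    {"group": "B", "case": False, "dosages": {"rs_a": 0, "rs_b": 0}},
]
toy_auc = auc_by_group(toy_people, toy_weights)
```

In the toy data the score separates cases from controls perfectly in group A but performs at chance in group B, which illustrates why a score derived in one population needs to be checked before it is applied in another.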

ATLAS data is available to researchers through the online ATLAS PheWeb, which is currently in beta testing.