NEW YORK (GenomeWeb) – Researchers from Kaiser Permanente Northern California and two regional academic institutions have published papers in Genetics on the genetic ancestry and telomere lengths of a cohort of more than 100,000 subjects. In a third paper published in the same journal, researchers also described the methods they used to genotype study subjects for other investigators who wish to use the data.
"For anyone analyzing this data from this cohort in any kind of association or mapping study, this information would be critical," Neil Risch, director of the Institute for Human Genetics at the University of California, San Francisco, and leader of the genetic ancestry analysis, told GenomeWeb this week. He described the three publications as "core" papers for future investigations.
The ancestry analysis of Kaiser's large, diverse Northern Californian cohort, conducted by researchers from UCSF, Stanford University, and Kaiser, will be particularly informative for future studies into relationships between genes and complex disorders. "A major direction for the field for the last decade is doing these genome-wide association studies," Risch said. "When you do such studies, population structure [and] heterogeneity of ancestry in the sample is a major issue that is important to address."
Using saliva samples from Kaiser Permanente Northern California's Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort, Risch and colleagues assessed the genetic ancestry of more than 100,000 study subjects, and compared how closely it matched their self-reported ancestry. Study participants could choose to self-identify from a list of 23 race/ethnicity/nationality categories, which were condensed into seven race/ethnicity groups.
Around 81 percent said they were white and 19 percent reported belonging to a minority group; 94 percent indicated one race or ethnicity, while 6 percent identified with two or more groups. In the study, "there was a very high correspondence between self-reported race/ethnicity and what we find in terms of the genetic ancestry," UCSF's Yambazi Banda, first author of the ancestry paper, told GenomeWeb.
Population structure analysis is fundamental, Banda said, when accounting for the potentially confounding effects that environment or the genetics of an ethnic group can have on genotype-phenotype results in GWAS. When doing GWAS, researchers scan the genomes of people with the disease of interest (cases) and those without (controls), and compare the groups to identify regions of variation.
When GWAS reveals areas of variation, Banda said, the usual assumption is that these parts of the genome are associated with the disease. "But you have to remember that people from different ethnic backgrounds and different ancestries will differ at different parts of the genome. What may be happening when you see these huge variations between cases and controls is that these are actually people from different ethnicities," he said. "So, you're not actually seeing the difference in disease status. What you're seeing is the difference in ancestry."
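The confounding Banda describes can be illustrated with a small worked example. The sketch below is not the study's method; it assumes two hypothetical ancestry groups whose allele frequencies and disease rates are invented for illustration, and a variant that has no effect on disease in either group. Pooling the groups without accounting for ancestry still produces an apparent association.

```python
# Hypothetical illustration of population stratification in a case-control scan.
# All frequencies and rates below are invented; none come from the GERA cohort.

def expected_table(n, allele_freq, disease_rate):
    """Expected 2x2 counts when carrier status and disease are independent."""
    carriers = n * allele_freq
    non_carriers = n - carriers
    return {
        "carrier_case": carriers * disease_rate,
        "carrier_control": carriers * (1 - disease_rate),
        "noncarrier_case": non_carriers * disease_rate,
        "noncarrier_control": non_carriers * (1 - disease_rate),
    }

def odds_ratio(table):
    """Odds of disease in carriers divided by odds of disease in non-carriers."""
    return (table["carrier_case"] * table["noncarrier_control"]) / (
        table["carrier_control"] * table["noncarrier_case"]
    )

# Group A: variant is rare and disease is rare.
group_a = expected_table(n=5000, allele_freq=0.2, disease_rate=0.05)
# Group B: variant is common and disease is more common, for unrelated reasons.
group_b = expected_table(n=5000, allele_freq=0.6, disease_rate=0.20)

# Ignoring ancestry means adding the two tables together cell by cell.
pooled = {cell: group_a[cell] + group_b[cell] for cell in group_a}

print("Group A odds ratio:", round(odds_ratio(group_a), 3))
print("Group B odds ratio:", round(odds_ratio(group_b), 3))
print("Pooled odds ratio: ", round(odds_ratio(pooled), 3))
```

Within each group the odds ratio is 1, meaning no association, while the pooled odds ratio comes out at roughly 1.75. The variant merely tracks ancestry, and ancestry tracks disease rate, which is why association analyses adjust for population structure.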
Risch and colleagues' analysis agreed with what other groups have found. For example, self-reported whites had genetic ancestry of mixed European nationalities. Other groups, such as those who self-identified as East Asian, had genetic clustering indicating they tended to marry within their own local community, while people who self-identified as Filipinos had a "modest amount" of European ancestry. Self-reported African Americans and Latinos had "extensive" European and African genetic ancestry, while Latinos also had Native American genetic ancestry.
Researchers genetically identified more than 3,700 parent-child pairs and more than 2,000 full-sibling pairs in the cohort, and there was 93 percent and 96 percent concordance, respectively, between self-reported race/ethnicity and genetic ancestry. Patterns within the parent-child pairs suggested a trend toward increasing exogamy, or marriage outside of the local community, researchers concluded. "The presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies," Risch and his colleagues wrote.
According to Risch, studies like this will be particularly informative for large-scale efforts such as the Precision Medicine Initiative (PMI) launched by President Obama earlier this year. As part of this project, the NIH is putting together a cohort of 1 million subjects whose de-identified genomic, medical, and environmental information will be probed for insights into causes of complex diseases and leads for new treatments.
"One of their major issues that they're trying to address is the diversity of the sample," Risch noted. "A major question is how are we going to define that? … In many settings, that's based on how people self-report."
On a national scale, those involved in the PMI will deal with more complexity in terms of ethnic/ancestry self-identification, since people reporting multiple categories will vary according to region. "For example, people might self-identify as Latino in Northern California, but the proportion that's from Mexico or Central America would be very different here as compared to New York City, where there may be higher numbers of people from Puerto Rico and the Dominican Republic," Risch said.
The US population is a mixture of people, ranging from those whose ancestry goes back many generations in the US to more recent migrants. "How people self identify relates to that," he noted. In the latest analysis, for example, some people whom researchers identified as Ethiopian, based on their unique genetic structure within the African population, said they were African, while others self-identified as African American. "How people identify here now is sort of a social issue," he said.
In addition to the ancestry analysis, researchers from Kaiser and elsewhere also published two other papers in which they described some of the technologies used to analyze the cohort. Lapham et al. developed a high-throughput robotics system to analyze the length of study participants' telomeres, the repetitive nucleotide sequences that cap the ends of a chromatid and protect chromosomes from wearing out or getting mixed up with each other. Out of the 106,902 samples researchers assayed, close to 99 percent had sufficient quality for analysis.
Generally, telomere length tended to decline with age in the Kaiser study. But researchers also reported that age greater than 75 years correlated with longer telomeres. "We don't fully understand this at this point but we suspect this has to do with survivorship and longevity," said Risch, who was also an author on this paper. In terms of sex, he highlighted that females and males tended to have comparable telomere lengths up until age 50, but after that the women's telomere lengths tended to stabilize, while the men's continued to get shorter.
This analysis was performed by Elizabeth Blackburn's UCSF research group. Blackburn shared the 2009 Nobel Prize in Physiology or Medicine for discovering how telomeres and the enzyme telomerase protect chromosomes. The group has been looking for associations between telomere lengths and a variety of other factors, including mortality and behavioral characteristics such as smoking and drinking. The latest publication, Risch said, is the first step towards further investigations in this regard. "It's a very large sample, so it gives you some confidence," in terms of the relationships researchers saw between telomere length, sex, and age, he added. The high-throughput robotics system completed the analysis in four months.
In a third paper, Kvale et al. described the analysis techniques and the quality control researchers employed in genotyping 103,067 out of 109,837 members of the Kaiser Permanente Medical Care Plan in Northern California. Researchers collected 140,000 saliva samples for testing over 32 months starting in 2008. The investigators ran the assays constantly and had to develop new, real-time analysis methods in order to complete the genotyping during the funding period, one of the coauthors, Pui-Yan Kwok of UCSF, said in a statement.
They used four different Affymetrix ethnic-specific arrays to genotype Kaiser's ethnically diverse cohort. The project generated more than 70 billion genotypes over 14 months, during which period researchers assayed 1,600 samples per week. Under strict quality metrics, the genotyping success rate in the study was between 92 percent and 95 percent across the four assays.
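As a rough consistency check on the reported throughput, the figures quoted in this article can be combined in a few lines. This is back-of-the-envelope arithmetic only: it treats "more than 70 billion" as exactly 70 billion, and the conversion of 14 months to weeks (52/12 weeks per month) is an assumption made here for illustration.

```python
# Back-of-the-envelope check using only figures quoted in the article.
# WEEKS_PER_MONTH is an assumed average, not a number from the paper.

TOTAL_GENOTYPES = 70e9          # "more than 70 billion", used as a lower bound
SAMPLES_GENOTYPED = 103_067     # members successfully genotyped
SAMPLES_PER_WEEK = 1_600
MONTHS = 14
WEEKS_PER_MONTH = 52 / 12       # assumption: average weeks in a month

genotypes_per_sample = TOTAL_GENOTYPES / SAMPLES_GENOTYPED
samples_at_stated_rate = SAMPLES_PER_WEEK * MONTHS * WEEKS_PER_MONTH

print(f"Genotypes per sample (lower bound): {genotypes_per_sample:,.0f}")
print(f"Samples assayed at the stated weekly rate: {samples_at_stated_rate:,.0f}")
```

Under these assumptions, each sample yields on the order of 680,000 genotypes, and the stated weekly rate covers roughly 97,000 samples in 14 months, which is in the same range as the 103,067 members reported as genotyped.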
Ultimately, Kaiser's cohort is a resource that investigators at UCSF, Kaiser, and elsewhere can use to discover links between genetics, environment, and disease risk. "We've had many requests from people outside [of UCSF and Kaiser] for access to the data," Risch said. Last year, researchers uploaded data from consenting participants in the Kaiser study to the NIH's database of Genotypes and Phenotypes.
UCSF and Kaiser partnered in 2009 to create the GERA cohort, which contains more than 100,000 saliva samples from the approximately 200,000 participants in Kaiser's Research Program on Genes, Environment, and Health. These 200,000 participants are members of Kaiser Permanente's Medical Care Plan and have agreed to give researchers access to their electronic medical records and have answered survey questions about their health behavior and history. Within this collaboration, researchers have already identified genetic variants linked to allergies, glaucoma, macular degeneration, diabetes, and high cholesterol.
In one recent example, a team led by UCSF's John Witte published a study in Cancer Discovery comparing genotypes of 7,800 prostate cancer patients against 38,600 controls. Witte and colleagues reported a new risk marker, indel rs4646284, at the 6q25.3 locus. Additionally, using a risk score derived from 105 SNPs associated with prostate cancer, researchers were able to estimate 7.6 percent of the disease heritability for non-Hispanic whites. The risk estimates for minorities in the study population need further exploration and there are many other genetic risk factors for prostate cancer that have yet to be discovered, but researchers still said their findings may inform the development of more precise diagnostics.
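A risk score of this kind is commonly built as a weighted sum of risk-allele counts, with each SNP weighted by the log of its odds ratio. The sketch below shows that general construction only. It is not the scoring method used by Witte and colleagues, and the SNP names, odds ratios, and genotype are hypothetical placeholders rather than values from the study.

```python
import math

def risk_score(allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1, or 2) weighted by log odds ratio."""
    return sum(
        math.log(odds_ratios[snp]) * allele_counts[snp]
        for snp in odds_ratios
    )

# Hypothetical per-allele odds ratios for three placeholder SNPs.
odds_ratios = {"snp_a": 1.2, "snp_b": 1.1, "snp_c": 1.5}

# Hypothetical individual: two risk alleles at snp_a, none at snp_b, one at snp_c.
allele_counts = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

score = risk_score(allele_counts, odds_ratios)
relative_odds = math.exp(score)  # odds relative to carrying no risk alleles

print(f"Risk score (log-odds scale): {score:.3f}")
print(f"Relative odds: {relative_odds:.2f}")
```

For this invented individual the relative odds work out to 1.2 × 1.2 × 1.5 = 2.16. A real score would sum over all of the associated SNPs and would typically be calibrated separately for each ancestry group.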
Experts in the genomics field have often cited the Kaiser project as a model for the NIH's PMI, which is still in the planning phase. "There's probably a lot to learn from us and some of the other large-scale efforts," Risch said.
In California, Governor Jerry Brown in April launched a two-year, $3 million project dubbed the California Initiative for Advancing Precision Medicine, which UCSF is leading. It's not yet clear how Kaiser might contribute to that effort, but Risch said there may be opportunities for collaboration.