NEW YORK (GenomeWeb) – Researchers with the Geisinger Health System, the University of Pennsylvania, and elsewhere have presented results from a large-scale phenome-wide association study (PheWAS) simultaneously spanning hundreds of diseases or clinical measurements.
Using electronic health record data for tens of thousands of genotyped participants in the MyCode Community Health Initiative, the team looked for variants coinciding with a broad range of disease diagnoses or clinical laboratory measurements. These apparent associations were assessed alongside results from related genome-wide association study data, when available.
The researchers also began examining the risk variants implicated in the PheWAS and in previously published large-scale association studies, searching for functional explanations for some of the associations.
"The comprehensive nature of this PheWAS allows for novel hypothesis generation, the identification of phenotypes for further study for future phenotypic algorithm development, and identification of cross-phenotype associations," senior author Sarah Pendergrass, a biomedical and translational informatics researcher with Geisinger Health System, and her colleagues wrote in a study published online today in the American Journal of Human Genetics.
For their analysis, the researchers considered 541 diagnostic codes from the International Classification of Diseases, version 9 (ICD-9). Along with those binary diagnostic outcomes, they tracked continuous outcomes, using average measurements for more than two dozen clinical laboratory tests in Geisinger MyCode Community Health Initiative study participants.
Starting with data for more than 50,700 genotyped individuals, including 45,899 individuals genotyped on the Illumina HumanOmniExpress Exome bead chip array, the team focused on 635,525 SNPs in 38,622 unrelated individuals with sufficient data quality.
"A PheWAS at this scale, where we computed a total of 343,819,025 associations for the diagnostic codes and 15,888,125 associations for the clinical lab measures, presented several big data challenges such as computational burden, high throughput result interpretation, and visualization of the results," the authors noted.
On the diagnostic code side, for example, the team identified more than 1,100 phenome-wide significant associations, including new and known risk variants for conditions ranging from diabetes to psoriasis, heart disease, and hypertension.
Another 3,024 associations reached phenome-wide significance for clinical lab measurements, the researchers reported, revealing thousands of SNPs linked to levels of bilirubin, blood glucose, and other traits measured by clinical lab tests.
Along with analyses focused on genes closest to potential risk variants, the team considered potential sources of pleiotropy in the associations, while diving into the nature of these associations with functional, regulatory, and epigenetic data.
"Further, epigenomics knowledge of non-coding regions of the genome helped us to refine the genetic associations, to illustrate the biological relevance to the associated disease," the authors wrote. "With these results, we provide a landscape of associations across diseases and quantitative traits, a series of potentially novel associations, and cross-phenotype associations, all within the context of protein-coding and regulatory impact of genetic variants."