NEW YORK – Self-reported phenotypes in combination with more structured hospital diagnoses can boost the ability of biobank-based genome-wide association studies to uncover genetic associations, according to a new analysis.
Recent studies have begun using questionnaires or similar digital phenotyping approaches as a less expensive means of gathering phenotypic data. But researchers led by Stanford University's Manuel Rivas noted that it is unclear how well this approach lines up with more established phenotyping approaches.
Using data from the UK Biobank, the Stanford-led team analyzed the genetic effects uncovered by genome-wide association studies based either on self-reports or on disease diagnoses from hospital records. As they reported Thursday in the American Journal of Human Genetics, the researchers found that both approaches identify broadly similar genetic effects. Further, combining the two improved the power to tease out genetic associations within biobank data.
"Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations," Rivas and his colleagues wrote in their paper.
To compare these approaches, the researchers conducted genome-wide association studies based on genotyping data from 337,199 individuals of European ancestry from the UK Biobank.
For 41 medical phenotypes like rheumatoid arthritis, Parkinson's disease, and diabetes, they divided biobank participants into cases and controls based on either their in-patient hospital records or questionnaire responses.
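As a rough illustration of this kind of case definition, and not the study's actual pipeline, the sketch below flags each participant as a case under the hospital-record definition, the questionnaire definition, and a combined definition. The participant IDs and both input sets are invented, and treating "combined" as the union of the two sources is an assumption made for the example.

```python
def label_cases(participants, hospital_cases, self_report_cases):
    """Return per-participant case flags under three case definitions."""
    labels = {}
    for pid in participants:
        in_hospital = pid in hospital_cases
        in_self_report = pid in self_report_cases
        labels[pid] = {
            "hospital": in_hospital,
            "self_report": in_self_report,
            # Assumption: a combined case is anyone flagged by either source.
            "combined": in_hospital or in_self_report,
        }
    return labels


def count_cases(labels, definition):
    """Count participants flagged as cases under one definition."""
    return sum(1 for flags in labels.values() if flags[definition])


# Toy data, invented for illustration only.
participants = ["p1", "p2", "p3", "p4", "p5"]
hospital_cases = {"p1", "p2"}
self_report_cases = {"p2", "p3"}

labels = label_cases(participants, hospital_cases, self_report_cases)
for definition in ("hospital", "self_report", "combined"):
    print(definition, count_cases(labels, definition))
```

Everyone not flagged under a given definition is treated as a control here, which is a simplification.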
The researchers compared the two phenotyping approaches using a multivariate polygenic mixture model they developed that estimates genetic parameters like genetic correlation, polygenicity, and the scale of genetic effects from GWAS summary statistics.
For most of these phenotypes, the two approaches yielded concordant genetic effects. For 21 of the 41 phenotypes, there was strong agreement in genetic effects between cases identified via hospital records and those identified via questionnaire data, while another six had moderate agreement. The genetic correlation for asthma, for instance, was strong at 0.96. Other phenotypes like angina, hypertension, and glaucoma also had moderate to high genetic correlations.
Other phenotypes had more modest correlations under 0.8, including migraine, diabetes, and carpal tunnel syndrome. The researchers traced some of this disparity to the differences in case numbers identified through the two approaches. For migraine and diabetes, a greater number of cases were identified via questionnaire data than through hospital data, which may have affected the results.
Combining the two phenotyping approaches to use cases identified both by medical record data and by self-report boosted the power to detect both risk-conferring and protective rare variants, the researchers found. They noted that combining the two approaches does not greatly change the effect size estimates compared to each approach individually, but does increase the power to identify genetic associations through the increased number of cases.
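The link between case count and power can be sketched with a textbook normal approximation for a single-variant case-control test. This is not the model used in the paper. The function name, the standard-error approximation for the log odds ratio, and every number below are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n_cases, n_controls, allele_freq, log_odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test at significance alpha."""
    normal = NormalDist()
    # Approximate standard error of the log odds ratio for an allelic test,
    # counting two alleles per person (small-effect approximation).
    se = sqrt((1 / n_cases + 1 / n_controls) / (2 * allele_freq * (1 - allele_freq)))
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(log_odds_ratio) / se
    # Probability that the test statistic falls beyond either critical value.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical scenario: same variant and effect size, more cases after combining.
power_single = gwas_power(5000, 300000, allele_freq=0.01, log_odds_ratio=0.5)
power_combined = gwas_power(8000, 300000, allele_freq=0.01, log_odds_ratio=0.5)
print(f"single source: {power_single:.3f}")
print(f"combined:      {power_combined:.3f}")
```

With the effect size held fixed, adding cases shrinks the standard error, so the same variant is more likely to cross the genome-wide significance threshold.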
The researchers also examined a third means of ascertaining phenotypes, namely using family history of disease as a proxy, an approach known as a genome-wide association study by proxy, or GWAX. For the 15 diseases they analyzed, this approach also showed a high degree of genetic correlation with the other approaches. They noted that the GWAX approach did lead to a decrease in power to detect genetic associations, but that decline was offset by an increase in the number of cases.
"This work demonstrates that power to detect genetic associations in population biobanks is improved by using diverse phenotyping approaches to improve the classification of subjects into cases and controls," they added.