NEW YORK — An approach that takes age of disease onset and family history into account may boost the power of genome-wide association studies, a new analysis has found.
Most GWAS have relied on case-control data, which may not account for whether individuals have reached the age range at which the disease of interest is generally diagnosed, or whether controls have a family history of the disease.
A team of Danish researchers has now developed a multivariate liability threshold model that extends an existing model, which conditions on family history, so that it also accounts for age of onset and sex. They proposed that these additions could improve the power of GWAS.
As they reported in the American Journal of Human Genetics on Tuesday, the researchers applied their approach, dubbed LT-FH++, to both simulated data and data from the UK Biobank and the iPSYCH dataset. They estimated that LT-FH++ could improve the statistical power of GWAS by up to 61 percent compared with typical case-control approaches.
"As more genetic datasets with linked health records and family information become available, e.g., in large national biobank projects, we expect the value of statistical methods that can efficiently distill family history and individual health information into biological insight will only increase," senior author Bjarni Vilhjálmsson from Aarhus University and his colleagues wrote in their paper.
The LT-FH++ approach builds on the idea that each person has an underlying liability for a disease and is considered a case once that liability exceeds a threshold determined by the disease's prevalence in the sample or population. In the earlier LT-FH model, that liability was split into genetic and environmental components, with family history used to help estimate the genetic component. The new LT-FH++ approach personalizes the liability threshold based on the person's age, sex, and birth year (to account for cohort effects).
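To make the threshold idea more concrete, here is a minimal Python sketch. It assumes a standard-normal liability with heritability `h2`, a single proband and one parent, no shared environment between them, and a cumulative incidence that has already been looked up for each person's age, sex, and birth year. The function names and the incidence values in the example are hypothetical and not taken from the paper. For simplicity, the sketch uses plain rejection sampling and a simple above/below-threshold rule for cases that ignores age of onset, so it should not be read as the authors' implementation.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # liability scaled to mean 0, variance 1


def personalized_threshold(cumulative_incidence: float) -> float:
    """Return T such that P(liability > T) equals the cumulative incidence
    assumed for a person's age, sex and birth cohort (hypothetical input)."""
    if not 0.0 < cumulative_incidence < 1.0:
        raise ValueError("cumulative incidence must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - cumulative_incidence)


def posterior_genetic_liability(
    h2: float,
    proband_is_case: bool,
    proband_incidence: float,
    parent_is_case: bool,
    parent_incidence: float,
    n_draws: int = 100_000,
    seed: int = 1,
) -> float:
    """Estimate the proband's mean genetic liability given both observed
    case/control statuses, by rejection sampling (a simplification)."""
    rng = random.Random(seed)
    t_proband = personalized_threshold(proband_incidence)
    t_parent = personalized_threshold(parent_incidence)
    sd_env = math.sqrt(1.0 - h2)        # environmental part, independent per person
    sd_parent_g = math.sqrt(h2)         # parent's genetic liability
    sd_transmit = math.sqrt(0.75 * h2)  # other parent's half plus Mendelian sampling
    total, accepted = 0.0, 0
    for _ in range(n_draws):
        g_parent = rng.gauss(0.0, sd_parent_g)
        g_proband = 0.5 * g_parent + rng.gauss(0.0, sd_transmit)
        l_parent = g_parent + rng.gauss(0.0, sd_env)
        l_proband = g_proband + rng.gauss(0.0, sd_env)
        # Keep only draws whose liabilities match the observed statuses.
        if (l_proband > t_proband) == proband_is_case and (
            l_parent > t_parent
        ) == parent_is_case:
            total += g_proband
            accepted += 1
    if accepted == 0:
        raise RuntimeError("no draws matched the observed statuses; raise n_draws")
    return total / accepted


if __name__ == "__main__":
    # Hypothetical incidences: a young control has had little time to
    # develop the disease, so being unaffected says less about them.
    young = posterior_genetic_liability(0.5, False, 0.01, False, 0.10)
    older = posterior_genetic_liability(0.5, False, 0.20, False, 0.10)
    print(f"young control: {young:.3f}, older control: {older:.3f}")
```

Under these assumptions, the younger control's estimated genetic liability stays closer to zero than the older control's. This reflects the intuition behind personalizing the threshold: being unaffected at a young age carries less information than being unaffected later in life.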
The researchers benchmarked their method against LT-FH and a case-control approach using simulated data. Across 10 simulations, they found that LT-FH++ improved power by between 34 percent and 61 percent over a standard case-control GWAS, while LT-FH improved power by between 14 percent and 54 percent. These estimated gains, though, varied with sample size and with how complete the family history and age-of-onset information was.
They further applied LT-FH++ to real data from the UK Biobank and the Danish iPSYCH register. Specifically, they conducted a GWAS of mortality with UK Biobank data using LT-FH++, LT-FH, and a case-control approach. The standard case-control approach turned up no significant SNPs, but LT-FH uncovered two genome-wide significant SNPs: one at APOE, which has been associated with mortality, and one at HYKK, which is associated with smoking behavior. LT-FH++ identified those two SNPs as well as eight additional ones, including ones near HLA-B, MYCBP2, and ZBBX.
Meanwhile, in the iPSYCH dataset, LT-FH++ identified more genome-wide significant associations than the other two approaches across the disorders examined.
For ADHD, however, the researchers noted that neither LT-FH nor LT-FH++ improved power much over a case-control approach. They said this could be due to the underlying assumptions of the multivariate liability threshold model, such as that family members share no environmental covariance or that genetic architecture does not differ by age of diagnosis.
Vilhjálmsson and his colleagues added that their approach provided the largest power gains when cases were ascertained through downsampling and when prevalence was high. They also noted that it is limited by access to detailed health register data, though the approach can be applied to individuals with missing or partial data, and prevalence rates can be obtained from national statistics.