NEW YORK – A team led by investigators at Google Health AI has demonstrated the potential of using machine learning (ML) not only to gauge individuals' risk of chronic obstructive pulmonary disease (COPD) but also to home in on new and known genetic contributors to the lung disease.
As they reported in Nature Genetics on Monday, Farhad Hormozdiari and his colleagues at Google and elsewhere began by training a deep convolutional neural network model to distinguish individuals with COPD from those without, based on self-reported COPD diagnoses, potential diagnoses gleaned from International Classification of Diseases codes, and raw spirogram lung function measurements.
"Utilizing deep learning on full spirograms can improve COPD detection and help reduce the number of COPD exacerbation events and COPD deaths through early detection and preventative treatment," Hormozdiari said in an email.
"A big challenge for managing COPD is a small yet extremely high-risk, quickly exacerbated subset of patients," he explained. "We believe longitudinal analysis and monitoring of the AI-derived risk/liability has the potential to identify those at the highest risk of exacerbation."
After showing that the ML approach could indeed distinguish between COPD cases and controls, the team developed a corresponding ML-based liability score aimed at flagging COPD cases and identifying individuals at increased risk of requiring hospitalization for the condition.
By combining the ML-based liability score with a genome-wide association study of more than 325,000 unrelated UK Biobank participants of European ancestry with available genotyping and spirogram profiles, the investigators confirmed known COPD-linked loci and highlighted 67 previously unappreciated associations. They went on to replicate 38 of the new risk loci using data from non-UK Biobank participants in the Global Biobank Meta-analysis Initiative, SpiroMeta, and the International COPD Genetics Consortium.
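For readers unfamiliar with how a continuous score feeds into a genome-wide association study, the core idea can be illustrated with a single-variant test: regress each participant's score on the number of copies of an allele they carry. The Python sketch below is an illustration only, not the team's pipeline. The function name, the toy dosages, and the toy scores are all hypothetical, and an actual study would also adjust for covariates such as age, sex, and ancestry and repeat the test across millions of variants.

```python
import math
from typing import Sequence, Tuple


def single_variant_association(
    dosages: Sequence[float], scores: Sequence[float]
) -> Tuple[float, float]:
    """Simple linear regression of a continuous score on allele dosage.

    Hypothetical helper for illustration: returns (beta, t_statistic),
    where beta is the estimated change in score per extra allele copy.
    No covariates are included, unlike a real association analysis.
    """
    n = len(dosages)
    if n != len(scores) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 samples")

    mean_x = sum(dosages) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic; no association can be tested")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, scores))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x

    # Residual variance gives the standard error of the slope.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, scores))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t_stat = math.copysign(math.inf, beta) if beta != 0 else 0.0
    else:
        t_stat = beta / se
    return beta, t_stat


# Toy data (made up): allele dosage of 0, 1, or 2 and a liability-like score.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
beta, t_stat = single_variant_association(toy_dosages, toy_scores)
print(f"beta per allele copy: {beta:.3f}, t statistic: {t_stat:.2f}")
```

In this toy example, the score rises with each additional allele copy, so the estimated effect is positive. A variant with no relationship to the score would yield an effect near zero and a small test statistic.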
"Using our ML model, we can rapidly and accurately detect an individual's COPD risk based on an individual's entire spirograms," Hormozdiari said, noting that the accuracy of the model led to "better discovery of associated genetic variants than existing metrics commonly used in COPD diagnosis."
The collection of new and known risk loci is expected to improve researchers' understanding of COPD biology, while also pointing to potential therapeutic strategies or drug targets. More broadly, the investigators expect that ML-based phenotyping strategies similar to the one used in the current study may also boost genetic research on other conditions.
"A key innovation in this work is that to train ML models we do not necessarily need to have access to high quality case/control labels and one can train these models with noisy labels," Hormozdiari explained, adding that ML-based phenotyping "is an extremely general method that can be applied to other diseases and disorders with access to high dimensional clinical data (e.g., spirograms, ECG, MRI, eye fundus images) with therapeutic implications."