CHICAGO – A team led by scientists at Leiden University Medical Center in the Netherlands has developed a computational approach that improves prediction of drug metabolism from CYP2D6 genotypes in individual patients. Their method uses a convolutional neural network trained on complete sequences of the CYP2D6 gene to assign enzyme activity on a continuous scale rather than in categories, but it still has a long way to go before it can be translated into clinical pharmacogenomics practice.
The researchers described their machine-learning algorithm and methodology in a proof-of-concept study published in Science Translational Medicine on Wednesday.
Typically, clinicians use genetic biomarkers to guide drug dosing, classifying patients by star alleles to predict the activity of cytochrome P450 enzymes, one of which is encoded by CYP2D6. "However, this approach leaves a large part of variability in drug response unexplained," the authors wrote.
The Leiden University team found that their new model explained 79 percent of patient-specific variability in CYP2D6 activity for tamoxifen metabolism, whereas star-allele categorization explained only 54 percent. Additionally, the algorithm, which was trained on data from 561 breast cancer patients who were prescribed tamoxifen, predicted metabolism of the drug for previously uncharacterized combinations of variants.
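Figures such as "79 percent of variability explained" are usually reported as a coefficient of determination (R²), although the article does not name the statistic. Assuming R², the minimal Python sketch below, built on invented numbers for six hypothetical patients, shows why a categorical predictor caps the explained share: it can only give each patient the average of their category, so any spread within a category stays unexplained. All data and labels here are made up for illustration.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Share of the variance in `observed` captured by `predicted`: 1 - SS_res / SS_tot."""
    mu = mean(observed)
    ss_tot = sum((y - mu) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

def category_mean_predictions(observed, categories):
    """Categorical model: predict each patient's value as their category's mean."""
    groups = {}
    for y, c in zip(observed, categories):
        groups.setdefault(c, []).append(y)
    group_means = {c: mean(ys) for c, ys in groups.items()}
    return [group_means[c] for c in categories]

# Hypothetical measured enzyme activity for six patients (arbitrary units).
activity = [0.2, 0.4, 0.9, 1.0, 1.1, 1.6]
# Hypothetical phenotype labels: IM = intermediate, NM = normal metabolizer.
labels = ["IM", "IM", "NM", "NM", "NM", "NM"]
# Hypothetical per-patient predictions from a continuous-scale model.
continuous_preds = [0.25, 0.35, 0.95, 0.95, 1.15, 1.5]

r2_categorical = r_squared(activity, category_mean_predictions(activity, labels))
r2_continuous = r_squared(activity, continuous_preds)
print(f"categorical R^2 = {r2_categorical:.2f}, continuous R^2 = {r2_continuous:.2f}")
```

The categorical model loses exactly the within-category spread (here, the gap between the patients at 0.9 and 1.6 who share the "NM" label), which no amount of tuning the category means can recover.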
They replicated their results in an independent cohort of patients treated with tamoxifen and in another cohort of patients treated with venlafaxine, an antidepressant.
"These results demonstrate the advantage of a continuous scale and a completely phased genotype for prediction of CYP2D6 enzyme activity and could potentially enable more accurate prediction of individual drug response," according to the article.
The authors said that pharmacogenomic guidelines historically have considered standard haplotypes and predicted phenotypes, with haplotype assignment based on star-allele numbers. For CYP2D6, nomenclature defined by the Pharmacogene Variation Consortium categorizes patients as poor, intermediate, normal, and ultrarapid metabolizers.
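To make the categorical scheme concrete, here is a minimal Python sketch of one common way star-allele genotypes are turned into metabolizer groups: each allele gets an activity value, the two values are summed into an activity score, and fixed cutoffs assign a category. The per-allele values and cutoffs are an assumption, loosely modeled on the activity-score system used in Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines; the article does not say which scheme the study compared against, and current guideline tables should be checked before relying on any number.

```python
# Illustrative per-allele activity values (assumption: loosely modeled on the
# CPIC activity-score convention; check current guideline tables before use).
ALLELE_ACTIVITY = {
    "*1": 1.0,     # normal function
    "*2": 1.0,     # normal function
    "*1x2": 2.0,   # duplicated normal-function allele
    "*41": 0.5,    # decreased function
    "*10": 0.25,   # decreased function
    "*4": 0.0,     # no function
    "*5": 0.0,     # whole-gene deletion
}

def activity_score(allele_a, allele_b):
    """Sum the activity values of the two alleles in a diplotype."""
    return ALLELE_ACTIVITY[allele_a] + ALLELE_ACTIVITY[allele_b]

def metabolizer_phenotype(score):
    """Bin a continuous activity score into one of four categories (assumed cutoffs)."""
    if score == 0:
        return "poor"
    if score < 1.25:
        return "intermediate"
    if score <= 2.25:
        return "normal"
    return "ultrarapid"

for diplotype in [("*4", "*5"), ("*4", "*10"), ("*1", "*4"), ("*1", "*2"), ("*1x2", "*1")]:
    score = activity_score(*diplotype)
    print("/".join(diplotype), score, metabolizer_phenotype(score))
```

Under these assumed values, *4/*10 (score 0.25) and *1/*4 (score 1.0) both come out "intermediate" even though their scores differ fourfold, which is the kind of coarseness the authors argue a continuous scale avoids.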
However, the current methodology based on categorization is an oversimplification that ignores variability within phenotypes and overlaps between them, according to the authors. They hypothesized that a model making continuous predictions from complete gene sequences with a neural network, rather than from star alleles, would improve prediction of CYP2D6-related drug metabolism.
In other words, said co-corresponding author Seyed Yahya Anvar, a former senior scientist in human genetics at Leiden University Medical Center, while there is a defined standard of care in terms of how to manage tamoxifen based on common CYP2D6 variants, the categorical approach makes the standard incomplete. "It's all kind of black and white. You either have a functional allele or you don't have a functional allele," he said of the current method.
"That only solves part of the puzzle," involving only patients with known haplotypes, said Anvar, now head of data science at Okra Technologies, a UK-based company that develops predictive software to support the pharmaceutical industry. "We know that there's a lot more to it than what we currently can do in the clinic, and part of that is due to the limitations of the technology that's being used today."
Anvar said that genotyping CYP2D6 is rather complex, since it has a pseudogene, CYP2D7. "There is a lot of rearrangement that can also happen between the two genes," he said. He and his colleagues chose to work with a convolutional neural network because of this complexity. "Given the data, given the complexity, you need to find a methodology that would be able to accommodate complex relationships," Anvar said.
For example, in the context of drug metabolism, a neural network can weigh relationships between alleles on a fractional, nonlinear scale rather than a simple linear one. This, according to Anvar, allowed the researchers to start understanding the role of uncommon variants. "We could have a much finer grain of what even the classical star alleles contribute to the overall drug metabolism," he said.
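How a convolutional network turns a gene sequence into a continuous number can be sketched in a few lines of plain Python. This toy is not the Leiden team's model: the single filter, its weights, the bias, and the motif it responds to are all hypothetical and untrained. It one-hot encodes DNA, slides the filter along the sequence, keeps the strongest match, and squashes the result through a sigmoid, so a single base change shifts the output by a fractional amount rather than flipping a functional/nonfunctional switch.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as a list of [A, C, G, T] indicator vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def conv1d(x, kernel):
    """Slide a (width x 4) filter along the sequence ("valid" positions only).

    Like most deep-learning libraries, this computes cross-correlation.
    """
    width = len(kernel)
    return [
        sum(x[i + j][c] * kernel[j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]

def predict_activity(seq, kernels, out_weights, bias):
    """Toy CNN: convolution -> ReLU -> global max pool -> linear layer -> sigmoid."""
    x = one_hot(seq)
    pooled = [max(max(v, 0.0) for v in conv1d(x, k)) for k in kernels]
    z = bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # continuous score between 0 and 1

# Hypothetical, untrained parameters: one filter that responds to the arbitrary
# motif "GTT", and a negative output weight so that a full match lowers the score.
kernels = [one_hot("GTT")]
out_weights = [-2.0]
bias = 4.0

reference = "ACGTACGTAC"
variant = "ACGTTCGTAC"  # single substitution that completes the motif
print(predict_activity(reference, kernels, out_weights, bias))
print(predict_activity(variant, kernels, out_weights, bias))
```

A real model would learn many filters and stack several layers from training data, which is what would let it pick up combinations of variants rather than one hand-set motif.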
For this experiment, the Leiden researchers sequenced the entire CYP2D6 locus in a single read with a Pacific Biosciences long-read sequencer. "We … know all the variants, regardless of whether they are common or rare, so we can look at the entire picture," Anvar said.
After multiple reads are processed and PCR artifacts are filtered out, the result is "really clean alleles," he explained. Algorithms developed by the Leiden team then translated the alleles into predicted gene function and tamoxifen metabolism.
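The read-processing step can be illustrated with a simplified Python sketch. It assumes the reads are already aligned to the same coordinates and that the heterozygous positions are known, neither of which the article details, and it is not the Leiden team's pipeline or PacBio's own software. Because each long read spans the whole gene, the bases a read carries at the heterozygous sites all sit on one copy of the gene, so grouping reads by those bases phases them into haplotypes. A per-position majority vote within each group then removes scattered sequencing errors, and groups supported by too few reads are discarded as likely PCR artifacts such as chimeras; the `min_reads` threshold is a hypothetical parameter.

```python
from collections import Counter

def phase_reads(reads, het_positions):
    """Group pre-aligned reads by the bases they carry at heterozygous sites.

    Each long read covers the whole locus, so its bases are in cis and the
    grouping key identifies one haplotype.
    """
    groups = {}
    for read in reads:
        key = "".join(read[p] for p in het_positions)
        groups.setdefault(key, []).append(read)
    return groups

def consensus(reads):
    """Majority base at each position across equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def clean_alleles(reads, het_positions, min_reads=3):
    """Phase reads, drop weakly supported groups, and return one consensus per allele."""
    groups = phase_reads(reads, het_positions)
    # Rare groups are treated as likely PCR artifacts (e.g. chimeric molecules).
    return sorted(consensus(g) for g in groups.values() if len(g) >= min_reads)

reads = [
    "ACGTACGT", "ACGTACGT", "TCGTACGT",  # haplotype 1, one read with an error at position 0
    "ACTTACAT", "ACTTACAT", "ACTTACAT",  # haplotype 2
    "ACGTACAT",                          # chimera mixing the two haplotypes
]
print(clean_alleles(reads, het_positions=[2, 6]))
```

Real long-read data would also need alignment, indel handling, and quality-aware calling; the point here is only why whole-gene reads make phasing and artifact removal straightforward.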
When patients are treated with tamoxifen, CYP2D6 converts desmethyltamoxifen to endoxifen, a key active metabolite of the drug. The researchers therefore measured levels of these two compounds in patients' blood and used their ratio as a readout of each patient's CYP2D6 activity. They then used their neural network to build a model that predicts this activity on a continuous scale.
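The readout itself is simple arithmetic: the concentration of the product, endoxifen, divided by that of its precursor, desmethyltamoxifen. The Python sketch below uses invented plasma concentrations for two hypothetical patients; both values must be in the same units so the ratio is unitless. Whether the study transformed the ratio (for example, by taking logarithms) is not stated in the article, so the log value printed here is only an illustrative option.

```python
import math

def metabolic_ratio(endoxifen, desmethyltamoxifen):
    """Endoxifen/desmethyltamoxifen ratio; both concentrations in the same units."""
    if desmethyltamoxifen <= 0:
        raise ValueError("desmethyltamoxifen concentration must be positive")
    return endoxifen / desmethyltamoxifen

# Invented plasma concentrations (nM) for two hypothetical patients:
# (endoxifen, desmethyltamoxifen).
patients = {"patient_A": (30.0, 150.0), "patient_B": (6.0, 240.0)}

for name, (endox, ndm) in patients.items():
    ratio = metabolic_ratio(endox, ndm)
    print(f"{name}: ratio = {ratio:.3f}, log10 ratio = {math.log10(ratio):.2f}")
```

Keeping the ratio as a number, instead of binning it into metabolizer groups, is what gives a model a continuous target to learn; in this invented example patient_B converts far less of the precursor than patient_A.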
Victoria Pratt, director of the pharmacogenomics and molecular genetics labs at Indiana University School of Medicine and a past president of the Association for Molecular Pathology, called the idea of a continuous scale "an interesting concept," though a difficult one to implement. "I think from a practical clinical laboratory point of view, discrete values with distinct cutoffs are more implementable," Pratt said via email.
She noted that massively parallel sequencing is good at identifying novel variants. However, she said, "determining the pathogenicity of novel variants is still challenging and often requires multiple lines of evidence from in vitro studies to animal models," citing, as an example, guidelines developed by the American College of Medical Genetics and Genomics. "Additionally, phasing of novel variants/alleles will improve phenotype predictions, which are starting to be resolved by long-read technologies."
The concept also needs further research and replication, Pratt said. "It is not ready for routine clinical implementation."
Anvar and his colleagues acknowledged the limitations of their work, including the fact that most of the subjects in their replication study came from Western Europe. He said that future research should include a more diverse cohort.
He also noted that the study was limited to two drugs, tamoxifen and venlafaxine, whereas CYP2D6 affects the metabolism of 25 percent to 30 percent of "commonly prescribed" drugs, according to the paper. For the algorithm to have broad utility, it would have to differentiate between multiple drugs, Anvar said.
"You truly would end up with a unified model where the drug is part of the equation, not just genetics, and then you would be able to probably put it in practice," he said.
The next step for the researchers is to broaden the research cohort, he said, and to develop a cost-effective CYP2D6 pharmacogenomic assay that would be easy to bring into the clinic for preemptive testing.