NEW YORK (GenomeWeb) – Johns Hopkins University researchers have developed a framework to assess methods that predict cancer driver genes.
Numerous approaches exist for identifying potential cancer driver genes from cancer genome sequences. However, only a few genes have been confirmed as true drivers, so there is no established standard for distinguishing a real driver of disease from a gene that merely carries passenger mutations.
As the Hopkins team reported in the Proceedings of the National Academy of Sciences this week, they developed a framework for appraising such methods and applied it to evaluate the performance of eight approaches for finding cancer driver genes. They found that some approaches appear to perform better than others.
"Identifying the genes that cause cancer when altered is often challenging, but is critical for directing research along the most fruitful course," Bert Vogelstein from the Johns Hopkins Kimmel Cancer Center said in a statement. "This paper establishes novel ways to judge the techniques used to identify true cancer-causing genes and should considerably facilitate advances in this field in the future."
The researchers' framework has five parts. It considers a given method's overlap with the manually curated Cancer Gene Census (CGC); its agreement with other methods; how closely its observed P values match theoretically expected ones; the number of genes it predicts; and whether its predictions remain consistent when it is applied to random portions of the dataset.
They used this framework to evaluate eight prediction approaches (20/20+, ActiveDriver, MutSigCV, MuSiC, OncodriveClust, OncodriveFM, OncodriveFML, and Tumor Suppressor and Oncogenes, or TUSON), both on a pan-cancer dataset and on a set of four cancer types spanning a range of mutation rates.
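To make two of these criteria more concrete, here is a minimal sketch of how CGC overlap and consistency on random subsets might be scored. It is not the researchers' code: the function names are hypothetical, samples are simplified to sets of mutated gene names, `predict_drivers` stands in for any prediction method, and Jaccard overlap between two random halves is our own choice of consistency measure.

```python
import random


def cgc_overlap_fraction(predicted, cgc_genes):
    """Fraction of a method's predicted driver genes that also appear in the CGC."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(cgc_genes)) / len(predicted)


def split_consistency(samples, predict_drivers, n_repeats=10, seed=0):
    """Mean Jaccard overlap between predictions made on two random halves of the samples.

    `predict_drivers` is a hypothetical callable standing in for any driver-prediction
    method: it takes a list of samples and returns a collection of gene names.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first = set(predict_drivers(shuffled[:half]))
        second = set(predict_drivers(shuffled[half:]))
        union = first | second
        # Two empty prediction sets are treated as perfectly consistent.
        scores.append(len(first & second) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def recurrently_mutated(samples, min_fraction=0.5):
    """Toy predictor (hypothetical): genes mutated in at least `min_fraction` of the samples."""
    all_genes = set().union(*samples)
    return {g for g in all_genes if sum(g in s for s in samples) / len(samples) >= min_fraction}


if __name__ == "__main__":
    cgc = {"TP53", "KRAS", "PIK3CA", "PTEN"}
    print(cgc_overlap_fraction({"TP53", "KRAS", "TTN"}, cgc))  # 2 of 3 predictions are in the CGC
    toy_samples = [{"TP53", "KRAS"}, {"TP53", "TTN"}, {"TP53", "KRAS", "MUC16"}, {"TP53"}] * 5
    print(split_consistency(toy_samples, recurrently_mutated))
```

A high CGC overlap rewards agreement with curated knowledge, while the split score rewards methods whose calls do not hinge on which samples happen to be included.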
The Hopkins team developed the 20/20+ approach itself. They described it as a machine-learning-based ratiometric method that expands upon the 20/20 rule, which classifies a gene as a likely driver based on the fraction of its mutations that are inactivating or that are recurrent missense mutations.
Overall, the Hopkins team ranked 20/20+, TUSON, and OncodriveFML highly across the framework's criteria when using the pan-cancer dataset.
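As it is commonly summarized, the original 20/20 rule calls a gene a likely oncogene when more than 20 percent of its mutations are recurrent missense mutations, and a likely tumor suppressor when more than 20 percent are inactivating. The sketch below applies that simplified rule to a single gene. It does not reproduce 20/20+, which adds machine learning on top of such ratios. The set of "inactivating" consequence labels, the `min_recurrence` parameter, and the tie-break when both fractions exceed the threshold are our own assumptions.

```python
from collections import Counter

# Assumed set of consequence labels treated as inactivating; real pipelines may differ.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_20_20(mutations, threshold=0.20, min_recurrence=2):
    """Apply a simplified 20/20 rule to one gene.

    `mutations` is a list of (protein_position, consequence) tuples for the gene.
    `min_recurrence` (a hypothetical parameter) sets how many missense hits a
    position needs before it counts as recurrent.
    Returns (label, oncogene_fraction, tsg_fraction).
    """
    n = len(mutations)
    if n == 0:
        return "unclassified", 0.0, 0.0
    missense_per_position = Counter(pos for pos, csq in mutations if csq == "missense")
    recurrent_missense = sum(c for c in missense_per_position.values() if c >= min_recurrence)
    inactivating = sum(1 for _, csq in mutations if csq in INACTIVATING)
    onc_fraction = recurrent_missense / n
    tsg_fraction = inactivating / n
    # If both fractions pass the threshold, this sketch picks the larger one.
    if onc_fraction > threshold and onc_fraction >= tsg_fraction:
        label = "oncogene"
    elif tsg_fraction > threshold:
        label = "tumor suppressor"
    else:
        label = "unclassified"
    return label, onc_fraction, tsg_fraction


if __name__ == "__main__":
    hotspot_gene = [(12, "missense")] * 8 + [(61, "missense"), (100, "silent")]
    truncated_gene = [(i, "nonsense") for i in range(5)] + [(i + 100, "missense") for i in range(5)]
    print(classify_20_20(hotspot_gene))    # ('oncogene', 0.8, 0.0)
    print(classify_20_20(truncated_gene))  # ('tumor suppressor', 0.0, 0.5)
```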
The 20/20+, TUSON, and MutSigCV approaches had the highest fractions of predicted drivers that also appear in the CGC, and 20/20+ also showed the smallest difference between its observed and expected P values. MuSiC and OncodriveFM showed the greatest differences between their observed and expected P values.
The number of significant genes the approaches predicted ranged from 158 for MutSigCV to 2,600 for OncodriveFM, with 20/20+ toward the low end at 208.
When the researchers repeated their analysis using data from pancreatic adenocarcinoma, breast adenocarcinoma, head and neck squamous cell carcinoma, and lung adenocarcinoma, they found that 20/20+ again had the smallest difference between its observed and expected P values, and that 20/20+ and MuSiC produced the most consistent predictions.
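One simple way to put a number on "the difference between observed and expected P values" is to compare the sorted P values a method reports against the evenly spaced values expected under a well-calibrated null, as in a quantile-quantile plot. The sketch below averages the absolute log2 fold change between the two. It is an illustration in that spirit only; the exact statistic used in the PNAS paper may differ, and the function name is hypothetical.

```python
import math


def pvalue_deviation(p_values):
    """Mean absolute log2 fold change between sorted observed p-values and their uniform expectation.

    Under a well-calibrated null, the i-th smallest of n p-values should sit near i / (n + 1),
    so a score near 0 means good calibration; larger scores mean inflated or deflated p-values.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    total = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        expected = i / (n + 1)
        observed = max(p, 1e-300)  # guard against log(0) for p-values reported as exactly 0
        total += abs(math.log2(observed / expected))
    return total / n


if __name__ == "__main__":
    n = 1000
    calibrated = [i / (n + 1) for i in range(1, n + 1)]
    inflated = [p / 2 for p in calibrated]  # every p-value twice as small as expected
    print(pvalue_deviation(calibrated))  # 0.0
    print(pvalue_deviation(inflated))    # 1.0
```

Because most genes are not drivers, a method's P values should mostly behave like the calibrated case; systematically too-small values, as in the second example, point toward excess false positives.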
Because several of the approaches showed deviations between their observed and expected P values, Vogelstein and his colleagues suggested that cancer driver gene prediction methods typically fail to adequately account for background mutation rates and unexplained variability, and that this shortcoming contributes to the number of false positives they report.
When they modeled the effect of background mutation rate on driver gene predictions, they found that the expected number of false positives rose as the background mutation rate and the number of samples increased. For instance, at an intermediate background mutation rate of three mutations per megabase and high unexplained variability, the researchers estimated that some 1,000 false positives would be expected from 8,000 samples. This, they said, means that cancers with high background mutation rates, often those linked to environmental carcinogens, are the most troublesome for driver prediction methods.
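The toy model below illustrates the general mechanism, and is not the researchers' model. It assumes a naive test that treats each gene's background mutation count as Poisson, while true counts are overdispersed (negative binomial), with a smaller `size` parameter standing in for more unexplained variability. The gene count (18,000), coding length (1,500 bases), and Bonferroni cutoff are all illustrative assumptions. As the expected count per gene grows with mutation rate and sample size, the Poisson cutoff sits ever closer to the bulk of the wider true distribution, so more passenger genes cross it.

```python
import math


def _poisson_log_pmf(x, lam):
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def _negbin_log_pmf(x, mean, size):
    # Negative binomial with the given mean; variance = mean + mean**2 / size,
    # so a smaller `size` means more overdispersion ("unexplained variability").
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log1p(-mean / (size + mean))
            + x * math.log(mean / (size + mean)))


def _upper_tails(log_pmf, max_count):
    """tails[c] = P(X >= c) for c in 0..max_count, summed from the top for accuracy."""
    tails = [0.0] * (max_count + 2)
    for x in range(max_count, -1, -1):
        tails[x] = tails[x + 1] + math.exp(log_pmf(x))
    return tails


def expected_false_positives(rate_per_mb, n_samples, size,
                             n_genes=18000, gene_length_bp=1500, alpha=0.05):
    """Expected number of passenger genes a Poisson-based test would call significant
    when true background counts are overdispersed (negative binomial)."""
    lam = rate_per_mb * 1e-6 * gene_length_bp * n_samples  # mean background hits per gene
    per_gene_alpha = alpha / n_genes  # Bonferroni cutoff, chosen for simplicity
    sd = math.sqrt(lam + lam ** 2 / size)
    max_count = int(lam + 60 * sd + 200)
    poisson_tail = _upper_tails(lambda x: _poisson_log_pmf(x, lam), max_count)
    # Smallest mutation count the Poisson test would call significant.
    cutoff = next(c for c in range(max_count + 1) if poisson_tail[c] <= per_gene_alpha)
    negbin_tail = _upper_tails(lambda x: _negbin_log_pmf(x, lam, size), max_count)
    return n_genes * negbin_tail[cutoff]


if __name__ == "__main__":
    for n in (1000, 4000, 8000):
        print(n, round(expected_false_positives(3, n, size=5)))
```

With a very large `size` the negative binomial collapses to the Poisson, and the expected number of false positives falls back to roughly `alpha`, which is the behavior a well-calibrated method should show.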
"Our conclusion is that these methods still need to get better," first author Collin Tokheim, a doctoral student at Hopkins, said in a statement. "We're sharing our methodology publicly, and it should help others to improve their systems for identifying cancer driver genes."
This framework, the researchers added, could be applied to evaluate any new cancer driver gene prediction method.