TOOLKIT: SUNY Team Hopes to Improve Cancer Diagnostic with New Algorithm

A team of researchers at the US Food and Drug Administration, the National Cancer Institute, and startup Correlogic Systems made a splash nearly two years ago when they published a paper in the Lancet describing a bioinformatics method for detecting ovarian cancer based on protein patterns in the mass spectra of blood serum [BioInform 02-11-02].

Since that time, the promise of an inexpensive blood test to screen for early-stage cancers has spurred the FDA/NCI/Correlogic collaborators to refine their methodology, and has also attracted a growing pool of researchers willing to contribute to the process. Last week, a team led by biostatistician Wei Zhu of the State University of New York, Stony Brook, described the latest improvement on the classification method in the Proceedings of the National Academy of Sciences [PNAS 100(25): 14666-14671].

What sets the SUNY method apart from previous approaches is its accuracy. Correlogic’s Proteome Quest algorithm, which correctly identified 50 cancer cases in 116 masked blood samples in the 2002 Lancet paper, has a sensitivity of 100 percent; that is, it detects every cancerous patient sample without missing any. However, the specificity of the algorithm is 97 percent, meaning that a healthy patient has a 3 percent chance of being misdiagnosed as having cancer. That may seem close enough to perfect at first glance, but according to Zhu, even a 3 percent false-positive rate is unacceptable for ovarian cancer, which has a relatively low incidence rate of only one in 2,500, “so if we wanted to use the screening test developed by the NCI/FDA group … for mass screening … for every correctly diagnosed cancer case, there would be 75 false positives identified at the same time,” she said. “For ovarian cancer screening, the specificity, the chance we find you normal, must be near 100 percent.”
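Zhu's figure of 75 false positives per true positive follows directly from the numbers quoted above. The short Python sketch below reproduces the arithmetic; the function names are illustrative only, and it assumes the one-in-2,500 figure can be treated as the fraction of screened women who actually have the disease.

```python
def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Expected number of false positives for each true positive
    when an entire population is screened."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return false_positives / true_positives


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Chance that a woman who tests positive actually has cancer."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Figures quoted in the article: 1 in 2,500, 100% sensitivity, 97% specificity.
prevalence = 1 / 2500
ratio = false_positives_per_true_positive(prevalence, 1.0, 0.97)
ppv = positive_predictive_value(prevalence, 1.0, 0.97)

print(f"False positives per true positive: {ratio:.1f}")
print(f"Positive predictive value: {ppv:.2%}")
```

Under these assumptions, only a little over 1 percent of positive results would be real cancers, which is why a specificity that looks high on paper is still too low for mass screening of a rare disease.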

Zhu and her colleagues report in the PNAS paper that their method achieved perfect discrimination between healthy and cancerous serum samples (100 percent sensitivity and 100 percent specificity) in two data sets with a total sample size of nearly 500 women.

The primary difference between the SUNY algorithm and other methods is that it searches the entire mass spectrum to find the biomarkers most indicative of cancer. Correlogic’s Proteome Quest uses a “random window” approach to select differentially expressed biomarkers from small sections of the spectrum, but, according to Zhu, this approach could exclude significant biomarkers because only a portion of the spectrum is used. The SUNY method overcomes this limitation by screening the entire spectrum at once to select the biomarkers that are significantly differentially expressed between the cancer and healthy groups.

“You might think it would take longer for our algorithm because we look at each marker, but it doesn’t because we perform a very quick test on each marker to evaluate that marker, and after that we impose a threshold so that we only select those markers that are really different between the two groups,” Zhu said. “Then the subsequent analysis will only be performed on the smaller set, so it’s computationally actually quite efficient.”
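The screen-then-analyze idea Zhu describes can be illustrated with a minimal Python sketch. The article does not say which per-marker test or threshold the SUNY team uses, so the Welch t statistic, the cutoff value, and the synthetic data below are all assumptions chosen for illustration, not the published method.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (an assumed stand-in for the
    'very quick test' applied to each marker)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def screen_markers(cancer, healthy, threshold):
    """Return the indices of markers whose |t| exceeds the threshold.

    cancer, healthy: lists of spectra; each spectrum is a list of
    intensities with one entry per marker.
    """
    selected = []
    for j in range(len(cancer[0])):
        t = welch_t([s[j] for s in cancer], [s[j] for s in healthy])
        if abs(t) > threshold:
            selected.append(j)
    return selected


# Synthetic spectra: 200 markers, of which three differ between groups.
random.seed(0)
N_MARKERS = 200
SHIFTED = {10, 50, 120}  # hypothetical "true" biomarkers


def make_spectrum(is_cancer):
    return [
        random.gauss(5.0 if (is_cancer and j in SHIFTED) else 0.0, 1.0)
        for j in range(N_MARKERS)
    ]


cancer = [make_spectrum(True) for _ in range(30)]
healthy = [make_spectrum(False) for _ in range(30)]

selected = screen_markers(cancer, healthy, threshold=6.0)
print(f"Kept {len(selected)} of {N_MARKERS} markers: {selected}")
```

Each marker is tested once, so the screening cost grows linearly with the number of markers, and any more expensive classification step then runs only on the handful that survive the threshold.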

Zhu said a typical test using the FDA/NCI data sets described in the PNAS paper takes “several minutes” for each subject, and can be performed on a typical desktop PC. However, the group is already modifying the algorithm to handle next-generation, higher-resolution mass spec data, which contain around 3 million markers per patient, around 200 times more than data from current methods. The FDA/NCI clinical proteomics initiative is currently generating a high-resolution data set using ABI Q-Star mass spectrometers, and Zhu said she would soon have access to the new data set in order to evaluate the algorithm. In anticipation of the new data, which will be arriving on CD “pretty soon,” Zhu said that her group has already cut the algorithm’s running time to around 10 percent of what it was previously. Even with that improvement, the high-resolution data will still require around 20 minutes to analyze per subject, so Zhu said her team plans to use SUNY’s supercomputer to reduce the analysis time to 5-10 minutes.

Zhu said that her team is working with the FDA/NCI and three other groups “to produce the best algorithm possible” to develop into a marketable cancer diagnostic. “It’s possible that different algorithms have different advantages and disadvantages, so we may eventually combine the tools and overall take the best features from each one,” she said.

Emanuel Petricoin, co-director of the FDA-NCI clinical proteomics program, confirmed that his team is working with Zhu’s group, and agreed that more is better when it comes to improving the accuracy of the algorithms used for the final diagnostic tool. “Scientifically, we’re interested in using several different algorithms at once so we can find [consistent] features,” he said. “What we’re looking for is concordance.”

Zhu’s research team filed for a patent on the algorithm through the university, but she stressed that her goal “is not to make money for this,” adding that she would be pleased even if only elements of the method appear in an FDA-approved diagnostic. “My biggest wish is to make this available as soon as possible,” she said.

— BT
