Biomarkers crashed back to Earth this week as clinicians and regulatory specialists joined the usual crowd of academics and industry scientists for a sobering reality check at the IBC Biomarkers conference in Reston, Va. Speakers across the board warned of steep obstacles to bringing a validated, reproducible, effective, and FDA-approved proteomic biomarker test — diagnostic or otherwise — to the clinic, as they hashed out the nitty-gritty details of sample collection and handling, statistical analysis, identification of spectrum peaks, and government regulations.
“This is going to be a very slowly developing process,” Joseph Hackett, associate director of the division of clinical laboratory devices at the FDA, said during a panel discussion at the conference. “We don’t have all the answers. We don’t even have all the questions.”
At the meeting, the new tone toward biomarkers was perhaps most in evidence when Lance Liotta of the NCI-FDA clinical proteomics program came under quite a bit of fire. In his presentation, Liotta said: “At this time, [for] none of the runs in any of our collaborations can we say definitively that we know the protein behind the ion,” and that this was due to “very stringent requirements” that his group uses for identifying a protein. “Not until I can remove it with an antibody will I say I know it’s that protein,” he said. But at the conclusion of Liotta’s presentation, Eleftherios Diamandis of Mt. Sinai Hospital — who has been consistently critical of Liotta’s work with collaborator Emanuel Petricoin — challenged the group’s failure to publish identities for the peaks found in the original 2002 ovarian cancer dataset. “With all due respect, Dr. Liotta, if I discovered this two years ago, I would think, ‘My God, what are these’ … and I would share them with the scientific community,” he said. When Liotta replied that the group would not publish anything until they were absolutely sure of the results, Diamandis responded, “It’s been two years: Will we ever know what these peaks are?”
Many of the other presenters also stressed that it was absolutely necessary to identify the peaks in a proteomic biomarker pattern, seemingly abandoning a previous notion that perhaps pattern tests would be accepted without identification. Hans Voshel, team head of functional genomics at the Novartis Institute for Biomedical Research, said that identification was necessary to assure drug developers that patterns did not simply represent inflammation markers or other less-than-meaningful molecules. “The idea of an [unidentified] ‘protein signature’ in a pharma company is not popular,” Voshel said. Sam Hanash, president of HUPO, also minced no words. “To us, it’s critical that we identify these markers, not only because of scientific importance but because if you know what they are, you have literature out there that can either corroborate your data or cast doubt,” Hanash said.
During a closing panel discussion, Scott Patterson, who recently returned to Amgen as director of early development (see PM 11-14-03), summed up this trend by asking whether there was anybody in the conference hall who thought that it was not necessary to identify the components of a proteomic pattern. There was no response.
As speakers explored the factors that can cause a promising experiment to fail, imperfect sample handling and misleading statistics topped the list. Several speakers pointed to a paper published in last month's Bioinformatics by a group at the M.D. Anderson Cancer Center — which showed that spectral noise may account for many of the differential patterns presented in Petricoin and Liotta's original Lancet paper — as an example of how artifacts arising from sample processing can be mistaken for biomarkers. "Just because bioinformatics can separate out two populations doesn't mean that it can give you the right diagnosis," Daniel Chan, director of the biomarker discovery center at Johns Hopkins Medical School, said during his presentation, in reference to the paper.
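Chan's warning can be made concrete with a toy experiment on entirely synthetic data (no relation to the actual Lancet spectra): search many random "peaks" for the one that best separates two groups that are in fact drawn from the same distribution. With few samples and many candidate peaks, the fitted rule looks impressive on the data it was fitted to, then collapses toward chance on fresh data.

```python
import random

random.seed(0)

N_PER_GROUP = 10   # samples per group: small, as in many early biomarker studies
N_PEAKS = 200      # "spectrum peaks" per sample, all pure noise here

def make_noise_groups():
    """Two groups drawn from the SAME distribution: there is no real signal."""
    def group():
        return [[random.random() for _ in range(N_PEAKS)]
                for _ in range(N_PER_GROUP)]
    return group(), group()

def rule_accuracy(a, b, peak, thr, low_is_a):
    """Accuracy of 'call it group A if the peak value <= thr' (or the flipped rule)."""
    if low_is_a:
        correct = sum(x[peak] <= thr for x in a) + sum(x[peak] > thr for x in b)
    else:
        correct = sum(x[peak] > thr for x in a) + sum(x[peak] <= thr for x in b)
    return correct / (len(a) + len(b))

def fit_best_rule(a, b):
    """Exhaustively pick the single peak and threshold that best separate a and b."""
    best = (0.0, 0, 0.0, True)
    for peak in range(N_PEAKS):
        for thr in sorted(x[peak] for x in a + b):
            for low_is_a in (True, False):
                acc = rule_accuracy(a, b, peak, thr, low_is_a)
                if acc > best[0]:
                    best = (acc, peak, thr, low_is_a)
    return best

train_a, train_b = make_noise_groups()
train_acc, peak, thr, low_is_a = fit_best_rule(train_a, train_b)

# Apply the SAME fitted rule to fresh noise: performance falls toward chance.
test_a, test_b = make_noise_groups()
test_acc = rule_accuracy(test_a, test_b, peak, thr, low_is_a)

print(f"training accuracy: {train_acc:.0%}  held-out accuracy: {test_acc:.0%}")
```

The training accuracy here is an artifact of searching thousands of candidate rules on a handful of samples, which is one reason validation on an independent sample set — not just a holdout split of the same collection — came up repeatedly at the meeting.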
At the heart of this problem, many said, was the fact that overtrained algorithms can sometimes make messy data look too clean. “Statistics cannot make a marginal test good,” George Klee, professor at the Mayo Clinic Laboratory of Medicine and Pathology, said. “I caution that we need to know about the effect of all the factors on the algorithms, or small drifts in one or two analytes can blow these apart. You can have it working very nicely on a controlled population, and then you turn it loose on a clinic and you have all kinds of problems.” Klee gave an example of how even a 1 percent error in one of the values entered into a simple prediction formula to determine which fetuses are at risk of certain birth defects can increase the number of false positives that the formula produces by several percentage points. His conclusion: Find protein patterns, but find ones with limited numbers of easily controllable analytes.
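Klee's point about error propagation is easy to reproduce numerically. The sketch below uses a hypothetical, deliberately simplified screening rule — a single analyte with a hard cutoff and made-up population parameters, not Klee's actual formula — to show how a 1 percent upward assay drift pushes borderline unaffected cases over the threshold and raises the false-positive rate by several percentage points.

```python
import random

random.seed(42)

CUTOFF = 100.0   # hypothetical decision threshold on the analyte
N = 10_000       # simulated unaffected (healthy) cases

# Hypothetical analyte distribution in the unaffected population: the mean
# sits just below the cutoff, so many cases lie near the decision boundary.
values = [random.gauss(95.0, 3.0) for _ in range(N)]

def false_positive_rate(values, bias=1.0):
    """Fraction of unaffected cases flagged when every measurement carries a
    multiplicative assay bias (bias=1.01 means a +1 percent drift)."""
    return sum(v * bias > CUTOFF for v in values) / len(values)

baseline = false_positive_rate(values)            # no assay drift
drifted = false_positive_rate(values, bias=1.01)  # +1 percent drift

print(f"false positives without drift: {baseline:.1%}")
print(f"false positives with 1% drift:  {drifted:.1%}")
```

With these made-up parameters the false-positive rate rises by several percentage points — roughly doubling — from a 1 percent measurement drift, which is the disproportionate sensitivity Klee warned about, and why he favored patterns built from a limited number of easily controllable analytes.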
The FDA’s Hackett echoed this suggestion in a later conversation with ProteoMonitor, but for a different reason: Since he expected that FDA approval for a pattern biomarker test would likely depend upon the identification of the proteins that made up the pattern, he said, “that would mean a panel of four would be better than 20, right?”
Liotta later made reference to the M.D. Anderson paper by emphasizing that his experiments had been conducted with extensive QA/QC controls and that the data was not biased “as far as we can tell.” “I want to emphasize that everything was done through blinded studies,” he said.
Liotta added that he had a paper currently under review for publication that compared the performance of three different algorithms — including those created by Correlogic and by Large Scale Biology (see PM 7-18-03, 12-19-03) — in evaluating the same ovarian cancer data set, and that he found similar results using each one. Correlogic is planning on releasing a homebrew ovarian cancer test — which does not require FDA approval — sometime this year (see p. 9). Sudhir Srivastava, chief of the cancer biomarkers research group at the NCI, said that the NCI’s Early Detection Research Network is also working on a study that directly compares the SELDI and Q-STAR-based mass spec analysis of markers. “We have to have that for FDA submission — we have to demonstrate that month to month, instrument to instrument, we have reproducibility,” Liotta said.
Collecting and Storing the Goods
Chan and other presenters pushed to bring sample collection and handling into the forefront of the conference discussion. As Kevin Krenitsky, senior vice president and medical director of Genomics Collaborative, put it, “It may not be the most sexy part of biomarker discovery, but the bottom line is, if you don’t do this properly, you’ll never get the results you want.”
Krenitsky went through the details of how to properly collect and store a sample, stressing the importance of avoiding freeze and thaw cycles, which can introduce changes in the proteome. He also said that post-mortem tissue — which many proteomics scientists probing organs such as the brain are forced to work with — was very unreliable due to a lesser ability to control for time frames, and a common inability to get informed consent from patients. Hanno Langen, head of the proteomics initiative at F. Hoffman-La Roche, also warned about using post-mortem tissue, showing that after rat brain tissue had been kept at 4 degrees for six hours — “the best [time frame] you can hope for with human brain” — many obvious changes were already visible on a 2D gel.
Srivastava further addressed an issue that some suggested could have led to misleading data in the Lancet study. “Many times investigators use convenient samples sitting in reference areas, but non-cancer control spectra can differ from the population,” he said.
Wooing the Powers that Be
Krenitsky also urged attendees to keep in mind ethical considerations and legal issues when collecting samples. “Every pharma and biotech wants to make sure that ethical standards are taken care of, so that at the end of the day, their IP is their IP,” he said. John Janik, of the Center for Cancer Research at the NCI, later reminded attendees of the need to get IRB approval to work with tissue and serum samples. Although there are no particular regulations currently in place for the use of tissue samples for proteomic studies, Janik said he expects “there may be in the future.”
Similarly undefined are regulations guiding the FDA’s decision-making process in approving a potential proteomic pattern-based biomarker test. Steven Gutman, director of the office of in vitro diagnostics at the FDA, and Hackett both repeatedly stressed the need for the FDA to work with researchers and industry in mapping out plans for crossing this developing bridge.
Neither last year’s guidance on the submission of genomic multiplex tests such as microarrays (see PM 7-11-03) nor the more recent draft guidance on the submission of pharmacogenomics data, released in November, specifically addressed proteomic data. The microarray guidance made some mention of potential applicability to protein arrays, while the pharmacogenomics guidance notably left proteomic data out altogether.
However, that does not mean that the FDA does not intend to draft a later guidance that more specifically addresses proteins, Hackett told ProteoMonitor. “We’re taking this one step at a time,” he said, noting that those wishing to submit a biomarker test — including a pattern-based test — for approval in the meantime should refer to the general standards of proving safety and effectiveness. If these two standards can be met, consideration of approval will be given on the same basis as for any other technology. Still, Gutman noted that biomarkers do seem to represent another animal. “Biomarker testing is an interesting model — our intent is to do our best to ground [the process] in good science,” Gutman said on the subject in his presentation.
Taking direct advantage of the FDA’s invitation to work together, Liotta said in his presentation that he is in “very close step-by-step cooperation with Steve Gutman.” He said that the FDA has approved his plan to gather data comparing the efficacy of his most recent ovarian cancer patterns — obtained by using the ABI Q-STAR with a Ciphergen ProteinChip interface (see PM 7-18-03) — with CA125 in a retrospective test for the recurrence of ovarian cancer, with the intention to apply for approval of the test under the FDA’s 510k process. The 510k process deals with approval of tests that are comparable with existing tests — in this case, CA125. To gain approval as a 510k, tests need to show “substantial equivalence” with previous tests. What does that mean for biomarker pattern tests?
“You know it when you see it,” Gutman said.