Validating microarrays for use in clinical trials is, apparently, a time-consuming task.
Richard Hockett, senior clinical research physician and group leader for genomic medicine at Eli Lilly, has been conducting a series of exercises to validate Affymetrix microarrays for use in clinical trials. So far, the process has outlasted both the clinical trial with which the effort began and at least one upgrade of the array products used.
Still, with three of the exercises completed, the group has generated some surprising insights, which Hockett described last week in a session during the Lab Automation conference in San Jose, Calif.
This study is important because it deals with some of the issues that microarray technology must overcome in order to earn US Food and Drug Administration approval for use in clinical settings. It’s not an easy bar to clear, as was well illustrated by Roche’s rollout of its initial AmpliChip products built by Affymetrix. The molecular diagnostics giant last spring introduced the CYP-450 AmpliChip and announced that it would be marketed as an analyte-specific reagent, only to run headlong into a regulatory barrier. Roche is now marketing the product for research use only and is doing its own preparatory work to submit the product for FDA approval this year.
Lilly’s process of clinical validation is also regulated. The tests conducted with the technology do not have to be approved, but the tools and the protocols used have to be able to withstand an FDA audit in order to be used in clinical trials in support of drug development, said Hockett.
“The standards for an FDA audit are very similar to what an FDA approval is, and that is where we have to have a very rigorous clinical validation of this technology,” he said.
Lilly’s efforts began with clinical trials associated with the drug Alimta, which just last week received FDA clearance for use with cisplatin, a standard chemotherapy agent, in a regimen for the treatment of malignant pleural mesothelioma. Lilly describes Alimta as a multi-targeted anticancer drug.
Overall, the validation process for microarrays is not special, or different from that used for any other technology, Hockett said.
“Every biomarker we develop at Eli Lilly or that we get from labs that have developed them, goes through the same thing,” he said. “You have to establish the parameters, the instrumentations, including compliance with 21 CFR Part 11, the electronic records, electronic signatures part of FDA requirements. You need to understand the variability of your entire process and procedure, the setting of standards, establishing of control parameters, which is [the] acceptance and rejection criteria of runs, and obviously, a completed validation document.”
The validation process, in this case, Hockett said, is one of adapting microarrays to the clinical setting and making them [identify] clinically relevant biomarkers.
“We are not questioning the science, nor the biological relevance of measuring gene expression in any of these systems,” he said. “What we are talking about by clinically validating is going through and rigorously defining the steps of the process so that we can prove on a single measurement, that that measurement actually is correct, not that that measurement actually has biologic or scientific validation.”
Hockett’s group established six validation experiments, of which two have been completed, a third is two-thirds completed, and the fourth has just started.
All six experiments were designed to use the Affymetrix Human Genome U95 GeneChips, although the next-generation U133 was later integrated into the investigation. A second experiment was added after the process began, and it reached the expected conclusion that CVs for the newer U133 chips were “roughly” the same as those for the originals, Hockett said.
“I would be worried if it was different,” he said.
Hockett said the Affymetrix platform was the only one that could be considered for this effort because, at the time, it was the only microarray produced under the FDA’s good manufacturing practice regulations, known as GMP.
The first experiment, and perhaps the most important in the process, Hockett said, was a system to test procedural variation, with the results being used to establish standard operating procedures.
The third experiment was designed to establish control parameters and is two-thirds complete, with one run of chips remaining to set control ranges, Hockett said.
The fourth experiment will examine variability at the end of the dynamic ranges of the arrays.
“We have a reasonable handle on what is happening down on the low end when the signal-to-noise ratio is starting to have problems,” Hockett said. “I am more concerned about the high end. [For example:] when do I understand when I am starting to top out the ability of Affymetrix to discriminate additional copies of [a particular] gene? There is not a good handle on that. So this dynamic range testing is really going to be more at the high range than at the low range and understanding when we have to say: That is above the limits of detection as a report-out for that particular gene.”
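The report-out rule Hockett describes at the high end can be sketched as a simple guard on signal intensity. This is a minimal illustration, not Lilly’s actual procedure; the ceiling and guard-band values below are invented, not Affymetrix specifications:

```python
# Minimal sketch of an "above limit of detection" report-out rule.
# SATURATION_CEILING and GUARD_FRACTION are invented for illustration;
# they are not Affymetrix specifications.

SATURATION_CEILING = 46000.0  # assumed top of the scanner's usable range
GUARD_FRACTION = 0.9          # flag anything within 10 percent of the ceiling

def report_out(signals):
    """Map gene -> signal to gene -> numeric call, or to a flag when the
    signal is too close to the ceiling to discriminate additional copies."""
    calls = {}
    for gene, signal in signals.items():
        if signal >= GUARD_FRACTION * SATURATION_CEILING:
            calls[gene] = "above limit of detection"
        else:
            calls[gene] = signal
    return calls

print(report_out({"GENE_A": 1200.0, "GENE_B": 45000.0}))
```

The point of the guard band is that a gene near the ceiling gets a qualitative flag rather than a numeric value that the array can no longer resolve.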
The trial under which this investigation was conducted was one involving breast cancer patients in South America, said Hockett.
Breast-cancer tumors on that continent average 10 centimeters in diameter, allowing investigators to collect enough tissue to conduct standard immunohistochemistry, histology, and pathology tests; quantitative PCR on targeted genes; as well as microarray profiling. Biopsies were conducted prior to treatment, 24 hours after the first dose of Alimta, and after three cycles of the drug.
“The one that we wanted to concentrate on was the baseline sample,” Hockett said. “What we are intending to do is try to understand and predict who is going to respond to that drug based on a signature that we would get either from quantitative PCR, or from a microarray. The end points of this trial were not molecular; they were clinical response and standard efficacy and safety.”
However, that was also a limiting factor as there was not enough tissue to procure the 200 to 300 samples required for the entire validation process, Hockett said. So, researchers obtained a leukemic cell line and grew 10^12 to 10^13 cells out of it, making an extract of approximately 1.5 liters to aliquot for sampling in the experimental processes.
“There was nothing magical about what we chose — we could have chosen any cell line that was reproducible and easy to grow,” he said.
The first experiment asked how many times each step in a microarray process — for example: extractions, cDNA syntheses, cRNA labelings, and chips — should be run to obtain a valid result with variability defined at each step, including that coming from technologists, and the fluidics workstations.
“I have to define what my process is going to be upfront — before I ever start,” said Hockett. “Every step of the way has to be defined, has to be not variable. I have to figure out how many times I have to sample the extract of the Total RNA; how many cDNAs, cRNAs, how many chips I have to put this on, all that has to be defined first. In order for me to choose each one of those steps, I have to understand where the variability within the assay resides. So if I make the choice, I know what I gain or what I give up by either adding or subtracting multiple samplings at any one step.”
This experiment was conducted with two technologists and two fluidics stations alternating on eight runs of eight arrays, with the same RNA input for each run.
However, nine chips were unusable as a result of a fluidics-station malfunction, and another had background noise so high that it was unreadable, said Hockett. Otherwise, if the software was able to align the grid, the chips were included in the analysis, whether they had low background or high background, he said.
For Hockett, it was difficult to predict how much the variability among the technicians would impact study results.
“When we started this exercise, even people at Lilly were telling me: ‘Your analyst variability is going to be a fairly large component of what you see.’ Different analysts get different results. In reality, we didn’t see any of that at all and, in fact, the machine, and this is the fluidics station, also had almost no variability, or contribution to the overall percent CV. [Variability] was basically the difference in the chips, and the difference in the runs comprised the vast majority.”
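Hockett’s breakdown, separating run-to-run from chip-to-chip contributions to the overall percent CV, corresponds to a standard one-way random-effects variance decomposition. A minimal sketch, with invented numbers standing in for the Lilly data:

```python
import numpy as np

# Hedged sketch: one-way random-effects decomposition separating run-to-run
# from chip-to-chip variability in one probe's signal. The simulated data
# below are invented; the article reports only that runs and chips, not
# analysts or fluidics stations, dominated the percent CV.

def variance_components(x):
    """x: 2-D array, rows = runs, columns = chips within a run.
    Returns (sigma2_run, sigma2_chip) under a one-way random-effects model."""
    n_runs, n_chips = x.shape
    run_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = n_chips * ((run_means - grand_mean) ** 2).sum() / (n_runs - 1)
    ms_within = ((x - run_means[:, None]) ** 2).sum() / (n_runs * (n_chips - 1))
    sigma2_run = max((ms_between - ms_within) / n_chips, 0.0)
    return sigma2_run, ms_within

rng = np.random.default_rng(0)
run_effect = rng.normal(0.0, 15.0, size=(8, 1))   # shared within each run
chip_noise = rng.normal(0.0, 12.0, size=(8, 8))   # chip-to-chip scatter
signal = 100.0 + run_effect + chip_noise          # 8 runs of 8 arrays
s2_run, s2_chip = variance_components(signal)
mean = signal.mean()
print(f"run CV ~{100 * s2_run ** 0.5 / mean:.0f}%, "
      f"chip CV ~{100 * s2_chip ** 0.5 / mean:.0f}%")
```

In the Lilly experiment, the analogous terms for technologist and fluidics station came out near zero, leaving runs and chips as the dominant components.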
So, is running more chips the key to improving on that number?
“Adding multiple chips is only going to drive me from a 20 percent CV to a 12 percent CV, which for most clinical questions is not very important or relevant,” he said. “So, the overall gist of this from the experiment is that I can sample a tissue one time, do one cDNA, one labeling and put it on one chip and get an overall variability of about 20 percent. That is going to be very important because that is going to make the ease of doing this in the clinics much better.”
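The quoted numbers are consistent with averaging uncorrelated chip-level noise, where the CV of the mean of n replicate chips falls roughly as the single-chip CV divided by the square root of n. This is an assumption on my part; the article does not state the model:

```python
import math

def cv_of_mean(single_chip_cv, n_chips):
    """CV of the average of n replicate chips, assuming the chip-level
    noise is uncorrelated across replicates (a simplifying assumption)."""
    return single_chip_cv / math.sqrt(n_chips)

for n in (1, 2, 3, 4):
    print(f"{n} chip(s): {cv_of_mean(20.0, n):.1f}% CV")
```

Under this assumption, going from one chip to three takes a 20 percent CV down to about 11.5 percent, roughly matching the quoted drop from 20 percent to 12 percent.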
The overall percentage CV is not a lot different between experiments 1 to 3, said Hockett. Lot-to-lot variance in the chips was “quite small,” he added.
For the first experiment, the overall CV was approximately 20 percent, Hockett said. With two thirds of the third experiment completed, the overall CV is running about 25 percent.
The next experiments that Hockett would like to conduct would involve a more complex matrix, such as human tissue, and would involve using some 200 to 300 chips, he said. Also, he is seeking to address the issue of what constitutes a control.
“If I ran a breast cancer or colon cancer, I’m going to get a certain number of genes that are expressed within that tissue,” he said. “With a U133 that’s somewhere in the range of 10,000 to 12,000 genes, if I am going to make any clinical decisions off of any set or particular genes, I really have to have these controls.”
That means searching for a number of different cell lines that can be propagated that will have expression of most of the genes of interest for a clinical setting, he said.
Overall, the CV of this set of experiments, he said, is lower than foreseen.
“I expected it to be a bit closer to PCR, which is 35 to 40 percent for any clinical assay that I have ever done,” he said. “[But] this level of variability is sufficient to answer almost all of the clinical questions that I can think of asking.
“The greatest remaining challenge for this clinical validation of microarrays, in my mind, is the development of adequate control samples and agreeing and coming to grips with the FDA as to the number that need to be controls with any one clinical question that is going to be addressed. That is now going to consume our time before we are able to apply this to our clinical process.”
The conundrum unsolved by Lilly, and by the thousands of others looking to apply this technology in the clinical setting, is the data.
This process does not speak to the torrents of data that come from one slide, from one set of experiments.
“So, we can validate a system without having all of this figured out, and it still won’t be used clinically because we don’t know how to interpret it,” he said. “We are actually at that stage because there is very little understanding of how to apply 10 or 20 thousand data points to a single clinical condition.”