At ABRF Conference, Bioinformatician Warns of Statistically Unsound Experimental Procedures


Savannah, Ga. - Statistically sound ways of processing both samples and data are essential if experimental results are to be biologically relevant, cautioned Kevin Coombes, the head of the bioinformatics section at the MD Anderson Cancer Center, during his talk here at the Association of Biomolecular Resource Facilities meeting this week.

When conducting biological experiments, especially those involving sensitive instruments, researchers must not only follow a consistent protocol for handling all samples but also randomize the order in which different categories of samples are run. They must also calibrate instruments over a wide enough range and statistically correct for instrument characteristics such as the baseline in MALDI mass spectrometers, Coombes said.

To illustrate the effect that sample run order can have on the results of an experiment, Coombes presented results from the re-analysis of a published study on ovarian cancer.

When Coombes re-analyzed data from the study, two clusters of cancer patients emerged, and he began to question what they meant.

After some investigation, Coombes discovered that many samples run on the third day of the experiment had poor-quality mass spectra, and that the instrument had to be repaired on the fourth day. He then discovered that all the cancer cases had been processed first, followed by the controls. This skewed the data: cases had better-quality mass-spec data, and controls had poorer-quality data.
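The kind of confounding described above can be avoided by shuffling the run order before any sample touches the instrument. The following is a minimal sketch in Python, not the procedure used in the study or by Coombes; the sample labels, the `randomized_run_order` and `assign_to_days` helpers, and the batch size of four per day are all hypothetical.

```python
import random


def randomized_run_order(sample_ids, seed=None):
    """Return a shuffled copy of sample_ids; the input list is left unchanged."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    order = list(sample_ids)
    rng.shuffle(order)
    return order


def assign_to_days(run_order, per_day):
    """Split a run order into consecutive per-day batches."""
    return [run_order[i:i + per_day] for i in range(0, len(run_order), per_day)]


# Hypothetical sample labels: six cases and six controls.
samples = [f"case_{i}" for i in range(6)] + [f"control_{i}" for i in range(6)]
order = randomized_run_order(samples, seed=42)
days = assign_to_days(order, per_day=4)

for day, batch in enumerate(days, start=1):
    print(f"day {day}: {batch}")
```

A plain shuffle like this does not guarantee that every day receives the same number of cases and controls. A design that shuffles within each day's block of cases and controls would enforce that balance.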

"The obvious lesson here is that not only do you have to keep the protocol the same, but you have to do randomization of patient enrollment and sample processing," said Coombes.

Coombes also emphasized that it is important to statistically process data in order to account for quirks in instrumentation. To illustrate his point, Coombes referred to a cancer study that used a MALDI-TOF instrument to produce three datasets.

In the study, the first two data sets had not been statistically processed to account for baseline variation. The third data set had been statistically corrected. When the data was re-analyzed, Coombes' research group found that the third data set was reproducible, while the first two data sets were not.

"Technology can overwhelm biology quite easily," Coombes said during his ABRF presentation. "Processing makes a difference. If you don't process [MALDI] data correctly, the variable baseline will dominate."

When asked to comment on Coombes' presentation, David Muddiman, a professor at the Mayo Clinic College of Medicine (see Proteomics Pioneer, p.6), said that his research group worries more about sample processing and "things you can't change later" than about data processing.

"In terms of pre-processing, we do everything including changing our pipette tips every 20 samples, and designing our experiment so that all samples are processed on the same day, with the same person," said Muddiman. "In terms of baseline processing, that's something you can do later by re-analyzing data. Plus, we use FTICR, and I think baseline is something unique to [the MALDI-TOF] community."

Coombes explained that the hump that appears in the MALDI-TOF baseline probably results from the way MALDI matrices are ionized. Baseline is less of a problem with other instruments, he said.
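To give a sense of what baseline correction involves, here is a minimal sketch that estimates the baseline as a sliding-window minimum and subtracts it. This is one simple, generic approach chosen for illustration, not the method Coombes' group used; the `rolling_min_baseline` helper, the window size, and the synthetic spectrum are assumptions. It requires NumPy 1.20 or later.

```python
import numpy as np


def rolling_min_baseline(intensity, window):
    """Estimate a baseline as the minimum within a centered sliding window."""
    intensity = np.asarray(intensity, dtype=float)
    half = window // 2
    # Repeat the edge values so every point has a full window around it.
    padded = np.pad(intensity, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    return windows.min(axis=1)


def subtract_baseline(intensity, window):
    """Return the spectrum with the rolling-minimum baseline removed."""
    intensity = np.asarray(intensity, dtype=float)
    return intensity - rolling_min_baseline(intensity, window)


# Synthetic spectrum: a decaying hump plus one narrow peak at index 500.
x = np.arange(1000, dtype=float)
hump = 50.0 * np.exp(-x / 300.0)
peak = 100.0 * np.exp(-0.5 * ((x - 500.0) / 3.0) ** 2)
signal = hump + peak

corrected = subtract_baseline(signal, window=101)
print("peak index after correction:", int(np.argmax(corrected)))
print("largest residual before index 400:", float(corrected[:400].max()))
```

The window has to be wider than any real peak, otherwise the estimated baseline rises under the peak and subtracting it removes part of the signal.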

Coombes said it might be a good idea to have a journal committee put together a checklist of things that researchers should be aware of in conducting an experiment, including sample handling, run order randomization, instrument calibration, baseline correction, noise reduction, and other aspects of data processing.

"They could have some kind of standard, like the MIAME standards for microarrays," said Coombes.

'One Size Doesn't Fit All'

However, some other researchers thought that experiments are too variable to be overseen by a committee.

"One size doesn't fit all," said Steven Gross, the director of the mass spectrometry facility at Cornell University's Weill Medical College. "There are so many different kinds of experiments. Things don't necessarily apply from one experiment to the next. I think it's the job of reviewers and journal editors to make sure that the data is validated."

Gross said he found Coombes' talk "enlightening," and said he "never thought about what order to run samples in. Most people just assume that the technology is working. But I'm surprised these things slip by reviewers."

There is no gold standard for data processing, Coombes pointed out. He suggested producing some kind of standard by simulating different kinds of complex spectra and then thinking about the best way to process the data.
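A simulation of the kind Coombes suggested could start from something like the sketch below, which builds a spectrum out of known peaks, a decaying baseline, and random noise, so that any processing method can be scored against the known truth. It is an illustrative assumption rather than Coombes' proposal in detail; the `simulate_spectrum` function and every parameter value are hypothetical.

```python
import numpy as np


def simulate_spectrum(n_points=2000, peak_positions=(400, 900, 1500),
                      peak_heights=(80.0, 120.0, 60.0), peak_width=4.0,
                      baseline_amplitude=50.0, baseline_decay=400.0,
                      noise_sd=2.0, seed=0):
    """Simulate a spectrum and return each component separately."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_points, dtype=float)

    # True signal: a sum of Gaussian peaks at known positions.
    peaks = np.zeros(n_points)
    for pos, height in zip(peak_positions, peak_heights):
        peaks += height * np.exp(-0.5 * ((x - pos) / peak_width) ** 2)

    # Instrument artifacts: a decaying baseline and Gaussian noise.
    baseline = baseline_amplitude * np.exp(-x / baseline_decay)
    noise = rng.normal(0.0, noise_sd, size=n_points)

    return {
        "x": x,
        "peaks": peaks,
        "baseline": baseline,
        "noise": noise,
        "observed": peaks + baseline + noise,
    }


sim = simulate_spectrum()
print("points:", sim["observed"].size)
print("tallest true peak at index:", int(np.argmax(sim["peaks"])))
```

Because the true peaks are returned alongside the observed spectrum, the output of a baseline-correction or denoising routine can be compared directly with what it was supposed to recover.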

"There's all kinds of phenomena — periodic phenomena, electronic noise. Processing makes a difference," he said.

Tom Skyler, a marketing representative for Bio-Rad, said that Coombes' talk made a lot of sense. "Researchers should use common sense," he said. "You don't want to have all the controls in front, and all the cases in the back. What Coombes said makes a lot of sense as far as doing statistical analysis."

