At ABRF Conference, Bioinformatician Warns of Statistically Unsound Experimental Procedures


Savannah, Ga. - Statistically sound methods for processing both samples and data are essential if experimental results are to be biologically relevant, cautioned Kevin Coombes, head of the bioinformatics section at the MD Anderson Cancer Center, during his talk here at the Association of Biomolecular Resource Facilities meeting this week.

When conducting biological experiments, especially those involving sensitive instruments, researchers must not only follow a consistent protocol for handling all samples but also randomize the order in which different categories of samples are run. They must also calibrate instruments over a wide enough range and statistically correct for instrument characteristics such as the baseline in MALDI mass spectrometers, Coombes said.

To illustrate the effect that sample run order can have on the results of an experiment, Coombes presented results from the re-analysis of a published study on ovarian cancer.

When Coombes re-analyzed data from the study, he found that the cancer patients fell into two clusters, and he began to question what the clusters meant.

After some investigation, Coombes discovered that many samples that had been run on the third day of the experiment had poor-quality mass-spectrometry spectra, and that on the fourth day the instrument had to be repaired. He then discovered that all the cancer cases had been processed first, followed by the controls. As a result, the data were skewed: cases had better-quality mass-spec data, and controls had poorer-quality data.

"The obvious lesson here is that not only do you have to keep the protocol the same, but you have to do randomization of patient enrollment and sample processing," said Coombes.
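The run-order randomization Coombes describes can be sketched in a few lines of Python. This is only an illustration, not a procedure from the talk or from the study he re-analyzed: the sample IDs, the function names `randomized_run_order` and `assign_days`, and the four-samples-per-day batch size are all hypothetical.

```python
import random


def randomized_run_order(cases, controls, seed=None):
    """Return all samples as (sample_id, group) pairs in shuffled order."""
    rng = random.Random(seed)  # a fixed seed makes the run sheet reproducible
    samples = [(s, "case") for s in cases] + [(s, "control") for s in controls]
    rng.shuffle(samples)
    return samples


def assign_days(run_order, per_day):
    """Split the shuffled run order into consecutive daily batches."""
    return [run_order[i:i + per_day] for i in range(0, len(run_order), per_day)]


# Hypothetical sample IDs, for illustration only.
cases = [f"case_{i:02d}" for i in range(1, 9)]
controls = [f"ctrl_{i:02d}" for i in range(1, 9)]

order = randomized_run_order(cases, controls, seed=42)
days = assign_days(order, per_day=4)
for day, batch in enumerate(days, start=1):
    print(f"day {day}: {batch}")
```

A plain shuffle like this does not guarantee an equal number of cases and controls on each day. A stricter design would shuffle within blocks so that every day's batch contains both groups, which keeps a bad instrument day from falling mostly on one group.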

Coombes also emphasized that it is important to statistically process data in order to account for quirks in instrumentation. To illustrate his point, Coombes referred to a cancer study that used a MALDI-TOF instrument to produce three datasets.

In the study, the first two data sets had not been statistically processed to account for baseline variation. The third data set had been statistically corrected. When the data was re-analyzed, Coombes' research group found that the third data set was reproducible, while the first two data sets were not.

"Technology can overwhelm biology quite easily," Coombes said during his ABRF presentation. "Processing makes a difference. If you don't process [MALDI] data correctly, the variable baseline will dominate."

When asked to comment on Coombes' presentation, David Muddiman, a professor at the Mayo Clinic College of Medicine (see Proteomics Pioneer, p.6), said that his research group worries more about sample processing and "things you can't change later" than about data processing.

"In terms of pre-processing, we do everything including changing our pipette tips every 20 samples, and designing our experiment so that all samples are processed on the same day, with the same person," said Muddiman. "In terms of baseline processing, that's something you can do later by re-analyzing data. Plus, we use FTICR, and I think baseline is something unique to [the MALDI-TOF] community."

Coombes explained that the hump that appears in the MALDI-TOF baseline is probably a result of the way MALDI matrices are ionized. Baseline is not as much of a problem with other instruments, he said.
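To make the idea of baseline correction concrete, here is a minimal Python sketch that estimates a slowly varying baseline and subtracts it from a spectrum. It is not the method Coombes' group used, which the talk did not specify. The rolling-minimum-plus-smoothing estimator, the window size, and the synthetic spectrum (a decaying hump with one peak) are all assumptions chosen for illustration.

```python
import math


def rolling_min(values, half_window):
    """Minimum of each point's neighborhood; follows the bottom of the signal."""
    n = len(values)
    return [min(values[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def moving_average(values, half_window):
    """Mean of each point's neighborhood; smooths the stepwise rolling minimum."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        out.append(sum(window) / len(window))
    return out


def subtract_baseline(intensities, half_window=25):
    """Estimate the baseline and subtract it, clipping negative values to zero."""
    baseline = moving_average(rolling_min(intensities, half_window), half_window)
    return [max(0.0, y - b) for y, b in zip(intensities, baseline)]


# Synthetic spectrum (assumed shape): a decaying baseline hump plus one
# narrow Gaussian peak centered at index 500.
raw = [50.0 * math.exp(-i / 200.0)
       + 100.0 * math.exp(-((i - 500) ** 2) / (2 * 3.0 ** 2))
       for i in range(1000)]

corrected = subtract_baseline(raw)
print(f"raw[0] = {raw[0]:.1f}, corrected[0] = {corrected[0]:.1f}")
print(f"raw[500] = {raw[500]:.1f}, corrected[500] = {corrected[500]:.1f}")
```

The window must be wider than the peaks and narrower than the hump; otherwise the estimator either eats into real peaks or fails to follow the baseline. That choice of window is exactly the kind of processing decision Coombes argued should be reported.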

Coombes said it might be a good idea to have a journal committee put together a checklist of things that researchers should be aware of in conducting an experiment, including sample handling, run order randomization, instrument calibration, baseline correction, noise reduction, and other aspects of data processing.

"They could have some kind of standard, like the MIAME standards for microarrays," said Coombes.

'One Size Doesn't Fit All'

However, some other researchers thought that experiments are too variable to be overseen by a committee.

"One size doesn't fit all," said Steven Gross, the director of the mass spectrometry facility at Cornell University's Weill Medical College. "There are so many different kinds of experiments. Things don't necessarily apply from one experiment to the next. I think it's the job of reviewers and journal editors to make sure that the data is validated."

Gross said he found Coombes' talk "enlightening," and said he "never thought about what order to run samples in. Most people just assume that the technology is working. But I'm surprised these things slip by reviewers."

There is no gold standard for data processing, Coombes pointed out. He suggested producing some kind of standard by simulating different kinds of complex spectra and then thinking about the best way to process the data.

"There's all kinds of phenomena — periodic phenomena, electronic noise. Processing makes a difference," he said.

Tom Skyler, a marketing representative for Bio-Rad, said that Coombes' talk made a lot of sense. "Researchers should use common sense," he said. "You don't want to have all the controls in front, and all the cases in the back. What Coombes said makes a lot of sense as far as doing statistical analysis."

