MAQC Participants Struggle with ‘Robust’ Classifiers


During MAQC’s seventh face-to-face project meeting, consortium participants presented some initial results of the second phase of the project, called MAQC-II. While MAQC-I evaluated the reproducibility of microarray experiments across different labs and platforms, MAQC-II is focusing on the prediction of biological outcomes based on microarray data.

Four participating groups are independently analyzing several clinical and toxicogenomics data sets with the goal of identifying “best practices” for developing classifiers that are reliable enough to use in a clinical setting.

Much of the effort comes down to statistics, and one of the four working groups is devoted to that aspect of the project. The Regulatory Biostatistics Working Group’s goal is to “develop a standard operating procedure document about how to build and validate predictive models,” says Greg Campbell, a biostatistician at the US Food and Drug Administration’s Center for Devices and Radiological Health, and a coordinator of the RBWG.

Campbell stresses that while he and several other FDA staffers are involved in the working group, the documents it produces are not official FDA guidance documents or recommendations.

The SOP document is meant to guide the MAQC analysis groups as they develop their own statistical analysis plans, or SAPs, which are detailed, step-by-step descriptions of each method used to develop a predictive model. The SAPs and classifiers are “frozen” and submitted to the RBWG, which then evaluates the models based on accuracy, sensitivity and specificity, and reproducibility, or “robustness.”
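As a rough illustration of the first three evaluation criteria named above, the following minimal sketch computes accuracy, sensitivity, and specificity for a binary classifier from its predictions. The function name `classifier_metrics` and the example labels are hypothetical and are not drawn from any MAQC document; the article does not say how the RBWG computes these values, and robustness is not covered here.

```python
def classifier_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)
    nan = float("nan")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Fraction of true positives that the classifier detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of true negatives that the classifier rejects.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical validation labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classifier_metrics(y_true, y_pred)
print(metrics)
```

In the spirit of the "frozen" plans described above, metrics like these would be computed once on a held-out validation set after the model is fixed, not recomputed while the model is being tuned.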

Campbell emphasizes that the goal of the initiative is to come up with an analysis plan and stick with it, much as a developer of a diagnostic genomic signature would have to do when submitting a classifier to the FDA for approval.

“The whole point is to select a classifier and validate it,” he says, noting that the RBWG’s role is to encourage the microarray analysis community to “move away from pure, exploratory playing with the data.”
Bernadette Toner

Short Reads

Waters and Rosetta Biosoftware have announced that they will integrate Rosetta’s software with two mass spectrometers from Waters. The deal requires that Rosetta’s Elucidator software for protein expression data management be made compatible with Waters’ Q-Tof Premier and Synapt high-definition mass spectrometers.

Several universities in New York have formed a statewide consortium to put the state on the cutting edge of computational biology. Robert McGrath, provost and VP for Brookhaven Laboratory Affairs at Stony Brook University, is leading the consortium, which also includes Columbia University, Cornell University, and New York University.

The NIH released the first data from the Genetic Association Information Network project via NCBI’s dbGaP database. Summary-level data is available to the public, while individual-level data requires preauthorization.

Russia’s Interior Ministry recently penned a law laying the groundwork for the establishment of a genome data bank. The Russian government claims the database will also be used to help fight terrorism and “extremism,” as well as to facilitate the identification of bodies with DNA analysis.

Genedata said pharma firm UCB will use two of its platforms to support drug-discovery research targeting central nervous system disorders, allergy/respiratory diseases, immune and inflammatory disorders, and oncology.


US Patent 7,228,239. Methods and systems for classifying mass spectra. Inventor: Lucio Cetto. Assignee: The Mathworks. Issued: June 5, 2007.
According to the abstract, this patent covers methods and systems for “classifying mass spectra to discriminate the absence or existence of a condition…[and] determining a first or higher order derivative of the signals of the mass spectra, or any linear combination of the signal and a derivative of the signal, to form a mass spectra data set for training a classifier. … [That] classifies mass spectra samples to improve discriminating between the absence or existence of a condition.”

US Patent 7,228,237. Automatic threshold setting and baseline determination for real-time PCR. Inventors: David Woo, Clinton Lewis, Nasser Abbasi. Assignee: Applera. Issued: June 5, 2007.
This patent covers “a system and methods for quantitating the presence of nucleic acid sequences by evaluation of amplification data generated using real-time PCR. In one aspect, the methods may be adapted to identify a threshold and threshold cycle for one or more reactions based upon evaluation of exponential and baseline regions for each amplification reaction.”
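For readers unfamiliar with the terms in this abstract, the sketch below shows one common textbook convention for setting a threshold and reading off a threshold cycle (Ct): the threshold is the mean of the baseline cycles plus a multiple of their standard deviation, and Ct is found by linear interpolation at the first crossing. This is an assumption made for illustration and is not the method claimed in the patent; the function name, the baseline window, the multiplier `k`, and the fluorescence values are all hypothetical.

```python
from statistics import mean, stdev


def threshold_cycle(fluorescence, baseline_cycles=(1, 10), k=10.0):
    """Return (threshold, ct) for one amplification curve.

    fluorescence[i] is the reading at cycle i + 1.
    baseline_cycles is an inclusive, 1-indexed (start, end) window.
    ct is None if the curve never crosses the threshold after the baseline.
    """
    start, end = baseline_cycles
    baseline = fluorescence[start - 1:end]
    # Generic convention (an assumption): baseline mean plus k standard deviations.
    threshold = mean(baseline) + k * stdev(baseline)
    # Search only after the baseline window for the first upward crossing.
    for i in range(end, len(fluorescence)):
        low, high = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if low < threshold <= high:
            fraction = (threshold - low) / (high - low)
            return threshold, i + fraction
    return threshold, None


# Hypothetical readings: a flat, noisy baseline followed by doubling each cycle.
curve = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0,
         2.0, 4.0, 8.0, 16.0, 32.0]
threshold, ct = threshold_cycle(curve)
print(threshold, ct)
```

For this made-up curve the crossing falls between cycles 10 and 11, where the signal first rises clearly above the baseline noise.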

Data Point

97 million

Number of entries, corresponding to 170 gigabases of sequence, currently held in the database at the European Molecular Biology Laboratory
