TOOLKIT: Storey's Software Helps Microarray Researchers Mind Their P- and Q-Values

By now, most biologists working with microarrays have made statistical methods like t-tests and p-values a part of their everyday lives — albeit grudgingly. So when biologists began badgering John Storey, a biostatistician at the University of Washington, for a new statistical tool, he knew they really must have needed it.

“Biologists were asking, ‘How do I assess significance?’” said Storey. “They know that the p-value doesn’t tell them what they want to know.”

Storey soon realized that the p-value, which describes significance in terms of the false positive rate, wasn’t helping biologists determine which genes to study in large, genome-wide data sets, such as those delivered by microarray experiments. What his biologist colleagues were really looking for, Storey said, was the false discovery rate — a measure that judges the significance of the actual set of genes selected to study.

For example, a false positive rate of 5 percent means that around 5 percent of “uninteresting,” or null, features out of an entire study will be mistakenly called significant, but a false discovery rate of 5 percent means that 5 percent of only those features called significant are in error.
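To make the distinction concrete, here is a small arithmetic sketch in Python. The study size, the number of null genes, and the detection rate for truly changed genes are hypothetical figures chosen for illustration; none of them come from Storey's paper.

```python
def expected_error_rates(n_features, n_null, p_cutoff, power):
    """Expected outcome of calling every feature with p <= p_cutoff significant.

    power is the assumed fraction of truly changed features that pass the
    cutoff (a hypothetical input, not something a p-value cutoff tells you).
    """
    n_changed = n_features - n_null
    # Null p-values are uniform, so a fraction p_cutoff of them pass by chance.
    false_positives = n_null * p_cutoff
    true_positives = n_changed * power
    called = false_positives + true_positives
    fdr = false_positives / called if called else 0.0
    return {
        "false_positives": false_positives,
        "called_significant": called,
        "false_discovery_rate": fdr,
    }


# Hypothetical array: 10,000 genes, 9,000 of them unchanged, cutoff p <= 0.05,
# and 60 percent of the truly changed genes detected at that cutoff.
summary = expected_error_rates(10_000, 9_000, 0.05, 0.60)
print(summary)
```

Under these assumptions the 5 percent false positive rate yields about 450 mistaken calls in a list of roughly 1,050 genes, so the false discovery rate of the selected list is near 43 percent, not 5 percent.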

Biologists commonly use p-value cutoffs to determine their gene lists in microarray experiments, but Storey cautioned that this provides little — if any — information about the significance of the genes actually selected. He recommends they instead use the “q-value” — “basically just a user-friendly measure based on the false discovery rate, just as the p-value is based on the false positive rate” — for that task, and developed some software to make the process a little less painful for biologists.

In a recent paper in the Proceedings of the National Academy of Sciences [PNAS 2003 100 (16): 9440-9445], Storey and co-author Robert Tibshirani introduced the software, appropriately called Q-Value, and suggested ways in which it could be applied in several published microarray studies to alleviate the difficulties of interpreting p-value thresholds.

Biologists tend to use an iterative process of arbitrary p-value cutoffs, say 0.05, combined with biological knowledge, Storey said. If a researcher expects to see a particular gene show up in the selected set, the initial p-value threshold is shifted until that gene appears, but the researcher doesn’t know what impact that adjustment has on the significance of the genes selected. The q-value “justifies what [biologists] were going to do anyway,” by adding a statistical measure to interpret the significance of the selected genes, he said.

The software, a set of R functions that is freely available at http://faculty.washington.edu/~jstorey/qvalue/index.html, estimates q-values for a given list of p-values and generates a series of graphs to help the user decide the significance of various cut-off points. For each q-value threshold, it indicates how many significant results to expect; and for each number of significant results, how many false positives to expect.
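The idea of turning a list of p-values into q-values can be sketched in a few lines. The Python below is an illustration only, not the Q-Value package or its R interface: the function names are hypothetical, and the share of null features is estimated from a single fixed tuning value (`lam=0.5`), a simplifying assumption made here to keep the example short.

```python
def estimate_qvalues(pvalues, lam=0.5):
    """Simplified q-value estimate for a list of p-values (illustrative only)."""
    m = len(pvalues)
    if m == 0:
        return []
    # Estimate the share of null features from the p-values above lam,
    # where mostly null features are expected to fall. Capped at 1.
    pi0 = min(sum(p > lam for p in pvalues) / (m * (1.0 - lam)), 1.0)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the smallest
    # estimated false discovery rate among all cutoffs that include it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr_at_cutoff = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, fdr_at_cutoff)
        qvalues[i] = running_min
    return qvalues


def summarize_cutoff(qvalues, threshold):
    """Features called at a q-value threshold and the rough number of
    false positives to expect among them (threshold times the count)."""
    n_called = sum(q <= threshold for q in qvalues)
    return n_called, threshold * n_called


# Hypothetical p-values for ten features.
pvals = [0.001, 0.01, 0.02, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
qvals = estimate_qvalues(pvals)
n_called, expected_false = summarize_cutoff(qvals, 0.05)
```

With these made-up inputs, two features fall at or below a q-value of 0.05. The second helper mirrors the kind of summary the article describes: how many results are called at a threshold and roughly how many of those are expected to be false.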

Storey said that prior to the PNAS paper, over 300 users had already downloaded Q-Value based on word of mouth alone. He is currently working on a more user-friendly point-and-click version of the software.

The concept of the false discovery rate was first proposed in 1995, Storey said, but the idea has only recently been extended to work on a genome-wide scale with tens of thousands of features. The work proves that bioinformatics is not limited to borrowing established statistical methods, but can actually contribute a few of its own. “Genome-wide studies have really inspired us to take a different perspective on some of these ideas about statistical significance,” said Storey. “The field really has motivated some new statistical ideas.”

— BT
