TOOLKIT: Storey's Software Helps Microarray Researchers Mind Their P- and Q-Values


By now, most biologists working with microarrays have made statistical methods like t-tests and p-values a part of their everyday lives, albeit grudgingly. So when biologists began badgering John Storey, a biostatistician at the University of Washington, for a new statistical tool, he knew they must really need it.

“Biologists were asking, ‘How do I assess significance?’” said Storey. “They know that the p-value doesn’t tell them what they want to know.”

Storey soon realized that the p-value, which describes significance in terms of the false positive rate, wasn’t helping biologists determine which genes to study in large, genome-wide data sets, such as those delivered by microarray experiments. What his biologist colleagues were really looking for, Storey said, was the false discovery rate — a measure that judges the significance of the actual set of genes selected to study.

For example, a false positive rate of 5 percent means that, on average, about 5 percent of the “uninteresting,” or null, features in an entire study will be mistakenly called significant, whereas a false discovery rate of 5 percent means that about 5 percent of only those features called significant are in error.
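To make the distinction concrete, here is a small illustrative calculation in Python. The counts (9,000 null features and 1,000 features called significant) are hypothetical and are not taken from Storey's paper or from any study mentioned here; the function names are likewise invented for this sketch.

```python
# Illustrative arithmetic only: all counts below are hypothetical.

def expected_false_positives(n_null, false_positive_rate):
    """Expected number of null features mistakenly called significant."""
    return n_null * false_positive_rate


def false_discovery_rate(n_false_positives, n_called_significant):
    """Share of the features called significant that are false positives."""
    if n_called_significant == 0:
        return 0.0
    return n_false_positives / n_called_significant


n_null = 9000     # hypothetical count of truly "uninteresting" features
n_called = 1000   # hypothetical number of features called significant

false_positives = expected_false_positives(n_null, 0.05)
fdr = false_discovery_rate(false_positives, n_called)

print(f"Expected false positives: {false_positives:.0f}")
print(f"False discovery rate among called features: {fdr:.0%}")
```

Under these assumed counts, a 5 percent false positive rate yields about 450 false positives, which would make up roughly 45 percent of a 1,000-gene list. That gap is why a p-value cutoff alone says little about the reliability of the genes actually selected.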

Biologists commonly use p-value cutoffs to determine their gene lists in microarray experiments, but Storey cautioned that this provides little — if any — information about the significance of the genes actually selected. He recommends they instead use the “q-value” — “basically just a user-friendly measure based on the false discovery rate, just as the p-value is based on the false positive rate” — for that task, and developed some software to make the process a little less painful for biologists.

In a recent paper in the Proceedings of the National Academy of Sciences [PNAS 2003 100 (16): 9440-9445], Storey and co-author Robert Tibshirani introduced the software, appropriately called Q-Value, and suggested ways in which it could be applied in several published microarray studies to alleviate the difficulties of interpreting p-value thresholds.

Biologists tend to use an iterative process of arbitrary p-value cutoffs (say, 0.05) combined with biological knowledge, Storey said. If a researcher expects to see a particular gene show up in the selected set, the initial p-value threshold is shifted until that gene appears, but the researcher doesn’t know what impact that adjustment has on the significance of the genes selected. The q-value “justifies what [biologists] were going to do anyway,” by adding a statistical measure to interpret the significance of the selected genes, he said.

The software, a set of R functions that is freely available at http://faculty.washington.edu/~jstorey/qvalue/index.html, estimates q-values for a given list of p-values and generates a series of graphs to help the user decide the significance of various cut-off points. For each q-value threshold, it indicates how many significant results to expect and, for each number of significant results, how many false positives to expect.
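The core calculation can be sketched in a few lines. The following Python is not Storey's R code; it is a simplified illustration that assumes a single fixed tuning value (lam = 0.5) for estimating the proportion of null features, where the PNAS paper describes a smoothed estimate over a range of values. The function names, the example p-values and the 0.05 threshold are made up for illustration.

```python
# Simplified sketch of q-value estimation from a list of p-values.
# Assumption: a fixed lam = 0.5 is used to estimate the null proportion.

def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of null features from p-values above lam."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def qvalues(pvalues, lam=0.5):
    """Return one q-value per p-value, in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    pi0 = estimate_pi0(pvalues, lam)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pi0 * m * pvalues[idx] / rank
        running_min = min(running_min, candidate)
        q[idx] = running_min
    return q


# Hypothetical p-values, one per gene.
example_pvalues = [0.001, 0.01, 0.02, 0.04, 0.3, 0.6, 0.8, 0.9]
example_q = qvalues(example_pvalues)

threshold = 0.05  # hypothetical q-value cutoff
n_significant = sum(1 for q in example_q if q <= threshold)
# Rough upper estimate of false positives among the genes called significant.
expected_false = threshold * n_significant

for p, q in zip(example_pvalues, example_q):
    print(f"p = {p:.3f}  q = {q:.3f}")
print(f"Called significant at q <= {threshold}: {n_significant}")
print(f"Expected false positives (at most about): {expected_false:.2f}")
```

One limitation of this sketch: if no p-value exceeds lam, the estimated null proportion is zero and every q-value collapses to zero, so a real analysis would need a more careful estimate than the one assumed here.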

Storey said that prior to the PNAS paper, over 300 users had already downloaded Q-Value based on word of mouth alone. He is currently working on a more user-friendly point-and-click version of the software.

The concept of the false discovery rate was first proposed in 1995, Storey said, but the idea has only recently been extended to work on a genome-wide scale with tens of thousands of features. The work proves that bioinformatics is not limited to borrowing established statistical methods, but can actually contribute a few of its own. “Genome-wide studies have really inspired us to take a different perspective on some of these ideas about statistical significance,” said Storey. “The field really has motivated some new statistical ideas.”

— BT
