Statistical significance — getting to that p value cutoff of 0.05 — may not be all it is cracked up to be. In the Proceedings of the National Academy of Sciences this week, Texas A&M University's Valen Johnson examines how classical statistical testing matches up to Bayesian evidence thresholds.
He reports that p values of 0.05 or less "represent only moderate evidence against null hypotheses." Further, he calculates that some 17 percent to 25 percent of marginally significant findings are false. He suggests that scientists adopt a more stringent p value cutoff of 0.005 or even 0.001.
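For a rough sense of how a range like 17 percent to 25 percent can arise, here is a minimal Python sketch that converts a Bayes factor into the probability that the null hypothesis is true. The inputs are assumptions chosen for illustration, not figures quoted above: a Bayes factor of about 3 to 5 in favor of the alternative for a result near p = 0.05, and even prior odds between the null and the alternative. The function name `posterior_null_probability` is hypothetical.

```python
def posterior_null_probability(bayes_factor_alt, prior_null=0.5):
    """Return P(null is true | data).

    bayes_factor_alt: evidence for the alternative relative to the null.
    prior_null: prior probability of the null (0.5 is an assumption here).
    """
    if bayes_factor_alt <= 0:
        raise ValueError("bayes_factor_alt must be positive")
    if not 0 < prior_null < 1:
        raise ValueError("prior_null must be strictly between 0 and 1")

    # Prior odds of the null versus the alternative.
    prior_odds_null = prior_null / (1 - prior_null)
    # Evidence for the alternative shrinks the odds of the null.
    posterior_odds_null = prior_odds_null / bayes_factor_alt
    # Convert odds back to a probability.
    return posterior_odds_null / (1 + posterior_odds_null)


if __name__ == "__main__":
    # Assumed Bayes factors for a marginally significant result.
    for bf in (3.0, 5.0):
        share = posterior_null_probability(bf)
        print(f"Bayes factor {bf:.0f}: null still true about {share:.0%} of the time")
```

Under those assumptions, the chance that a marginally significant finding is false works out to 1 / (1 + Bayes factor). A more skeptical prior, such as `prior_null=0.8`, pushes the share higher.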
“Very few studies that fail to replicate are based on p values of 0.005 or smaller,” Johnson tells Nature News' Erika Check Hayden.
Similarly, Ivan Oransky at Retraction Watch notes that "just-significant results" have been plaguing psychological research. Two studies, both appearing in the Quarterly Journal of Experimental Psychology, found that p values at or just below the 0.05 threshold were more common in psychological research than expected. The authors of one study suggest that "[t]he problem may be alleviated by reduced reliance on p values and increased reporting of confidence intervals and effect sizes."
Or, as Hilda Bastian says at her Absolutely Maybe blog at Scientific American, data can be examined in a variety of ways, and researchers shouldn't rely on just one measure. "The p value is not one number to rule them all," she adds.