E. J. Wagenmakers writes:
You may be interested in a recent article [by Nieuwenhuis, Forstmann, and Wagenmakers] showing how often researchers draw conclusions by comparing p-values. As you and Hal Stern have pointed out, this is potentially misleading because the difference between significant and not significant is not necessarily significant.
We were really surprised to see how often researchers in the neurosciences make this mistake. In the paper we speculate a little bit on the cause of the error.
From their paper:
In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05). We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure.
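To make the point concrete, here's a minimal sketch of the fallacy, with hypothetical numbers (an estimate of 25 with standard error 10 versus an estimate of 10 with standard error 10): the first effect is significant, the second is not, yet a direct test of their difference comes nowhere near significance.

```python
import math
from scipy import stats

# Hypothetical estimates and standard errors for two independent effects
est_a, se_a = 25.0, 10.0   # effect A: z = 2.5
est_b, se_b = 10.0, 10.0   # effect B: z = 1.0

def two_sided_p(est, se):
    """Two-sided p-value for a normal-theory z-test of est against zero."""
    z = est / se
    return 2 * stats.norm.sf(abs(z))

print(f"p(A)   = {two_sided_p(est_a, se_a):.3f}")  # ~0.012: "significant"
print(f"p(B)   = {two_sided_p(est_b, se_b):.3f}")  # ~0.317: "not significant"

# Correct procedure: test the difference between the effects directly.
# For independent estimates, se_diff = sqrt(se_a^2 + se_b^2).
diff = est_a - est_b
se_diff = math.sqrt(se_a**2 + se_b**2)
print(f"p(A-B) = {two_sided_p(diff, se_diff):.3f}")  # ~0.289: not significant
```

So comparing the two p-values (0.012 vs. 0.317) suggests the effects differ, while the correct test on the difference (p = 0.29) shows no such evidence.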
I assume this has been an issue for close to a century; it’s interesting that it’s been noticed more in the past few years. I wonder what’s going on.
P.S. E. J. writes, “I know of no references that precede your work with Hal Stern.” I wonder, though. The idea is so important that I’d be surprised if Fisher, Yates, Neyman, Box, Tukey, etc., didn’t ever discuss it.