As usual, I agree with Paul Meehl: “It is not a reform of significance testing as currently practiced in soft-psych. We are making a more heretical point than any of these: We are attacking the whole tradition of null-hypothesis refutation as a way of appraising theories.”

Javier Benitez sends along the above quote from Meehl’s 1990 article, Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles that Warrant It:

I wish that Shalizi and I had known about that Meehl article when we were writing this. When we wrote that article, we were thinking about Bayesian statistics and how to situate good applied statistics work from a philosophy-of-science perspective. We weren’t thinking about the relevance of these ideas to understanding problems with junk science.

P.S. Thanks to Zad for sending along the above picture of a cat who seems to be thinking carefully about falsificationism.


  1. Yep, there is no “right” way to have cancer, just like there’s no right way to do NHST. If forced to have NHST, the best you can do is mitigate the damage, but why are we forced to have it?

  2. “…but why are we forced to have it?”
    Because social scientists and biomedical researchers were trained to believe that NHST is a valid method, if not the only method, of evidence certification.

    Unfortunately, the current system of publication, grant funding, and promotion does not penalize researchers for the negative consequences of that belief system.

  3. If someone wanted to devise an error-analysis tree, it would include items like these:

    data
      biased/unbiased
      representative/unrepresentative
      reproducible/unreproducible
      fixed or changeable over time
      noise
      normality
      fraud
      error
    statistics
      power
      N
      criteria
      error detection and correction
      degrees of freedom
    analysis
      methods
      experimenter degrees of freedom
      accommodates non-normality?
      accommodates noise?
      accommodates disagreement between theory and results?
      error
      effect of noise on validity of analysis
    theory
      weak/strong
      specific/can explain any result
      hypotheses
      error
    philosophy
      confirmation/proof/verification basis

    NHST seems to touch only three of these (and they are certainly not a complete breakdown):
    theory -> hypotheses
    philosophy -> confirmation, etc.
    statistics -> criteria

    It’s not necessary to get into a philosophical pissing contest about the use of NHST to see that NHST is nearly the least of your worries. Most work I have seen seems to assume that all of these error-tree branches are happily in accordance with idealized statistical experiments. Ha! Lots of luck!

    • “NHST seems to touch only three of these” – I’m not sure I understand this. Don’t NHST and the problems associated with it have a lot to do with N, power, noise, normality, etc.?

      • I may have been too cavalier in how I phrased it. No, I don’t think that the main problems associated with NHST have much to do with N, power, noise, and a lot of other things. You can analyze the data without the step of NHST, for example. Yes, noisy data with small effects make it easier to draw unwarranted NHST conclusions, but they can lead to unwarranted conclusions of other kinds, too.

        But a hypothesis is a kind of theory, or a sub-theory, or something of the sort. If that theory is wrong, weakly supported, or poorly posed, getting an NHST result won’t help you reach a reliable conclusion, no matter how favorable the statistics seem to be.

        My point, in other words, is that there are so many elements in the error-analysis tree that getting an NHST result is a tiny part of the ultimate reliability of the analysis. That’s true even if the hypothesis is well posed and capable of being properly falsified, and there is no bias or wishful interpretation.

        Andrew has often written that most of the kinds of effects he’s interested in are not zero, because they never are truly zero. The questions are what sign they have and whether they are large enough to make a difference. An NHST is worthless in that case, because you know that a well-designed experiment is going to show a non-null effect. The NHST here is an example of an ill-posed hypothesis, which is a subset of problems with theory.
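        A quick simulation makes both halves of that point concrete. This is a minimal sketch of my own, not anything from the post: a one-sample two-sided t-test at alpha = 0.05, with made-up effect sizes and sample sizes chosen purely for illustration. The “type M/S error” terminology is Gelman’s.

        ```python
        # Minimal sketch (my illustration, not from the post): simulate many
        # studies, t-test each against zero, and look at what "significance"
        # buys you. All effect sizes and sample sizes below are made up.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        def nhst_sim(true_effect, sd, n, n_sims, alpha=0.05):
            """Run n_sims simulated studies of size n; return the rejection
            rate and the effect estimates from the significant studies only."""
            y = rng.normal(true_effect, sd, size=(n_sims, n))
            pvals = stats.ttest_1samp(y, 0.0, axis=1).pvalue
            sig = pvals < alpha
            return sig.mean(), y.mean(axis=1)[sig]

        # (a) The true effect is tiny but not zero; with a big enough n the
        # null is rejected essentially every time, so rejection is a statement
        # about sample size, not about the theory.
        rate, _ = nhst_sim(true_effect=0.05, sd=1.0, n=10_000, n_sims=1_000)
        print(f"tiny effect, large n: rejection rate = {rate:.3f}")

        # (b) A small, noisy study of the same kind of effect: power is low,
        # and conditioning on significance inflates the estimate (type M error)
        # and occasionally flips its sign (type S error).
        rate, est = nhst_sim(true_effect=0.1, sd=1.0, n=20, n_sims=10_000)
        print(f"small noisy study: rejection rate = {rate:.3f}")
        print(f"mean |significant estimate| = {np.abs(est).mean():.2f} (true effect 0.1)")
        print(f"wrong-sign fraction among significant results = {(est < 0).mean():.2f}")
        ```

        In run (a) the rejection rate is essentially 1, so “p < 0.05” tells you only that n was large; in run (b) power is low, the significant estimates average several times the true effect, and a noticeable fraction point in the wrong direction.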
