As usual, I agree with Paul Meehl: “It is not a reform of significance testing as currently practiced in soft-psych. We are making a more heretical point than any of these: We are attacking the whole tradition of null-hypothesis refutation as a way of appraising theories.”

Posted on March 3, 2020 9:51 AM by Andrew

Javier Benitez sends along the above quote from Meehl’s 1990 article, Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles that Warrant It.

I wish that Shalizi and I had known about that Meehl article when we were writing this. When we wrote that article, we were thinking about Bayesian statistics and how to situate good applied statistics work from a philosophy-of-science perspective. We weren’t thinking about the relevance of these ideas to understanding problems with junk science.

P.S. Thanks to Zad for sending along the above picture of a cat who seems to be thinking carefully about falsificationism.

12 thoughts on “As usual, I agree with Paul Meehl: “It is not a reform of significance testing as currently practiced in soft-psych. We are making a more heretical point than any of these: We are attacking the whole tradition of null-hypothesis refutation as a way of appraising theories.””

Anoneuoid on March 3, 2020 at 10:44 am said:

Yep, there is no “right” way to have cancer, just like there’s no right way to do NHST. If forced to have NHST the best you can do is mitigate the damage, but why are we forced to have it?

Garnett on March 3, 2020 at 11:35 am said:

“…but why are we forced to have it?”
Because social scientists and biomedical researchers were trained to believe that NHST is a valid method, if not the only method, of evidence certification.

Unfortunately, the current system of publication, grant funding, and promotion does not punish the negative consequences of that system of belief.

- jd on March 3, 2020 at 11:39 am said:
  
  “does not punish” – It actually promotes it. For example, power analysis is required for grant submissions.
  
- Anoneuoid on March 3, 2020 at 1:01 pm said:
  
  True, I still remember believing that.
  
Dan Wright on March 3, 2020 at 11:50 am said:

For those interested, other Meehl publications are available at: https://meehl.umn.edu/all-publications.

- Anoneuoid on March 3, 2020 at 12:59 pm said:
  
  And lecture videos here: https://meehl.umn.edu/talks/philosophical-psychology-1989
  
Tom Passin on March 3, 2020 at 11:50 am said:

If someone wanted to devise an error-analysis tree, it would include items like these:

data
biased/unbiased
representative/unrepresentative
reproducible/irreproducible
fixed or changeable over time
noise
normality
fraud
error
statistics
power
N
criteria
error detection and correction
degrees of freedom
analysis
methods
experimenter degrees of freedom
accommodates non-normality?
accommodates noise?
accommodates disagreement between theory and results?
error
effect of noise on validity of analysis.
theory
weak/strong
specific/can explain any result
hypotheses
error
philosophy
confirmation/proof/verification basis

NHST seems to touch only three of these (and they are certainly not a complete breakdown):
theory–>hypotheses
philosophy–> confirmation, etc.
statistics–>criteria

It’s not necessary to get into a philosophical pissing contest about the use of NHST to see that NHST is nearly the least of your worries. Most work I have seen seems to assume that all of these error tree branches are happily in accordance with idealized statistical experiments. Ha! Lots of luck!

- jd on March 3, 2020 at 12:24 pm said:
  
  “NHST seems to touch only three of these” – I’m not sure I understand this. Don’t NHST and the problems associated with it have a lot to do with N, power, noise, normality, etc.?
  
  - Tom Passin on March 3, 2020 at 6:42 pm said:
    
    I may have been too cavalier in how I phrased it. No, I don’t think that the main problems associated with NHST have much to do with N, power, noise, and a lot of other things. You can analyze the data without the step of NHST, for example. Yes, noisy data with small effects make it easier to draw unwarranted NHST conclusions, but they can also lead to unwarranted conclusions of other kinds, too.
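
To make the remark about noisy data and small effects concrete, here is a minimal simulation sketch. It is an illustration only, not something from the comment itself: the function name `significance_filter`, the true effect of 0.1, the standard error of 1.0, and the 1.96 cutoff are all assumptions chosen for the example. It draws many hypothetical study estimates around a small true effect and looks only at the ones that clear the usual significance threshold.

```python
import random
from statistics import mean


def significance_filter(true_effect, std_error, n_studies=10_000, seed=0):
    """Simulate study estimates ~ Normal(true_effect, std_error) and
    summarise those with |z| > 1.96. Assumes true_effect is nonzero and
    that at least one simulated study is "significant"."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    significant = [e for e in estimates if abs(e / std_error) > 1.96]

    # Share of simulated studies that reject the null.
    share_significant = len(significant) / n_studies
    # How much larger the "significant" estimates are than the truth, on average.
    exaggeration = mean(abs(e) for e in significant) / abs(true_effect)
    # Fraction of "significant" estimates pointing in the wrong direction.
    wrong_sign = mean(1.0 if e * true_effect < 0 else 0.0 for e in significant)
    return share_significant, exaggeration, wrong_sign


if __name__ == "__main__":
    share, ratio, wrong = significance_filter(true_effect=0.1, std_error=1.0)
    print(f"share significant:        {share:.3f}")
    print(f"average exaggeration:     {ratio:.1f}x")
    print(f"wrong sign, given signif: {wrong:.3f}")
```

Under these assumed numbers, any estimate that passes the threshold must be at least 1.96 in magnitude, so it overstates a true effect of 0.1 by construction. The same noisy estimates would mislead an analysis that skipped the test and simply reported the largest results, which is the sense in which the problem is not specific to NHST.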
    
    But a hypothesis is a kind of theory, or a sub-theory, or something of the sort. If that theory is wrong, weakly supported, or poorly posed, getting a NHST won’t help you to get a reliable result, no matter how favorable the statistics seem to be.
    
    My point is, IOW, that there are so many elements in the error analysis tree that getting a NHST result is a tiny part of the ultimate reliability of the analysis. That’s even if the hypothesis is well-posed, capable of being properly falsified, and there is no bias or wishful interpretation.
    
    Andrew has often written that most of the kinds of effects that he’s interested in are not zero – because they never are truly zero. The questions are what sign do they have, and are they large enough to make a difference. A NHST is worthless in that case, because you know that a well-designed experiment is going to show a non-null effect. The NHST here is an example of an ill-posed hypothesis, which is a subset of problems with theory.
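
A small calculation can illustrate why rejecting a point null says little when the effect is known to be nonzero. This is a sketch under stated assumptions rather than anything from the comment: it uses a two-sided z-test with known standard deviation, a hypothetical true effect of 0.01 standard deviations, and a made-up helper name, `rejection_probability`.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(true_effect, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test of H0: effect == 0 rejects,
    given n observations with known standard deviation sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the true effect.
    shift = true_effect * sqrt(n) / sigma
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Assumed, illustrative values: a tiny but nonzero effect.
    for n in (100, 10_000, 1_000_000):
        p = rejection_probability(true_effect=0.01, sigma=1.0, n=n)
        print(f"n = {n:>9,}: P(reject null) = {p:.3f}")
```

With the effect held fixed, the rejection probability depends only on the sample size, so a large enough study rejects the null almost surely. The test outcome then reflects N more than it reflects the sign or practical size of the effect, which are the questions the comment says actually matter.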
    
tom on March 3, 2020 at 11:52 am said:

Sorry, the above was supposed to have been formatted as an indented list. Can I get that effect by adding … tags?

- Tom Passin on March 3, 2020 at 6:23 pm said:
  
  That was supposed to be similar to [pre]..[/pre]. I guess the question has answered itself.
  
Thanatos Savehn on March 4, 2020 at 8:29 pm said:

This is awesome.


Statistical Modeling, Causal Inference, and Social Science
