Our hypotheses are not just falsifiable; they’re actually false.

Everybody’s talkin bout Popper, Lakatos, etc. I think they’re great. Falsificationist Bayes, all the way, man!

But there’s something we need to be careful about. All the statistical hypotheses we ever make are false. That is, if a hypothesis becomes specific enough to make (probabilistic) predictions, we know that with enough data we will be able to falsify it.
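
To make that concrete, here is a minimal simulation sketch (mine, not part of the original argument): we test the specific hypothesis "these data are exactly normally distributed" against data actually generated from a t distribution with 20 degrees of freedom, which is only slightly heavier-tailed. The particular distributions, test, and sample sizes are arbitrary choices for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in [100, 1_000, 100_000]:
    # True data-generating process: t distribution with 20 degrees of freedom,
    # only slightly heavier-tailed than the normal distribution.
    y = rng.standard_t(df=20, size=n)
    # Specific (and therefore false) hypothesis: the data are exactly normal.
    stat, pval = stats.normaltest(y)
    print(f"n = {n:>7}: p-value for 'data are exactly normal' = {pval:.3g}")

With a handful of observations the misspecification is invisible; with enough data the rejection becomes essentially guaranteed, which is the sense in which the hypothesis was false all along.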

So, here’s the paradox. We learn by falsifying hypotheses, but we know ahead of time that our hypotheses are false. Whassup with that?

The answer is that the purpose of falsification is not to falsify. Falsification is useful not in telling us that a hypothesis is false—we already knew that!—but rather in telling us the directions in which it is lacking, which points us ultimately to improvements in our model. Conversely, lack of falsification is also useful in telling us that our available data are not rich enough to go beyond the model we are currently fitting.

P.S. I was motivated to write this after seeing this quotation: “. . . this article pits two macrotheories . . . against each other in competing, falsifiable hypothesis tests . . .”, pointed to me by Kevin Lewis.

And, no, I don’t think it’s in general a good idea to pit theories against each other in competing hypothesis tests. Instead I’d prefer to embed the two theories into a larger model that includes both of them. I think the whole attitude of A-or-B-but-not-both is mistaken; for more on this point, see for example the discussion on page 962 of this review article from a few years ago.
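
As a minimal sketch of what "embedding both theories in a larger model" can look like (my illustration, not anything from the review article mentioned above): suppose theory A and theory B each imply a predictive function, f_A and f_B. Rather than testing A against B, one can fit

y_i = (1 - \lambda)\, f_A(x_i, \theta_A) + \lambda\, f_B(x_i, \theta_B) + \varepsilon_i, \qquad 0 \le \lambda \le 1,

and estimate the weight lambda (along with the theory-specific parameters) from the data. A lambda concentrated near 0 or 1 favors one story; intermediate values say that both mechanisms appear to contribute, which is exactly the possibility the A-or-B-but-not-both framing rules out.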

23 thoughts on “Our hypotheses are not just falsifiable; they’re actually false.”

  1. I’m reminded of a related article:

    Cooper, W. H., & Richardson, A. J. (1986). Unfair comparisons. Journal of Applied Psychology, 71(2), 179-184.

    They make the argument that if you’re doing model comparisons, each model has to be given a fair chance to fail.

  2. Nice article; thanks for the link and discussion. I especially appreciate the following comments toward the bottom of p. 961 (especially the last sentence of the quote):

    “I realize, however, that my perspective that there are no true zeros (information restrictions aside) is a minority view among social scientists and perhaps among people in general, on the evidence of Sloman’s book. For example, from chapter 2: “A good politician will know who is motivated by greed and who is motivated by larger principles in order to discern how to solicit each one’s vote when it is needed” (p. 17). I can well believe that people think in this way but I don’t buy it: just about everyone is motivated by greed and by larger principles. This sort of discrete thinking doesn’t seem to me to be at all realistic about how people behave—although it might very well be a good model about how people characterize others.”

    My own observation is that a lot of people do often think in such discrete ways — and I find it frustrating to talk with them, because my thinking is typically shades of gray. Something I would like to see as a subject of (good) psychological research is the prevalence of discrete vs continuous (shades of gray) thinking — and lots of related questions, including:

    How differences in thinking cause conflicts.

    How to deal with someone who thinks in a different manner than you do.

    Whether people consistently think in one of these ways, or whether they think differently in different contexts.

    How/why people come to think one way or the other (e.g., the roles of nature and nurture).

    • Some thoughts re my last question:

      I wonder if a really good calculus course has an influence — I remember when in high school I “got” the idea of a limit, as something you could get closer to, but might not ever reach exactly (which is closely tied in with the idea of “no true zeros”). I wonder which came first, the chicken or the egg? Did calculus help me become a continuous thinker, or did I “get” the idea because I was already a continuous thinker? Or maybe I had some predilection to be a continuous thinker, but was not quite there until calculus gave me a little push?

    • Quote from above: “For example, from chapter 2: “A good politician will know who is motivated by greed and who is motivated by larger principles in order to discern how to solicit each one’s vote when it is needed” (p. 17). I can well believe that people think in this way but I don’t buy it: just about everyone is motivated by greed and by larger principles. This sort of discrete thinking doesn’t seem to me to be at all realistic about how people behave—although it might very well be a good model about how people characterize others.”

      I always have trouble with this reasoning (“just about everyone is motivated by greed and by larger principles”), which to me seems roughly to imply that 1) everyone is the same, and 2) we cannot (try to) distinguish between people based on how they (might) act.

      To me it’s similar to sentences like “everyone is biased,” which seems a sub-optimal description of reality and a sub-optimal way of phrasing things in situations where these kinds of thoughts are used and/or expressed (e.g., in a discussion).

      Would you agree with the gist/sentiment of the sentence if it read as follows:

      “A good politician will know who is MORE or LIKELY TO BE motivated by greed and who is MORE or LIKELY TO BE motivated by larger principles in order to discern how to solicit each one’s vote when it is needed.” ?

      • Anon:

        I think just about everyone is motivated by greed and by larger principles, but in different ways and in different proportions. I don’t think everyone is the same, nor do I think it’s impossible to distinguish people in their motivations; I just think it makes sense to do this in a continuous rather than a binary way.

  3. “Falsification is useful not in telling us that a hypothesis is false—we already knew that!—but rather in telling us the directions in which it is lacking”

    You have said something similar before, and whenever you do, it brings this to mind:

    From “Where do we Stand on Maximum Entropy?” by Jaynes:
    “We have here a total, absolute misconception about every point I have been trying to explain above. If Wolf’s data depart significantly from the maximum-entropy distribution… then the proper conclusion is not that maximum entropy methods “ignore the physics” but rather that the maximum entropy method brings out the physics by showing us that another physical constraint exists beyond that used in the calculation.”
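
    For what it’s worth, here is a minimal sketch (my own, not Jaynes’s) of the kind of maximum-entropy calculation the quote refers to: the maximum-entropy distribution over die faces 1–6 subject only to a constraint on the mean. If real dice data depart from this distribution, the Jaynesian reading is that some further physical constraint is operating, not that maximum entropy “ignores the physics.” The root-finding bracket and target means are arbitrary choices.

    import numpy as np
    from scipy.optimize import brentq

    faces = np.arange(1, 7)

    def maxent_dice(target_mean):
        # Maximizing entropy subject to a mean constraint gives an exponential
        # family: p_k proportional to exp(lam * k). Solve for the lam whose
        # implied mean matches the constraint.
        def mean_given(lam):
            w = np.exp(lam * faces)
            return (w * faces).sum() / w.sum()
        lam = brentq(lambda l: mean_given(l) - target_mean, -5, 5)
        w = np.exp(lam * faces)
        return w / w.sum()

    print(maxent_dice(3.5))  # uniform: each face gets probability 1/6
    print(maxent_dice(4.5))  # tilted toward the high faces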

    • Trevor:

      Yes, exactly. Jaynes was one of my major influences in forming my perspective/philosophy of model building and model checking. Reading passages such as the one you quote above was a key step in the development of my understanding.

  4. I am sure some theoretical comparisons are stellar. But most, to me at least, present false dichotomies as well. From my observation, some individuals are better diagnosticians than others, maybe on account of their fluid and crystallized intelligence, regardless of their credentialing.

    Insofar as including any one or two theories is obviously a subjective choice, I would prefer to evaluate theories in depth. That can take many years, I speculate.

    Among the psychology circles on Twitter, I’m surprised how little discussion there is about theories. I lean toward Paul Rozin’s views on psychology, as I have pointed out before on this blog. Ascertaining the quality of insights is not necessarily a linear effort.

    • Maynard Smith said the following of mathematical models:

      “I think it would be a mistake, however, to stick too rigidly to the criterion of falsifiability when judging theories in population biology. For example, Volterra’s equations for the dynamics of predator-prey species are hardly falsifiable. In a sense they are manifestly false, since they make no allowance for age structure, for spatial distribution, or for the many other necessary features of real situations. Their merit is to show us that even a simple model for such an interaction leads to a sustained oscillation, a conclusion that would be hard to reach from purely verbal reasoning” (Evolution and the Theory of Games).
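
      As a minimal sketch of the oscillation Maynard Smith is pointing to (my own illustration; the parameter values and initial conditions are arbitrary), the classic Lotka-Volterra predator-prey equations can be integrated numerically:

      import numpy as np
      from scipy.integrate import solve_ivp

      def lotka_volterra(t, z, a=1.0, b=0.1, c=1.5, d=0.075):
          # Rates of change for prey and predator under the classic model:
          # prey grow but are eaten; predators starve without prey.
          prey, predator = z
          return [a * prey - b * prey * predator,
                  -c * predator + d * prey * predator]

      sol = solve_ivp(lotka_volterra, t_span=(0, 50), y0=[10.0, 5.0], max_step=0.01)
      prey = sol.y[0]
      print(f"prey min/max over the run: {prey.min():.2f} / {prey.max():.2f}")

      The model ignores age structure, space, and everything else, so it is “manifestly false” in Maynard Smith’s sense, yet the sustained boom-and-bust cycle falls out of it immediately, which is the merit he describes.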

      Might a similar point hold for statistical models grounded in evidence?

      What is our question? What have we learned? Has confidence changed? What do we do with this understanding?

  5. It is intriguing to me how similar my thinking is to Andrew’s. I’m reading the articles above. Our values are very similar. I had given up hope until I came to this blog. Now I can connect with such wonderful minds.

    I, however, don’t have a statistics background.

  6. “All the statistical hypotheses we ever make are false. That is, if a hypothesis becomes specific enough to make (probabilistic) predictions, we know that with enough data we will be able to falsify it.”

    Can anyone help me out here in understanding why this is the case? I am not a statistician, but a molecular biology graduate student, and would like to understand the point being made here. I would really appreciate any insight people may have, because this post was really fascinating to me.

    The stumbling blocks to my understanding take a couple of basic routes:

    1.]

    From philosophy, the problem of induction would suggest that this statement cannot be taken as wholly true, i.e., it doesn’t necessarily follow, because as far as we know it could be the case that no amount of data will falsify it. The turkey could expect tomorrow to be like today and every other day it has experienced, and yet the farmer has other plans.

    2.]

    I am not sure what to do about this information. It makes me think of Borges’ On Exactitude in Science, in the sense that any given scientific model is essentially an incomplete map, and so, after enough description is collected, will be shown to be incomplete, and so strictly speaking false. But it seems as if there is already a tacit understanding that the model is a model and not reality. We don’t treat the model as if it were reality; things are deliberately bracketed out, so, at least broadly speaking, it doesn’t seem quite right to call models false just because we know beforehand that they are incomplete. A road map isn’t trying to be a road; it is trying to be a general description of the road. So the fact that the road map isn’t a road doesn’t mean the road map is false.

    3.]

    Thinking along something like Quine’s Web of Belief, a model could be thought of as a set of propositions or ideas or whatever resting on peripheral ideas (say something like first principles, which in biology might be basic descriptions of how certain assays work or things like DNA or genes), which in turn rest on even more peripheral ideas. But there are points at which certain pieces are necessary for the model to work. Like say molecular biology would be pretty screwed if it turned out that DNA didn’t actually exist.

    It seems as if one could line two models up and compare them by looking only at the necessary pieces. If the pieces are mutually contradictory, and you provide sufficient evidence that the world is one of the ways, then it would follow that you could disregard one of the models.

    Neither of the models is true, but only for the reason that a road map isn’t a road; still, by comparing them at the level of what they claim to be, you can make distinctions like: this road map is better, because that one doesn’t show this road as existing, even though it does?

    4.]

    Is this equivalent to saying the following: any hypothesis contains nouns and verbs related in tacitly known ways (all of which could be sketched out in painful detail, with textbooks written on them if we had to), and since it is not only possible but extremely likely that, given enough data, one of the nouns, verbs, or tacit relations referred to in the hypothesis will turn out not to exist or not to be true (say a hypothesis references a species, and when you look closely enough there is no good point of demarcation between two groups, so “species” isn’t an ontologically real category), the hypothesis is therefore false? Or is it saying something else entirely?

    • Charlie:

      You can see my books and articles for dozens of applied statistics examples. In all of them, the models are approximate and, with enough data, could be rejected. Rejecting a statistical model is not the same as saying “DNA didn’t actually exist”; it’s more like saying there are flaws in some very specific mathematical model of the connection between DNA and observed data. These models always have flaws, and we can learn from these flaws; this is one way we improve our models and get better understanding and better prediction.

  7. If person A claims that model B is good/appropriate for a certain aim, for example a specific prediction task, then this claim can be falsified. The thing here is not whether the model is true or false but rather whether it does the job it was employed for. With that, the term “falsification” is still fine, isn’t it?

    • I agree in spirit, but I don’t think this is practical. In every situation, it will be debatable whether a particular model is “good” for the task at hand. So, in principle, every theory is falsifiable, in that someone might find it is not up to the task at hand. That brings us back to Andrew’s statement that all models are false.

      I actually think Mayo’s idea of “severity testing” is a move in the right direction. It is not binary, but some theories are more severely tested than others. I’m still struggling with how to formalize this concept (I didn’t find Mayo’s formal examples that clear or useful), but I do think it is a promising start.
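
        To make that a bit more concrete, here is a minimal sketch of one textbook-style severity calculation (my own illustration, not Mayo’s notation or an example from her book): a one-sided z-test of H0: mu <= 0 with known sigma, where the severity with which the claim “mu > mu1” passes is the probability of getting a result less favorable to that claim than the one observed, computed under mu = mu1. The numbers below are hypothetical.

        import numpy as np
        from scipy.stats import norm

        def severity(xbar_obs, mu1, sigma, n):
            # SEV(mu > mu1) = P(sample mean <= observed sample mean ; mu = mu1)
            se = sigma / np.sqrt(n)
            return norm.cdf((xbar_obs - mu1) / se)

        # Hypothetical setup: sigma = 1, n = 100, observed mean 0.3.
        for mu1 in [0.0, 0.1, 0.2, 0.3, 0.4]:
            print(f"claim mu > {mu1:.1f}: severity = {severity(0.3, mu1, 1.0, 100):.3f}")

        The point is the graded output: weak claims pass with high severity, stronger claims with less, which is one way to see that the idea is not binary.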

      • “In every situation, it will be debatable whether a particular model is “good” for the task at hand.”
        Obviously; however, the debate may lead to specific testable claims, or such claims may be provided by, or requested from, the person who claims that the model is good.

        I’d think that what “severely tested” should mean is somewhat more dependent on the specific situation, background, aims, and interests of those involved than one might think when reading a philosopher’s book; however, as a general concept to strive for, I’m pretty fine with it.

        • > as a general concept to strive for I’m pretty fine with it
          I think almost everyone would be fine with it; it’s what my grade 9 science teacher drove into us: they would distract us during experiments to create errors to discuss with the class.

  8. Popper also thought most scientific hypotheses (even good ones) are false. That’s why he thought the notion of verisimilitude — or closeness to the truth — is very important.

    • The problem with “closeness to the truth” is that in order to make this concept precise, one needs to assume that there is a “truth” to which a model/theory can be “close” in a well-defined sense. As long as we’re talking about pragmatic aims such as prediction, for which we can observe success/“closeness,” this is not too problematic, but whenever science wants to go beyond this, it’s hard.
