What is “explanation”?

“Explanation” is this thing that social scientists (or people in their everyday lives, acting like social scientists) do, where some event X happens and we supply a coherent story that concludes with X. Sometimes we speak of an event as “overdetermined,” when we can think of many plausible stories that all lead to X.

My question today is: what is explanation, in a statistical sense?

To understand why this is a question worth asking at all, compare to prediction. Prediction is another thing that we all do, typically in a qualitative fashion: I think she’s gonna win this struggle, I think he’s probably gonna look for a new job, etc. It’s pretty clear how to map everyday prediction into a statistical framework, and we can think of informal qualitative predictions as approximations to the predictions that could be made by a statistical model (as in the classic work of Meehl and others on clinical vs. statistical prediction).

Fitting “explanation” into a statistical framework is more of a challenge.

I was thinking about this the other day after reading a blog exchange that began with a post by sociologist Fabio Rojas entitled “the argo win easily explained”:

The Academy loves well crafted films that are about actors or acting, especially when actors save the day. These films often beat other films. Example: Shakespeare in Love beats Saving Private Ryan; The King’s Speech beats Black Swan, Inception and Social Network. Bonus: Argo had old Hollywood guys saving the day.

Thomas Basbøll commented, skeptically:

If it is so easy to explain, why didn’t you predict it, Fabio? . . . It’s not like you learned anything new about the nominated films over the past 48 hours besides who actually won. Isn’t this just typical of sociological so-called explanations? Once something has happened, a sociologist can “easily” explain it. If Lincoln had won I suppose that, too, would have been a no-brainer for sociology.

I could see where Basbøll was coming from, but his comment seemed too strong to me, so I responded to the thread:

To be fair, Fabio didn’t say “the argo win easily predicted,” he said “explained.” That’s different. For a social scientist to make a prediction is clear enough, but we also spend a lot of time explaining. (For example, after the 2010 congressional elections, I posted “2010: What happened?”.) Explanation is not the same as prediction but it’s not nothing. For a famous example, Freudian theory explains a lot but does not often predict, and Freudianism has lots of problems, but it is not an empty theory. The fact that Fabio could’ve explained a Lincoln win does not make his Argo explanation empty.

But this got me thinking: what exactly is explanation, from a statistical standpoint? (Over the years I’ve spent a lot of time considering commonsense “practical” ideas such as mixing of Markov chains, checking of model fit, statistical graphics, boundary-avoiding estimates, and storytelling, and placing them in a formal statistical-modeling framework. So I’m used to thinking this way.) Explanation is not prediction (for the reason indicated by Basbøll above), but it’s something.

I think that “explanation,” even in the absence of “prediction,” can be useful in helping us better understand our models. Rojas’s Argo explanation helps him elaborate his implicit theory of the Oscars, essentially constraining his theory as compared to where it was before the awards were announced. In that sense, “explanation” is an approximation to Bayesian updating. What “explanation” does is to align the theory to fit the data, which is comparable to the statistical procedure of restricting the parameters to the zone of high likelihood for the observed data.
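
To make the analogy concrete, here is a minimal sketch of that kind of updating, under loudly stated assumptions: `theta` is a hypothetical parameter (the chance that an acting-themed nominee beats its strongest rival), the prior is flat over a coarse grid, and the single observation is that the acting-themed film won. None of these quantities comes from Rojas's post; they only illustrate how one observed outcome shifts weight toward the parameter values that make it more likely.

```python
def update(prior, likelihood):
    """Return the normalized posterior over a grid of parameter values."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Grid of candidate values for theta, a hypothetical parameter: the chance
# that an acting-themed nominee beats its strongest rival.
grid = [i / 10 for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
prior = [1 / len(grid)] * len(grid)        # flat prior (an assumption)

# One observed outcome: the acting-themed film won, so the likelihood of
# that outcome under each candidate value is theta itself.
posterior = update(prior, grid)

prior_mean = sum(t * p for t, p in zip(grid, prior))
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(f"prior mean {prior_mean:.3f}, posterior mean {posterior_mean:.3f}")
```

The posterior puts more weight on large values of `theta` than the prior did, which is the sense in which the explanation constrains the theory without having predicted anything in advance.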

Prediction is important, it’s essential for model checking, but explaining is another word for inference within a model. Without explanations (including after-the-fact explanations), it would be difficult to understand a model well enough to use it. Another way of putting it is that explanation is a form of consistency check.

What would make all my theorizing relevant here? It would be relevant to social science if it helps us to formulate our explanations in terms of what we have learned from the data: in this case, how are Rojas’s post-Oscars views of the world different from his views last week? If Basbøll is right and Rojas did not forecast the Argo win ahead of time, that’s fine; his explanation will be more valuable to the extent that it articulates (even if only qualitatively) the role of the new information in refining his theories.

I’m curious what Rojas thinks of this, as he’s the one who created that particular explanation. I am sympathetic with Basbøll’s skepticism, but I feel like I get some value from explanations such as Rojas’s (or Freud’s), so I’d like to adapt my philosophy of scientific understanding to allow a role for such explanations, rather than to follow a Popper-like route and dismiss them as meaningless. Much of my effort as a statistician has been devoted to adjusting the foundations to give a place for methods that are evidently (to me) helpful in understanding the world.

P.S. Larry and others note that to really be an “explanation,” a story should be causal. That sounds right to me. A purely predictive explanation would not really feel like an explanation at all. This is an important point that I’d not thought of when writing the post above.


  1. yop says:

    I like your viewpoint but isn’t Bayesian updating about a given model? I think explanation involves the fitting of any plausible stories (models) to events of interest (data), not only pre-existing theories. That is a part of why social explanation has bad press.

    I find your blog post truly enlightening. Explaining is often a synonym of over-fitting and models should be checked with out-of-sample predictions. But shouldn’t we also link these issues with the difference between causal and descriptive inference? The explanation of social phenomena is not only weak because it over-fits the data, but also because it relies on observational data. We can imagine all kinds of confounders behind Rojas’ predictor (movies about actors may be done by people who care more about cinema as an art and who produce better films).

  3. Kaiser says:

    Excellent topic. I feel like explanation is related to causal inference. Prediction, esp. modern methods, is not concerned at all about causality. We’d use any variable however strange if it improves predictive accuracy. Most of the predictive variables are indicators and don’t “explain” any behavior. It is a beast to distill neural networks, random forests, ensemble models, etc. into a “story”.
    Explanations usually use simple models with the intent of modeling causal effects. I say intent because I don’t think causal models are useful if they don’t generate testable predictions. So it’s necessary to collect more data, and iterate. So yes, I think there is a connection to updating.

    • Rahul says:

      In my mind it is a hierarchy:

      “Explanation without prediction(1)” is useful; but less useful than “Prediction without explanation(2)”. The most useful of them all is: “Prediction with explanation(3)”.

      Often, (1) and (2) are stepping stones to (3).

      Yet, IMHO, predictive power of a model is indeed its ultimate test and should always remain the final goal.

      • Richard says:

        I agree that (3) is most useful but you would also not want to understate (1). The theory of evolution is a model that has enormous reach in terms of explaining the patterns of life on earth, but is unable to predict the future trajectory.

        • Richard says:

          The reason, as Jonathan (a different one) points out below, is that there are so many possible future trajectories that it is impossible to gather all of the required input variables to make a correct prediction. But once it has occurred the theory can readily explain the observed outcome.

          • Rahul says:

            But that’s an example where *no* competing theory can predict either. My point is, if there is a theory that can indeed predict well, it does gain some traction over one that cannot.

      • Jeff says:

        Usefulness has nothing to do with it. (1) helps us to understand things when (3) is not available because of complexity. That does not make it “less useful.”

        An explanation of how mass shootings happen, for example, would be enormously useful, even though it could not predict the exact day and hour of the next one.

        • Rahul says:

          Yet, a theory of mass shootings that correctly predicts a statistical picture of the demographics or spatial location of future shooters is far more useful (I think) than a theory that cannot.

  4. Carlos says:

    Yudkowsky is always a little too verbose, but I feel his “technical explanation of technical explanation” deserves a mention in this discussion:

  5. gwern says:

    It’s worth noting that AFAIK, the various betting and prediction markets all expected _Argo_ to win. If so, we know that its victory was indeed predictable because it *was* predicted.

    The reasons were probably related to ‘buzz’ and anonymous comments (for example, that anonymous stream of thought by a director member as he went through the ballot) – I suspect if one were to look at the price histories, _Argo_ would have become a clear favorite a few months ago. However, the subject matter has been known since before the movie was even released…

    So a prediction, to be clearly uncontaminated by hindsight bias and selection bias (everyone who had an equally elegant theory as to why _Lincoln_ was sure to win will now be very quiet about said elegant theory), must be registered far in advance of the Oscars.

  6. Another distinction between prediction and explanation is that if you’re just trying to predict you can include variables that from an explanatory standpoint are almost truisms or tautologies. It’s a completely different type of endeavor to say “movies about grim historical events usually win Oscars” versus “movies that the bookies favor and that won the Golden Globes usually win Oscars.” The latter gives better predictions than the former but it doesn’t really explain anything.

    • Trey Causey says:

      True, uninteresting variables are often good predictors. But I wonder if this is more a case of proximate vs. ultimate predictors. Bookies favor certain movies and those movies win Golden Globes perhaps because they are about grim historical events. I’m just not sure that prediction and explanation are necessarily as distinct as laid out. If the goal of deductive theory is to generate testable hypotheses (which themselves are predictions or, more often, retrodictions), then finding ultimate causes is more important than finding predictors that explain more variance.

  7. ralmond says:

    David Madigan and I struggled with the problem more years ago than I care to remember. David more or less convinced me at the time that the answer was weight of evidence (WOE). We took the idea of the evidence balance sheet from Spiegelhalter and Knill-Jones and made some fancy displays: Madigan, Mosurski, and Almond (1997).

    The weight of evidence has some of the properties that I think you are looking for. In particular, it records the difference in log odds for a hypothesis versus its negation caused by a finding. This naturally gives low weight to common findings, and high weight to rare findings. For example, if you are trying to establish the proposition that a student is at a high level of proficiency, getting an easy item correct will not provide much evidence, but getting a hard item correct will.
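
A rough sketch of that log-odds bookkeeping, for readers who want to see the arithmetic. Everything numeric here is made up for illustration: the probabilities of answering the easy and hard items correctly, the 0.5 prior, and the choice of natural logarithms are assumptions, not values from Madigan, Mosurski, and Almond. The weight of evidence is computed as the log likelihood ratio, which equals the shift from prior to posterior log odds.

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Log likelihood ratio: the shift in the log odds of H caused by finding E."""
    return math.log(p_e_given_h / p_e_given_not_h)

def log_odds(p):
    """Convert a probability to log odds."""
    return math.log(p / (1 - p))

# Hypothetical probabilities of a correct answer, given high proficiency (H)
# versus not high proficiency (not H). These numbers are illustrative only.
easy = weight_of_evidence(0.95, 0.85)   # nearly everyone gets the easy item right
hard = weight_of_evidence(0.60, 0.10)   # few non-proficient students get it right

prior = 0.5                             # assumed prior probability of H
posterior_log_odds = log_odds(prior) + hard
posterior = 1 / (1 + math.exp(-posterior_log_odds))
print(f"easy item WOE {easy:.2f}, hard item WOE {hard:.2f}, posterior {posterior:.2f}")
```

With these assumed numbers the easy item barely moves the log odds, while the hard item moves them a lot, matching the pattern described in the comment.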

    The big drawback of WOE is that it is only defined for binary propositions. Lately I’ve been playing with mutual information as an alternative (although Fisher Information is also closely related). But it is much harder to form the problem cleanly when you are trying to explain something that is non-binary.

  8. ceolaf says:

    I like your distinction between explanation and prediction. I’d like to add to that.

    Yes, I think that a model is a form of explanation. But a model is merely one of many possible explanations. I think we need to remember that we live in a world of multiple possible explanations.

    However, we often want more than possible explanation. We often even want more than a plausible explanation. We sometimes want The Explanation.

    However, we live in a world of many possible outcomes. We end up not merely wanting an explanation of each of those possible outcomes. We want more.

    We often want a meta-explanation for why the explanation that won out won out. Yes, that might be the explanation for Argo’s win, but it doesn’t explain why this time the explanation for Lincoln winning didn’t prove more powerful. We want to know why this explanation and not that explanation.

    We want to know why this model and not that model.

    Is goodness of fit a sufficient meta-explanation? Isn’t goodness of fit dependent upon which data you have? Do we sometimes select a less well fitting model for some other reason?

    When is goodness of fit a good enough meta-explanation, and when do we go by substantive knowledge of the phenomena or relevant theory? (Is that meta-meta-explanation?) (I’m getting into epistemology too much, right?)

    I’ll try to find my tail.

    What is “explanation?” I don’t think it is just a model. I think it is how/why we select (or build) the models we build, be they statistical or not.

    • zbicyclist says:

      “We often even want more than a plausible explanation. We sometimes want The Explanation.”

      I was going to write that explanations are cousins to religious apologia — something happened (or there is some religious belief) that needs to be justified / explained / explained away.

      Religious apologia aren’t so much a search for truth as for debating points (or, the avoidance of having some evidence collapse your entire world view).

      This week, for example, we have explanations for how Obama / House Republicans / Senate Democrats / American voters are each the “explanation” for the sequester debacle (including, of course, the explanation that the media is to blame for overblowing relatively minor cuts). These are really debaters’ arguments aimed at explaining how it is completely plausible, maybe even inevitable, that your side is right and provides The Explanation.

      If my apologia point seems obscure (how many of this blog’s readers studied theology?), perhaps Nate Silver said it better in his book, when he talks about pundits trying to explain how everything was inevitable from their point of view (even if they had predicted the opposite in last week’s show).

  9. Greg says:

    I like Jim Woodward’s view that an account explains to the degree that it allows one to answer questions about what would have happened if things had been different. By that standard, I don’t think Rojas has given us an intelligible explanation because there’s no single way to make sense of the idea of what would have happened if Argo hadn’t been about actors.

    I would add to Andrew’s point about explanation as model fitting that the model needs to admit a causal interpretation.

  10. revo11 says:

    I think of retrospective “explanations” (including mathematical models) as a form of hypothesis generation. An explanation that hasn’t been tested by prediction is basically an untested hypothesis of causation. It’s not worthless, but correct ones will be sparse and most of them will be wrong.

    Incidentally, isn’t what you’re describing more an issue of “underdetermination”? I tend to think of “overdetermination” and “underdetermination” in the mathematical sense, where an “underdetermined” system has multiple solutions (i.e. explanations) consistent with the constraints imposed by the data.

  11. I’m struck by what seems to me to be the self-evident wrongness of the critic’s implied claim that “if someone can know with near-certainty how something occurred, then they should have been able to predict it”. But this is clearly incompatible with a non-deterministic universe; even just one low-probability causal event can be known with certainty to have occurred but could not be expected to be predicted. This critic would perhaps not include such cases. But many outcomes we’re interested in are the result of causal chains; and in those cases you could have every particular causal relationship in the chain be something with medium-to-high probability, yet the ultimate outcome, as the product of those, is something unlikely. In such a case, we might be able to both forensically determine exactly each causal relationship in the chain and find each one of those more likely than not (and so this explanation would be extremely convincing and not at all like some freakish turn-of-events) and yet no one would claim that this sequence and the outcome could have been predicted.
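
The arithmetic behind this point can be sketched in a few lines. The chain below is hypothetical: six causal links, each assigned a made-up probability of 0.7, and the links are assumed independent so that their probabilities simply multiply.

```python
import math

# Hypothetical chain: six causal links, each more likely than not,
# assumed independent so the probabilities multiply.
link_probabilities = [0.7] * 6

chain_probability = math.prod(link_probabilities)
print(f"each link: 0.70, whole chain: {chain_probability:.3f}")
```

Under these assumptions every individual step is more likely than not, yet the chain as a whole has well under an even chance of occurring, so a convincing step-by-step explanation after the fact does not imply the outcome was predictable beforehand.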

    A lot of things in life are like that. In a game with relatively simple rules and relatively simple physics such as billiards, we can analyse a game and say with certainty how it is that one player won and the other didn’t. That doesn’t mean that the outcome should have been predicted.

    Philosophically, it’s very interesting to consider why explanations and predictions are not time-reversed versions of each other. At the deepest level of abstraction, the reason for this is either that the universe is non-deterministic or that the universe is deterministic but that all available analytical tools are insufficient for perfect descriptions and therefore use only approximations.

    But this is more about human psychology than anything else. It’s not uncommon for people to feel as this guy does. A notable example is the WTC collapse. The outcome seems to most everyone to be astounding, so difficult to believe that it really happened. And so especially in that case, a clear and confident explanation strikes some as being unconvincing given that no one predicted such a thing could have happened. The lack of prediction is seen as damning of the explanation, and especially so because the outcome was so big and surprising. But this is very much an example of what I described earlier: each crucial causal relationship in the chain was quite likely on its own, and each of those relationships is extremely well understood. But the entire chain is just one among a very large number of possible chains, almost none of which culminate in the buildings’ collapse. Prior to the disaster, it was quite reasonable that when architects and engineers and public safety officials all considered the possibility of an airplane strike, they focused on small planes and not a large passenger jet. And when they did consider a large passenger jet, they didn’t consider one striking the building at extreme speeds, far outside its flight profile (at least one of the two jets was moving at speeds where the wings literally could have been ripped off). And they didn’t consider an impact from a jet, at such speeds, fully laden with fuel. And they didn’t consider such an impact directed to the building’s core. But all of these things are required to get a large amount of jet fuel to burn around the steel core that supported the floors. And someone would have to consider such an event for it to occur to them that melting that core would likely cause a floor to collapse onto the one below, which would cause a chain-reaction pancaking.

    Each of these elements is well-understood and, alone, reasonably high-probability for the particular event upon which the chain relies. So the explanation is extremely clear and, in retrospect, the chain and its conclusion are “determined”. But it’s entirely reasonable that almost no one ever considered such an outcome. And so the lack of prediction in no way reflects upon the reliability of the explanation.

    And yet many people feel exactly that this is the case. The role that emotion is playing relative to disturbing or amazing events is almost certainly very important here. Not so much with Oscar predictions.

    But I think that this intuitive desire to see explanations and predictions as being equivalent has everything to do with something I talk about often: that human cognition is essentially teleological. Because our cognition is largely socially oriented to interpersonal interaction, our primary model-building faculty is one of building models of self-motivated agents, other individual beings like ourselves who we presume have motivations and goals and methods like our own. Reasoning backward from a presumed goal is a very powerful heuristic for predicting the behavior of agents who have motivations and goals that are well-understood. It’s an efficient solution to the problem. So it’s our chief mechanism for comprehension of most everything. That’s why children ask “why” and not “how”. It’s why it was so difficult to move from pre-scientific teleological natural science. (And note how I wrote that sentence — which is very much about “how”, yet I used the word “why”. But that’s how we speak.) Our strongest impulse for comprehension is to ask “for what purpose, for what goal, did this thing happen?”

    This implicit worldview is the inverse of the first-cause deterministic universe. In the teleological universe, there’s the final cause and everything is determined backwards from it. With our predisposition to this mode of thought, then just as in the clockwork deterministic universe an explanation is equivalent to a prediction. If we know all causal relationships and the result, then we should have been able to know this sequence and outcome beforehand. This is why people distrust confident and clear explanations that don’t have corresponding predictions — they intuitively believe that such explanations are implicitly asserting the corresponding predictions, even when they’re not. (Sometimes they are, but often they’re not.)

  12. Jonathan (a different one) says:

    It was really easy to explain the Titanic sinking, but almost impossible to predict. Same with the Challenger. Or the WTC. The problem is that predictive modeling, in any sufficiently complicated problem, always leaves out huge portions of the inputs and solves a simpler problem. The Titanic, as modeled, couldn’t sink. But the model was wrong. Once the damn thing sinks you can figure out why the model was wrong. It’s not literally impossible in these cases to make the right model — the one that predicts catastrophe in these events — but the possible space of every catastrophe that can happen is so large that the probability of properly modeling is small just from the curse of dimensionality.

    My day job is to explain, for the most part, not predict. (Or, more correctly, to use explanatory models to predict the but-for past under hypothetical manipulation of inputs.) But I know better than to use my models to predict.

  13. I see no strong difference between explanation and prediction. If a model fits the data well (including a penalty for large numbers of degrees of freedom in the most rigorous case), then rationally, we assign it a high probability. Logically, it makes no difference if the data arrived before or after the theory. What kind of theory would you get without some prior set of events to base it on, anyway?

  14. Peter Dorman says:

    This one seems so obvious to me that I must be missing something. Suppose there is a large set of causal factors, call it A+, that collectively determines a given outcome. Suppose there is a subset, call it A-, that we observe prior to the outcome. We predict on the basis of a model that tells us how to incorporate A-. After the event we know additional factors that bring us closer to A+. Some were observed during the time interval in which the outcome was observed. Others can be inferred from the fact that the outcome eventuated (and perhaps in the way it eventuated or some other quality). Thus we can have a richer ex post model to explain from. That’s hindsight for you.

    So say I have observed two patterns in past Oscars for best picture: some celebrate actors and acting, others the liberal virtue of moral complexity. I have information about the nominees for best picture and their content, and they tell me that Argo and Lincoln should be the two front-runners. (Actually, ZD30 is also about moral complexity, or wants to be, but let’s keep it simple.) My theory may or may not help me predict which one of the two it will be. Ex post, I can infer from the selection of Argo that the first pattern must have predominated in deliberations by the voters (not observable directly). This extra information permits me to explain.


  15. Geoff Sheean says:

    Isn’t explanation related to Peirce’s idea of abduction, concerning explanatory/plausible hypotheses for an observation? In medicine, this will often lead to testing a hypothesis, which is more straightforwardly predictive in a Bayesian sense. Since probability is almost always involved (very, very few tests have sensitivity and specificity of 100%), this cannot be considered hypothetico-deductive.

  16. Steve Sailer says:

    Let me point out once again that what the world loves to talk about are things that are inherently hard to predict, such as which way Oscar voters will lean between two very Oscarish movies.

    As an example, I’ll make up an explanation for Argo over Lincoln: voters reasoned, “The best thing about Lincoln is the lead performance, so we’ll give Daniel Day-Lewis the Best Actor Oscar for that, but the rest of Lincoln was a little dull. The worst thing about Argo is Ben Affleck’s lead performance, so we’ll punish Ben Affleck twice, by not nominating him for Best Actor and not even nominating him for Best Director because he hired himself to play the lead. But that’s overkill, because Argo is still an exciting, intelligent movie, and we’d like to encourage more such movies for grown-ups to be made, so we’ll let Ben get up on stage as a Best Producer.” Now, that’s a pretty fun explanation, but I have no idea if it is true about the recent past or how to test it, and it seems kind of unlikely to be a good guide to making predictions about the future because it relies on an idiosyncratic set of circumstances.

    The bigger picture is that we’re pretty good at predicting what movies won’t be Best Picture. Hundreds of movies were released last year, and the vast majority weren’t even nominated for Best Picture, and practically nobody ever expected them to be Best Picture. This includes “The Avengers,” which was, by any objective standard, an astonishing feat of movie-making. But few were surprised that The Avengers didn’t win Best Picture (or even be one of the nine nominees), because we already possess, semi-intuitively, quite accurate prediction models of what kind of movies are Best Picture timber, and The Avengers fit that model.

    No, what interests human beings are things that are hard to predict because they are pretty close to toss-ups.

    So, we end up doing a lot of explaining after the fact about why Argo beat Lincoln because we aren’t very interested in talking about things we are good at predicting.

    • Andrew says:


      Yes, this is related to the statistical point that in binary classification, you get no information from the perfect predictions, and no information from tossups. What are most informative are the predictions with odds around 2:1, where more information can make a difference.

      The example I always use is presidential elections and my annoyance at people who claim to have predicted 10 out of the past 15 presidential elections, or whatever. Realistically I don’t think a forecaster should get credit for predicting the ties of 1960, 1968, and 2000, or the landslides of 1964 and 1984.

      • Steve Sailer says:

        Speaking of predictions and explanations, I’ve been interested in the topic of school achievement test scores since 1972. If I’ve read one article per workday on this subject since then, I’m up to about 10,000 articles on the topic. I noticed back when I was 13 in 1972 that the usual pattern is that whites and Asians do better on average than blacks and Hispanics. So, that’s the prediction I’ve been making ever since.

        And over those 10,000 or so subsequent articles, my predictions have turned out to be right maybe 99.5% to 99.% of the time.

        I’m fascinated by finding explanations for the small number of cases where my prediction didn’t turn out to be right. For example, St. Louis, being accessible by ship from the Caribbean, long had a small but elite Hispanic community of Latin American merchants and professionals, so the white-Hispanic test score gap in Missouri was long smaller than in most other states.

        Now, an accuracy rate of 99+% for four decades in the social sciences sounds pretty good, especially in a field that is crucial for understanding all sorts of policy questions: education, affirmative action, immigration and so forth. But, my high accuracy in prediction just leads to suspicions that my explanations must be unthinkable, so let’s assume the future can’t be like the past. For example, Republican strategist Karl Rove has long implied that Hispanic voters will, Real Soon Now, have lots of wealth (to question that would be racist) and therefore will be worried about, say, the Death Tax, and therefore will vote Republican Real Soon Now.

        So far, I’ve been right and Karl has been wrong, but predictive accuracy is less in demand than explanatory desirability.

        • Nick Cox says:

          I observed when I was 3 that on average women are shorter than men, and I’ve been right almost 100% of the time since. Occasionally I notice women’s basketball teams. I don’t take any credit for this, because nature arranged it that way, and other people noticed first, as I did find out quickly.

          Sure, that’s a cheap retort, but precisely how is Steve’s example different? As everybody’s aware, we can argue about whether society arranges it that way, or other things are involved too, and what if anything should be done; and all of that is important. But in the limit, anybody can claim success for “predicting” widely-known facts.

  17. Thomas says:

    I know that my view is old-fashioned, but here goes… When someone like me argues that there is an essential, logical, connection between explanation and prediction, we don’t mean that if anything can be explained then it could have been predicted. Very often we don’t seek the sort of information we’d need to make the prediction until after the surprising event has happened. What I’m saying is: *if* we had known X before E took place, then *we would have been* able to predict E.

    Some of the examples suggested here fit precisely this mold. If we had known that the O-rings were frozen, we could have predicted the Challenger would explode. (Indeed, much of the scandal in that case had to do with the fact that the accident could have been predicted and prevented.) In the case of the WTC, we’d need to know (a) that planes would hit and (b) much more precisely what happens to steel under the ensuing fire. It took that event for us to learn just how dangerous large fires in steel-framed buildings can be. But do note that if there’s another 9/11, engineers (and fire-fighters) better well *predict* that the building is going to collapse within about an hour and take the necessary precautions. That prediction is connected to the explanation we now have.

    It’s important to emphasize that I’m talking about what I think we *should* mean by “explanation”, not what in fact counts as explanation in ordinary life. If your inquiry doesn’t yield information that *if you had had it ahead of time* would have let you predict the explanandum then you simply haven’t yet explained the event. You’ve *offered* an explanation, but there must be more to it.

    It’s perfectly OK that there’s no possible way you could have known that two planes would have been hijacked on a particular day. It’s even OK that no-one really understood how the steel in the WTC would respond to the fires. The point is that *if* they had known those things they would have predicted the collapses. That’s exactly the sense in which those collapses can be considered “explained” today.

    One last thing. If you can only predict something with, say, 80% certainty, that’s fine too. Just grant me that you only have 80% of the full explanation. If you knew a little more you’d be able to predict it with 90% certainty. Etc. Since we’re talking about events that have already happened, we’d need to get to 100% certainty, i.e., a certainty commensurate with the reality, before we would say the event has been explained “fully”. Yes, we’re often satisfied with less. The connection between explanation and prediction, however, remains.

  18. Larry Wasserman says:

    I would say that explanation is a causal DAG

    • Andrew says:


      Good point. I noted in P.S. above.

      • Thomas says:

        I don’t understand that P.S. It is precisely a causal explanation that is predictive. If A causes B, then you can predict B from A.

        • Andrew says:


          Yes, if A causes B then you can predict B from A. But the converse is not true. You can have a predictive model that is not causal. But then it’s not much of an explanation, at least not by itself.

          • Thomas says:

            I agree with that. Nate Silver’s model of election outcome was built, as I understand it, on polling data. He also predicted Argo, based on the awards the films were given earlier in the year. In both cases the result is predicted without being explained. For me predictive power is the standard to which one can hold explanations. Since prediction is useful in itself, I don’t take explanatory power to be a requirement of predictive models.

      • Dan says:

        Hey, why does Larry get credit for noting causality is important? He’s about the last commenter of 20 or so, most of whom mention causality.

      • Fernando says:

        I thought this discussion was all over the place until I got here, and read the P.S.

        The beauty of simplicity.

    • Paul says:

      Is the “A” required?
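
The point made in this thread, that a model can predict B from A without A causing B, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variables `z`, `a`, and `b`, the noise levels, and the helper names `pearson` and `simulate` are all hypothetical and do not come from any commenter. Here a hidden common cause `z` drives both `a` and `b`, loosely like a film's underlying appeal driving both earlier awards and the final result. Observing `a` helps predict `b`, but setting `a` from outside (an intervention) leaves `b` unchanged.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, intervene_on_a=False, seed=0):
    """Draw n pairs (a, b) from a toy model with a hidden common cause z.

    Hypothetical structure: z -> a and z -> b, with no arrow from a to b.
    With intervene_on_a=True, a is assigned from outside and ignores z.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden common cause
        if intervene_on_a:
            a = rng.gauss(0, 1)          # a is set externally, cut off from z
        else:
            a = z + rng.gauss(0, 0.5)    # a is a noisy reading of z
        b = z + rng.gauss(0, 0.5)        # b depends on z only, never on a
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


# Observational data: a is a good predictor of b.
observed_corr = pearson(*simulate(20000))

# Interventional data: forcing a has no effect on b, so the association vanishes.
intervened_corr = pearson(*simulate(20000, intervene_on_a=True))

print(f"correlation when a is observed:   {observed_corr:.2f}")
print(f"correlation when a is intervened: {intervened_corr:.2f}")
```

In this toy setup the observational correlation is strong while the interventional one is close to zero, which is one way to read the claim that a predictive model "is not much of an explanation, at least not by itself".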

  19. Ag says:

    I am confused by this meaning of overdetermined. In linear inverse problems you would call such a situation underdetermined.
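
The linear-algebra usage referred to here can be made concrete with a tiny example. This is a sketch with made-up numbers: the systems `A_under`, `A_over` and the helpers `residuals` and `least_squares_2` are hypothetical. With fewer equations than unknowns (underdetermined), many different answers fit the data exactly, which resembles "many plausible stories that all lead to X". With more equations than unknowns (overdetermined), there is usually no exact answer at all, and one settles for a least-squares compromise.

```python
def residuals(A, b, x):
    """Return A @ x - b for a small system stored as nested lists."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
        for row, b_i in zip(A, b)
    ]


def least_squares_2(A, b):
    """Least-squares solution for two unknowns via the 2x2 normal equations."""
    s11 = sum(r[0] * r[0] for r in A)
    s12 = sum(r[0] * r[1] for r in A)
    s22 = sum(r[1] * r[1] for r in A)
    t1 = sum(r[0] * b_i for r, b_i in zip(A, b))
    t2 = sum(r[1] * b_i for r, b_i in zip(A, b))
    det = s11 * s22 - s12 * s12
    # Cramer's rule on (A^T A) x = A^T b
    return [(t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]


# Underdetermined: one equation (x + y = 2), two unknowns.
A_under = [[1.0, 1.0]]
b_under = [2.0]
candidates = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
under_fits = [
    all(abs(r) < 1e-12 for r in residuals(A_under, b_under, x))
    for x in candidates
]
print("every candidate fits the underdetermined system:", all(under_fits))

# Overdetermined: three equations, two unknowns, and they conflict
# (x = 1, y = 1, but x + y = 3).
A_over = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_over = [1.0, 1.0, 3.0]
x_ls = least_squares_2(A_over, b_over)
print("least-squares compromise:", x_ls)
print("leftover residuals:", residuals(A_over, b_over, x_ls))
```

Under these assumptions, the everyday sense of "overdetermined" in the post (several sufficient stories) is closer to the underdetermined case here, where the data cannot pick one answer out of many.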

  20. Nick Cox says:

    A bundle of various simple-minded comments:

    Explanation is clearly not a technical word, and there are lots of interpretations out there, so trying to seize on one form of explanation and identifying it as the best or strongest form — let alone the correct or true sense — is unlikely to work except among some small set of like-minded adults in exactly the same sub-field.

    I vote for working with causation too to the extent possible, but you just changed the question to what is causation?

    Often explanations should pay attention to purposes or goals, and for organisms and even some machines, not just people. No doubt it could be argued that causation can cover that too, but (in statistical terms) your variables have to record purposes too, at least indirectly.

    I’d wonder about the extent to which explanations by researchers match the explanations sought or entertained by people ordinarily, including researchers off duty. Many of us are highly satisfied by histories; indeed many, perhaps most, historians seem to reject cause as a naive or useless category. Also, “understanding” that need not (cannot?) be formalised is what gets us through more of our daily lives than “explanation”, I suggest.

  21. Christian Hennig says:

    Statistical models are about repetition. Sometimes bland repetition (i.i.d. models), sometimes more contrived (several repeating influences work together in a repeatable fashion although the combination of values that we see in a specific situation may not have occurred before).
    But this is an idealisation and how statistical models relate to explanation and prediction may depend strongly on how well the “repetition”-idealisation works.
    Whatever happens may very well be the result of a combination of events that is unique (or happens for the first time) and has an original element in the way the events combine, so there is no way to predict it with models based on repetition. One may still explain it (or at least try). For a statistician this may mean coming up with a new repetition-based model, assuming that there is now a condition enabling repetition that wasn’t there before, or wasn’t known or taken into account before; such a model could be tested by predicting the next ten situations. However, such a model is not the only thing that counts as explanation, and there is not much sample size for setting up and checking the model in the first place (although intelligence may substitute for some of that).
    Life didn’t sign a contract to be friendly to statisticians.

  22. Paul says:

    Popper did not dismiss ‘explanation’. He did argue that there were no ‘ultimate’ explanations, but at the same time he clearly recognized the role of explanation in science (from Objective Knowledge, 1979):

    “I reject the idea of an ultimate explanation: I maintain that every explanation may be further explained, by a theory or conjecture of a higher degree of universality. There can be no explanation which is not in need of a further explanation…”

    “By choosing explanations in terms of universal laws of nature, we offer a solution to precisely this last (Platonic) problem. For we conceive all individual things, and all singular facts, to be subject to these laws. The laws (which in their turn are in need of further explanation) thus explain regularities or similarities of individual things or singular facts or events.”

    Quantum computation physicist David Deutsch has written a terrific book on explanation — The Beginning of Infinity: Explanations That Transform the World — in which he gives Popper a lot of credit. Here’s a wonderful 16 minute talk by Deutsch on explaining ‘explanation’ (in which he discusses statistics a bit).

  23. Eric Rasmusen says:

    As Nick Cox said, if you say explanation is about causation, you have to tell us what causation means.

    I really like Andrew’s statement that:

    “Prediction is important, it’s essential for model checking, but explaining is another word for inference within a model.”

    I’d go further and suggest that maybe explanation is another word for inference within a population. It’s really useful to have a simple relationship between X and Y. We can explain when full moons have occurred by noting that every 30 days a full moon occurs. That explanation doesn’t help explain other things, but it’s a good way to organize the data on past full moons. Is it causal? That depends on what you mean by causal.

  24. Thomas says:

    Yes, if we ask, “Why was there a full moon on May 6?” the answer (which is an explanation) is “There’s a full moon every thirty days and the previous full moon was on April 6.” The model explains the exact date of the full moon, but that’s about all, and the regularity (and I would mean this by “causal”) causes the date. Of course the model models a natural process that is cyclical (an orbit) and if we ask, “Why is the moon sometimes (i.e., not always and not never) full?” we would need to invoke that model to explain it. It would yield lots of predictions too. If it didn’t, it wouldn’t explain anything. Interestingly, that model would not have to yield exact dates. It might only say, “Over a one-month period, the moon wanes and waxes.”

  25. […] relevant to Brian’s recent posts on prediction: statistician Andrew Gelman asks what “explanation” is in a statistical sense, and why we might care about […]

  26. alex says:

    I don’t think causation is needed for explanation. What if two things are interrelated, either structurally or because they both cause each other, or select each other? I can explain why a = pi r^2, but neither a nor r causes the other.