Thinking like a statistician (continuously) rather than like a civilian (discretely)

John Cook writes:

When I hear someone say “personalized medicine” I want to ask “as opposed to what?”

All medicine is personalized. If you are in an emergency room with a broken leg and the person next to you is lapsing into a diabetic coma, the two of you will be treated differently.

The aim of personalized medicine is to increase the degree of personalization, not to introduce personalization. . . .

This to me is a statistical way of thinking, to change an “Is it or isn’t it?” question into a “How much?” question. This distinction arises in many settings but particularly in discussions of causal inference, for example here and here, where I use the “statistical thinking” approach of imagining everything as being on some continuous scale, in contrast to computer scientist Elias Bareinboim and psychology researcher Steven Sloman, both of whom prefer what might be called the “civilian” or “common sense” idea that effects are either real or not, or that certain data can or can’t be combined, etc.

My preference for continuous models is closely connected to the partial pooling of Bayesian inference, but I don’t think I have this attitude because I’m a Bayesian. [Hey, look, when you’re not paying attention, you slip into non-statistical discrete thinking! — ed. Yup, guilty as charged. — AG.] To put this more formally, I think that my training and experience with Bayesian methods has reinforced my preference for continuity, but in turn my taste in modeling has affected what methods I use. [There you go again! — ed. I know, I know, I can’t help it, thinking like a human. — AG.] After all, there are lots of discrete Bayesian models out there (Bayes factors, etc.) and I don’t like them—in fact, I have a problem with their discreteness.
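
To make the partial pooling mentioned above concrete, here is a minimal sketch. It assumes the simplest textbook setup, a normal model with known standard errors, which is not taken from the post. The function name, argument names, and numbers are all hypothetical. The point is that the amount of pooling is a continuous quantity, with no pooling and complete pooling as the two limiting cases.

```python
def partial_pool(local_mean, local_se, pooled_mean, between_sd):
    """Precision-weighted compromise between a group's own estimate
    and the pooled mean (normal model, variances assumed known).

    between_sd must be positive; it is the assumed spread of true
    group effects around the pooled mean.
    """
    w_local = 1.0 / local_se ** 2      # precision of the group's own data
    w_pooled = 1.0 / between_sd ** 2   # precision contributed by the population
    return (w_local * local_mean + w_pooled * pooled_mean) / (w_local + w_pooled)


# Hypothetical numbers: one group's estimate is 10, the pooled mean is 4.
halfway = partial_pool(10.0, 2.0, 4.0, between_sd=2.0)            # equal weights
no_pooling = partial_pool(10.0, 2.0, 4.0, between_sd=float("inf"))
near_complete = partial_pool(10.0, 2.0, 4.0, between_sd=0.02)
```

As between_sd grows, the result moves toward the group's own estimate (no pooling). As it shrinks toward zero, the result moves toward the pooled mean (complete pooling). Everything in between is partial pooling.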

Another example is the use of a numerical measure rather than a yes/no summary (for example, a depression inventory scale rather than a cutoff yielding the distinction “is or is not depressed”).

Or consider decision making. Lots of theory and evidence, from Paul Meehl onward, suggests that people tend to think lexicographically (making decisions by first considering factor A, then using factor B to break the tie if necessary, then using factor C to break the tie after that, and so on) rather than continuously (for example, constructing a numerical weighted average of A, B, C, etc.). Sure, lexicographic rules are clean and easy to understand, and there are settings where they can approximate or even outperform a weighted average, but we can also get the reverse, a lexicographic rule that’s a complicated mess (see, for example, Figure 6 of this article).
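
The contrast between the two kinds of rule can be sketched in a few lines. This is only an illustration: the options, the factor scores, and the weights are all made up, and the function names are hypothetical.

```python
# Hypothetical options scored on three factors A, B, C (higher is better).
options = {
    "option_1": (7, 2, 1),
    "option_2": (6, 9, 9),
    "option_3": (7, 1, 8),
}


def lexicographic_choice(options):
    """Pick the best option on factor A; use B, then C, only to break ties."""
    # Python compares tuples element by element, which is a lexicographic rule.
    return max(options, key=lambda name: options[name])


def weighted_choice(options, weights=(0.5, 0.3, 0.2)):
    """Pick the option with the largest weighted average of A, B, C."""
    def score(name):
        return sum(w * x for w, x in zip(weights, options[name]))
    return max(options, key=score)


lexicographic_pick = lexicographic_choice(options)
weighted_pick = weighted_choice(options)
```

With these made-up scores the lexicographic rule picks option_1, which wins narrowly on A and on the B tie-break. The weighted average picks option_2, which is slightly worse on A but much better on B and C.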

And yet another example is our acceptance of uncertainty. One of the big themes of statistics is that we should be more comfortable admitting what we don’t know, and one of the big problems with many statistical methods as they are applied in practice is that they are taken as a way of denying uncertainty. For example, you conduct an experiment, analyze your data, and conclude the results are statistically significant, or not. The implied (and often explicitly stated) conclusion is that the effect is real, or it is not.

To put it another way, I have two problems with the formulation of statistical tests and conclusions as “true positive,” “true negative,” “false positive,” “false negative.” Here are my problems:

1. “true”/“false”: In almost all cases of interest, I don’t think the underlying claim is true or false (at least, not in a way that can be directly mapped into a particular statistical model of zero effect, as is generally done).

2. “positive”/“negative”: To me, this one’s the biggie. Even if you were to accept the idea that the null hypothesis might be true, I don’t think it’s a good idea to summarize scientific conclusions in this yes/no, significant/not-significant way. Sure, sometimes you really have to make a decision (apply policy A or policy B), but that’s a decision problem. At the inferential stage, I’d prefer to acknowledge my uncertainty.
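
Here is a small sketch of what the yes/no summary throws away. It assumes a normal approximation and a conventional 95% interval. The two studies, their numbers, and the function name are hypothetical.

```python
from statistics import NormalDist


def summarize(estimate, std_error, level=0.95):
    """Return the estimate, an approximate interval, and the yes/no label.

    Uses a normal approximation; 'significant' means the interval excludes zero.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    significant = not (lower <= 0.0 <= upper)
    return {"estimate": estimate, "interval": (lower, upper), "significant": significant}


# Two hypothetical studies with almost the same estimate and the same standard error.
study_a = summarize(estimate=2.0, std_error=1.0)
study_b = summarize(estimate=1.9, std_error=1.0)
```

The two estimates differ by a tenth of a standard error, yet study_a is labeled significant and study_b is not. Reporting the estimate and interval keeps that near-equivalence visible; the discrete label hides it.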

P.S. See also p.76 of this article.

28 thoughts on “Thinking like a statistician (continuously) rather than like a civilian (discretely)”

  1. But the insistence on continuity may have consequences that were not intended: it is more difficult to make flexible continuous models, so we often (too often?) end up with linear models. Tree models in regression were invented to combat this linearity! And there you have it: tree models are discrete! It is easier to avoid linearity with discrete models.

    There surely is a compromise somewhere, but where? Additive models? D. Cox argued against them, saying that interactions are more important …

  2. In thermodynamics physicists don’t control the micro-state of a gas directly. They manipulate macro-variables like pressure and temperature. Only indirectly through them do they affect the micro-state.

    The same is true for much or most of the social and life sciences. A doctor doesn’t control the micro-state of their patient. They control macro-variables like “the amount of drug X in their body” and hope that affects some other macro-variable (their blood pressure). A doctor prescribing a drug only controls the micro-state indirectly.

    If that’s the situation you’re in, then asking “did macro-variable A cause macro-variable B” and treating it as a yes or no question is going to be highly inappropriate. In that case do like Gelman does.

    On the other hand, if you can control the “macro-state” directly (for example cutting a patient’s spinal cord) then you’re going to want to ask some yes/no questions about cause/effect just like a naive civilian would.

    P.S. All civilians are naive.

    P.P.S. In thermodynamics nobody ever seemed to care about this issue. The physicists just went ahead and found stable relationships between the macro-variables and then used them to make accurate predictions.

  3. I like the continuity that you are talking about but I have trouble conceptualizing how this can be applied to some circumstances.

    I can see that if you are comparing schools, you could envisage a particular school being a realisation of a continuum of schools. However, in some cases I struggle to see the continuum – for example, if I was comparing the effect of two distinct chemicals for a particular effect, these are distinct. While it might be possible to envisage some representation of the chemical properties being a continuum, it would be impossible to create compounds with any given set of properties. The discrete nature of chemistry would mean that some values are impossible to realise.

    Am I missing the point here?

  4. Isn’t how we look at solving problems a problem in itself? I mean that the difference between discrete and continuous thinking may be continuous.

  5. This particular universe is doomed to be discrete, but our representations of it (models) are doomed to be continuous :-(

    We can re-represent these models (our representations) as being discrete – for simplicity’s sake – but that is just a convenience.

  6. Sadly, one of the more important medical outcomes (death) is a relatively discrete state. Okay, there are some states of “alive” that are very sub-optimal. But, generally speaking, living or dead is a binary choice of state. :-)

    • What’s estimated when death is the outcome of interest is typically the probability of death, conditional on X.

      In that case, the discrete/continuous alternatives are to either frame a study in terms of “is X associated with death?” or as a direct estimate of the dependence of the probability of death on X. The latter is the better approach.

    • Medically, death is assessed by deciding whether certain (non-negative) continuous-valued measurements (such as EEG measurements aggregated over a finite time period) are zero (or close enough to zero to be called zero). That’s a continuous question.

  7. The situation you’re expressing sounds similar to that expressed by system dynamicists (Jay Forrester, John Sterman, and many others). That field usually takes a deliberately continuous view of a problem even when the underlying situation may have discrete characteristics. The thinking processes used in system dynamics come largely out of continuous feedback control theory, and the goal is (usually) insight, not numerical prediction.

    • I think you’re describing a separate issue, although there are certainly cases when things get blurry (no pun intended).

      There’s the issue of problem representation and then there’s the issue of what’s being estimated (which can include parameter estimation, model selection statistics, p-values, etc. as special cases). I think the problem here is with respect to discretization of “what’s being estimated” – it should be continuous even if aspects of the problem representation are discrete (outcomes, predictors, latent variables).

      The system dynamicists are imposing a continuous representation of problems that are often inherently discrete (for example, the number of molecules of a specific enzyme in a cell). Here the discretization is with respect to the problem representation. This is sometimes an acceptable approximation but often not, particularly when the number of particles is small or when uncertainty in the model prediction needs to be quantified.

      Another way of looking at this is to ask whether you’re losing useful information about nature. Discretizing an effect into true/false throws away information. Likewise, when a continuous dynamical system is used to model a discrete process, that throws away information as well.

  8. Are you opposed to discrete presentation of continuous models? Don’t most non-Bayesian methods also use continuous models? I’m thinking about this post related to your post about Nate Silver’s too-precise probability estimates. In that post, you seem to be advocating discretizing the probabilities.

    • Kaiser:

      I don’t think of rounding as discretizing. Nate presented the probability as 65.7%. I would write this as 70%. Either way, we’re rounding. I’m just rounding at a different point than Nate is. He’s not presenting the number as 65.671839238902348902390823490%.

  9. The application of decision aids in medicine involves discrete choices — should a patient receive drug A or drug B, should a patient receive a follow-up call from a nurse after discharge, etc. The models that determine the good or bad outcomes from these choices may be continuous but eventually something must be done and that choice relies on a threshold value from the model.

  10. Some confusion here. Personalized medicine has a specific definition: does the compound require some “stuff” from the patient in order to be made? Dendreon’s Provenge is the au courant archetype. It is a discrete distinction. Kind of like being pregnant.

  11. Andrew,

    I fail to see how the “continuous vs discrete” distinction is even remotely connected to causal inference.

    For concreteness, Judea was interviewed recently by AMSTATS (link: http://magazine.amstat.org/blog/2012/11/01/pearl/) and cited three typical tasks addressed by causal inference methods.

    (1) What is the expected effect of a given treatment (e.g., drug) on a given outcome (e.g., recovery)?
    (2) Can data prove an employer guilty of hiring discrimination?
    (3) Would a given patient be alive if he had not taken the drug, knowing that he, in fact, did take the drug and died?
    (Questions (2) and (3) are usually answered probabilistically).

    Can you take any of these examples and demonstrate how a research question that is stuck in discrete thinking can be redeemed by continuous thinking?

    • Elias:

      My own preferences aside, I think that some phenomena (causal and otherwise) can best be studied using discrete models, others using continuous models, and yet others by a mix of the two approaches. In the two links above, I gave examples where Sloman and you, in different settings, used discrete formulations where I would prefer continuous. In Sloman’s example, he writes that if beer consumption affects intelligence, then intelligence cannot affect beer consumption. Once you have a model in which either of these causal statements can be true, I don’t see why they can’t both be true, to varying extents. But Sloman is thinking discretely: it has to be one or the other (or neither), not both. Sloman’s example is indeed connected (not just “remotely”) to causal inference. In your example, you were considering whether data from one city should be used to make inferences for decisions in another city. You framed it discretely, as a yes/no question, whereas I prefer a partial-pooling approach. I’m not trying to “redeem” anything, it’s just that I prefer a continuous model for bias correction rather than having to choose between no pooling and complete pooling. This comes in with our radon example also, where we use a statistical model to combine sparse precise unbiased data with dense noisy biased data. That example is causal too; we were concerned with the effect of radon remediation on health outcomes.

  12. Andrew,

    To understand the distinction you make between discrete and continuous thinking, we need to draw parallels. You wrote:

    “In your example, you [Bareinboim] were considering whether data from one city should be used to make inferences for decisions in another city. You framed it discretely, as a yes/no question, whereas I prefer a partial-pooling approach.”

    True, I do frame the question as a yes/no question: “Is it or isn’t it possible to obtain an unbiased estimate of the causal effect in the new city?” Can you explicate what research question you pose for this task which has a continuous flavor?

    The “partial-pooling approach” is a “method of analysis”, not a “question” nor a “frame of a question”.

    • Elias:

      In Sloman’s example, my question is, what are the causal effects? His answer is that it’s one or the other or neither. I prefer a continuous answer in which both can be there, and I am interested in the strength of the effects, not a yes/no of whether they are there.

      In your example, my question is, how much pooling should be done? Your answer is all or none, I prefer a continuous answer of partial pooling.

      In the radon example, my question is, what is the health risk in a particular house. Some people prefer to frame this as a discrete labeling of a house as safe or unsafe. I prefer a continuous answer.

      • Andrew,

        You write: “In your example, my question is, how much pooling should be done? Your answer is all or none, I prefer a continuous answer of partial pooling.”

        Sorry, but “how much pooling” is not a research question. It chooses a method of analysis but does not define the purpose of the analysis.

        For comparison, my research question was: “Is it possible to obtain an unbiased estimate of the causal effect in the new city?” It refers to something researchers may wish to know, not to the method they may use for finding it out.

        Can you sharpen it a bit, so that we can see where the continuous element comes in?

        eb

        • Elias:

          You write: “Sorry, but ‘how much pooling’ is not a research question.”

          I am glad that the research world is diverse and that it is not up to you to decide what other people consider research questions! I went through enough of this sort of attitude in the earlier years of my career—people declaring that what I was doing was not research, or not serious, or whatever—that I have no patience with it now. You go work on your own research questions about unbiasedness or whatever, I will work on my own research questions about partial pooling. It is my experience that many many people other than myself consider these to be research problems too.

        • Andrew,

          I think you misunderstood my question. No one doubts the importance of asking “how much pooling”. But all the distinctions you made between continuous and discrete thinking were related to the goals of research questions, not to the methods we use to answer them. For example, when you talked about the radon problem, your goal was to assess the magnitude of “the effect of radon remediation on health outcomes”.

          All I am asking now is what the goal is of asking “how much pooling”, and whether the distinction between continuous versus discrete modifies that goal.

          The many people who consider “how much pooling” to be a research question share an implicit understanding of what comes after “How much pooling, in order to obtain ….” A few people, including myself, are not in possession of this knowledge.

          I have shared with you my “in order to obtain..” Can you share the one behind pooling? It should not take more than a couple of lines, and it would clear up the differences between our goals (if any).

        • As a supplement to my last post, I would like to summarize how I see the current state of our exchange on this issue.

          We know from first principles that no causal conclusions can be obtained without causal assumptions. Causal assumptions usually come in a yes/no format. For instance, Rubin’s ignorability assumptions invoke conditional independence (among counterfactuals) and are therefore yes/no type statements.

          Therefore, given that Andrew stated that he prefers a continuous approach to causal inference, in contrast to my yes/no approach, I was curious to see how he managed to encode causal assumptions in a continuous format.

          So far, it is still not clear to me how his two problems (“how much pooling” and “the effect of radon”) incorporate causal assumptions in the analyses. Such assumptions (e.g. ignorability) are understood to be necessary in handling “causal” problems; I am therefore curious to see what mathematical form these assumptions take when continuous thinking is applied.

          eb

  13. Pingback: Weekly links for November 18 « God plays dice

  14. Lots of interesting points raised in what I think is the most interesting post of yours I have read. Paradoxically, I think your continuous versus either-or is not completely discrete, or continuous! With much decision-making, I think you are spot on – it’s silly to think that a person who scores a 25 on a scale is clinically depressed while a person with a 24 is just hunky-dory.

    It’s also true, however, that people sometimes consider Factor A and then proceed to Factor B as a tie-breaker. Take marriage as a decision. When I met my husband, he was very upfront by our second date about the fact that he wanted to have children, and, sooner rather than later. If motherhood was not in my plans, marriage wasn’t in his. Interestingly, a couple of years ago, one of my daughters broke up with a guy who didn’t want children in his future. She said, “He never wanted to have kids and I want to be a mom some day, so that was a deal-breaker for me.” Yes, there are ways to model this. The fact is, sadly, most people don’t think that deeply and just throw all the variables into a regression equation. (I confess, some days most people includes me).

  15. The author, and all of us, really need to study how the neurons in the brain work to get at the basis of how neural systems — in all animals — process uncertainty. Holler if interested.

    Hint: “Higher order concepts” like emotion, choice, decision making, free will, etc have been debunked.
    Hint #2: Apparently brains make the decision go/no go in 150 ms! That’s real fast for any sort of calculation. It also turns out that all animals are really good at probabilities. “The bug is over there. I am here?” The bad Bayesian animals died very young.

    The fact that stimuli (reality) are continuous (feedback), episodic and fractal is a good reminder.

  16. Pingback: What is expected of a consultant « Statistical Modeling, Causal Inference, and Social Science
