Chasing the noise

Fabio Rojas writes:

After reading the Fowler/Christakis paper on networks and obesity, a student asked why it was that friends had a stronger influence than spouses. In other words, if we believe the F&C paper, they report that your friends (57%) are more likely to transmit obesity than your spouse (37%) (see page 370).

This might be interpreted in two ways. First, it might be seen as a counterargument: this might really indicate that homophily is at work. We probably select spouses for some traits that are not self-similar, while we choose friends mainly for self-similarity in leisure and consumption (e.g., diet and exercise). Second, there might be an explanation based on transmission: we choose friends because we want them to influence us, while spouses are (supposed?) to accept us.

Your thoughts?

My thought: No. No no no no no. No no no. No.

From the linked paper:

A person’s chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval. . . . If one spouse became obese, the likelihood that the other spouse would become obese increased by 37% (95% CI, 7 to 73).

So no no no no no. There’s no good evidence here that friends have a stronger influence than spouses.

In discussing this example, I’m not getting into the controversies (also here) surrounding this research, nor the issue that the difference between significant and non-significant is not itself statistically significant, nor the general point that published estimates tend to be too high (I certainly don’t believe the upper ends of the confidence intervals shown above). As I’ve written before, I think the Christakis and Fowler work is interesting. Yes, the analysis has threats to validity, and I agree with the critics that this research has not proved anything definitively, but I think it’s thought-provoking work on an important topic, and research has to start somewhere.

So, just to say it again: I’m not criticizing Christakis and Fowler here. The statement in their abstract is clear, and it is also apparent that the intervals overlap. The mistake made by Rojas’s student is to overinterpret chance variation. The student is chasing the noise.

Here’s a quick analogy. Suppose you pull two students out of the classroom to take basketball free throws. Amy sinks 6 out of 20 shots, and Beth sinks 4 out of 20. Amy and Beth are in similar physical condition, and neither is on the basketball team. The basic statistical estimate is that Amy’s probability of success is 30% and Beth’s is 20%. Rojas’s student might then come up with some theories as to why Amy is better: maybe Amy comes from a more supportive family, so she has more self-confidence and a higher probability of success; maybe Beth was under more pressure because she took her shots after Amy was done, so she had a fixed goal; maybe they were responding in different ways to expectations; Beth is 2 cm taller, and maybe she was overconfident and didn’t focus; etc. But then we compute the standard errors: the (simple classical) estimate for Amy’s success rate is 0.30 +/- 0.10, and the estimate for Beth is 0.20 +/- 0.09. The difference could easily, easily be explained by chance. It’s not that any of these (hypothetical) explanations are necessarily wrong—indeed, they could all be true, it could be that Amy’s family is more supportive and that Beth was under more pressure etc etc—but there’s essentially no evidence to support them. The explanations could just as well have been made before any shots were even taken.
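For readers who want to check that last claim, here is a minimal sketch in Python. It uses nothing but the standard binomial standard error; the only inputs are the made-up 6-of-20 and 4-of-20 shot counts above.

```python
import math

def prop_and_se(successes, n):
    """Simple classical estimate of a success probability and its standard error."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

p_amy, se_amy = prop_and_se(6, 20)    # 0.30 +/- 0.10
p_beth, se_beth = prop_and_se(4, 20)  # 0.20 +/- 0.09

# Difference and its standard error (treating the two estimates as independent)
diff = p_amy - p_beth
se_diff = math.sqrt(se_amy**2 + se_beth**2)
print(f"difference = {diff:.2f} +/- {se_diff:.2f}, z = {diff / se_diff:.1f}")
# z comes out around 0.7, i.e. the observed gap is well within chance variation.
```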

Similarly with “friends vs. spouses in influence.” Any or all of Rojas’s student’s stories could be true—or none of them. The trouble is that they’re being used to explain a pattern that could well be noise.

It still could be a useful exercise: first explain pattern A, then explain pattern not-A. It’s a great way to think about the connections between theory and empirical results. But please, please don’t forget that in this case there’s just about no evidence for either of the two possibilities. Even if you completely accept the framework of the published research and don’t worry about any selection effects, you’re comparing the confidence interval [6, 123] to the interval [7, 73]. That’s not a statistically significant difference, to say the least! Again, this doesn’t make the student’s theories wrong; it’s just that they could just as well have been formulated in the absence of any data at all. The data add essentially nothing here.
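If you want to see the arithmetic behind that comparison, here is a rough sketch in Python. It backs out approximate log-scale estimates and standard errors from the two published intervals and treats the two estimates as independent, which is only an approximation:

```python
import math

def log_est_and_se(ratio, lo, hi):
    """Approximate log-scale estimate and standard error from a reported
    ratio-type estimate and its 95% confidence interval."""
    return math.log(ratio), (math.log(hi) - math.log(lo)) / (2 * 1.96)

# Friend effect: +57%, 95% CI (6, 123)  -> ratio 1.57, CI (1.06, 2.23)
# Spouse effect: +37%, 95% CI (7, 73)   -> ratio 1.37, CI (1.07, 1.73)
b_friend, se_friend = log_est_and_se(1.57, 1.06, 2.23)
b_spouse, se_spouse = log_est_and_se(1.37, 1.07, 1.73)

diff = b_friend - b_spouse
se_diff = math.sqrt(se_friend**2 + se_spouse**2)  # assumes independence
print(f"log-scale difference = {diff:.2f} +/- {se_diff:.2f}, z = {diff / se_diff:.1f}")
# z is roughly 0.6: nowhere near conventional statistical significance.
```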

I’m saying all this not to rag on Rojas, who, after all, did nothing more than repeat an interesting conversation he had with a curious student. This is just a good opportunity to bring up an issue that occurs a lot in social science: lots of theorizing to explain natural fluctuations that occur in a random sample. (For some infamous examples, see here and here.) The point here is not that some anonymous student made a mistake but rather that this is a mistake that gets made by researchers, journalists, and the general public all the time.

I have no problem with speculation and theory. Just remember that if, as here, the data are equivocal, it would be just as valuable to give explanations that go in the opposite direction. The data here are completely consistent with the alternative hypothesis that people follow their spouses more than their friends when it comes to obesity.

P.S. I checked back on Rojas’s post and, scarily enough, two of the three comments offer potential explanations for the difference, with neither commenter seeming to realize that they are chasing noise. Again, there’s nothing wrong with theorizing, but I think it’s helpful to realize that these are pure speculations with essentially no basis in the data, so one could just as well be giving explanations for why the underlying difference goes in the opposite direction.

22 thoughts on “Chasing the noise”

  1. What are your thoughts on publishing only CIs, leaving the point estimates out of figures, even tables and text?

    Anybody who wanted to flout the uncertainty would have to actively work around the CIs (e.g., by averaging their endpoints and exposing their ignorance if they do that to odds ratios; or it might force them to read the supplement). (A quick sketch of the odds-ratio point appears after this comment.)
    At the moment the point estimates are still the most prominently presented piece of information and the easiest to (imagine that you) understand, but also easily the most abused piece of information (okay maybe a distant second to p-values :-).

    Actually, now that I think about it, they may do the least harm in figures (e.g. coef plots) where you can’t really perceive them without the CIs, but in tables and text they’re only there for those who care anyway.
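    To make the odds-ratio point concrete, here is a tiny sketch using the friend-effect interval quoted in the post (the calculation is generic, not from the paper): because such an interval is roughly symmetric only on the log scale, averaging the endpoints on the ratio scale overshoots the point estimate, while the geometric mean roughly recovers it.

    ```python
    import math

    lo, hi = 1.06, 2.23        # friend-effect ratio CI from the post
    print((lo + hi) / 2)       # arithmetic mean of endpoints: about 1.65, too high
    print(math.sqrt(lo * hi))  # geometric mean: about 1.54, close to the reported 1.57
    ```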

    • Ruben:

      No, I’m not a big fan of reporting confidence intervals. The trouble is that they draw attention to the endpoints of the intervals, and typically the upper endpoint is some ridiculous number that nobody should believe.

      • What’s your preferred (non-graphical) way of reporting uncertainty? I’ve taken to reporting estimates and standard errors, since it makes people do a little work if they want to compare my estimates to one another, but still quantifies the uncertainty.

        • See Andrew’s post and comments from a while back, and follow links, although these are for graphics.
          I’ve always disliked the usual display of C.I.s that visually overemphasize the extremes.

          My favorite graphical treatment is here, in IPCC AR4 (climate). Compare the 2nd and 3rd graphs, which cover the same period.
          The first of these, the “spaghetti” graph, does not have C.I.s, which would be really hard to show. The second combines the lines and their C.I.s into a density plot, which to me far better communicates the reality of the data.
          Of course, that graph is much harder work to create than simple line graphs.
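          For anyone curious how such a display is built, here is a minimal matplotlib sketch with a made-up ensemble of noisy series (nothing here comes from the IPCC data): a spaghetti plot of every series next to a summary showing the median and central 50% and 95% bands.

          ```python
          import numpy as np
          import matplotlib.pyplot as plt

          # Hypothetical ensemble of 200 noisy time series (stand-ins for model runs)
          rng = np.random.default_rng(0)
          t = np.arange(100)
          runs = 0.02 * t + rng.normal(size=(200, 100)).cumsum(axis=1) * 0.1

          fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

          # Left: "spaghetti" plot -- every run drawn as its own faint line
          ax1.plot(t, runs.T, color="gray", alpha=0.2, linewidth=0.5)
          ax1.set_title("Spaghetti plot")

          # Right: the ensemble summarized by its median and central 50% / 95% bands
          lo95, lo50, med, hi50, hi95 = np.percentile(runs, [2.5, 25, 50, 75, 97.5], axis=0)
          ax2.fill_between(t, lo95, hi95, color="C0", alpha=0.2, label="95% band")
          ax2.fill_between(t, lo50, hi50, color="C0", alpha=0.4, label="50% band")
          ax2.plot(t, med, color="C0", label="median")
          ax2.set_title("Density-style summary")
          ax2.legend()

          plt.tight_layout()
          plt.show()
          ```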

  2. Would love to get your take on the following argument. I’m interested in whether the effect of a variable X is appreciably different from zero. I have a simple linear regression model of the form Y ~ a + b*X. When I do the simple regression, b is significantly different from zero (say two and a half standard errors from zero) and all is well. Now I control for a confounding variable Z, so the model becomes Y ~ a + b*X + c*Z. The effect size b declines to about half of its previous value and becomes statistically insignificant (say one and a quarter standard errors from zero), with the standard errors of the two estimates staying about the same.

    Here’s the conclusion I’d like to make: “on its own, X matters” and “factoring in the effect of Z explains about half of the effect of X”. Is this legit? If we looked at only the second regression, we’d have no evidence (or very weak evidence) that X matters at all.
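    Here is a minimal simulation sketch (Python with numpy and statsmodels; every coefficient is made up) of the kind of data-generating process that produces exactly this pattern, where X and Z are correlated and both affect Y:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 50                                    # modest sample, so estimates are noisy
    z = rng.normal(size=n)                    # confounder
    x = 0.7 * z + rng.normal(size=n)          # X is correlated with Z
    y = 0.2 * x + 0.3 * z + rng.normal(size=n)

    fit_x  = sm.OLS(y, sm.add_constant(x)).fit()                        # Y ~ a + b*X
    fit_xz = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # Y ~ a + b*X + c*Z

    print("b (X alone):   %.2f (se %.2f)" % (fit_x.params[1], fit_x.bse[1]))
    print("b (X given Z): %.2f (se %.2f)" % (fit_xz.params[1], fit_xz.bse[1]))
    # With these made-up numbers the coefficient on X typically shrinks by roughly
    # half once Z enters the model, while its standard error barely changes, so a
    # "2.5 se" effect can easily become a "1.25 se" effect.
    ```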

    • I don’t know what Andrew will say about this but I have a few comments of my own.

      First, when you say you’re interested in the “effect of X”, that could be interpreted as a causal question. If these are observational rather than experimental data, you’re not going to answer that just by looking at regression coefficients in absence of other information.

      Second, you started out asking if the effect of X is “appreciable” but then you immediately switch to discussing whether it’s statistically significant. Those are two very different questions: if your data set is large enough and predictions have very little error, then even an effect that is negligible in practice can be statistically significant (with a sample size of 1 billion you could find that people with some specific characteristic are 1 mm taller than other people; a quick numerical sketch of this appears after this comment), and with a small, variable sample even a large and important effect could be lost in the noise.

      Also, presumably you didn’t just choose X at random from a list of potential predictive variables; you probably have some reason to believe it’s useful.

      Putting it all together, I think you can say that X improves your ability to predict Y, but that this effect may be small if you also know Z. If Z is cheap to determine and X is expensive, then you need to do some kind of decision analysis to figure out if it’s worth collecting X. Otherwise you may as well keep X and Z; X isn’t hurting you.

      Perhaps you have some additional information that would let you put weak priors on the regression coefficients…but watch out for that correlation/causation issue: it’s entirely possible for X to have a positive causal effect on Y but for X and Y to be negatively correlated in your data.
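      A back-of-the-envelope version of the 1 mm example (all numbers made up: two groups of 500 million people each, and an adult height standard deviation of about 70 mm):

      ```python
      import math

      n_per_group = 500_000_000
      sd_mm = 70.0
      se_diff = sd_mm * math.sqrt(2 / n_per_group)   # se of the difference in mean heights
      print(f"se of difference: {se_diff:.4f} mm, z for a 1 mm gap: {1.0 / se_diff:.0f}")
      # z is in the hundreds: overwhelmingly "significant", yet a 1 mm difference
      # in average height is of no practical importance whatsoever.
      ```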

  3. “A person’s chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) “

    Naive question. Isn’t that an awfully broad CI?! If it may have been a 6% increase or a 123% increase and we cannot really tell, isn’t that a pretty useless model or technique?

  4. It seems to me that fat friends and fat spouses would be related (i.e., one tends to find a spouse from one’s friend network), so to model them as separate things may miss the potential interdependence between the two. I’d create a variable with four levels: fat friends and fat spouse; fat friends but nonfat spouse; nonfat friends but fat spouse; and nonfat friends and nonfat spouse as the reference. I think you could then see if there were clear differences between having fat friends and having a fat spouse. My guess would be that having both would be more important than having one or the other. (A coding sketch appears after this comment.)

    Andrew, I agree with your interpretation of the above results, but I was wondering what you would recommend if we had similar numbers that needed to be used to direct agency resources. Say we had the same result for a probation agency trying to figure out which risk factor for recidivism they needed to intervene on for those under supervision. If factor A (say homelessness) increased recidivism by 57% and factor B (say unemployment) increased recidivism by 37%, yet their respective CIs overlapped, which factor should we decide to intervene on? Most agencies don’t typically have the resources (time, money, staff) to tackle multiple issues. And though one could make the argument that an intervention on either factor would potentially impact the other, it seems to me, barring any additional information, that the one with the higher OR should be targeted first.
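    A minimal sketch of the four-level coding described in the first paragraph above (Python with pandas; the data frame and column names are hypothetical):

    ```python
    import pandas as pd

    # Hypothetical 0/1 indicators for each person in the study
    df = pd.DataFrame({"fat_friends": [1, 1, 0, 0], "fat_spouse": [1, 0, 1, 0]})

    levels = ["nonfat friends, nonfat spouse",   # reference category
              "nonfat friends, fat spouse",
              "fat friends, nonfat spouse",
              "fat friends, fat spouse"]

    df["exposure"] = pd.Categorical(
        [levels[2 * ff + fs] for ff, fs in zip(df["fat_friends"], df["fat_spouse"])],
        categories=levels,
    )
    print(df)
    # The four-level factor can then go into the regression in place of the two
    # separate indicators, so the "both" cell is estimated directly.
    ```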

    • Digithead:

      In response to your question in your second paragraph: I think the way to go is a formal decision analysis, balancing costs, benefits, uncertainties, and prior information. I would not necessarily first try the intervention with the higher point estimate or the higher significance level based on a single noisy study. Sure, the study provides some information, but I think you have to do your best to look at the big picture.

      • Yes, looking at the big picture is a necessity, but often the big picture is pretty cloudy and decisions are made on little to no evidence (or p-values that mislead).

        I share your skepticism of small studies with big effects but how can we have any influence on policy if all we say is look at the bigger picture, collect more data, and try to balance costs and benefits? At some point a decision has to be made.

        The current controversy over disease testing, especially for prostate and breast cancers, highlights this issue. The pro-screening groups use early detection and treatment to support their position, while the anti-screening groups argue that the adverse effects of treatment outweigh the benefits. Where is the statistical line in the sand that says one position is better than the other? It’s definitely subjective. Similarly when deciding which social intervention to perform, where there is much more variation (rather than just noise) clouding the big picture.

        But that’s probably another post so I’ll quit derailing this thread.

        • Digithead:

          You write: “I share your skepticism of small studies with big effects but how can we have any influence on policy if all we say is look at the bigger picture, collect more data, and try to balance costs and benefits? At some point a decision has to be made.”

          My reply: I do think that formal decision analysis can be helpful even if imperfect. We give examples in the decision analysis chapter of BDA. In any case, I think that we can do better than apply simple rules such as using the higher point estimate or the higher significance level from a single study.

          Your cancer screening example is a good one because in that problem I think people have made useful progress using decision analysis.

    • The one that has the biggest effect under reasonable models for the success rate of intervention. Let’s first ignore the causation/correlation question and assume both effects are causal. You could intervene on homelessness and find that the intervention does nothing to reduce homelessness, so although it seems to be the bigger lever, it’s stuck a lot tighter, to use a physical analogy. Perhaps, on the other hand, intervening on unemployment tends to reduce unemployment and, by doing so, reduce recidivism… So choosing to intervene on homelessness, though it seems to have the biggest bang, is actually a terrible idea under these circumstances. (A toy calculation along these lines appears after this comment.)

      Pilot programs to estimate the effects of intervention are of course a good idea, but barring that perhaps data from other agencies, other localities, etc could be used to build a model for the intervention.
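      Here is a toy expected-value comparison along these lines (Python; every number below is made up purely for illustration). The point is that the ranking can flip once you weight each risk factor’s estimated effect by how likely the intervention is to actually move it and by what it costs:

      ```python
      # Hypothetical inputs: effect of the risk factor on recidivism (relative
      # increase), probability the intervention actually removes the factor, and
      # cost per person of the intervention.
      interventions = {
          "homelessness": {"effect": 0.57, "p_move": 0.2, "cost": 5000},
          "unemployment": {"effect": 0.37, "p_move": 0.6, "cost": 3000},
      }

      for name, v in interventions.items():
          benefit_per_dollar = v["effect"] * v["p_move"] / v["cost"]
          print(f"{name}: expected benefit per dollar = {benefit_per_dollar:.2e}")
      # With these made-up inputs the "smaller" factor (unemployment) is the better
      # target: the intervention is more likely to move it and costs less per person.
      ```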

      • Daniel:

        Regarding your last paragraph: My impression is that the purpose of a pilot program is not to estimate effects but rather to get a sense of the practical challenges of implementation.

        • I think that depends. During welfare reform, some states were given waivers to try different programs with the requirement that there was a rigorous evaluation built in (see Connecticut Jobs First). Also, I think a lot of the big RCTs in Development are at least framed as pilot projects – granted, they aren’t all run by governments, but they are set up specifically to measure impacts with the idea that if they proved cost-effective they would be scaled up. I’m thinking of the Miguel/Kremer de-worming stuff, and the Duflo/Dupas/Kremer stuff on Kenyan education reform.

  5. On July 26, 2007, the blog Big Fat Deal posted complaints (to put it mildly) about the “obesity is contagious” claims, under the heading “Will Purell Help?” The next day, a commenter wrote, “I find the article interesting, mostly because it’s a great find for anyone who wants to teach students how to ask questions about research studies.” In recent years, a number of courses in a wide variety of disciplines have been doing just that. We don’t know what Prof. Rojas replied to his student, but certainly a good exercise is to find the same or similar mistakes made by the authors themselves in that very article. In fact, that mistake underpins their claim that a shared environment is not responsible for the associations they claimed to have found. Answers to this exercise do appear on the web, but they are not complete.

    • Russ:

      Just to be clear, I do think that the criticisms that you and others have made of that work are valid. I just thought, for the purpose of responding to Rojas’s student’s question, it was simplest to point out that, even if the obesity-contagion study had no problems of validity, it would still be “chasing the noise” to seek an explanation for a difference that was well within random variability. I see people—including experienced researchers—doing this all the time, so I wanted to point out the problem without getting into the details of the particular study.

  6. Andrew, you are indeed doing a service to the community with this post. Because of the prevalence of this type of mistake, as you say, I think it would be good for students to learn about it in more than theoretical terms. They should get actual practice in spotting the mistake. This helps to develop their critical thinking skills, which, as you know, are sorely needed in practicing (or even just consuming) statistics. The same type of practice is beneficial for other types of mistakes that are common in statistics. Don’t you agree?

    • Russ:

      Yes, I agree. As we say in statistics, God is in every leaf of every tree. Any example, when considered carefully, can yield insight after insight after insight.

  7. Pingback: comments on andrew gelman’s dec 21 post | orgtheory.net

  8. Pingback: Somewhere else, part 104 | Freakonometrics
