The inclination to deny all variation

One thing we’ve been discussing a lot lately is the discomfort many people—many researchers—feel about uncertainty. This was particularly notable in the reaction of psychologists Jessica Tracy and Alec Beall to our “garden of forking paths” paper, but really we see it all over: people find some pattern in their data and they don’t even want to consider the possibility that it might not hold in the general population. (In contrast, when I criticize these studies, I always make it clear that I just don’t know, that their claim could hold in general, I just don’t see convincing evidence.)

The story seems pretty clear to me (but, admittedly, this is all speculation, just amateur psychology on my part): in general, people are uncomfortable with not knowing and would like to use statistics to create fortresses of certainty in a dangerous, uncertain world.

Along with this is an even more extreme attitude, which is not just to deny uncertainty but to deny variation. We see this sometimes in speculations in evolutionary psychology (a field where much well-publicized work can be summarized by the dictum: Because of evolutionary pressures, all people are identical to all other people, except that all men are different from all women and all white people are different from all black people [I’ve removed that last part on the advice of some commenters; apparently my view of evolutionary psychology has been too strongly influenced by the writings of Satoshi Kanazawa and Nicholas Wade.]). But even in regular psychology this attitude comes up, of focusing on similarities between people rather than differences. For example, we learn from Piaget that children can do X at age 3 and Y at age 4 and Z at age 5, not that some children go through one developmental process and others learn in a different order.

We encountered an example of this recently, which I wrote up, under the heading, “When there’s a lot of variation, it can be a mistake to make statements about ‘typical’ attitudes.” My message there is that sometimes variation itself is the story, but there’s a tendency among researchers to express statements in terms of averages.

But then I recalled an even more extreme example, from a paper by Phoebe Clarke and Ian Ayres that claimed that “sports participation [in high school] causes women to be less likely to be religious . . . more likely to have children . . . more likely to be single mothers.” In my post on this paper a few months ago, I focused on the implausibility of the claimed effect sizes and on the problems with trying to identify individual-level causation from state-level correlations in this example. At the time I recommended they give their results a more descriptive spin, both in their journal article and in their mass-media publicity.

But there was one other point that came up, which I wrote about in my earlier post but want to focus on here. The article by Clarke and Ayres includes the following footnote:

It is true that many successful women with professional careers, such as Sheryl Sandberg and Brandi Chastain, are married. This fact, however, is not necessarily opposed to our hypothesis. Women who participate in sports may “reject marriage” by getting divorces when they find themselves in unhappy marriages. Indeed, Sheryl Sandberg married and divorced before marrying her current husband.

This footnote is a striking (to me) example of what Tversky and Kahneman called the fallacy of “the law of small numbers”: the attitude that patterns in the population should appear in any sample, in this case even in a sample of size 1. Even according to their own theories, Clarke and Ayres should not expect their model to work in every case! The above paragraph indicates that they want their theory to be something it can’t be; they want it to be a universal explanation that works in every example. Framed that way, this is obvious. My point, though, is that it appears that Clarke and Ayres were thinking deterministically without even realizing it.
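
To see why even a true average effect leaves room for plenty of individual exceptions, here is a minimal simulation sketch in Python; the probabilities are invented for illustration and are not Clarke and Ayres's numbers:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical, made-up marriage probabilities: suppose high-school sports
    # participation shifts the probability of marrying from 0.70 down to 0.60.
    married_no_sports = rng.random(n) < 0.70
    married_sports = rng.random(n) < 0.60

    # The average effect is real...
    print("difference in marriage rates:",
          round(married_no_sports.mean() - married_sports.mean(), 3))  # ~0.10

    # ...yet most of the women who played sports are still married, so any
    # single case (a "sample of size 1") tells us essentially nothing about
    # whether the hypothesis holds.
    print("share of sports participants who married anyway:",
          round(married_sports.mean(), 3))  # ~0.60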

46 thoughts on “The inclination to deny all variation”

  1. evolutionary psychology (a field whose precepts are sometimes summarized by the dictum: Because of evolutionary pressures, all people are identical to all other people, except that all men are different from all women and all white people are different from all black people).

    Who actually came up with that nonsensical dictum? The basic premise of ev psych is to deny the existence of any racial differences. This was a conscious decision by the founders of the field (e.g., Tooby & Cosmides) who wanted to distance the field from the acrimonious racial disputes of older hereditarian paradigms.

    • P:

      You may be right that some evolutionary psychologists want to deny the existence of any racial differences. But that’s hardly a universal view among evolutionary psychologists. Give Nicholas Wade a call; he can probably give you a long list of evolutionary psychologists who are very interested in racial differences.

      • I don’t think Wade could name more than a couple. Of course, if you redefine “evolutionary psychology” to include, say, differential psychologists and behavioral geneticists, you will find lots of “evolutionary psychologists” with racialist views, but it’s absolutely clear that evolutionary psychology as a field has been dogmatically non-racialist from the outset.

      • I agree with P: evolutionary psychology, as a discipline, has gone to considerable lengths to argue against important racial differences in cognition. For instance, the gold standard in EP is to find a putative effect in every culture on the planet, African cultures included. Other EP research has aimed to undermine “race” as a fundamental aspect of the psychology of social classification, seeing racial classification instead as a byproduct of a universal in-group/out-group psychology, e.g.:

        http://www.pnas.org/content/98/26/15387.full

        As further evidence that race plays little role in EP, abstracts for the HBES meeting, which is the one most evolutionary psychologists attend, are here:

        https://www.hbes.com/conference/

        Search for race. You will find very few abstracts that discuss it, and those that do rarely argue for innate racial differences.

        • Tooby and Cosmides did much to rescue sociobiology in the early 1990s from the Thought Police, but their notion of only paying attention to human universals is inevitably limited and thus tends to run out of gas. From the late 1990s onward, most of the intellectual energy in the evolutionary understanding of humans has been more balanced and open-minded about the inevitability of racial differences evolving than in classic early 1990s evolutionary psychology.

  2. How does one try to systematically figure out if the pattern in the data is or is not likely to hold in the general population?

    If the pattern is not likely to hold in the general population what interest do we have in looking at such patterns? Ergo, if we are letting through too many such papers, where the pattern seen in the sample is not likely to hold in the general population, isn’t this a systemic flaw in our publishing processes?

  3. The social science literature is littered with ToEs (theories of everything). By comparison, physicists are a bunch of losers. The sooner they are replaced with psychologists, the closer we will get to unlocking the secrets of the universe.

  4. I appreciate these posts very much. Having spent most of my life working in the physical sciences and engineering, I’m often frustrated reading so many facile statistics-based conclusions in the literature (and I’m not exempting the physical sciences either).

    I’ve thought of these conclusions as a sort of “reification of the mean,” in which the investigator creates a mystical uniform population that has the properties of the mean and then examines a representative of this mystical population.

  5. Andrew’s statement, “in general, people are uncomfortable with not knowing” is, I think, central to all of this. Routinely, I see people present data in which the noise clearly makes any estimation of signal meaningless, yet the conclusion is almost never, “We just can’t say anything about this effect.” Why?
    My two (not exclusive) guesses have been that (1) The researcher(s) have put lots of work into the measurement, perhaps years, and simply can’t handle, emotionally, the possibility that it’s all in some sense for nothing. (2) Researchers are not trained to really understand that some effects are not measurable. The examples they’re presented with as students are overwhelmingly ones in which conclusions are successfully reached. Dealing with “unknowability” is somewhat foreign — either in a precise sense (Cramer-Rao bounds, etc.) or an intuitive sense (from “playing” with data). I suppose possibility (3) is deliberately unethical behavior, but I think this is less common than some combination of (1) and (2). Any other options?
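
    A minimal sketch of the “precise sense” of unknowability, with made-up numbers: for estimating a normal mean, the Cramér-Rao bound says no unbiased estimator can beat a standard error of sigma/sqrt(n), which can easily dwarf the effect a theory cares about.

        import numpy as np

        # Toy numbers, invented for illustration only.
        sigma = 10.0              # measurement noise (standard deviation)
        n = 25                    # sample size
        effect_of_interest = 0.5  # effect size the theory actually cares about

        # Cramér-Rao bound for the mean of a normal: variance >= sigma^2 / n,
        # so the best achievable standard error is sigma / sqrt(n).
        best_se = sigma / np.sqrt(n)

        print("best possible standard error:", best_se)          # 2.0
        print("effect we hoped to detect:   ", effect_of_interest)
        # When the best achievable standard error is several times the effect
        # of interest, "we just can't say anything about this effect" is the
        # honest conclusion, no matter how clever the analysis.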

    • Raghuveer: basically I agree with you, but:

      In (1) it’s not just emotion. If one’s further career depends (or appears to depend) on having *something* to show from the current study, then regardless of what’s actually merited one will clutch at straws until something shows up. This is quite rational, for any given individual, even if it’s not what overall we want to happen. As many others have complained, the incentives are all wrong.

      In (2) I don’t think unknowability is quite the right term. Often the problem isn’t that data say exactly nothing, it’s that they just provide nowhere near enough information to be useful. Researchers who’ve been trained to think seriously about implications of confidence intervals can actually do okay in this regard – concluding that e.g. the data wouldn’t be surprising if the truth was anywhere between A and B, and A and B have massively different implications for the underlying science. (Even better analysis would bring in some prior information on what the truth might plausibly be). Again it’s not an original thought, but it’s much harder to argue that data is next-to-useless if all you have is simplistic Yes/No use of tests – which is all many researchers do have.
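
      A minimal sketch of that “anywhere between A and B” situation, with made-up data (true effect 0.1, noise standard deviation 1, n = 20):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)

          # Hypothetical small, noisy study; all numbers are invented.
          data = rng.normal(loc=0.1, scale=1.0, size=20)

          mean = data.mean()
          se = data.std(ddof=1) / np.sqrt(len(data))
          tcrit = stats.t.ppf(0.975, len(data) - 1)  # two-sided 95%, df = n - 1
          lo, hi = mean - tcrit * se, mean + tcrit * se

          # The 95% interval typically runs from clearly negative to clearly
          # positive: the data are not "nothing," but they are nowhere near
          # enough to distinguish scientifically different conclusions.
          print(f"estimate = {mean:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")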

      • George: On (1), I certainly agree that there are “rational” incentives to do the wrong thing, but I think there are plenty of people who sincerely believe that their data interpretations are justifiable, but whose judgement is clouded by their emotional investment in their work. (And, I suppose, their emotional investment in their careers.) As Feynman wrote, “You must not fool yourself — and you are the easiest person to fool.” This is different (in motivations, if not in outcome) from someone coldly realizing that if they do some sketchy science their career prospects will brighten. I don’t think this distinction has come up much in the many excellent discussions on this blog — though I could well be the only person who finds it “real” and interesting!

        • Raghuveer: I am inclined to agree with you — but would add that judgment may be clouded by emotional involvement in their beliefs as well as in their work. (Perhaps these two kinds of emotional involvement can reinforce each other.)

  6. Good luck fighting this fight. Most people do not understand randomness and its manifestations. In addition to Tversky and Kahneman, I can recommend Leonard Mlodinow’s book “The Drunkard’s Walk.”

  7. I have this argument in a general sense all the time: people want to believe patterns exist where they don’t, and they want to believe that understanding why x happened means we can prevent variations of x from happening again. The former is obvious; the latter seems elusive to people because they don’t grasp that stuff happens when complex events come together. We can’t eliminate the underlying reality that life is complex and thus has a large random component, and we can’t eliminate random mistakes and random variations in human systems.

    I always thought computer science grasped this long ago, as in the “they’ll always build a better idiot” slogan. In practical terms, we waste a lot pursuing procedures that cost more than the potential benefit, especially since the claimed benefit includes reducing the chance of an unforeseeable event, even though that event really can’t be controlled for, or it would have been foreseen.

    • Jonathan,

      Is it that “life is complex and thus has a large random component,” or the other way around: that life has a large random component and thus is complex?

      If one means “life” literally, then I would argue for the second option — in particular, when cells divide, the “stuff” in the cells is not evenly distributed between the two daughter cells; this constitutes a random component at the smallest level of life that then propagates randomness to other levels of life.

  8. Aren’t there two entirely different errors being conflated in this post?

    One: assuming a result (e.g., a correlation) found in a special population holds for a wider population, and

    Two: assuming a population result is true of all individuals in that population.

    It is polite and probably correct to be agnostic about whether the Tracy and Beall results generalize to the full population of women (yes, it _could_ be true), but surely there is no formulation about individuals (what would it be: “if _you_ are a woman you wear pink/red shirts more often near peak fertility”?) that has any chance of being other than obviously false at best, nonsensical at worst.

    • +1

      “assuming a population result is true of all individuals in that population”… see, for example, the new AHA statin guidelines, in which doctors are encouraged to prescribe statins to any individual patient with a calculated 10-year “risk” of 7.5% or greater. These calculated “risks” come from a population-level model (with no confidence or prediction intervals); they are basically observed proportions (i.e., averages) at the population level. People, including doctors (and insurance companies who cover doctors for malpractice), just love certainty. You can even calculate “your risk” here: http://my.americanheart.org/professional/StatementsGuidelines/Prevention-Guidelines_UCM_457698_SubHomePage.jsp .

      So now we’re prescribing potentially harmful drugs based on made up (but certain!) numbers.

      • I’m not sure I find the Cardiovascular risk guidelines you link to so objectionable. First, they seem to avoid the “here is _your_ risk” language (the paper cited is generally even clearer: “The estimated risks are specific to defined combinations of the risk factors”). And even if I try to apply them to myself, I (not a doctor) cannot cite obvious factors that undercut the assessment – it’s a multi-factor model that seems to try within the bounds of reasonable simplicity to cover all the important bases.

        I was objecting more to “X is correlated with Y in some population, you are in this population, you have X, therefore you have a higher chance of Y” when for many such people this will be an obviously false and disprovable claim under any known interpretation of probability/chance.

  9. Of course, they don’t deny all variation. After all, they target differences between two subpopulations. But it may be true that people are inclined to discard variation if it is not the one they study. Maybe the situation could be helped (somewhat) if basic stat classes included examples where the variation itself is the variable of interest and not a nuisance. For example, in psychology it might be something from group dynamics, where you expect some degree of cohesion (the group must stay together) but also differences (people should complement one another); or in poli sci, where national political parties must maintain a large degree of diversity to appeal to different voters (it seems like the Democratic party is doing it to a fault).

    • I sometimes give an example of medication to control blood sugar in diabetes: The variation in blood sugar is important (too high and too low are both deleterious), so it’s not just the mean blood sugar that needs to be taken into account; it’s also the variability (a medication that produces high variability can be harmful).
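
      A minimal sketch of that example, with invented numbers: two hypothetical medications with the same mean glucose but very different variability spend very different amounts of time outside a (likewise illustrative) safe range.

          import numpy as np

          rng = np.random.default_rng(2)
          n_readings = 10_000

          # Hypothetical glucose readings (mg/dL) under two medications with
          # the same mean but different variability; all numbers are invented.
          med_a = rng.normal(loc=120, scale=10, size=n_readings)  # low variability
          med_b = rng.normal(loc=120, scale=40, size=n_readings)  # high variability

          def fraction_out_of_range(readings, low=70, high=180):
              """Share of readings outside an illustrative safe range."""
              return np.mean((readings < low) | (readings > high))

          print("means:         ", round(med_a.mean(), 1), round(med_b.mean(), 1))
          print("out of range A:", round(fraction_out_of_range(med_a), 3))  # ~0
          print("out of range B:", round(fraction_out_of_range(med_b), 3))  # ~0.17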

  10. Pingback: Perilous use of statistics | LARS P. SYLL

  11. Uncertainty, variation, and the idea that there is regularity in uncertainty and variation are why statistics is a hard subject for most people.

    The fact that all we can ever say is more likely/less likely/on average when controlling for … is a lot harder than if p then q, if not p then not q.

    The other day I was talking to someone who teaches sociology of education in an urban college, and she was saying how almost all of the students in the room were living counterexamples to every prediction of who gets to college. So I said, this is where you explain what an R-squared of 0.2 really means, not to mention relative versus absolute risk (http://www.flickr.com/photos/nychra/8519745210/in/photostream http://www.flickr.com/photos/nychra/8519722512/in/photostream/ )
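
    For concreteness, here is a minimal simulation (made-up data) of what an R-squared of 0.2 looks like at the individual level; as expected, counterexamples are everywhere:

        import numpy as np

        rng = np.random.default_rng(3)
        n = 100_000

        # Simulate an outcome where a single predictor explains about 20% of
        # the variance (R^2 ~ 0.2); the setup is invented for illustration.
        x = rng.normal(size=n)
        y = np.sqrt(0.2) * x + np.sqrt(0.8) * rng.normal(size=n)

        r_squared = np.corrcoef(x, y)[0, 1] ** 2
        print("R^2:", round(r_squared, 2))  # ~0.20

        # Among people a standard deviation or more below average on the
        # predictor, roughly a quarter still end up above average on the
        # outcome: the "living counterexamples."
        print("P(y > 0 | x < -1):", round(np.mean(y[x < -1] > 0), 2))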

    • “The fact that all we can ever say is more likely/less likely/on average when controlling for …”

      I’m not sure this is true; the practice of limiting oneself to collecting and then averaging “snapshot” data of dynamic systems would cause that limitation whether it was necessary or not.

      “The excuse is often made that social phenomena are so complex that the relatively simple methods of the older sciences do not apply. This argument is probably false. The analytical study of social phenomena is probably not so difficult as is commonly believed. The principal difficulty is that the experts in social studies are frequently hostile to science. They try to describe the totality of a situation and their orientation is often to the market place or the election next week. They do not understand the thrill of discovering an invariance of some kind which never covers the totality of any situation. Social studies will not become science until students of social phenomena learn to appreciate this essential aspect of science.”
      http://www.brocku.ca/MeadProject/Thurstone/Thurstone_1952.html

      • Of course it would be “thrilling” to discover that there is a single variable that explains college attendance in the US (not to mention across all cultures and times) but I think you are more likely to find that in a tabloid than coming out of serious scholarship.

        • Meaning one that explains 100% of the variance. I didn’t mention anything about snapshot data; the scholarly work in the area of educational attainment obviously would never rely on snapshots, and it would be nonsensical for it to do so. I am comfortable with there being unexplained variance, and with the best we can say being that the odds differ for groups (and I’m not going to write out all of the qualifications that I’m assuming we can stipulate to about data, culture, historical epoch, etc.), not that we know with certainty whether a given 5-year-old will enroll in college prior to age 20.

  12. As an academic, let alone as an academic blogger, you have an intellectual responsibility to get the basic facts straight. To anyone who reads evolutionary psychology (not uninformed critiques, but the actual primary literature), it’s almost impossible to come away with the conclusion that “all men are different from all women.” Evolutionary thinking can be used to derive specific, testable hypotheses about sex differences, but this is always with respect to a particular ability and its underlying mechanism. So there may be sex differences in spatial cognition but no sex differences in color vision. In other words, there are some domains for which sex differences are predicted, and some for which no sex differences are predicted. The lazy characterization that evo psychology claims “all men are different from all women” reflects a failure to understand the basic point that sex differences, where they exist, are domain-dependent. Interestingly, you don’t mention that other factor for which evolutionary psychology predicts differences in certain psychological domains, which is age (e.g., see work on life history theory). These predicted age differences are motivated by the different adaptive problems that had to be solved at different points across development over our species’ evolutionary history.

    I’m not sure I follow the general point of the post. A good theory may not explain all of the data, but that’s likely because much of the data violates a ceteris paribus clause. In the social sciences, it’s practically impossible to perfectly control all extraneous factors (those not of theoretical interest). If we could hold everything else constant, one would expect the data to conform to the theory, if it’s a good theory. If there are other factors that systematically interact with the factors captured in the theory, the theory may need to be expanded to explain any deviant data points.

    • Adam:

      I’ve not read all of the evolutionary psychology literature, but I’ve read papers such as “Big and tall parents have more sons: further generalizations of the Trivers–Willard hypothesis,” “Beautiful parents have more daughters: a further implication of the generalized Trivers–Willard hypothesis,” “The fluctuating female vote: Politics, religion, and the ovulatory cycle,” “Women are more likely to wear red or pink at peak fertility,” and “The ancestral logic of politics: Upper-body strength regulates men’s assertion of self-interest over economic redistribution”—papers that have been published in serious scientific journals and exhibit what seem to me to be extreme levels of gender essentialism of the sort that is not usually seen except in country clubs, locker rooms, school playgrounds, internet comment threads, and the scripts of Mad Men.

      But I have no doubt that there’s lots of evolutionary psychology work that is not centered on gender stereotypes, so I’ll alter my post accordingly, to make it clear that I’m only talking about the primary literature that I’ve seen.

      Finally, regarding your last paragraph: No, it’s simply not true from a statistical standpoint that if you could hold everything else constant, one would expect the data to conform to the theory. Not at the level of each data point, that is. Any decent theory of psychology should have human variation as part of it. As we say in statistics, it’s not y = a + bx, it’s y = a + bx + epsilon. And epsilon represents real human variation.

      Points that fall off the line are not “deviant” data points, they’re part of the reality. I’d say that labeling such points as “deviant” is part of the problem, it’s part of an insistence that a theory should cover all cases, it’s pre-statistical thinking that, it seems to me, is associated with so much trouble.
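
      A minimal sketch of the point (with invented parameter values): even when y = a + bx + epsilon is exactly the right model, most individual points sit well off the fitted line.

          import numpy as np

          rng = np.random.default_rng(4)
          n = 500

          # Data generated from a model that is exactly true at the population
          # level; a, b, and sigma are made up for illustration.
          a, b, sigma = 1.0, 0.5, 2.0
          x = rng.normal(size=n)
          y = a + b * x + rng.normal(scale=sigma, size=n)

          # Least squares recovers a and b well...
          b_hat, a_hat = np.polyfit(x, y, deg=1)
          print("a_hat, b_hat:", round(a_hat, 2), round(b_hat, 2))

          # ...but the individual residuals (the epsilons) remain large. Points
          # far from the line are not "deviant"; they are the human variation
          # the theory should expect.
          residuals = y - (a_hat + b_hat * x)
          print("share of points more than one unit off the line:",
                round(np.mean(np.abs(residuals) > 1.0), 2))  # ~0.6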

      • The epsilon point is important. If you are modelling a random variable, you are saying, more or less, that you believe that, given the predictors, the random variable has a mean and a variance and that given those and using the rules for its distribution, it’s a roll of the dice. The “reification of the mean” that we talked about the other day is partly a consequence of not understanding this.

  13. But when we unpack epsilon, what do we have? Is it just unexplainable error? I imagine a good theory is forced to introduce additional terms if other variables are identified that systematically capture some of the variance better than the original model. That should gradually eat away at epsilon. In practice this is a hard and thankless job, since most people focus on the terms that are doing the heavy lifting of capturing the most, not the least, variance. An interesting question here is: do we ever get to the point at which epsilon can’t be further reduced because there are no other variables that can be introduced that capture some real aspect of the mechanism generating the data, a point at which we can’t reduce the space between theory and data? If so, do we still want to identify epsilon with human variation? What epsilon is at this point is a bit of a mystery to me. I’ve read some who suggest the remaining variation at this point represents random effects arising from quantum processes, so epsilon would index processes of quantum mechanics, but I’m really not sure.

    • Adam:

      Yes, I agree about explaining epsilon. The point is that, realistically, there’s always going to be a lot of epsilon left over, especially when considering very simple models such as that of Clarke and Ayres: their model is simple both conceptually (it’s an example of what Daniel Drezner calls “piss-poor monocausal social science”) and statistically (it’s a regression of state-level variation on a few predictors). Simple models can be fine—for example, I’m a big fan of Doug Hibbs’s model predicting elections from the economy—but it’s ridiculous to think they can explain every data point. That was what I was getting at in my post.

  14. I have a great deal of skepticism for evolutionary psychology. There was a time when I found the theories quite enticing.

    Then I developed an interest in evolutionary biology, and ended up participating in a “journal club” on the subject for several years. Now research in evolutionary psychology seems like so much amateur speculation with so little evidence.

    A while ago I read a review (in Science or Nature) of a book on evolutionary psychology. I don’t recall the author or title of the book, but do recall that the reviewer described the theories in the book as “Just So Stories”. I can’t help but agree.

  15. Pingback: "Academics should be made accountable for exaggerations in press releases about their own work" - Statistical Modeling, Causal Inference, and Social Science

  16. Pingback: The piranha problem in social psychology / behavioral economics: The "take a pill" model of science eats itself - Statistical Modeling, Causal Inference, and Social Science
