I (inadvertently) misrepresented others’ research in a way that made my story sound better.

During a recent talk (I think it was this one on statistical visualization), I spent a few minutes discussing a political science experiment involving social stimuli and attitudes toward redistribution. I characterized the study as being problematic for various reasons (for background, see this post), and I remarked that you shouldn’t expect to learn much from a between-person study of 38 people in this context.

I was thinking more about this example the other day, and so I went back to the original published paper to get more details on who those 38 people were. I found this table of results:

But that’s not 38 people in the active condition; it’s 38 clusters! Looking at the article more carefully, we see this:

The starting race and SES and starting petition were randomized each day, and the confederates rotated based on these starting conditions. In total, there are 74 date–time clusters across 15 days.

That’s 38 clusters in the active condition and 36 in the control.

And this, from the abstract:

Results from 2,591 solicitations . . .

Right there in the abstract! And I missed it. I thought it was N=38 (or maybe I was remembering N=36; I can’t recall), but it was actually N=2591.

There’s a big difference between 38 and 2591. It’s almost as if I didn’t know what I was talking about.
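
To put rough numbers on the gap: the information in a cluster-randomized study sits somewhere between the number of clusters and the number of observations, depending on how correlated responses are within a cluster. Here’s a minimal back-of-envelope sketch using the standard design-effect formula, n_eff = n / (1 + (m - 1) * ICC); the intraclass correlation (ICC) values below are made up for illustration, not estimated from the paper.

```python
# Back-of-envelope effective sample size for a cluster-randomized design.
# The ICC values are hypothetical, chosen only to show the range; they are
# not estimates from the paper.

n = 2591              # total solicitations (from the abstract)
clusters = 74         # date-time clusters (38 active, 36 control)
m_bar = n / clusters  # average cluster size, about 35

for icc in (0.01, 0.05, 0.20):
    deff = 1 + (m_bar - 1) * icc  # design effect for clustering
    print(f"ICC = {icc:.2f}: design effect = {deff:.1f}, "
          f"effective N = {n / deff:.0f}")
```

Even under fairly strong within-cluster correlation, the effective N stays in the hundreds, which is a very different situation from N=38.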

But it’s worse than that. I didn’t just make a mistake (of nearly two orders of magnitude!). I made a mistake that fit my story. My story was that the paper in question had problems—indeed, I’m skeptical of its claims, for reasons discussed in the linked post that have nothing to do with sample size—and so it was all too easy for me to believe it had other problems.

It’s interesting to have caught myself making this mistake, and it’s easy to see how it can happen: if you get a false impression but that impression is consistent with something you already believe, you might not bother checking it, and then you mentally add it to your list of known facts.

This can also be taken as an argument against slide-free talks. I usually don’t use slides when I give talks, I like it that way, and it can go really well—just come to the New York R conference some year and you’ll see. But one advantage of slides is that you have to write everything down, and if you have to write things down, you’ll check the details rather than just going by your recollection. In this case I wouldn’t’ve said the sample size was 38; I would’ve checked, I would’ve found the error, and my talk would’ve been stronger, as I could’ve made the relevant points more directly.

During the talk when I came to that example I said that I didn’t remember all the details of the study and that I was just making a general point . . . but, hey, it was 2591, not 38. Jeez.

20 thoughts on “I (inadvertently) misrepresented others’ research in a way that made my story sound better.”

  1. Two thoughts, besides the obvious one that this apology is a good and suitable one (presumably you’ve emailed it to those authors):

    1. It’s actually not so bad making mistakes, especially in something that’s not a final publication, so long as you fix them up later like this. Mistakes are bad, but it’s not worth the effort to triple-check everything. The penalty of later embarrassment probably makes us exert inefficiently *much* effort, in fact, just to serve our pride.

    2. When we see some mistake like this one, even though it’s big and self-serving, our first thought should be that it’s just a dumb mistake, not that it’s a dishonest one. Of course, if the person won’t fix it, our second thought should be either that the person is lazy or that it’s a dishonest mistake, or both.

    I’ve been thinking about church scandals a lot, and these ideas apply there too, although usually the public doesn’t hear of them until things get well beyond part (2).

    • I’ve kinda been thinking about this in teaching lately. In undergrad stats, the probability of making a typo is pretty high… Even though I proofread everything and get a TA to proofread it again, errors still slip through. It’s actually way more efficient to proofread and then put it in front of 250 students’ eyes; someone will find the errors. The downside is that I feel dumb, but it sure does catch typos way more efficiently.

  2. You-a culpa! :) 3 stars for transparency!

    Once, a prof whose class I was taking was complaining about something in a rival’s paper, and when I finally got him to explain it in detail, I showed him that it was explained correctly in the text. He hadn’t looked at the text; he had looked only at the figures, and he had misunderstood them.

    But I think the idea of using slides to force yourself to check the details is kinda silly :).

    The best approach is for all of us to read entire papers carefully, take notes and figure out all the details as we go. From what I’ve seen both academically and professionally, overlooking the basics is the #1 mistake.

  3. Two things: (1) props for owning up, and (2) agree re: no slides. I talk to juries for a living, and while the son et lumière thing is currently ascendant, I find that most people prefer someone who talks to them, who doesn’t read from slides, who doesn’t serve Spam out of a can, and who listens to what they have to say, whether verbally or via body language. There’s not much to this world that doesn’t entail talking to, and learning from, one another.

  4. Andrew said, “But one advantage of slides is that with slides you have to write everything down, and if you have to write things down, you’ll check the details rather than just going by your recollection.”

    This is one reason why, when teaching, I try to write out lecture notes and post them on the web so students can download them before class. I print them on paper for myself and display them on a doc cam when I use them.

    Another reason I like to do this is that I can incorporate “exercises” that (I hope) help students “get” the concepts as they are introduced. For example, see pp. 27–28 and 38–40 at https://web.ma.utexas.edu/users/mks/CommonMistakes2016/WorkshopSlidesDay1_2016.pdf

  5. Andrew,

    Welcome to the wonderful world of confirmation bias (i.e., seeing/finding information that is consistent with what you already believe)! It’s a powerful bias, and we’re all prone to it. But it’s certainly refreshing to see someone not being defensive about it and admitting it instead.

    The bigger challenge is, of course, to evade this bias when checking your own hypotheses in research. If a hypothesis is NOT supported by the data, I, like many other researchers, leave no stone unturned to find out whether the null result is an artifact of a coding error or some other mistake. If a hypothesis IS supported, however, I am less inclined to find out whether this could be due to a coding error or some other artifact (again probably like many other researchers). Even with the best intentions, I doubt that researchers can completely neutralize confirmation bias in themselves when it pertains to their most cherished hypotheses. That’s one of the reasons why open data and methods (including well-commented and well-structured analysis scripts in particular) are so important for ensuring the integrity of research. Others who look at my work will be less likely to share my own confirmation biases and might therefore be better at spotting whatever hypothesis-confirming mistakes I have made.

    In any event, kudos for being so frank about this. Posts like this may help to change the culture in the behavioral/social sciences in the long run and in a direction that helps to make everyone a better (i.e., more humble) scientist.

  6. I share other commentators’ ‘kudos’. This was the right thing to do.

    This being said, more generally: will this case affect the way you (Andrew) cover other people’s work in the future? I’ve noticed that you often qualify your posts with “I haven’t read it closely” or something like this. Which is fair: who has time to read everything? But here’s my concern. The Michael Greenstones of this world can probably live with your criticism and hopefully learn from it. However, junior faculty or PhD students are much more exposed. I’m certainly not saying that they are beyond criticism. This would be against the spirit of research. But when you drop the hammer on a paper, do you think you have the responsibility of reading it closely before suggesting that it’s faulty?

    I realize I will probably sound like I’m accusing you of something. I’m not! I’m just concerned that you don’t fully realize the influence you have on people’s views. The author you mention here was lucky to have advisors who went to bat for her in the comments (if I remember correctly). Others won’t be so lucky. In such a tight academic job market, I can easily see how it could cost people interviews.

    • mic said,
      “However, junior faculty or PhD students are much more exposed. I’m certainly not saying that they are beyond criticism. This would be against the spirit of research. But when you drop the hammer on a paper, do you think you have the responsibility of reading it closely before suggesting that it’s faulty?”

      I think that if you are dealing with junior faculty or students, it would be most appropriate to send criticisms to the author(s) (and the Ph.D. advisor if the author is a student), and give them a chance to respond before any “public” publishing of the criticisms (such as on a blog).

      • Hi Andrew –

        As you know I’m a huge fan of you and your work, but count me among those who are a little concerned. There is something peculiar about a situation where a senior academic of your stature can skim over a paper written by a junior researcher, make unfounded public criticisms that induce some very ugly commentary (the comment section of your original blog post on Melissa Sands’ paper is very disappointing to read) … and then, upon noticing the mistake, receive plaudits and accolades from other high status researchers.

        It is especially strange when I think about (what I take to be) the substance of your comments at the recent Metascience conference, namely that we should all be less polite when calling out “shoddy science” (a term I dislike greatly, but I believe that was your term?). I wonder whether you would apply the same standard here. Judged by the lights of your own remarks, it would be appropriate for people to call your original comments “shoddy criticism” and argue that there has been far too much politeness extended towards you. That doesn’t feel at all correct to me – everyone makes mistakes from time to time and we all deserve kindness and politeness. I’m grateful that you have taken the effort to openly acknowledge the error, but I think it might be worth pondering how this comment section would look if everyone were to pounce on you the way that your commenters did upon Sands. I think there is a double standard in operation here, and as a community we might do well to remedy this.

        Kind regards, as always
        Danielle

        • Danielle:

          Thanks for the comment. There are a few things going on here, so let me go through them one at a time.

          1. As discussed in the above post, I made a mistake when discussing that research at a recent conference. To make a mistake that makes the other paper look bad, and in a way that supports my preconceptions, is a bad thing no matter what. So, sure, feel free to call my talk sloppy or shoddy, or you could call it junk science, or negligence, or an ethical lapse, or an example of the poor standards that exist in meta-science communication. Any of these seem fair to me. If someone gave a talk alluding to a paper of mine with N=2591, and they said it was N=38, I’d be pretty annoyed!

          2. I don’t think I’ve ever used the word “shoddy” in my life. It’s just not a word I use. (Or maybe I have, and I’ve forgotten! But it certainly doesn’t sound to me like a word that I would use, either in speaking or writing.) No big deal—I say “crappy” all the time, and it pretty much has the same meaning. I do think there is crappy science out there, and some of it gets acclaim, attention, and funding.

          3. I stand by what I wrote in my original post from May 2018. I don’t think anything in that post is “unfounded,” in your words. If there’s a particular item in the post that you think is incorrect, let me know.

          4. As much as possible, it should be about the work, not the people. Criticism of people can come up—for example there was the issue of Brian Wansink not responding to criticism and actively encouraging his students to do bad science, or Marc Hauser not letting other people see his videotapes, or researchers publishing paper after paper on a topic without addressing serious criticisms of their work—but if we’re talking about one paper, it shouldn’t be so relevant who is the author. In that post from May 2018, I took pains to emphasize that “I have no problem with these data being published; my problem is that the paper (implicitly endorsed by the journal) makes strong claims that are not supported by the data. . . . First, this is not a slam on the people who did this study. We all have some hits and some misses; that’s the way research goes. This particular experiment was a potentially good idea that didn’t work. The problem is not with the study but in the way that it was mistakenly presented as a success.”

          5. I don’t recall exactly what I said at the metascience conference—if you were there, you might well recall my words better than I do!—but when I was talking about being less polite, I was talking about being less polite regarding the research claims, not regarding people. So I’ll say that again here.

          Just as a reference point, here’s an example of a criticism of my work, a criticism that was harsh, impolite . . . and valuable. That comment motivated me to check my work, and there were serious problems with what I was doing. I don’t mean a typo in my code, I mean that my whole model was sloppy and I had to do months of work to improve it. See here for the full story.

          In this case, I appreciated the harsh comment from an outsider. It would’ve been fine if he’d been more polite, but what’s most important is that he got the criticism out there. I had published work with serious mistakes, and he did me and the larger community the service of pointing out some of my errors.

          Anyway, I publish things with errors from time to time—it happens!—and people should feel free to point out these errors. Of course they should! What I write is in the public record. If I didn’t want my work to be criticized, I shouldn’t be making it public.

          6. One more thing. Politeness has a social cost too. Consider the case of Brian Wansink. Concerns about his claims and research methods had been known for years, but they were all under the radar. Wansink was able to just push aside legitimate concerns and continue to get all sorts of funding and publicity for what was, yes, bad science. All the problems only came out after some rude people like James Heathers came into the picture and made some noise.

          7. I agree 100% with your statement, “everyone makes mistakes from time to time and we all deserve kindness and politeness.” We all make mistakes; what’s important is to learn from them. One reason these conversations are so difficult is that there are so many social pressures and incentives for people to not admit mistakes. The place where we should be firm is on the science. If a published paper makes a claim that you don’t think is supported by its data, I think you should say so loudly—and this should not be taken as a personal attack on the author. As psychology researcher Daniel Gilbert wrote, “Publication is not canonization. Journals are not gospels. They are the vehicles we use to tell each other what we saw (hence ‘Letters’ & ‘proceedings’).” I think Gilbert has gotten some things wrong regarding the replication crisis, but I fully agree with him on this one. Let’s be firm on (what we perceive to be) errors, while being kind to the people involved and while recognizing that we are all flawed people.

        • Hi Andrew – thanks for the reply!

          I agree with a lot of what you’re saying (especially point 7!), and I’ll take it as given that you know how much respect I have for you, so I’ll jump straight to the points of disagreement :-)

          To start with, I have some thoughts about your points (3) and (4). You mention that you stand by what you wrote in the blog post, and you highlight the passage where you said you were not intending this as a “slam” on the authors. But I’m curious how you now feel about what you wrote in the comment section, where you made both of the following comments…

          – “My guess is that what happened was a mixture of overcommitment and group-think. The overcommitment came when somebody, somewhere, decided that this was a great idea for an experiment. The group-think came when various people signed off on the results because other people signed off on them. The paper appeared in PNAS, that was evidence for giving it awards, then in turn it’s hard for people in the loop to imagine that the paper is fatally flawed”

          – “A lot of social scientists seem to turn off their critical thinking when a randomized experiment is involved.” [the clear subtext being that this is what happened to the author of the original paper]

          … On my reading, both of these are instances where you’re no longer making an assertion about statistics and have made it a little personal. It is difficult to read either of these comments as anything other than aspersions cast upon the reasoning and scientific judgment of the author of the paper and of those who thought highly enough of it to award it a prize. In my experience, this kind of elision from statistics to persons happens a lot, and while I’d certainly agree this is a very minor example, it is often much worse.

          But if I may, the thing I want to mention is this comment that you made at the time, because I think it speaks quite nicely to concerns I have about the open science community:

          – Nobody’s talking about “punishing a researcher.” I think it was a mistake for this paper to have been published with such strong statements, and I think it was a mistake for a committee of political scientists to give the paper an award. But not publishing a paper as is, or not giving a paper an award, is not a punishment!

          I completely agree with your third sentence: it is not a punishment to be denied an award or a publication if neither is warranted! Similarly, the second sentence is unremarkable, though perhaps overstated given that the sample size concern was mistaken. It’s the first sentence where I think your perspective may not be well-calibrated.

          People rarely talk openly about directing inappropriate “punishment” at their colleagues, but nevertheless people do get punished inappropriately. I don’t believe that happened in this instance, of course, but I think it is naive to assert that “nobody is talking about punishing a researcher” as though there were any evidentiary value to such a claim.

          To illustrate what I mean, it is remarkable to me how frequently people in the open science community bring up the case of Brian Wansink, and how rarely they recall the case of Amy Cuddy. At the time she was a junior researcher, and she was most certainly subject to quite harsh social punishment. I’m of the opinion that a lot of what happened to her was absolutely a form of bullying, and while I do not include you among the people to blame for that, I do think that your commentaries made it easier for people to bully her. To my knowledge there has been very little acknowledgement of how poorly we as a community treated her. She may have been wrong in her scientific claims, but she most certainly did not deserve the treatment she received.

          I guess what I’m trying to get at, in a circuitous fashion, is that while politeness does indeed carry a cost, unrestrained bluntness can and does quickly turn to bullying in academic communities. It happens a *lot* on twitter, for instance. It is my opinion that there is a very serious culture problem in the open science community to which community leaders have turned a blind eye. I find it deeply unpleasant, and — having once or twice been on the receiving end of that treatment — have chosen to keep my distance from that community. Indeed I am only bothering to comment here because of the respect I have for you personally.

          The other thing I think I’m trying to hint at is that not everyone has the luxury of being rude or blunt in their public comments. Consider, for instance, what happens to a transgender woman who behaves rudely or speaks from anger in a public setting. I don’t imagine you have much personal experience with that, but I do, and the consequences are extraordinarily unpleasant. The punishment won’t come from the *academic* community, but rather from other people who are looking for any excuse to demonise trans women (sadly, I speak from bitter personal experience here). Worse, the political consequences for transgender people as a class, should such parties be successful in those efforts, are extremely dire. Given these considerations, I cannot safely afford to do anything except be polite. This is the political reality for me. It’s one that I accept, the world being what it is, but it means I take a very dim view of any attempts to extol the virtues of rudeness or to excuse aggressiveness in scientific communication.

          My apologies for the length of the comment, and again I want to stress that I really do agree with you on most things. In a kinder world I’d prefer to be discussing the various reasons why we both dislike Bayes factors. But that isn’t the world we live in I’m afraid. In any case, I’ve said what I came to say, and I hope my perspective has some value here.

          Best
          Danielle

        • Danielle:

          Thanks again for the comments. I agree with most of what you write here too!

          Regarding your last point: Yes, I agree that criticism is easier in some places than others. I appreciate that people such as James Heathers, Nick Brown, Anna Dreber, Simine Vazire, Uri Simonsohn, etc., will stick their necks out and criticize published work—even to the extent of being attacked for it. I fully appreciate that many people, for personal or career reasons, are not able to do so, and it’s good that people who can afford to be blunt sometimes are.

          I do think I see where Cuddy, Wansink, and lots of other people are coming from. If you work in academia, the most likely thing is that the news media won’t notice your work at all. If you’re lucky, you’ll get some attention, and even if you don’t enjoy the attention at a personal level, it’s good for your work: after all, we do our research for a reason, and we want people to know about it. When we do get coverage, it’s typically uniformly positive, 100% positive. That’s the way it goes: mostly academic work is entirely ignored; sometimes it’s treated as some sort of brilliant discovery.

          Cuddy and Wansink got this uncritical coverage (I’ve received it too), and they rode it to fame and fortune. Then people noted problems with their work, so much so that the first author on the article that brought Cuddy fame withdrew all the claims of the article, and so much so that many of Wansink’s articles were found to be based on no data at all. Neither Cuddy nor Wansink responded well to these problems. That said, I can understand how it happened: it can be pretty shocking to get any negative news coverage at all after receiving years of unstinting public praise.

          I agree with you that some of the negative coverage can be problematic. But I’d also like to pin some of the blame on the earlier positive coverage. I’ve written before about the problems with the scientist-as-hero framing. I think such coverage is inaccurate (scientists are not lone geniuses), it misrepresents experimental science as producing deterministic truths, it is the product of a misunderstanding of statistics (the idea that causal identification + statistical significance = discovery), and, in addition to all this, it creates this later problem when flaws are found in published work.

          If Wansink and Cuddy were heroes before, then when they respond defensively rather than openly when people point out actual flaws in the work that made them famous, does that make them villains? No. They weren’t heroes before, and they aren’t villains now. They’re just people. Scientists. Humans.

          I think that a bit less hype on the upswing might make it easier for people to admit problems on the downswing.

          P.S. Regarding the two paragraphs you quote from my previous post: Yes, I stand by these paragraphs. I do think there is group-think and overcommitment in these situations, I do think that publishing in PNAS was a mistake, and I do think that publication in a top journal gives the paper status, which in turn motivates awards. And I do think a lot of social scientists seem to turn off their critical thinking when a randomized experiment is involved. They really are trained to focus on causal identification, unbiased inference, and standard errors. We’ve seen this a million times: researchers consider lots and lots and lots of explanations for their data, but very rarely do they consider the explanation that things happened by chance and that, if the study were repeated, it could easily go in the opposite direction. That’s a big motivation for the replication movement: statistical explanations are not enough for people; they need to see the replication. Recall the wonderful 50 shades of gray paper.

          P.P.S. No need to apologize for the length of the comment. That’s one reason I prefer blogs to twitter: we can go on as long as we want here.

        • I agree with a lot of that too. I wish journalists in particular would avoid dramatic, “catchy” summaries of scientific research — I’ve never had any press coverage of my work, thankfully, it sounds awful! Most of science — at least in my experience — is boring, workmanlike and incremental, and that’s totally fine. Similarly, I don’t like the way that researchers are constantly pressured to make overly strong claims, and am very glad that there’s been movement on that front. All of which is to say that I am exhausted with “hype” and am totally with you there.

          On other points, I’m less sure. The recurring point of difference is how you and I perceive “abrasive” criticism. I don’t like it. All too often I have seen “scientific criticism” used to justify behaviour that in everyday life would be called harassment. I’ve spent too much of my life with abusers, and my willingness to engage with people who use public shaming as a tactic is now exhausted. I acknowledge that you see things differently, and I cannot think of a way to reconcile that difference. Most likely there is no such way, so perhaps it is best to let the matter drop.

        • Danielle said,

          “The recurring point of difference is how you and I perceive “abrasive” criticism. I don’t like it. All too often I have seen “scientific criticism” used to justify behaviour that in everyday life would be called harassment. I’ve spent too much of my life with abusers, and my willingness to engage with people who use public shaming as a tactic is now exhausted. I acknowledge that you see things differently, and I cannot think of a way to reconcile that difference. Most likely there is no such way, so perhaps it is best to let the matter drop.”

          Danielle,

          If you are willing to reconsider letting the matter drop: I think it would be helpful to the community for you (and/or others) to give specific examples (either real or hypothetical) of “‘scientific criticism’ used to justify behaviour that in everyday life would be called harassment.”

        • Andrew, not having delved back into the previous commentary, I wonder if you can answer the question of how much the change in N even matters to your previous conclusions. Sure, if the N were really 38 or whatever, it would weaken the paper even more. But even with N near 3000 in 74 groups, the conclusions might still be basically unfounded. Major reasons for this can be lack of a decent model, forking paths in the choice of analysis, and so forth.

          So, how much do your conclusions about the stats change due to the N?

        • Daniel:

          Yes, knowing that N=2591 does not affect my conclusions about the claims made in the paper—which makes sense, given that I did not base my conclusions on N.

          Here’s why N matters. If N really were 38, then the design would be obviously flawed: how could you realistically expect to learn anything useful from a between-person study of 38 people? With N=2591, the design is no longer obviously flawed. It does seem plausible that, with the right design, we could learn a lot from that many people.

          The funny thing is, in the original blog post, I wrote, “I would not necessarily have said ahead of time that this experiment was doomed to be noisy. . . . This particular experiment was a potentially good idea that didn’t work.” I wouldn’t want to say that with N=38. So it’s weird that I got this N=38 thing in my head. Perhaps I was conflating this with other studies that did have small N’s. I don’t think this is a mistake I would’ve made in writing, and I have to be more careful about my accuracy when speaking as well.
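
          To make the contrast concrete, here’s a quick sketch comparing the standard error of a difference in proportions at the two sample sizes. It ignores the clustering entirely and assumes a worst-case baseline rate of 0.5, so treat it as a rough illustration, not an analysis of the paper:

          ```python
          # Standard error of the difference between two independent proportions.
          # Clustering is ignored and p = 0.5 is a worst-case assumption, so this
          # is only a rough lower bound on the real uncertainty.
          import math

          def se_diff(n_per_arm, p=0.5):
              return math.sqrt(2 * p * (1 - p) / n_per_arm)

          for total_n in (38, 2591):
              print(f"total N = {total_n}: SE of difference = {se_diff(total_n // 2):.3f}")
          ```

          With N=38 the standard error of the comparison is about 0.16, so nothing much smaller than a 30-percentage-point swing would be detectable; with N=2591 it drops to about 0.02, which is why the design is no longer obviously flawed, clustering aside.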

        • Danielle:

          You wrote that my earlier blog post had “unfounded public criticisms” and that this more recent post received “plaudits and accolades from other high status researchers.”

          As noted in my comment above, I don’t think there was anything I wrote in that earlier post that was “unfounded.”

          Also, on what grounds do you say that this new post was applauded by “high status researchers”? Are the contributors to this comments section “high status researchers”? I don’t know most of these people; I have no idea. Your remark about the high status researchers rubs me the wrong way because throughout my career I’ve been attacked by high-status researchers. It’s been happening for decades! Recently, Susan Fiske has been attacking me. She’s perhaps higher status than anyone who’s ever appeared in this comment section. So please be careful about just assuming that high status people are all on the same side. Thanks.
