The flashy crooks get the headlines, but the bigger problem is everyday routine bad science done by non-crooks

In the immortal words of Michael Kinsley, the real scandal isn’t what’s illegal; the scandal is what’s legal.

I was reminded of this principle after seeing this news article about the discredited Surgisphere doctor (see here for background).

The news article was fine—it’s good to learn these things—but, as with pizzagate, evilicious, and other science scandals, I fear that people are drawing the wrong lesson from this story.

Yes, science has cheaters and frauds (as seems to be the case with the Surgisphere guy) as well as greedy operators who want the recognition without doing the work (as seems to be the case with the Harvard author of those papers). But I think the bigger problem is the general acceptance of bad work, as we discussed yesterday regarding that recent BMJ article.

All this focus on fraud, or “p-hacking,” or whatever, is, I fear, a distraction from the larger problem of bad work by people who are sincere but (a) don’t know what they’re doing, and (b) think that they’re experts.

Remember, honesty and transparency are not enough.

P.S. Erik sends the above photo of Ocelot, who is still patiently waiting in his flowerpot for people to regularize their noisy estimates.

21 thoughts on “The flashy crooks get the headlines, but the bigger problem is everyday routine bad science done by non-crooks”

  1. The news story underscores some of the issues. So many people who worked with Dr. Desai found his methods and opinions wanting – yet he managed to proceed. This is not an isolated case. I’ve been in academia for 40 years and have often seen colleagues engage in what I would consider “malpractice.” Of course, this is not always an objective assessment – in many cases, their student evaluations were better than mine. And there are legitimate differences of opinion concerning what constitutes “good teaching” or “good research.” While we may all agree that Desai crossed the line, and that Wansink crossed the line, they each got away with it for quite a while. For each of these clear cases, there are myriad cases that are more questionable.

    What is largely missing is any real attempt to engage in meaningful discourse about what constitutes “good research” or “good teaching.” With research we have some provisions – working papers, peer review, promotion and tenure decisions, etc. are all somewhat faulty, but they at least exist. I’d be interested to hear how many readers of this blog have ever seen serious discussion of what constitutes “good teaching.” The only thing I can think of is what happens with promotion and tenure decisions, or annual evaluations of untenured faculty. But these are often solely reactions to poor teaching evaluations – or, at times, to senior faculty’s desire to protect their self-interest from perceived threats from younger faculty. Rarely – almost never – is there serious discussion about what someone does within their own classroom. Academic freedom, which I hold dear and important, is an easy excuse to prevent such discussion.

    It would appear that once someone has gained the status of an “experienced academic” there are no longer any questions about their competence (until they mess up and are caught, that is). Is that a good enough system? It would appear not. To do better, I think we will have to be willing to engage in real discourse, real exposure to other views, real risk of finding that we may be wrong in our methods, understanding, or practices. Perhaps that is why I have found team-teaching to be so rewarding: it is the only place where such exposure systematically occurs. Within our own classrooms, I suspect that the problem Andrew refers to is much larger than the “illegal” cases that get flagged.

    Even those “illegal” cases seem insufficiently understood. I have yet to see a full accounting of the Surgisphere scandal. The Lancet and the NEJM seem quite happy to be past the whole episode. Shouldn’t they be insisting that we find out whether the data were falsified, whether they even exist, and how the co-authors signed off on them? Shouldn’t all those institutions where Dr. Desai worked be investigating how they failed to do anything while his colleagues were disturbed by his practices? Shouldn’t his colleagues be questioning themselves about how they permitted it to continue, or failed to do anything to stop these practices? These are all uncomfortable conversations, but without them I fear that little will change.

  2. I thought that David Michaels wrote a very good account of the product defense firms that have burgeoned to help manufacturers sell their products. They enlist a considerable number of epidemiologists, physicians, and statisticians to instill doubt and uncertainty in product liability cases. Michaels reviews the tobacco cases in the process and suggests that many of the same strategies are in use today.

  3. As long as there is more pay-off than penalty for skirting appropriate research practices, the game will continue, and that goes for both academia and the media.

    • One simple form of regularization is to use an informative prior. For example, if plausible effect sizes are between, say, 1 and 3 in some units, then a prior that puts a lot of mass in the range 1-3 is a good choice. If plausible effect sizes are close to zero, then a prior with most of its mass in the vicinity of 0, say between -0.1 and 0.1, is a good choice. (See the sketch below.)

      Sure, the effect size is not exactly zero, but that doesn’t mean it isn’t close to zero.
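
      As a minimal sketch of what that regularization does (all numbers here are made up for illustration): with a normal prior and a normal likelihood, the regularized estimate is just a precision-weighted average of the prior mean and the raw estimate. In Python:

          prior_mean, prior_sd = 2.0, 0.5   # puts most prior mass in the plausible range 1-3
          b, se = 4.5, 1.5                  # hypothetical noisy raw estimate and its std. error

          # posterior (regularized) mean: precision-weighted average of prior and data
          w = (1 / se**2) / (1 / se**2 + 1 / prior_sd**2)   # weight on the data = 0.1
          posterior_mean = w * b + (1 - w) * prior_mean     # = 2.25

      The implausibly large 4.5 gets pulled most of the way back into the plausible range, and the noisier the raw estimate, the stronger the pull.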

    • Just to add to Daniel’s reply. When Ocelot talks about “noisy estimates” he’s referring to the situation where the effect is small (close to zero) relative to the standard error of the estimate. In that case, you’re likely to overestimate the magnitude of the effect and it’s important to regularize (shrink to zero).

        • Sorry, I should have said: shrink *towards* zero. It’s not about variable selection or deciding if some effect is exactly zero. The point is that it’s important to shrink noisy estimates because they tend to be too large.

          Noisy estimates tend to be too large for two reasons. (1) If b is an unbiased estimator of beta, then abs(b) is positively biased for abs(beta). This bias is large if b is “noisy”, i.e. abs(beta) is small and/or the standard error of b is large. (2) The bias is even larger when you condition on statistical significance; that’s the winner’s curse.
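
          A quick simulation (with illustrative numbers, not from any real study) shows both biases:

              import numpy as np

              rng = np.random.default_rng(0)
              beta, se = 0.1, 1.0                    # small true effect, noisy estimator
              b = rng.normal(beta, se, 1_000_000)    # unbiased estimates of beta

              print(np.abs(b).mean())                # ~0.80: abs(b) overestimates abs(beta) = 0.1
              signif = np.abs(b) > 1.96 * se         # condition on statistical significance
              print(np.abs(b[signif]).mean())        # ~2.3: the winner's curse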

  4. I agree that we need more screaming about the problem of bad science (though I wouldn’t mind more screaming about misconduct, too – I think both are vastly underestimated by most people’s intuitions), but I think those of us screaming about p-hacking mean the same thing you’re calling bad science, not fraud or misconduct. The vast majority of p-hacking is done with no trace of bad intentions – people are taught that there’s a jewel hiding in their data and it’s their job to uncover it. They are aware that they’re digging for a positive result, so p-hacking is an appropriate term, in my opinion, but they aren’t aware of why that produces exaggerated results and false positives, so they aren’t aware there’s anything problematic about it. Even those of us who understand in the abstract that p-hacking is problematic often don’t think we’re p-hacking in a given instance of undisclosed flexibility. I think I remember that you don’t like the term ‘p-hacking’ for unintentional bad practices, but I do think that’s how most people use it, so most reform efforts aimed at raising awareness about and curbing p-hacking are efforts to curb garden-variety bad science, not fraud or misconduct.
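
    A toy simulation (a hypothetical setup, not anyone’s actual study) makes this kind of garden-variety p-hacking concrete: measure five outcomes with no true effects and report whichever one “works.” No single step feels dishonest, but the false-positive rate balloons:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        hits = 0
        for _ in range(10_000):
            data = rng.normal(0, 1, (5, 30))   # 5 outcomes, all true effects zero
            pvals = [stats.ttest_1samp(x, 0).pvalue for x in data]
            hits += min(pvals) < 0.05          # report whichever outcome "worked"
        print(hits / 10_000)                   # ~0.23, far above the nominal 0.05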

    • Simine:

      My wariness of the term “p-hacking” and other terms such as “questionable research practices” comes from people who react negatively when outsiders point out problems with their statistics or their data. For example, the ovulation-and-clothing researchers wrote, “Gelman suggests that we might have benefited from researcher degrees of freedom by asking participants to report the color of each item of clothing they wore, then choosing to report results for shirt color only. In fact, we did no such thing . . . Gelman’s concern here seems to be that we could have performed these tests prior to making any hypothesis, then come up with a hypothesis post-hoc that best fit the data. While this is a reasonable concern for studies testing hypotheses that are not well formulated, or not based on prior work, it simply does not make sense in the present case.” I see their point, but as Loken and I discuss in our forking paths paper, forking paths are a concern even if the authors only performed one analysis.
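
      The forking-paths point can be made concrete with a toy simulation (a hypothetical setup, not the clothing study): each simulated researcher runs exactly one test, but which test gets run depends on the data, here by analyzing whichever subgroup happens to look more promising. Under the null, the error rate still exceeds the nominal 5%:

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(2)
          hits = 0
          for _ in range(10_000):
              g1, g2 = rng.normal(0, 1, 40), rng.normal(0, 1, 40)   # two subgroups, null data
              # data-dependent choice of the single analysis to run:
              if abs(g1.mean()) > abs(g2.mean()) + 0.2:
                  p = stats.ttest_1samp(g1, 0).pvalue               # subgroup 1 looks promising
              elif abs(g2.mean()) > abs(g1.mean()) + 0.2:
                  p = stats.ttest_1samp(g2, 0).pvalue               # subgroup 2 looks promising
              else:
                  p = stats.ttest_1samp(np.concatenate([g1, g2]), 0).pvalue
              hits += p < 0.05
          print(hits / 10_000)   # noticeably above 0.05, even with one test per dataset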

  5. Great post, though of course the next question is what to do about this. This article that just came out in eLife is quite good, though it’s sad that it’s necessary:

    Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript
    https://elifesciences.org/articles/48175

    In essence, it’s about actually understanding logic and numbers, beyond blindly and incorrectly applying statistics. I would argue (as the authors briefly note) that “when writing or reviewing a manuscript” is far too late to be doing this!
