When does research have active opposition?

A reporter was asking me the other day about the Brian Wansink “pizzagate” scandal. The whole thing is embarrassing for journalists and bloggers who’ve been reporting on this guy’s claims entirely uncritically for years. See here, for example. Or here and here. Or here, here, here, and here. Or here. Or here, here, here, . . .

The journalist on the phone was asking me some specific questions: What did I think of Wansink’s work? (I think it’s incredibly sloppy, at best.) Should Wansink release his raw data? (I don’t really care.) What could Wansink do at this point to restore his reputation? (Nothing’s gonna work at this point.) And so on.

But then I thought of another question: How was Wansink able to get away with it for so long? Remember, he got called on his research malpractice a full 5 years ago; he followed up with some polite words and zero action, and his reputation wasn’t dented at all.

The problem, it seems to me, is that Wansink has had virtually no opposition all these years.

It goes like this. If you do work on economics, you’ll get opposition. Write a paper claiming the minimum wage helps people and you’ll get criticism on the right. Write a paper claiming the minimum wage hurts people and you’ll get criticism on the left. Some—maybe most—of this criticism may be empty, but the critics are motivated to use whatever high-quality arguments are at their disposal, so as to improve their case.

Similarly with any policy-related work. Do research on the dangers of cigarette smoking, or global warming, or anything else that threatens a major industry, and you’ll get attacked. This is not to say that these attacks are always (or never) correct, just that you’re not going to get your work accepted for free.

What about biomedical research? Lots of ambitious biologists are running around, all aiming for that elusive Nobel Prize. And, so I’ve heard, many of the guys who got the prize are pushing everyone in their labs to continue publishing purported breakthrough after breakthrough in Cell, Science, Nature, etc. . . . What this means is that, if you publish a breakthrough of your own, you can be sure that the sharks will be circling, and lots of top labs will be out there trying to shoot you down. It’s a competitive environment. You might be able to get a quick headline or two, but shaky lab results won’t be able to sustain a Wansink-like ten-year reign at the top of the charts.

Even food research will get opposition if it offends powerful interests. Claim to have evidence that sugar is bad for you, or milk is bad for you, and yes you might well get favorable media treatment, but the exposure will come with criticism. If you make this sort of inflammatory claim and your research is complete crap, then there’s a good chance someone will call you on it.

Wansink, though, his story is different. Yes, he’s occasionally poked at the powers that be, but his research papers address major policy debates only obliquely. There’s no particular reason for anyone to oppose a claim that men eat differently when with men than with women, or that buffet pricing affects or does not affect how much people eat, or whatever.

Wansink’s work flies under the radar. Or, to mix metaphors, he’s in the Goldilocks position, with topics that are not important for anyone to care about disputing, but interesting and quirky enough to appeal to the editors at the New York Times, NPR, Freakonomics, Marginal Revolution, etc.

It’s similar with embodied cognition, power pose, himmicanes, ages ending in 9, and other PPNAS-style Gladwell bait. Nobody has much motivation to question these claims, so they can stay afloat indefinitely, generating entire literatures in peer-reviewed journals, only to collapse years or decades later when someone pops the bubble via a preregistered non-replication or a fatal statistical criticism.

We hear a lot about the self-correcting nature of science, but—at least until recently—there seems to have been a lot of published science that’s completely wrong, but which nobody bothered to check. Or, when people did check, no one seemed to care.

A couple weeks ago we had a new example, a paper out of Harvard called, “Caught Red-Minded: Evidence-Induced Denial of Mental Transgressions.” My reaction when reading this paper was somewhere between: (1) Huh? As recently as 2016, the Journal of Experimental Psychology: General was still publishing this sort of slop? and (2) Hmmm, the authors are pretty well known, so the paper must have some hidden virtues. But now I’m realizing that, yes, the paper may well have hidden virtues—that’s what “hidden” means, that maybe these virtues are there but I don’t see them—but, yes, serious scholars really can release low-quality research, when there’s no feedback mechanism to let them know there are problems.

OK, there are some feedback mechanisms. There are journal referees, there are outside critics like me or Uri Simonsohn who dispute forking-path p-value evidence on statistical grounds, and there are endeavors such as the replication project that have revealed systemic problems in social psychology. But referee reports are hidden (you can respond to them by just submitting to a new journal), and the problem with peer review is the peers; and the other feedback mechanisms are relatively new, and some established figures in psychology and other fields have had trouble adjusting.

Everything’s changing—look at Pizzagate, power pose, etc., where the news media are starting to wise up, and pretty soon it’ll just be NPR, PPNAS, and TED standing in a very tiny circle, tweeting these studies over and over again to each other—but as this is happening, I think it’s useful to look back and consider how it is that certain bubbles have been kept afloat for so many years, how it is that the U.S. government gave millions of dollars in research grants to a guy who seems to have trouble counting pizza slices.

82 thoughts on “When does research have active opposition?”

  1. This is an interesting discussion.

    Consider that Pizzagate would never have happened if he hadn’t written his blog post. Did we just get really lucky to find a researcher who has dozens of papers with clear problems? Or is this common in some fields but literally no one is reading the work? Are publications just seen as the vegetables you have to eat to get to the dessert of best-selling books and media appearances?

    If an omniscient program could reveal how many Wansinks are out there, what would that number be? Is he just the tip of the iceberg, or an anomaly? I’m really interested in automating some methods to investigate this.
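
    For a sense of what such automated checks could look like, here is a minimal sketch (my own toy Python with made-up numbers, not anyone’s production tool) of a GRIM-style granularity test in the spirit of Brown and Heathers, one of the kinds of checks that have been run on Wansink-type papers:

        # Could a mean reported to a given number of decimals have come from
        # n integer-valued responses? (GRIM-style consistency check.)
        def grim_consistent(reported_mean, n, decimals=2):
            k = round(reported_mean * n)   # nearest achievable integer total
            closest_mean = k / n           # nearest mean actually attainable from n integers
            return round(closest_mean, decimals) == round(reported_mean, decimals)

        # Hypothetical examples: a mean of 5.19 from n = 28 integer responses is
        # impossible (no k/28 rounds to 5.19), while 3.44 from n = 18 is fine.
        print(grim_consistent(5.19, 28))   # False
        print(grim_consistent(3.44, 18))   # True: 62/18 = 3.444... rounds to 3.44

    Run something like this over every mean/sample-size pair scraped from a batch of papers and you have a crude first-pass screen for impossible statistics.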

    • I’m putting my bet on “common but no one is reading the work”. How many times does the typical paper get read anyways?

      Academic publishing is like a Write-Only Memory.

      • I coined (independently, but surely not uniquely) the term Write-Only Document 20 or more years ago, to describe the kinds of reports I had to write in the large bureaucracy where I worked in the IT department. Fortunately I only had to write 2 or 3 of these a year; others were doing them every month. You know the kind of thing: Your 4-page report becomes a paragraph in your boss’s report, which becomes a sentence in the director’s report, which may or may not make it to the CEO’s report.

        When I moved to academia, it didn’t take me long to identify that many areas have Write-Only Literature. My first teacher told me once, “About 7 people will read your article, and one of those is your Mom”. (Her Mom was not a scientist. Your Mom may vary.)

        The other thing I noticed early on is that this town (i.e., soft social science research) is plenty big enough for everyone. The tacit agreement is that A won’t tell (or show) everyone what a load of garbage B’s theory or associated research is, as long as B does the same for A. (That’s in the literature, of course. After a couple of drinks at the conference bar with A and A’s supporters it’s open season on the morons who are dumb enough to follow B, whom everyone knows fudges data.)

        As someone wise once said, in these fields, theories are like toothbrushes: Everyone’s got one and nobody wants to use anyone else’s. Apparently physicists know that quantum mechanics and general relativity can’t be correct, but having acknowledged this, they seem to manage to get along, knowing that one day the debate will be resolved. Maybe we need a few more “theory death matches” in psychology and related areas.

          • I agree with this in part. I work within a forensic psych context, and about 50% of my time is spent report writing; for most of those reports, only the summary/recommendations/conclusions are ever read by anyone other than a supervisor. However, as weird as this may sound, these reports are not designed to be read in full; they are designed to provide a defensible rationale for my conclusions. Hence, whilst the conclusions are the only part that is read, the conclusions themselves require the body of the report to be completed. In this case, if something goes wrong, then the body of the report becomes important, because this is the way in which people can decide whether my conclusions were reasonable.

          I think that the scientific literature functions in a similar manner (sometimes). That is, it is the conclusions that are important; however, the body of the work provides a record of how those conclusions were reached. This is demonstrated in the Wansink case, because upon closer inspection of the body of this work, we can conclude that his conclusions are bunk.

        • Perhaps, but I think another aspect of Wansink is that the risk function for being wrong is pretty damn flat. I mean, suppose some all-you-can-eat buffet place decides, based on his research, to put the pizza closer to the customer than the sausage… this is supposed to lead to a 1% reduction in costs and a 3% increase in customer satisfaction ratings…

          and it doesn’t, it leads to a 0.5% increase in costs and a 1.2% increase in customer satisfaction ratings…

          when the “cost of being wrong” is a very slowly varying function of the stuff you do… it basically doesn’t matter what you do, and when the difficulty of measuring things reliably is large enough, you don’t even really know whether you improved things, made them worse, or just had seasonal variation in results…

          and this is the key to Wansink’s longevity. His research had no value, and no real-world cost of being wrong for the consumer. His research was, in essence, a milking machine for the government teat. The cost was entirely placed onto a third party (namely, taxpayers).
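
          To make that concrete, here is a toy simulation (my own sketch in Python, with invented numbers, not anything from an actual study): a change that truly improves average weekly sales by 1%, observed through week-to-week noise with a 5% standard deviation.

              import numpy as np

              rng = np.random.default_rng(1)
              true_effect = 0.01   # assumed: the change really helps by 1%
              noise_sd = 0.05      # assumed: 5% week-to-week variation
              weeks = 52

              before = rng.normal(1.00, noise_sd, weeks)                # relative weekly sales, baseline year
              after = rng.normal(1.00 + true_effect, noise_sd, weeks)   # the year after the change

              diff = after.mean() - before.mean()
              se = np.sqrt(before.var(ddof=1) / weeks + after.var(ddof=1) / weeks)
              print(f"observed change: {diff:+.3f}, standard error about {se:.3f}")
              # The standard error (~0.01) is as large as the true effect, so the
              # observed change can easily look like a loss, a gain, or nothing at
              # all: you can't tell whether you improved things or just saw noise.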

        • Daniel:

          The cost of Wansink was also paid by Cornell students and by companies such as McDonald’s that paid him for consulting. It’s hard to feel too much sympathy for McDonald’s, but as a sometime business consultant myself, I do get annoyed by consultants such as Wansink who run around getting consulting contracts based on hype.

        • Those are the costs of him doing crappy research. But imagine his lab was pristine, well grounded, full of good research practices. He’d still occasionally come to some wrong conclusions, maybe choose wrong models for his analyses… And in the end, no one would care, because even a wrong conclusion has basically no effect on the world.

          So, yes, I think one of the biggest costs of “actual Wansinking” is that he pees in the pool and educates undergrads in crappy practices, etc., etc.

          But the costs or benefits of “theoretically perfect Wansinking” are minimal vs just doing stuff at random.

          And yes, I am now going to claim priority in the coining of the term “Wansinking”.

        • Jordan:

          Wow, I followed your link and then checked out that Food and Brand Lab twitter feed . . . I was first amused and then horrified. This really is the reductio ad absurdum of p-value-based push-button science.

        • +1

          when the “cost of being wrong” is a very slowly varying function of the stuff you do… it basically doesn’t matter what you do, and when the difficulty of measuring things reliably is large enough, you don’t even really know whether you improved things, made them worse, or just had seasonal variation in results…

          I get into arguments with some of the data analysts at work on this point all of the time.

    • Keep in mind that most of Wansink’s papers are in obscure, minor journals that are rarely read and cited. He gets by on quantity, on volume.

      Also, as Prof. Gelman said, Wansink’s papers are usually on “harmless” topics, uncontroversial and very specific. It would be different if the papers were on “big topics”, topics that everyone has a strong opinion about, and on which many people work using the same data or similar data. For example, if you use CRSP financial securities data and make a dubious claim about what explains expected returns, you can be sure 1000 researchers will call you out on it.

      It’s like the difference between fishing in a small pond no one but you knows about, vs fishing in the Grand Banks in the Atlantic.

      • Jack:

        Ahhhh, but here’s the paradox: In a scientific context, Wansink’s work is obscure. Yet in the news media, he was not obscure at all, being featured in the New York Times, the New Yorker, NPR, etc., as well as Marginal Revolution, Freakonomics, etc. And he was respected enough to have received millions of dollars in government funding and was appointed to a post in the government. And he was a superstar at Cornell, a well-regarded university. His work had real policy impact. So not such a small pond at all.

        • Well, some of Wansink’s work has made it into “prestigious” journals. He’s published in JAMA and JAMA Pediatrics, both of which have high impact factors and are (or at least claim to be) serious journals.

          Note: I contacted JAMA Pediatrics and they did not seem happy to learn of this scandal, so maybe they are a serious journal.

        • Impact factors are negatively correlated with the statistical power of studies, though, so I don’t think that impact factor means anything in terms of research quality (research novelty, yes, but not quality).

  2. An investigation into the fraud by Stapel revealed some things about the entire scientific enterprise of social psychology that are perhaps also relevant to the field Wansink works in. Perhaps recent events regarding “pizza-gate”/Wansink can be considered to be a “successful” replication of the Stapel case:

    https://www.tilburguniversity.edu/upload/3ff904d7-547b-40ae-85fe-bea38e05a34a_Final%20report%20Flawed%20Science.pdf

    “Suspicions about data provided by Mr Stapel had also arisen among fellow full professors on two occasions in the past year. These suspicions were not followed up. The Committee concludes that the three young whistleblowers showed more courage, vigilance and inquisitiveness than incumbent full professors.” (p. 46)

    “Another set of explanatory factors for the extent and duration of the fraud, alongside those set out in Chapter 4, reside in the general manner in which the research was performed both within and outside Mr Stapel’s working environment. It involved a more general failure of scientific criticism in the peer community and a research culture that was excessively oriented to uncritical confirmation of one’s own ideas and to finding appealing but theoretically superficial ad hoc results. ” (p. 47)

    “It is almost inconceivable that co-authors who analysed the data intensively, or reviewers of the international ‘leading journals’, who are deemed to be experts in their field, could have failed to see that a reported experiment would have been almost infeasible in practice, did not notice the reporting of impossible statistical results, such as a series of t-values linked with clearly impossible p-values, and did not spot values identical to many decimal places in entire series of means in the published tables. Virtually nothing of all the impossibilities, peculiarities and sloppiness mentioned in this report was observed by all these local, national and international members of the field, and no suspicion of fraud whatsoever arose.” (p. 53)

    • Anon:

      It’s interesting, in that one can make a sort of continuum from Stapel (pure fabrication) to Lacour (fake data constructed from real data) to Wansink (presumably there’s some real data somewhere that’s used as a sort of bag of Legos to construct whatever conclusions are desired) to Carney/Cuddy/Yap (data collection so sloppy and compromised that no conclusions can be trusted) to Hauser (real data but compromised data coding) to Fiske (sloppy data collection and analysis but nobody really cares about the results anyway) to Kanazawa (data may be perfectly clean but the statistical conclusions are unfounded as the analysis is pure noise mining).

      As I’ve often said, one thing these cases all seem to share is a conviction on the researchers’ part that their substantive conclusions are correct; thus the data and statistical analyses are merely confirmation of what the researchers already believe.

        • Jack:

          I don’t know where to draw the line.

          I’m picturing Wansink’s lab as being like a big living room with a bag in the center. Whenever someone does a study, they throw the numbers they’ve gathered into the bag. Then whenever someone wants to write a paper, they pull some numbers out of the bag and write it up. Yes, these writeups are lies, and this is research misconduct. But somehow maybe it doesn’t feel like a “sin of commission” because Wansink and his team think that this is just the way research is done.

          Similarly with Hauser. Sure, you’re supposed to independently code your data. But I’m guessing that he doesn’t think of it as cheating to code the data himself, he just thinks the stakes are too high to allow any uncertainty into the equation.

          Similarly with Kanazawa. With some effort he’s been able to avoid confronting the fact that many of his papers are based on meaningless manipulation of noise. Is it a “sin of commission” that he’s so carefully avoided learning what he’s been doing wrong?

        • Andrew says: “Is it a “sin of commission” that he’s so carefully avoided learning what he’s been doing wrong?”

          badabing!

          Same goes for the reviewers in these situations, especially in the more extreme cases, who didn’t bother with the slightest critical glance during peer review. It’s a “don’t-look-and-you-won’t-see” mentality, kind of like looking the other way during a purse-snatching so you won’t have to waste your afternoon at the cop station.

        • I vaguely recall that in the past Andrew has said he often (sometimes?) finishes a paper review in under 15 minutes? I may be wrong.

        • I don’t know how much blame to put on the reviewers. When you start with the assumption that the authors know what they are doing it can be easy to miss problems.

          Look at the case of Alexa Tullett:
          https://rolfzwaan.blogspot.com/2017/03/duplicating-data-view-before-hindsight.html

          Even after seeing duplicated and deleted data she didn’t suspect deliberate fraud had occurred because she worked with the person.

          We only critically looked at Wansink’s papers because of his blog post. If the reviewers had read his blog post before looking at the papers maybe they would have noticed issues as well.

          As Alexa says, she is more cynical now. I think it would benefit everyone if instead of assuming authors know what they are doing, even if they have hundreds of publications, we start with the assumption that the paper and data contain serious problems.

        • A big part of the problem is that carefully reviewing a paper takes a lot of time — and reviewers are typically people who don’t have the time available to do a thorough review; it’s not like it’s something they are paid to do.

          I think the “critical perspective” is something that needs to become part of the culture — so that people look at their own work from a critical perspective, as well as their colleagues’ and their students’ work, and don’t review a paper unless they have time to review it critically; or at least are honest about how carefully they have reviewed it — and don’t recommend publishing if they haven’t looked at it carefully.

          Of course, we need to acknowledge that there are people who take the view that “the integrity of the work is the responsibility of the author(s), not the reviewer or the journal.” In some sense I agree with that — but not entirely, because “it takes a village” to create a culture of research integrity.

        • @Jordan

          If the assumption is going to be that the authors know what they are doing, why go through the charade of a peer-review process at all?

          Might as well get rid of it.

        • @Martha:

          >>>”A big part of the problem is that carefully reviewing a paper takes a lot of time — and reviewers are typically people who don’t have the time available to do a thorough review”<<<

          My opinion is that if you don't have the time to do a careful review, it behooves you to simply say no.

        • Interesting.

          When I was part of a group doing meta-analyses of clinical trials (e.g. https://www.ncbi.nlm.nih.gov/pubmed/2152097 ), for each paper one member was assigned as a severe critic and another as an advocate – seemed to work well.

          There needs to be the right mix of incentives, opportunity, resources and abilities to adequately discern what was (in)adequately done and reported in a research paper.

          p.s. my nephew was treated with TPN a few years after this paper was published, fortunately only once or twice :-(

      • “one thing these cases all seem to share is a conviction on the researchers’ part that their substantive conclusions are correct; thus the data and statistical analyses are merely confirmation of what the researchers already believe.”

        This often seems to me to be similar to “formal debates” — you follow the “rules of the game”, and someone decides you have won. (In this case, “winning” is getting your paper published — and if it gets into the popular press, you have really won!)

    • I had never seen this report, thanks for sharing it. The portion just below the one you quoted is worth drawing attention to:

      “Co-authors also reported more than once in interviews with the Committees that reviewers encouraged irregular practices. For instance, a co-author stated that editors and reviewers would sometimes request certain variables to be omitted, because doing so would be more consistent with the reasoning and flow of the narrative, thereby also omitting unwelcome results. Reviewers have also requested that not all executed analyses be reported, for example by simply leaving unmentioned any conditions for which no effects had been found, although effects were originally expected. Sometimes reviewers insisted on retrospective pilot studies, which were then reported as having been performed in advance. In this way the experiments and choices of items are justified with the benefit of hindsight.”

      Maybe this is something that should be emphasized more. We look a lot at authors who do sloppy work. Are the reviewers who give them feedback also ordering them to do sloppy work? What happens when an author submits a paper that is careful and thorough and the reviewer essentially tells them to sex it up a bit?

      • “What happens when an author submits a paper that is careful and thorough and the reviewer essentially tells them to sex it up a bit?”

        In my limited experience and case, I simply said I did not agree with the reviewer who told me to leave out certain analyses because they “did not add to the story” or something like that. The paper got published with my additional analyses included, but if it had come down to it I would have simply gone to a different journal. I can clearly remember my professor telling me that “the reviewer is always right”, which I found astonishing and to which I replied that this can obviously not be the case. It is perhaps no wonder I did not land a job in science, nor do I want one anymore…

        I agree that things like “incentives”, “the reviewer told me to do X”, “the system is just bad”, and “publication pressures” possibly play a role, but I feel the responsibility of individual researchers is too quickly brushed aside in the general discussions about these matters.

  3. So basically, you don’t get opposition, because no one really cares. You just pick a topic that’s relatively innocuous. Don’t affect any vested interests.

    You get your publications but no one really wants to spend time and effort to verify your claims. Frankly, 99% of all publications fall in that bin, I think.

    Basically, avoid upsetting the applecart. Publish stuff that’s interesting but your conclusions shouldn’t affect anyone else’s business. And steer clear of really competitive areas.

    • +1 I never saw a company upgrade a million-dollar piece of equipment based on a p-value saying a new model is better.

      Now, the FDA is another story. But FDA has no skin in the game, I guess.

      So “something important is at stake” ain’t enough unless that “something” belongs to the decision maker.

          • Usually, as well as demanding all the data and redoing all of the statistics, the default is two studies, fully pre-specified and reviewed before the studies are conducted, with both providing p-values below .05 (a _default_, not a strict rule; see the back-of-envelope below).

          It’s a large organisation with literally hundreds of statisticians employed full time, and so there will be examples of bad decisions based on crummy statistical reasoning – but I am not sure why some folks think “FDA is another story. But FDA has no skin in the game, I guess”. They have websites that discuss their statistical work, and statisticians like Don Rubin and Don Berry have worked with them on a number of occasions.
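
          For what it’s worth, the two-study default buys a lot. A quick back-of-envelope (simple arithmetic of my own, not an FDA calculation):

              # Assuming a truly null effect and two independent, pre-specified trials,
              # the chance that both clear p < .05 by luck alone is:
              alpha = 0.05
              print(alpha ** 2)   # 0.0025, roughly 1 in 400

          That is one reason a single significant result in a journal is a much weaker filter than the regulatory default described above.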

        • But my understanding is that checking that what was preregistered was what was reported is quite lax (e.g., “outcome switching” is common). Also, FDA regulations are influenced by regulations passed by Congress.

        • The results of the COMPARE project (http://compare-trials.org/) are sobering. In addition to large amounts of unacknowledged outcome switching, they also found that some top journals don’t want to hear about it when the outcome switching is publicized. “We know best what should go in the paper, don’t bother us with details of the preregistration” was the attitude from some of them.

        • The FDA doesn’t do approvals based on the scientific literature though. They do it based on the filings, which are different.

          In my admittedly limited experience paying attention to review processes, they will often re-analyze under their own methods. According to a colleague, they will do a “missing=failure” analysis, for example. I know one case where they also discussed excluded patients on a case by case basis.

          They certainly make mistakes. I do remember one where a company got something through by burying it in a footnote; the FDA later ‘noticed’ and went to court. They lost, to my outrage.

          Still, I assume spinning a journal editor is easier than spinning the FDA, if nothing else because you always have another editor to visit.

  4. If only it were true. If it is important, you can bet that there will be disagreements, but it is naive to think that much light will be shed as a result. Unless the data is openly available, it becomes a p***ing match of contradictory results, each with their own set of forking paths. A side effect (but perhaps the most important effect) is that the public becomes convinced that there is no such thing as science, only opinions. The system is then “rigged” and “alternative truths” are as valid as real ones. As skepticism pervades all discussion, we do not get progress but only a schoolyard brawl.

    • “The system is then ‘rigged’”

      What if it is?

      Bones, A. K. (2012). We knew the future all along: a priori hypothesizing is much more accurate than other forms of precognition. Poster presented at the Annual Meeting of the Society for Personality and Social Psychology, San Diego, CA.

      “Finally, a skeptic might counter that the JPSP authors could have conducted the studies, found results, dismissed inconsistent data, and then written the paper as if those were the results that they had anticipated all along. However, orchestrating such a large-scale hoax would require the coordination and involvement of thousands of researchers, reviewers, and editors. Researchers would have to selectively report those that “worked.” Reviewers and editors would have to selectively accept positive, confirmatory results and reject any norm violating researchers that submitted negative results. The possibility that an entire field could be perpetrating such a scam is so counterintuitive that only a social psychologist could predict it if it were actually true”

  5. Andrew:
    You write, “What could Wansink do at this point to restore his reputation (Nothing’s gonna work at this point)”

    I don’t agree that there’s NOTHING he can do. I mean, all the work he’s done at this point should be lit on fire, but America is the country of second chances.

    If he says this experience has made him aware of the replication crisis and how terrible his work is and voluntarily retracts all of his papers, and then uses the funding and wealth he has accumulated to hire talented statisticians and programmers and establishes a “fraud detection unit” like the Meta-Research Center in this post: https://www.theguardian.com/science/2017/feb/01/high-tech-war-on-science
    then I think he could come out of this scandal for the better.

    • Jordan:

      Sure. Or Wansink could quit his job, join the fire department, and spend the rest of his professional life rescuing cats from trees. But I don’t see that happening. My impression was that, when people asked me, “What could Wansink do at this point to restore his reputation,” they meant that he could restore the reputation of his published scientific work. And I think that’s hopeless.

  6. I get and support the big picture of this blog but push back (somewhat hard) about some of the details.

    First, with economic research, it’s true you get pushback from some groups if your research doesn’t support their political views…but you also get push-forward if your research supports their political views. In the few economic papers that I’ve read, I’ve felt that I am very confident in the authors’ political views as shown through the causal interpretation. To me, that feels rather unscientific. But I don’t claim to be an expert on economic research, so take that with a grain of salt.

    But biomedical research is a different story. I’m going to speak from my experience with biological research that only had trace amounts of the medical aspect to it. There is a strong form of pushback with biological science; it’s that people rely on your results. Your results can be p-hacked to the end of the earth, but if they are replicable, everyone is still happy with them. If everything was done with statistical rigor but the results are not replicable, no one is happy.

    I don’t mean to say that there are no problems in biological science; STAP is an example both of a scientific scandal and of the fact that researchers do get pushback (in that case resulting in the PI’s suicide). You can get away with publishing small papers that no one cares about, with results created solely from p-hacking. And given the feedback we’ve received from reviewers in regard to statistical methodology, it’s probably not hard to do so. But if it’s a paper people are going to care about and that builds a lot of excitement, naturally people are going to try to use your results. Clever p-hacking does not build replicable results.

    Through this method, we actually get reliable science; if something comes out with p-values less than 0.05 all over the place, we think that’s interesting and take note. Once more than a few other researchers have USED that result to get new results, we consider those results reliable.

    In fields where future results are not dependent on the replicability of past results, we cannot rely on that test of science. Maybe in those fields, we should ask that the researchers make a non-trivial prediction based on their supplied model (e.g., “according to our model, cutting taxes will increase crime rates by 3% in the next year”). If that prediction does not hold, they don’t get to publish. Or they get to publish, but after a year, we stamp the paper with “prediction did not hold up”.

    • Can you share a few examples of a published direct replication in biomed? These are pretty rare in my experience. And we all know about the cancer/neuroscience replication problems…

      • A published replication is not strictly necessary for the field to stop trusting the result, though. It would, of course, be better for the original false claim to not have been published at all. And practice and publishing venues should be changed to guard better against the incredible inefficiencies and false beliefs that result.

        But in the meantime, researchers are not bound to act as if everything that gets published is certain, and good ones don’t act that way. Self-correction does happen a lot just as Cliff says. And of course, it also happens a lot less than the ideal.

        It would certainly be much much better if the published literature actually represented prevailing opinions about which papers had replicable results. But just because you can’t read a paper that directly takes down a result doesn’t necessarily mean that it hasn’t happened.

        • A published replication is not strictly necessary for the field to stop trusting the result, though

          Publishing is not strictly necessary in the first place either…

          And yes, I know about all these unpublished “informal” replications and how that goes. The cherry picking and bias can get insane. “It didn’t work, I must have done it wrong, better keep trying (alternatively, *shrug*)”; “Well it worked 3 out of 5 times, we probably messed up the other 2”. It is even less reliable than what gets published.

          What we have seen are the reports from Amgen and Bayer about sub-30% replication rates, the cancer reproducibility project reporting that 40% of studies had to be dropped because all the funds were being wasted on figuring out the methods, and then I guess 2/5 papers so far were deemed “replicated”* (although I remember they used the dumb statistical significance criterion for “replication”…)

          *http://www.sciencemag.org/news/2017/01/rigorous-replication-effort-succeeds-just-two-five-cancer-papers

        • > Publishing is not strictly necessary in the first place either…

          In my second sentence I agree with you. And in the sentences after that I recognize the rest of it too about how imperfect unpublished informal replication is.

          But, as we’ve gone around this before, I think there is a difference between imperfect and hopelessly useless.

        • Either way, I think the evidence we do have (not hidden away in file drawers) fails to support the optimism regarding biomed displayed in this thread (i.e., the cancer replication project is currently at ~0.6*0.4 = 0.24 success rate using the “standard” criteria). That published evidence happens to be consistent with my own personal experiences as well.

        • I wouldn’t personally describe my opinions about the current state of biomedical research as optimistic. Nonetheless, I think the process of replication via use that Cliff and I are pointing out is an important one, even though I think it doesn’t actually work very well.

          I generally agree that the state of the literature is bad, but reading the cancer replication report, it looks to me like the 2/5 number is misleading. Two of the replications failed for technical problems in the replication *and because the preregistration rules barred them from doing anything to resolve those problems*. The actual result from completed replications is 2/3 successes (by their standard). I’m not sure there is anything at all we can tell from that number.

      • I have seen posters that say “we were not able to replicate the results of A,B,C et al and here’s what we think happened”, but typically biological science does not strictly rely on direct replication. I think that’s much better.

        My point is that this is a field in which your long-term popularity is based on how much people can USE your results. If your results only apply to a very constrained environment (i.e. extremely dependent on current conditions that are ever changing), then no one will use your methods. If your results are strongly dependent on the noise in your originally produced dataset, people will not use your methods either. I’m not saying you won’t get published, but at least the idea won’t stick around for too long.

        As a simple example, take sequencing. Not that long ago, this idea was pie in the sky. But now everyone does it! I’m not saying everyone makes the absolute best use of it, but it is a technology readily available to most every bio lab out there. That’s one of the litmus tests that lets people know your methods are reliable. In contrast, if the only evidence I see is a bunch of p-values less than 0.05, I will think to myself “that might be interesting…or it might not”. So my point is that in bio, there’s a long-term pushback: if you claim your methods do X, but no one else is able to accomplish X with your methods, then people will lose interest in your methods. If you waste their time making lots of claims like this, they will lose interest in you as well.

        But in fields where published papers do not have immediate utility that must be relied on to advance further research, then what do we have to measure a work’s utility besides more produced estimated effects/p-values/posterior probabilities/etc.? If some research doesn’t have anything that is built on and dependent on its results, I don’t think I’ll ever personally get past the “that might be interesting…or it might not” state, regardless of whether they used p-values, multi-level modeling, etc.

        Done right, statistics is a powerful tool that should point you in the correct direction in the face of uncertainty. But that “done right” part is so unreliable that I can’t help but feel that it should never be the final yardstick.

        • I’d agree that developing methods is probably the most promising avenue of bio research, probably because no one uses NHST to assess success.

          However, it’s my impression that many of the people who come up with these methods are “crackpots” who don’t believe much of what gets published using them:
          https://en.wikipedia.org/wiki/Raymond_Vahan_Damadian
          https://en.wikipedia.org/wiki/Kary_Mullis

          Regarding sequencing, I’ve never really gotten around to looking into it, but something is off about taking a bunch of cells and getting one sequence out. I’m fairly certain that in a cell population of reasonable size you will find at least one cell with a mutation for every bp.
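
          A rough back-of-envelope supporting that intuition (round numbers of my own, assumed for illustration, not measured values):

              # Assume a point-mutation rate of ~1e-9 per base pair per cell division,
              # and ~1e9 cells grown from a single founder (on the order of 1e9 divisions).
              mutation_rate_per_bp = 1e-9
              total_divisions = 1e9
              print(mutation_rate_per_bp * total_divisions)
              # ~1: on the order of one mutation event expected at any given base pair
              # somewhere in the population.

          At that order of magnitude, a sizable fraction of positions will have at least one mutant cell somewhere in the population, which is the commenter’s point: a “single” consensus sequence is really an average over cells that are not all identical.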

    • “Through this method, we actually get reliable science;”

      Only if the random walk converges fast enough; it’s not obvious to me that it does, especially when the experiments assume certain cultural, or even linguistic, conditions. Consider a hypothetical priming experiment with the word “terrific” now and 150 years ago; replication would be evidence against priming!

      More to the point though, scientific research should be regarded as an investment, and the question at hand is what is the efficient way to invest.

    • I think Cliff’s point is fairly straightforward:

      Some phenomena can be tested easily. If the results matter to the well-being of humans, people will test them by trying to put them into action. If they don’t work, they’ll be thrown in the trash bin.

      OTOH, some phenomena can’t be tested very well – or at least not well enough to kill off one faction or the other. Minimum wage is a great example. In my personal opinion, raising the min wage is destructive to the economy. But there are so many other factors that operate on such a wide range of time scales that it’s probably impossible to assess the impact of simply changing the min wage. Here in Seattle we recently passed a min wage law. But look: Amazon is on a mammoth long-term, multi-billion dollar building spree, funded almost entirely by sales that are occurring outside the city, not to mention the region, state or country. If you’re an employer, there are hundreds of ways to compensate for the extra cost, so counting the lost min wage jobs doesn’t tell the story. And you don’t have to cut them just this year, you can cut some next year or the year after. Your investments in automation might take a few years to materialize. And on and on.

      And yet whenever Our Father Krugman writes about min wage, he cites a single ancient study and claims the debate is closed.

  7. To me, this is the best part of working as an expert witness. All of your stuff is produced against someone’s interest, and whoever that is (if they are well-enough funded) hires a smart person to find every forking path. Now you still have the problem that Dale Lehman discusses above, that the *judges* of the two sides are woefully underpowered to perform their tasks. But if you don’t keep score by their calls, but by the internal (and monetary) satisfaction you get for being paid to endure harsh criticism in an atmosphere where even one Wansink-like episode ends your career, well, that seems like good fun. It’s the boring testimonies making boring obvious points that some lawyer wants for no obvious reason where you can slip up, mostly through laziness.

  8. I, too, enjoy working as an expert witness and find it to be a better forum for vetting ideas and analysis than the usual academic and publishing venues. It is far from perfect – and seems to be getting less so all the time. I believe Canada does it better – experts are hired to work for the Court, not just the parties to the proceeding. In any case, testimony calls for doing work that is meaningful and can withstand cross-examination. Why can’t the refereeing process do this?

  9. Andrew, thanks for distilling the whole mess down into this precise post and the comments section is very useful as well. I work in an area where social psychology is considered the gold standard of research and thus the whole area is completely full of Wansink stuff (“people recover from surgery faster if they have a view of nature out the window”, “obesity and diabetes are caused by not enough access to nature for the poor”, biomimicry is a particularly egregious idea in this field). No one even knows how to really read any of the footnotes or cares, since it is all about confirmation bias and the primary professional organizations in the area directly encourage such lack of rigor. Obscure as it may sound, the whole area of “research” into architecture and design is full of this kind of thing. But the really odd part is that the field is made up of people who have no idea what a good study is or could be (architects, designers, interior designers, academic “researchers” at architecture schools or inside furniture manufacturers trying to sell more). They even now have groups that pursue “evidence-based healthcare design” which simply means that some study somewhere says what they need it to say. The field is at such a low level that it is not worth mentioning in many ways except that it is deeply embedded in a $1T industry for building and construction as well as codes and regulations based on this junk. Any idea of replication is simply beyond the ken of this field because, as one of your other commenters put it, the publication is only a precursor to TED talks and keynote addresses and sitting on officious committees to help change the world (while getting paid well). Sadly, as you and commenters have indicated, no one thinks they are doing anything wrong at all. I only add this comment to suggest that there are whole fields and sub-fields that suffer from the problems outlined here (much of this research would make Wansink look scrupulous).

  10. Social science suffers from what Steve Fuller once called “paradigmitis”. Researchers think that since “truth is relative to a paradigm” (as Kuhn taught us) and a paradigm is just the epistemic standards of a “scientific community”, all they need to do (in lieu of rigorous methods) is to build a community around their work. They leave out the part where you have to actively deal with anomalies. I think we should stop taking seriously scientific communities that don’t have vibrant discussions and a good track record of identifying and correcting errors. In particular cases, if you can’t see where an interesting result was subjected to criticism, then you should assume that the result is either (a) not interesting or (b) not valid. If you find it interesting yourself, then it’s up to you to subject the claim to public criticism and see if it holds up.

    • I also get the sense that the effect of personalities is much stronger in soc. sci. than in the hard sciences. It is harder in hard sciences for any one person to retain a monopoly on a paradigm.

      Ideas get commoditized much faster. I think this is good.

      • I shudder to think what SETI would have come up with by now if they had been using social scientific noise filters on their radio telescopes. There’d probably be two different research groups saying completely contradictory things about the civilization that both teams are certain exists around Proxima Centauri. Because these claims would be “incommensurable” with each other, and with astronomy in general, no one would push back on them, and so you’d be free to believe that the aliens are blue-skinned or two feet tall as you please. Of course, a “recent study” would periodically discover that they are taller or bluer than you thought! After a decade of this sort of thing, complete with earnest coverage on NPR and in TED talks, the notion of ETI would become completely devoid of interest.

        • I honestly think that such an alternative timeline (the timeline where SETI installs noise filters) would probably lead to *better* outcomes than the “original timeline” (that is, the timeline we’re in).

          –When SETI announces proof of ETI, the media will have a field day with it, using it to push narratives about humanity’s place in the world. Eventually, the media will get bored of that narrative and move onto some other hot topic, but humanity’s own self-understanding will be subtly affected, in ways that would be good or bad. Possibly a positive placebo effect may result.
          –If you say enough nonsense, at least some of it will happen to be true by sheer coincidence (see the “Library of Babel” short story, where you can have books that can tell intelligible narratives, even if the books themselves are composed of random combinations of letters). If rigor gets reintroduced in the SETI studies, at least a few effects might wind up getting replicated anyway…and so the original “finder” of that effect will take full credit for “discovering” (i.e, p-hacking) the effect. So it’s possible that some of the research papers might accidentally find real information about ETI.
          –ETI will soon be completely devoid of any interest, but that would be because the “hard” part of ETI (finding it) would have been declared “solved” and thus not worth worrying about. The mere existence of ETI is “transformative”. What this ETI actually *is* is…well, kinda irrelevant. The whole hype cycle (“Oooh, we found aliens”…”Oh, let’s learn about aliens”…”Wait, what’s the point of learning about these aliens if they’re so far away and by the time we ever actually visit them, they’d all probably be dead”…”Okay, we’re bored, let’s watch American Idol instead”) would happen even if SETI followed rigor.
          –But SETI’s noise filters are much *cheaper* than rigor! We get the joy of “discovering” aliens without spending a lot of time and effort actually finding clear proof of them. The resources that are saved in the pursuit of ETI could be redirected to other matters (though at least some of it may be “wasted” in the form of grants to the two research groups).

    • ” I think we should stop taking scientific communities that don’t have vibrant discussions and a good track record of identifying and correcting errors seriously.”

      Like when a collaboration bans someone from their private statistical methods email group for making overly provocative comments?

        • Well, yes, that would give the individual who was banned a reason not to take it seriously. But I’m actually thinking about the conspicuous absence of public disagreement. If a result catches your eye, look for people who publicly questioned it. If you can’t find any, then you’ll have to critique it yourself. Otherwise it hasn’t really been tested. If the authors ignore your questions (or ban you!) then we’re back where we started, and that would also explain the original absence of public disagreement. So, mainly, to avoid wasting a lot of time, I’d like us (and the media) to just ignore results that don’t already have a public record of debate.

  11. The idea about research slipping through and being accepted because there is no major interest on the other side is really interesting and something I hadn’t thought about before. This post really changes the way I think about research. I just reviewed a paper and was thinking about this kind of thing, whereas I never would have before. Excellent post!
    Rick

    • Rick:

      You’ll be amused to know that I first started thinking about this issue of “research opposition” many years ago, when working on the radon problem! The EPA had made these very strong recommendations about radon measurement and remediation. In the opinion of Phil and myself, the EPA was hyping the risk and recommending more remediation than necessary (an issue we later discussed in technical terms in our 1999 Statistical Science paper). Phil and I decided that one problem was that there was no pro-radon lobby. Usually when the EPA is pushing, there’s an industry lobby on the other side pushing back. But radon gas is a product of nature, so the usual industry lobbyists weren’t really interested in spending their effort to defend it.

      • The real estate lobby had a strong interest in radon. They couldn’t come out as pro-radon, but they were very strongly (and influentially) opposed to identifying any areas as being more dangerous than any others. The homes where there is definitely a substantial risk of lung cancer from exposure to radon decay products are almost all in a small fraction of the country, but the hell if they were going to let the EPA say so. Such a pity.

  12. I think it is important that this discussion is about “opposition” and not “diversity” of viewpoint. Jonathan Haidt and many of those who are unhappy about the current combative state of academic and political discourse think diversity of opinion is the solution. It may be necessary but not sufficient. But opposition must, I think, be married with a capacity for skepticism from the other side. Opposition is needed to break down the barriers put up by biases ranging from small to systematic. Skepticism is needed to be able to actually allow opposition to change minds. Diversity of opinion without skepticism and real opposition is simply a precursor to relativism.

    Too bad the EPA didn’t seem to realize the opposition value of the radon paper that you and Phil Price wrote. Maybe as an agency they should have a bit more skepticism about their own views.

  13. I am NOT a Breitbart reader but their website now has stumbled into the complex debate on proper data hygiene and methods. Apparently the site and its followers are strong supporters of the Scientific Method:

    http://www.breitbart.com/tech/2017/03/29/j-scott-armstrong-fraction-1-papers-scientific-journals-follow-scientific-method/

    Ironically, the article covers a lot of the same themes echoed on this site on data, methods, and research opposition. Of course, with the caveat that it is from Breitbart.
