Ambiguities with the supposed non-replication of ego depletion

Baruch Eitam writes:

I am teaching a seminar for graduate students in the social track, and I decided to dedicate the first 4-6 classes to understanding the methodological crisis in psychology, its causes, and some proposed solutions.

In one of the classes I had the students read this paper which reports an attempt to reproduce an “ego depletion” effect. Here’s the paper, by Hagger et al.

The paper has been cited well given its age and has of course been taken up by eager skeptics who have proclaimed the end of the theory, etc. Here is one example.

I never understood, or was attracted to, ego depletion as a theory, but that isn’t the point here. The point is whether the chosen effect exists, whatever it may mean.

Reading the replication paper I was rather surprised to see the rejection rate of participants—it averages about 29%, with some labs rejecting up to 50% of the participants. See
this summary file.

Now I guess substantial rejection can occur—I often reject extreme observations myself, as I have a fear of getting my effects because of outliers—but rejecting on average 1/3 of the sample, and selectively so (the weaker performers on the depleting task were rejected), can be argued to seriously undercut the chance of finding a ‘depletion effect’ (if there was one).

That isn’t all. The authors do include in ‘Appendix B’ the results of the analyses with the rejected participants included. Indeed, for the ‘key’ measure—a not very straightforward measure of the variance in response times—analyzing the data from all participants leads to stronger evidence of there being no effect of ‘depletion’ at all.

But for the much more straightforward effect on response times, at least to my naive mind (= being tired makes you respond slower), including all participants actually shows a very clear effect of ‘depletion’.

I thought this to be rather embarrassing, especially as the paper is clearly presented as a failure to replicate, and wrote the lead author, who responded:

Many thanks for your email. The depletion effect size was ‘there’ for RT with all of the participants included and, although the 95% CIs did not contain zero, the effect size is pretty small. The scientific community can draw their own conclusions from this. Also, it does seem that there may be some moderators at work. Junhua Dang did an interesting re-analysis demonstrating that those who put in high effort on the letter ‘e’ task exhibited greater depletion. Again, small effects. See attached.

I am not sure about the ‘scientific community drawing its own conclusions’ from the end of Appendix B.

And a final twist: Roy Baumeister, who is the developer and main proponent of the depletion theory, wrote a rebuttal letter in which he mentions various things but nothing about the actual data and the rather unacceptable rate of rejection.

My conclusion—to me this kind of work is the mirror image of the flashy claims made on the basis of weak or nonexistent data. Here people, perhaps unwittingly, choose not to look at the data but rather to support the more lucrative conclusion. Isn’t this just another way of doing bad science?

My reply: I’m not sure why Eitam says that the conclusion is “lucrative.” I guess Baumeister can be making some money off of ego depletion, somehow—and I have no problem with that; I make money off my discoveries too!—but I don’t see how this is lucrative for Hagger et al.

On to the larger point: this seems like the usual story in which there’s a push, on both sides, to give a deterministic statement, in this case, “ego depletion is real” or “ego depletion doesn’t replicate.” I’d like discussions to move toward more acknowledgment of uncertainty and variation.

21 thoughts on “Ambiguities with the supposed non-replication of ego depletion”

  1. Whatever you think of the main analysis in Hagger et al., the pre-registered analysis plan was vetted by proponents of the original effect, and the decision to focus on that analysis in the paper was also determined by the pre-registration, I believe. So it’s unlikely to have been influenced by considerations of lucrativeness. I also would like to see more data before jumping to the conclusion that a failed replication is more lucrative than a successful one (in this case or in general). I’ve heard arguments (and seen reactions) in both directions.
    The question of whether and when to exclude participants, and what this does to internal validity, is an important one. But I think that if the wrong analysis was chosen here, it’s pretty unlikely that was due to motivated reasoning. Which is a good reminder that transparency (e.g., pre-registration) is a separate issue from validity.

  2. I think lucrative here means that the replicators get attention and potentially career advancement for having investigated and found an “important” result. They are challenging established wisdom and finding “definitive” evidence of crisis in their field etc etc.

  3. I thought this paper did an amazing job acknowledging uncertainty and avoiding the trap of null hypothesis testing: http://www.psy.miami.edu/faculty/mmccullough/Papers/Bayes-Meta.pdf

    The authors try to establish a region of practical equivalence (ROPE) for ego depletion:
    “We came to our ROPE of µ < 0.15 in part on the basis of the estimated real world influence of µ = 0.15… There, an effect of µ = 0.15 corresponds to 38.4 ml additional beer consumption—about 2.6 tablespoons.”

    And rather than try to come up with a Final Answer on ego depletion, they estimate the sample size required to settle the debate:
    “We are not sure if the debate about the depletion effect is at such an impasse, but it might be: If proponents’ prior beliefs can be described in terms of the original Hagger et al. (2010) estimate—p(µ) = N(0.67, 0.03)—but the true effect is actually described by our bias-corrected estimate (i.e., µ = 0.00), then a replication effort on the order of 36 teams each collecting approximately 320 people per group (total N = 23,040) will have no chance of correctly changing such confident proponents’ minds (Figure S1).”
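    For concreteness, a minimal sketch of how the quoted standardized ROPE maps onto raw units; the 0.15 and 38.4 ml figures are from the quote above, the tablespoon conversion assumes a US tablespoon of about 14.8 ml, and the implied standard deviation is back-solved arithmetic rather than a figure from the paper:

      # Mapping the quoted ROPE boundary (d = 0.15) onto raw units.
      d_rope = 0.15                       # quoted ROPE boundary in standardized units
      extra_ml = 38.4                     # quoted real-world equivalent of d = 0.15
      implied_sd_ml = extra_ml / d_rope   # ~256 ml: consumption SD implied by the quote
      tablespoons = extra_ml / 14.8       # assuming a ~14.8 ml US tablespoon, ~2.6
      print(round(implied_sd_ml), round(tablespoons, 1))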

        • Peter, it’s one thing to try to über-perform by estimating a near-zero effect (i.e., supporting the more lucrative “no replication”) and another thing to push straightforward evidence for “depletion” (whatever that may mean) to Appendix B. I was surprised by the latter, which is why I wrote Professor Gelman.

        • Baruch:

          I still don’t see why you say that “no replication” is “lucrative.” Who are those people making all this money from non-replications? It’s my impression that the big bucks are in making strong positive claims. Niall Ferguson, Malcolm Gladwell, Dr. Oz, etc.: they don’t get paid $$$ because of their robust skepticism, that’s for sure!

        • Andrew, you’re comparing the exponents of the absolute most lucrative positive claims to everyday “no replication” guys. Do you think Hagger et al. would have gotten as much recognition if their replication had been successful? I don’t know, but I would imagine that at most it would show up as a footnoted citation in a catalogue of replications of various stripes. The negative replication, your correspondent tells us, makes it into syllabi.

        • Here is the definition:

          lu·cra·tive
          /ˈlo͞okrədiv/
          adjective: lucrative

          producing a great deal of profit.
          “a lucrative career as a stand-up comedian”
          synonyms: profitable, profit-making, gainful, remunerative, moneymaking, paying, high-income, well paid, bankable; rewarding, worthwhile; thriving, flourishing, successful, booming
          “a lucrative business”
          antonyms: unprofitable

        • Have you seen the latest multi-lab effort to replicate the depletion effect? It was presented at SPSP 2018, and it was the first time the results were presented to anyone (including the labs that participated in the effort).

          Vohs gave input on the methodology, iirc, and experts in ego depletion posited a hypothesis about what the parameter value should be.

          Based on a Bayesian meta-analytic hypothesis test: 1) Assuming their hypothesis is CORRECT (that the effect is positive and around .3, if I recall correctly), the meta-analytic median of the posterior is about d=.06. Point zero six. But that’s conditional on it being positive. 2) The data were ~ 4x more likely under a nil-null point mass hypothesis than under the expert-informed alternative. 3) Marginalizing across both, the meta-analytic estimate is about d=.016.

          So, all that to say: ego depletion is extremely tiny if it exists, and it’s 4x more likely that it doesn’t exist at all, at least within the expert-informed paradigm used.
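
          For readers unfamiliar with this kind of model averaging, here is a rough back-of-the-envelope sketch of the arithmetic, using the approximate numbers quoted above rather than the actual meta-analytic models, so treat it as an illustration only:

            # Model-averaged effect size from a Bayes factor and a conditional estimate.
            # Numbers are the approximate ones quoted above, not the real analysis output.
            bf01 = 4.0        # data ~4x more likely under the nil-null H0 than under H1
            prior_odds = 1.0  # assume equal prior weight on H0 and H1
            p_h1 = 1.0 / (1.0 + bf01 * prior_odds)  # posterior probability of H1, ~0.2
            d_h1 = 0.06       # posterior median effect size, conditional on H1
            d_h0 = 0.0        # point mass at zero under H0
            d_marginal = p_h1 * d_h1 + (1 - p_h1) * d_h0
            print(round(p_h1, 2), round(d_marginal, 3))  # ~0.2 and ~0.012, in the ballpark of the ~0.016 quoted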

        • That wouldn’t make me fall of my chair (if there is such a phrase in English) but still this isn’t the point. The point is that both “camps” are not (always) about doing science/discovery/knowing what is the case but rather aim and strive for the most lucrative conclusion. I think this is consistent with your usual line of argument Andrew although I feel that the recognition that people’s motivations are similar at both sides of the replication isle is getting very little, if any, attention.

        • I disagree with this. The replication failures are indeed making headlines, but it’s because they’re exposing bad science. “One side” explores a silly question with a noisy method, tweaks the method and analyses until something sexy pops out, and builds an entire career on data-driven-by-agenda, rather than data-driven-by-authentic-pursuits.

          Not everyone is like that – I don’t think people often PURPOSEFULLY tweak/hack their method until a confirmatory result is found, or at least not maliciously. I think they just fail to understand the rules of probability. They can convince themselves that they are right by just trying a bunch of times and counting the p < .05 results.
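
          A minimal toy simulation of how easily that happens (my own illustration with made-up numbers, not anything from the replication project): run several independent t-tests on pure noise and count how often at least one comes out “significant” at p < .05.

            # Toy simulation: 10 looks at pure-noise data; how often does at least
            # one t-test reach p < .05? Analytically it is 1 - 0.95**10, about 0.40.
            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(0)
            n_experiments, n_tests, n_per_group = 2000, 10, 30
            hits = 0
            for _ in range(n_experiments):
                ps = [stats.ttest_ind(rng.normal(size=n_per_group),
                                      rng.normal(size=n_per_group)).pvalue
                      for _ in range(n_tests)]
                hits += any(p < 0.05 for p in ps)
            print(hits / n_experiments)  # ~0.4: a nominal "hit" in 10 tries is quite likely under the null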

          The replicators don't do this. They attempt a very rigid, robust replication of well-"established" effects, and find that they consistently do not work, and that the previous papers may just be the result of random noise, as basic probability theory would predict. The replicators seem quite happy when a result EITHER replicates OR doesn't replicate. They just want a fairly strong data-driven assessment of an effect we've taken for granted. If it doesn't replicate, then we need to start over; if it does replicate, then that's fantastic.

          And for what it's worth, I don't think replicating stuff is 'lucrative' at all. Journals and scientists seem to look down upon replications, because they aren't novel or original work. If someone tried to make a career out of just replicating others' studies, I suspect they would not make it very far.

        • I don’t see more rigor in this specific replication: specifically, they did not recognize and emphasize a clearly valid result which obviously goes against their publishable (ok, not lucrative, you convinced me) conclusion, but instead exiled it to Appendix B, did not refer to it in the main text, and in no way qualified their conclusion.

          Stephen — imagine someone publishing a replication in which this (weird and dubiously valid) CV measure points in the “positive” direction but the more straightforward measure of RT points in the “negative,” with many of the studies dropping 30%-50% of the participants. No way I would publish this as a “finding.” This is because a replication is measured not by a single DV but rather by those which are considered valid, preferably in order of relevance.

          Scientists are, maybe unfortunately, people. On both sides of the aisle. To me, this replication report is but one demonstration that the payoff matrix, now slightly changed, may adversely affect people’s work regardless of which camp they are in.

        • “Have you seen the latest multi-lab effort to replicate the depletion effect? It was presented at SPSP 2018 (…)”

          Not sure if I am missing something, but I also heard of this (at the time) and was appalled by the whole “let’s present some stuff during a conference but not really give any verifiable details yet” approach (i.e., no pre-print and/or article).

          If I understood this correctly, this practice is somehow a “normal” part of conferences. I totally don’t get that, and find it useless, unscientific, and potentially very harmful. I remember at the time lots of people tweeting about the result, but nobody knew any specific details, and if they did, they were not verifiable.

          Conferences seem a joke to me as they are, but to (still) allow researchers to present preliminary and unverifiable results/conclusions is simply totally incomprehensible to me…

        • The project was not fully complete yet. Iirc, there were 36 labs that participated in the replication effort, and they had only analyzed the data from 34 of the 36 labs (the last 2 labs’ data were not available yet).

          Nevertheless, I thought their presentation *did* cover the methods and analysis in good enough detail. The problem is that the results were analyzed using a method most people in psychology are completely unfamiliar with (a Bayesian meta-analytic hypothesis test, using Bayesian model averaging across fixed- and random-effects meta-analytic models, informed priors for hypothesis testing, and a Bayes factor comparing the mean estimates to a nil-null point mass hypothesis). The persons who conducted the analyses were unable to attend the conference at the last minute, and the person presenting it didn’t know enough about the details to articulate some of the nuances.

          I wound up writing a blog post explaining the analytic details, since I do actually understand the analysis.

          You also have to understand that this conference takes place in Jan-March, but people submit proposals in early Summer of the previous year. They hadn’t yet finished, but probably thought they would be. They were nearly finished (all but two labs reporting in). Given the circumstances, I don’t think it’s as terrible as you make it out to be.

        • “You also have to understand that this conference takes place in Jan-March, but people submit proposals in early Summer of the previous year. They hadn’t yet finished, but probably thought they would be. They were nearly finished (all but two labs reporting in). Given the circumstances, I don’t think it’s as terrible as you make it out to be.”

          Thank you for providing some details concerning the presentation and/or lack of verifiable papers/data/etc.

          I can understand that some may view these details as some sort of “valid excuse” (or insert a better term here), but I view them as pointing to exactly the reasons why I think these “preliminary” results, and the way conferences are run, are unscientific and/or unhelpful.

          Submitting stuff you want to present when it is not finished yet (and cannot be verified by others) at some fancy conference just seems silly to me…

  4. I am afraid we would end up learning nothing from the so-called replication crisis. The positivist believers in counterintuitive tiny effects would only be supplanted by the equally positivistic wild bunch of supporters of zero effects. Editors who suggested dropping the non-significant result to achieve a better story would be supplanted by editors who think that p = .03 signals fraud. I am still waiting for some epistemologically informed reflection on what it means to do research on humans and on what makes them human: brains, culture, psychology, relations.
