Prediction Market Project for the Reproducibility of Psychological Science

Anna Dreber Almenberg writes:

The second prediction market project for the reproducibility project will soon be up and running – please participate!

There will be around 25 prediction markets, each representing a particular study that is currently being replicated. Each study (and thus market) can be summarized by a key hypothesis that is being tested, which you will get to bet on.

In each market that you participate in, you will bet on a binary outcome: whether the effect in the replication study is in the same direction as the original study, and is statistically significant with a p-value smaller than 0.05.

Everybody is eligible to participate in the prediction markets: they are open to all members of the Open Science Collaboration discussion group – you do not need to be part of a replication for the Reproducibility Project. However, you cannot bet on your own replications.

Each study/market will have a prospectus with all available information so that you can make informed decisions.

The prediction markets are subsidized. All participants will get about $50 in their prediction account to trade with. How much money you make depends on how you bet on the different hypotheses; on average, participants will earn about $50, paid out on a Mastercard gift card (or the equivalent) that can be used anywhere Mastercard is accepted.

The prediction markets will open on October 21, 2014 and close on November 4.

If you are willing to participate in the prediction markets, please send an email to Siri Isaksson by October 19 and we will set up an account for you. Before we open up the prediction markets, we will send you a short survey.

The prediction markets are run in collaboration with Consensus Point.

If you have any questions, please do not hesitate to email Siri Isaksson.
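
The announcement does not spell out the market mechanism. As a purely illustrative sketch of how a binary contract’s price can be read as a crowd probability of replication, here is Hanson’s logarithmic market scoring rule, a common design for subsidized prediction markets; whether Consensus Point uses it is not stated above, and the liquidity parameter b and the trade size below are made-up numbers.

```python
import math

def lmsr_cost(q_yes, q_no, b=100.0):
    """LMSR cost function C(q) = b * log(exp(q_yes/b) + exp(q_no/b))."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def lmsr_price_yes(q_yes, q_no, b=100.0):
    """Instantaneous price of the YES contract, read as the implied probability of replication."""
    e_yes, e_no = math.exp(q_yes / b), math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

def buy_yes(q_yes, q_no, shares, b=100.0):
    """Cost of buying `shares` YES contracts, each paying 1 unit if the study replicates."""
    return lmsr_cost(q_yes + shares, q_no, b) - lmsr_cost(q_yes, q_no, b)

# Example: the market starts at 50/50; a trader who expects the study to replicate
# buys 30 YES shares, which pushes the implied probability up.
q_yes, q_no, b = 0.0, 0.0, 100.0
print(lmsr_price_yes(q_yes, q_no, b))                            # 0.5
cost = buy_yes(q_yes, q_no, 30, b)
q_yes += 30
print(round(cost, 2), round(lmsr_price_yes(q_yes, q_no, b), 3))  # ~16.13 and ~0.574
```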

20 thoughts on “Prediction Market Project for the Reproducibility of Psychological Science”

  1. “whether the effect in the replication study is in the same direction as the original study, and is statistically significant with a p-value smaller than 0.05.”

    No. That is not the definition of a reproducible, stable, replicated result. It can be achieved with 50% chance of success by spending money.

  2. I have twittered on this. When I first heard it I thought it was possibly a joke. It strikes me that the scientific credentials of the enterprise are already sufficiently shaky that it ought to avoid making a game out of it. If they believe their own theories about a “reward structure” influencing results, then, even if the bettor doesn’t bet on a study they are doing, there’s a potential influence because everyone knows who is doing what. One may expect more Schnall-type outrage, warranted or not. I realize this is being run by economists, and, having spent 5 years partly in an econ dept., I know that’s their mindset, but it would bother me if I were promoting the reproducibility project as a serious scientific endeavor. Should we bet on the results of upcoming clinical trials for ebola? Whether the latest patient will survive?

    • Maybe researchers will risk their careers to throw the studies so their buddies can make $50. Or maybe it’ll just show that betting using personal priors is a more reliable indicator of what’s reproducible than frequentist dogma.

      For 25 studies at the 5% level, frequentist claims to objective truth rest on there only being a few failed replications. What if there are 30%, 50% or 70% failures and the betting based on the personally determined priors of the market participants gets far closer to the real failure rate?

      What’s your strategy for rationalizing away that one?

      I’d go with “frequentists have had a monopoly on teaching undergraduate statistics for 80 years, but they just need to teach it a little bit better and all these problems will go away”. That’s my fav.
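
An illustrative aside, with assumed numbers: if each of 25 replications independently failed with probability f, the number of failures would be Binomial(25, f). The sketch below shows how sharply the expected counts separate under different assumed failure rates, which is what the 30%/50%/70% scenarios above are gesturing at.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k failed replications out of n when each fails with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 25
for f in (0.05, 0.30, 0.50, 0.70):   # assumed true failure rates, purely illustrative
    expected = n * f
    p_at_most_5 = sum(binom_pmf(k, n, f) for k in range(6))   # P(5 or fewer failures)
    print(f"failure rate {f:.0%}: expect {expected:.1f} failures, P(<=5) = {p_at_most_5:.3f}")
```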

      • Re: “For 25 studies at the 5% level, frequentist claims to objective truth rest on there only being a few failed replications.”

        Man, you know that’s not the case. Why not give them the actual argument?

        FWIW, I think Mayo’s point is that this will look very skeezy to outsiders, despite researchers’ stated desire to seem professional, remove incentives for shoddy research practices, etc. Less of a statistical issue than “oh look, psychologists undermining themselves, again.”

        The gift card doesn’t really help.

        • I’ll let Frequentists slide on their 95% intervals being wrong 30, 50, or 70% of the time when Frequentists stop bragging about the frequency properties of their methods and the “guarantees” those properties provide.

        • Here is R. A. Fisher’s quote from his famous design of experiments book:

          “The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials”
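
For reference, Fisher’s “once in 22 trials” is just the two-sided normal tail beyond two standard deviations, roughly 4.55%; a quick check:

```python
from math import erf, sqrt

# P(|Z| > 2) for a standard normal deviate: 2 * (1 - Phi(2)),
# where Phi is the standard normal CDF.
phi_2 = 0.5 * (1 + erf(2 / sqrt(2)))
tail = 2 * (1 - phi_2)
print(round(tail, 4), round(1 / tail, 1))   # 0.0455 -> about once in 22 trials
```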

    • Mayo:

      I have no reason to think this prediction market is a joke. I also am not a fan of the practice of not trying out a possibly good idea out of fear of “outrage, warranted or not.” I am on record with concerns about prediction markets (see here for my reactions to a terrorism futures market that was to be run by a convicted terrorist) but I don’t think prediction markets are necessarily a bad thing nor do I think they are necessarily anti-scientific or unscientific. You might be bothered by it, and that’s fine—I agree that if enough people are bothered by these markets, it’s a concern—but I don’t see that you (or I, for that matter) have any authority to declare it as “not a serious scientific endeavor.” And, yes, why not bet on the results of upcoming clinical trials? Companies and governments are already “betting” on such things in practice by making investment decisions. I don’t think prediction markets are magic, and, based on my layperson’s understanding of the troubles of our financial system, I do think markets should be regulated, but I don’t see it as automatically a bad thing to introduce a formal betting market. The stakes here are low and, as I’m sure you’re aware, researchers are in practice “betting” on their studies all the time, for high stakes on the order of a good job if their study succeeds, no good job if their study fails. It’s my impression that these replication studies are much less subject to these sorts of pressures because they are preregistered.

    • This actually seems like a fairly interesting project.

      To me the question they seem to be answering is not “will this study replicate?” but “can researchers tell which studies replicate?”

      I feel that getting more information on the second question is a worthwhile endeavor, and fairly relevant if we want to criticize those who publish seemingly obvious “spurious” results. Looking through the abstracts of the papers being replicated, it wasn’t at all obvious to me which results would be easily replicable and which would not be. I’m not sure if a researcher with domain-specific knowledge would have an easier time. I’m looking forward to seeing if a pool of researchers can usually arrive at the right answer.

      While their definition of replication could be made broader, I think replicating the original study as faithfully as possible (with a potentially larger sample size) does give a good indication of whether or not the original result was due to noise. This doesn’t make the result a robust finding, but it does get rid of one potential source of error.

      I’m not so worried about the potential financial ramifications of the study. The researchers are asking participants to put in a good chunk of time thinking about, and playing with, the market. If each participant spends about 5 hours working on it, the payoff is about $10 an hour, which is well within standard experimental-subject rates. I’m not sure they need to set up a market for this. Maybe a simple survey would have worked? They’re doing both, so time will tell which one is more likely to be correct.

  3. Not being an ethicist, I don’t have a precise ethical account distinguishing when prediction markets are appropriate, and when not and why. I don’t think they’re bad (I play the stock market myself, and in this respect, do bet on clinical trials. But I would consider it ghastly to hold a public vote on whether a particular patient survives.) I didn’t explain that the concern in this case is simply that I feel it is hurtful to the fledgling effort, which really needs to have well-scrutinized and vetted standards for what counts as a so-called successful/failed replication. The previous rounds demonstrated that these issues are quite shaky, being developed as they go along, and not at the stage to bear the weight being placed on them. I don’t see evidence that the replications constitute genuine “checks” or methodological advancements on the earlier work. The gaps between statistical and substantive, the unclarity about effect sizes to be shown, and the handling of rival methodologies and challenges have scarcely been examined. It’s not prediction markets that are unscientific, it’s that by adding to the sports/circus-like atmosphere, this particular enterprise grows more frivolous. The fact that individuals are personally impacted here is also a large part of why it seems in poor taste. But I spoze if that’s how psych wants to be viewed, it’s their choice.
    http://errorstatistics.com/2014/06/30/some-ironies-in-the-replication-crisis-in-social-psychology-1st-installment/

  4. True. But later he changed his mind (due partly to the bitter dispute with Neyman and Pearson).

    In Statistical Methods and Scientific Inference (1956) he famously derided the fixed level of significance as “absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas” (p. 42). Instead, he argued that researchers should communicate exact p-values. Then again, Fisher was not the most consistent writer…

    • He may have changed his mind about using a fixed level, but he didn’t change his mind about the frequency interpretation of probabilities, which is what was being referred to.

      There’s also a hilarious (in retrospect) Fisher quote where he says basically that you can’t go far wrong rejecting a null hypothesis for having a small p-value. The reasoning was that if the null is false, you were 100% right to reject it. If it’s true, then you’d only falsely reject less than alpha% of the time. So that alpha% represents an upper (expected) bound on the error rate for him.
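
To spell out that reasoning with a toy simulation, using a made-up mix of true and false nulls and a made-up effect size: errors can only happen when the null is true, so the overall error rate of “reject when p < 0.05” cannot exceed 0.05.

```python
import random

random.seed(1)

def error_rate(n_tests=100_000, share_true_nulls=0.5, effect=2.5):
    """Fraction of tests in which rejecting at the two-sided 5% level is an error."""
    errors = 0
    for _ in range(n_tests):
        null_true = random.random() < share_true_nulls
        # z-statistic: pure noise when the null is true, noise plus an effect otherwise
        z = random.gauss(0, 1) + (0.0 if null_true else effect)
        if abs(z) > 1.96 and null_true:   # a rejection that is actually a mistake
            errors += 1
    return errors / n_tests

# The overall error rate is about share_true_nulls * 0.05, so it never exceeds 0.05.
print(error_rate(share_true_nulls=0.5))  # roughly 0.025
print(error_rate(share_true_nulls=1.0))  # roughly 0.05, Fisher's bound
```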

  5. Mayo:
    You write:
    “When I first heard it I thought it was possibly a joke. It strikes me that the scientific credentials of the enterprise are already sufficiently shaky that it ought to avoid making a game out of it.”

    Then you backtrack on the scientific part:
    “It’s not prediction markets that are unscientific, it’s that by adding to the sports/circus-like atmosphere, this particular enterprise grows more frivolous. The fact that individuals are personally impacted here is also a large part of why it seems in poor taste. But I spoze if that’s how psych wants to be viewed, it’s their choice.”

    One can argue about the merits and ethics of prediction markets, but it does not seem particularly scientific to judge “psych” from one instance of a prediction market that is run by economists from the Stockholm School of Economics.
