“Dream Investigation Results: Official Report by the Minecraft Speedrunning Team”

It’s almost Christmas, which makes us think of toys and presents, like . . . videogames for the kids and young adults in your life.

And we have a story for you, all about ethics in video game speedrunning.

Matt Drury writes:

Recently a top player of Minecraft has been exposed as a cheater using a very thorough and well presented statistical analysis of their luck during their play sessions. I was really impressed with the analysis, their dedication to fairness and applying multiple corrections. There’s even implicit attention to the garden of forking paths.

Here’s their paper on the approach and results.

They also published a video with an overview of the methods and results (fewer details of course).

I know nothing about Minecraft except that it’s a videogame that’s popular with kids. But I guess this might interest some of our readership?

I asked a local expert, who characterized the above-linked paper as “trivial but impressive.” The local expert was not so impressed by the rebuttal offered by the player accused of cheating.

P.S. More here on how this sort of dispute can be resolved.

57 thoughts on “Dream Investigation Results: Official Report by the Minecraft Speedrunning Team”

    • Minecraft has achievements and has credits that run after the ender dragon is defeated. This gives the game two main categories to speedrun: 100% and any% respectively (there may be other arbitrary categories). A series of any% speedrun attempts done in the 1.16.1 version of the game was investigated, leading to the reports discussed here.

  1. If only the stuff people call “science” were to the standards of science this exhibits.

    Also, seriously, Java uses an LCG when the Mersenne Twister has been available since the 90s? And it seeds using the time and some constant instead of grabbing a crypto random seed? Those two fixes are trivial. Even with the crypto random generator I’m sure they could do this thousands of times a second in a second thread… Meh lazy

    https://www.geeksforgeeks.org/random-vs-secure-random-numbers-java/
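For readers unfamiliar with the term, the kind of generator this comment refers to can be sketched in a few lines. The sketch below is a Python port of the 48-bit linear congruential generator documented for `java.util.Random` (multiplier `0x5DEECE66D`, increment `0xB`). The class name `JavaStyleLCG` and its method names are made up for illustration, and nothing here claims to show how Minecraft itself draws its random numbers.

```python
# Sketch of the 48-bit LCG documented for java.util.Random.
# JavaStyleLCG, next_bits and next_int are illustrative names, not a real API.

MASK = (1 << 48) - 1        # state is kept to 48 bits
MULTIPLIER = 0x5DEECE66D    # documented multiplier
INCREMENT = 0xB             # documented increment


class JavaStyleLCG:
    def __init__(self, seed: int) -> None:
        # The seed is scrambled by XOR with the multiplier before use.
        self.state = (seed ^ MULTIPLIER) & MASK

    def next_bits(self, bits: int) -> int:
        # One LCG step: state = (state * a + c) mod 2^48.
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK
        value = self.state >> (48 - bits)
        # Mimic Java's cast to a signed 32-bit int.
        if value >= 1 << 31:
            value -= 1 << 32
        return value

    def next_int(self) -> int:
        return self.next_bits(32)


if __name__ == "__main__":
    rng = JavaStyleLCG(42)
    print([rng.next_int() for _ in range(3)])
```

The whole sequence is fixed by the seed, which is the property the comment is complaining about: anyone who recovers the 48-bit state can reproduce every later draw.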

    • In other applied statistics, most of the interesting work comes out of the fact that the true family of functions is unknown and whatever parameterization we come up with as a hypothesis is almost surely an incorrect approximation, and still must be divined by a magical process of thinking about domain knowledge. In this case, the true family of functions can be found through the code, an ideal null hypothesis is constructed thereby, and uncertainty is created by a true (pseudo) random number generator. The probability here isn’t an open-ended modelling problem, but a textbook problem with a correct answer.

      The most interesting analysis here from a critical reasoning perspective is looking at Java’s random number generator. That’s some attention to detail — nothing gets past these people. This honestly deserves to be in coursework as a didactic example.

      • First, before you feel attacked, I agree with you. I am merely commenting because your otherwise excellent argument neglects to explicate [to the broader public] what is meant by “the interesting work”. I am no statistician, but do I understand correctly that statistics is a tool for uncovering the behaviour of our physical reality where no exact theorem exists? If so, I agree on that point also. If not, do you mind elaborating?

  2. I don’t think statistics can prove that the speedrunner cheated, or really prove much of anything in the real world. There’s so many confounding factors and unknowns that can change the analysis so much. Even Dream’s expert, who has a PhD in stats, and the other anonymous “experts” disagree.

    Since stats can’t prove anything people should just come to their own opinions on who to believe. I don’t get why so many people are focusing on the statistics to determine whether or not he cheated.

      • I’m not saying statistics is useless – there’s obviously cases where it’s useful (mostly problems that are easy to model). But on problems that are complicated enough that assistant professors and PhDs are disagreeing about the conclusions, how can you come to a conclusion just from the statistics?

        • Thomas:

          I’ve not looked at all at this Minecraft thing so I have no comment on that. But, speaking generally, yes, sometimes Ph.D.’s can disagree about conclusions but the statistics can be clear from the outside. For example, the ESP guy and the pizzagate guy were Cornell professors with Ph.D.’s, but it’s clear that their statistical analyses are just wrong.

        • Perhaps a bit of a bold request here, especially during the holidays, but would you care to look into “this Minecraft thing,” or perhaps send it along to someone credible willing to speak publicly on the matter?

          I think a lot of people in the situation are looking for a verifiable person with a degree to listen to here. All we have are a semi-anonymous moderation team and a fully anonymous “astrostatistics and astrophysics doctor.” Of course, not everyone with a Ph.D is correct, and they are sometimes willing to let personal bias or conflict of interest get in the way, but to the layperson all they see now are unverified people on the internet claiming opposite things about one of the most rapidly growing content creators on YouTube.

          I know you just describe Minecraft as a game popular with kids, but Dream’s videos regularly get around 50 million views, and many of those viewers are willing to defend him because they can’t understand the statistics, and see no one with verifiable credentials on the matter saying which statistical analysis is more reasonable.

          I understand this request might seem unreasonable; it’s just my own social media feed seems unable to escape the cycle of “well how can you trust that paper, we don’t even know if the person who wrote it understands statistics!” And I’d kindly like someone who understands statistics more than this Comp Sci undergrad to weigh in.

        • Tom:

          This is just so far outside of my areas of expertise and interest that I have no plans to look into it. My reason for posting was that I thought it could interest some of the blog readership, not necessarily the same readers who are interested in posts on baseball, football, and chess.

          I expect that, one way or another, this will get sorted out within the Minecraft community, in the same way that various open questions have been resolved within scientific communities, via some combination of analysis, discussion, and replication. But I expect the controversy will never go away. Consider some examples from different areas of science and society:

          Cold fusion. People were skeptical even at first, and now only a few physicists think it’s real. But there are some who continue to work in the area.

          Embodied cognition. Was believed by mainstream psychology, now I think is a minority view, but it still has some loud defenders as well as a pretty big group of researchers who, whether or not they believe in it, don’t want to think too carefully about the implication of the failed replications for scientific knowledge more generally.

          Hot hand. Most social scientists thought the hot hand was a fallacy, then the claimed fallacy was revealed to itself be based on statistical fallacies, now I’d say that people are divided on whether there is a hot hand or how important it is.

          Global warming. Believed by vast majority of scientists, but some skeptics remain, and this is also connected to politics.

          Beauty and sex ratio. Most people don’t care about this one at all, but I suspect that many people who do care are believers, despite the lack of any good evidence. There’s some selection bias here, because the sorts of people who care about this in the first place will include many people who are committed to the underlying theory.

          Claims of massive fraud in 2020 election. Lots of people still express belief in this, despite the lack of any good evidence.

          The point is, all these topics have some remaining controversy, for better or worse. Controversy can go away within informed subsets of the population but remain elsewhere. To some extent, issues can be resolved not just by reanalysis but by new data. In this case, I guess it would be for this person to do another speedrun under controlled conditions? Again, when it comes to videogames I have no idea.

        • Just reminding that never is a long time and neither does it take technological advances into consideration. Peace.

        • Frithjof,

          Never is indeed a long time, but given that lots of people still believe in astrology, ghosts, the literal truth of the Bible, etc., I think there could well be believers in cargo cult science in a couple thousand years. It might be a niche belief, though. Fads come and go. We don’t seem to hear much about Bigfoot or the Loch Ness Monster anymore. But the idea that you can cause major changes in people’s beliefs through subliminal smiley faces, for example, I think that appeals to many people in the same way that a belief in casting spells appeals to people: it’s a source of power and also a way of understanding some of the world’s mysteries. I guess there must be some literature on what sorts of beliefs remain popular over time and what sorts of beliefs fade away.

        • Thomas, this is not a case of verified professors disagreeing, this is a case of one, unknown, not verified, person claiming to have a PhD versus multiple, verified, people with PhDs. For any reasonable human the conclusion is quite clear.

        • Except the problem isn’t as complicated as you are making it. You’re just delusional and willing to be completely blind as to what people are saying. The issue isn’t whether the number was off by a little. The issue is how Dream’s rebuttal was done: the methods used to make him look good are absurd. Your entire “proof” consists of “but he said he’s legit and knows what he is doing and can’t possibly be wrong!” Have some class.

    • The stats show definitively that the events could not have happened if minecraft weren’t altered from its default settings. Even “Dream’s” expert (who has a PhD in astrophysics) actually agrees. The analysis of other very good speedrunners in the original shows that the model is appropriate, and that good speedrunners typically get “luck” below the 99.9th %tile. Dream’s “luck” diverges widely from the others in a way that is totally implausible absent cheating.

      The rebuttal gave it closer to 1 in a million instead of trillions, so it’s like saying “you technically didn’t kill him, it was the bullet”: a distinction without a difference. Also, in this case, because it **really is** a random number generator, the frequentist analysis is absolutely relevant, probably more relevant than the Bayesian one.

        • Exactly, when we actually investigate random number generators (which the minecraft results ARE) then frequentist statistics is the exactly correct way to show that the RNG is behaving as programmed. The inappropriateness of using null hypothesis testing in science is largely that most of science **is known beforehand** to not be a well validated RNG, and so you first assume a thing you KNOW is false, and then prove that it’s false… which gives you zero more information than you had before.

          That’s why not-rejecting a null hypothesis is the more interesting result in science, it tells you that you could treat the science “as if” it were a RNG for certain purposes.

      • You’re cherrypicking from the report, and skipping some other additional assumptions.

        > That is, external evidence that the probabilities were modified at this specific
        point would be needed to produce a significant probability of cheating.

        > I disagree that the
        situation suggests that [cheating] is an unavoidable conclusion

        Also, he’s an assistant professor. So on one hand we have an assistant professor, and on the other we have anonymous PhDs on the internet.

        As a fan, how can I come to a conclusion about this from the math alone? You say he cheated, the verified assistant professor disagrees that it’s the only conclusion. Regardless of the points that he agrees on, there’s clearly multiple valid conclusions from experts.

        • > As a fan, how can I come to a conclusion about this from the math alone?

          The math in the rebuttal is quite clearly wrong. So you can come to this conclusion by understanding the math. But this raises an interesting question: for any sufficiently technical subject, most people are simply not going to learn enough to understand all the jargon. This is not just true of statistics. In physics, if a PhD makes claims about the EM drive with a pile of nonsense integrals and differential equations, and other PhDs draw a correct pile of integrals and show how it violates gauge symmetry, most people cannot know the difference, no matter how nonsense the former might be. Hell, if a guy claims to square the circle, and someone else shows it’s impossible, most people cannot critically evaluate the proof. In the end, it’s trust all the way down. So no matter how detailed and correct they’re being with the math here, if the function is rhetorical, for the purpose of convincing the people who matter, who don’t know a binomial coefficient from a parenthesis, it might as well be a random pile of symbols and jargon. They might as well have made it all up.

        • As an aside, even assuming this guy is an astrophysics PhD, please don’t assume that’s equivalent to expertise in statistics. Physicists tend to have very little statistical training, and get by on the parsimonious models and easily estimated distributional forms. In my undergrad degree, it was suggested that I take zero courses in probability and statistics. You end up with physics people unaware of very basic 20th century tools like maximum likelihood estimators.

          http://www.stat.cmu.edu/~cshalizi/2010-10-18-Meetup.pdf#page1

        • Mate, you’ve already come to a conclusion, and you’re sticking to it for dear life. Andrew is not an anonymous internet PhD. Andrew in fact has a proven PhD in statistics from Harvard, which is where Dream claims his anonymous professor did his. Andrew and the guy who did Dream’s paper might well know each other, but here we have someone who is proven beyond doubt to be legitimate.

        • Andrew said “I’ve not looked at all at this Minecraft thing so I have no comment on that.”

          The only comment he made about the validity of the claim is quoting an anonymous “local expert”.

        • Could you link to where the identity of the person Dream hired is verified? Because people (namely, Dream fans and Dream himself) seem to love pointing to having an expert with a PhD on their side despite the fact that the paper and website are both listed as being anonymous. You put an awful lot of faith in someone with no name and no way to verify their credentials.

        • You fail to realise that the stopping rule does not apply and there’s no way someone with a PhD could make that kind of mistake. Unless you really are that dense to be unable to realise that the stopping rule doesn’t apply because there were multiple streams?

        • I’ve responded to other people with the same statement, namely that the credentials of the expert have been verified by multiple independent sources, including the authors of the original accusation.

          Tbh, I think it’s quite revealing that your first reaction when somebody credentialed comes to a different conclusion is that they must be lying. Perhaps there are just multiple reasonable conclusions?

        • No, the math is quite clearly flawed in the second analysis, which is why they’re coming to the conclusion that it’s flawed. By reading it. There’s only one person here who is instead redirecting the argument to be about author credentials rather than about the content of the analyses themselves.

          Either way, it’s irrelevant, because even the analysis from the person that Dream hired still shows that his drop chances being modified is basically a certainty. The semantics or credentials don’t really matter at that point.

        • > …there’s no way someone with a PhD could make that kind of mistake.

          You do realize that PhD stands for “Doctor of Philosophy”, right? Because both of my parents are PhD’s and they would have a seriously hard time solving a simple quadratic equation.

    • It’s ridiculous that you’re calling Dr. Andrew an anonymous internet Ph.D. Clearly you’ve come into the comments with an agenda, I’d like to remind you that emotionally charged subjective opinions from science deniers have no place talking in circles where things are discussed objectively.

      Also, the cheater’s “expert” probably doesn’t even exist or is willfully using incorrect Maths (as evidenced by their paper) to reach a biased conclusion in favour of his sponsor. Statistics are used across every discipline from the mundane to life-saving tech; and a lot of the time it isn’t hard to figure out when it is being misused. To say “stats can’t prove anything” is an objectively ignorant comment.

      • Andrew said “I’ve not looked at all at this Minecraft thing so I have no comment on that.”

        The only comment he made about the validity of the claim is quoting an anonymous “local expert”.

        In addition, the identity of the expert has been confirmed by multiple independent sources, including the author of the original statement: https://www.reddit.com/r/DreamWasTaken2/comments/kjuqft/speedrun_discord_mod_confirms_the_credentials_of

        You’re wrong on both facts.

        I worded my original statement poorly. Obviously there are cases where stats can be useful. But in cases like this where so many different assumptions can be made it’s hard to come to any definitive conclusion from the stats alone.

        • The usage of stats in this situation is perfectly valid. Unlike in other real life applications, the probability of an event occurring will stay the same, i.e., ender pearls will always have a 4.73% chance of dropping during a barter. The cases where stats should not be used are those where the probability of an event occurring cannot be determined accurately and is always changing. Using stats in cases like this, with a set probability, is perfectly valid.

          And also, even though the man Dream hired does have a credible PhD, the content of his paper is extremely flawed. Not only is the content flawed but he is also financially motivated to side with Dream. His extremely flawed implementation of the stopping rule makes it clear that he either has very little statistical knowledge, very little domain knowledge, or is financially motivated to produce an intentionally flawed report. Even with the extreme biases, the odds still come out to 1 in 10 million which is very significant.
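When the per-barter probability is fixed, as the comment above argues, the chance of seeing at least a given number of pearl trades is a binomial tail sum. The sketch below uses the 4.73% figure quoted in that comment. The barter and trade counts are hypothetical placeholders chosen only for illustration and are not taken from either report, and the function ignores the corrections for multiple comparisons and stopping that the thread discusses.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


PEARL_DROP_PROBABILITY = 0.0473  # per-barter chance quoted in the comment above

# Hypothetical counts, NOT the figures from the investigation.
n_barters = 100
n_pearl_trades = 15

if __name__ == "__main__":
    tail = binomial_upper_tail(n_barters, n_pearl_trades, PEARL_DROP_PROBABILITY)
    print(f"P(at least {n_pearl_trades} pearl trades in {n_barters} barters) = {tail:.3e}")
```

A raw tail probability like this is only a starting point. The disagreement between the two papers is over how it should then be adjusted, not over the binomial model itself.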

        • This over and over. The very first error Thomas makes is to conclude that somehow Dream’s own consultant analysis doesn’t agree with the first analysis. They BOTH show that Dream cheated.

        • Thomas said,

          “But in cases like this where so many different assumptions can be made it’s hard to come to any definitive conclusion from the stats alone.”

          Using statistical inference only makes sense when there is a lot that is unknown, and it can’t remove all of the uncertainty, so we can’t rationally expect it to give “definitive conclusions”.

        • I would rather trust Dr. Andrew’s notion of a “local expert” than Dream’s, whose claims are from a person who is purportedly on trial for a deceptive practice by the way. Might make more sense to be as unequivocal as possible especially in these sorts of situations, don’t you think?

          Please link where the original authors have verified the qualifications of Dream’s author because the anonymity also happened to be a big contention by the mod team as far as I know.

        • > But in cases like this where so many different assumptions can be made it’s hard to come to any definitive conclusion from the stats alone.

          This is not one of those situations—one group is right, and one group is wrong. On the trades, the stopping rule on individual runs does not affect the distribution because data were pooled across multiple streams, including unsuccessful runs. Only the stopping rule on streams itself affects the likelihood function, which causes only the last record stream to be not exchangeable, and that’s accounted for in the first analysis. There’s also a coding error that I found just skimming the rebuttal — the way they’ve implemented it, giving 8 pearls in a barter has probability 0 while the comment states it should be 4-8 pearls.

          1. This is not a situation where you can go “there are different but both plausible assumptions which lead to different answers”—there are clear mathematical and computational errors in one paper.
          2. “Expert” in one field =\= expert in another. An astrophysics PhD is not an expert in statistics. People seem to have this idea that all fields with mathematical training are essentially interchangeable.
          3. It’s moot anyways because, as others have noted, even the rebuttal concedes that Dream more likely than not cheated, just objects that it’s not proof to a legal standard of beyond a reasonable doubt and makes a bunch of technical errors.
          4. This is the most important to me; do you really think that as long as you have n>=1 person with a PhD in any mathematical field disagreeing with a statistical analysis, then statistical analysis is pointless? Because there is ALWAYS at least one PhD who disagrees, ESPECIALLY if you’re allowed to pay them to.
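The coding error described in this comment (a range meant to be 4-8 pearls in which 8 can never occur) matches a common off-by-one pattern with exclusive upper bounds. The sketch below is a hypothetical reconstruction of that pattern in Python and is not the rebuttal's actual code. The function names are invented for illustration.

```python
import random
from collections import Counter


def pearls_exclusive_upper(rng: random.Random) -> int:
    # Buggy pattern: randrange excludes the upper bound, so 8 never occurs.
    return rng.randrange(4, 8)


def pearls_inclusive_upper(rng: random.Random) -> int:
    # Intended pattern: randint includes both endpoints, so 4..8 all occur.
    return rng.randint(4, 8)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
buggy = Counter(pearls_exclusive_upper(rng) for _ in range(10_000))
fixed = Counter(pearls_inclusive_upper(rng) for _ in range(10_000))

if __name__ == "__main__":
    print("exclusive upper bound:", sorted(buggy.items()))
    print("inclusive upper bound:", sorted(fixed.items()))
```

Tabulating simulated outcomes like this is a cheap way to catch the mismatch between a comment that says "4-8" and code that only ever produces 4-7.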

        • Thomas

          There is one simple statement I can make when replying to your comment. In the case of statistics, the credentials of a person writing a report do not matter when the contents of the paper are objectively correct. However, this statement goes both ways: even if a person is the most qualified to analyze the situation, when the paper is objectively wrong their credentials do not give it any more credibility. Having reviewed both the 29 page document of the cheating allegations and the 19 page rebuttal, it is clear that the 29 page document is far more accurate than the 19 page document. If you would like to further analyze the situation I implore you to fully analyze both papers when you have the time. The 19 page document written by the PhD assistant professor actually contains a multitude of mathematical and coding errors that can be spotted, such as using 4-7 pearl drops instead of 4-8, including the 5 previous streams, etc. On Reddit, a verified user known as u/mfb- has taken apart this paper. Before calling them anonymous, note that they are verified because they had to submit their credentials to the moderator of the subreddit to get the role, so if you want credibility, they have more than the assistant professor has. (Also, the person that confirmed it was a Discord chat mod, not an actual part of the moderator team.)

          In short, you cannot simply throw around credibility without analyzing the paper itself.

          When it comes to statistics there is a point at which something can be called statistically impossible, and having read both papers and knowing the context of the situation it is easy for the moderator team to not only deem the run as illegitimate, but the 6 streams as illegitimate. This is because all 6 of these streams are consecutive streams done in several days, whereas the first 5 streams were done months before these. Don’t get me wrong, the 5 streams could be included as part of extra analysis, but not as the main analysis because that is not the objective of the investigation.

          To be perfectly honest, I do not think you will like this blog a lot. This blog is more used for an academic discussion, analyzing mistakes people have made in statistics and praising those who have done well, not simply using credentials to overpressure the opposing side. When it comes to credentials it is easy to trust someone more qualified when the person reading does not understand the context and the content, so if you can please do take the time and look at the reports carefully. If you have made your own decision that Dream did not cheat in this video game, please provide proper reasoning.

        • Their legitimacy is being compared on the basis of the quality of statistical analysis that they present, which the browsers of this blog are *qualified* to judge. Both papers come to conclusions that indicate extremely low probabilities, one is just higher by some orders of magnitude (albeit not enough).

          You’re taking the completely wrong approach if you want to get anywhere here. Understand statistics before you attempt to provide commentary on their value.
