Crowdstorming a dataset

Raphael Silberzahn writes:

Brian Nosek, Eric Luis Uhlmann, Dan Martin, and I just launched a project through the Open Science Center we think you’ll find interesting.

The basic idea is to “Crowdstorm a Dataset”. Multiple independent analysts are recruited to test the same hypothesis on the same data set in whatever manner they see as best. If everyone comes up with the same results, then scientists can speak with one voice. If not, the subjectivity of the results and their conditionality on analysis strategy are made transparent. For this first project, we are crowdstorming the question of whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players.

The full project description is here.

If you’re interested in being one of the crowdstormer analysts, you can register here.

All analysts will receive an author credit on the final paper. We would love to have Bayesian analysts represented in the group. Also, please feel free to let others know about the opportunity; anyone interested is welcome to take part.

I have no idea how this will work out but it seems to be worth a try.

17 thoughts on “Crowdstorming a dataset”

  1. I think the overall idea is sound. However, the implementation is lacking. The referenced project prescribes a research question/hypothesis that should be tested. What if my analysis comes to the conclusion that the question is bollocks or that the data do not bear on the question?

    Also they write:
    “The available dataset provides an opportunity to identify the magnitude of the relationship among these variables. It does not offer opportunity to identify causal relations.”

    How do we know that? Is there some confounding factor that has been intentionally omitted from the data set? Or are analysts not permitted to use models that utilize causal knowledge or estimate the causal effect?

    Furthermore, discussion of theoretical background is framed in strong causal terms. Eg:
    “Research and theory on the roots of perceptual biases in cultural socialization suggests growing up in a society that favors light over dark skin should ingrain such prejudices in individual members of that culture.” So the theory predicts the causal relations after all. Why not test them?

    It would be more interesting to pose the task as a prediction problem. That is, ask analysts to build systems that predict the red cards accurately and give them lots of variables that they can mine (could we get the player pictures, instead of using the coded skin color?). A good analysis would give us an idea of whether the player’s skin color is a good predictor. OK, some folks might use black-box approaches such as SVMs or neural nets, which are good at prediction but difficult to decipher, so one may constrain the modeling options from the outset. Even better would be if the outcome were something more closely related to the research question. They are not interested in red cards per se, but rather in the assimilation of stereotypes.
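
    To make the prediction framing concrete, here is a minimal sketch in Python. It assumes a per-player table with made-up column names (games, redCards, age, skin_tone); the real crowdstorming file is likely organized differently, so treat this as an illustration of the idea, not a recipe for that data set.

      import pandas as pd
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      # Assumed per-player table; column names are hypothetical.
      df = pd.read_csv("players.csv")
      y = (df["redCards"] > 0).astype(int)   # crude binary outcome: any red card

      base_vars = ["games", "age"]           # categorical covariates (position,
                                             # league) would need encoding first
      full_vars = base_vars + ["skin_tone"]

      model = LogisticRegression(max_iter=1000)   # deliberately interpretable
      auc_base = cross_val_score(model, df[base_vars], y, cv=5, scoring="roc_auc").mean()
      auc_full = cross_val_score(model, df[full_vars], y, cv=5, scoring="roc_auc").mean()

      # If skin tone carries predictive information, auc_full should beat
      # auc_base on held-out folds.
      print(f"AUC without skin tone: {auc_base:.3f}, with: {auc_full:.3f}")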

    • Unless they’re really just interested in the best (generalizing to unseen data?) characterization of the association between two variables, this is probably just the usual reluctance to admit that it’s all about cause and effect.

    • I have no knowledge of Raphael Silberzahn so I don’t know whether he has any statistical knowledge or expertise. There is nothing in his statement that indicates he does have any statistical knowledge. It may be that he thinks hypothesis testing is what statisticians do. One of the things I learned from 40 years of statistical consulting is that the client’s initial question (and the reason for seeking a consultation in the first place) is not the question that they should be asking. So one of the skills required of a consulting statistician is to be able to dig deep, interrogate the client (in a nice way) and identify the real problem that needs to be solved. So, I would not be bothered or constrained much by what Raphael Silberzahn says. If I were to participate in this project I would analyse the data and report as I saw fit. I haven’t looked at the data yet but I guess I would have lots of questions before doing anything.

      • Nice point Peter. A paper I like on this is by David Hand. See

        http://statlab.bio5.org/foswiki/pub/Main/PapersForClassCPH685/hand-deconstructin.pdf

        He would probably say that the first step, for the statisticians working on this project and for Silberzahn (and the others listed on the project), is to clarify the different questions. Maybe there is more information once you register, but from the brief bit I read it was not clear to me what the purpose is.

        I think this is a case where the authors are presumably actually interested in causes, but know that causal conclusions would perhaps require more assumptions than they feel comfortable making. Of course, all stats requires assumptions, and we all know Tukey’s (1962) maxim on the importance of addressing the right question.

  2. One confounding factor is where players were born and where they were playing before joining the current league. Certain football cultures are more “conducive” to red cards. Just have a look at the African Cup or the Copa America. I don’t think the research question is bollocks, but if some effect is found, its size should be very small.

    “The available dataset provides an opportunity to identify the magnitude of the relationship among these variables. It does not offer opportunity to identify causal relations.”

    I think they want to say that if they find a positive relationship between skin tone and odds of being red-carded, they are not saying that it is due to racism on the part of the referees.

    • “they are not saying that it is due to racism on the part of the referees.”
      If they aren’t asking a causal question, a simple bivariate analysis of skin color and red cards would suffice.
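
      As a minimal sketch of that bivariate version (the column names and the cut point are assumptions, not the actual coding in the data set):

        import pandas as pd

        df = pd.read_csv("players.csv")
        # Crude two-group split on an assumed 0-1 skin-tone rating.
        df["group"] = (df["skin_tone"] >= 0.5).map({True: "darker", False: "lighter"})

        totals = df.groupby("group")[["redCards", "games"]].sum()
        rates = totals["redCards"] / totals["games"]   # red cards per game
        print(rates)
        print("raw rate ratio:", rates["darker"] / rates["lighter"])   # no adjustment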

  3. I generally agree with the critical tone of previous commenters. If I take research question 1 – “whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players” – at face value, then none of the available control variables (age, position, etc.) should be used. If I tried to approximately estimate a causal effect, I would want to use many of them. But the authors seem to explicitly rule this out, as mentioned above (“does not offer opportunity to identify causal relations”). So this would seem to boil down to a fairly simple analysis – a variant of counting, really – that does not yield interesting results.

    • I like the idea in principle, but I don’t think the research question can be answered from this data. They can only answer the question of whether:
      “soccer referees *were* more likely to give red cards to dark skin toned players than light skin toned players *playing in the first male division for England, Germany, France, and Spain during the 2012-2013 season*”

      This would seem to be answered by counting, as you say. Essentially we can estimate the extent to which each of these variables affected the number of red cards, and look for interactions in the hope of building a model (a rough sketch of such a model is at the end of this comment). This analysis should be exploratory rather than aimed at drawing conclusions.

      There are a number of other problems. I am confused as to what the population is supposed to be for this study. The research question suggests it is all soccer referees and players ever. Extrapolating outside the “sample” would require confirmation from other seasons and countries, and I doubt the effect is stationary. Deming’s enumerative vs analytic distinction may be helpful here:
      https://www.deming.org/media/pdf/145.pdf

      They also say:
      “Although concluding the null is always difficult, our large sample size gives us much greater leeway than usual with regard to concluding no evidence of bias.”

      There are two points of confusion here. First, there is a difference between concluding “exactly zero relationship between skin tone and red cards given the model” (statistical hypothesis) and “no bias” (research hypothesis). Second, there is a difference between both of those and concluding “no evidence of bias”. So the first and second parts of the sentence are talking about different things.

      In the case of concluding “no bias”, this is not a matter of being “difficult”; it is affirming the consequent. Perhaps there was bias but it was unimportant relative to some other factor, so it is not apparent in the data. In the case of concluding “no evidence of bias”, OK, but we need a working definition of what will count as evidence.

      Perhaps we could conclude there is evidence that the effect of bias was small relative to at least one other known/unknown factor? I don’t think I’m nitpicking; the project description left me confused.
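
      The exploratory model mentioned above might look something like this in Python; the per-player rows, the column names, and the skin-tone-by-league interaction are all assumptions for illustration:

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("players.csv")
        # Poisson regression of red-card counts with games as the exposure,
        # plus an illustrative skin-tone-by-league interaction.
        fit = smf.poisson(
            "redCards ~ skin_tone * league + age + position",
            data=df,
            exposure=df["games"],
        ).fit()
        print(fit.summary())   # inspect the skin_tone terms and their interactions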

      • “Perhaps we could conclude there is evidence that the effect of bias was small relative to at least one other known/unknown factor?”

        Does anyone have a reference that talks about this type of conclusion? In the context of strawman NHST it appears to suggest that there is actually more information contained in a “non-significant” result than in a significant one. This makes some sense on the face of it, since the null is a point hypothesis while the “alternative hypothesis” is everything else.
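
        One way to make that kind of conclusion concrete is interval-based equivalence reasoning: pre-specify how large a rate ratio would still count as negligible, then check whether the whole interval estimate falls inside that region. A hypothetical sketch (the column names and the 10% bound are assumptions):

          import numpy as np
          import pandas as pd
          import statsmodels.formula.api as smf

          df = pd.read_csv("players.csv")
          fit = smf.poisson("redCards ~ skin_tone", data=df, exposure=df["games"]).fit()

          lo, hi = np.exp(fit.conf_int().loc["skin_tone"])   # 95% CI for the rate ratio
          negligible = (0.9, 1.1)   # call anything within +/-10% "too small to matter"
          if negligible[0] < lo and hi < negligible[1]:
              print("evidence that any skin-tone effect is small, by this criterion")
          else:
              print("cannot rule out a non-negligible effect")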

