Erikson Kaszubowski writes in:

I missed your call for Stan research stories, but the recent post about stranded dolphins mentioned it again.

When I read about the Crowdstorming project in your blog, I thought it would be a good project to apply my recent studies in Bayesian modeling.

The project coordinators shared a big dataset (with 124,621 cases) and each research team had to independently analyze the data and answer two research questions:

1) Are soccer referees more likely to give red cards to dark-skin-toned players?

2) Are referees from countries with high skin-tone bias more likely to be biased against dark-skin-toned players?

Given the data structure (each case is a player-referee dyad, with variables about how many games occurred between them and more) and inspired by my recent reading of ARM, I thought a multilevel binomial-normal regression would be a good model for analyzing the data.
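A multilevel binomial-normal regression of this kind might be sketched in Stan roughly as follows. This is only an illustrative sketch, not the model from the actual analysis (which is on the OSF); the variable names, the single skin-tone predictor, and the referee-level intercepts are all assumptions for illustration.

```
data {
  int<lower=1> N;                      // number of player-referee dyads
  int<lower=1> R;                      // number of referees
  array[N] int<lower=0> games;         // games in each dyad (binomial trials)
  array[N] int<lower=0> red_cards;     // red cards in each dyad (successes)
  vector[N] skin_tone;                 // player skin-tone rating (assumed predictor)
  array[N] int<lower=1, upper=R> ref;  // referee index for each dyad
}
parameters {
  real alpha;                          // overall log-odds intercept
  real beta;                           // skin-tone effect on log-odds
  vector[R] u;                         // referee-level intercepts (the "normal" part)
  real<lower=0> sigma_u;               // scale of referee variation
}
model {
  u ~ normal(0, sigma_u);
  red_cards ~ binomial_logit(games, alpha + beta * skin_tone + u[ref]);
}
```

The "binomial-normal" structure is just a binomial likelihood on each dyad's red-card count with normally distributed group-level effects on the logit scale; the second research question would add referee-country bias scores and their interaction with skin tone as predictors.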

I initially created a model using a different Bayesian software, but it only worked in small samples of the dataset. When I tried to analyze the whole thing, this other program couldn’t get off the ground. So, I decided to give Stan a try… And it worked like a charm!

The project article is still being written, but all analyses are already published in the Open Science Framework. Here’s the link for my analysis.

A short report, source code, and Stan chains are all there, in case anyone is interested.

I know the model isn’t such a great novelty, and there is plenty to criticize about what I did. But we can say, at least, that when people first crowdstormed a dataset, Stan was there!

Thank you and all the Stan team for such a great tool!

I haven’t looked at this in detail so don’t take this post as an endorsement of this particular model, coding, or data analysis—but it does demonstrate the success of our goal of allowing people to fit models directly, with a minimum of fuss, so that users can focus on the statistical modeling, not on the computation.