Why not spend your February modeling cherry blossoms?

I am emerging—momentarily—from teaching to announce (this post is by Lizzie) …

The 1st International Cherry Blossom Prediction Competition!

We are pleased to announce a new international prediction competition “When will the cherry trees bloom?” Help scientists better understand the impacts of climate change (and we have prizes)! The competition is open to all.

We competition organizers are providing all the publicly available data on the bloom date of cherry trees we could find. Competitors will use this data, in combination with any other publicly available data, to create reproducible predictions of the bloom dates at four locations around the globe.

The competition is open throughout February 2022 and seeks statisticians and data scientists of all levels, from experts to students just beginning to use statistical software. Complete submissions include a short narrative and a link to a publicly accessible Git repository.

For complete details or to contact the organizers, please visit https://competition.statistics.gmu.edu. A recording of the kickoff event is available on the competition website.

A big thanks to the American Statistical Association, Caucus for Women in Statistics, and George Mason University’s Department of Statistics and the Columbia Department of Statistics for their support, and partnerships with the International Society of Biometeorology, MeteoSwiss, USA National Phenology Network, and the Vancouver Cherry Blossom Festival—as well as Mason’s Institute for a Sustainable Earth, Institute for Digital InnovAtion, and the Department of Modern and Classical Languages.

Organizers: Jonathan Auerbach and David Kepplinger (George Mason University) and Elizabeth Wolkovich (University of British Columbia)

15 thoughts on “Why not spend your February modeling cherry blossoms?”

  1. Isn’t this contest dependent upon more luck than skill? For instance, assume that there is a predictor variable in the data set that predicts peak blossom time, and that this variable is very weather sensitive. If the weather conditions then diverge significantly from the historical means in the data set, the model won’t be predictive.

  2. Seems strange to me for a statistics department to be promoting prediction of a point estimate for a single date (checked out their reporting format) when the only truly useful and valid prediction would be for an interval of dates.

    • Hi Brian,

      Point well taken. We thought about asking for prediction intervals. We ultimately decided it would be too confusing for undergraduate students.

      • Hi Brian, Following up on Jonathan’s comment (yes, days later … teaching feels especially hard this term) … as Jonathan said, we discussed it! We thought it would be super cool, especially if we could visualize those intervals as they came in; but that point estimates are a good place to start. If we keep it going (which we hope to do), perhaps we can have different levels that people can enter — point estimates or intervals (we’ll just have to think about how to adjust things to encourage submissions to the intervals).

        • This comment is not directed at this competition, per se. But the discussion about what undergraduates can handle is one that strikes a chord. I have long objected to the way textbooks are designed with fairly clean data sets to emphasize particular techniques. The thinking is that you need to start somewhere and real world messy data sets should come after the initial mastery of fundamental topics. There is much to be said for this approach. However, the downside is that students come to expect straightforward clean data and have trouble when it isn’t. Similarly, if you learn to look for point estimates, it may be harder to gain an appreciation that ranges are what you should look for. Given how many discussions have focused on the shortcomings of dichotomous thinking, perhaps we should start asking where such thinking was learned along the way. I think there is a real possibility that if we started with students looking for ranges, they might end up less focused on point estimates.

  3. Pretty cool. In terms of theme in any case.

    I forwarded to a few students and one question came back in various forms:
    Can the contestants write a paper using the data (to submit) or perhaps use it in a term project or so? I have sent it to those who I’d guessed would be most interested (3) and my guess is that the questions were motivated because they may want to pursue graduate study and hence want to try to submit something. So rather than pecuniary incentives, the prospect of publishing (even the tiniest chance) seems relevant. I guess this says something about the admission process these days (no way I would have even considered this as an undergrad).

    • Hi Louis,

      Yes, we encourage contestants to share and publish their ideas! We also encourage teachers to use the data in their classes. All data are publicly accessible; we just ask users to cite the original source.

    • Yes! Just agreeing with what Jonathan said, the data are there to be used (and please properly cite the original source). We aimed to get all the data we could and the community working on these data was super helpful in sharing them. We only have a few datasets we never quite tracked down… perhaps another year.

      [Am I so old that I got into grad school without any publications? I actually got my first postdoc without any publications too…. wait, no need to answer this. I also picked out who to email to ask if they wanted PhD students by going to a university library and flipping through paper journals, so I have answered my own question.]
