This post is by Lizzie.
This year’s International Cherry Blossom Prediction Competition has closed and we can now wait to see how contestants’ and the bot’s predictions fare. Results are up on the website, with average predictions around late this month (DC, Vancouver, Liestal) to early April (Kyoto and New York City). The bot predicts DC and Vancouver to be earlier than the average of the contestants’ entries and later for New York. For Kyoto and Liestal (which, incidentally, have the most historical data) the bot and average of the contestants’ entries are quite close.
Thanks to Yu-lin Hsu for being our “AI-handler” (using ChatGPT o3-mini-high), co-organizers Jonathan Auerbach and David Kepplinger — who do most of the work compared to me — and all our great sponsors, supporters and contestants.
We’ll find out the winners once the trees blossom!
(Incidentally, I saw the plum trees starting to bloom last week on the road I cycle on to get to work. It felt early! I have spent the last couple years thinking they would be early so perhaps the year I am distracted is the one year they might sneak up on me.)


Oh, I remember this cherry blossom stuff from a year ago, or a few years ago on this blog. I made a poem about it at the time, and posted it, but I can’t seem to find it now.
That made me wonder, and after that wondering it made me realize that whether or not that poem is still depicted and/or can still be found may not matter much.
I reasoned that the poem may have already had a purpose and/or may have already shown its full potential during that possible limited time. Just like cherry blossoms perhaps.
Lizzie:
I don’t like the above graphs, for two reasons. First, the day of the week is irrelevant–it’s a waste of a dimension. Instead the data could be conveyed with a simple one-dimensional time series, which has the advantage of taking up much less space, and then you could easily stack many such series in a vertical display. Second, the graphs draw attention to the “shape” of the months on the graph and how they fit together like puzzle pieces, which is another irrelevant factor that distracts from the patterns of interest. Meanwhile, the outcome of interest is awkwardly displayed as shading, which is hard to read quantitatively. Youall can do better!
On the website it says “The calendars below show the days the contestants predict the peak bloom date will occur.” So, perhaps the calendar-focus in the website and/or the prediction-part and/or the framing of the research resulted in a (possibly unnecessary) focus on the word calendar which may have influenced the choices regarding the graphs.
I’m not 100% sure but it seems to me that the little AI robot face is different on the graphs compared to the text on the website. If so, perhaps that should be changed so that the image is the same in the graphs and in the text on the website where it is explained that the image in the graph represents AI.
Of course there are examples such as economic data where day of the week is important, so I’m not saying that the calendar representation is never a good idea!
The shape-of-the-month artifact reminds me of something I’ve seen in other applications, where the main function of a graph is to inadvertently show something other than the data of interest. Another example is the so-called cartogram, for example a map of the U.S. where each state is swollen or constricted so that its area on the graph is proportional to the state population. I understand the goal of equal-area representation, but the unfortunate result of the cartogram is to draw attention to the state population densities–a puffed-up New Jersey, a shriveled Montana, etc.–rather than to whatever variable is actually being plotted.
I don’t know much about graphs, or predictions, or cartograms.
Perhaps they could keep these calendar graphs and add to them a line below them, with the days/months from left to right and the different colored/shaded blue squares depicted on the line. I am not sure if that makes sense, and what such a depiction is called in the world of depicting data in some form.
Perhaps that supplements the calendar graphs with a depiction that shows the range (?) and frequency (?) of the predictions in a different manner.
I don’t know if I understood things correctly though, and it might be best for me to stick to writing cherry blossom related poetry. Nonetheless, I wanted to share if possibly useful in some way, shape, or form.
At first I was going to agree with you. But then I asked myself, ‘What would a non-statistician think?’ For a statistician it might be an unorthodox shape, but it is a calendar after all. The layout chosen may be more appealing to the non-statistician audience because it is associated with something they are familiar with.
There is also the question of the purpose of the graph. There can be no argument that the distribution of blossoming dates is difficult to read off that graph. But perhaps the purpose is for people to find the time when they can look forward to seeing the cherry blossoms? In such a case, a calendar would be more intuitive for people who are not otherwise familiar with graphs of distributions.
Although a histogram would have told us more about the distribution of predictions, I would not be so quick to dismiss the calendar graphs. The message it conveys may be exactly what its creator intended.
I agree that the purpose may not be statistical literacy but more entertainment or enthusiasm – but I don’t see any purpose served by the days of the week. That “information” destroys any sensible reading of the graph and doesn’t do anything for any other purpose. In fact, it doesn’t look like a calendar to me. If you want a calendar, why not exclude Jan and Feb and have 2 months that are laid out like a calendar, with shading representing the frequency of the blooms (more of a horizontal than a vertical format)? I still don’t think the days of the week are useful, but if you want a “familiar” calendar it would serve that purpose.
I thought about this horizontal format of a calendar as well, if I am understanding you correctly. I think that might make it different and more difficult to show the data because the days and months show differently in the horizontal version. I am not sure though and I can’t picture it at the moment. I think that’s perhaps why this vertical calendar format was chosen as it might show the range and frequency more optimally. But I am not sure.
It is interesting though looking at the comments here how depicting data is hard to do sometimes. I can see it being an interesting topic to teach, and how you could show different graphs/depictions to show something and how or why something might be better or worse. I also reason there is likely some subjectivity involved in all of it.
I’m thinking just something like this: https://cdn.calendaroptions.com/images/large/march-april-2025-calendar.png with shading for the frequency of the first blooms. The standard calendar has the advantage of familiarity, which the histogram would not, but it comes with the meaningless day-of-week display and the missed opportunity to educate the viewer about distributions.
Yes that’s what I had in mind as well, but I then wondered whether the data looks different using that horizontal version. For instance, in your linked example of a horizontal version the point from the end of March to the beginning of April shows differently than in the vertical version. I think the range and frequency might show more optimally on a vertical calendar version, but I am not sure. Would be interesting to hear from the people who made the graphs if that was a factor that was taken into account when choosing this vertical version.
I agree there are drawbacks here in the display (the biggest for me is the point that two consecutive dates can appear far apart), but I am also not sure exactly how to do better since I find communicating calendar dates to a broad audience difficult. People often do not like the displays that I do and how people interpret calendars is not obvious to me (I usually use day of year and treat it in a linear direction, from left to right).
I think that some of the cherry blossom festival folks really do want to know if peak bloom is near/on a weekend.
It’s surprising that adults – scientists no less? – find a linear arrangement of days confusing. To me the obvious choice is to display days linearly from day 70? after the solstice, with similar shading. Linear arrangement of the days allows comparison of each city. You could show two lines for each city: historical bloom frequency and the predictions of the current contest. Then the width of the shaded zone of historical blooms would correspond approximately to the shift in first bloom over the period of the measurements.
The calendar days for the current year could be displayed at the top of the chart. They don’t matter for previous years, right? If people are so baked about seeing the weekends, you could mark weekends with a vertical shaded band.
Also, since cherry blossoms are pink…why blue? :)
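For anyone who wants to try the linear layout suggested above (one position per day, shading by how many predictions fall on that day, weekends marked), here is a rough Python sketch. It is only an illustration, not how the competition site builds its graphs: the function names, the text "shading" characters, and the dates used in the example below are all made up for the purpose.

```python
from collections import Counter
from datetime import date, timedelta

# Text stand-ins for shading, lightest to darkest (an arbitrary choice).
SHADES = " .:*#"


def day_of_year(d: date) -> int:
    """Day number within the year (Jan 1 is day 1), usable as a linear axis label."""
    return d.timetuple().tm_yday


def linear_strip(predictions, start: date, end: date):
    """Build a one-row-per-series display, one character per day, start to end inclusive.

    Returns (weekend_row, shade_row):
      weekend_row marks Saturdays and Sundays with 'w', other days with '-'
      shade_row uses darker characters for days with more predictions
    """
    counts = Counter(predictions)
    peak = max(counts.values(), default=0)
    weekend_row, shade_row = [], []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        weekend_row.append("w" if day.weekday() >= 5 else "-")
        c = counts.get(day, 0)
        if c == 0:
            level = 0
        else:
            # Scale counts 1..peak onto shade levels 1..len(SHADES)-1.
            level = 1 + (c - 1) * (len(SHADES) - 2) // max(peak - 1, 1)
        shade_row.append(SHADES[level])
    return "".join(weekend_row), "".join(shade_row)


# Invented example data: five hypothetical predicted peak-bloom dates.
example_predictions = [
    date(2025, 3, 26),
    date(2025, 3, 28),
    date(2025, 3, 28),
    date(2025, 3, 28),
    date(2025, 3, 29),
]
weekends, shading = linear_strip(
    example_predictions, date(2025, 3, 24), date(2025, 3, 30)
)
print(weekends)
print(shading)
```

Calling `linear_strip` once per city (or once for historical blooms and once for this year's predictions) and printing the rows under each other gives the stacked comparison described above, with consecutive dates always adjacent.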
Maybe the intended use of the calendars is to allow nonstatisticians to look up the likelihood of peak bloom on a particular day?
Speaking of plum blossoms, here’s a gallery of photos taken Feb 28 (or maybe March 1; my camera’s on local time but Lightroom is often 12 hours behind…) that includes three (of 9 photos so far (I haven’t gotten out much this year (it’s a long story))) plum photos.
Here, the US college alum clubs have all agreed that Sat. March 29 (or Sun. 30 if Saturday rains), is cherry viewing day. (Which is to say, day of the week is important to some folks, since getting everybody together on anything other than a weekend only works if you all work for the same organization…)
https://pbase.com/davidjl/photos_2025
Go plum blossoms!
And I see I inadvertently repeated your point because I failed to read all the comments first.