The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book.
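
A decomposition like this is possible because the sum of independent Gaussian processes is itself a Gaussian process whose covariance function is the sum of the component covariance functions. Each term can then be read off as its own panel. The sketch below only illustrates that idea. The three components (trend, yearly, weekly), the kernel forms, and all hyperparameter values are hypothetical choices for illustration; they are not the model or the values used in the book.

```python
import math

# Hypothetical kernels and hyperparameters, for illustration only.

def se_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential kernel: a smooth, slowly varying trend."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def periodic_kernel(t1, t2, variance, lengthscale, period):
    """Periodic kernel: a pattern that repeats every `period` days."""
    s = math.sin(math.pi * abs(t1 - t2) / period)
    return variance * math.exp(-2.0 * (s / lengthscale) ** 2)

def additive_kernel(t1, t2):
    """Sum of component kernels; each term corresponds to one component of the fit."""
    trend = se_kernel(t1, t2, variance=1.0, lengthscale=3 * 365.0)
    yearly = periodic_kernel(t1, t2, variance=0.5, lengthscale=1.0, period=365.25)
    weekly = periodic_kernel(t1, t2, variance=0.3, lengthscale=1.0, period=7.0)
    return trend + yearly + weekly

def gram_matrix(times):
    """Covariance matrix of the additive GP at the given time points (in days)."""
    return [[additive_kernel(a, b) for b in times] for a in times]

K = gram_matrix([0.0, 7.0, 365.0])
```

On the diagonal every component is at its full variance, so each diagonal entry equals 1.0 + 0.5 + 0.3 = 1.8; points exactly one week apart get the full weekly covariance.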

It’s a bit surprising that the significant dips on Labor Day and Thanksgiving do not have bumps before or after. Probably they are eaten up by the smoothing procedure, maybe because both Labor Day and Thanksgiving fall on fixed days of the week.

D.O.:

Yes, as we discuss in the book, the model could be improved by replacing the daily spikes by little functions with “ringing” so that a dip on a particular day corresponds to smaller increases on the days right before and after. In the above graphs, I think that some of the daily effects have been inappropriately absorbed into the seasonal effect.
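
One way to picture such a "ringing" holiday function is a dip on the day itself paired with smaller bumps on the neighbouring days. The sketch below is a hypothetical parameterization, not the one from the book: `depth` is the size of the dip and `spill` is the fraction of that dip added back on each adjacent day.

```python
def ringing_profile(depth, spill):
    """Hypothetical holiday effect: a dip of `depth` on the day itself and
    increases of `spill * depth` on the days immediately before and after.
    Returns a mapping from day offset to additive effect."""
    return {-1: spill * depth, 0: -depth, 1: spill * depth}

def apply_holiday(series, holiday_index, depth, spill):
    """Return a copy of a daily series with the ringing profile added around one holiday."""
    out = list(series)
    for offset, effect in ringing_profile(depth, spill).items():
        i = holiday_index + offset
        if 0 <= i < len(out):  # ignore offsets that fall outside the series
            out[i] += effect
    return out

result = apply_holiday([100] * 7, holiday_index=3, depth=20, spill=0.5)
# result == [100, 100, 110.0, 80, 110.0, 100, 100]
```

With `spill=0.5` the two bumps exactly offset the dip, which would correspond to births being shifted to neighbouring days rather than lost; a smaller `spill` lets part of the dip remain as a net reduction.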

One minor question: why normalize to an arbitrary 100 scale? Wouldn’t the graph be a tad more informative if you kept actual “num. of birth units”?

For example, how many births actually happen on an average Friday?

Rahul:

Good point. I think it would make sense to put the top graph (trends) on an absolute scale (perhaps #births per day, as you suggest) and the others on relative scales. Also, looking at the description in the book, it appears that we fit an additive model directly on the data, but now I’m thinking it would make more sense to work on the log scale.

Or exploit the unused right-hand y-axis: you could label it in #births.

D.O.: Bumps before and after Labor Day and Thanksgiving can be seen if we plot each year separately. In that plot you can also see that the effect of Labor Day and Thanksgiving is about the same size as that of Independence Day. Andrew preferred this plot, which shows the effect for each fixed day of the year, so the effect of holidays that fall on a fixed weekday is spread out over several dates.
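
Why the effect gets spread out can be checked directly: Labor Day is the first Monday of September and Thanksgiving the fourth Thursday of November, so their calendar dates move around from year to year. A small sketch using only the standard library:

```python
import datetime

def labor_day(year):
    """First Monday of September (US Labor Day)."""
    d = datetime.date(year, 9, 1)
    # date.weekday(): Monday == 0, ..., Sunday == 6
    return d + datetime.timedelta(days=(0 - d.weekday()) % 7)

def thanksgiving(year):
    """Fourth Thursday of November (US Thanksgiving)."""
    d = datetime.date(year, 11, 1)
    first_thursday = d + datetime.timedelta(days=(3 - d.weekday()) % 7)
    return first_thursday + datetime.timedelta(weeks=3)

years = range(1968, 1989)  # the years covered by the data
labor_day_dates = sorted({labor_day(y).day for y in years})        # days of September
thanksgiving_dates = sorted({thanksgiving(y).day for y in years})  # days of November
```

Labor Day always lands somewhere in September 1 to 7 and Thanksgiving in November 22 to 28, so in a plot indexed by day of year each holiday's dip is smeared across a week of dates.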

Rahul: The scale is not arbitrary. I first used an absolute scale, but since I was interested in comparing the sizes of the different effects, it took extra mental effort to judge whether the relative changes were big or not. I used a percentage scale because it looks prettier than decimals (0.8, 0.9, 1.0, 1.1). This scale also has the benefit that when I made a similar figure for Finland, I could immediately see that the relative effects were of similar size. During these years there were about 10,000 births on an average Friday. We could put the absolute scale on the right.
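
Putting the absolute scale on the right only needs the mapping between the two scales. A minimal sketch, assuming that 100 on the relative scale corresponds to some baseline daily count; the `baseline` here is a parameter, and the 10,000 used in the example is the average-Friday figure from the comment, chosen purely for illustration rather than as the figure's actual reference level.

```python
def births_to_relative(births, baseline):
    """Express a daily count on the 100 scale: 100 means equal to `baseline`."""
    return 100.0 * births / baseline

def relative_to_births(relative, baseline):
    """Inverse mapping, e.g. for labelling a right-hand axis in #births."""
    return relative * baseline / 100.0

baseline = 10_000  # illustrative baseline, not the figure's reference level
right_axis_value = relative_to_births(110, baseline)  # 11000.0
```

The two functions form a forward/inverse pair, which is the shape matplotlib's `Axes.secondary_yaxis` expects in its `functions` argument (with `baseline` bound, for example, via a lambda).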

Andrew: the data are counts, but with such high mean counts they can be approximated very well by a Gaussian model. The log scale is not needed to ensure positivity, and it would transform the distribution away from Gaussian.
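
How good is the Gaussian approximation at these counts? A quick check, assuming a Poisson count (a stand-in for the exact count distribution) with a hypothetical mean of 10,000 per day, compares its probability mass function with the matching normal density over mean ± 5 standard deviations.

```python
import math

def poisson_pmf(k, lam):
    # Computed on the log scale to avoid overflow for large counts.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

lam = 10_000          # hypothetical mean daily count
sd = math.sqrt(lam)   # Poisson variance equals its mean
ks = range(int(lam - 5 * sd), int(lam + 5 * sd) + 1)
peak = normal_pdf(lam, lam, sd)
# Largest pointwise gap between the two, relative to the peak density.
max_rel_gap = max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sd)) for k in ks) / peak
```

The remaining discrepancy comes mostly from the Poisson's skewness, which is 1/sqrt(lam) = 0.01 at this mean, so the gap is well under one percent of the peak.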

Aki:

But wouldn’t the log scale help when considering the long term trend (which moves about 20% from min to max)? Put it this way: suppose there is a fixed multiplicative effect of day of year or day of week or whatever. In the additive model, this will show up as a larger effect in 1976 (when the total #births is lowest). And, indeed, if you look at the day-of-week effects, the curve for 1976 is pretty high. It’s not the highest—1988 is the highest, presumably because there were real changes during this period with more scheduled births—but it’s up there, perhaps an artifact of the additive model for what fundamentally is a multiplicative process.
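
The additive-versus-multiplicative point in a few lines, with hypothetical numbers: two trend levels about 20% apart and a fixed -10% day-of-week effect. The same multiplicative effect is a different number of births at each level, but a constant shift on the log scale. How this shows up in a particular plot also depends on how the components are normalized, which the sketch does not try to reproduce.

```python
import math

def effects(level, factor):
    """Return (additive effect in births, effect on the log scale) of a
    multiplicative factor applied to a trend level."""
    additive = level * factor - level
    on_log_scale = math.log(level * factor) - math.log(level)
    return additive, on_log_scale

# Hypothetical trend levels 20% apart and a hypothetical -10% weekday effect.
low = effects(8_500.0, 0.90)
high = effects(10_200.0, 0.90)
```

Here `low[0]` is about -850 births and `high[0]` about -1020, while both log-scale effects equal log(0.9), about -0.105.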

Regarding the Gaussian approximation, I wonder if there would be a way to do a multiplicative model by fitting an additive model on the log of the raw data and just adjusting the data variance accordingly. So the computation would be just as easy; it’s just that instead of approximating the binomial density with a Gaussian, we’d be applying the Gaussian approximation to the density of the log of a binomially distributed random variable.
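
One standard way to get that adjusted variance is the delta method: if y ~ Binomial(n, p), then Var(log y) ≈ Var(y)/E(y)² = (1 − p)/(np), which is roughly 1/E(y) when p is small. The sketch below checks this against the exact variance of log y computed from the binomial pmf. The values n = 1,000,000 and p = 0.01 (mean 10,000) are hypothetical, chosen only to match the order of magnitude of daily births.

```python
import math

def delta_var_log_binomial(n, p):
    """Delta-method variance of log(y) for y ~ Binomial(n, p):
    Var(y) / E(y)^2 = n p (1 - p) / (n p)^2 = (1 - p) / (n p)."""
    return (1.0 - p) / (n * p)

def exact_var_log_binomial(n, p, width=10.0):
    """Variance of log(y), summing the binomial pmf over mean +/- width sd."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    # Start at k = 1 because log(0) is undefined; its probability is negligible here.
    ks = range(max(1, int(mean - width * sd)), min(n, int(mean + width * sd)) + 1)
    log_pmf = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in ks
    ]
    top = max(log_pmf)
    weights = [math.exp(lp - top) for lp in log_pmf]  # rescaled for numerical stability
    total = sum(weights)
    logs = [math.log(k) for k in ks]
    m = sum(w * x for w, x in zip(weights, logs)) / total
    return sum(w * (x - m) ** 2 for w, x in zip(weights, logs)) / total

n, p = 1_000_000, 0.01  # hypothetical values with mean n * p = 10,000
approx = delta_var_log_binomial(n, p)
exact = exact_var_log_binomial(n, p)
```

At this mean the two agree to well within one percent, so fitting the additive model on log counts with per-observation variance (1 − p)/(np) would be a cheap approximation.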

[…] See more of his charts here. […]

I find it interesting that the actual number of births is highest in late September. Does that imply that while there may be fewer mothers giving birth on Christmas Day, prospective parents are busier conceiving on (or around) Christmas Day? :)
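
A back-of-the-envelope check of the Christmas hypothesis, assuming a typical gestation of about 38 weeks (266 days) from conception. The birth date of September 20, 1980 is an arbitrary example of "late September", not a value taken from the data.

```python
import datetime

# Assumption: typical gestation of about 38 weeks (266 days) from conception.
GESTATION_DAYS = 266

def approximate_conception(birth_date):
    """Approximate conception date for a given birth date."""
    return birth_date - datetime.timedelta(days=GESTATION_DAYS)

example = approximate_conception(datetime.date(1980, 9, 20))  # datetime.date(1979, 12, 29)
```

So a late-September birthday points back to the last days of December, which is at least consistent with the joke.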

What, no error bars or bands?

...and no line connecting Sunday with Monday, the biggest difference of all adjacent days?

Are the numbers for leap year day births normalized for the infrequency of that day?
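
For reference, the normalization the question asks about divides February 29's total by the number of leap years rather than by the number of years. Over 1968 through 1988 that is 6 leap years out of 21. The helper `per_occurrence_average` is a hypothetical name for illustration; whether the figure applies this normalization is not stated here.

```python
import calendar

def leap_years(first, last):
    """Leap years in the inclusive range [first, last]."""
    return [y for y in range(first, last + 1) if calendar.isleap(y)]

def per_occurrence_average(total_births_on_day, n_occurrences):
    """Average births per occurrence of a calendar day."""
    return total_births_on_day / n_occurrences

n_years = len(range(1968, 1989))        # 21 years of data
n_leap = len(leap_years(1968, 1988))    # February 29 occurs only in these
```

Any ordinary date occurs 21 times in this period, February 29 only 6 times, so its raw total would be about 21/6 = 3.5 times too small if left unnormalized.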

y

Can you see any blips for 9/11 or the big Northeast blackout of 2003, etc.? I wonder.

[…] “Happy birthday” (via http://statmodeling.stat.columbia.edu/2013/12/1 …) […]

[…] more of his charts here. Happy Birthday Statistical Modeling, Causal Inference, and Social Science (19 December […]

[…] Valentine’s day and fewer on Halloween, I think the right way to go is to do an analysis of all the days of the year rather than picking just one or two […]

[…] in the observed time series. For example, in the birthday example (BDA3 p. 505 and here), we can say that we have learned about the structure if we can predict any single date with […]

[…] decided to fit a Gaussian process model (following the lead of Aki in the birthday problem) with separate time series for each state and each region, with the country partitioned […]