Just to disillusion you about the reproducibility of textbook analyses

Posted on November 1, 2009 2:07 AM by Andrew

Guilherme Rocha writes:

I am using the 2nd edition of your “Bayesian Data Analysis” book in a Bayesian data analysis course at Indiana University.

I am preparing the “kidney cancer” data in Section 2.8 for class.

I have a comment and a few questions.

1) First, the comment. I have noticed that, in the gd85to89.txt file, the state of IDAHO is spelled as IDADO. This may cause some difficulty if one is using the maps library in R (at first I thought Idaho was missing from this file).

2) After fixing that, I tried to reproduce figures 2.7 and 2.8 but couldn’t. I am wondering if I misunderstood what the data are in the raw data files.

I am guessing gd80to84.txt and gd85to89.txt are data regarding kidney cancer occurrence by county in 1980-1984 and 1985-1989 respectively. I tried to get the raw cancer rate as 10^5*(dc in gd80to84 + dc in gd85to89)/(pop in gd80to84 + pop in gd85to89). I get the same pattern but not the same counties.

Here are the questions:

2a) How are the cancer rates leading to figures 2.7 and 2.8 computed?

2b) What is dcC? It seems to be around 10^5*(dc/pop) in each file. I am wondering if this is the age-corrected rate… If so, how is it computed?

2c) What is aadc?
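
For reference, the pooled raw rate described in the letter, 10^5*(dc in gd80to84 + dc in gd85to89)/(pop in gd80to84 + pop in gd85to89), can be sketched as below. This is only an illustration of that arithmetic, not the calculation behind figures 2.7 and 2.8. The function name and the county numbers are hypothetical; only the column names dc and pop come from the data files.

```python
def pooled_rate_per_100k(dc_a, pop_a, dc_b, pop_b):
    """Raw rate per 100,000, pooled over two periods.

    dc_* and pop_* stand for the dc and pop columns of the two files.
    The function name is made up for this sketch.
    """
    total_pop = pop_a + pop_b
    if total_pop == 0:
        raise ValueError("total population is zero")
    return 1e5 * (dc_a + dc_b) / total_pop


# Made-up numbers for one hypothetical county, not taken from the real files.
rate = pooled_rate_per_100k(dc_a=2, pop_a=40000, dc_b=3, pop_b=60000)
print(rate)
```

Pooling the counts and populations before dividing, as here, is not the same as averaging the two period rates, so that is one place where a reproduction could diverge.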

My reply:

This reminds me that I should document the data better. The quick story is that I was analyzing adjusted data as if they were raw counts. I made various reasonable adjustments which I now forget. When I have more time I will have to go back and clarify this. Somewhere I have computer files (probably S-plus code) so I should be able to do this!

3 thoughts on “Just to disillusion you about the reproducibility of textbook analyses”

David Kane on November 1, 2009 at 9:04 am said:

Best would be an R package with all the data and code . . .

Ken Kleinman on November 1, 2009 at 4:35 pm said:

Hi Andrew,

I recommend looking into StatWeave or the older SASweave for your next book. These are literate programming (http://en.wikipedia.org/wiki/Literate_programming) tools which integrate text, code, and output. Using SASweave (which does this trick for R as well as SAS), Nick Horton and I were able to compose a book (http://www.math.smith.edu/sasr/) with extensive examples using both SAS and R with a bare minimum of cutting and pasting.

In effect, you write a LaTeX file with some special codes that encapsulate R code. You then pass the file through a special reader which extracts the code, runs it, captures the output, and reassembles the text, code, and output in the appropriate places. It's frankly amazing. If you want to change an example slightly, you just change the code and re-run; no cut-and-paste mishaps are possible.
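
To make the extract, run, capture, reassemble loop concrete, here is a toy sketch in Python. It is not how StatWeave or SASweave are implemented. The noweb-style delimiters ("<<>>=" to open a chunk, "@" to close it), the CODE/OUTPUT labels, and the function name weave are all assumptions made for illustration.

```python
import contextlib
import io
import re

# Assumed chunk delimiters in the noweb style: a line "<<>>=" opens a code
# chunk and a line "@" closes it.
CHUNK = re.compile(r"^<<>>=\n(.*?)^@\n", re.DOTALL | re.MULTILINE)


def weave(source: str) -> str:
    """Run each code chunk and splice its code and printed output into the text."""
    env = {}  # shared namespace, so later chunks can use earlier variables

    def run(match):
        code = match.group(1)
        buf = io.StringIO()
        # Capture whatever the chunk prints instead of sending it to the terminal.
        with contextlib.redirect_stdout(buf):
            exec(code, env)
        return "CODE:\n" + code + "OUTPUT:\n" + buf.getvalue()

    return CHUNK.sub(run, source)


doc = "The sum is computed below.\n<<>>=\nprint(2 + 3)\n@\nEnd of example.\n"
woven = weave(doc)
print(woven)
```

Changing the chunk to print(2 + 4) and re-running regenerates the output in place, which is the property that removes the cut-and-paste step.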

StatWeave and SASweave are free and were created mostly by Russ Lenth at UIowa. StatWeave works with SAS, R, and Stata, and will create OpenOffice.org documents as well as LaTeX and possibly more: http://www.cs.uiowa.edu/~rlenth/StatWeave/. SASweave does SAS and R through LaTeX: http://www.cs.uiowa.edu/~rlenth/SASweave/. Neither installs particularly smoothly, in my experience, but either saves hundreds of hours.

Bill Mill on November 2, 2009 at 8:33 am said:

Source control for all data and code is a must! It's worth it for just this sort of situation, so you can go back and see what happened and how the analysis developed.
