Chris Winship and Ethan Fosse write with a challenge:

Since its beginnings nearly a century ago, Age-Period-Cohort analysis has been stymied by the lack of identification of parameter estimates resulting from the linear dependence between age, period, and cohort (age= period – cohort). In a series of articles, we [Winship and Fosse] have developed a set of methods that allow APC analysis to move forward despite the identification problem. We believe that our work provides a solid methodological foundation for APC analysis, one that has not existed previously. By a solid methodological foundation, we mean a set of methods that can produce substantively important results where the assumptions involved are both explicit and likely to be plausible.

After nearly a century of effort, this is a big claim. How might we test it? In mathematics, if someone claims to have proved a theorem, the proof is not considered valid until others have rigorously analyzed it. Our request and hope is that researchers will interrogate our claim with similar rigor. Have we in fact succeeded after so many years of effort by others?

My own articles on age-period-cohort analysis are here, here, and here. The first of these was an invited discussion for the American Journal of Sociology that they decided not to publish; the second (with Jonathan Auerbach) is our summary of what went wrong with that notorious claim a few years ago about the increasing death rate of middle-aged white Americans, and the third (with Yair Ghitza and Jonathan Auerbach) is our very own age-period-cohort analysis of presidential voting.

I have not looked at Winship and Fosse’s work in detail, but I agree with their general point that the right way forward with this problem is to think about nonlinear models.
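For readers new to the identification problem mentioned above, the exact linear dependence (cohort = period − age) can be seen directly in the rank of a regression design matrix. This is an illustrative sketch with made-up age and period ranges, not code from any of the papers under discussion:

```python
import numpy as np

# Synthetic panel: one row per (age, period) cell; cohort is determined exactly
ages = np.arange(20, 60)
periods = np.arange(1980, 2020)
A, P = np.meshgrid(ages, periods)
age, period = A.ravel().astype(float), P.ravel().astype(float)
cohort = period - age  # the exact linear dependence

# Design matrix with intercept plus linear age, period, and cohort terms
X = np.column_stack([np.ones_like(age), age, period, cohort])

# Rank is 3, not 4: the three linear effects cannot be separately identified
print(np.linalg.matrix_rank(X))
```

Any nonlinear terms (quadratics, splines, etc.) escape this dependence, which is why the way forward runs through nonlinear models.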

It’s friggin’ depressing. Despite being simply wrong, the Case-Deaton thing has become common knowledge that no one doubts.

Sigh.

what is the Case-Deaton thing?

https://statmodeling.stat.columbia.edu/2017/07/11/criticism-economists-journalists-jumping-conclusions-based-mortality-trends/

It’s the second of Andrew’s papers, the one he describes as “the second (with Jonathan Auerbach) is our summary of what went wrong with that notorious claim a few years ago about the increasing death rate of middle-aged white Americans”.

This is the first I’ve seen of the Ghitza/Gelman/Auerbach paper. Very interesting.

One question – is the Gallup data used aggregated across all demographics, or is it split by demographic? Presumably, the different groups will have different impressions of presidential performance.

If only aggregated approval data were used, that could perhaps explain the model’s lesser explanatory power for minorities.

Joseph:

Unfortunately we are not analyzing raw approval poll data, so it’s just national approval, not approval within subgroups.

I like the simplicity of using what Gallup publishes (rather than attempting to work from raw data). Was it a choice to work from national approval, or is it that Gallup doesn’t publish results by subgroup (or if they do now, they haven’t always done so)?

I would think, for example, that the approval for Obama moved pretty differently among the different groups. I’d think that the strong D approval at the beginning of his term persisted among minorities, whereas it rapidly dropped and ended up pretty negative among southern whites. But that’s just a guess.

Obama’s probably not the most consequential for the analysis, being so recent, but I figure there are other points in history where approval moved significantly differently by subgroup.

I’m amused by the comparison to checking mathematical theorems, as though this is something alien to statistics. Mathematicians don’t (generally) announce a challenge to other mathematicians to check their proofs–the proofs just get checked as part of peer review and by their colleagues post-publication. So this enterprise, which they are framing as a grand challenge, is really just open-source peer review. In which case, what we need is a single document with all proofs and/or all simulation code and results. Instead, they seem to have provided a list of articles and an R package.

Maybe by framing the review process as a grand challenge, they hope to increase attention, interest and participation. That’s pretty clever, actually. But evaluating an R package using unspecified criteria isn’t proof-checking, it’s debugging.

Michael:

I haven’t looked into these particular papers, and I agree that part of this is a publicity stunt—which is fine! I do publicity stunts too! I like publicity stunts! Also, though, checking a statistical method is slightly different from checking a mathematical proof, as there are also issues of how the problem is set up, how reasonable is the model, etc. Rather than calling this proof-checking or debugging, I think the best term is open peer review, as you say.

We have of course received reviews from journals and colleagues. Our hope, however, is to have some people who are really smart with strong technical backgrounds assess our techniques to ensure they are sensible. Our thinking is that the readers of Andy’s blog fit this profile exactly.

We’ve provided an R package not so people will check our code — that is our responsibility. The R package is provided so as to encourage people to try out our methods on their own data and assess their usefulness. If the math is correct but the methods are of little use, then there is considerably more work to be done.

Michael said,

“evaluating an R package using unspecified criteria isn’t proof-checking, it’s debugging”

Speaking as a mathematician: Proof-checking is a form of debugging — not debugging a program, but looking for “bugs” in a proof and correcting them if possible.

The question with the R program is not whether the code is correct. The question is whether the type of analyses the program can produce yields substantively interesting social science results. There are lots of statistical methods out there, all mathematically correct with bugless code. The question is whether they are good for doing social science.

I’ve followed Winship and Fosse’s work over the years, and have learned a great deal from their approach. These papers really drive home the point that there is no statistical solution to the APC problem, but that there needs to be external information to identify all effects.

I think there is a small technical issue with how the linear and nonlinear effects are separated; Fosse and Winship used unweighted orthogonal polynomials where weighted orthogonal polynomials should be used. I’ve written this up in a short paper, along with some other suggestions: https://osf.io/preprints/socarxiv/xrbgv/

R doesn’t have a built-in function to compute weighted orthogonal polynomials so I wrote one up in this R package:

https://github.com/elbersb/weightedcontrasts#readme

The package also contains two vignettes that reanalyze two examples from the APC literature.
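For readers unfamiliar with the idea, weighted orthogonal polynomials can be constructed by Gram–Schmidt orthogonalization under a weighted inner product. Here is a minimal Python sketch of the concept (not the package’s implementation, and the example weights are invented):

```python
import numpy as np

def weighted_orthogonal_poly(x, w, degree):
    """Orthogonalize the monomials 1, x, ..., x^degree under the
    weighted inner product <u, v> = sum(w * u * v)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    basis = []
    for d in range(degree + 1):
        p = x ** d
        for q in basis:
            # Subtract the weighted projection of p onto q
            p = p - ((w * p) @ q) / ((w * q) @ q) * q
        basis.append(p)
    return np.column_stack(basis)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. cohort index
w = np.array([10.0, 20.0, 5.0, 40.0, 25.0])  # e.g. cell counts as weights
P = weighted_orthogonal_poly(x, w, 2)
# Columns are orthogonal under w: P.T @ np.diag(w) @ P is diagonal
```

With equal weights this reduces to the ordinary orthogonal polynomials that R’s `poly()` produces; with unequal cell sizes the two bases differ, which is the crux of the point above.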