“The Long-Run Effects of America’s First Paid Maternity Leave Policy”: I need that trail of breadcrumbs.

Tyler Cowen links to a research article by Brenden Timpe, “The Long-Run Effects of America’s First Paid Maternity Leave Policy,” that begins as follows:

This paper provides the first evidence of the effect of a U.S. paid maternity leave policy on the long-run outcomes of children. I exploit variation in access to paid leave that was created by long-standing state differences in short-term disability insurance coverage and the state-level roll-out of laws banning discrimination against pregnant workers in the 1960s and 1970s. While the availability of these benefits sparked a substantial expansion of leave-taking by new mothers, it also came with a cost. The enactment of paid leave led to shifts in labor supply and demand that decreased wages and family income among women of child-bearing age. In addition, the first generation of children born to mothers with access to maternity leave benefits were 1.9 percent less likely to attend college and 3.1 percent less likely to earn a four-year college degree.

I was curious so I clicked through and took a look. It seems that the key comparisons are at the state-year level, with some policy changes happening in different states at different years. So what I’d like to see are some time series for individual states and some scatterplots of state-years. Some other graphs, too, although I’m not quite sure what. The basic idea is that this is an observational study in which the treatment is some policy change, so we’re comparing state-years with and without this treatment; I’d like to see a scatterplot of the outcome vs. some pre-treatment measure, with different symbols for treatment and control cases. As it is, I don’t really know what to make of the results, what with all the processing that has gone on between the data and the estimate.
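To be concrete, here is a rough sketch (in Python) of the kind of scatterplot I mean; the data frame and its column names are made-up placeholders, not the paper’s actual variables:

```python
# One point per state-year: outcome vs. a pre-treatment measure, with treated and
# untreated state-years marked by different symbols. The columns college_rate,
# pre_period_college_rate, and has_paid_leave are hypothetical stand-ins.
import matplotlib.pyplot as plt

def plot_treated_vs_control(df):
    fig, ax = plt.subplots()
    for treated, marker, label in [(0, "o", "no paid leave"), (1, "^", "paid leave")]:
        sub = df[df["has_paid_leave"] == treated]
        ax.scatter(sub["pre_period_college_rate"], sub["college_rate"],
                   marker=marker, label=label, alpha=0.5)
    ax.set_xlabel("pre-treatment college attendance rate (state)")
    ax.set_ylabel("college attendance rate (state-year)")
    ax.legend()
    return fig
```

A companion set of per-state time series, each with a line marking that state’s policy-change year, would cover the rest.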

In general I am skeptical about results such as those given in the above abstract, because there are so many things that can affect college attendance. Trends can vary by state, and this sort of analysis will simply pick up whatever correlation there might be between state-level trends and the implementation of policies. There are lots of reasons to think that the states where a given policy would be more or less likely to be implemented happen to be states where trends in college attendance are higher or lower. This is all kind of vague because I’m not quite sure what is going on in the data—I didn’t notice a list of which states were doing what. My general point is that to understand and trust such an analysis I need a “trail of breadcrumbs” connecting data, theory, and conclusions. The theory in the paper, having to do with economic incentives and indirect effects, seemed a bit farfetched to me but not impossible—but it’s not enough for me to just have the theory and the regression table; I really need to understand where in the data the result is coming from. As it is, this just seems like two state-level variables that happen to be correlated. There might be something here; I just can’t say.

P.S. Cowen’s commenters express lots of skepticism about this claim. I see this skepticism as a good sign, a positive consequence of the recent statistical crisis in science: people no longer automatically accept this sort of quantitative claim, even when it is endorsed by a trusted intermediary. I suspect that Cowen too is happy that his readers read him critically and don’t believe everything he posts!

17 thoughts on ““The Long-Run Effects of America’s First Paid Maternity Leave Policy”: I need that trail of breadcrumbs.”

  1. Yes, the skepticism in the comments is a healthy sign (although I’ll point out that some of it is based simply on beliefs stemming from the commenter’s personal experience, which is just a different source of noise). But this study raises a far more serious issue in my mind. At 52 pages, it is a typical economics study – done with some care, quite complex, and totally inadequate in its documentation and description. If the author had done what Andrew is suggesting (which I would also like to see), perhaps this would be a 500-page study. It is not at all clear to me how we move forward from a point where the data and methodology have become this complex while the documentation and description remain this unsatisfactory.

    I don’t propose any answers. I keep calling for people to release their data and for editors and referees to require better documentation and description. But at the same time, the reality is that a lot of work does go into these studies – they are not careless for the most part – yet they fail miserably at providing the kind of information that would cause me to have much faith in their results.

    • I have that same thought every time Andrew (or others) describes any sort of fairly complex Bayesian hierarchical model-building and checking procedure. Even if an analyst/author wanted to provide complete documentation, starting from the raw data and ending with the published results, the time and effort required would be staggering.

      From my training in a previous lifetime (and previous century) as a software engineer, I am quite familiar with the underlying principle. In any moderately to highly complex engineering project, the time required for documentation relative to design time is rarely less than 1:1, and often runs to 4:1 or 5:1. I suspect a fully reproducible statistical analysis will exhibit the same pattern.

      Let’s say it took Andrew and his collaborators 50–100 hours or more to develop, check, and refine a model. And then let’s say that a thorough writeup of the results (not methods, just results) requires several pages of text, half a dozen full-page tables, and a dozen or more figures. To do that in a fully reproducible manner might be a full-time effort for months.

      Then as you point out, a massive effort would be required for anyone wanting to follow all the breadcrumbs and reproduce every step of the process. That’s before they go off on their own initiative and explore alternative forking paths or what have you. The entire undertaking would founder under its own top-heaviness, surely.

      • Brent:

        This is why I think one of the most important frontiers of research in statistics is to develop general tools for understanding multiple models fit to the same data. In the above example, we can interpret plots of data summaries as corresponding to some simple model, which then gets extended into the full regression presented in the published paper. The “trail of breadcrumbs” can, I think, be formalized as a series of such comparisons.

        In the meantime, I don’t think we can do much with these elaborate estimates that are presented alone with no direct comparison to data.
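        As a toy version of what I mean by a series of comparisons: fit a ladder of models to the same state-year data and watch how the treatment estimate moves as structure is added. This is only a sketch; the data frame and column names below are invented, not taken from the paper.

        ```python
        # Fit a ladder of models to the same data, from a raw comparison up to a
        # specification with state and year effects. The columns outcome, treated,
        # pre_outcome, state, and year are hypothetical.
        import pandas as pd
        import statsmodels.formula.api as smf

        def model_ladder(df):
            specs = {
                "raw difference": "outcome ~ treated",
                "+ pre-treatment level": "outcome ~ treated + pre_outcome",
                "+ state and year effects": "outcome ~ treated + pre_outcome + C(state) + C(year)",
            }
            rows = []
            for label, formula in specs.items():
                fit = smf.ols(formula, data=df).fit()
                rows.append({"model": label,
                             "estimate": fit.params["treated"],
                             "std_err": fit.bse["treated"]})
            return pd.DataFrame(rows)
        ```

        A big jump in the estimate between adjacent rows tells you which piece of model structure the headline result is leaning on; that, roughly, is the breadcrumb trail.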

        • Oh, I totally agree that a highly processed estimate produced with zero transparency is next to useless. I’m just suggesting that there will always be modeling strategies that are simply impractical to carry out in a fully reproducible manner.

          Maybe a really clever utility function could be constructed to capture the tradeoff between the obvious value of reproducibility and future data pooling, on the one hand, and bringing more arcane techniques to bear on especially intractable questions, on the other.

      • I don’t really agree. With modern tools, especially R Markdown and Jupyter notebooks, I find it quite convenient to write my analyses as reproducible and documented from the start. My experience is that this actually lets me work faster in the long run, as I discover bugs sooner, and when, inevitably, someone finds an error in the original data or a reviewer asks for modifications, it is no big deal. Yes, polishing the development version into something publishable as an appendix still takes effort (rewriting stubs and keywords into full paragraphs, adding nice labels to plots, …), but it is nowhere near crushing.

        I understand this approach requires reasonable coding skills, and you need to invest in learning some tricks of the trade, which might not be accessible to everybody, but I believe it is a totally reasonable goal.
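        For what it’s worth, the skeleton is nothing fancy: a single script (or notebook) that goes from the raw file to the figures with no manual steps in between. The paths and column names below are placeholders.

        ```python
        # Minimal reproducible-from-the-start skeleton: raw data in, figures out,
        # no hand-editing in between. Paths and column names are placeholders.
        import pandas as pd
        import matplotlib.pyplot as plt

        RAW_DATA = "data/raw/state_year_panel.csv"   # raw file, never edited by hand

        def load_and_clean(path=RAW_DATA):
            df = pd.read_csv(path)
            # every cleaning decision lives here, in code, not in a spreadsheet
            return df.dropna(subset=["state", "year", "outcome"])

        def main():
            df = load_and_clean()
            df.groupby("year")["outcome"].mean().plot()
            plt.savefig("figures/outcome_by_year.png", dpi=200)

        if __name__ == "__main__":
            main()
        ```

        When the raw data change or a reviewer asks for a variant, you rerun one command and every downstream number and figure updates.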

    • I don’t know. I got my PhD in economics a few years ago, and after sifting through several of these kinds of working papers, I think just getting authors to do a few scatterplots or other visualizations of the raw data would be a huge improvement and would add only a few pages to the paper. When I see descriptions of the data (either tables or graphs), I am almost always left feeling that the information could have been presented more effectively or that the most pertinent raw-data comparisons were not presented. As for documentation, I think just posting code would be a huge step forward. It can be a bit of a hassle to get it into a form suitable for other people to read, but if economists want to think of their work as “scientific” (as many think it should be), maybe they should start doing their work as if someone were actually going to try to replicate it, rather than treating replication as an afterthought.

      • Jfa:

        Along with this, I think it would help if journalists would stop playing the game, stop promoting big claims presented without clear documentation. I’d say that scholarly journals should stop accepting such papers for publication, but that’s gotta be impossible, given that the reviewers and editors are pretty much the same people who write these papers.

  2. Some things to consider:

    Unpublished working paper.

    Job market paper.

    Key p-values are .003 and .002, which correspond to z-statistics of roughly 2.8 and 2.9 (Table 4 of the paper; see the conversion sketch after this list).

    Observations: 500,000 to 1,000,000 (Table 2).

    Educational achievement is related to approximately 46,802 factors.

    I doubt scatter plots would show much given the weak z-stats, especially if the scatter plots show a million observations.
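    For reference, here is the p-to-z conversion behind those numbers, sketched in Python. Whether the reported p-values are one- or two-sided, and how they were rounded, is an assumption here, so both versions are shown.

    ```python
    # Convert the quoted p-values to z-statistics under both conventions.
    from scipy.stats import norm

    for p in (0.003, 0.002):
        print(f"p = {p}: one-sided z = {norm.isf(p):.2f}, two-sided z = {norm.isf(p / 2):.2f}")
    # p = 0.003: one-sided z = 2.75, two-sided z = 2.97
    # p = 0.002: one-sided z = 2.88, two-sided z = 3.09
    ```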

  3. I don’t know how to get all the way to the Promised Land.

    But I know the first step: establish a norm by which articles (whether working papers or published) that do not publish their code are assumed to be nonsense. (They might not be nonsense! But the norm should be to assume that they are.) There is never a good excuse not to share your code.

    The second step (somewhat harder) is to establish a norm by which articles that do not share their data are assumed to be nonsense. This is tougher, since there are situations in which the raw data cannot be shared.

    • I think code sharing is not as easy as it looks, as modern programming relies heavily on external libraries and configurations that may change, and running someone else’s code is not so trivial…

      But it is still not an excuse not to publish code.
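      One cheap mitigation for the moving-libraries problem is to publish, alongside the code, a record of the exact versions the analysis was run with. Here is a minimal sketch; the helper, file name, and package list are made up for illustration.

      ```python
      # Write the Python and package versions used by the analysis to a small text
      # file that ships with the code.
      import importlib.metadata
      import sys

      def write_environment_manifest(packages, path="environment-manifest.txt"):
          with open(path, "w") as f:
              f.write(f"python=={sys.version.split()[0]}\n")
              for pkg in packages:
                  f.write(f"{pkg}=={importlib.metadata.version(pkg)}\n")

      write_environment_manifest(["numpy", "pandas", "statsmodels"])
      ```

      Lock files or containers do this more thoroughly, but even a plain manifest makes it far easier to rerun someone else’s code years later.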

  4. “So what I’d like to see are some time series for individual states and some scatterplots of state-years.”

    Does that mean that the data used for analysis are cross-sectional, and not longitudinal?

    If so, wow.
