“Much of the recent reported drop in interstate migration is a statistical artifact”

Greg Kaplan writes:

I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached.

Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010.

I haven’t had a chance to give the paper a serious look, so I could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to better follow the logic in your reasoning.

But some of you might be interested in the substance of the paper. In any case, it’s pretty scary how a statistical adjustment can have such a large effect. (Not that, in general, there’s any way to use “unadjusted” data. As Little and Rubin have pointed out, lack of any apparent adjustment itself corresponds to some strong and probably horrible assumptions.)
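To see how an imputation rule alone can move a headline rate, here is a toy calculation. It is only a sketch under made-up assumptions: the sample size, the number of missing responses, and the counts of imputed movers under the hypothetical "old" and "new" rules are all invented for illustration and are not taken from the paper or from the Census Bureau's actual procedure. The observed respondents behave identically in both scenarios; only the treatment of the missing cases changes.

```python
def reported_rate(observed_movers, observed_n, imputed_movers, missing_n):
    """Reported interstate migration rate when missing answers are imputed.

    The reported rate pools observed movers with imputed movers over the
    full sample (observed plus missing respondents).
    """
    return (observed_movers + imputed_movers) / (observed_n + missing_n)


# Hypothetical survey: all numbers below are invented for illustration.
OBSERVED_N = 9000        # respondents who answered the migration question
OBSERVED_MOVERS = 180    # of those, interstate movers (2% of respondents)
MISSING_N = 1000         # respondents whose answer must be imputed

# Hypothetical "old" rule: imputation labels 100 of the missing as movers.
old_rate = reported_rate(OBSERVED_MOVERS, OBSERVED_N, 100, MISSING_N)

# Hypothetical "new" rule: imputation labels only 20 of the missing as movers.
new_rate = reported_rate(OBSERVED_MOVERS, OBSERVED_N, 20, MISSING_N)

print(f"old rule: {old_rate:.1%}")  # old rule: 2.8%
print(f"new rule: {new_rate:.1%}")  # new rule: 2.0%
```

In this toy setup the reported rate falls from 2.8 percent to 2.0 percent even though not a single observed respondent changed behavior, which is the general kind of artifact the paper describes.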

P.S. See here for another recently-discovered problem with Census data.


  1. Mike Stucka says:

    IRS migration data typically lags a couple of years, but you can at least run some double-checks with that data. You also get some great county-level looks at the data, unless there are fewer than 10 people making such a move.

  2. k says:

    That's close to something I "discovered" last year regarding the GARCH model…actually anyone could do this:

    I attempted a replication of Bollerslev's article that introduced the GARCH model; the relevant data is easy to get online.

    The results were strikingly different, in that the GARCH model tells us very little. It turns out that the Census or the Bureau of Labour Statistics or some other such authority redefined the variable used to estimate the GARCH model Bollerslev proposed, the implicit GNP price deflator, sometime between when Bollerslev showed his example (1987, I believe) and when I downloaded the data (last year). The change was that they now use a Fisher price index to compute the deflator.