
Does traffic congestion make men beat up their wives?

Max Burton-Chellew writes:

I thought this paper and news story (links fixed) might be worthy of your blog? I’m no stats expert, far from it, but this paper raised some alarms for me. If the paper is fine then sorry for wasting your time, if it’s terrible then sorry for ruining your day!

Why alarms – well for the usual 1-2-3 of: p-hacking a ‘rich’ data set > inferring an unsubstantiated causal process for a complex human behaviour > giving big policy advice. Of course the authors tend to write this process in reverse.

I think the real richness is the multitude of psychological processes that are inferred for their full interpretation (traffic delays cause domestic violence but not other violence, and more for short commutes than long ones because long ones are less ‘surprised’ by delays etc).

This paragraph is perhaps most illuminating of the post-hoc interpretative approach used:

Next, we examine heterogeneity in the effect of traffic on crime by dividing zip codes along three dimensions: crime, income and distance to downtown. In each specification we subset zip codes in the sample by being either above or below the sample median for each variable. Crime and income are correlated, but there are zip codes that are high crime and high income. Table 4 shows that traffic increases domestic violence in predominantly high-crime and low-income zip codes. We also find that most of the effect appears to come from zip codes that are closer to downtown, which may arise for two reasons. First, households living closer to downtown are more likely to work downtown, and therefore we are assigning them the appropriate traffic conditions. Secondly, a traffic shock for a household with a very long commute may be a smaller proportion of their total commute and a traffic shock might be more expected.

My reply: It’s always good to hear from a zoologist! I’m not so good with animals myself. Also, I agree with you on this paper, at least from a quick look. It’s not hard for people to sift through data to find patterns consistent with their stories of the world. Or, to put it another way, maybe traffic congestion does make (some) men beat up their wives, and maybe it makes other men less likely to do this—but this sort of data analysis won’t tell us much about it. As usual, I recommend studying this multiverse of possible interactions using a multilevel model, in which case I’m guessing the results won’t look so clean any more.
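To make the "won't look so clean" intuition concrete, here is a minimal sketch of the partial-pooling idea behind a multilevel model. All data below is simulated from a zero-effect population, and the shrinkage weight n/(n + k) is the standard normal-normal estimator with an assumed variance ratio k; a real analysis would estimate the variance components (e.g., with Stan or lmer) rather than fix k:

```python
import random

random.seed(42)

# All data here is simulated. Each "zip code" gets a few noisy effect
# observations, all drawn from the same zero-effect population.
zip_obs = {z: [random.gauss(0.0, 1.0) for _ in range(random.randint(3, 30))]
           for z in range(20)}

n_total = sum(len(obs) for obs in zip_obs.values())
grand_mean = sum(x for obs in zip_obs.values() for x in obs) / n_total

# Partial pooling: shrink each zip code's raw mean toward the grand mean
# with weight n / (n + k), so small-n zip codes are shrunk the most.
k = 10.0  # assumed ratio of within-zip to between-zip variance
raw_means = {z: sum(obs) / len(obs) for z, obs in zip_obs.items()}
pooled_means = {z: (len(zip_obs[z]) * raw_means[z] + k * grand_mean)
                   / (len(zip_obs[z]) + k)
                for z in zip_obs}
# Every pooled estimate sits between the raw estimate and the grand mean,
# so the most extreme subgroup "effects" are pulled toward zero.
```

The point of the sketch: the subgroup comparisons in the paper (high/low crime, income, distance) are exactly the kind of noisy per-group estimates that a multilevel model would shrink toward each other, which is why the clean-looking heterogeneity would likely blur.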


  1. zbicyclist says:

    The two links (news story and paper) are run together and didn’t work for me. The paper link is
    The news story is here:

    This paper cries out for the use of a holdout sample.

    I am unconvinced by their robustness checks in section 5.3. The first check just shows that the effect can be removed by messing with the data (in this case, using the previous day to predict). That doesn’t show their effect exists, only that it can be removed.

    Their second robustness check is better: they look at zip codes that are closer to the on-ramps and find that the effect stays, and even gets a bit stronger. That’s more convincing (table 9), but since the data are a subset of the overall data, not that convincing.

    They seem to have over 300,000 observations, so the basic hygiene of a training/test (holdout) split shouldn’t have been that hard to execute. It seems to me inevitable (even commendable) that researchers will go down a variety of forking paths in exploring a large, rich data set when they don’t know exactly the dimensions of the effect they are looking for. That’s realistic. The problem is not using some basic data-mining hygiene.
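The holdout point can be demonstrated on pure noise. The sketch below is entirely simulated (no connection to the paper's data): it scans 50 noise-only "zip codes" for the strongest apparent effect in a training half, then re-checks that same zip code in an independent holdout half. The selected training effect is systematically inflated; the holdout estimate is not:

```python
import random

random.seed(1)

def selected_effects(n_zip=50, n_per=200, base=0.10):
    """Scan pure-noise data for the 'strongest' subgroup, then re-check
    that subgroup on an independent holdout half."""
    train = {z: sum(random.random() < base for _ in range(n_per)) / n_per
             for z in range(n_zip)}
    holdout = {z: sum(random.random() < base for _ in range(n_per)) / n_per
               for z in range(n_zip)}
    best = max(train, key=train.get)  # forking paths: keep the extreme zip
    return train[best] - base, holdout[best] - base

runs = [selected_effects() for _ in range(100)]
avg_train = sum(t for t, _ in runs) / len(runs)
avg_holdout = sum(h for _, h in runs) / len(runs)
# avg_train is inflated by selecting the maximum of 50 noisy estimates;
# avg_holdout for the same zip codes hovers near zero.
```

This is why exploratory forking paths are fine on a training half but the reported effect should be the one that survives in the holdout.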

  2. Ayse Tezcan says:

    How about we draw a causal diagram with all the covariates for possible confounders, intermediates, and selection biases, then attempt the analysis? Sometimes we may end up not even proceeding with the study when we find no plausible causal pathway between exposure/predictor and outcome based on literature and expert opinion. This seems like another example of asking questions because one has access to some dataset, which can be valid as long as we remove possible biases and the causal question makes sense.

  3. Terry says:

    Some red flags in the paper:
    1. t-stats of main results are under 3.0 (Table 1), despite an enormous number of observations (300,000).
    2. Some robustness checks are not statistically significant (Table 4).
    3. It is odd that AM traffic is almost as strongly related to domestic violence as PM traffic (Table 3). Shouldn’t the psychological effects of AM traffic be greatly attenuated by the evening?

    And a strong point:
    1. Thoughtful robustness checks where the effect goes away, e.g., no relation of AM crime to PM congestion (Table 3).

    The results of the first robustness check, though, are curious (the “placebo test” where the lag of domestic violence is regressed on traffic (Table 6)). Results are reported for four lags, and none are significant, so the results are taken as supportive. But, all four coefficients are positive (t = 0.04, 1.60, 0.64, and 1.38), some are close to statistical significance at p = 0.10 (which the authors award an asterisk), and they seem to be jointly significant (a sign-test of the four is significant at p = 0.10).
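The sign-test arithmetic in the comment above can be checked directly. Under the null, each placebo coefficient is positive with probability 1/2, so four positives out of four gives a one-sided binomial p-value of 0.5^4:

```python
from math import comb

# One-sided sign test: under the null, each of the four placebo
# coefficients is positive with probability 1/2, independently.
n, k = 4, 4  # four placebo lags, all four coefficients positive
p_one_sided = sum(comb(n, j) for j in range(k, n + 1)) * 0.5 ** n
# p_one_sided = 0.5**4 = 0.0625, which falls below the 0.10 threshold
# the authors use for awarding an asterisk.
```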

    So there may be a positive pattern here, if only a weak one. Should the main results, therefore, be tested to see if they are significantly different from the placebo regressions? If so, the result is likely to be not significant or only marginally significant.

    • Martha (Smith) says:

      The “Median splits” (“In each specification we subset zip codes in the sample by being either above or below the sample median for each variable.”) are another red flag.

      If you’re not familiar with the problem, see

      (The basic problem is that you’re throwing away information — basing your analysis on a coarser — i.e., “noisier” — measure than you actually measured. Remember Andrew’s frequent ranting about noisy measures?)

      • Andrew says:


        Lots of solid research projects include unfortunate data processing decisions, so I wouldn’t say that the use of median splits automatically implies that a project is seriously flawed. But I agree that if you’re gonna discretize a variable, it’s a bad idea to split at the median. Much better to split into three parts and discard the middle section (or just code as -1, 0, 1); see this paper, which is one of my sentimental favorites because of its simplicity.
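A minimal sketch of the split-into-thirds coding described above, on made-up numbers (the cutpoint convention, lower bound inclusive at the top tercile, is an arbitrary choice for illustration):

```python
# Tercile coding of a continuous variable: split into thirds and code as
# -1 / 0 / +1; the middle third can then be kept as 0 or dropped entirely.
def tercile_code(values):
    ranked = sorted(values)
    lo_cut = ranked[len(values) // 3]       # bottom-third boundary
    hi_cut = ranked[2 * len(values) // 3]   # top-third boundary
    return [-1 if v < lo_cut else (1 if v >= hi_cut else 0) for v in values]

codes = tercile_code([3.1, 9.0, 5.5, 1.2, 7.7, 4.4, 8.3, 2.0, 6.1])
# Each value maps to -1 (bottom third), 0 (middle), or +1 (top third).
```

Compared with a median split, the -1/0/+1 contrast keeps the extreme groups well separated instead of lumping near-median observations into opposite halves.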

        • Martha (Smith) says:

          “I wouldn’t say that the use of median splits automatically implies that a project is seriously flawed.”

          Nor did I say that — I just said that they are a red flag — i.e., cause for caution.

  4. Michael says:

    What would a multilevel model look like in this case?
