Improvement of 5 MPG: how many more auto deaths?

This entry was posted by Phil Price.

A colleague is looking at data on car (and SUV and light truck) collisions and casualties. He’s interested in causal relationships. For instance, suppose car manufacturers try to improve gas mileage without decreasing acceleration. The most likely way they will do that is to make cars lighter. But perhaps lighter cars are more dangerous; how many more people will die for each mpg increase in gas mileage?

There are a few different data sources, all of them seriously deficient from the standpoint of answering this question. Deaths are very well reported, so if someone dies in an auto accident you can find out what kind of car they were in, what other kinds of cars (if any) were involved in the accident, whether the person was a driver or passenger, and so on. But it’s hard to normalize: OK, I know that N people who were passengers in a particular model of car died in car accidents last year, but I don’t know how many passenger-miles that kind of car was driven, so how do I convert this to a risk? I can find out how many cars of that type were sold, and maybe even (through registration records) how many are still on the road, but I don’t know the total number of miles. Some types of cars are driven much farther than others, on average.
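A small sketch of why the denominator matters, using invented numbers for two hypothetical car models. None of these figures come from real data, and `miles_per_vehicle` is an assumption standing in for exactly the quantity that is usually missing:

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    deaths: int               # occupant deaths in one year (well reported)
    registered: int           # vehicles still on the road (from registrations)
    miles_per_vehicle: float  # average annual miles per vehicle (usually unknown)


def deaths_per_100k_vehicles(s: ModelStats) -> float:
    """Risk normalized by the number of registered vehicles."""
    return 1e5 * s.deaths / s.registered


def deaths_per_billion_miles(s: ModelStats) -> float:
    """Risk normalized by exposure, i.e. total vehicle-miles driven."""
    return 1e9 * s.deaths / (s.registered * s.miles_per_vehicle)


# Hypothetical models: one driven heavily, one driven rarely.
commuter = ModelStats(deaths=60, registered=500_000, miles_per_vehicle=15_000)
weekend = ModelStats(deaths=40, registered=500_000, miles_per_vehicle=5_000)

for name, stats in (("commuter", commuter), ("weekend", weekend)):
    print(name, deaths_per_100k_vehicles(stats), deaths_per_billion_miles(stats))
```

With these made-up inputs the commuter car looks worse per registered vehicle but better per mile, so the ranking depends entirely on which exposure measure is available.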

Most states also have data on all accidents in which someone was injured badly enough to go to the hospital. This lets you look at things like: given that the car is in an accident, how likely is it that someone in the car will die? This sort of analysis makes heavy cars look good (for the passengers in those vehicles; not so good for passengers in other vehicles, which is also a phenomenon of interest!) but perhaps this is misleading: heavy cars are less maneuverable and have longer stopping distances, so perhaps they’re more likely to be in an accident in the first place. Conceivably, a heavy car might be a lot more likely to be in an accident, but less likely to kill the driver if it’s in one, compared to a lighter car that is better for avoiding accidents but more dangerous if it does get hit.
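The possibility in that last sentence amounts to a decomposition: deaths per mile = (crashes per mile) × (deaths per crash). Here is a hedged illustration with hypothetical rates, chosen only to show that the two factors can pull in opposite directions. The function name and all values are assumptions for this example:

```python
def overall_deaths_per_billion_miles(crashes_per_million_miles, p_death_given_crash):
    """Overall fatality rate = crash frequency x conditional fatality risk."""
    crashes_per_billion_miles = crashes_per_million_miles * 1000
    return crashes_per_billion_miles * p_death_given_crash


# Hypothetical values, not estimates from any data set.
heavy = {"crashes_per_million_miles": 3.0, "p_death_given_crash": 0.004}
light = {"crashes_per_million_miles": 2.0, "p_death_given_crash": 0.005}

heavy_rate = overall_deaths_per_billion_miles(**heavy)
light_rate = overall_deaths_per_billion_miles(**light)

print("P(death | crash): heavy", heavy["p_death_given_crash"],
      "light", light["p_death_given_crash"])
print("deaths per billion miles: heavy", round(heavy_rate, 3),
      "light", round(light_rate, 3))
```

In this invented case the heavy car wins on the conditional measure that the injury-accident data can estimate, yet loses on deaths per mile, which is the quantity that matters for the original question.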

Confounding every question of interest is that different types of driver prefer different cars. Any car that is driven by a disproportionately large fraction of men in their late teens or early twenties is going to have horrible accident statistics, whereas any car that is selected largely by middle-aged women with young kids is going to look pretty good. If 20-year-old men drove Volvo station wagons, the Volvo station wagon would appear to be one of the most dangerous cars on the road, and if 40-year-old women with 5-year-old kids drove Ferraris, the Ferrari would seem to be one of the safest.
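To see how strong this effect can be, here is a minimal sketch with invented counts in which crash risk depends only on the driver group and not on the car at all. It compares the crude crash rate per car with a rate standardized to a common driver mix. Direct standardization is a technique swapped in here for illustration, not something the data sources above support directly, and it can only adjust for confounders that are actually measured:

```python
# (car, driver_group) -> (drivers, crashes_per_year); all numbers are invented.
counts = {
    ("sporty", "young"): (800, 80),
    ("sporty", "older"): (200, 4),
    ("wagon", "young"): (100, 10),
    ("wagon", "older"): (900, 18),
}
cars = ("sporty", "wagon")
groups = ("young", "older")


def crude_rate(car):
    """Crashes per driver for a car, ignoring who drives it."""
    drivers = sum(counts[(car, g)][0] for g in groups)
    crashes = sum(counts[(car, g)][1] for g in groups)
    return crashes / drivers


def standardized_rate(car, mix):
    """Crash rate the car would show if its drivers had the given group mix."""
    return sum(mix[g] * (counts[(car, g)][1] / counts[(car, g)][0]) for g in groups)


# Common reference population: the pooled driver mix across both cars.
total_drivers = sum(n for n, _ in counts.values())
pooled_mix = {
    g: sum(counts[(c, g)][0] for c in cars) / total_drivers for g in groups
}

for car in cars:
    print(car, round(crude_rate(car), 4), round(standardized_rate(car, pooled_mix), 4))
```

By construction the two cars are equally safe within each driver group, so the standardized rates coincide while the crude rates differ by a factor of three.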

There are lots of other confounders, too. Big engines and heavy frames cost money to make, so inexpensive cars tend to be light and to have small engines, in addition to being physically small. They also tend to have less in the way of safety features (no side-curtain airbags, for example). If an inexpensive car has a poor safety record, is it because it’s light, because it’s small, or because it’s lacking safety features? And yes, size matters, not just weight: a bigger car can have a bigger “crumple zone” and thus lower average acceleration if it hits a solid object, for example. If large, heavy cars really are safer than small, light cars, how much of the difference is due to size and how much is due to weight? Perhaps a large, light car would be the best, but building a large, light car would require special materials, like titanium or aluminum or carbon fiber, which might make it a lot more expensive…what, if anything, do we want to hold constant if we increase the fleet gas mileage? Cost? Size?

And of course the parameters I’ve listed above — size, weight, safety features, and driver characteristics — don’t begin to cover all of the relevant factors.

So: is it possible to untangle the causal influence of various factors?

Most people who are involved in this research topic appear to rely on linear or logistic regression, controlling for various explanatory variables, and make various interpretations based on the regression coefficients, r-squared values, etc. Is this the best that can be done? And if so, how does one figure out the right set of explanatory variables?

This is a “causal inference” question, and according to the title of this blog, this blog should be just the place for this sort of thing. So, bring it on: where do I look to find the right way to answer this kind of question?

(And, by the way, what is the answer to the question I posed at the end of this causal inference discussion?)

13 thoughts on “Improvement of 5 MPG: how many more auto deaths?”

  1. I haven't looked at this research in a number of years. But here are a few tips. I remember that it's a big mistake to assume that a larger car (these days typically an SUV) is safer for the driver. Their probability of death in an accident doesn't really go down. What changes is that the probability that they kill someone else goes up.

    There are several reasons for this. One is that many SUVs on the road still don't have to follow the much more stringent safety standards of cars, because they are classified as trucks. Another is that all that weight that you think is protecting you is primarily behind you in an accident. That's not good.

    Another thing you haven't considered yet is how safe a car makes you feel. The Honda Accord was carefully studied before and after the addition of airbags because the car didn't change otherwise across the two model years. The rate of accidents went up after the airbag was mandatory (which it tended to do across the board). It's well known that SUV drivers (especially female) buy them to a great extent because they make them feel safe.

    (I remember the head of the NHTSA being interviewed about the accident rate going up with the change to airbags and he said, "if you wanted the accident rate to go down then you should have put a sword in there".)

  2. It seems to me that the real problem here is to define what is the potential outcome/counterfactual.

    Do you want to know how many people will die for each mpg increase in gas mileage, or the causal effect of lighter cars?

    These are different questions. In the latter case, regression analysis will only answer, all else equal (or conditional on the values of the 'controls'), what the predictive effect of lighter cars is.

    If, for instance, lighter cars are bought mostly by women in their mid-40s, then the effect of lighter cars will be almost zero in practice, even if the average treatment effect is high! The reason is that the effect varies by group of users, and the major impact will be among users for whom the effect is low. So, does it matter to know the causal effect? Or is it more important to know the predictive effect?

    Maybe there is some natural experiment out there, but I doubt it. If the weight of a car is correlated with speed and price, then I can't figure out how to disentangle the effect of lighter cars on accidents.

    It's not possible to find any instrumental variable, since consumers (and the industry) will decide simultaneously about price and speed.

    Good luck,


  3. In economics, you would look for an instrumental variable. For example, one common technique is to use state-specific laws. If they were introduced at different times, and they affected car characteristics, and you believed that their introduction was exogenous, then you could estimate an effect that is, in theory, free of confounding effects (of course, if it's not really exogenous, then you have problems). Some people have used the weather, recessions, all sorts of "random" events as instruments. Not all stories are plausible, of course.

  4. A few different approaches could be considered.

    First, recognize the differences in the data. Deaths data has different biases from accidents data. So it might make sense to build different predictive models on these different data sources and try to understand what they are saying, both through the normal techniques like coefficient interpretation and by using tree-based techniques and examining partial dependence plots.

    Second, try to avoid overfitting. Build on deaths data from one state, say, and validate on other states. Keep only those explanatory variables that work on multiple similar samples. In other words, go towards a parsimonious model structure.

    My 2 cents.

  5. I would also point out that there is a very important difference in what your causal contrast is. The change in safety for one driver who switches to a smaller car is likely to be completely different from the estimate of what would happen if we lightened all cars and made them all more fuel efficient.

    The main concern with a light car is that it does badly when it hits something like an SUV (or a solid object but let's stick with car-car episodes for the moment). SUVs, themselves, are inherently dangerous as well due to rollovers.

    So you have a couple of interesting possibilities. One is that an equilibrium where everyone drove lighter vehicles would be much, much safer. But how to reach that equilibrium is unclear.

    The other is that shifting to more efficient vehicles may reduce fatalities that don't involve another car. Generally speaking, when you have an externality (like driving an SUV making other cars less safe) one would think that taxes would be the way to handle this.

  6. I have a vague memory of a study in the past that showed how fatalities were increasing steadily with the mass adoption of SUVs (both from rollovers and from the increased lethality of collisions from higher bumpers and larger vehicles), but that there was a "saturation" effect — eventually SUV-SUV collisions became sufficiently common that this second effect diminished in importance.

    From a game theory standpoint, I suppose we started to see signs of a new equilibrium over the horizon, where everyone drives racing tanks lined inside with pillows.

  7. Cash For Clunkers II: Trade in your existing car for a heavily discounted new car, but you don't get to pick which and your mileage is tracked. Now you can control for whatever population statistics you want (with a few exceptions around the sorts of people who'd participate in the program).

    It's always nice to just have the constraint of an open text field, instead of trying to actualize an idea :)

    More realistically, what about simulations/physical modeling? Set up cameras in representative locations. Write software to flag sudden lateral movement or stopping. Watch the flagged clips, and measure time from first-evasive-action to time-of-collision-if-evasion-fails. Then measure speed and distance from collision when evasion begins. That should be enough to estimate distribution of drivers reactions, and distribution of available time. Now you can plug in how far a given model can slow down/move out of the way at a given speed. Finally, you use crash tests to estimate probability of death given a crash. Messy, sure, but especially when validated on held out data and used in conjunction with statistics that are available, it seems useful.

  8. You need to build 4 models. [1] A model of the car manufacturers and car market. [2] A model of your negotiation for purchasing a car, and the resources you can bring to bear to the transaction, and the economic constraints you are under for money, time, etc. [3] A model of your hierarchy and compromises between your values and goals. And [4] a model of your rational decision making processes.

    Let's call these models [1] "Car-Make", [2] "Car-Buy", [3] "My-Values", [4] "My-Rational".

    Per Pearl, since the models "Car-Buy" "My-Values" "My-Rational" need to allow intervention experiments, they have to be expressible as directed acyclic graphs (DAGs). So "Car-Buy" "My-Values" "My-Rational" may be sub-optimal – they are best candidate (or candidates) out of all models that can be expressed as a DAG.

    Literature exists to help with the construction of "Car-Make" "Car-Buy". "Car-Make" will model different ways goals can be achieved: you can make a car safer with more weight, or you can make a car safer with superior engineering and judicious use of materials, etc. "Car-Buy" could be informed by a few issues of Consumer Reports, etc.

    "My-Values" and "My-Rational" will be constructed with a combination of introspection, objective evidence, tests, and quizzing others. If "My-Values" and "My-Rational" sooth your ego, they will probably perform badly for the task. You want to take every opportunity to have a model be truthful above being a mere advertisement of you being a nice and super nifty guy.

    It would seem that "My-Rational" might involve an infinite regress – you have a model, a model of the process that you judge models, a model of the process that you judge models of models, etc. But this flatters the human mind. Per Herbert Simon, Gerd Gigerenzer, Peter M. Todd, Bounded rationality, Ecological Rationality, your really existing system of "My-Values" "My-Rational" is bristling with irresistible fast and frugal heuristics. You can discover these from poor decisions you make time and time again – fast and frugal heuristics work very well in their preferred ecological setting, but are susceptible to failure modes in other settings because they are not ideally rational. (Sure sign of an irresistible fast and frugal heuristic: these are betrayed by poor decisions made time and time again, that are not followed by a relentless implementation of disciplines and restraints to prevent those poor decisions from being made in the future.)

    A fast and frugal heuristic expressed as a preference for blondes is not revealed by a string of successful relationships with blondes, because this might be due to good qualities innate in all blondes. A fast and frugal heuristic expressed as a preference for blondes would be revealed by a string of difficult relationships with blondes, and forever indulging a habit of buying drinks for blondes at bars.

    There is a tendency to be self-serving in introspection, so objective evidence and third-party observations and judgements are crucial.

    Now you can simulate the outcome of a particular choice of automobile, by combining all 4 models into 1 large model and using intervention experiments on the model, suggested by the particular choice of automobile.

    Not fully succumbing to the infinite regress, but incorporating a model of how you judge models, could be helpful (call it "Judge-Models"). That way, if multiple suitable models can be imagined and the top candidate does not immediately reveal itself, you have an analytic recourse to determine best fitness. As above, your really existing "Judge-Models" is also bristling with irresistible fast and frugal heuristics, which you can discover… etc…

    It would be silly to say a satisfactory decision cannot be made without this level of rigor. But the rigorous fully generalized system can suggest adequate "quick and dirty" substitutes, surely.

    I am a faithful reader of Gelman's blog, but I am constantly irritated by his willingness to fashion models of everything _except_ "My-Values" "My-Rational" "Judge-Models", which is the same as crossing the moat and killing the dragon and entering the castle, but refusing to climb up the stairs to the princess in the tower – just sitting there on the first step.

    Without discussion of "My-Values" "My-Rational" "Judge-Models", you have done so much preparation for a decision about an intervention (calling inaction its own kind of intervention)… but then dropped the bride at the threshold.

    Supplying "My-Values" "My-Rational" "Judge-Models" violates the stereotypical separation of concern between the academic and the decision maker and the action taker, so the reluctance to discuss them is completely understandable, and my irritation is unreasonable, I know.

  9. If considering policy, you could think about adding speed reductions to the mix. Assuming weight reductions reduce safety by providing less material to absorb kinetic energy, this scales as the square of speed, so a 10% reduction in weight could be offset by a 5% reduction in average speed. If fuel consumption scales with air resistance, this 5% reduction in speed would then give you an additional 10% reduction in fuel consumption, neglecting the fact that you can then reduce engine size and therefore weight…

    In the long term, I would expect to see an extremely efficient transport fleet come in part from lighter, smaller, slower vehicles. In the short term, I note that US official speed limits are lower than most of those in Europe, so you should already be getting some safety benefit compared to e.g. the UK.

  10. It seems to me that the problem here is that the question is not well-formulated.

    I think that rather than trying to directly correlate vehicle weight with injuries, the place to start is to ask "how do injuries occur?" From a model of injury causal factors (impact with B-pillar, impact with steering column, acceleration against seat belt, etc.), you might be able to work toward causal ties between injury factors and various design features. I'm sure the injury model will be complicated.

    I think that you'll find that weight and safety are only poorly correlated through the design and construction of a vehicle. As an example, you might consider this 2003 report from the UCSUSA, where they "redesigned" the SUV to be both lighter and safer.

  11. To give a bit more background:

    There are a lot of people claiming that if cars get lighter, there will consequently be more auto fatalities. These people claim that this is "simple physics": the occupants of a lighter car will experience higher forces than the occupants of a heavier car.

    There are also people claiming that if cars get lighter, they could be made safer, less safe, or the same average safety, depending on how the manufacturers come up with the weight savings and on whether they improve other safety features.

    The key federal decision-makers seem to be siding with the first group. They are supported by some data analyses that show that, even controlling for things like driver age, crash speed, etc., fatality risk decreases with vehicle weight. (And, by the way, they look at all fatalities, e.g. pedestrians or victims in other cars or whatever, not just fatalities in the car whose weight they're looking at— i.e. they are looking at "societal risk" not just "driver risk"). "See," they say, "heavier cars are safer, just like physics dictates."

    But there are analyses on the other side, too. Even after you control for driver age, crash speed, etc., you find that some car models are much safer than others. The safest light cars are much safer than the least-safe heavy cars. "See," they say, "light cars can be as safe or safer than heavy cars."

    I am strongly in the second camp. I agree that "all else equal" heavy cars are safer, but all else is not equal, and lighter cars can be safer than most heavy cars that are currently on the road.

    So: thanks for the suggestions about modeling or observing crashes, but there's lots of that going on already and it doesn't really address the issue. The decision-makers think there is plenty of information in the past decade of real-world crash statistics for them to understand the relevant issues, and I think they're right.
