A political scientist writes:
You might have already seen this, but in case not: PNAS published a paper [Officer characteristics and racial disparities in fatal officer-involved shootings, by David Johnson, Trevor Tress, Nicole Burkel, Carley Taylor, and Joseph Cesario] recently finding no evidence of racial bias in police shootings:
Jonathan Mummolo and Dean Knox noted that the data cannot actually lead to any substantive conclusions one way or another, because the authors invert the conditional probability of interest (actually, the problem is a little more complicated, involving assumptions about base rates). They wrote a letter to PNAS pointing this out, but unfortunately PNAS decided not to publish it.
Maybe blogworthy? (If so, maybe immediately rather than on normal lag given prominence of study?)
OK, here it is.
Conditional probability means that it’s actually possible for a particular policing decision rule to be simultaneously efficient in the sense of “maximizes criminals caught per officer/resource” and racially biased in the sense of “given that you are x race, you are more or less likely than someone totally equivalent in behavior of another race to be stopped by police.” It’s a gnarly problem that’s subtle, but not hard to understand when pointed out.
I think Rudin gave the example of hunters choosing between two fields, one which yielded meat 60% of the time and one which yielded meat 40% of the time. The smart thing to do is to choose the former 100% of the time, but that’s not “fair” (if you cared about it) to the various buffalo.
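A minimal sketch of the two-field arithmetic, using only the 60% and 40% success rates from the example. The two allocation rules (`always_a`, `split`) are hypothetical names introduced here for illustration, and the sketch assumes the success rates stay fixed no matter how often each field is hunted.

```r
# Success rates from the two-field example above.
yield <- c(field_a = 0.60, field_b = 0.40)

# Two hypothetical allocation rules: share of hunting trips sent to each field.
always_a <- c(field_a = 1.0, field_b = 0.0)
split    <- c(field_a = 0.5, field_b = 0.5)

# Expected yield per trip under each rule (assumes fixed, independent rates).
expected_yield <- c(always_a = sum(always_a * yield),
                    split    = sum(split * yield))
print(expected_yield)
```

Under these assumptions the all-in rule has the higher expected yield per trip, while every trip lands on one field and none on the other, which is the sense in which a rule can be efficient and lopsided at the same time.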
While you are right, in the real world, we have no clue what the actual proportion of criminals in each group is. E.g., we could be “maximally efficient” and target the group with more criminals, but then the probabilities change over time. With these sorts of dynamics, it is not necessarily smarter to choose the one group. Better to split resources (and more ethical too).
Indeed, even in a realistic foraging/hunting example, the events of interest are not independent nor is the underlying system even stationary. This is not to say that we cannot use formal models to understand complex and important social phenomena, just that these models need to be complex too. And, of course, that the researchers using them need to be honest about their limitations.
It reminds me of a Q&A exchange I once heard at a set of talks regarding statistics education in psychology. The speaker emphasized the need to build data analysis models to purpose, with the aim of accounting for (and estimating) as many plausible sources of variability as possible. Obviously such a practice, which Andrew here has also advocated, would entail drastic changes in how we teach stats. Someone in the audience asked something to the effect of, “but surely we can’t expect every social psych student to learn R programming?” The speaker’s response was, “why not?”
After all, we expect physics students to learn lots of ancillary technical skills (e.g., various mathematical and computational tools) that aren’t really “physics”. And social science is, in many ways, much harder than physics—less precise experimental control, more obstacles to data collection, etc.—but we are not equipping our students with the tools to build models that are sufficiently complex for the phenomena of interest.
“And social science is, in many ways, much harder than physics—less precise experimental control, more obstacles to data collection, etc.—but we are not equipping our students with the tools to build models that are sufficiently complex for the phenomena of interest.”
This seems like a different discussion, not one for this post. You seem to argue that it is reasonable policing policy to be racially biased, even conditioning on other relevant information, i.e. that it is reasonable that P(shot|black,X)>P(shot|white,X). Maybe so. The question here is whether, as a matter of fact, P(shot|black,X)>P(shot|white,X). This question contains no value judgment.
Clearly, both the factual and the prescriptive question are important. It is therefore also very important to keep the discussions separate. Otherwise there is the risk of cycles like these:
A: “Police policy is not racially biased.”
B: “Yes it is.”
A: “Well, it should be, anyway.”
B: “No it shouldn’t.”
A: “Well, it isn’t, anyway.”
This may seem contrived, but looking at the comments, we are now at step four of that cycle. :) (To be fair, Mummolo and Knox did not claim police policy is racially biased. I am just making a point.)
My point is not to argue that it’s reasonable for policing policy to be racially biased, but rather that the question requires a clear definition from the outset. Or in other words, whether or not policing policy is racially biased depends on which of two seemingly reasonable definitions you use.
Personally, I think that if you are more or less likely than an otherwise-equivalent person of another race to be stopped / searched / arrested, then that is a violation of justice and something needs to be changed, whether or not that policing behavior is narrowly “rational” given the officers’ incentive set.
leaving a comment just so I can track other comments, because I want to hear what people have to say on this issue.
Embarrassing to see people learn complex statistical tools and then fail to use a conditional probability correctly.
The problematic statement, “Racial disparities are a necessary but not sufficient, requirement for the existence of racial biases”, was under “What These Findings Do Not Show,” where the authors argued that existence of disparity does not necessarily imply existence of bias.
You don’t need to know statistics to know that statement is logically problematic. A simple counter example is a racial quota system. Practically speaking in the specific application of police shooting, racial biases almost certainly would result in racial disparity. What’s puzzling for this particular article, since they claimed no overall evidence of racial disparity, was why the authors even made this kind of argument re “what these findings do not show.” If they believed in this statement and its logical conclusion that no disparity -> no bias, shouldn’t it go under conclusion? There seemed to be a mixture of sloppiness and overreach.
On second thought, I take back “racial biases almost certainly would result in racial disparity” in police shooting. It’s too broad and imprecise a statement.
“Racial disparities are a necessary but not sufficient requirement…”
What this is saying is that if there’s a bias it will result in a disparity, but there are other things that could result in disparity too so disparity by itself doesn’t automatically mean bias…
It’s trivially true as long as you’re measuring disparity correctly (should be in terms of rates relative to an appropriate estimate of the un-biased rate) but the problem is complex enough that even defining “an appropriate estimate of the un-biased rate” is hard.
Using the conventions from the rebuttal letter, where “X” is all the other variables (including racial demographics of the country, of police encounters, of crimes, etc.), isn’t it unreasonable for the rebuttal to propose counterexamples where the base rates (of e.g. encounters, see the 90/10 example in the rebuttal) are inconsistent with demographics in X that have already been conditioned upon?
In other words, in Bayes rule:
p(A | B, X) = p(B | A, X) * p(A | X) / p(B | X)
the stuff that one puts into X not only determines the likelihood function p(B | A, X) but also determines what base rates p(B | X) you get to match it with. I think the original authors would claim that the stuff they’ve put into X renders p(B | X) roughly constant across all races B, and the rebuttal states (but does not show) that they have not done so. Now, I don’t know if that is true (i.e. is p(white | X) = p(black | X) for the X they included), but it seems it would be straightforward to at least bracket those numbers rather than just make up a demographically implausible p(white | X)/p(black | X)=90/10 as in the rebuttal example. The rebuttal acknowledges that it is important to consider these base rates, but the example seems, demographically, to have brushed under the rug that the original authors have definitely put things in X that make such a strong counterexample implausible.
The counterexample is an illustration of the logical fallacy, not of the magnitude of the problem in the actual analysis. I did not read the original article, but it seems from the discussion that the authors were not aware of the logical issue. Otherwise they presumably would have argued explicitly that p(white | X) = p(black | X) for the X they included. Clearly, showing that is incumbent upon them, not their critics.
There is one issue that I struggle with here on a conceptual level (and as a complete outsider to this debate). All the probabilities are conditioned on X, some vector of variables relevant to the question. But, since X takes many values, an object like p(white | X) is a function (of X) and equality like p(white | X) = p(black | X) is not one claim, but many claims, one for each value X can take. Similarly for the actual claims about racial bias, i.e. P(shot|black,X) versus P(shot|white,X). How does one evaluate such claims? What if the direction goes one way for some X, but another for other X, or there is equality for some X?
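To make the "many claims, one per value of X" point concrete, here is a small sketch. All counts are invented purely for illustration and come from neither paper; the two encounter types standing in for X, and the column names, are hypothetical.

```r
# Hypothetical counts, invented for illustration only (not from either paper).
# Each row is one value of X (here a made-up encounter type).
strata <- data.frame(
  x          = c("traffic stop", "armed robbery call"),
  n_black    = c(400, 50),
  shot_black = c(4, 10),
  n_white    = c(600, 100),
  shot_white = c(3, 25)
)

# P(shot | race, X) is a separate number for every value of X.
strata$p_black <- strata$shot_black / strata$n_black
strata$p_white <- strata$shot_white / strata$n_white
strata$diff    <- strata$p_black - strata$p_white

# Pooling over X collapses the per-stratum comparisons into a single number.
pooled_diff <- sum(strata$shot_black) / sum(strata$n_black) -
  sum(strata$shot_white) / sum(strata$n_white)

print(strata[, c("x", "p_black", "p_white", "diff")])
print(pooled_diff)
```

In this made-up table the difference is positive in one stratum and negative in the other, so a single pooled number cannot represent both. How the strata ought to be weighted or summarized is exactly the open question raised above; the sketch does not settle it.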
I was wondering about this.
It seems highly likely that black police officers encounter black civilians more often than white police officers because black police officers more often police black communities. So if the analysis does nothing to correct for this, the results will be astoundingly wrong.
But Johnson et al. have some variables that appear to reflect county demographics, most notably “County % Black homicide”. Are these variables meant to address this issue? If so, why do Mummolo and Knox ignore these variables? They make it sound like Johnson et al. missed a ridiculously obvious confounding factor (they call the alleged error a “fallacy”). But did Johnson et al. actually commit a fallacy by ignoring this issue? If they didn’t miss it, and if, in fact, they attempted to address the issue, isn’t it incorrect to use the word “fallacy”?
Is this relevant? Johnson et al. write that:
This sounds a lot like Mummolo and Knox’s 90/10 hypothetical. So, is it really accurate to say that these “biases are entirely concealed” by the Johnson et al. paper?
Perhaps the original authors would like to respond. Have they responded in any way yet?
Is this what Johnson et al. actually do? If so, it sounds astoundingly wrong and (I would guess) would produce facially absurd numerical results.
Or, does the Johnson et al. analysis compare .64 to the .90 base rate for whites and the .36 to the .10 base rate for blacks? If so, the Johnson et al. analysis would correctly conclude in this hypothetical that there is racial bias?
Police shootings happen rarely (yeah yeah, should be zero; back to the real world, in which every American has virtually unrestricted access to firepower the Founding Fathers had no way to imagine).
Any mention of what the victims were doing when they were shot? Just wondering, seems like it would matter. Like, were they pointing a gun at the officer, or just peacefully, law-abidingly minding their own business, never did anything wrong, like most victims (according to the victims, although they might be biased).
In this question I think some ethnography could be informative, if it turns out that black and Hispanic, and Asian officers (etc etc, don’t forget the Eskimo officers) blow away fewer people (in relation to the number of contacts, adjusted for type and so on). It would be interesting to get the officers’ insights. Like, why did the yellow officer not blow away the green 7-11 robber when the white officer did, and so on. Something like that. What were these guys thinking when they pulled the trigger, in other words.
Perhaps disarming the police would be a good idea. Perhaps allowing people to obey only the laws they personally like would be another. Better, more sensitive, acceptable, and appropriate training for officers would be another. For example, just because a robber is pointing a gun at you doesn’t mean you have to kill him. You could “de-escalate” verbally. Or perhaps, administer a non-lethal “flesh wound”. Or use bean bags. Or this: if an unarmed child is minding his business, never did anything wrong in his life, don’t kill him just because you are a racist and he is a child “of color”, even if the unarmed child weighs 290 lb, is 6’4″, is 18, and has just robbed a convenience store, and is reaching for your weapon. I don’t know if that would work, but it should be put to a vote. The community should decide.
On the other hand, America was founded on the idea of taking what you want by force (or any other way), and keeping what you have by force (or any other way), so nothing is really that new.
I realize this is a stats blog. Just a little perspective.
I am very confused by this. “Encounters” are surely the result of earlier choices made by officers (where to patrol, whom to stop, etc.). So it seems that Pr(Shot|Race) is easily calculated by tabulating the number of people shot and then just taking Census numbers on racial group size.
That is what puzzles me. How could you NOT do a correction for this? Indeed, the Johnson et al. paper explicitly talks about this issue and adds some variables to the analysis, which produces the analysis shown in Table 2. Yet Mummolo and Knox make it sound like Johnson et al. completely ignored the issue and made a super fundamental error, such as confusing Celsius and Fahrenheit.
Johnson et al. write that:
So rather than being a question of fundamental Bayesian logic, this sounds more like a question of whether the adjustment for the problem is adequate, which is an empirical question, not a logical question.
I’m not following his argument. I took his example and ran a regression of shot~black. It shows that black is positively related to being shot even if the base rates of race p(race) are very different.
I’m probably missing something. Any ideas?
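# Rebuild the 90/10 hypothetical as one row per encounter
temp_df = rbind(
  data.frame(black = 1, shot = c(rep(0, 5), rep(1, 5))),   # 10 encounters with black civilians, 5 shot
  data.frame(black = 0, shot = c(rep(0, 81), rep(1, 9)))   # 90 encounters with white civilians, 9 shot
)
# The slope on black is the difference in shooting rates between the two groups
reg = lm(shot ~ black, temp_df)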
Ah, never mind. I think he’s saying this would be OK for showing p(shot | x, black) > p(shot | x, white), but that’s not what the authors of the paper did.
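For anyone else tripped up by the same thing, here is a rough sketch of the two directions of conditioning, using the same counts as the snippet above (10 encounters with 5 shootings, 90 encounters with 9 shootings). The variable names are mine, and this is only the toy hypothetical, not the analysis from either paper.

```r
# Counts from the hypothetical above: encounters and shootings by race.
encounters <- c(black = 10, white = 90)
shot       <- c(black = 5,  white = 9)

# Forward direction: P(shot | race), the quantity a bias claim is about.
p_shot_given_race <- shot / encounters

# Inverted direction: P(race | shot), the racial makeup of those shot.
p_race_given_shot <- shot / sum(shot)

# Base rate P(race) among encounters, and overall P(shot).
p_race <- encounters / sum(encounters)
p_shot <- sum(shot) / sum(encounters)

# Bayes' rule recovers the forward direction only once the base rate is used.
bayes_check <- p_race_given_shot * p_shot / p_race

print(round(p_shot_given_race, 3))
print(round(p_race_given_shot, 3))
print(round(bayes_check, 3))
```

In this toy data most of the people shot are white (the inverted direction), even though the shooting rate per encounter is five times higher for black civilians (the forward direction). The two only reconcile after dividing by the 90/10 base rate.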
Yes, the criticism is correct, and good researchers shouldn’t incautiously reach conclusions in this way.
Unfortunately similarly egregious reasoning errors aren’t that uncommon in studies of bias and I have serious worries about corrections being made only in one direction (I did find the evidence of less respectful treatment for black motorists by police pretty persuasive but shootings is a tougher issue). The net result is to make even someone who thinks the claims of bias are a priori plausible (like myself) feel uncertain about trusting the scientific claims.
Personally, I think that for socially contentious issues we need to appoint designated naysayers (people who would be tasked with attacking the published study so we don’t have the problem of people avoiding raising issues for fear that others will draw conclusions about their politics). However, until such a solution is found, it’s an interesting question how journals, fields, etc. should respond when valid criticisms are raised but only against results in one direction.