They want “statistical proof”—whatever that is!

Bert Gunter writes:

I leave it to you to decide whether this is fodder for your blog:

So when a plaintiff using a hiring platform encounters a problematic design feature — like platforms that check for gaps in employment — she should be able to bring a lawsuit on the basis of discrimination per se, and the employer would then be required to provide statistical proof from internal and external audits to show that its hiring platform is not unlawfully discriminating against certain groups.

It’s from an opinion column about the problems of automated hiring algorithms/systems.

I would not begin to know how to respond.

I don’t know how to respond either! Looking for statistical “proof” seems like asking for trouble. The larger problem, it seems, is an inappropriate desire for certainty.

I’m reminded of the time, several years ago, when I was doing legal consulting, and the lawyers I was working for asked me what I would say on the stand if the opposing lawyer asked me if I was certain my analysis was correct, if it was possible I had made a mistake. I responded that, if asked, I’d say, Sure, it’s possible I made a mistake. The lawyer told me I shouldn’t say that. I can’t remember how this particular conversation got resolved, but as it happened the case was settled and I never had to testify.

I don’t know what “statistical proof” is, but I hope it’s not the thing that the ESP guy, the beauty-and-sex-ratio guy, the Bible Code dudes, etc etc etc, had that got all those papers published in peer-reviewed journals.

In response to the above-linked op-ed, I’d prefer to replace the phrase “statistical proof” by the phrase “strong statistical evidence.” Yes, it’s just a change of words, but I think it’s an improvement in that it’s now demanding something that could really be delivered. If you’re asking for “proof,” then . . . remember the Lance Armstrong principle.

101 thoughts on “They want ‘statistical proof’—whatever that is!”

  1. It obviously means someone checked whether p < 0.05. In this case it would be proof if you saw p > 0.05 when the null hypothesis is no correlation between race, gender, etc., and employment gaps.

    This is the standard way statistics are used, so I don’t see why we should feign ignorance about it. The Neyman-Pearson part of the reasoning behind the NHST hybrid tells you to act as if the result is proof, i.e., inductive behaviour.
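
    For concreteness, here’s a minimal sketch (Python, with invented counts) of that procedure: cross-tabulate group membership against employment gaps and check the p-value.

      # Hypothetical counts: rows are two groups, columns are gap / no gap.
      from scipy.stats import chi2_contingency

      table = [[30, 70],   # group A: 30 with employment gaps, 70 without
               [45, 55]]   # group B: 45 with employment gaps, 55 without

      chi2, p, dof, expected = chi2_contingency(table)
      print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
      # The hybrid ritual: act as if p < 0.05 proves discrimination and
      # p >= 0.05 proves its absence.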

    • “It obviously means someone checked whether p < 0.05.”

      That's a pretty sophisticated procedure to ask scientists to follow, much less journalists and lawyers.

      Yesterday the director of the Value and Systems Science Lab at the University of Washington stated that “the evidence is clear: race, ethnicity, poverty…can dictate” COVID-19 health outcomes because different races are impacted at different rates.

      Done. We don't need no stinkin' NHST!!

      • I am certain that statement is based on NHST. I am also certain it is true: the null hypothesis of no correlation/difference is always false. So the NHST added no information.

        • “I am certain that statement is based on NHST”

          Don’t think so.

          Article title
          (paywalled unfortunately):

          “Revise covid-19 relief-fund decisions to target disparities”

          We’re not targeting causes, just disparities.

          Article text:
          “[X] and [Y] individuals are infected at higher rates than [Z] individuals…Discrimination, racism and distrust of the medical system lead to worse outcomes for racial minorities”

          Note the transition from groups X and Y, for which the rates really are different, even if race isn’t the causal factor, to all “racial minorities”, for which the rates aren’t different for many races.

          I’m not harping on the racial issue in particular. But this is a great example of the standard of “evidence” that discrimination hawks are pushing for in all forums regarding discrimination. It should concern everyone, because as a society, we’re all better off if we find the actual causes for problems and address them. But this kind of argument is mostly used simply to confiscate more pie for certain groups of people.

        • I can’t see the NYT article but the links will eventually lead back to some paper where they did NHST to make that claim.

          Anyway, the end result of all this is that it becomes more expensive to hire people so politically connected corporations become more powerful and the poor get poorer.

        • I feel like the article is pretty clear that it’s not saying what you’re claiming it’s saying. It is explicitly not saying that race causally determines health outcomes from covid-19 infection.

          > The reasons go beyond biology. Discrimination, structural racism and distrust of the medical system lead to worse outcomes for racial minorities. Poor individuals are more likely to live in overcrowded accommodations, reducing their ability to socially distance. The nature of their work in areas such as public transportation may increase COVID-19 exposure and worsen health outcomes. And if they contract COVID-19, Americans in poor communities have less access to intensive care services than individuals in wealthier communities.

        • “Discrimination, structural racism and distrust of the medical system lead to worse outcomes for racial minorities”

          I think that’s pretty clear? No? I’m not sure where you think I’m going wrong.

          It continues with a description – hardly exhaustive – of how *poor* people are impacted, but it’s not clear how “Poor individuals” and “racial minorities” are related.

  2. The “unlawfully discriminating” clause in that sentence could do the bulk of the work here. If the law says your algorithm won’t be more than N entities or p% different from official quotas, then it’s easy to prove one way or the other. (Just an example; in reality it may take hundreds of pages to describe the requirements.)

    But you may say that isn’t really statistics, it’s just accounting. I agree, but does our op-ed author make the distinction?

    Or maybe the law can be phrased in terms of a periodic random sample; so then you might be within legal limits in total, but with an unlucky draw, you will have broken the law. Is there something wrong with that?
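
    For scale, here’s a quick sketch (Python; the target, tolerance, and sample size are all made up) of how often a process sitting exactly at a 25% target would fail a 100-decision sample audit with a ±5-point tolerance:

      from scipy.stats import binom

      target, tol, n = 0.25, 0.05, 100
      lo, hi = (target - tol) * n, (target + tol) * n   # pass if 20..30 "hits"
      p_pass = binom.cdf(hi, n, target) - binom.cdf(lo - 1, n, target)
      print(f"P(compliant process fails the audit) = {1 - p_pass:.2f}")   # ~0.20

    So roughly one audit in five fails purely through sampling noise: the “unlucky draw” problem in numbers.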

  3. I think both comments above (Anon and Dz) underestimate the problem.

    Suppose I downgrade employees who have switched jobs often, because I figure if they’ve left other jobs quickly they might leave this one quickly too, and I don’t want to train an employee and then have them leave. So I decline to hire an employee and they sue me because they claim this rule will discriminate against protected group X. If in fact the burden is now on me to prove that it doesn’t, how would I do that? I’d have to get data about job tenure statistics for a bunch of different racial groups, and the data would have to be sufficiently detailed that I could apply my evaluation rule and see how much it differs by group. Good luck with that.

    • I think you’re overestimating the standard of proof. As far as I can tell, the platform doesn’t have to show that (for example) women don’t have shorter tenures than men (which would logically mean the rule discriminates against women). It just has to show that the proportion of women who apply for jobs on its site who get hired is not less than the proportion of men. The law seems to be about the ends, not the means, so I could write an algorithm that (effectively) automatically downgrades all female applicants, but if women get a net upgrade for other characteristics in practice (more women have college degrees, for example) then it’s fine.
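
      A sketch of the kind of outcome check that reading implies (Python, invented totals), using the EEOC-style “four-fifths” screen as a stand-in for whatever threshold the law would actually apply:

        applied = {"women": 400, "men": 600}   # made-up platform totals
        hired   = {"women": 120, "men": 200}

        rate_w = hired["women"] / applied["women"]   # 0.30
        rate_m = hired["men"]   / applied["men"]     # 0.33

        # Four-fifths screen: flag if one group's selection rate is below
        # 80% of the other's.
        ratio = rate_w / rate_m
        print(f"selection-rate ratio = {ratio:.2f}",
              "-> flagged" if ratio < 0.8 else "-> passes")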

        • Hm. We have a different interpretation of the basic issue, I think. I’m inferring that the alleged discriminator (or “platform”) is a website that matches applicants with positions at many different businesses. It sounds like you’re talking about an offender that’s an individual, relatively small business, hiring people on its own or through a site like that. That’s a completely different, although also problematic, situation than the one in the op-ed. I can’t imagine there are too many businesses that have only a few openings a year, and only a few applicants per opening, but somehow find it economical to develop and implement their own algorithm for sorting through a dozen resumes.

        • Sometimes, for a small business or even a small college, the most important criterion in whether or not a job applicant is hired is whether or not they “pass lunch”.

        • True…but only for the employer doing the hiring. The algorithm isn’t going to lunch with people, and it’s the algorithm that’s on trial. Even if every single business on the platform refused to hire minorities, the platform would not be guilty of discriminating against minority applicants so long as its algorithm’s recommendations were proportionate to the racial make-up of the pool of applicants. The disparate impact would be happening outside of the platform’s control. In fact, if the platform could demonstrate that employers had lunch with a balanced proportion of minorities but decided to hire only white applicants, that would be the best defense possible against charges of discrimination.

        • 100 people apply for airline pilot jobs; 99 of them are white, 1 of them is black.

          The “algorithm” is simply “let everyone who applies have their resume sent to the employer”

          Since 13% of the US is black, not 1%, this is obviously an illegal algorithm.

          That just means the law is stupid.

      • Joshua,
        Good question.

        It’s a really hard problem. Suppose someone runs a small hardware store. The employees don’t turn over very much, so the store only hires one new person per year on average. The store has been in business for nine years and has only ever hired white male employees. Nine white hires, no blacks or hispanics or Asians or Native Americans or other races. But how many have applied, what’s the screening process, etc. etc… I’m inclined to say that discrimination can be judged on statistical grounds only if there’s a much larger dataset, like a whole chain of hardware stores or something. And yet, if you want to combat discrimination by this sort of legal means, at some point you can’t say “on average, this category of business is doing discriminatory hiring”, you have to find a specific business and put the hammer down. I honestly don’t know how to do that. Worth thinking about.

        • Phil said,
          “And yet, if you want to combat discrimination by this sort of legal means, at some point you can’t say “on average, this category of business is doing discriminatory hiring””

          Yet drugs are approved because “on average” the drug is considered to be better than the alternative (whether the alternative is another drug, a placebo, or no treatment) — even though some patients get worse with the drug.

        • Martha, that’s right! And I think there is a parallel there: when we look at a whole lot of patients, we can see that _on average_ the drug helps. If we look at a whole lot of hiring decisions, we can see that _on average_ there is discrimination. But it can be much harder to look at an individual patient and determine whether they were helped or hurt by the drug (well, depending on the drug), and it can be much harder to look at an individual hiring decision and determine whether it was influenced by discrimination.

        • Again, the issue is discrimination by a “hiring platform” using an algorithm to match employers with qualified applicants, presumably using a score it assigns them based on their attributes. To use the drug analogy, the op-ed is talking about determining retrospectively the drug’s effect on all the people who ever used it (by subgroup), whereas you are talking about determining the effect on all the people who bought it at a particular CVS (by subgroup) and then making inferences to the rest of the population. Yes, that would be a bad idea.

          But, even if we were talking about a platform that only handles a few applications–maybe it’s still beta testing?–or a race/gender for which only a handful of applicants are on the site, there’s another solution. We could collect resumes from people across the country, both in and not in that group, and run them through the algorithm to see the proportions recommended for jobs. This is analogous to a real drug trial, where you test effectiveness in a setting that is slightly artificial but controlled and not retrospective.

        • Phil –

          No doubt. There has to be a larger dataset. Obviously, it’s hard to draw a fixed line to determine what would be a big enough dataset. How would you propose setting a sample size needed to determine “significance” in this case?
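
          For concreteness, a power-analysis sketch of the calculation involved (Python; the rates are made up):

            # Hypothetical: 80% power to detect a drop in hire rate from
            # 30% to 20% at the usual alpha = 0.05.
            from statsmodels.stats.power import NormalIndPower
            from statsmodels.stats.proportion import proportion_effectsize

            effect = proportion_effectsize(0.30, 0.20)
            n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                             power=0.8, alternative="two-sided")
            print(f"roughly {n:.0f} applicants per group")   # ~150 here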

      • “What do you suggest as an alternative?”

        One approach is to ensure that the labor market is strong enough to create a competitive disadvantage for discriminating.

        If Jim discriminates against the more-qualified Ernesto and hires a less competent dude like Joe America, then Phil will grasp the opportunity to hire the highly qualified and conscientious Ernesto, thereby giving Phil a competitive business advantage. If Phil continues to do this, his business will excel ahead of Jim’s, he’ll expand faster, continue to hire the most competent people regardless of race, gender or SO. Meanwhile, Jim’s business will slowly decline into oblivion.

        • jim,
          Yeah, that’s a real effect. Jackie Robinson was waaaaay better than the average baseball player when he entered the league.

          Andrew and I attempted to take advantage of this principle back in 1990 or so. Andrew was visiting me in Lexington, Kentucky. Statistician visiting Kentucky, and I think he had just heard about the Dr Z system…of course we had to go to the horse races, although I don’t fully approve of the sport on animal cruelty grounds. Still, we stopped at the track on the way to the airport, long enough to see a few races and bet on them. We looked for Z System bets, I’m not sure we found any really good ones but we gave it a go. But also, we noticed that there was a female jockey in at least one of the races; there were a few at that time, but very very few. We figured that if a woman was making it in the sport, she was probably much better than the average jockey, so we bet on her. Sadly, I don’t recall whether she won, nor whether it was the great Julie Krone or someone else.

          But this is an extremely unreliable system to rely on in a battle against discrimination. Yes, if someone discriminates against blacks or women or whatever, they will be forced to take less competent workers on average, but the effect can be very small. Plus, to the extent that their other workers are in favor of the discrimination, they risk problems if they do bring on someone from one of the disfavored groups. Baseball provides a great example: some of Jackie Robinson’s teammates didn’t care how good he was, they just didn’t want to play with (or even against) a black man. And if coworkers aren’t an issue, customers can be.

          Basically I don’t think the magic of the free market is necessarily enough to overcome discriminatory practices. If it were, there would have been black people in pro sports decades before the color line was actually broken. And sport is a sphere where the quality of the potential employee is much easier to evaluate: a baseball team that didn’t hire Satchel Paige knew they were passing up one of the best pitchers in the game, whereas some guy who just doesn’t want to hire blacks, or women, or whomever, at his hardware store…well, even if the person he hires instead isn’t quite as good, how big is the difference really, and can the store owner even tell?

        • “the effect can be very small.”

          I guess the scale of the effect would depend on the degree of difference in skills, right?

          AMZN and MSFT seem to believe the effect can be large and are doing everything they can to get the top people from all over the world. The diversity at these companies is readily apparent on the streets downtown Seattle and the streets of Redmond.

          Of course, there’s only so much skill that can go into stacking cans of paint, so the effect would be modest at the local hardware store or grocery. But if the difference in skills is that small, there’s no significant discrimination either. At least in our area, I find the personnel at local stores pretty much reflect the neighborhood.

          Oh, yeah, Jackie Robinson: that was what, almost 70 years ago? I think we’ve come a long, long way since then.

        • I gave Jackie Robinson as an example to illustrate your point to other readers, not really saying that’s the situation today.

          I agree we have come a long way in 70 years. There are no longer Jim Crow laws, redlining, and other legal or institutional overt racism.

          But lord knows there are still plenty of racists, plenty of employers who would never hire a black person, etc. I think the situation is worst for blacks but it’s not like there aren’t other groups that still face discrimination.

        • “But lord knows there are still plenty of racists”

          No doubt. And sexists, and people who don’t like gays and whatever. There will always be that, sad to say.

        • Let’s consider the small hardware store mentioned by Phil. Compared to, say, a supermarket, I’d rather choose to buy in a hardware store based on the quality of the customer service than on the quality and price of the goods. Good customer service can’t be achieved by good hiring but only by good training. Maybe you can train better people more cheaply, so the store gets an advantage. Compared to the costs of training, this shouldn’t be important.

        • jim –

          Lots of questionable assumptions here. Embracing a racist attitude towards hiring could also give a competitive advantage – it could attract customers, make other racist employees working there happier and more productive, etc.

          It also only works along one, short-term dimension. The competitive advantages might take time to materialize; e.g., more diverse hiring (that truly reflects the quality of applicants rather than their color) could take a while to show its effects. In other words, fairness in hiring at the entry level – where for historic reasons there is more likely to be racial parity among applicants – could take quite a while to migrate the advantages to a higher level.

          And such a practice would do nothing to change disparities already baked into the system. Disparity in applicants’ quality could reflect existing, historical disparities in access to training, access to education, access to experience due to previous discrimination in hiring.

          Part of the question being debated in society today is whether it is enough to declare that the playing field is level after hundreds of years of tilting the playing field to advantage one group over another.

          Asking that question doesn’t necessarily imply a particular answer, but it does suggest an obvious limitation of the system you’re suggesting as a solution.

        • “Embracing a racist attitude towards hiring could also give a competitive advantage – it could attract customers, make other racist employees working there happier and more productive, etc.”

          It could attract customers in certain circumstances and make other employees happy. But more productive than a company or business that hires more qualified people? Nope. Such a business could persist on a local level. But it can’t grow, and frankly, since it’s going to be at the bottom of the wage spectrum, the benefits of stopping this kind of discrimination are very small. Whatever the case, you won’t see much of it at large corporate stores.

        • “Lots of questionable assumptions here.”

          I disagree! :) The “competition solution” has already been demonstrated. Sports diversified without regulations or laws. The tech industry has done so as well – at least within the bounds of getting qualified applicants. When I was a kid in the late 70s, Redmond WA was a tiny hamlet where (white) people rode horses and voted Republican. Today it’s a sprawling and gleaming Microsoft city with tens of thousands of people from all over the world. Today my congressional district (adjacent to Redmond) is represented by an Indian woman (Pramila Jayapal). The competition solution clearly does not suffer from “questionable assumptions”. It works.

          It’s the regulatory solution that is riddled with questionable assumptions. If you’re going to say some group should reflect “the population”, what population? The geographic region? If so, how far from the employer? Do people that live in the bayou count as part of “the population” that’s employable in Atlanta? The group of people with similar degrees? And similar experience? What constitutes “similar”? I have a geology degree but I’m a far better data/business analyst than most people with computer science degrees.

          “such a practice would do nothing to change disparities already baked into the system”

          “Disparities” aren’t illegal or unjust. What’s illegal and unjust is *racial* discrimination. Here in WA state we have a steady stream of immigrants from Mexico and south. They are very poorly educated. It shouldn’t surprise anyone that their kids graduate high school at lower rates and go on to college at lower rates than the general population. This is a “disparity” but it’s not due to racism or injustice. It’s due to a home environment with poorly educated parents. Not surprisingly, the parents with no education or skills other than labor don’t have jobs in the tech industry: they work in farming, landscaping, and shipping. Many of their children follow their parents’ occupations.

          I just read an article about a black startup entrepreneur. Supposedly he ran into problems because he’s black. But later – much later – in the article it comes out that his parents are Haitian immigrants. Nothing wrong with that, but they were poorly educated and worked at modest jobs.

          Compare that to another famous entrepreneur: Bill Gates. Aside from Gates being a genius, his father was a very successful corporate lawyer. They had money, but the big difference is that Gates had a highly trusted and experienced family member to help him learn the ropes in founding a corporation. This is the case for many entrepreneurs: they come from money, yes, but more importantly, they come from **education**.

          Differences in family education matter. And disparities in that situation aren’t necessarily injustices. Just disparities.

          “whether it is enough to declare that the playing field is level after hundreds of years of tilting the playing field to advantage one group over another.”

          That’s *your* question. It’s not *the* question. Law-abiding black folks don’t want to “pay,” through discrimination, because some black folks are gangstas. By the same token, I don’t believe that everyone should be “paying,” through some legalized form of discrimination, for the crimes of other people, past or present.

        • Jim said,
          “The “competition solution” has already been demonstrated. Sports diversified without regulations or laws. The tech industry has done so as well – at least within the bounds of getting qualified applicants. When I was a kid in the late 70s, Redmond WA was a tiny hamlet where (white) people rode horses and voted Republican. Today it’s a sprawling and gleaming Microsoft city with tens of thousands of people from all over the world. Today my congressional district (adjacent to Redmond) is represented by an Indian woman (Pramila Jayapal). The competition solution clearly does not suffer from “questionable assumptions”. It works.”

          This is just evidence that the “competition solution” has worked in some circumstances — this is no guarantee that it will work in all circumstances.

          Jim also said,
          “Disparities” aren’t illegal or unjust. What’s illegal and unjust is *racial* discrimination. Here in WA state we have a steady stream of immigrants from Mexico and south. They are very poorly educated. It shouldn’t surprise anyone that their kids graduate high school at lower rates and go on to college at lower rates than the general population. This is a “disparity” but it’s not due to racism or injustice. It’s due to a home environment with poorly educated parents.”

          I disagree that “disparities aren’t unjust.” They are often unjust. For example, it is unjust that kids with poorly educated parents are less likely to graduate from high school and go on to college than kids with well educated parents. My ethics say that society has a moral obligation to at least try to give kids with poorly educated parents the same educational opportunities as kids with well educated parents.

        • Martha (Smith) says:

          “This is just evidence that the “competition solution” has worked in some circumstances — this is no guarantee that it will work in all circumstances.”

          It may not work in all circumstances. However, we tried affirmative action & busing for decades and it seemingly accomplished little or nothing at great expense.

          “it is unjust that kids with poorly educated parents are less likely to graduate from high school and go on to college than kids with well educated parents.”

          Nothing will ever change that. Look: even a newly hatched chicken “imprints” on its parent. This is a fundamental aspect of the biology of higher-level organisms (mammals, birds, some reptiles). (That’s probably why the effects of “quality preschool” disappear before high school.) The bureaucratic education systems we have today don’t have a hope of moving that needle. To whatever extent they have been “successful,” mostly what they’ve done is change the markers on the dial.

          Maybe it’s not as bad as you think. If I came from a violence-ridden central American country and was able to scratch out a living in the US without having to worry about getting killed every day and knowing my kids would be safe, I might be pretty happy with that. I might think it was OK for my kids to have such an excellent opportunity as I had. I might not care at all about high school, much less college.

        • I said, “This is just evidence that the “competition solution” has worked in some circumstances — this is no guarantee that it will work in all circumstances.”

          Jim replied, “It may not work in all circumstances. However, we tried affirmative action & busing for decades and it seemingly accomplished little or nothing at great expense.”

          This is just evidence that the methods of “affirmative action and busing” that we used didn’t work in all circumstances. That doesn’t seem like a good reason to give up completely. It seems rational to look for what we did wrong in these attempts to remedy inequities, and see if we can figure out better solutions.

          I said, “it is unjust that kids with poorly educated parents are less likely to graduate from high school and go on to college than kids with well educated parents.”

          Jim replied, “Nothing will ever change that.”

          We can’t change it totally, but we might be able to change it partially (and learning from previous mistakes is, as usual, an important part of changing things for the better).

          Jim said, “Maybe it’s not as bad as you think. If I came from a violence-ridden central American country and was able to scratch out a living in the US without having to worry about getting killed every day and knowing my kids would be safe, I might be pretty happy with that. I might think it was OK for my kids to have such an excellent opportunity as I had. I might not care at all about high school, much less college.”

          The “Maybe it’s not as bad as you think” statement sounds like you *think* you can read my mind. It also sounds (to me — presumably the expert in what’s in my mind) like you can’t read my mind.

        • jim –

          Martha touched on some of the same features of your logic that I would have touched on. Here are a coupla more responses:

          > The “competition solution” has already been demonstrated. Sports diversified without regulations or laws.

          You are arguing as if the changes in sports took place in a context in which changes in laws and regulations didn’t take place. Obviously, that is not the case. We can’t wave some kind of magic wand to distinguish sports from the rest of society, or to create a world where we could experiment to see if sports would have changed in the same ways that it did absent widespread changes in laws and regulations. Those changes in laws and regulations that did take place likely helped to spur changes in a vast array of areas, including sports. There was likely spill-over. There was likely an “interaction effect.” We had hundreds of years of discriminatory laws in place and it was only when they were removed, and other laws put into place and enforced, that we saw much progress. “Competition” didn’t bring about those changes until laws and regulations were changed and put into place. That doesn’t mean that “competition” doesn’t have any effect. It just means that the world doesn’t fit into the neat little categories you seem to me to be describing.

          > The tech industry has done so as well – at least within the bounds of getting qualified applicants.

          That seems like a highly unqualified statement. Yes, some changes have taken place. Some things have not changed.
          And like sports, those changes that did (and didn’t) take place developed in an inter-connected context.

          > When I was a kid in the late 70s, Redmond WA was a tiny hamlet where (white) people rode horses and voted Republican. Today it’s a sprawling and gleaming Microsoft city with tens of thousands of people from all over the world.

          Just as much of the whiteness of many white communities was created and reinforced by discriminatory laws, so changes in those communities – to the extent that they have happened – happened because discriminatory laws were lifted, because affirmative action types of programs were initiated, and because other laws were created to fight discrimination.

          > Today my congressional district (adjacent to Redmond) is represented by an Indian woman (Pramila Jayapal). The competition solution clearly does not suffer from “questionable assumptions”. It works.

          Again, IMO, you’re operating from false assumptions that (1) you can quantify partial success (in some areas) as “it works,” (2) to the extent that it has “worked” it has “worked” independently of the influence of changes in laws and regulations and (3) all the changes that took place outside of those explainable by the changes in laws in regulations (where the causality is actually impossible to determine) took place only because of “competition.”

          > It’s the regulatory solution that is riddled with questionable assumptions.

          Questionable assumptions exist across the board. And further, just because questionable assumptions might be connected to “the regulatory solution” doesn’t mean they aren’t also associated with “competition” as a solution. There is no zero-sum, directly connected, or inversely proportional relationship.

          > If you’re going to say some group should reflect “the population”, what population? …

          I didn’t say that. I don’t look at it in such a simple fashion. My point would be more along the lines of: we should work to take steps to combat discrimination – and indeed, to help assure both equality of opportunity AND equality of outcomes. I don’t expect precise equality in either aspect. I wouldn’t reverse-engineer from some level of inequality in either aspect to say that the conditions are unacceptable. But that doesn’t mean that approaching equality in both aspects shouldn’t be a goal, or that we shouldn’t try to implement policies to help advance those goals.

          > The geographic region? if so, how far from the employer? Do people that live in the bayou count as part of “the population” that’s employable in Atlanta? The group of people with similar degrees? And similar experience? What constitutes “similar”? I have a geology degree but I’m a far better data/business analyst than most people with computer science degrees.

          These are all questions that are relevant and they should be considered. But we shouldn’t expect perfect answers. And we shouldn’t determine that because we can’t answer each question perfectly, therefore any attempt at answers is a failure and worthless.

          > “Disparities” aren’t illegal or unjust.

          Again, your logic is too broad. Some are both illegal and unjust. Some are merely unjust even if they aren’t illegal. Some are neither illegal nor unjust.

          > What’s illegal and unjust is *racial* discrimination. Here in WA state we have a steady stream of immigrants from Mexico and south. They are very poorly educated. It shouldn’t surprise anyone that their kids graduate high school at lower rates and go on to college at lower rates than the general population. This is a “disparity” but it’s not due to racism or injustice.

          Again, you speak in black and white terms as if one can simply divorce disparities in outcomes from any aspect of illegality or injustice. Particularly considering the legacy of hundreds of years of laws that are now considered both illegal and unjust, I don’t really see why you’d even attempt to do that: It’s simply not possible.

          > It’s due to a home environment with poorly educated parents.

          Again, you’re making these blanket statements in a world that doesn’t conform to such generalities.

          I don’t see any purpose in continuing beyond this point. I suggest that you’re arguing against some script that you see before you – that you’re arguing against certain caricaturish blanket arguments (of some imagined lib, I guess) that haven’t been made here. Our discussion could be constructive going forward only if you limited yourself to arguments that I am actually making. One way to help set that up would be for you to quote things that I’ve written when you respond.

  4. > I don’t know what “statistical proof” is,

    The idea of “statistical proof” is evidence that people are really bad at risk assessment, and understanding conditional probabilities.

  5. I assume “proof” in this context is a legal, rather than empirical, term: if the judge finds in your favor, you have proven your case, meaning you’ve met a certain standard of proof–in this case, demonstrating disparate impact.

    But even in statistical terms, “proof” might not be all that meaningless: while you’re right that it’s impossible to “probabilistically prove” something, or to “inferentially prove” something, we can “descriptively prove” a claim. In this context, it appears the actual legal standard is to show that a disproportionate number of people in one subgroup have been hurt by the algorithm. In the old days, when records took up physical space and data was not yet the most valuable resource of a service provider, that might have required sampling and inference. But nowadays, it’s very plausible that a platform could provide census (lowercase) data to show exact numbers/proportions of people by subgroup (assuming they get that info from applications, site profiles, cookies, etc.). The site could thereby *prove* that there was no disparate impact using *descriptive* statistics.
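
    A toy version of that descriptive computation (Python, fabricated records): with the platform’s complete logs, the subgroup rates below are exact facts about what happened, not estimates.

      import pandas as pd

      # Fabricated complete record of platform decisions (no sampling).
      df = pd.DataFrame({
          "group": ["A", "A", "A", "B", "B", "B", "B", "B"],
          "hired": [1,   0,   1,   0,   1,   0,   1,   0],
      })
      print(df.groupby("group")["hired"].mean())   # exact rates: A 0.67, B 0.40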

    • The big problem is that no-one cares about causality.

      If there were N women and Pw percent of them weren’t hired, while there were K men and Pm percent of them weren’t hired, and Pw is greater than Pm this does NOT in any way mean that your automated algorithm discriminated against women. (or flip it around whatever).

      What does it even mean to discriminate against a group? I’m sure there’s a reasonable definition if we think carefully about it, but “Pw is greater than Pm” is not it.

        • Makes me think of 2112: “We are the Priests of the Temples of Syrinx…Our great computers fill the hallowed halls.” I wonder if Neil Peart recognized how close we are to having this come true.

        • I think any definition of fairness can NOT be built on frequency. Whether you are fair to a given person is entirely a function of whether the decision for that person was based on good reasons that are legal to use.

          So, if minority person A was denied a loan because this person’s existing debt payment is a very high fraction of their monthly income then that’s a good reason. If it turns out that many people in that minority group also have similar situations, then that minority group will be impacted heavily by the decision not to give loans to people who have a lot of debt. This is fair even though it’s not equal impact.

          You can argue that, at least conditional on identical information available, there should be no large frequency differences between say minority people and majority groups… But the decision space is high-dimensional. Let’s suppose you consider 15 different factors; I’ll just make up a few:

          1) Local cost of living
          2) Local unemployment levels
          3) Personal income
          4) Personal asset values
          5) Type of job
          6) Number of household earners
          7) Taxes paid
          8) …..

          you get the idea. Then after you’ve got 15 of these, the chance you’ll have any two people within 10% of equal on all 15 statistics is going to be near zero. Basically in high dimensional space, no one is close enough to hear you scream.

          Conditional equivalence is going to be impossible to use as a methodology.
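
          A quick simulation of that point, under the simplifying assumption that the 15 factors are independent and uniformly scaled:

            import numpy as np

            # Chance two random people land within 0.1 of each other on ALL
            # 15 uniform(0,1) factors is about 0.19^15, i.e. ~1.6e-11.
            rng = np.random.default_rng(0)
            n_pairs, k = 100_000, 15
            a = rng.uniform(size=(n_pairs, k))
            b = rng.uniform(size=(n_pairs, k))
            close_on_all = np.all(np.abs(a - b) <= 0.1, axis=1)
            print(close_on_all.mean())   # ~0.0 even across 100,000 pairs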

        • Daniel:

          The statistical analysis as legal evidence of racial bias emerged as a result of the long overdue enforcement of the illegality of overtly racist decisions. What clever managers did was develop covertly racist hiring criteria such as credential requirements. See Griggs v. Duke Power (1971), where a high school diploma was inserted into the hiring criteria in a way that disproportionately impacted otherwise qualified black employees.

          When bias exists in both the assessment of performance (outcome) and in the assessment process (criteria) it will be observed and modeled as if it were “true” correlation or true conditional probability at the individual level. Simple race categories are not adequate to disentangle this type of racist behavior nor are statistical adjustments. I will acknowledge that the goal of fairness at the individual decision may be a worthy goal, but it is not always easy to come by. I would also argue that the problems racism has created in the United States will not be solved by fair individual level decisions.

        • It seems to me you’re claiming that the requirement of a high school diploma was not allowable because of the *intent*. The purpose was not a legitimate hiring concern like cost to train, turnover, or productivity, instead the purpose was to exclude black people because the hiring managers didn’t like black people.

          If you can show that some people conspired to choose something so that they could exclude black people, then I think that’d be a reasonable basis for a discrimination lawsuit.

          But what if people were legitimately trying to reduce the cost of training and the frequency of turnover, and then required an HSD (high school diploma) because they believed with good reason it would reduce those costs. Would this be illegal because there were many black people who didn’t have an HSD?

          I think if you claim there were many “equally qualified” black people who didn’t have an HSD then your claim is really that HSD did not accurately predict those costs. Because to be equally qualified, you must be ultimately someone who requires no more training than the HSD person and someone who would stay on average as long as the HSD person and be as productive.

          So then, HSDs are not legal criteria precisely because they don’t address a legal goal, such as reducing costs or turnover or increasing productivity in other ways.

          But as soon as you can show a legitimate predictive power: people with HSDs on average have lower costs to train, higher productivity, lower turnover, etc, then I don’t see how we can claim that HSDs should be an illegally discriminatory criterion?

          If the intent is to harm then the fact that it has benefits to the company is irrelevant, we don’t let people go out and purposefully harm others. But if the intent was legitimately to benefit the company, and the action did in fact benefit the company, it’s hard to understand by what means you could claim it should be illegal.

          If we want to address the problem that black people don’t have sufficient education to compete with other people who have HSDs, we should do so by providing education that raises those people’s qualifications to perform jobs, and productivity, not by forcing people to take workers who legitimately reduce the company’s productivity relative to others.

        • Daniel:

          “But as soon as you can show a legitimate predictive power: people with HSDs on average have lower costs to train, higher productivity, lower turnover, etc, then I don’t see how we can claim that HSDs should be an illegally discriminatory criterion?”

          I am not claiming the requirement of a credential is illegal. I am claiming that when it is used as a false barrier to entry it provides cover for covertly racist and biased decisions. I am further claiming that the statistical models used to make the claims of reduced costs of training are often dubious given the racist confounds on both sides of the equation, along with the small samples and often nonsensical ways in which “savings” are calculated. (I still remember how the “cost” of a training pipeline was estimated by taking the average cost and then calculating savings based on that – it was ludicrous, as the savings involved were a fraction of that value, a rounding error in the yearly budget.)

          That said, your argument is consistent with your belief that fairness is rightly focused at the individual level. However, it sidesteps the reality that the effects of historical and current racism exist at a group level. Economic deserts in every large city in the United States populated largely by black people emerged as a result of the group-level behavior of white people who did in fact conspire to exclude black people from participating in the economic system.

          If we educate every last high school age person from these areas — where will the savings be realized?

        • >white people who did in fact conspire to exclude black people from participating in the economic system.

          I agree with this. I think the answer is to transfer assets to black people, not to lower the efficiency of the economy by creating improper and illogical barriers to efficiency.

          BTW: I also agree with all the stuff you say about dubious claims of reduced cost of training etc. So eliminating artificial methods of discriminating is fine… it’s eliminating actual real methods of gaining efficiency that I object to.

        • Daniel:

          Where I suspect we may disagree is on what you call “illogical barriers to efficiency”. If a credential is required when the necessary skills can be gained in other ways, I would argue it is both an illogical barrier to entry and an illogical barrier to efficiency. This notion that some single set of predictors, including credentials, is required is a fundamental part of the problem: it structurally supports the systematic bias without providing the efficiency you believe it does.

          Credentialing has its place where it is intended to protect the public from dubious practitioners, though it has become a way to simply create false barriers to entry and revenue streams for software companies.

          I don’t think we disagree at all. If a credential can be shown to increase efficiency then it’s not illogical, but if it can’t be shown to do so, or it even harms efficiency, then I agree it’s illogical and I think it should be thrown out. My point is that if a business can show that it is substantially better off economically using some criterion, then the fact that it disproportionally harms minorities doesn’t qualify it as illegal discrimination… Let’s go to an extreme to illustrate: suppose you live in a state where due to past discrimination Orange people are overwhelmingly illiterate. Now you are hiring copy editors. An illiterate Orange person applies and you reject them based on a literacy test… the literacy test has an overwhelmingly disproportionate impact on Orange people. By your standard it would be illegal… apparently your standard requires businesses to hire illiterate people as copy editors… or blind people as photographers, or people without arms as delivery drivers… etc.

          Having a disproportionate impact on a given race is not by itself improper discrimination. It must be that you refuse to hire people whom the evidence shows would adequately perform the task, on the basis of a criterion that disproportionally harms the minority – in other words, on the basis of a criterion that fails to reflect an actual economic interest.

        • I wrote that poorly, I shouldn’t have said “your standard” I should have said “the disproportional impact standard” as I don’t attribute it to you (Curious) at all.

        • Daniel:

          We agree about the logical implication of simple statistical differences for the establishment of actual prima facie evidence of discrimination. The challenge is that when we remove this as a possible source of prima facie evidence, we remove the ability to challenge anything but overt and out of the closet racial discrimination in the hiring process.

          > The challenge is that when we remove this as a possible source of prima facie evidence, we remove the ability to challenge anything but overt and out of the closet racial discrimination in the hiring process.

          My criterion is basically this: if it has a disproportionate impact on a given protected class, then you must *show that it serves an important nontrivial business purpose*. From a statistical perspective: that the rule has a well-resolved positive economic effect on the business, i.e., the posterior probability that the effect is greater than some positive threshold is very high.

          Again, requiring people to have arms in order to be a package delivery driver is going to disproportionally harm people with major injuries… requiring functional eyes to be a photographer is going to disproportionally harm blind people, but both of these are rules that have clear large positive economic purposes. No one argues about them because the magnitude of the economic interest is so large you don’t need statistics to figure it out.

          The issue is going to come when the magnitude isn’t so obviously large.
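
          A sketch of what that test could look like (Python; the effect estimate, standard error, and threshold are all invented, and the flat-prior normal approximation is an assumption):

            from scipy import stats

            # Invented study result: the rule saves an estimated $420 per
            # hire with standard error $150; "important nontrivial" = $100.
            est, se, threshold = 420.0, 150.0, 100.0

            # Flat prior => posterior for the effect ~ Normal(est, se).
            post_prob = stats.norm.sf(threshold, loc=est, scale=se)
            print(f"P(effect > ${threshold:.0f} per hire) = {post_prob:.3f}")   # ~0.98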

        • Curious and Daniel:

          Interesting.

          In the link I posted the same issue may have been more vivid.

          Recidivism prediction based on just age and number of past offences achieves almost all of the accuracy of prediction.

          Now if, for simplicity, age just reflects brain development and within-age increased number of offences just reflects economic deprivation, prediction of re-offence based on just those two variables may seem acceptable.

          However, for those incarcerated in the US, past discrimination very likely has led to an increased percentage of blacks. So is even the use of age acceptable? The number of past offences may be more complicated.

          Balancing the harm of more re-offenses versus continued systematic disadvantaging …

        • Daniel said,

          “If the intent is to harm then the fact that it has benefits to the company is irrelevant, we don’t let people go out and purposefully harm others. But if the intent was legitimately to benefit the company, and the action did in fact benefit the company, it’s hard to understand by what means you could claim it should be illegal.”

          This criterion (“the intent was legitimately to benefit the company”) seems to be a criterion that does not expect adequate diligence in choosing the criterion for hiring/not hiring. What I’m thinking here is that “must have a High School Diploma” might be a “broad brush” criterion for “being qualified for the job”, but that there might be another criterion that is at least as good (or maybe even better) for predicting good performance in the job — and if this “alternative” criterion happens to shut out fewer people from the minority group in question, then the hirer should be obliged to use the less discriminatory but equally (or more) effective alternative criterion.

        • The law, as written, does care about causality. If a plaintiff can provide causal evidence that racism/sexism was a causal factor–perhaps by pointing to leaked emails in which the people running the platform say they’re trying to discourage minorities from being hired–that would be sufficient under the law (according to the op-ed). But that’s a really, really high standard, and one easy for the platform to duck, so the law provides another avenue for showing causality.

          In this particular case, causality is easy to determine. If fewer women on the platform (for example) get recommended for interviews with businesses on the platform, and those recommendations are based solely on a score generated by the algorithm, then the algorithm has caused fewer women on the platform to get recommended for interviews. Then the question is, whose fault is it that the algorithm recommended fewer women? It’s not the algorithm’s fault–it’s just code. It’s not the women’s fault, because gender is a protected class under the law. It’s not the employer’s fault, because they have no control over the algorithm and probably have been assured by the platform that the algorithm is fair.

        That just leaves the people who run the platform–the law says they have a responsibility to make a reasonable effort to monitor their algorithm’s performance and its impact, and to modify it to minimize that impact if possible. The key here is that discrimination is an outcome of the platform’s failure to follow the law, regardless of whether they are racist or sexist or just irresponsible.

        • Thinking in terms of frequency, and assuming the recommendation depends on the Irrelevant Stuff only through the score, we have:

          p(recommended, Relevant Stuff Score, Irrelevant Stuff) = p(recommended | Relevant Stuff Score) p(Relevant Stuff Score | Irrelevant Stuff) p(Irrelevant Stuff)

          Suppose we agree that the color of your skin is Irrelevant Stuff to being say a barber… Well

          p(Irrelevant Stuff) is a fact about the population, like what fraction of people have dark skin.

          p(Relevant Stuff Score | Irrelevant Stuff) is a fact about the population of dark skinned and light skinned people… there’s some fraction of them who have more vs less experience cutting hair.

          p(recommendation | Relevant Stuff Score ) is a fact about the recommendation system… it recommends people who have more relevant qualifications at a higher frequency.

          Suppose the system looks at just Relevant Stuff Score (ie. questions about knowledge and experience in cutting hair), and then on that basis makes recommendations for hiring.

          Suppose that p(Relevant Stuff Score | Black) has more mass on smaller scores… that is Black people generally have a lower frequency with which they have high experience and skill in hair cutting, whereas p(Relevant Stuff Score | White) has a generally higher distribution… more white people have experience with hair cutting.

          The algorithm looks *only* at the Relevant Stuff Score… but it *WILL* discriminate against black people on the basis of “disparate impact”. The disparate impact will be driven precisely by the gap between p(Relevant Stuff Score | Black) and p(Relevant Stuff Score | White), in other words by a fact about the two populations.

          Is that ok? In order to get “no disparate impact” we must demand the recommender system actively discriminate in favor of black people to make up for the fact that fewer black people have relevant skills in this area… Does that make sense?

          (Please it’s a hypothetical, let’s not talk about actual barber skills across races)
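
          Here is that hypothetical as a simulation (Python; both score distributions are invented). The recommender conditions only on the score, yet the group-level recommendation rates diverge:

            import numpy as np

            rng = np.random.default_rng(1)
            n = 100_000
            # Invented p(Relevant Stuff Score | group): same spread, group B
            # shifted lower.
            score_a = rng.normal(0.60, 0.15, n)
            score_b = rng.normal(0.50, 0.15, n)

            recommend = lambda s: s > 0.65   # score-only decision rule
            rate_a = recommend(score_a).mean()
            rate_b = recommend(score_b).mean()
            print(f"rec rate A = {rate_a:.2f}, B = {rate_b:.2f}, "
                  f"impact ratio = {rate_b / rate_a:.2f}")   # ~0.37, 0.16, 0.43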

    • I agree that if there is a complete census of relevant data and subjects, then one can make some claims of discrimination or lack thereof.

      However, if the census data do not include all relevant factors, then the situation is more fuzzy — and uncertainty in a decision (one way or the other) needs to be accepted/respected as part of reality.

      In some cases, it might be sufficient to show that there was a good faith effort, diligently planned and carried out, to prevent discrimination. But even then there may be confounding factors that might produce results contrary to intent. These are indeed situations where it is (regrettably) not possible to conclude anything definite, and hence when it may (regrettably) be necessary to agree to disagree (which, I realize, the legal system does not seem to allow). This is all related to the idea of “bright line” laws or legal interpretations: https://en.wikipedia.org/wiki/Bright-line_rule

      • Nah. See my response to Daniel above–either the platform made a reasonable effort to detect and prevent disparate impacts or it didn’t. The law only requires that the platform do what is reasonable to anticipate, prevent or mitigate those impacts. Although it’s quite possible that what’s reasonable will hurt their business model or even shut them down. Parenthetically, it bothers me that Facebook’s excuse for not being able to control illegality/bullying/disinformation etc. on its site often is “It’s impossible to monitor a huge social network and provide a fair process for determining what ought not be allowed.” Yeah, and it’s impossible for a four-year-old to safely drive a car. So we don’t let them.

6. This is a strange Op-Ed. In discrimination cases, the plaintiff first makes out a prima facie case, which is simple: they are a member of a protected class and they received some adverse outcome. Then the burden shifts to the employer to show that they had some non-discriminatory reason for the decision, which is also easy, e.g., “I thought the person I hired was a better fit.” Then the burden shifts back to the employee to prove that the non-discriminatory reason was just a pretext. That is hard. The author seems to think that automated hiring systems will count as non-discriminatory reasons for hiring, and thus rebut the employee’s prima facie case, and thus we need his new discrimination per se doctrine, which sounds loopy. However, I don’t see why courts should hold that using an automated hiring system in itself constitutes a rebuttal of the prima facie case unless you know how the automated system works. If it is some black box system that tends to spit out results that heavily weigh against protected classes and no one can explain why, then I don’t think it rebuts the prima facie case. If someone can explain that the system takes into account a set of reasonable features and weighs them based on criteria that tested well against data on who the best employees were, then that should be sufficient evidence. I fail to see the problem that the author sees. Black box algorithms could be just as racist or sexist as employers, but then they shouldn’t count as rebuttal evidence. We don’t need a new legal doctrine.

    • Steve said,
“The author seems to think that automated hiring systems will count as non-discriminatory reasons for hiring, and thus rebut the employee’s prima facie case, and thus we need his new discrimination per se doctrine, which sounds loopy. However, I don’t see why courts should hold that using an automated hiring system in itself constitutes a rebuttal of the prima facie case unless you know how the automated system works. If it is some black box system that tends to spit out results that heavily weigh against protected classes and no one can explain why, then I don’t think it rebuts the prima facie case. If someone can explain that the system takes into account a set of reasonable features and weighs them based on criteria that tested well against data on who the best employees were, then that should be sufficient evidence. I fail to see the problem that the author sees. Black box algorithms could be just as racist or sexist as employers, but then they shouldn’t count as rebuttal evidence. We don’t need a new legal doctrine.”

      +1 (If I understand the last line correctly)

  7. I’m very happy to see the emphasis on the difference between binary predictions and a continuous probability estimate. This is a distinction that is widely missing in most discussions of fairness.

    I’m not excited that one of their fairness measures conditions on the outcome. I expect it should be easy to construct examples where the true underlying data generation method would be considered “unfair” using this metric. I also wouldn’t be surprised if the same were true for their BG-AUC measure.

8. On the matter of statistical proof, I have a new textbook on Bayesian data analysis that notes, in the course of a discussion of the binomial distribution, that Laplace felt “morally certain” that more babies born in Paris were boys than girls, based on a statistical argument.

9. No discussion of fairness in these systems is complete without noting that common definitions of fairness are logically at odds. A system that is fair under one definition (e.g. within-group calibration, which means that if you bucket people within a group at a 10% chance of default, about 10% of them will default) will, under generous assumptions, necessarily be unfair under another common definition (e.g. balance for the positive class: the same average credit score, across groups, for people who don’t default on their loans)! Largely this is due to differences in base rates between groups. There is very well developed theory on this, unfortunately frequently ignored in the fairness literature: https://arxiv.org/pdf/1609.05807v1.pdf
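Here’s a small simulation of that tension (all numbers hypothetical): give each group a score that is perfectly calibrated by construction, then compare the average score among non-defaulters across groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two hypothetical groups with different base rates of default.
for group, scale in [("A", 0.10), ("B", 0.40)]:
    # Each person's true default probability; the "score" reports it
    # exactly, so it is perfectly calibrated within the group.
    score = rng.beta(2, 2, n) * scale
    defaulted = rng.random(n) < score

    # Calibration check: among people scored near 5%, about 5% default.
    bucket = np.abs(score - 0.05) < 0.005
    print(group, "default rate among ~5% scores:",
          round(defaulted[bucket].mean(), 3))

    # Balance for the positive class fails: the average score among
    # non-defaulters differs across groups because base rates differ.
    print(group, "mean score among non-defaulters:",
          round(score[~defaulted].mean(), 3))
```

Both groups pass the calibration check, yet non-defaulters in the higher-base-rate group carry systematically worse scores, which is exactly the conflict the Kleinberg et al. paper formalizes.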

    My own view is that within group calibration is the fair way to go. But I’m not naive enough to think that agreement on this is going to happen.

So the spectre I see, given a discussion as loose as the NYT one above, is that with enough flexibility in the definition of fairness there is pretty much no way for an algorithm provider to win! Of course all of these arguments apply to human decision makers as well.

  10. Would you be more comfortable with “statistical evidence” (vs “statistical proof”)?

In Australian law, whether something is proved or not is a matter for the judge or jury. It’s not for an expert to prove something. The expert can provide evidence, not proof. “Evidence” is anything that tends to support or oppose a claim.

  11. (Unavoidably political, sorry, but it actually just bugs me on purely logical and linguistic grounds…)

Suppose I use an employment screening questionnaire that does not show applicants’ sex (or gender) and does not ask names. In fact it doesn’t include any information that correlates with sex, and the finest statisticians/machine-learning people/demographers/HR experts in the world cannot find any algorithm using these questionnaires that predicts sex any better than chance. And when you look at the actual hiring algorithm’s record, you cannot find any correlation whatsoever between my hiring decisions and the applicants’ sex or gender (which should be no surprise, since by assumption even the experts specifically trying to do that don’t succeed). There’s no suggestion that my algorithm is using any information beyond the answers in the questionnaire.

The puzzle: is it nevertheless possible that my use of this algorithm and this questionnaire could be *breaking the law* by discriminating on the basis of the applicant’s sex? And _blatantly_ so? As I understand it, the Supreme Court just said/showed that the answer is yes. The underlying logic suggests a whole class of other reasons to doubt that any mere statistical reasoning could affirmatively ‘prove’ nondiscrimination.

• This is not merely hypothetical. I was once involved in a case in which a bank was accused of discrimination in mortgage lending. They were accused of denying mortgages to Blacks at higher rates and charging higher interest rates to Blacks whose applications were accepted. The problem was that the bank had no idea of the race of the applicant! (This was not strictly true. It would have been possible to draw some noisy inferences based on the name of the applicant and the location of the house, and plaintiffs argued that the bank did this, though no proof was ever forthcoming. In addition, applicants had to self-describe race on a form, but the bank argued that they refused to look at that form in the application process to avoid potential discrimination. Finally, the applications were mostly taken over the phone, so there was certainly another possibility that noisy inferences could be drawn, but the phone clerks and the assessors were different people and no notations were made on the file.) I thought of this case as almost the perfect test case for the proposition that adverse impact and illegal discrimination were clearly separable concepts. Modern cases involving computer algorithms which lack data on race should be similar. (Under pressure, the bank settled, so I never got my test case.)

• Even more interesting (IMO) is the subsequent performance of the loans. Loans to Black and non-Black applicants defaulted at roughly equal rates. This was oddly taken by the plaintiff as proof of discrimination: “See? Blacks are rejected at higher rates even though their default rates are no higher.” But under uneconomic discrimination, the discriminated-against group should have much lower default rates, since they are presumably being held to much higher implicit standards.

• Exactly: in order to hit the given default rate, more credit-risky people in the minority group had to be rejected. This is evidence that the algorithm is working correctly. There *really are* more credit-risky people in the minority group. If you’re not allowed to discriminate on the basis of credit risk, then basically you’re not allowed to be a bank. This is the business the bank is in.

• > And when you look at the actual hiring algorithm’s record, you cannot find any correlation whatsoever between my hiring decisions and the applicants’ sex or gender (which should be no surprise, since by assumption even the experts specifically trying to do that don’t succeed).

      That shouldn’t be the case if you are using only the information in the questionnaire.

Here’s why: suppose the job is some kind of warehouse work, and the questionnaire asks something like “can you lift 50 lb boxes without difficulty?”. It *should* be the case that more women answer “no”, because more of them will have difficulty with that task.

Similarly, suppose the questionnaire asks “have you passed Calculus II?”; then you’ll find that minority groups will disproportionately answer “no”, because they have lower levels of educational attainment on average.

So if you’re hiring nearly equal numbers of people in each racial/sex category, you can only be doing that by inferring sex and undoing the “discrimination” caused by asking questions such as the above.

      • Daniel:

This is where it gets tricky. While the direct connection between lifting 50 lb boxes and the ability to perform a job is clear, the connection between passing Calculus II (and other similar criteria) and job performance is far less clear for many, many jobs where such criteria are used.

• This is why the Griggs test is flawed. I’m not sure the HS diploma requirement was actually *intended* as a method to exclude Black janitors at Duke Power, but I agree that if there were probative evidence to that effect, then the diploma requirement is mere pretext. The adverse-impact burden shifting does too much… It is essentially impossible to show the business necessity of a job requirement after you have already applied that filter to your employees, since you can’t show how the rejected applicants would have performed. The best you can do is show that the accepted minority workers aren’t substantially *better* than similarly situated (vis-à-vis the requirements) nonminority workers.

        • Agreed, and above in our other thread you’ll see that I say that if it has a disproportionate impact on a protected class, then we must have proof that the connection between Calc II and the job is a valid one. I’m fine with putting the burden on employers to show that there is an important connection there. I’m not fine with defining the criterion to be “there is no disproportionate impact”. That’s just clearly wrong when you understand what it means.

        • I believe that such questionnaires simply don’t exist. Everything is correlated with everything to some extent. There are going to be only a tiny number of questionnaires that serve any useful purpose where males and females have very close to precisely the same frequency of giving the same answers.

• Imagine that the questionnaire has ONE question. If it’s actually correlated with sex, we can fix that by saying “If you are a man, answer ‘No’ with probability p; otherwise, and in any case if you are a woman, tell me truthfully whether X”, choosing p appropriately to remove the correlation. So treated purely as a logic puzzle, rather than a useful tool, such questionnaires can surely exist. (And as a practical matter, I suspect we could make questionnaires that are “good enough”, so long as nothing that is important to the job is really that sex-correlated; not true in every industry, but surely so in most?)
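To make the one-question trick concrete: if men truthfully answer “yes” at rate a and women at rate b, with a > b, then having men answer “No” with probability p = 1 − b/a equalizes the reported rates. A quick check (the rates a and b below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.60, 0.45  # hypothetical truthful "yes" rates for men and women

# Men answer "No" with probability p regardless of the truth, so their
# reported "yes" rate is a * (1 - p). Setting a * (1 - p) = b gives:
p = 1 - b / a

n = 1_000_000
men_yes = (rng.random(n) < a) & (rng.random(n) >= p)  # truthful yes, not overridden
women_yes = rng.random(n) < b

print("p =", p)                          # 0.25
print(men_yes.mean(), women_yes.mean())  # both ~0.45
```

After the randomization, the reported answer no longer predicts sex, at the cost of throwing away some genuinely job-relevant information from the men.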

Now the question ‘X’ (sorry, this is political) might be ‘Are you homosexual?’, and imagine an evil employer who rejects anyone who says ‘yes’. But suppose he rejects a male homosexual (not even knowing that he’s male). If the world had been other than it is, with the single change that that person is female, what happens? Well, one might argue (n.b. I wouldn’t) that in that counterfactual world the person is still attracted to the same people (who are men), so they would answer ‘no’, and then, not being homosexual, would be accepted. Voila: the decision to reject depended on whether the applicant is a man or a woman.

I’m not a lawyer, but I don’t see how this doesn’t follow directly from the counterfactual-based discrimination logic adopted by the majority in the recent Bostock case. It seems it’s now a Title VII violation if changing the employee’s sex would lead to a different decision by the employer, even, apparently, if it isn’t intentional. And the Supreme Court applies similar counterfactual logic as above, assuming specifically that in the counterfactual such a sex change might leave you attracted to the same gender as before. See https://www.vox.com/2020/6/15/21291515/supreme-court-bostock-clayton-county-lgbtq-neil-gorsuch for one of many layman’s summaries.

Whatever the merits of the conclusion, I personally think the logic is weird. But it seems like an extreme case of how a discrimination argument, now law of the land, might in principle leave NO statistical footprint vis-à-vis the actually prohibited discriminatory criterion (here, the applicant’s sex).

• > nothing that is important to the job is really that sex-correlated; not true in every industry, but surely so in most?

I doubt this very much. There are all kinds of things that are correlated with sex and also correlated with jobs, thereby inducing correlations between sex and job. Things like:

          1) Choice of major in college
          2) Choice of jobs to apply to
          3) Choice of hobby
          4) Choice of reading material
          5) Height, strength
          6) Dexterity
          7) Eyesight
          8) Hearing
          9) Experience with particular skills/tasks

          Let’s just give a simple example: we’re trying to hire someone to be a laboratory technician in a research biology lab, or a sonogram technician at a hospital, or a pharmacist, or a welding instructor, or a physiotherapist, or a boat repair person…

          do you really think that “having the skills required to be a welding instructor” is “uncorrelated with being male vs female?”

Note, **correlation is not causation**: there’s nothing about being female that in and of itself makes you less qualified to be a welding instructor… But if you were to randomly select a group of people who are all good candidates for welding instructor, based on their knowledge, skills, and abilities as they are today, you’ll probably find that membership in that group is highly correlated with sex.

          This means the question “do you have at least 1 year of welding experience, and have you ever taught someone else to weld?” will be highly correlated with sex. So will “do you have experience doing cell culture and molecular biology techniques such as PCR?”

          Sure, these are kind of specialized skills based jobs, but I think it’s going to be the same for lots of less specialized jobs: skateboard and snowboard sales, firearms, fishing, and hunting equipment sales, haircutting and hair extensions, cornrow braiding, dry cleaning and alterations, shoe repair, wait staff at upscale restaurants… bank teller, television cable repair technician skills…

        • Daniel:

You keep using examples of very specific criteria, which may be legal, but which are not consistent with the idea of being able to demonstrate a relationship between a criterion and performance on the job. For example, why would 11 months of experience rule out someone from a job? Which it could, if the applicant were one item short of passing the initial round of screening. When crude measures are being used, compensatory models are the only reasonable approach (note I said reasonable, not legal). Simply because something is legal does not mean it is a good model of selection.

The best job descriptions for positions spell out a number of different ways that a certain skill might be demonstrated. The worst job descriptions make very specific requirements that are very unlikely to be related to job performance.

        • This is really orthogonal to my original point.

But you are completely right anyway; I mis-spoke. What I had in mind (not that it’s so relevant) was the possibility of sex-correlated indicators that are informative over and above the fact that I’m reading someone’s application in the first place. Agreed, I probably don’t need to read a single word of a welder’s job application to do far better than a 50/50 chance at guessing whether the applicant is female.

IF you are in an industry where sex discrimination is rampant, and IF there aren’t legitimate job requirements where the sexes inherently differ in ability, I don’t think that wanting screening questions that are more or less uninformative _in this sense_ is a silly thing to aspire to.

• > You keep using examples of very specific criteria, which may be legal, but which are not consistent with the idea of being able to demonstrate a relationship between a criterion and performance on the job. For example, why would 11 months of experience rule out someone from a job?

I’m just trying to show that correlations in skills and abilities are rampant across the population, and that an employer can have very reasonable concerns which, **measured as the aggregate frequency with which each group gets filtered out on those criteria**, will adversely affect one protected group more than another. That invalidates the idea that the frequency with which something affects one group vs. another can be a reasonable, meaningful criterion for unfair discrimination.

          I don’t disagree with your point that *thresholds* are a bad idea, it just wasn’t the point I was trying to make.

If I build a statistical model in which I ask you “how many months of welding experience do you have?” and “how many different people have you taught welding skills to before?”, add the two numbers together, and decide to interview the top-scoring 10 people because I don’t have more time than that, I will dramatically discriminate against women in aggregate, in terms of the frequency with which they get into the top 10. On the other hand, if I get two candidates with the same score, one a woman and one a man, the questionnaire has no question that specifically downgrades the woman’s score on the basis of her sex.

          I think it’d be hard to imagine a world in which a person trying to hire a welding instructor wouldn’t be allowed to ask those two questions. And yet it will be clear that *in terms of frequency* they will dramatically affect one sex more than another.
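A minimal simulation of that top-10 scenario (the applicant counts and experience distributions below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # 500 male and 500 female applicants, hypothetically

# Assume, purely for illustration, that men in this applicant pool report
# more months of welding experience and more people taught, on average.
men_score = rng.gamma(2.0, 30.0, n) + rng.poisson(3.0, n)
women_score = rng.gamma(2.0, 15.0, n) + rng.poisson(1.5, n)

scores = np.concatenate([men_score, women_score])
is_woman = np.concatenate([np.zeros(n, bool), np.ones(n, bool)])

# Sex-blind rule: interview the 10 highest-scoring applicants.
top10 = np.argsort(scores)[-10:]
print("women among the top 10:", int(is_woman[top10].sum()))
```

The rule never looks at sex, and a woman with the same score as a man is treated identically, yet the top 10 will be overwhelmingly male simply because the score distributions differ.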

          Frequency can’t be the criterion for fairness.

• Aleh: agreed that we want people not to discriminate *on the basis that someone is in a given group* (i.e. we shouldn’t ask “are you a woman?” on a welding job application). We also want people not to invent criteria that are principally informative of whether people are in certain groups rather than being directly informative about the ability to do the job (like “do you have hair longer than shoulder length?” or “do you wear jewelry?”; even though both of those could be safety issues for a welder, they’re not relevant to hiring, only to safety training).

That’s what I’m saying all along: it can’t be correlation and frequency of impact that are the relevant criteria for deciding unfair discrimination; it has to be about whether the criterion is relevant to doing the job, independent of its frequency in the population.

I agree with Curious that we should demand people not create arbitrary thresholds, and with the various other points that Curious has made. But at its core, asking relevant questions, like those about skills and experience, will produce answers correlated with membership in various protected categories. This can’t invalidate relevant questions like “how many months have you worked as a welder?”.

        • “For example, why would 11 months of experience rule out someone from a job? ”

Because there is a sufficient number of candidates with substantially more than one year of experience who also meet or exceed all other qualifications. The lower cutoff is the bargain-basement experience level: if absolutely no one with more experience applies, that level of experience **could** be acceptable, depending on all factors.

If no one meets the qualifications, either the position will go unfilled or the job will be relisted with lower qualifications, and the person with 11 months of experience can apply.

So it doesn’t really matter if the cutoff is lowered by one month; the person with 11 months of experience would have to be extraordinary in every other capacity to be competitive. 11 months is not equivalent to 12 months or 18 months.

• Jim, good point. I’m imagining that if you are to teach welding, typical candidates may well have over 100 months of experience, so a 12-month minimum is so far from competitive that the fact that it’s a threshold, and that 11 months is not much different, is really irrelevant.

The big point I’m trying to make, though, is that everything you could ever legitimately ask about job qualifications is correlated with one or more protected categories. Unless your hiring decision is made by filtering the candidate pool to have the appropriate balance of all different races and creeds, followed by selection with a random number generator… it will be illegal on any “disparate impact” criterion. Disparate impact is the invention of people who don’t understand statistics.

• Jim’s rationale is a post hoc rationale and not a defense of a sensible approach to job analysis and mapping of skill sets to sets of criteria. It’s nonsense.

        • Jim’s rationale is the typical way sloppy job descriptions and skills requirements are generated and justified by people who seem not to understand how it serves to undermine their own goals.

• If you are a hiring manager and you find yourself complaining that you don’t have enough qualified applicants from protected classes, or complaining that there aren’t enough “technically qualified” applicants to fill your many openings, Jim’s rationale justifying sloppy job descriptions and criteria is likely why.

        • Curious:

Many people have orders of magnitude MORE candidates than they could ever interview; they are the ones using these algorithmic systems to cut the 780 applicants down to the 15 they feel they have time to interview. Any such massive level of filtering is going to disproportionately impact certain groups, simply because the proportion of people who are highly qualified to do a job is not equal across groups! This is the point I’m really trying to make. The ONLY way to have non-disproportional impact is to randomly select people from the SSN rolls with a random number generator; that’s going to be uncorrelated with anything. Everything else will be correlated.

          In any case, I want to point out, since you seem quite upset, that I really do agree with you that we shouldn’t have stupid criteria that don’t serve a purpose other than to be exclusive… And continuous criteria should be used rather than hard cut-offs.

          But from a statistics/probability point of view making disproportional impact illegal means “making anything other than randomly selecting people using a random number generator illegal”. We obviously can’t be randomly selecting people to be airline pilots, boat mechanics, and trial lawyers. So I think we need a definition of fairness that works in the real world.

> We obviously can’t be randomly selecting people to be airline pilots, boat mechanics, and trial lawyers.

If social policy is determined by looking at averages, and individual variation is treated like an “error”, then you would eventually get a world where everyone is pretty much a clone of everyone else and tasks are randomly assigned.

        • Curious says:

          “Jim’s rationale is a post hoc rationale and not a defense of a sensible approach to job analysis and mapping of skills sets to sets of criteria. ”

          Curious: no one gives a hoot about “skill sets”. That’s ****FAR**** too complex for the average recruiter. They know ***nothing*** about the “skills” required for the job. The average recruiter is barely able to match job titles and count years of experience. OMG, and you’re asking them to generalize from three lines of a resume about someone’s “skills”? No chance in hell that’s ever going to happen. Never. Even the dean of an academic college can barely do more than count publications.

          Your perspective on what’s logical doesn’t account for the skill and knowledge level of the average business person or recruiter, much less the incentives that drive their choices or the problems they have to deal with.

        • “the big point I’m trying to make though is that everything you could ever legitimately ask about job qualifications is correlated with one or more protected categories.”

          Yep. In the end it would all be about manipulating the statistics to “show” what you want.

12. I’m a little disappointed with the short no-answer, because this is a huge issue in practice. Most of the ADMs (automated decision-making systems) used in public administrations throughout the world are based on the standard train/test ML idiom. Neural networks and the like are often used. Now: if you consider something like the criminal justice system, yes, clearly, you will have to have “proofs”, whatever that means.

There is a lot of emerging research in this area, I believe. I look forward to you or someone else finding an actual paper to discuss (search terms: “bias”, “fairness”, etc.).
