Retired computer science professor argues that decisions are being made by “algorithms that are mathematically incapable of bias.” What does this mean?

This came up in the comments, but not everyone reads the comments, so . . .

Joseph recommended an op-ed entitled “We must stop militant liberals from politicizing artificial intelligence; ‘Debiasing’ algorithms actually means adding bias,” by retired computer science professor Pedro Domingos. The article begins:

What do you do if decisions that used to be made by humans, with all their biases, start being made by algorithms that are mathematically incapable of bias? If you’re rational, you should celebrate. If you’re a militant liberal, you recognize this development for the mortal threat it is, and scramble to take back control.

You can see this unfolding at AI conferences. Last week I attended the 2020 edition of NeurIPS, the leading international machine learning conference. What started as a small gathering now brings together enough people to fill a sports arena. This year, for the first time, NeurIPS required most papers to include a ‘broader impacts’ statement, and to be subject to review by an ethics board. . . .

As Jessica discussed in her post, there are some interesting questions on how this will be implemented and, indeed, what are the goals of this new policy.

In his article, Domingos takes the discussion in two different directions, one having to do with politics and one having to do with the possibility of algorithmic bias. In this post, I won’t talk about the politics at all, just about the algorithmic bias issue.

It seems to me that Domingos got halfway to a key point—but only halfway.

The article is subtitled, “‘Debiasing’ algorithms actually means adding bias,” and its first sentence presupposes that decisions are “being made by algorithms that are mathematically incapable of bias.”

He writes of “complex mathematical formulas that know nothing about race, gender or socioeconomic status. They can’t be racist or sexist any more than the formula y = a x + b can,” but the key issue is not any claim that “a x + b” is biased. The point, I think, is that if “x” is biased, then “a x + b” will be biased too. In that sense, a formula such as “a x + b” can promulgate bias or launder bias or make certain biases more socially acceptable.

If x is biased, so that “a x + b” is biased, then you can debias the algorithm by first subtracting the bias of x. In practice it is not so easy, because it’s typically in the nature of biases that we don’t know exactly how large they are—if we knew, we’d have already made the correction—but there’s no reason why a “debiasing” algorithm can’t at least reduce bias.
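
To make this concrete, here is a minimal simulation (an illustrative sketch; the numbers, the size of the measurement bias, and the imperfect bias estimate are all invented): a group-dependent bias in the measured x flows straight through the “neutral” formula a x + b, and subtracting even a rough estimate of that bias shrinks, though does not eliminate, the downstream bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, n)            # two arbitrary groups, labeled 0 and 1
true_x = rng.normal(50, 10, n)           # the quantity we wish we could measure

# Hypothetical measurement bias: group 1's x is systematically understated by 5.
measured_x = true_x - 5 * group

a, b = 2.0, 10.0                          # the "neutral" formula y = a*x + b
y_true = a * true_x + b
y_raw = a * measured_x + b

# Debiasing with an imperfect estimate of the input bias (we guess 4, truth is 5).
estimated_bias = 4.0
y_debiased = a * (measured_x + estimated_bias * group) + b

for label, y in [("raw", y_raw), ("debiased", y_debiased)]:
    gap = (y[group == 1] - y_true[group == 1]).mean()
    print(f"{label}: mean error for group 1 = {gap:.1f}")
# raw: mean error for group 1 = -10.0   (the bias in x, scaled by a)
# debiased: mean error for group 1 = -2.0   (smaller, though not zero)
```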

OK, so here’s the half that I think Domingos gets right: “Bias” is relative to some model and some particular scenario. If “a x + b” is biased, arising from some biases in “a,” “x,” and “b,” we can expect these component biases to themselves vary over time and across contexts. A number, and thus the result of an algorithm, can be biased in favor of members of a particular group in some settings and biased against them in others. “Debiasing” is, by its nature, not automatic, and attempts to debias can make things worse, both directly and by adding another input into the algorithm and thus another potential source of bias. One can draw an analogy to Keynesian interventions into the economy, which can make things better or can make things worse, but which also complexify the system by adding another player.

And here’s the half that I think Domingos gets wrong: He’s too sanguine about existing algorithms being unbiased. I don’t know why he’s so confident that existing algorithms for credit-card scoring, parole consultation, shopping and media recommendations, etc., are unbiased and not capable of outside improvement. I respect his concern about political involvement in these processes—but the existing algorithms are human products and are already the result of political processes. Again, his concern that “progressives will blithely assign prejudices even to algorithms that transparently can’t have any” is missing the point that the structure and inputs of these algorithms are the result of existing human choices.

I don’t know what’s the best way forward here—as I said, I think Domingos is half right, but his discussion is ridiculously simplistic, unworthy I think of a professor of computer science. In his article, he writes that he “posted a few tweets raising questions about the latest changes — and the cancel mob descended on me. Insults, taunts, threats — you name it.” I hate twitter. I recommend he post his arguments not just on twitter (where discussion is little more than a series of yeas and nays) or on a magazine website (which will either have no discussion at all, or have discussions that degrade into insults, taunts, etc.) but as a blog, where there can be thoughtful exchanges and he can engage with these issues.

To return to the title of the present post: I see two things going on here. First, the term “bias” is vague and can mean different things to different people—we’ve discussed this before. Second, I suspect that Domingos is following the principle of “the enemy of my enemy is my friend,” which as we know can lead us into all sorts of contortions in a nonbinary world.


  1. Often, we take for granted the rules that govern the current system. I work in a field adjacent to credit-card scoring. I often tell people about how credit scores started (https://www.lendingahand.com/2010/03/where-the-heck-did-fico-come-from/) and it is hilarious. Especially when you tell people who lived through Welcome Wagons but had no idea what their purpose was.

    Can you really imagine allowing companies to send fake Welcome Wagons to new homeowners and make lending decisions based on how clean the living room is? But we take the FCRA for granted, since this problem was addressed decades ago. It also makes things like China’s Social Credit System seem less strange.

  2. I believe it’s a very challenging area.

    For instance, bias consideration in machine learning often involves “fair” predictions: “… proposing definitions of fairness and creating algorithms to achieve these definitions… Others have proven fairness impossibility theorems, showing when different fairness constraints cannot be achieved simultaneously. For instance, the two fairness definitions at the heart of the debate over COMPAS’ fairness (calibration and balance for positive/negative class) cannot be achieved simultaneously in nontrivial cases” https://arxiv.org/pdf/2005.04176.pdf (A small numeric check of this impossibility appears at the end of this comment.)

    I have found in the past that the folks promoting improvements in ethical considerations in science often lack the technical understanding of the science involved and of how requirements meant to make things fairer will most likely make them worse. So initially it’s a good bet that they will do more harm than good.

    The first instance I encountered was an ethicist who insisted that any clinical trial conducted with less than 80% power was totally unethical, but who understood little if anything of how power was guesstimated or what it even meant.

    That also explains why so many people can get on board so fast – very little training is required and any field of science can be jumped into.
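
    To make the impossibility result quoted above concrete, here is a small numeric check (the counts are invented for illustration, not COMPAS data): the same risk score can be perfectly calibrated within two groups and still produce very different false positive rates whenever the groups’ base rates differ.

    ```python
    # Two groups get the SAME risk score, and within each score bucket the observed
    # positive rate equals the score, i.e. the score is calibrated in both groups.
    groups = {
        "A": {0.2: (600, 120), 0.8: (400, 320)},   # score: (n_people, n_positive)
        "B": {0.2: (200, 40),  0.8: (800, 640)},
    }

    threshold = 0.5   # flag as "high risk" when score >= 0.5
    for g, buckets in groups.items():
        n = sum(cnt for cnt, _ in buckets.values())
        pos = sum(p for _, p in buckets.values())
        # false positives: flagged high-risk but actually negative
        fp = sum(cnt - p for s, (cnt, p) in buckets.items() if s >= threshold)
        fpr = fp / (n - pos)
        calibrated = all(abs(p / cnt - s) < 1e-9 for s, (cnt, p) in buckets.items())
        print(f"group {g}: base rate {pos/n:.2f}, calibrated={calibrated}, FPR {fpr:.2f}")
    # group A: base rate 0.44, calibrated=True, FPR 0.14
    # group B: base rate 0.68, calibrated=True, FPR 0.50
    ```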

    • Also work in ML, have the same impression. Andrew, you are just incorrect about the meaning of “bias” that people have in mind. Or, more precisely, they mean anything and everything by the word, so long as they can use some vaguely associated chain of mathematical reasoning in order to justify their preordained political meaning.

      I care a lot about getting machines that are good at doing what we want to do, and avoiding unintended harms to groups that don’t deserve them. But it’s precisely because I care about the topic that popular discussion of bias in algorithms is so infuriating.

  3. Oof. That op-ed reads as more than half political (it might be more efficient for him to use “liberals” when referring to militant liberals, and to reserve the phrase “non-militant liberals” for a situation in which he encounters one of those) so my estimate of how much it gets right starts off lower.

    Beyond that, though, his characterization of “debiasing” is silly. If you have a biased system, the objective should be not to simply correct the output (“ensure that the same number of women and men are accepted” for a credit card or change parole recommendations “for the sake of having a proportional number of whites and blacks released”). The goal should be to identify and correct for the sources of bias, which should have the effect of more equitable decisions by an algorithm. This may sometimes be impractical because to do so would require data that isn’t available. That’s the time to evaluate whether you should be relying on an algorithm to make your decisions for you.

    • > The goal should be to identify and correct for the sources of bias, which should have the effect of more equitable decisions by an algorithm.

      Why _should_ that happen? If a car insurance pricing algorithm is biased and results in lower premiums for women, is the objective to have “a more equitable decision” or not? The more direct solution in that case is to forbid differential pricing based on gender.

      • “The more direct solution in that case is to forbid differential pricing based on gender.”

        Yep, sure. But removing gender as a factor in determining insurance rates is not the same as adjusting the output of the algorithm to ensure that men and women are paying the same average rate. The latter sounds more like the op-ed’s characterization of debiasing, which is what I find silly.

        • If the algorithm is said to be “biased” because it gives different results for men and women, what would “debiasing” mean other than adjusting the algorithm to ensure that it gives the same result?

          Maybe that’s not your definition of “biased”. It surely isn’t the best definition of “biased”. But it’s not an unusual definition.

          Last year there was some controversy about Apple’s credit card, for example: https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/

        • And yet that’s exactly the definition of bias used in critiques of algorithms (Carlos is right). Gender need not be a predictor in the model to produce unequal results by gender. That’s not a problem (for some of us, but others may still object because they view disparities as discrimination rather than as possible indicators of such) if the key predictor, say riskiness, in fact varies by gender, but what’s striking is that unequal results by gender *conditional on equal riskiness* can result from a model excluding gender (e.g., https://www.nber.org/papers/w28222 ). Measurement and modeling remain complicated.

        • It’s not my definition. I agree that there’s a lot of simplistic thinking about this, and it gets tricky fast. Beyond gender-correlated driving habits, what about differences in choice of vehicle or credit history, either of which could correlate with gender if it involved individual choices that reflect unequal constraints? Spoiler: I don’t have a concise answer.

          Where I was going with my initial comment, though:

          Actuaries come up with your rates by making assumptions about you given the best data they have, which in many cases isn’t very good (I didn’t become a better driver on my wedding day), but from the insurer’s perspective it’s better than nothing. This can be unfair at an individual level in all sorts of different ways.

          Our criminal justice system is a better example of why we should be concerned. Racial bias shows up in laws, in prevention, in your likelihood of being arrested, and in what happens to you once you are, and in your likelihood of future scrutiny/arrests. The individual unfairness can have a huge impact on people’s lives, and it compounds itself. I’m therefore skeptical of any claim that an algorithmic parole recommender is free from racial bias.

          So, fundamentally, I’m agreeing with Andrew’s comment about Domingos being too sanguine about algorithms being unbiased, but beyond that I think we need more skepticism of predictive algorithms as a class. When I run into this topic the conversation always seems to be about fixing the algorithms, but in many cases I think we’d do better to take responsibility for our decisions rather than to delegate them to tools we don’t understand.

        • Jeff said,
          “So, fundamentally, I’m agreeing with Andrew’s comment about Domingos being too sanguine about algorithms being unbiased, but beyond that I think we need more skepticism of predictive algorithms as a class. When I run into this topic the conversation always seems to be about fixing the algorithms, but in many cases I think we’d do better to take responsibility for our decisions rather than to delegate them to tools we don’t understand.”

          +1

        • So what do you think of the reverse: studies that show disparate outcomes when controlling for variables? Have they proven implicit bias in decision makers? Or are they missing factors that went uncoded in the analysis, but affected the decision makers’ judgment?

        • >The latter [adjusting the output of the algorithm to ensure that men and women are paying the same average rate] sounds more like the op-ed’s characterization of debiasing, which is what I find silly.

          Yet this is the primary objective of “equity” based definitions of fairness. It seems like it would be much more effective to attack this insipid definition of equity, than to quibble about how it is factored into an objective function.

  4. If we want a deep learning system to recognize humans in an image, and we train it exclusively on images that contain white people, then we shouldn’t be surprised if the system becomes unable to recognize non-white humans in an image.

    Here, the bias comes from the training data. In many cases, it may come from the way the training goals are defined. For example, the training goals may train for averages, with the result that the system’s decisions can’t properly apply to individuals who depart from those averages. Or the training goals may themselves be biased.

    It’s just like researchers’ degrees of freedom, but probably harder to figure out in any particular case.

    I also notice that there seems to be widespread confusion between an “algorithm” and the system arrived at by applying that algorithm. For deep learning networks, the algorithm is a method for consuming data and developing parameter values that cause the outputs of the system to meet some criteria. An actual deep learning system is a particular set of network parameters that the algorithm has found. The two are not the same.
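
    A minimal sketch of that distinction (the data, the group indicator, and the recording offset are all invented): the fitting procedure below, the “algorithm” in the first sense, is identical in both runs, but the two trained systems it produces give different predictions for the same individual, because one training set under-represents and under-records a group.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def make_data(n, n_minority, minority_label_offset):
        """Outcomes truly follow y = 2*x + 5 for everyone, with an optional
        recording bias applied to the minority group's labels."""
        g = np.r_[np.zeros(n - n_minority), np.ones(n_minority)]
        x = rng.normal(10, 3, n)
        y = 2 * x + 5 + rng.normal(0, 1, n) + minority_label_offset * g
        X = np.column_stack([np.ones(n), x, g])    # intercept, x, group indicator
        return X, y

    def fit(X, y):
        # the "algorithm": ordinary least squares, identical in both runs
        return np.linalg.lstsq(X, y, rcond=None)[0]

    model_clean  = fit(*make_data(5000, 2500, 0.0))    # representative, unbiased labels
    model_skewed = fit(*make_data(5000, 100, -3.0))    # minority rare and under-recorded

    person = np.array([1.0, 10.0, 1.0])                # a group-1 individual with x = 10
    print("clean-data model predicts :", person @ model_clean)    # about 25
    print("skewed-data model predicts:", person @ model_skewed)   # about 22: same algorithm, different system
    ```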

    • +1 We have an Alexa. Apparently, Amazon trained its voice recognition without using very many second-language speakers. My wife is Chinese. English is her second language, although she speaks perfectly well, just with a distinctive accent. Alexa often won’t follow her instructions. She says something several times and then the white guy has to repeat it. It really ticks my wife off. Imagine when machine learning systems are ubiquitous. People who think that debiasing these systems is just more woke nonsense are kidding themselves.

      • One distinction I haven’t seen in the comments is the distinction between deep learning technologies used in consumer products and those used, for example, in law enforcement or government.

        In the Alexa example, I just see this as an engineering problem. If tech wants to build bad models, and have their technology not work on foreigners, by all means, build a crappy product. I won’t buy it, and hopefully word will get around and others won’t buy it either.

        The same thing happens in China. Technologies in computer vision, for example, don’t work as well on foreigners. It’s an inconvenience for me, but I don’t really care, because it’s not affecting my life too much besides wasting my time.

        The problem arises when deep learning technologies are used in government and law enforcement, or in autonomous vehicles, where they can affect people’s lives. Some people, who aren’t computer scientists/statisticians, read tech hype literature and think it’s the truth, so they will believe a machine before themselves. Critical computer scientists don’t. There should be some kind of systematic vetting or regulation before we allow these technologies to impact lives. Computer vision shouldn’t be allowed to make arrests. We should have some limits or tests on autonomous vehicles before putting them on the road with average citizens, or at least have someone behind the wheel. The vetting would need to be done by someone who has a CS background and has built these technologies before, but who is sensitive to social issues like this.

        Do I think we should regulate scientific research in computer vision? No! Let it flourish. We already have too many practical limits that make research difficult. It should get more support, especially before we start putting it on the street.

    • It’s not just about the training set, though. Let’s say you have an algorithm that makes heavy use of edge detection. It will have an easier time with any image that has sharp contrasts with both bright and dark areas, compared to one with many smooth gradients and similar colors. Then add to it the fact that camera sensors, films, printers and even image file formats were fine-tuned for many decades to reproduce white skin tones well, and even with the most unbiased, diverse training set your algorithm will still have a harder time with some groups of faces. You can of course argue that it is then a bad, buggy algorithm, and I agree, but the problem is that the bug will demonstrate itself by what will effectively be a bias. In such a case, debiasing would really be a specific kind of debugging, although I think that debiasing is much more general than that.

      • The whole history of skin color in photography is such a good example of how inequality is reproduced not by individual intention but by history and context. Some people really struggle with the idea that racism is not only about intentional individual actions, and that’s when they get so distraught about critiques that are really about structure.

  5. Domingos’ stand is that the set of instructions is unbiased.

    In Domingos’ analysis, the term algorithm refers to the set of instructions that process the input data and produce an output. Perhaps a little context about this discussion in the ML community could be useful:

    “If x is biased, so that ‘a x + b’ is biased.” Yes. But in Domingos’ terms, the algorithm is the set of instructions “take the input, multiply it by a, then add b.” The expression “a x + b” is the complete system (algorithm + data).

    • But this isn’t addressing anyone’s actual concerns. At my job, which is not deep learning based, I cannot say “My algorithm works perfectly on unbiased iid data. Yes I knew ahead of time that we didn’t have that, and I knew some of the ways in which our data is biased that the algorithm isn’t accounting for, but you can’t blame me. My algorithm is perfect on its own terms”. A segment of ML practitioners want to be able to make this argument, and seem to not realize how hilariously unprofessional it makes them look. Dealing with the real world requires paying attention to the way your data was actually formed.

      Maybe a bit more to the point, does Domingos think that humans are not also running an algorithm? If humans are biased and algorithms are always unbiased, is he taking some brave philosophical stance here? Can he outline the thing that he thinks causes human bias that cannot be happening in his algorithm? This whole thing just reeks of sloppy thinking.

      • “People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.”

        -Pedro Domingos

        I really think he’s much more sympathetic to the applied challenges than this article suggests. I think he’s just too vague about his real target. It does SEEM like he’s directing his critique at the whole of fairness, robustness, and ethics in machine learning, but I think he’s more specifically against twitter mobs and clumsy non-discrimination legislation. Cynically, you can say he’s being intentionally vague and provocative so he can get lots of cancellation attention; optimistically, he’s just being angry in response to other people being angry at him.

        • Well that would make more sense then. I certainly agree that many attempted critiques of algorithmic bias do a bad job at pinpointing the underlying problem (e.g. frequently things are blamed on “bias” that are really causal inference issues that will apply to the model no matter how you try to ‘correct’ it). The attention paid also seems to be focused on relatively specific issues that are popular among some academic circles but isn’t really proportional to where the problem is likely most severe. The same underlying technical issues that the ML ethics community is concerned about are almost certainly contributing to the role of social media in increasing political radicalization, but that angle on the problem gets less explicit attention. If this is the kind of thing that Domingos is upset about then I can certainly understand where he’s coming from.

    • That’s really not what he is saying. If he were saying that doing arithmetic is not biased, that would not be getting any attention. He is saying that people are wrong for questioning where the a and b came from, and that we should not care that a and b produce results that reproduce inequality (and that is probably due to the methods used to estimate a and b).

      https://www.insidehighered.com/admissions/article/2020/12/14/u-texas-will-stop-using-controversial-algorithm-evaluate-phd

      “Within the system, institutions were encoded into the categories “elite,” “good” and “other,” based on a survey of UT computer science faculty.”

      First gen students, students attending the most common kinds of colleges, students who couldn’t afford to board at a college, students at HBCUs and HSIs need not apply.

      But the calculation of the applicants’ scores was perfect!

  6. It seems to me that Domingos is committing the same error I see in many pop-science depictions of machine learning in that he conflates at least two possible definitions of the term “algorithm”. Broadly speaking, the term “algorithm” refers to a procedure for converting givens that have a certain form to a result that has a certain form. But in this sense, “algorithm” can be applied to machine learning in two ways:

    First, “algorithm” describes the training procedure; this includes things like backprop, A-star, gradient descent, batch sampling, etc., all the stuff that tends to fall in the purview of traditional computer science.

    Second, you can say that a trained model represents an “algorithm” for converting an input vector into an output (e.g., a classification, recommendation, probability, etc.).

    So far, I’m just echoing Andrew’s main point: “Algorithm” in the first sense refers to abstract and agnostic procedures that get you from A to B (or X to Y). In this sense, an “algorithm” is not so much “unbiased” as it is “abiased”—it doesn’t really make sense to use the term “bias”, since by this understanding, the algorithm isn’t actually *doing* anything, it is just a set of instructions on a sheet of paper or text file or circuitboard. This might be what Domingos means when he claims that an algorithm is “mathematically incapable of bias”, though “mathematically incapable” is not the right way to say it, since it implies that you can prove something about the bias the algorithm would produce when the term “bias” doesn’t even apply to an algorithm considered from this perspective.

    But here’s where I’d elaborate a bit beyond what Andrew said in the main post in a way that captures Domingos’ error more fully: The second sense of “algorithm” refers to the trained model. So in this sense, the “algorithm” is not “y = ax + b”, it is “{y_1, y_2, …, y_n} = 5.40 * {x_1, x_2, …, x_n} – 45.33”. Bias arises in how the (x, y) pairs are chosen both during training AND in application, as well as how the model parameters are estimated on that basis. The point is that there are human choices being made at every step of this process: What input features are relevant? What instances should we include in training? What are the types of instances we expect the model to be applied to in the future? What algorithms (!) should we pick to estimate model parameters?

    To claim that the resulting model is free from bias because, once trained, it does not require human intervention makes as much sense as saying that a chair is “mathematically incapable” of collapsing because saws, hammers, glue, and nails are all well understood. “Algorithms” in the first sense are just tools sitting in the shed, and it doesn’t make sense to use the term “bias” in describing them. “Algorithms” in the second sense are chairs, they are the trained models meant to be used, and they inherit any biases that result from the long chain of human decisions that go into their construction. Domingos is trying to ascribe properties of the first sense of “algorithm” to the second sense, and it just doesn’t work.

    The failure to appreciate all the choice points and associated pitfalls is similar to how cargo-cult social priming researchers defend their work: “We used randomization and statistics, so our conclusions must be accepted because that’s what we were told in statistics class.” Domingos’ error also, I suspect, arises from an educational failure: As a computer scientist, he is probably more familiar with the term “algorithm” in its first sense, but now hears statisticians and machine learning types (and the popular media) using “algorithm” in the second sense and assumes they refer to the same thing. In fact, they are different types of entities entirely, and I think we need to be much more careful in how we use these terms and encourage our students to be careful as well.

    • I don’t agree with your claim that “human choices are being made”. The whole point of the procedure is to eliminate poor human judgement! Human choices are eliminated or dramatically reduced because tens or even thousands of models and variables can be tested to maximize the unbiased outcome.

      • I don’t agree that the whole point is to eliminate poor human judgment.

        Maybe that is the goal sometimes, and maybe it is even achieved sometimes, but to the extent that practical applications of machine learning have a “whole point”, it is to make decision making more efficient. This way you can make many more decisions per unit time and/or free up time that would otherwise have been used for decision making. Instead of making better individual decisions, you reduce the cost of each individual decision so your errors don’t matter as much.

        Human judgments figure in two ways:

        1) As I said, they are the source of the model that is ultimately applied. As a result, when the model is applied, it is following a procedure that is the outcome of a chain of human decisions which may or may not be any good.

        2) There is an implicit human choice in deciding to use the model at all. Every time we get an output from a model, we have to decide whether to accept it as is, to adjust it based on other factors, reject it, or tweak the model, among many other possible outcomes.

      • That just strikes me as silly. The fact that a computer might be testing all these models and maximizing something does not mean no “human choices are being made.” Relying on that mechanism to produce a model is a human choice. As a semantic matter, perhaps gec is wrong to say “human choices [are] being made at every step of this process.” If you automate the process sufficiently, you might reduce the human choices to only the first step of choosing that process. But that is a trivial difference – it is still a choice to not intervene in that process at each step. I found gec’s distinction of the two uses of the word “algorithm” both helpful and right on target.

  7. I don’t think this article is quite as bad as all that. He does acknowledge

    > Why the fuss? Data can have biases, of course, as can data scientists. And algorithms that are coded by humans can in principle do whatever we tell them.

    Now, in my personal reading of literature on fairness in machine learning and statistics, the research focus has been on these elements that he acknowledges. I don’t know of any work that presupposes his straw-man of literally racist objective functions. But maybe his “whole subfield of AI” exists somewhere I haven’t seen. Anyways, he does acknowledge that machine learning can be misapplied. To say “the algorithm provably produces an unbiased estimator, it’s just the data and applications that are problematic” feels a bit like playing a semantic game — everybody already knows that, we’re just speaking in shorthand, but maybe there really is some research subfield out there where people need to hear that minor distinction.

  8. Andrew: I disagree with your suggestion that bias is hidden in X. Whatever X is, it’s specifically selected to be a *profit* maximizer. By definition, it’s not biased.

    The simple reality is that all the incentives are against bias. Why would any company select – for example – a loan qualification algorithm that delivers lower profit? After all, if Chase provides a loan to someone Jamie Dimon considers undesirable, he won’t be living next door to them. What incentive does he have to exclude profitable customers? None!!! Yet there are very strong incentives to select against bias.

    If you want to say, “Well, Mr. Tall has a poor credit history because historically Talls have been discriminated against,” you can make that claim. But Mr. Tall still has a poor credit history. But if that’s the “discrimination” hiding in X, I’m all for it. Mr. Tall can go to the government and get his special Historically Discriminated Against loan. But companies shouldn’t be forced to provide it.

    • All of the incentives are in favor of not running into segmentation faults by accidentally dereferencing a null pointer, but it happens in enterprise software anyway. You can also make the argument that “that’s just programmer error, computers don’t make mistakes, they just do what the programmer tells them to, and programmers are incentivized to not crash computers, therefore all research into eliminating segmentation faults is pointless.” I guess, but the point is that in many cases the ability to dereference null pointers, and NULL as a value for every type, was a tricky interface to begin with — Tony Hoare famously called it his billion-dollar mistake. Hence, some research was done on stricter typing systems, and now many languages like Scala have a monadic optional type instead of NULL, and systems based on Scala don’t get null pointer exceptions.

      Long story short, even if incentives push against bias, people make mistakes and systems can be unwieldy. Much of engineering research involves designing better systems where mistakes are more difficult to make or which are more robust to mistakes. Incentives aren’t a magic stick that means all problems are automagically already fixed — the point is that incentives pressure people to do valuable work, which is exactly the kind of work you’re saying shouldn’t be done.

      > The simple reality is that all the incentives are against airplane crashes. Why would any company invest money in – for example – an airplane that flies once and crashes?

      Oh wow, therefore I guess all that research into building airplanes that don’t crash should never have been done — the airlines simply would have chosen to not crash because it saves them money anyways.

      • To take a more direct example, examine the Gaussian copula

        https://en.wikipedia.org/wiki/Copula_(probability_theory)#Quantitative_finance

        Their estimator was “unbiased,” but the structure of the model didn’t allow for asymmetric interdependencies between upside and downside regime behavior, and the only data that had been available were from a booming housing market. The incentives were all in favor of correctly appraising downside risk — that’s how they could most profitably price derivatives — but they used the copula anyway and it blew up in their and all of our faces.

        • That’s just false. The incentives were most definitely not in the direction of correctly appraising downside risk, everyone makes more money if you don’t.

    • There is a long history in economics of saying that “discrimination contains the seeds of its own demise,” meaning that discrimination generally reduces profits so competitors who do not discriminate will eventually out-compete those who discriminate. That argument has never been accepted by sociologists (at least the ones I have known). Economists have attempted (with varying success and failure) to explain why discrimination continues to exist – is it a failure of competition, does discrimination really exist, or can we invent (sorry, I mean describe) reasons why discrimination is actually profitable.

      While the standard economics approach seems to fail to describe reality, it is not that easy to just dismiss it. As jim says, “all [perhaps it should be ‘most’] the incentives are against bias.” It may be worth looking at those incentives. However, it isn’t clear to me that bias in an algorithm reduces profits. Bias in insurance rates might show up as more systematic misclassification (of risk) for a particular racial group relative to another. If that group is sufficiently small, then it may be profitable to trade such errors for increased accuracy assessing risk in the larger group. If you have to trade off accuracy in the two groups (which you often have to do in classification models), then the effect on profits isn’t clearly negative.
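
      A toy illustration of that last point (every number here is made up): a single cutoff chosen to maximize pooled accuracy, a crude stand-in for profit, can leave a small group with a much higher error rate while the overall objective is still being maximized.

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      def simulate(n, bad_rate, mu_good, mu_bad, sd):
          """Risk scores for one group; higher score = model thinks 'bad risk'."""
          bad = rng.random(n) < bad_rate
          score = np.where(bad, rng.normal(mu_bad, sd, n), rng.normal(mu_good, sd, n))
          return score, bad

      # Majority group: large, and the score separates good/bad risks cleanly.
      s_a, bad_a = simulate(90_000, 0.2, mu_good=0.30, mu_bad=0.70, sd=0.10)
      # Minority group: small, and the score is much noisier for them.
      s_b, bad_b = simulate(10_000, 0.2, mu_good=0.40, mu_bad=0.60, sd=0.20)

      scores = np.r_[s_a, s_b]
      bad = np.r_[bad_a, bad_b]

      # Pick the single cutoff that maximizes overall accuracy (a stand-in for profit).
      cutoffs = np.linspace(0, 1, 201)
      acc = [((scores >= c) == bad).mean() for c in cutoffs]
      best = cutoffs[int(np.argmax(acc))]

      for name, s, b in [("majority", s_a, bad_a), ("minority", s_b, bad_b)]:
          err = ((s >= best) != b).mean()
          print(f"{name}: error rate at shared cutoff {best:.2f} = {err:.3f}")
      # The pooled objective is maximized, yet the minority group's error rate comes
      # out many times the majority's: the "optimal" model quietly trades their accuracy away.
      ```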

      • Dale writes: “There is a long history in economics of saying that “discrimination contains the seeds of its own demise,” meaning that discrimination generally reduces profits so competitors who do not discriminate will eventually out-compete those who discriminate.”

        It’s amazing to me that this type of economic thinking persists. We can make the same argument about any number of sub-optimal behaviors that persist. Why do businesses break their promises? Why do people take unnecessary risks? Courts are filled with disputes that both sides have an incentive to resolve swiftly. Why don’t they? Even if at the margin people have an incentive not to discriminate, intra-marginally they are not under any pressure to stop their bad behavior. And there is no reason to think a priori that an economic player cannot persist in his bad behavior for his entire life before the economic incentives knock him out of the market. Then no one learns the lesson. Look at biology: there is an “incentive” to survive, and that incentive does result in change, but only over geological time scales. Arguing that because there is an incentive not to do X, therefore X doesn’t exist, is specious.

        • > It’s amazing to me that this type of economic thinking persists. We can make the same argument about any number of sub-optimal behaviors that persist

          Hell, this kind of reasoning makes the entire idea of a business innovation completely impossible. It’s impossible that someone came up with a new, more efficient way of structuring firms — that would imply that for a long time, people were doing things inefficiently, and doing things as efficiently as possible is profit maximizing. If you want to be really cheeky, you could even argue that every profitable technological innovation should have been made spontaneously at the dawn of free-markets, as any later invention implies firms giving up free lunches. No mistakes, ever.

        • While I agree in spirit with what you are saying, your reasoning about innovation is not quite apt. The economic dogma Dale introduced has always been applied to inefficiencies within the existing production possibilities frontier (as far as I am aware). By definition, innovation moves the production possibilities frontier outwards. It does not lead, nor has it ever led, to the absurd conclusion that you posit (that innovation in a market economy should have been fully exploited at t0).

        • There is this economics joke: how do you know who the economist walking down the street is? It’s the one who does not pick up the $20 bill, saying that if it was real, someone would have already picked it up.

          Note that I am not endorsing the traditional economics view of discrimination. It is a classic case of believing in a model at the expense of reality. However, that model is not so easily dismissed as you might think, unless you have no faith in competition. Competitive forces should undermine practices that are clearly inefficient, which overt discrimination should be.

          In any case, the situation with AI is different than discrimination. I don’t see how discriminatory application of AI is generally inefficient. Again, if you use the case of a classification model, when the classes are unbalanced (which they usually are), it can easily be profitable to use an algorithm that differentially produces errors across groups – in such a case, competition would only enhance its use.

        • > However, that model is not so easily dismissed as you might think, unless you have no faith in competition. Competitive forces should undermine practices that are clearly inefficient, which overt discrimination should be.

          I agree with a trust in competitive pressures to push firms in a certain way in the long run. However, what Jim said is quite different. He disagreed that “x” can be biased when x is an abstract representation of any covariate, simply because the estimators are being used to maximize profit and bias is unprofitable. To contradict the position that ANY abstract x COULD be biased on this basis is to hold the absurd belief that firms can never make a non-profit-maximizing decision or mistake. It’s a ridiculously strong claim.

          I also agree that unfair systems can be profitable for lots of reasons; I just wanted to harp on the silliest bit I read.

      • “There is a long history in economics of saying that ‘discrimination contains the seeds of its own demise,’ meaning that discrimination generally reduces profits so competitors who do not discriminate will eventually out-compete those who discriminate”.

        Yes, but this happens in highly stylized models that are not intended as descriptions of reality. They are useful in the sense that when we see that discriminators are not out-competed in reality we can start to ask what do we have to add to the model to get this result and describe reality better.

    • Why should a company select, etc.? Because the company is run by biased people. For example, some years ago the Civil Rights Commission sent a bunch of people to apply for vacant apartments at buildings owned and operated by Trump. They all had good jobs and good clothes and credit histories and public records. All the black ones were refused. All the white ones weren’t. I don’t think a computer program was used, but a program could easily be written to do the same thing.

      Maybe that was indeed a profit-maximizing strategy, on the assumption that if we let blacks in we’ll get fewer future applicants, but it was a short-sighted one. Enough of that sort of thing and society fails and nobody makes a profit. It is also a profit-maximizing thing, in the short run, to cheat your customers.

      I guess the key point is whether or not you should have to consider the effect of your algorithms on other people and upon society, or be free just to maximize your own profits.

    • jim, you say “Whatever X is, it’s specifically selected to be a *profit* maximizer. By definition, it’s not biased.”

      That’s almost funny. Sure, you can define the product of your algorithm to be “unbiased” and then by definition it is unbiased. Agreed!

      It’s quite possible that the Red Sox maximized profits by failing to add a black player until 1959: the fans didn’t want black players, perhaps to the extent that they’d rather lose than have the team play black players, and maybe if the Sox had added a black player in, say, 1953 they would have made less money.

      By your definition, the decision not to hire black players was “unbiased”, but perhaps you can see why a lot of people think that’s a problem. If a superior baseball player is denied a chance to play for a team merely because he’s black, you and Domingos can say there’s no problem there, but most of the rest of us think there is.

    • > Whatever X is, it’s specifically selected to be a *profit* maximizer. By definition, it’s not biased.

      > The simple reality is that all the incentives are against bias. Why would any company select – for example – a loan qualification algorithm that delivers lower profit? After all, if Chase provides a loan to someone Jamie Dimon considers undesirable, he won’t be living next door to them. What incentive does he have to exclude profitable customers? None!!! Yet there are very strong incentives to select against bias.

      Except we have reams of evidence that people and organizations are not rational profit maximizers. Businesses throughout this country refused to serve Black people until the federal government made refusal illegal. And this wasn’t just the South. The Green Book started as a reference to Black friendly businesses around New York City. Most people place greater emphasis on their social identity than on their economic self-interest; especially if the trade off is simply making less money rather than losing money.

      Moreover, it is difficult for people to recognize their own biases. And given that bias is endemic to human culture, inputs will be biased and it will be especially difficult for non-diverse organizations to recognize it. Using models that correct for unconscious and societal bias provides a counterweight.

      For a completely non-political example, the exposure system in non-professional cameras overweighs the bottom portion of an image* because people tend to place their subject towards the bottom of the frame, instead of the center. Camera companies didn’t try to teach the masses to overcome their bias; their engineers (outside experts) adjusted the exposure systems to compensate for it.

      * This is a simplification in the days of computational photography, but, back in the days of film point-and-shoots, it was a pretty basic curve across the lower two-thirds of the frame.

      • While I agree with the fact that people can still be racist even when it is not in their direct financial interest (for instance, losing customers), I would also say that, contrary to what is usually thought, statistics (let’s drop the misleading “algorithms”) can be a good way to counter that.

        For instance, I can well see how a racist shop owner may refuse a prospective client he is prejudiced against. I find it much more dubious to suppose a big institution would take pains to manipulate a scoring system in order to exclude those people.

        By constraining the human judgment, we are decreasing bias, not increasing it.

  9. Noah Smith wrote a good article last week that addresses the ethical questions of algorithmic bias ( https://noahpinion.substack.com/p/vaccine-allocation-age-and-race ). It was about the priority of Covid-19 vaccination, but applies more generally. He makes the important point that “race-neutral ethics can imply race-sensitive policy.” In the case of vaccination, if we start from the premise that all lives matter equally, regardless of race, then, since we know that the virus is more lethal for Blacks than for Whites, we should prioritize Blacks over Whites. Of course, the question is more complex, and Smith addresses the complexities.
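
    A small sketch of that point (the population sizes and risk numbers are invented, not taken from Smith’s piece): a rule that is completely group-blind, vaccinate whoever has the highest individual risk first, ends up directing most doses to the higher-risk group.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Invented numbers: two groups of equal size, but group B faces twice the
    # fatality risk if infected (standing in for the disparity described above).
    n = 100_000
    group = np.r_[np.zeros(n // 2, int), np.ones(n // 2, int)]
    risk = np.where(group == 0, 0.002, 0.004) * rng.lognormal(0, 0.5, n)

    doses = 20_000
    # Race-neutral objective: every expected death averted counts the same,
    # so simply vaccinate the individuals with the highest risk first.
    chosen = np.argsort(risk)[-doses:]

    share_b = group[chosen].mean()
    print(f"group B is 50% of the population but {share_b:.0%} of those vaccinated")
    # With these made-up numbers, the group-blind rule allocates most doses to group B.
    ```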

    • Think about this. People are trying to understand why there is such a race disparity in Covid outcomes.
      Well, blood oxygen levels are used to determine treatment.
      “Thus, in two large cohorts, Black patients had nearly three times the frequency of occult hypoxemia that was not detected by pulse oximetry as White patients. Given the widespread use of pulse oximetry for medical decision making, these findings have some major implications, especially during the current coronavirus disease 2019 (Covid-19) pandemic.”
      https://www.nejm.org/doi/full/10.1056/NEJMc2029240

      Well then, it is not at all surprising that treatment decisions based on these levels lead to differential outcomes by race, even if the medical staff followed the instructions exactly.

  10. I think this post might benefit from a little more context from someone in the AI community. First, while I don’t know that Andrew meant it this way, some might read “retired computer science professor” specifically in this context as somewhat pejorative. Right or wrong, Pedro is an extremely qualified voice, very senior and (setting aside current controversy) respected in the discipline.

    Second, Pedro is making a distinction between algorithms, and the data. That is a totally reasonable distinction, if not ironclad. I also can say with substantial confidence that Pedro would agree that an algorithm fed biased data can/will give rise to biased outputs.

    Much of what Pedro seems to be concerned with in this article is a movement of growing power that wants to explicitly add to the mathematical formulations (algorithms) of our AI systems fairness objectives that ensure certain balances of outcomes. This is well known to have exactly the properties Pedro describes, i.e. a reduction in a global performance or accuracy metric of the system in exchange for the balance property. The reason is simply that when you add constraints to an optimization problem, you get a reduction of 0 or more in the quality of the optimum relative to the unconstrained problem. In practice, the reduction is almost never 0.

    I also agree with Pedro that algorithmic decision making presents tremendous opportunities for a more just society. Algorithms can be audited to a degree that a human decision maker just can’t. Counterfactual analysis is often nearly trivially easy as well.

    My own instincts are mostly in line with Pedro’s. I’m uncomfortable with explicitly coding some sort of parity of outputs into our algorithms. It’s too easy to get very wrong, and a very slippery slope to structuring decision making to whatever favored group gets to decide on loss functions.

    Where I think Pedro is somewhat more wrong is that detecting biased data, debiasing it or collecting better datasets, etc. will itself be an algorithm driven process. Still, I’m a lot more comfortable with that.

    • Mlscientist:

      I did not mean “retired” to be negative. I hope to be a retired statistics professor some day! I described the author of that article as a retired computer science professor because that’s how he was listed in the published article.

      Regarding your technical point, I disagree with your framing in terms of adding constraints to an optimization problem. First, not all algorithms are optimization problems. Second, there can be lots of dispute about what we should be optimizing. Third, in real life we regularize our solutions to get more stable estimates. We do Bayes or lasso or variational inference or whatever, we don’t do least squares. But the bigger problem, I think, is the assumption that whatever happens to be currently done, whatever it is, happens to be optimal. Again, just cos your algorithm computes “y = a x + b,” it doesn’t mean that “a” and “b” are well estimated, it doesn’t mean that “x” is well measured, and it doesn’t mean that “a x + b” is what you actually want: it could just be a conventional solution that someone programmed in way back when, in the same way that we use logistic models etc. by default.

      I do some work in survey research. If someone were to tell me not to adjust surveys to account for data imperfections because current methods used unbiased algorithms, I’d just laugh. Actually, the president of the American Association for Public Opinion Research said something like that a few years ago, and I did laugh. Sometimes very senior and respected people can be too strongly committed to the way that things have been done in the past.

      • > I did not mean “retired” to be negative. I hope to be a retired statistics professor some day!

        You may or may not be aware, but he’s actually younger than you :-)

        > I described the author of that article as a retired computer science professor because that’s how he was listed in the published article.

        As far as I can see, the article says “professor emeritus”. More glamorous than “retired”.

        • I don’t get it: Both “retired professor” and “emeritus” just tell me that a person did the job of being a professor and that they are no longer doing that job. What else are these terms supposed to evoke?

        • gec said,
          “Both “retired professor” and “emeritus” just tell me that a person did the job of being a professor and that they are no longer doing that job. ”

          It can also mean that a person has stopped doing for pay many of the things involved in being a professor, but continues to do some of them gratis.

  11. Glad you made your comment a post!

    I think your comment on debiasing adding complexity is right, though I see it more as adding assumptions to an already complex system. The more I dig into it, the more fraught I find assumptions in the algorithmic bias literature. There are a lot of ideas about how to intervene to make things more equitable that might seem intuitive, but are not theoretically grounded, and therefore really can make things worse in ways that are hard to predict. It’s hard to say which is worse: not even thinking about bias caused by your algorithm, or thinking you know how to fix it and overstepping your actual knowledge of the situation. I see a lot of uncertainty that’s hard to reduce.

    There have been a few really good papers though that address different types of confusion head on. Jon Kleinberg and Sendhil Mullainathan have a line of work on this topic that’s very useful; here are just a few that address problematic assumptions in the debiasing lit: https://sendhil.org/wp-content/uploads/2019/08/Publication-3.pdf, https://arxiv.org/abs/1809.04578. I also like Corbett-Davies and Goel on the difficulty of achieving certain fairness ideals.

    Also related to contextual complexity, in many domains people oversee the use of model predictions (gec brings this up above too), and have some ability to step in, which means we should be studying not just how the algorithm does versus how the person would do without it, but also how they do together. There’s a growing amount of research on this in CS, and it gets pretty complicated.

  12. My favorite example here (and it’s my favorite because it gets away from the sticky politico-social questions like the treatment of race) is the BCS football ranking algorithms. The NCAA outsourced part of its selection of the four best college football teams to some private computer algorithms. It should be intuitively obvious that score differential is an important predictor of whether team A is better than team B. But, once it is known that score differential will be important to the algorithms, coaches had much enhanced incentives to “run up the score.” Gaudy score differentials were influential with human voters as well (another component of the system), and there was no way to stop voters from using score differentials, but, by cracky, they could stop the algorithms from doing so, so they did. (The reason is that running up the score is deemed unsportsmanlike.)
    The algorithms, thus crippled, became less accurate than they used to be, in two senses. First, they were less capable of making judgments between teams since they were denied obviously relevant information. But more importantly, they began to stray from what the voters thought. The voters may be biased, but if you don’t find what the voters would have found, you must be at fault! Gradually, the voters were given more and more weight and the computers less and less. Finally, in 2014, they gave up on computer polls altogether and went to a Committee, which used any damn method it wanted.
    The lessons here are many. People think they want objectivity, but they don’t. They think they want to incorporate alternative goals beyond the obvious measurement goals, but attempts to do so become hopelessly confounded with the interests of various groups that want to win through the process. People think that computer algorithms can, through some mysterious process, force respect, but they can’t, no matter how unbiased they are. Finally, people’s preferences for computer algorithms over human decisionmaking are judged on perceived results, not on the outputs the algorithms are intended to achieve.

  13. Domingos is consistently naive. He is optimistic without reason. It’s hard to see how he was such a great prof, except that I think his science articles don’t exhibit the same distressing adolescent naïveté that all his social commentary does. Better he should just stop talking or keep it to journal articles.

  14. I think distinguishing algorithms from data is important in this. Models are the result of applying fitting algorithms to data. So I think it’s fair to say “models” can be biased, even if the fitting algorithm, in a vacuum, is not.

    Data is especially likely to be biased (against groups of people) if it measures the results of human decision making: who gets a bank loan, which candidates are hired, etc. Not only do they result in biased models, but models tend to amplify the bias of the data. The fact that he wrote: “Algorithms help select job candidates… Businesses and legislators alike need to ensure that they are not tampered with” tells me that he doesn’t get it. No matter what degree of legislation you support, using a job candidate example shows a misunderstanding of the issue.

    Prior to this, the thing that annoyed me about Domingos is that he describes machine learning as “enabling intelligent robots and computers to program themselves.” (see here: https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine/dp/1501299387 and I also heard him use that description on an episode of EconTalk) Um, no it doesn’t. It estimates parameters in a model. That’s it.

    • Dave said,

      “I think distinguishing algorithms from data is important in this. Models are the result of applying fitting algorithms to data. So I think it’s fair to say “models” can be biased, even if the fitting algorithm, in a vacuum, is not.

      Data is especially likely to be biased (against groups of people) if it measures the results of human decision making: who gets a bank loan, which candidates are hired, etc. Not only do they result in biased models, but models tend to amplify the bias of the data. The fact that he wrote: “Algorithms help select job candidates… Businesses and legislators alike need to ensure that they are not tampered with” tells me that he doesn’t get it. No matter what degree of legislation you support, using a job candidate example shows a misunderstanding of the issue.”

      +1

  15. Algorithms are pure mathematics reflecting a reality.

    Data collected have bias.

    If we are using algorithms to model something, it’s obvious the data should be clean of bias to help the algorithms find the best model, one closer to reality.

    The problem of bias is not inside the algorithms, because they are pure, just reflecting some aspect of reality. The problem is in the data to be modeled.

    • Adam said,
      “Algorithms are pure mathematics reflecting a reality.”

      Baloney. Algorithms are sequences of steps intended to *model* a reality or process.

      • Nope. I am speaking about algorithms used in machine learning and AI. They represent the ideal world already studied, without noise. Take linear equations, for example: there exists an algorithm to model linear data. A dataset to be modeled will probably have bias, but the algorithm is perfect, without bias. Didn’t you understand?

        • This is one of those things that is both true and misleading. A least squares model for fitting linear data is “perfect” and “without bias”, yes. But its mathematical perfection depends on certain assumptions being met by the data to be fitted. And the choice to minimize the sums of the squares of the deviations is usually justified by showing that the expectation values come out right.

          But one never has an infinite amount of data, and it’s not clear for any given finite data set that the same criterion will give an “optimum” result – and even that idea depends on a notion of what “optimum” means in this connection.
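
          To make that concrete, here is a minimal simulation sketch (made-up numbers, not tied to any real application): least squares is only “optimal” with respect to the data it is handed, so if one group’s predictor is systematically mismeasured, the fitted line quietly inherits that bias even though the algorithm itself knows nothing about groups.

          ```python
          # Minimal sketch: OLS fit to data in which one group's predictor is
          # systematically understated (all numbers made up for illustration).
          import numpy as np

          rng = np.random.default_rng(0)
          n = 10_000
          group = rng.integers(0, 2, n)          # two groups, 0 and 1
          ability = rng.normal(0, 1, n)          # the true predictor
          outcome = 2.0 * ability + rng.normal(0, 1, n)

          # Suppose the recorded score understates ability for group 1.
          score = ability - 0.5 * group

          # The "pure" algorithm: least squares fit of outcome on the recorded score.
          a, b = np.polyfit(score, outcome, 1)
          pred = a * score + b

          for g in (0, 1):
              resid = outcome[group == g] - pred[group == g]
              print(f"group {g}: mean residual = {resid.mean():+.2f}")
          # Group 1 is systematically under-predicted (and group 0 over-predicted),
          # even though least squares itself has no notion of "group" at all.
          ```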

        • 1) Algorithms and models are not necessarily mathematics, pure or otherwise. They can be expressed mathematically, but you can use wind tunnels, flow charts, pumps and tubes (Herb Simon famously had a hydraulic model in his economic work), and gyroscopes and gears (like the fire-control computers on old battleships) to build models that instantiate algorithms. Mathematics is the language that ties instantiations together, but being mathematical is not a pre-requisite for being a model/algorithm.

          2) Martha said that an algorithm can be intended to *model* reality, but it is not necessarily a “reflection” of reality. An algorithm can be used to describe an idealized/simplified process that might occur in reality; that is what it means for it to be a “model”. In that sense, it might ignore noise, as you say, but plenty of models also model the noise, like the normal error term in least-squares regression. But there is never a guarantee that the model “reflects” any reality at all, except maybe the one that we hope or believe exists.

  16. Andrew, I think we are illustrating the challenges of communicating across disciplines.

    It’s trivially true, as you say, that not all algorithms are optimizations, and I agree on all of your estimation points. I don’t agree that they address my point.

    Much of fairness research in AI is, in fact, about constraining optimization problems to support some parity property in the final system output. A standard example would be to say “Find the most accurate model for predicting whether individuals will do X, but with the constraint that women and men score identically on average”, where X might be something like “click on an ad for an engineering job”. Adding a constraint like the one above essentially always reduces your overall accuracy, often across groups. This is the well-known accuracy-fairness trade off from the fair ML literature, and its existence is uncontroversial in the field. Note that it can still hold with ideal data and training processes.
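
    To give a concrete flavor of this, here is a toy sketch (synthetic data and a deliberately crude mean-shift “constraint,” not anyone’s production method) of what imposing score parity across groups does to accuracy when the simulated base rates genuinely differ:

    ```python
    # Toy illustration of an accuracy/parity trade-off on synthetic data.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 20_000
    group = rng.integers(0, 2, n)                  # protected attribute
    x = rng.normal(0, 1, n)                        # a legitimate feature
    # In this toy world the true positive rate genuinely differs by group.
    p_true = 1 / (1 + np.exp(-(1.5 * x + 1.0 * group - 0.5)))
    y = rng.random(n) < p_true

    # "Unconstrained" score: the true probability, i.e. the best possible predictor.
    score = p_true.copy()

    # Impose mean parity by shifting each group's scores to the overall mean.
    score_parity = score.copy()
    for g in (0, 1):
        score_parity[group == g] += score.mean() - score[group == g].mean()

    accuracy = lambda s: np.mean((s > 0.5) == y)
    print(f"accuracy, unconstrained:      {accuracy(score):.3f}")
    print(f"accuracy, parity-constrained: {accuracy(score_parity):.3f}")
    # The shifted scores satisfy mean parity by construction but classify fewer
    # cases correctly, because parity conflicts with the simulated base-rate gap.
    ```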

    It’s popular because it’s easy to conceive of applications where this is acceptable or good, especially depending on your point of view. Still, it comes with costs, and I’m personally uncomfortable with the direction many of these discussions are going within the community.

    On your final point about adjusting data, please see my last line:
    “Where I think Pedro is somewhat more wrong is that detecting biased data, debiasing it or collecting better datasets, etc. will itself be an algorithm driven process. Still, I’m a lot more comfortable with that.”

    Apparently I was unclear, but we agree on that point. I really don’t know anyone in the field who thinks that the underlying models fix bias in the data somehow. Making your data less biased, adjusting, improving collection, etc. are avenues for improvement all around.

    • > Adding a constraint like the one above essentially always reduces your overall accuracy, often across groups.

      Disagree (with the “essentially always” part). Any algorithm/model is trained on finite and narrow data, and so has out-of-sample error. If one can provide prior knowledge, it’s quite possible, perhaps likely, to get a better predictor (measured out of sample, of course, but that’s what we care about).

      There’s a big step from “essentially always” to “this can still hold,” as you conclude your final paragraph. *In practice*, how important is the tradeoff that you claim is uncontroversial in ML: is it a near inevitability, or is it a theoretical possibility?

      If we “know” that women and men should do equally well on some task (based on reason, or experience with a wide range of similar tasks), an algorithm can do better if forced to embed that knowledge than if it has to discover it for itself based on its necessarily extremely limited (in scope and quantity) training data.
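
      Here is a tiny sketch of that point (toy numbers, pure simulation): when the “groups are equal” belief is in fact true and the training data are scarce, an estimator forced to respect the belief beats one that has to rediscover it from noise.

      ```python
      # Toy comparison: separate group estimates vs. a "groups are equal" constraint.
      import numpy as np

      rng = np.random.default_rng(2)
      true_mean = 0.6                 # same for both groups, by assumption
      n_train, n_trials = 20, 2000    # tiny training sets, many repetitions

      err_separate, err_pooled = [], []
      for _ in range(n_trials):
          a = rng.normal(true_mean, 1, n_train)   # group A training outcomes
          b = rng.normal(true_mean, 1, n_train)   # group B training outcomes
          # Unconstrained: estimate each group separately.
          err_separate.append((a.mean() - true_mean) ** 2 + (b.mean() - true_mean) ** 2)
          # Constrained ("both groups do equally well"): one pooled estimate for both.
          pooled = np.concatenate([a, b]).mean()
          err_pooled.append(2 * (pooled - true_mean) ** 2)

      print(f"mean squared error, separate estimates:  {np.mean(err_separate):.4f}")
      print(f"mean squared error, pooled (constrained): {np.mean(err_pooled):.4f}")
      # The constrained estimator has roughly half the error here -- the constraint
      # is extra information, not a handicap, *when it is correct*.
      ```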

      So if someone wants to “unbias” a model, it could be that they have a non-accuracy agenda (e.g. fairness). But it’s also consistent with them genuinely *believing* actual facts under which the model is presumptively suboptimal – from an out-of-sample accuracy perspective – and so would (given those beliefs) not acknowledge the tradeoff as being relevant.

      • To address your question: in practice, the literature has found the accuracy-fairness tradeoff to be pretty much what happens out of sample with problems that can be framed as adding a fairness constraint. It’s not a primarily theoretical concern. This isn’t hard to verify with a literature search.

        It’s certainly the case that biased and limited data can lead to unfair outcomes. That is an uncontroversial point.

        In general I don’t like hard constraints on models. Priors should have support outside of a delta function to allow data to disconfirm your beliefs. If you think your data are so bad as to support such a need, you need to collect better data.

        Further complicating these issues is that different natural notions of fairness are actually logically incompatible as well, forcing trade offs between notions of fairness as well. There is well developed theory around these trade offs. See for example https://arxiv.org/abs/1609.05807 .

        The point of my “this can still hold” point is that even ideal data and algorithms do not always get us out of these trade offs.

  17. Hitting a few themes that are discussed above:

    1. What do we mean by “Bias”?
    Jessica has pointed out that “the word ‘bias’ is overloaded” here. I agree. Are we talking about ‘bias’ in the statistical sense of an “unbiased estimator”, or in the plain-English sense of a tendency to favor or disfavor a group of people, or maybe something else?

    I think when people express concern about “bias” in decision-making, we’re specifically talking about an _unfair_ favoring or disfavoring of a group of people. But this may or may not be what people are talking about when they say a particular statistical model is “unbiased.”

    2. What are the algorithms trying to model or predict? This decision in itself can be a source of “bias” in the sense of favoring or disfavoring a group of people. For example, as Dave has pointed out, suppose you want to replace your bank’s loan officers with a computer program. You can train your program to make the same decisions your loan officers make, but if your loan officers were making biased decisions then the computer program will too (see the sketch after this list).

    A commenter suggested “maximizing profit” as an inherently “unbiased” goal, indeed claiming that it is unbiased “by definition.” But consider the fact that the Boston Red Sox baseball team didn’t hire a black player until 1959. It’s entirely possible that this was indeed a profit-maximizing decision: maybe they would have had lower attendance or viewership/listenership (corresponding to lower advertising revenue) if they had hired blacks. Their decision to play white players who were inferior to black players they could have hired might have been profit-maximizing, but reasonable people can still argue that it was biased.

    3. What data are provided to the algorithm? This is a pretty big source of potential bias (in the sense of favoring or disfavoring a certain group). If a college is trying to predict the probability that a given prospective student will go on to graduate (if they are admitted), they could use an algorithm based only on, say, high school grade point average and SAT scores. Such a model might overestimate (or underestimate) the graduation probability of kids who went to especially good high schools, and those students will be different economically and racially from those who went to lesser schools. So maybe you should include a measure of school quality. And what about parents’ level of education? And so on. At any point you can say “we’re putting in numbers and the model gives us results, there’s no bias here” but there’s no reason to believe that’s true.

    4. Finally: as perhaps the Red Sox example illustrates (#2 above), there are important ways in which a prediction or decision can be socially or morally unacceptable even if there is an argument that it is “unbiased.” If it’s really the case that the Red Sox were merely maximizing profits by failing to hire black players — maybe the people making the hiring decisions were genuinely colorblind but they knew their fans weren’t — we might still find their decision objectionable or immoral. Most of us would prefer that companies (and people) _not_ simply profit-maximize. In some instances we might even prefer that they are biased (in the sense of giving preference to a certain group). I guess this is really just a restatement of my point #2: the choice of what we’re trying to achieve can be a major source of “bias”.
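
    Returning to point 2, here is a minimal sketch (entirely synthetic data, hypothetical numbers) of the loan-officer scenario: when the training labels are the officers’ decisions, the fitted model reproduces whatever bias those decisions contained.

    ```python
    # Sketch: a model trained on biased human decisions learns the bias.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 50_000
    group = rng.integers(0, 2, n)              # 0 = group A, 1 = group B
    credit = rng.normal(0, 1, n)               # equal creditworthiness distributions

    # Hypothetical biased officers: same creditworthiness, lower approval odds for B.
    p_approve = 1 / (1 + np.exp(-(credit - 1.0 * group)))
    approved = rng.random(n) < p_approve       # these are the training labels

    X = np.column_stack([credit, group])
    model = LogisticRegression().fit(X, approved)

    # Ask the trained model about two identical applicants who differ only in group.
    probe = np.array([[0.0, 0], [0.0, 1]])
    print(model.predict_proba(probe)[:, 1])    # roughly [0.5, 0.27]: bias reproduced
    # Dropping the group column doesn't automatically help either, if other
    # features proxy for it -- which in real data they usually do.
    ```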

    • All excellent points. Thanks for the summary. I would add another dimension, which I’m not advancing as having a practical, fully satisfying solution, but which maybe we should pay more attention to. I’d be curious to hear more from others about this.

      Suppose we accept candidates to college based on their probability of successfully graduating. Anyone whose predicted probability of success is below a qualifying threshold is rejected. Let’s say one set of inputs used in the prediction consists of all sorts of socio-economic indicators measuring whether the candidate has a supportive environment for studying (e.g., enough money for living in a place that facilitates studying, not living too far away and missing classes, or not requiring a demanding job in parallel). We can assume these factors do affect the probability of graduation. If we believe our models, we should be trying to make predictions based on as much information as possible about the candidate, including those indicators. Call this Model 1.

      If we make a judgment call that this type of information is discriminatory, on the premise that there was only so much a young candidate could do to control it, we may feel inclined to discard these variables (including, for the sake of argument, any other proxies for them that might exist in the data). We can still have a model that does as well as possible on the information that is used. It’s just not as accurate as Model 1, which also uses the information we judged discriminatory on moral grounds. Call this Model 2.

      If we follow the premise that trading off predictive accuracy for moral values of fairness and privacy is a mortal sin, Model 2 looks unattractive. But then, in Model 1 it’s the candidate who solely pays the price for a “your risk is too high” assessment. It boils down to a fatalistic premise that having a “high” probability of failing should be taken as “you are as good as failing.” Tough luck, it sucks to be you. Alternatively, we could transfer part of the risk to the institution offering the opportunity: the college *will* see a decrease in graduation rates by using Model 2 (assuming Model 1 does its job). It’s just that the cost of failing doesn’t go (just) to the candidate who was allowed to enter. It will perhaps cost the institution in terms of reputation, and it will cost other candidates who were deemed to have a higher chance of success under Model 1 than under Model 2 and hence were not let in. Maybe it just sucks to be them?

      The above is just to illustrate that prediction accuracy is perhaps not the be-all and end-all of decision making augmented with statistical predictions. There is more than one risk in every prediction, as there isn’t a single objective function to be optimized: in each of these decisions there are multiple stakeholders. It could be argued that allowing more failure leads to “bad” feedback loops: if the school loses reputation, then everybody loses. But this is true only if we continue to penalize these failures as usual, ignoring the effort to make things fairer. As another hypothetical example, a bank may want to give a loan to a starting business whose risk is partly due to possible racial discrimination against its owners. The risk is real; the bank is not making it up. But should the race of the applicants (and any proxy for it) be a key factor in the risk assessment, or should we assume an as-if counterfactual world where this information is ignored? The argument against the latter is that the rate of failure will increase and will affect the credit score of this class of applicants. But this assumes that penalties remain the same. Why? Shouldn’t we be spreading the risk among all parties instead of just putting all the burden on the shoulders of the weakest link? By “all parties,” this may even include tax incentives or public insurance to compensate banks for the extra risk, so the burden is not just on them either. These bad feedback loops are in good measure an artefact of an “other things being equal” framing that doesn’t need to be so.

      Sure, all of the above may require wishful thinking about achieving new compromises against a very ingrained way of doing things. But what I don’t like to see are assumptions that things are the way they are because of some magical, immutable law of society.

  18. >The latter [adjusting the output of the algorithm to ensure that men and women are paying the same average rate] sounds more like the op-ed’s characterization of debiasing, which is what I find silly.

    Yet this is the primary objective of “equity” based definitions of fairness. It seems like it would be much more effective to attack this insipid definition of equity, than to quibble about how it is factored into an objective function.

  19. Just to be clear, I’m not claiming that learned models are always unbiased. They obviously can be, but here also people have been far too quick to see race/gender biases where it’s far from clear there are any. Either way, it’s incontrovertible that the learning algorithms themselves (e.g., linear regression) have no race/gender biases, and thus attempting to “debias” them is in fact the source of bias. And an op-ed is necessarily greatly oversimplified; its job is to alert people and foster discussion, which I think this one has.

    • Domingos says “ Either way, it’s incontrovertible that the learning algorithms themselves (e.g., linear regression) have no race/gender biases, and thus attempting to “debias” them is in fact the source of bias.”

      If you’re talking about the algorithms in the narrow sense of the mathematical operations, this is not just incontrovertible but so obvious as to not need mentioning. But this is not what people mean when they say the algorithms are biased. What people are unhappy about is that the combination of (objective function, training data, model) will lead to results that are at least arguably unfair, i.e. biased, against certain groups. You surely understand this; it would be nice if you acknowledged this concern and addressed it, rather than simply stating the obvious fact that mathematical operations are not inherently biased.

      • I’m with Phil here. Pedro, why simplify to the point that it seems much more like a political statement than an attempt at showing a path forward? I agree that politics have entered into the AI/ML ethics discussion, but why exacerbate that aspect, rather than try to get to the heart of what’s at stake and what the options are?

      • I do think this article is vaguer than it has to be, but I’ve been steel manning Pedro here.

        I don’t think he’s attempting to downplay the importance of applied problems or fairness research in machine learning. For example, this statement:

        “‘Debiasing’, in other words, means adding bias. Not surprisingly, this causes the algorithms to perform worse at their intended function.”

        I believe refers to the impossibility theorems on the incompatibility of different intuitive notions of fairness. We wouldn’t even know about this if people weren’t doing research on fairness and ethics in machine learning, so I don’t think Pedro is knocking that field of research.

        I read this article as taking aim specifically about the push for clumsy regulation, such as those requiring ethics statements and those prohibiting inclusion of protected-class features and correlates, and how they intersect with online cancel-mobs. There are fairness and robustness questions, but these hardline regulations ARE terrible solutions, and I’m a little worried that they might get smuggled into legislatures anyways.

        There’s a kind of “with me or against me, part of the solution or part of the problem” logic whereby questioning these proposed solutions gets conflated with questioning the existence of the problem altogether, at which point you become an enemy, and I kind of feel like this has happened to Pedro.

          • As someone who has been questioning the implementation of things like the broader impacts statement requirement, I’m totally in agreement with this: >There’s a kind of “with me or against me, part of the solution or part of the problem” logic whereby questioning these proposed solutions gets conflated with questioning the existence of the problem altogether, at which point you become an enemy, and I kind of feel like this has happened to Pedro.

          People (Twitter especially!) are jumping to conclusions like this left and right, and it’s a real problem in this debate.

          I guess the question I’m having trouble answering, though, is: if that’s the problem Pedro is trying to point out, why respond with an op-ed that seems so deliberately political, rather than aspiring to the “rational” line of thinking where one points out the complexities so that the so-called militant liberals can’t ignore them so easily? It just doesn’t seem like an effective strategy if one really wants to point out issues with the ethics movement. Part of what’s tough for me is the suggestion that he’s being rational while the other side is a bunch of activists; I see staunch activism in both cases, so we should at least be honest about that.

        • I’m not Pedro, so if he’s still around and wants to comment y’all can just ignore me, but here’s some speculation:

          1. Maybe building political coalitions is just a part of how you get things done or stop them from being done, and Pedro thinks the benefit of getting things done this way outweighs imprecision in a utilitarian sense

          2. Maybe he’s annoyed? I’ve seen some discourse from very brilliant people about how Pedro is personally an asshole. Now, all I really know about him is that he was on EconTalk and has a good quote about how machines are too stupid and have taken over the world, and it’s very possible that he is personally an asshole, but even so it’s not going to make anyone more empathetic to your cause.

        • I agree that saying he’s an asshole so he can’t be taken seriously is not a good way to convince others. But I think for many the priors are so strong in this case that it’s hard to overlook them and see the latest event with fresh new eyes. Imagine trying to “forget” that someone previously suggested climate change was not real. It’s kinda hard.

        • Agree – I guess I was being too subtle here https://statmodeling.stat.columbia.edu/2020/12/29/retired-computer-science-professor-argues-that-decisions-are-being-made-by-algorithms-that-are-mathematically-incapable-of-bias-what-does-this-mean/#comment-1627260

          And a typo (as always): “So initially it’s a good bet that they [these hardline regulations] will do more harm than [good] …

          The first instance I encountered was an ethicist who insisted that any clinical trial conducted with less than 80% power was totally unethical, but who understood little if anything of how power was guesstimated or what it even meant.

          Also explains why so many people can get onboard so fast – very little training is required and any field of science can be jumped into.”

      • “What people are unhappy about is that the combination of (objective function, training data, model) will lead to results that are at least arguably unfair, i.e. biased, against certain groups.”

        If I may translate: “if the results, regardless of the process, are not what we prefer, then the process should not be used.” It sounds that way to me, even if a bit exaggerated.

        • My house, built in the 1940s, has an article in the CC&Rs recorded in the County of Los Angeles that says it can never be sold to anyone who is “not a member of the caucasian race”. Of course it’s not enforceable anymore, but it took until the ’70s to invalidate, if I understand correctly.

          But sure, there are no biases inherent in systems that we’re training software on and it’s all just social justice warriors imposing their preferences on the masses.

        • If a process concludes Obama is a basketball and Xi Jinping is a ping pong ball then I would reject the process yes. If you want to be semantic, the “process” here is the practice of choosing whatever images are most available on the public internet for my training set rather than stochastic gradient descent on convolutional neural networks, but you can consider the whole pipeline as one machine learning process.

    • > And an op-ed is necessarily greatly oversimplified;

      You don’t need a lot of space to put in caveats. If you are widely misunderstood, that’s on you – not the amount of space you were given.

    • You probably should not be making this comment to a practicing statistician. The basic analogue of your statement in a field where people are accountable for explaining their results would be something like “statistical regression methods can’t be biased, so trying to debias your analysis methods is in fact a source of more bias”. This is just wildly untrue in practice to the point where it’s kind of shocking that a conversation like this by apparent professionals is even taking place. If you want to stand a chance of making correct conclusions from real data you cannot think this way.

      In practical circumstances you *must* deal with biased data, and you essentially have to do so by incorporating expert knowledge, trying to measure where the biases might be, and changing the way you estimate things (sometimes lowering the accuracy on your nominal surrogate for the real inference task — the horror!) to account for it. The problem with ML algorithms is not that the raw math is biased, it is that we currently don’t understand how to adapt the algorithms to incorporate soft knowledge of data biases, linking particular kinds of change to even qualitative statistical guarantees. If you can’t do this then there are several avenues of application that are completely dead to you. ML ethics is looking at some of the more concerning consequences of this mistake but the same technical problem shows up in more benign ways elsewhere. I *completely understand* your frustration at the people who say “different outcomes means bias” and stop there, but the answer to that needs to be *correct* and work in real life: insisting that the entire issue isn’t really your problem is not correct and not viable.

      At least we’re having this conversation at the right place; you should read Andrew’s professional writings some time. He’s spent an incredibly generous amount of time explaining to scientists that “objective” statistical methods will not save them from themselves, and that hiding behind the supposed purity of black-boxed math isn’t remotely enough to lead to right answers in practice. Apparently he will have to spend the next decade repeating this word for word to ML practitioners.

  20. “AI” means this: no customer service for you, the unwashed peasant; respond to the voice-prompts or else goodbye! If you have a large enough cash balance, at the bank, you can talk to someone, who’ll fix the problem. The rest of you peasants: out of sight, out of mind. Go watch “Breaking Bad” on your smart-phone and pretend that if you get mad enough you’ll get your way. Or, vote a nazi into high office. But in the day-to-day piddling affairs of the peasantry, if you got some problem, tough; talk to the F**ing computer or else disappear.

    • “AI as implemented by humans means this.” AI does make it cheaper to do as you say, and that is an important point. It can also make the ethics more opaque so people don’t feel as personally responsible – and that is important as well. But it is not AI that did that – it is the humans who designed and implemented it. I mostly agree with your characterization – but I always want to place the responsibilities on the humans that did these things.

      • Of course! I get carried away with ironical rhetoric; so left the crux implicit. That is the *crux* of the matter and it should be brought into the light; over and over and over. The design and the intent is human. How could it be otherwise? It has many facets, flows from many sources; all which find their happy coalescence in this AI technique: the withdrawal of the rivulets of accountability, no longer to be the proper share of the commoner. Take away some dispositive power, nominally held by birthright, sell it back to him for a few tens or hundreds of dollars here or there. That’s the model by which, in the course of a generation, the freeholder is suckered or forced into villeinage.

      • I do wonder, though: given the economic and social effects of, e.g., the replacement of jobs requiring less education, could we argue that automation of (non-dangerous) jobs is *per se* ethically questionable?

        I’m not saying I believe this; I mostly don’t. But it might be an interesting line of thought.

  21. It feels like some people are failing to understand that, from a fairness point of view, the term ‘biased’ refers to bias with respect to the protected class, not to the model as a whole. When protected classes are minorities, there will be a natural imbalance in samples, which can be picked up by the model as a ‘preference’ for the majority class; at a minimum, this should be measured. Taking a look at the residuals can end up requiring explicit corrections to the dataset, even if the dataset itself is not biased in the sense of failing to represent the population.
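
    For example, a quick synthetic sketch of that diagnostic (made-up data, a 90/10 sample imbalance, and one global fit) shows how aggregate error can hide a much larger error for the minority group:

    ```python
    # Check a single model's errors group by group, not just in aggregate.
    import numpy as np

    rng = np.random.default_rng(4)
    n_major, n_minor = 18_000, 2_000                    # 90% / 10% imbalance
    x_maj, x_min = rng.normal(0, 1, n_major), rng.normal(0, 1, n_minor)
    y_maj = 1.0 * x_maj + rng.normal(0, 1, n_major)
    y_min = 3.0 * x_min + rng.normal(0, 1, n_minor)     # different relationship

    x = np.concatenate([x_maj, x_min])
    y = np.concatenate([y_maj, y_min])
    group = np.concatenate([np.zeros(n_major), np.ones(n_minor)])

    a, b = np.polyfit(x, y, 1)                          # one global least-squares fit
    resid = y - (a * x + b)

    print(f"overall RMSE: {np.sqrt(np.mean(resid ** 2)):.2f}")
    for g, name in [(0, "majority"), (1, "minority")]:
        rmse = np.sqrt(np.mean(resid[group == g] ** 2))
        print(f"{name} RMSE: {rmse:.2f}")
    # The single fit is dominated by the 90% majority, so the minority group's
    # errors come out roughly twice as large -- exactly the kind of imbalance
    # the comment says should at least be measured.
    ```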

  22. I feel there are several “biases” being discussed concurrently. Here are a few I’ve seen referenced:

    1) Bias in data leading to biased estimates and predictions, as discussed by Andrew.
    2) Bias inherent in the algorithm. I believe this is what Domingos is referencing. A neural network will not decide to lower women’s car insurance premiums unless doing so reduces the loss function. As opposed to humans, who often include racist or sexist priors (bias), leading to decisions far from a straightforward, “unbiased” optimization of the data.
    3) Bias in all the rest of modeling: the choice of data, loss function/endpoint, when to stop adjusting the model, etc. These choices are all made by humans with priors.
    4) Bias in outcome. For example: even with perfectly representative data, agreed upon loss functions, etc., models may predict women have lower creditworthiness (see the Wired article Carlos linked to). This can be in a catch-all gender parameter, or if the data fidelity is high enough, in downstream correlated variables (such as risk-tolerance). Either way, the model could predict the average woman has lower creditworthiness than the average man.

    I feel 4) is the most debated on social media. For me, it’s more of a moral/political question. Science can estimate the lives saved if alcohol consumption goes to zero; it can’t say whether prohibition is a good or worthwhile policy. Science estimates measurable effects, not utility. Even with uniform agreement on the data and averages, some may prefer to have equality of outcome, depending on what they believe to be the cause of the differences (James Damore-style biological tendencies vs. gender socialization), their personal preference for economic efficiency, or their personal perceptions of fairness.

    One alternative to this is to go back to human decision-makers, who, with wide variance and personal biases, have the ability to adjust towards their personal notion of fairness on a case-by-case basis, incorporating far more than the measurables fed into machines. See Harvard admissions and all that entails.

    • Yes, but there’s a tension here: the “ability to adjust towards their personal notions of fairness” is, for lots of people, the *problem*, not the solution. If you trust the humans who make these decisions, you’re happy; if you don’t, you’re unhappy. And in a low-trust environment, not only don’t you trust the humans making decisions, you don’t trust the humans making models which are intended to instantiate some notion of fairness. But that’s never the computer’s fault, which I take to be Pedro’s point. But the tendency to blame “the computer” or “the algorithm” has as its wellspring the public-relations and entirely insincere attempt to evade responsibility by saying: Shut up… it’s Science, and the Computer knows better than any of us.

    • > 2) Bias inherent in the algorithm. I believe this is what Domingos is referencing. A neural network will not decide to lower women’s car insurance premiums unless doing so reduces the loss function. As opposed to humans, who often include racist or sexist priors (bias), leading to decisions far from a straightforward, “unbiased” optimization of the data.

      The choice to optimize for reducing the cost function isn’t necessarily “unbiased”. The cost function itself isn’t necessarily “unbiased”.

      • > The choice to optimize for reducing the cost function isn’t necessarily “unbiased”.

        That’s a great point. The people advocating for optimizing could be doing it because they believe common loss functions would lead to their preferred outcomes. Then again, the choice to not optimize would likely be similarly biased.

        > The cost function itself isn’t necessarily “unbiased”.

        Absolutely (and hopefully mentioned in 3). I always wonder what college admissions aim to achieve: maximize graduation rates, maximize future donations, minimize odds of “Eight life sentences” being listed as an award in the alumni magazine (https://tinyurl.com/y96scl48)? The choice is political more than scientific and leads to incredibly disparate outcomes. Acting as if the resulting decisions are somehow objective seems naive, even if the alternatives aren’t better.

      • jon said,
        “The choice to optimize for reducing the cost function isn’t necessarily “unbiased”. The cost function itself isn’t necessarily “unbiased”.”

        +1

  23. As an engineer, I’m used to using mechanistic models that come from first principles. I feel naive for saying this, but how could someone not be concerned about bias if the best they can do to model a system is to use observational data? Said another way, if your entire understanding of the system comes from limited (hint->bias) samples of each variable*, how can you claim to understand the system well enough to discount bias out of hand? I guess it is my impression that any modern and useful inference-based model is likely to be of a system that is so complex that you must also admit there is bias. I am curious about what I am missing in this conversation coming from outside the field.

    *This is saying in words what I think follows from the idea of using “unbiased algorithms”

  24. Will the use of gender and race as explanatory variables in research be scorned in the near future? I hope we don’t end up in a situation where research that shows them to be significant is suppressed.

    • It’s the opposite: if you assume gender and race effects don’t exist, you end up with proxies that make it seem like there is no bias. Exploring gender and race differences lets you understand things like how the differences in lived experiences of Blacks and Whites may have an impact on other things, like health outcomes or who they vote for. And then in that way we also see that race and gender are proxies for all kinds of individual-level experiences (like being told you aren’t college material, etc.).

  25. In 2019, the National Institute of Standards and Technology (“NIST”) published a report analyzing the performance, across races, of 189 facial recognition algorithms submitted by 99 developers, including Microsoft, Intel, Idemia, and other major tech and surveillance companies. Many of these algorithms were found to be between 10 and 100 times more likely to misidentify a Black or East Asian face than a white face. In some cases, American Indian faces were the most frequently misidentified. Most algorithms were substantially less likely to correctly identify a Black woman than a member of any other demographic.

    https://jolt.law.harvard.edu/digest/why-racial-bias-is-prevalent-in-facial-recognition-technology

    I think one of the big issues in AI is the attempt to shift responsibility elsewhere — to the data, to “the AI” itself, as if it were capable of evaluating itself. Any IT system on the planet has someone who pays the bills and who decides if it is fulfilling its function or not. Every IT system has designers who evaluate whether their products meet their goals. It’s worth remembering that there are *people* who are responsible for what the algorithms they design or deploy do; and if you design algorithms, you are also sharing part of the responsibility for which goals those algorithms can and cannot support. There are obviously plenty of ways in which the features that you optimize your algorithm for can introduce bias into a real-world setting, and by teaching that “the algorithm” can’t be biased, you are teaching developers to be irresponsible. I’m not happy with that.

    Oh, and putting the term “militant” in the headline of an article about bias, when the suggestions you’re criticizing are anything but violent, comes across as quite ironic. The examples chosen are also highly politicized – why?

    The solution, however, is simple: recognize that there are people responsible for any computer system, and they bear the responsibility for the consequences of decisions that these computer systems support. With this recognition, you can let the political process sort out any consequences of politics you don’t agree with. But claiming algorithms are by nature unbiased and not amenable to social regulation entrenches any biases they incorporate (or are claimed to incorporate), and removes them from political evaluation. That’s super nice for you if you create these systems, but it backfires badly when someone you don’t agree with does: which means it is a bad idea.

    • Every field does this. Problems that are core to your field remain so until you realize your current tools can’t solve them. Then you redefine your field so that it’s someone else’s problem. Physics at least tried to discover the actual ontology of the world until quantum mechanics; then all of a sudden that’s not what science is about. Biologists don’t try to give a remotely adequate answer to “what is life” because you can’t figure it out through gel electrophoresis and you don’t need it to do PCR, etc.

      Machine learning currently has so little understanding of its subject matter that we don’t understand how to pool together complex background knowledge about the data to influence predictions in the same way a good statistician would to produce better estimates. But ML practitioners want the money *now*, so it’s someone else’s problem and there’s nothing to be done other than find iid data somehow. Eventually there *will* be algorithmic solutions to this problem and then these same people will be preening themselves over how amazing their field is for figuring it out.

    • The designers have something to sell: decision-making machines for entities that wish to remove the last taint of accountability from the decisions they can thus force upon those too poor to have it any other way. At first this is an attractive prospect to a hard-headed organization which sees it as a way to take no more guff from its “customers” — but to pitch it as a great convenience; e.g. the bullshit “chat assistants”. Sooner or later when they’ve all signed on, they all suffer from the race-to-the-bottom: Nobody, even at the highest level of the chain, is any longer accountable to anyone. It is a sophisticated variation on the “please …. listen …. carefully …. for …. our …. menu …. options …. have …. changed…..” scam. It also puts a lot of engineers to work. But …. not …. for …. long!

    • Great point about shifting responsibility — as opposed to taking responsibility and doing something about it, which is what the “AI Ethics” community is focusing on. Pedro’s objection to the NeurIPS requirement for a discussion of impact, and to the ethics review, is similarly a disavowal of responsibility.

      • Responsibility sounds so quaint nowadays. Like the Golden Rule and so forth. Look at the daily outrages about which we read if we can bear to. My fellow citizens! And yet, if I have any hope or desire at all, that things move in the direction of some better equilibrium, I am forced to concede: It must begin with me. I too must strive to take responsibility, wherever and whenever I can, and also to endeavor to practice what my fellow citizens may think is ridiculously out-of-date. (Hell, I noticed even the New York Post came around the other day).

  26. The fundamental flaw in discussions about bias in AI is that they take as their implicit comparator a perfectly unbiased choice. But that’s never the question. Rather, the question is if the AI is more or less biased than the humans who would otherwise be doing the job. We know humans are pretty susceptible to bias, so in most cases I’d expect the AI to actually have less bias than a human doing that task.

    Also, we tend to ask too much in terms of being unbiased. There is a great paper out there pointing out that in any situation where the base positive rate differs between two populations, people will see any decision mechanism as unfair in some respect (e.g., if blacks and whites offend at different rates, a score that is equally well calibrated for both groups cannot also make both the type 1 and type 2 error rates independent of race). That’s because we can’t even decide what fairness is even supposed to mean.
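
    For a feel for why, here is a tiny numerical illustration (made-up rates) of the identity relating calibration and the two error rates: hold positive predictive value and the miss rate fixed across groups, and a difference in base rates forces a difference in false positive rates.

    ```python
    # Implied false positive rate when PPV (calibration) and FNR are held equal.
    # Follows from PPV = p*(1-FNR) / (p*(1-FNR) + (1-p)*FPR), solved for FPR.
    def implied_fpr(prevalence, ppv, fnr):
        return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)

    ppv, fnr = 0.7, 0.2
    for p in (0.5, 0.3):   # two groups with different base rates
        print(f"base rate {p:.0%}: implied FPR = {implied_fpr(p, ppv, fnr):.1%}")
    # base rate 50%: implied FPR = 34.3%
    # base rate 30%: implied FPR = 14.7%
    # Equalizing the FPRs too would force giving up equal PPV (or equal FNR) --
    # the trade-off exists even with perfect data.
    ```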

    • > Rather, the question is if the AI is more or less biased than the humans who would otherwise be doing the job.

      I have a feeling the answer to this question might depend on who is doing the answering.

      But there is another distinction. Computer systems are resistant to change (think about how many mission-critical systems were still running on old COBOL code back in y2k); so while humans as a society may learn about new biases and incorporate this new knowledge into their behaviour, are AIs who do not participate in society on the same terms able to do that?
      To put it more bluntly, would a hypothetical AI that was “less [racially] biased than the humans [in 1960]” still be a good idea today? “The way we did things worked out well for us, and there’s no reason to change” is a political view, not a scientific one.

      Maybe we shouldn’t be comparing AIs to the “average human” who is doing the job today, but rather to the humans we would want to be doing the job ten years in the future: because humans, participating in society, may learn to recognize their own biases, and correct them, while AIs probably won’t.

      So what we’re having here is a political discussion of what we want that future to look like, recognizing that the future is shaped by the decision-making (or -supporting) systems we deploy today.

      And is introducing an “unbiased” AI system simply a strategy to change the way we do things without having the “hassle” of an associated political discussion of the effects of that change? (That strategy has plenty of precedent where IT systems are concerned: every time something unpleasant is blamed on “the computer”, you can see that principle at work.) It’s a kind of gaslighting to say “this is not political, and if you think it is, I will label you an unreasonable militant”.

      When there is political debate about choosing algorithm A or algorithm B (B being a “debiased” modification of A), and both algorithms are less biased than humans, I wouldn’t consider “A is better than humans, so let’s stop discussing B” a convincing argument. It’s a political argument, because it points out “you’re getting some of what you want, so let me have some of what I want, too”, and it may work, or it may not. That depends on how the political process plays out; if the A proponents have enough power to block B such that the alternative actually is “humans or A”, then that argument may help make deploying A politically viable. But that’s still a political process, and not a technical one.

      • I think it is more of a commercial argument. Rarely do things that do not make sense commercially survive.
        All models are biased to that effect; the story sits on top of this.

        • Filters on coal power plants do not make sense commercially; it’s more profitable to not have any and let society bear the cost of the effects of the ensuing pollution. Government regulation creates a level playing field, ensuring that it is no longer profitable to run an unfiltered coal power plant.

          If you regulate AI, you can ensure that features which “do not make sense commercially” are present. Which features are desirable for a society to impose is a political issue; see my previous post.

      • “Maybe we shouldn’t be comparing AIs to the “average human” who is doing the job today, but rather to the humans we would want to be doing the job ten years in the future”

        No. You should compare with the average-human who will *not* be doing the job ten years in the future; because the race to the bottom will have eliminated even that last shred of respectable service which he can perform and for which he can be rewarded. Because the clever commercial interests will have replaced him with the more servile, cheaper version of himself: the “chat-assistant”. That average-human will then be encountered either rioting through the countryside and through the towns with his fellows, like the great rioting mobs of displaced peasants of the English fifteenth and sixteenth centuries; or he will be recruited to do the same in order to “keep the peace” in the overseas dominions from which residual raw-materials must be scavenged.
