11 thoughts on “Google on Responsible AI Practices”

  1. Prediction: With greater-than-even probability, Frank Harrell will not like the section that reads: “false positive and false negative rates sliced across different subgroups.”

      • Let’s take a simple case, where a false positive is offering a loan to someone who will default and a false negative is not offering one to someone who would pay it back. The goal would be to make sure we’re not introducing extra bias for some subgroup of the population.

        But forget about false positives and false negatives: that’s just one error criterion, for when you make binary decisions. The bigger idea is really just evaluating different groups and interactions in multilevel regressions rather than simply looking at aggregate results, which might not reveal biases in subgroups.
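        For concreteness, a minimal sketch of what slicing error rates by subgroup might look like; the data, column names, and `error_rates` helper are all hypothetical, not from the Google paper:

        ```python
        import pandas as pd

        # Hypothetical loan-decision data: one row per applicant, with the
        # model's binary decision, the true outcome, and a subgroup label.
        df = pd.DataFrame({
            "group":     ["A", "A", "A", "B", "B", "B"],
            "predicted": [1, 0, 1, 1, 0, 0],   # 1 = loan offered
            "actual":    [0, 0, 1, 1, 1, 0],   # 1 = would have repaid
        })

        def error_rates(g):
            # False positive: offered a loan to someone who defaults.
            fp = ((g.predicted == 1) & (g.actual == 0)).sum()
            # False negative: denied a loan to someone who would repay.
            fn = ((g.predicted == 0) & (g.actual == 1)).sum()
            negatives = (g.actual == 0).sum()
            positives = (g.actual == 1).sum()
            return pd.Series({
                "fpr": fp / negatives if negatives else float("nan"),
                "fnr": fn / positives if positives else float("nan"),
            })

        print(error_rates(df))                          # aggregate rates
        print(df.groupby("group").apply(error_rates))   # sliced by subgroup
        ```

        If the per-group rates diverge sharply from the aggregate (here group A has the false positives and group B the false negatives), that’s exactly the kind of bias the aggregate view would have hidden.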

    • It does look like a public relations piece, but clicking through gets you to actual papers; e.g., I found this: https://ai.google/research/pubs/pub47077

      The “sliced across different subgroups” phrase seems to be a warning not to rely on global assessments of, say, prediction (like the NN that achieved the lowest MSE on toxicity by often missing highly toxic cases while doing very well on lower-toxicity ones).

      • That sounds more like using the wrong evaluation metric. They didn’t actually want to optimize for low MSE, since underestimating high toxicity is more dangerous, so why did they train their model to do that?
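        One way to encode that asymmetry is an asymmetric loss. Here’s a rough sketch; the 5x weight on underestimates is an arbitrary illustrative choice, not anything from the papers:

        ```python
        import numpy as np

        def asymmetric_mse(y_true, y_pred, under_weight=5.0):
            """Squared error that penalizes underestimates more than overestimates.

            The 5x weight on underestimation is illustrative; in practice it
            would reflect the real cost of missing high toxicity.
            """
            err = y_true - y_pred
            # err > 0 means the model underestimated the true toxicity
            weights = np.where(err > 0, under_weight, 1.0)
            return np.mean(weights * err ** 2)

        # Underestimating a highly toxic case is penalized more than
        # overestimating a mildly toxic case by the same amount.
        print(asymmetric_mse(np.array([9.0]), np.array([7.0])))  # 5 * 4 = 20.0
        print(asymmetric_mse(np.array([1.0]), np.array([3.0])))  # 1 * 4 = 4.0
        ```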

  2. “The way actual users experience your system …”

    Who are the non-actual users and what do they experience when they meet your system?

    Is this philosophy of statistics or the statistics of philosophy?

    • Non-actual users are the three geek friends in the next cubicles who give it a whirl and say “Cool”.

      While they’re not actual aliens, their knowledge and reactions bear no relation to those of an actual user.

    • The non-actual users can be simulated users that you create while training the system. Let’s say I’m building a search system. I’ll simulate user queries and collect them for evaluation. They may or may not be representative of what actual users do when they hit the system in the wild. jrkrideau brings up the intermediate step of the friends in the next cubicles, who, like the simulated test set, may not be representative of the intended target audience for an application.
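      A minimal sketch of that kind of simulated evaluation set; the queries, document IDs, relevance judgments, and the `search` placeholder are all invented for illustration:

      ```python
      # Hypothetical simulated evaluation set for a search system: each entry
      # pairs a made-up query with the documents judged relevant to it.
      simulated_queries = [
          {"query": "responsible ai practices", "relevant": {"doc1", "doc4"}},
          {"query": "loan default prediction",  "relevant": {"doc2"}},
      ]

      def precision_at_k(results, relevant, k=5):
          """Fraction of the top-k results that are judged relevant."""
          top_k = results[:k]
          return sum(doc in relevant for doc in top_k) / len(top_k) if top_k else 0.0

      def search(query):
          # Stand-in for the system under test; returns a placeholder ranking.
          return ["doc1", "doc3", "doc4"]

      scores = [precision_at_k(search(q["query"]), q["relevant"])
                for q in simulated_queries]
      print(sum(scores) / len(scores))  # mean precision@k over simulated queries
      ```

      Whether that mean score says anything about real users depends entirely on how well the simulated queries match what actual users type, which is the commenter’s point.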
