Great and beautifully written advice for any data science setting:

- Google. Responsible AI Practices.

Enjoy.

Posted by Bob Carpenter on 18 January 2019, 4:00 pm


## Comments

Prediction: With greater-than-even probability, Frank Harrell will not like the section that reads: “false positive and false negative rates sliced across different subgroups.”

Ben:

I actually have no idea what they mean by “false positive and false negative rates” in that context.

Let’s take a simple case, where a false positive is offering a loan to someone who will default and a false negative is not offering one to someone who would pay it back. The goal would be to make sure we’re not introducing extra bias for some subgroup of the population.
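A minimal sketch of what "slicing across subgroups" could look like for that loan example. This is illustrative only, not Google's method; the function, group labels, and data are all hypothetical.

```python
# Hypothetical sketch: false positive / false negative rates per subgroup
# for a binary loan decision. A false positive = approving someone who
# defaults; a false negative = denying someone who would have repaid.
from collections import defaultdict

def rates_by_group(records):
    """records: iterable of (group, predicted_approve, actually_repays)."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, approve, repays in records:
        c = counts[group]
        if repays:
            c["pos"] += 1
            if not approve:          # denied, but would have repaid
                c["fn"] += 1
        else:
            c["neg"] += 1
            if approve:              # approved, but defaults
                c["fp"] += 1
    return {g: {"fpr": c["fp"] / c["neg"] if c["neg"] else 0.0,
                "fnr": c["fn"] / c["pos"] if c["pos"] else 0.0}
            for g, c in counts.items()}

# Toy data: aggregate error rates can look fine while one group's
# false-negative rate is much worse than another's.
records = [
    ("A", True, False), ("A", True, True), ("A", False, True), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, True), ("B", False, True),
]
print(rates_by_group(records))
```

Comparing the per-group rates, rather than one pooled rate, is the point: here group B's false-negative rate is far higher than group A's, which a single aggregate number would hide.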

But forget about false positives and false negatives; that's just one error criterion for when you make binary decisions. The bigger idea is really just evaluating different groups and interactions in multilevel regressions rather than simply looking at aggregate results, which might not reveal biases in subgroups.

It does look like a public relations piece, but clicking through gets you to actual papers. E.g., I found this: https://ai.google/research/pubs/pub47077

The "sliced across different subgroups" seems to be a warning not to rely on global assessments of, say, prediction (like the neural net that had the lowest MSE for predicting toxicity by often missing highly toxic cases while doing very well on lower-toxicity ones).

That sounds more like using the wrong evaluation metric. They didn't actually want to optimize for low MSE, since underestimating high toxicity is more dangerous, so why did they train their model to do that?
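To make that concrete, here's a hypothetical sketch contrasting plain MSE with an asymmetric loss that penalizes underestimating toxicity more heavily. The `asymmetric_loss` function, the weight, and the toy data are all made up for illustration.

```python
# Hypothetical sketch: MSE vs. an asymmetric loss that weights
# underprediction (the dangerous direction for toxicity) more heavily.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def asymmetric_loss(y_true, y_pred, under_weight=5.0):
    # Underprediction (p < t) is penalized `under_weight` times more.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        w = under_weight if p < t else 1.0
        total += w * (t - p) ** 2
    return total / len(y_true)

y_true = [0.1, 0.2, 0.9]   # one highly toxic case
y_a    = [0.1, 0.2, 0.4]   # perfect on mild cases, badly misses the toxic one
y_b    = [0.5, 0.6, 0.9]   # sloppy on mild cases, nails the toxic one

print(mse(y_true, y_a), mse(y_true, y_b))                          # MSE prefers a
print(asymmetric_loss(y_true, y_a), asymmetric_loss(y_true, y_b))  # asymmetric prefers b
```

Model `a` wins on MSE but loses badly on the asymmetric loss, which is the commenter's point: if underestimating high toxicity is what you care about, optimizing MSE trains the model for the wrong objective.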

“The way actual users experience your system …”

Who are the non-actual users and what do they experience when they meet your system?

Is this philosophy of statistics or the statistics of philosophy?

Non-actual users are the three geek friends in the next cubicles who give it a whirl and say “Cool”.

While not actual aliens, their knowledge and reactions bear no relation to those of an actual user.

I read it just as an expression of emphasis, something like: the specific kind of people who experience your system. Not as a deep philosophical statement.

The non-actual users can be simulated users that you create while training the system. Let's say I'm building a search system. I'll simulate user queries and collect them for evaluation. They may or may not be representative of what actual users do when they hit the system in the wild. jrkrideau brings up the intermediate step of the friends in the next cubicles, who, like the simulated test set, may not be representative of the intended target audience for an application.
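The simulated-query workflow described above can be sketched in a few lines. Everything here is a toy: the corpus, the `search` function, and the query simulator are hypothetical, and the simulator samples from the corpus's own vocabulary, which is exactly the kind of optimistic assumption that can make simulated users unrepresentative of real ones.

```python
# Hypothetical sketch: evaluating a toy search system with simulated queries
# standing in for "non-actual users".
import random

documents = ["bayesian workflow", "stan manual", "prior predictive checks",
             "responsible ai practices", "multilevel regression"]

def search(query, docs):
    """Toy search: return docs containing every query token as a substring."""
    tokens = query.lower().split()
    return [d for d in docs if all(t in d for t in tokens)]

def simulate_queries(docs, n, seed=0):
    """Sample query words from the corpus vocabulary itself -- a deliberately
    optimistic simulation that real users won't necessarily match."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d.split()})
    return [" ".join(rng.sample(vocab, k=rng.choice([1, 2]))) for _ in range(n)]

queries = simulate_queries(documents, n=20)
hit_rate = sum(bool(search(q, documents)) for q in queries) / len(queries)
print(f"simulated hit rate: {hit_rate:.2f}")
```

The gap between this simulated hit rate and the rate on logged real-user queries is one way to measure how far the "non-actual users" are from the actual ones.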

I assume this is how Google actually implements responsible policy:

https://www.engadget.com/2019/01/21/france-fines-google-over-gdpr/

Being as old and cynical as I am, one has to wonder whether, for Google and other similar large companies, the terms "responsible" and "AI" (or "ML") aren't oxymorons.

I'd be curious to hear of a project or feature that Google axed because it didn't pass this policy. I.e., is this actionable, or only lip service?