Great and beautifully written advice for any data science setting:

- Google. Responsible AI Practices.

Enjoy.

Posted by Bob Carpenter on 18 January 2019, 4:00 pm



Prediction: With greater-than-even probability, Frank Harrell will not like the section that reads: “false positive and false negative rates sliced across different subgroups.”

Ben:

I actually have no idea what they mean by “false positive and false negative rates” in that context.

Let’s take a simple case, where a false positive is offering a loan to someone who will default and a false negative is not offering one to someone who would pay it back. The goal would be to make sure we’re not introducing extra bias for some subgroup of the population.

But forget about false positive and false negative—that’s just one error criterion for when you make binary decisions. The bigger idea is really just evaluating different groups and interactions in multilevel regressions rather than simply looking at aggregate results, which might not reveal biases in subgroups.
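To make the subgroup-slicing idea concrete, here is a minimal sketch (the function name and toy data are my own, for illustration) of computing false positive and false negative rates per subgroup rather than in aggregate:

```python
from collections import defaultdict

def rates_by_group(y_true, y_pred, groups):
    """Per-subgroup false positive rate (FPR) and false negative rate (FNR).

    y_true, y_pred: 0/1 labels; groups: a subgroup label per observation.
    Hypothetical helper, not from the Google document.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 0:
            c["neg"] += 1
            c["fp"] += p == 1  # predicted positive on a true negative
        else:
            c["pos"] += 1
            c["fn"] += p == 0  # predicted negative on a true positive
    return {g: {"fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
                "fnr": c["fn"] / c["pos"] if c["pos"] else float("nan")}
            for g, c in counts.items()}

# Toy data: identical aggregate error, but the errors fall on different
# groups -- group "a" gets the false negatives, group "b" the false positives.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(rates_by_group(y_true, y_pred, groups))
```

In the loan example, that asymmetry is exactly the concern: one subgroup being denied loans it would repay, another being offered loans it will default on, while the overall error rate looks fine.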

It does look like a public relations piece, though clicking through gets you to actual papers; e.g., I found this: https://ai.google/research/pubs/pub47077

The “sliced across different subgroups” seems to be a warning not to rely on global assessments of, say, prediction (like the NN that achieved the lowest MSE predicting toxicity by often missing highly toxic cases while doing very well on lower-toxicity ones).

That sounds more like using the wrong evaluation metric. They didn’t actually want to optimize for low MSE, since underestimating high toxicity is more dangerous, so why did they train their model to do that?
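The point can be illustrated with a toy comparison (the loss function, weight, and numbers here are my own invention, not from the toxicity example): under plain MSE one model wins, but under an asymmetric loss that penalizes underestimating toxicity it loses.

```python
def mse(y, yhat):
    """Plain mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def asymmetric_loss(y, yhat, under_weight=5.0):
    """Squared error, but underestimates (yhat < y) cost 5x more.

    The weight is arbitrary; the point is only the asymmetry.
    """
    total = 0.0
    for a, b in zip(y, yhat):
        err = (a - b) ** 2
        total += under_weight * err if b < a else err
    return total / len(y)

# One highly toxic case (0.9) among mostly low-toxicity cases.
y_true  = [0.1, 0.2, 0.9]
model_a = [0.1, 0.2, 0.5]    # perfect on low toxicity, badly misses the high case
model_b = [0.4, 0.5, 0.88]   # sloppier on average, but safe on the high case

print(mse(y_true, model_a), mse(y_true, model_b))
print(asymmetric_loss(y_true, model_a), asymmetric_loss(y_true, model_b))
```

Model A has the lower MSE, so training to minimize MSE picks the model that misses the dangerous case; the asymmetric loss reverses the ranking.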

“The way actual users experience your system …”

Who are the non-actual users and what do they experience when they meet your system?

Is this philosophy of statistics or the statistics of philosophy?

Non-actual users are the three geek friends in the next cubicles who give it a whirl and say “Cool”.

While not actual aliens, their knowledge and reactions have no relation to an actual user.

I read it just as an emphasis expression, something like: the specific type of people that experience your system. Not as a deep philosophical statement.

The non-actual users can be simulated users that you create while training the system. Let’s say I’m building a search system. I’ll simulate user queries and collect them for evaluation. They may or may not be representative of what actual users do when they hit the system in the wild. jrkrideau brings up the intermediate step of the friends in the next cubicles, who, like the simulated test set, may not be representative of the intended target audience for an application.
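A small sketch of how that mismatch bites (the query types, quality numbers, and mixes are all made up for illustration): the same system scores very differently depending on whose query distribution you evaluate it under.

```python
import random

random.seed(0)

# Hypothetical per-query-type quality (fraction of relevant results returned).
quality = {"navigational": 0.95, "informational": 0.70, "misspelled": 0.40}

def sample_accuracy(mix, n=10_000):
    """Estimate overall quality under a given mix of query types."""
    types = list(mix)
    weights = [mix[t] for t in types]
    draws = random.choices(types, weights=weights, k=n)
    return sum(quality[t] for t in draws) / n

# Simulated users: the developers' guess at the query mix.
simulated_mix = {"navigational": 0.5, "informational": 0.4, "misspelled": 0.1}
# Actual users: far more misspellings than the developers assumed.
actual_mix = {"navigational": 0.3, "informational": 0.3, "misspelled": 0.4}

print(sample_accuracy(simulated_mix))  # evaluation on simulated traffic
print(sample_accuracy(actual_mix))     # what real traffic would show
```

Nothing about the system changed between the two numbers; only the population of users did, which is the gap between simulated test sets, cubicle friends, and the intended audience.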

I assume this is how Google actually implements responsible policy:

https://www.engadget.com/2019/01/21/france-fines-google-over-gdpr/

Being as old and cynical as I am, one has to wonder whether, for Google and other similar large companies, the terms “responsible” and “AI” (or “ML”) aren’t oxymorons.

I’d be curious to hear of a project or feature that Google axed because it didn’t pass this policy. That is, is this actionable or only lip service?