https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

It seems to be worth a look. Many books and ebooks are available too, but it looks like it is mostly self-study at this point.

I gave serious consideration to it last fall and into the winter. I decided that, with most likely only 5-6 years left in my career, it didn’t merit spending maybe 500-1,000 hours of my own time (in addition to work) trying to become somewhat proficient in techniques that would require major advocacy to even use in any meaningful way.

If I were 10 years younger it might be a battle worth waging, but then again, 10 years ago it would have been even harder to attempt without the modern tools for Bayesian workflow that have recently become readily available.

https://dilshersinghdhillon.netlify.com/post/multiple_comparison/

http://dilsherdhillon.rbind.io

This paper gives me some more ideas to think about. Thanks for posting this!

I love it. This is stuff that I’ve been talking about for a long time but have never actually done. These people really did it. Progress!

Are you sure? It looks like NHST to me:

Researchers often wish to test a large set of related interventions or approaches to implementation.

[…]

We repeatedly simulate factorial experiments with a variety of sample sizes and numbers of treatment arms to estimate the minimum detectable effect (MDE) for each combination.

[…]

we consider the MDE of this experiment to be the smallest difference in effect size with at least an 80% chance of being found significant in the correct direction by the Bayesian model.

[…]

In our experiment, we set the threshold for significance at .975 to correspond to a two-sided p value at the 95% confidence level, but experimenters may wish to explore other threshold values or even consider dispensing with the intermediate significance calculation altogether.

They give some lip service to “there is no theoretical reason why the posterior probability cannot be used directly as the outcome” and “dispensing with the intermediate significance calculation”, but do not explain to the reader what exactly they would do with this info.
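To make the threshold concrete, here is a minimal sketch of the kind of simulation the quoted passages describe. The simplifications are assumptions for illustration, not the paper's model: two arms instead of 72, normally distributed outcomes with a known standard deviation, and a flat prior, so the posterior for the difference in means is Normal(observed difference, sigma * sqrt(2/n)). The function names are hypothetical. Under these assumptions, requiring a posterior probability above .975 is exactly the same as requiring z > 1.96 in the correct direction, i.e. a two-sided p < 0.05, which is why it reads like NHST.

```python
import math
import random


def posterior_prob_positive(diff_hat, sigma, n):
    """P(true difference > 0 | data), ASSUMING a flat prior and known sigma.

    Under those assumptions the posterior for the difference in means is
    Normal(diff_hat, sigma * sqrt(2 / n)) for two arms of n units each.
    """
    se = sigma * math.sqrt(2.0 / n)
    return 0.5 * (1.0 + math.erf(diff_hat / se / math.sqrt(2.0)))


def power(effect, n, sigma=1.0, threshold=0.975, sims=2000, seed=1):
    """Share of simulated experiments 'found significant in the correct direction'.

    The true effect is positive, so the correct direction means
    P(difference > 0 | data) exceeds the threshold.
    """
    rng = random.Random(seed)
    se_mean = sigma / math.sqrt(n)  # sampling sd of one arm's mean
    hits = 0
    for _ in range(sims):
        # Normal outcomes: draw each arm's sample mean directly
        # instead of simulating n individual outcomes.
        mean_control = rng.gauss(0.0, se_mean)
        mean_treated = rng.gauss(effect, se_mean)
        if posterior_prob_positive(mean_treated - mean_control, sigma, n) > threshold:
            hits += 1
    return hits / sims


def minimum_detectable_effect(n, grid, target=0.80, **kwargs):
    """Smallest effect on the grid whose simulated power reaches the target."""
    for effect in sorted(grid):
        if power(effect, n, **kwargs) >= target:
            return effect
    return None


grid = [round(0.05 * k, 2) for k in range(1, 21)]
print(minimum_detectable_effect(n=100, grid=grid))
```

Reporting the value of `posterior_prob_positive(...)` itself, rather than the yes/no comparison inside `power`, is roughly what "dispensing with the intermediate significance calculation" would amount to. For n = 100 per arm and sigma = 1, the textbook two-sample power formula puts the MDE near 0.4, which the simulation should roughly reproduce.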

The goal of the study was apparently:

The experiment started with a basic website template that remained the same across all treatment arms and consisted of a map showing school locations at the top followed by a list of schools. Four categories of information were shown for each school: distance to school, academic performance, safety, and school resources. Based on the results of our power calculations, we were able to test a total of five factors in a (3 x 3 x 2 x 2 x 2) configuration, for a total of 72 distinct treatment arms. In this study, the five factors were not examined independently: rather, the experiment sought to identify which of the 72 possible combinations of factor levels represented the best possible design of an information display, after accounting for the interaction effects between factors.

In other words, the study sought to identify which of the 72 treatment arms represented the best possible display for each outcome.

Do you even need stats to do that? This reminds me of the “riddle”:

You collected data from two groups and are interested in the difference using alpha = 0.05. Group A had a mean of 10, while Group B had a mean of 8. The p-value = 0.1. What is the average difference between group A and group B?
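The riddle's point can be checked directly: the p-value measures how surprising the observed difference would be under the null hypothesis, not how large the difference is, so the average difference is simply 10 - 8 = 2. A small sketch with made-up data (both pairs of samples are hypothetical) shows two datasets with the same group means, hence the same difference, but very different p-values.

```python
import math
from statistics import mean, stdev


def two_sided_p_normal(a, b):
    """Two-sided p-value for mean(a) - mean(b), normal (z) approximation.

    With only a few observations per group a Welch t-test would be more
    appropriate; the z version just keeps this sketch stdlib-only.
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Hypothetical data: both pairs have group means of 10 and 8.
tight_a, tight_b = [9, 10, 11], [7, 8, 9]
wide_a, wide_b = [4, 10, 16], [2, 8, 14]

for label, a, b in [("tight", tight_a, tight_b), ("wide", wide_a, wide_b)]:
    diff = mean(a) - mean(b)
    print(label, diff, round(two_sided_p_normal(a, b), 3))
```

The difference is 2 in both cases; only the spread of the data, and therefore the p-value, changes.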
