Comments on: “Beyond ‘Treatment Versus Control’: How Bayesian Analysis Makes Factorial Experiments Feasible in Education Research”

By: Andrew Wilson

Andrew Wilson — Thu, 15 Aug 2019 15:44:10 +0000

I came across this link elsewhere on this site:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

It seems to be worth a look. Many books and ebooks are available too, but it looks like learning this is mostly self-study at this point.

By: Anoneuoid

Anoneuoid — Thu, 15 Aug 2019 10:09:16 +0000

In reply to Brent Hutto.

If you know R or Python it is not difficult at all to do something like this. If you don't, then you will gain a very useful skill independent of anything having to do with Bayesian stats.

By: Brent Hutto

Brent Hutto — Wed, 14 Aug 2019 21:33:27 +0000

In reply to Michael Nelson.

Michael,

I gave serious consideration to it last fall and into the winter. Decided that with most likely only 5-6 years left in my career it didn’t merit spending maybe 500-1,000 hours of my own time (in addition to work) to try and become somewhat proficient in techniques that would require major advocacy to even use in any meaningful way.

If I were 10 years younger it might be a battle worth waging, but then again 10 years ago it would have been even harder to attempt without the modern tools for Bayesian workflow that have become readily available of late.

By: Michael Nelson

Michael Nelson — Wed, 14 Aug 2019 20:44:06 +0000

“Bayesian methods are a valuable tool for researchers” that I fear would require a year or more of coursework for a classically-trained statistician like myself to become proficient at this level. Which I would actually love to do, if anyone has a fellowship that covers my current salary and benefits, and the tuition costs to boot. :)

By: Dilsher Singh Dhillon

Dilsher Singh Dhillon — Wed, 14 Aug 2019 10:53:22 +0000

In reply to Dilsher Singh Dhillon.

Uh oh. Here’s the post I meant to refer to.
https://dilshersinghdhillon.netlify.com/post/multiple_comparison/

By: Dilsher Singh Dhillon

Dilsher Singh Dhillon — Wed, 14 Aug 2019 10:52:20 +0000

I’ve been exploring a similar area and trying to hash out appropriate simulations!
http://dilsherdhillon.rbind.io
This paper gives me some more ideas to think about. Thanks for posting this!

By: Anoneuoid

Anoneuoid — Tue, 13 Aug 2019 14:25:42 +0000

Andrew said:

I love it. This is stuff that I’ve been talking about for a long time but have never actually done. These people really did it. Progress!

Are you sure? It looks like NHST to me:

Researchers often wish to test a large set of related interventions or approaches to implementation.
[…]
We repeatedly simulate factorial experiments with a variety of sample sizes and numbers of treatment arms to estimate the minimum detectable effect (MDE) for each combination.
[…]
we consider the MDE of this experiment to be the smallest difference in effect size with at least an 80% chance of being found significant in the correct direction by the Bayesian model.
[…]
In our experiment, we set the threshold for significance at .975 to correspond to a two-sided p value at the 95% confidence level, but experimenters may wish to explore other threshold values or even consider dispensing with the intermediate significance calculation altogether.

They give some lip service to “there is no theoretical reason why the posterior probability cannot be used directly as the outcome” and “dispensing with the intermediate significance calculation”, but do not explain to the reader what exactly they would do with this info.
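
For anyone who wants to see what the quoted procedure amounts to, here is a minimal sketch of the simulate-and-threshold idea. It is not the paper's model: it assumes just two arms, a normal outcome with known unit variance, and a flat prior, so the posterior probability that the difference is positive has a closed form. The function names, the effect grid, and the number of simulations are my own choices; only the .975 threshold and the 80% target come from the excerpts above.

```python
import random
from statistics import NormalDist

THRESHOLD = 0.975     # posterior-probability cutoff from the quoted passage
TARGET_POWER = 0.80   # "at least an 80% chance" from the quoted passage


def detection_rate(effect, n_per_arm, n_sims=2000, seed=0):
    """Share of simulated two-arm experiments flagged in the correct direction.

    Assumption (not from the paper): outcomes have known unit variance and the
    prior on the difference is flat, so the posterior for the difference is
    Normal(diff_hat, se) and P(diff > 0 | data) = Phi(diff_hat / se).
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_arm) ** 0.5  # sd of the difference in sample means
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_sims):
        # Draw the estimated difference from its sampling distribution.
        diff_hat = rng.gauss(effect, se)
        post_prob_positive = std_normal.cdf(diff_hat / se)
        if post_prob_positive > THRESHOLD:
            hits += 1
    return hits / n_sims


def minimum_detectable_effect(n_per_arm, grid=None):
    """Smallest effect on the grid whose detection rate reaches TARGET_POWER."""
    if grid is None:
        grid = [i / 100 for i in range(1, 201)]  # 0.01 to 2.00, hypothetical grid
    for effect in grid:
        if detection_rate(effect, n_per_arm) >= TARGET_POWER:
            return effect
    return None  # nothing on the grid was detectable often enough


print(minimum_detectable_effect(100))
```

Under these simplifying assumptions the posterior threshold behaves exactly like a one-sided z-test at 0.025, which is more or less the point of the complaint above.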

The goal of the study was apparently:

The experiment started with a basic website template that remained the same across all treatment arms and consisted of a map showing school locations at the top followed by a list of schools. Four categories of information were shown for each school: distance to school, academic performance, safety, and school resources. Based on the results of our power calculations, we were able to test a total of five factors in a (3 x 3 x 2 x 2 x 2) configuration, for a total of 72 distinct treatment arms. In this study, the five factors were not examined independently: rather, the experiment sought to identify which of the 72 possible combinations of factor levels represented the best possible design of an information display, after accounting for the interaction effects between factors. In other words, the study sought to identify which of the 72 treatment arms represented the best possible display for each outcome.
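
The arm count in that passage is easy to check by enumerating the design. A small sketch follows; the level counts come from the quoted "(3 x 3 x 2 x 2 x 2)" configuration, but the factor names are placeholders, since the excerpt does not say what the five factors were.

```python
from itertools import product

# Level counts from the quoted design; factor names are hypothetical placeholders.
levels_per_factor = {
    "factor_a": 3,
    "factor_b": 3,
    "factor_c": 2,
    "factor_d": 2,
    "factor_e": 2,
}

# Each treatment arm is one combination of levels, one level per factor.
arms = list(product(*(range(k) for k in levels_per_factor.values())))

print(len(arms))  # 3 * 3 * 2 * 2 * 2 = 72
```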

Do you even need stats to do that? This reminds me of the “riddle”:

You collected data from two groups and are interested in the difference using alpha = 0.05. Group A had a mean of 10, while Group B had a mean of 8. The p-value = 0.1. What is the average difference between group A and group B?
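
Read as a question about the data in hand, the riddle is a subtraction. A tiny sketch using only the numbers stated in the riddle (the variable names are mine):

```python
mean_a = 10
mean_b = 8
p_value = 0.1
alpha = 0.05

# The observed average difference does not depend on the p-value.
observed_difference = mean_a - mean_b

# The significance verdict is a separate question from the size of the difference.
significant = p_value < alpha

print(observed_difference, significant)
```

The observed difference is 2 whether or not the test comes out "significant".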