So, if I am interested in evaluating one particular study from the past, it makes sense for me to take the Bayesian perspective and condition on the data (such as the p-value from the study) rather than looking at frequency properties, yes?

I’m thinking of this as being analogous to confidence intervals vs credible intervals. Only credible intervals can give me information about a particular result that I’ve obtained from the data, since confidence intervals speak to the characteristics of the procedure but don’t technically tell me anything about what has happened after I’ve collected the data.

Would that be a not-too-shabby understanding?

]]>From the Bayesian perspective, the right thing to do is to condition on data. If there’s only one sufficient statistic, then all data summaries are equivalent, so you can condition on the z-score or the p-value or whatever. When I did those analyses conditioning on “p less than 0.05,” that’s because I was evaluating the frequency properties of claims made with confidence using standard methods.

]]>For (b), yeah, I wasn’t thinking of conditioning on the Bayes Factor. It was more of using an example from a scientific paper where the authors didn’t reject the null for their chosen alpha (0.05), yet had another line of statistical reasoning to conclude that they demonstrated a real effect.

I just find it so interesting how the smaller the p-value, the greater the Type-M error, given the same other inputs. I realize it’s because in order to get the smaller p-value the observed effect has to be further away from the externally estimated effect size, but I still find it fascinating.
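This pattern can be checked directly with a small simulation (a sketch with hypothetical numbers: true effect 0.5, standard error 1, in the spirit of the example discussed below): conditioning on clearing a stricter p-value threshold selects estimates that are even further from the true effect, so the exaggeration (Type-M) ratio grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, se = 0.5, 1.0
# Hypothetical replicated experiments: each run yields one noisy estimate.
est = rng.normal(true_effect, se, size=200_000)

exaggeration = {}
for label, z_crit in [("p<0.05", 1.96), ("p<0.01", 2.576)]:
    sig = np.abs(est / se) > z_crit            # runs surviving the significance filter
    # Type-M error: how inflated is the average surviving estimate?
    exaggeration[label] = np.abs(est[sig]).mean() / true_effect
    print(label, round(exaggeration[label], 2))
```

With these inputs the exaggeration ratio is roughly 4.8 conditional on p < 0.05, and larger still conditional on p < 0.01 — the stricter filter demands an even more extreme estimate.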

]]>The basic idea is that findings should be presented with alternatives, some with the same meaning, some in ways that appear similar but carry a different meaning. This sets a boundary of meaning (BOM).

An example of a finding is: “When I increase X, Y will also increase”. An alternative is to show this with a graph. All this is based on semiotic representations and meaning equivalence reusable learning objects (MERLO) used in educational psychology.

To evaluate the BOM one can apply a Type-S error, i.e., I make the above claim and I am wrong: Y actually decreases. The paper refers to an application of the JMP software for doing this.

]]>The problem with Bayes factors (and variations thereof) is that they are quite dependent on priors. IMNSHO, they may be useful when one has a credible set of prior information (such as a meta-analysis of prior results); when this is missing, a Bayes factor can take almost any value by selecting a convenient prior. This applies also to “weakly informative” priors, the “researcher degree of freedom” here being the extent to which said researcher wishes to be “weakly” informative…
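A minimal sketch of this sensitivity (assuming a toy setup, not anything from the paper under discussion): a point null H0: θ = 0 against H1: θ ~ N(0, τ²), with one observation y having standard error 1. The marginal of y under H1 is N(0, 1 + τ²), so the Bayes factor has a closed form — and it swings from favoring H1 to favoring H0 purely as the prior scale τ changes.

```python
from scipy.stats import norm

y = 2.0  # a "borderline" result: z = 2 against H0
bfs = {}
for tau in [0.5, 1, 5, 50]:
    # BF10 = p(y | H1) / p(y | H0), with H1 marginal N(0, 1 + tau^2)
    bfs[tau] = norm.pdf(y, 0, (1 + tau**2) ** 0.5) / norm.pdf(y, 0, 1)
    print(f"prior sd {tau:>4}: BF10 = {bfs[tau]:.2f}")
```

The same z = 2 yields BF10 of roughly 1.9 for τ = 1 but about 0.15 for τ = 50: a diffuse-enough prior turns weak evidence *for* an effect into apparent evidence *against* it (the Jeffreys–Lindley phenomenon).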

Aki Vehtari’s position of comparing models by estimating their out-of-sample predictive behaviors is (partially) a way out of this conundrum… but answers a different question.

]]>As for (b), you can do this one using just the p = 0.08 and see what you get. Conditioning on the Bayes factor in some general way doesn’t help you much, because it’s highly model-dependent, so there’s nothing generic to say there.

]]>Would you apply this methodology in cases where there was technically no statistical significance? I’m thinking primarily of these two cases:

(a) There was no alpha value chosen before the study, and the authors weren’t testing a p-value against an alpha, but just reporting a p-value (such as 0.06) and deciding that it was sufficiently small to conclude that there was likely an effect and worth further experimentation/investigation. (Fisher-ian?)

(b) There was an alpha value chosen (0.05), and the t-test didn’t reject the null because the p-value was, say, 0.08. However, in addition to this NHST analysis, the authors generated a Bayes Factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups, and, therefore, conclude a difference in groups exists.

Letter (a) is a decent description of the type of analyses that I often do (mostly DOEs), since I don’t use alpha-thresholds unless required by a third party.

Letter (b) is (basically) something from a paper that I’m analyzing, and it would be great if I could estimate the Type-S/M errors without violating any statistical laws.

In the Beyond Power Calculations paper, it does say:

“If the result is not statistically significant, the chance of the estimate having the wrong sign is 49% (not shown in the Appendix; this is the probability of a Type S error conditional on nonsignificance)—so that the direction of the estimate gives almost no information on the sign of the true effect.”

…so I do have hope that the methods are generally applicable to nonsignificant results as well.

]]>The idea is you have an experiment that uses a sample size that produces a standard error of 1. And you have a mean value calculated from your sample. Each “data point” here is one such sample mean. The true mean is 0.5 but the standard error of the mean is 1… so you put your statistical significance filter on this and you get the dark black results if you repeat the experiment 5000 times.

It shows that *conditional on using statistical significance to decide whether a ‘real effect’ is there* you are either massively overestimating the effect (type M, dark squares near the top) or you aren’t even estimating the effect in the right direction, *and* in this case the magnitude is silly too… (type S, dark triangles at the bottom).

Basically conditional on using statistical significance… you are getting the very wrong answer.
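The setup just described can be sketched in a few lines (same inputs as stated above: true mean 0.5, standard error 1, 5000 replications, filtered at p < 0.05):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, se, n_reps = 0.5, 1.0, 5000
est = rng.normal(true_mean, se, size=n_reps)  # one sample mean per replicated experiment

sig = np.abs(est / se) > 1.96                 # the statistical-significance filter
overest = est[sig & (est > 0)]                # right sign but inflated ("dark squares")
wrong_sign = est[sig & (est < 0)]             # wrong sign entirely ("dark triangles")

print("share of runs reaching significance:", sig.mean())
print("avg overestimation factor:", overest.mean() / true_mean)
print("wrong-sign share among significant runs:", len(wrong_sign) / sig.sum())
```

Only about 8% of runs reach significance at all, and those that do overestimate the effect several-fold or land on the wrong side of zero — which is exactly the “very wrong answer” point.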
