All this discussion is good because people are moving beyond unthinking use of 95% intervals.

Most of the statisticians I have worked with who apply statistics (in hospital research institutions, the Cochrane Collaboration, the Oxford Centre for Statistics in Medicine, regulatory agencies, etc.) are not particularly knowledgeable about, or interested in, non-standard or advanced statistics (i.e., anything they did not study in graduate school).

They generally know the traditional approaches to statistics (ANOVA, regression, GLMs, repeated-measures ANOVA, naive random-effects models or generalized estimating equations, and possibly default-prior Bayes), and that’s what they expect to be using in their work.

There may be some hope for re-education here, especially given good blogs, online video webinars, and runnable prototype analysis scripts (e.g., GitHub repositories).

Or maybe I’ve just been unlucky.

Perhaps the bigger challenge will be teaching the statistical toolbox referred to here, http://www.dcscience.net/Gigerenzer-Journal-of-Management-2015.pdf, whatever that turns out to be.

Expected return is the general-purpose method.

> Now, when it comes to testing a null hypothesis vs their hypothesis, they should know better.

People who never had the inclination to figure out what a distribution is can’t be expected to understand what the problem is with testing a null hypothesis vs their hypothesis. There is, regrettably, a surprising amount of ignorance out there. Some people at my university once proposed that the math department teach a combined calculus/statistics course for biological science students. One proponent said, “Don’t waste time talking about distributions. Just teach them how to read an ANOVA table.” (Please don’t take this as a diatribe about all biological scientists. In my experience, they range widely in their understanding of statistics and in their openness to being told when and why they are doing something unwarranted.)

I agree that statistics is seldom easy, and I was not suggesting that traditional analyses should be a standard or even a usual default.

> There’s a better chance of people interpreting posterior probabilities correctly

If those are taken as omnipotent (or nearly so, as with well-understood disease-screening applications), then the probabilities likely will be interpreted better (at least with some training).

Now, in this post, http://statmodeling.stat.columbia.edu/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/, is the problem entirely one of Bayes with a flat prior (or a near-flat normal prior with huge variance), as is often used by default, versus Bayes with a _sensible prior_? Or is it Bayes with a sensible prior versus frequentist analysis with no prior at all?

Logically, the first comparison seems the same to me as the second.
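To make the flat-versus-sensible-prior comparison concrete, here is a minimal sketch for the simplest case, assuming a single normal estimate with known standard error and a normal prior centered at zero. The estimate (2.0), standard error (1.0), and prior scale (0.5) are hypothetical values chosen only for illustration, and the helper names are made up. It shows how much the posterior probability that an effect is positive depends on the prior.

```python
from statistics import NormalDist

def posterior_normal(estimate, se, prior_sd=None):
    """Posterior (mean, sd) of theta when estimate ~ N(theta, se^2).

    prior_sd=None means a flat prior; otherwise the prior is N(0, prior_sd^2).
    """
    if prior_sd is None:
        return estimate, se
    b = prior_sd**2 / (prior_sd**2 + se**2)  # weight given to the data
    return b * estimate, (b * se**2) ** 0.5

def prob_positive(mean, sd):
    """Posterior probability that theta > 0."""
    return 1.0 - NormalDist(mean, sd).cdf(0.0)

# Hypothetical estimate two standard errors away from zero.
flat = prob_positive(*posterior_normal(2.0, 1.0))
sensible = prob_positive(*posterior_normal(2.0, 1.0, prior_sd=0.5))
print(f"flat prior:        P(theta > 0) = {flat:.3f}")
print(f"N(0, 0.5^2) prior: P(theta > 0) = {sensible:.3f}")
```

With these made-up numbers the flat prior gives about 0.977 and the N(0, 0.5^2) prior about 0.814; whether the second is more sensible depends entirely on whether effects that small are in fact typical.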

I do worry about thoughtless Bayes becoming the new default analysis ritual http://www.dcscience.net/Gigerenzer-Journal-of-Management-2015.pdf

Also, I do think coverage in repeated use of Bayes needs to be considered, except in exceptional applications: http://statmodeling.stat.columbia.edu/2016/11/05/why-i-prefer-50-to-95-intervals/#comment-341799

I wouldn’t take the opposition to Bayesian methods from clinical/biomed researchers seriously. The vast majority are just following rote instructions/flow charts when it comes to data analysis. They never had the time/inclination/need to figure out what a distribution is, etc. That opposition fades away as soon as you give them the reason and tools to do MCMC parameter estimation, etc.

Now, when it comes to testing a null hypothesis vs their hypothesis, they should know better. In that case it is just an attempt to save face because what has been passing for scientific reasoning is so ridiculously and obviously fatally flawed.

What really puzzles me is why there is so much opposition to Bayesian methods (in my personal experience). I think the answer largely has to do with people in the field feeling heavily invested in the existing paradigm, not understanding its problems, and not wanting change.

> generally speaking the former implies the latter.

Yes, if the prior and data model are true, which of course they never actually are. So the two terms “calibrated interval” and “coverage interval” are also needed. Sander Greenland coined “omnipotent” for coverage that assumes the prior and data model are true.
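A small simulation can illustrate the “prior and data model are true” condition. This is a sketch assuming a normal prior and a normal data model with known standard error; the function name and all numbers are hypothetical. When the analyst's prior matches the distribution the true effects are actually drawn from, about 95% of the 95% posterior intervals cover the truth on average; when the assumed prior is much too narrow, coverage collapses.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def prior_averaged_coverage(true_prior_sd, assumed_prior_sd, se=1.0,
                            n_sims=100_000, seed=1):
    """Fraction of 95% posterior intervals that contain the true theta.

    theta is drawn from N(0, true_prior_sd^2) and y ~ N(theta, se^2); the
    analyst's interval uses the (possibly wrong) prior N(0, assumed_prior_sd^2).
    """
    rng = random.Random(seed)
    b = assumed_prior_sd**2 / (assumed_prior_sd**2 + se**2)  # shrinkage factor
    post_sd = (b * se**2) ** 0.5
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, true_prior_sd)
        y = rng.gauss(theta, se)
        post_mean = b * y
        if abs(post_mean - theta) <= Z95 * post_sd:
            hits += 1
    return hits / n_sims

print(prior_averaged_coverage(1.0, 1.0))  # prior matches the truth: close to 0.95
print(prior_averaged_coverage(1.0, 0.2))  # prior far too narrow: well below 0.95
```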

Yes, if one has credible/sensible priors and data models (that have been checked), as well as good insight into what to make of the resulting posterior probabilities.

> but virtually nobody does

And most who do just use default priors and data models that most likely were not checked, pulling the Bayesian crank and claiming “all done”.

(Andrew provided an example of how bad the credible intervals implied by repeated use of a default prior can be.)

> Why is this difficult or controversial?

I think it is both, but the insurmountable opportunity (challenge) is to make it less difficult and, thoughtfully, less controversial.

* “Confidence interval”, “uncertainty interval”, “credible interval” — which connote data-conditional (epistemic) properties of the interval itself, and

* “Calibrated interval”, “coverage interval” — which connote truth-conditional (sampling) properties of the interval-generating method.

Of course more often than not they line up (under *some* interpretation), but generally speaking the former implies the latter.
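The truth-conditional property can be computed directly in the simplest normal case. This sketch assumes one observation y ~ N(theta, se^2) and either a flat prior or a N(0, prior_sd^2) prior; `coverage_at` is a hypothetical helper, not from any library. With a flat prior, the 95% posterior interval coincides with the usual 95% confidence interval and covers every fixed theta 95% of the time. With the informative prior, coverage is higher near the prior center and lower far from it (about 0.41 at theta = 3), even though averaged over that prior it is 95%.

```python
from statistics import NormalDist

STD = NormalDist()
Z95 = STD.inv_cdf(0.975)

def coverage_at(theta, se=1.0, prior_sd=None):
    """Frequentist coverage of the 95% posterior interval for a FIXED theta.

    Data: y ~ N(theta, se^2). Prior N(0, prior_sd^2), or flat if prior_sd is None.
    The interval is b*y +/- Z95*sqrt(b)*se, with b the shrinkage factor.
    """
    b = 1.0 if prior_sd is None else prior_sd**2 / (prior_sd**2 + se**2)
    half_width = Z95 * (b ** 0.5) * se
    # Covers theta iff -half_width < (b - 1)*theta + b*e < half_width, e ~ N(0, se^2).
    lo = (-half_width - (b - 1) * theta) / b
    hi = (half_width - (b - 1) * theta) / b
    return STD.cdf(hi / se) - STD.cdf(lo / se)

for theta in (0.0, 1.0, 3.0):
    flat = coverage_at(theta)
    shrunk = coverage_at(theta, prior_sd=1.0)
    print(f"theta={theta}: flat {flat:.3f}, N(0,1) prior {shrunk:.3f}")
```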

That’s one of the mistakes of the whole p-value/significance enterprise. IMO the handwaving around the importance of multiple comparison corrections is just a backdoor means of introducing implicit utilities and implicit base rates.
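One way to see where a base rate hides is Bayes' rule applied to a significance filter. This sketch assumes hypothetical values of alpha = 0.05 and power = 0.8 for every test; `base_rate_null` is the share of tested hypotheses that are truly null, a number the significance threshold never states. The same “p < 0.05” means very different things depending on that share.

```python
def prob_null_given_significant(base_rate_null, alpha=0.05, power=0.8):
    """P(null is true | p < alpha), by Bayes' rule.

    base_rate_null: the (usually implicit) share of tested hypotheses that are null.
    """
    false_pos = alpha * base_rate_null       # nulls that pass the filter
    true_pos = power * (1 - base_rate_null)  # real effects that pass the filter
    return false_pos / (false_pos + true_pos)

for pi0 in (0.5, 0.9, 0.99):
    print(pi0, round(prob_null_given_significant(pi0), 3))
```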

If you have a 95% posterior probability that the return is 1500-1555, then make the investment. If it’s [-1300, 7000], it’s not a clear-cut decision then, is it? Likewise, if the 95% interval is [1, 10] (only a 1 to 10 dollar return), it would still be a safe bet, and I’d take that bet.

;-)

Making decisions based on 95% intervals pretty much works when those intervals are small and the consequences of the decision lie in a narrow region of outcome space… in other words when there really isn’t much uncertainty.

Suppose a 95% interval for the 1-year return on a 1000 dollar investment is 1500 to 1555. Now suppose it’s -1300 to 7000.
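A minimal sketch of the expected-return view of this example. It assumes (a) “return” means the 1-year gain in dollars and (b) each 95% interval is the central interval of a normal distribution; neither assumption comes from the comment, and `summarize` is a hypothetical helper.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def summarize(lo, hi):
    """Treat a central 95% interval as a normal distribution (an assumption)."""
    dist = NormalDist(mu=(lo + hi) / 2, sigma=(hi - lo) / (2 * Z95))
    return dist.mean, dist.cdf(0.0)  # expected return, P(return < 0)

for lo, hi in [(1500, 1555), (-1300, 7000)]:
    mean, p_loss = summarize(lo, hi)
    print(f"[{lo}, {hi}]: expected return {mean:.1f}, P(loss) {p_loss:.3f}")
```

Under these assumptions both scenarios have a positive expected return, so a purely expected-return decision invests in both; the wide interval matters through its roughly 9% chance of a loss, which only changes the decision once losses are weighted by a utility function.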

In my experience (clinical research), almost everybody has little idea what a confidence interval means, and they just produce 95% CIs because (a) that gives them an easy way of seeing whether the result is “significant” and (b) statisticians told them they should do this about 20 years ago (and journals took it on board). Confidence intervals are almost invariably given a Bayesian interpretation (most likely values, etc.), which is not surprising, because that is what people want to know. Given that, it makes sense to use a Bayesian analysis, but virtually nobody does. Why is this difficult or controversial?

There’s something that fascinates me about the name Andy Yap, because my name is Andy and I yap all the time!

I’ve become very fond of caterpillar plots that show both narrow (you like 50%, I like 66%) and wide (95%) intervals.
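Each row of a caterpillar plot needs just two central intervals per parameter. This sketch computes a 50% and a 95% interval from posterior draws using linear-interpolation quantiles; the parameter names and the normal draws are hypothetical stand-ins for real MCMC output, and swapping 0.50 for 0.66 gives the other narrow interval mentioned.

```python
import random

def quantile(sorted_draws, q):
    """Linear-interpolation quantile of already-sorted draws (0 <= q <= 1)."""
    pos = q * (len(sorted_draws) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 < len(sorted_draws):
        return sorted_draws[i] + frac * (sorted_draws[i + 1] - sorted_draws[i])
    return sorted_draws[i]

def central_interval(draws, prob):
    """Central interval containing `prob` of the posterior draws."""
    s = sorted(draws)
    tail = (1 - prob) / 2
    return quantile(s, tail), quantile(s, 1 - tail)

# Hypothetical posterior draws for three parameters.
rng = random.Random(0)
params = {"a": (0.0, 1.0), "b": (1.5, 0.5), "c": (-0.5, 2.0)}
for name, (mu, sd) in params.items():
    draws = [rng.gauss(mu, sd) for _ in range(4000)]
    narrow = central_interval(draws, 0.50)
    wide = central_interval(draws, 0.95)
    print(f"{name}: 95% [{wide[0]:.2f}, {wide[1]:.2f}], "
          f"50% [{narrow[0]:.2f}, {narrow[1]:.2f}]")
```

In the plot itself, the narrow interval is usually drawn as a thick segment on top of a thin line for the wide one, so both levels of uncertainty are visible at a glance.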

Priors are going to have a big effect on the endpoints of 99.99% intervals. And you’re going to need a huge sample to estimate the necessary .00005 and .99995 quantiles: on the order of millions of draws, all of which will need to be saved, or else you’ll need a special online algorithm (one that can handle extreme empirical quantiles with very limited memory).
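One simple bounded-memory approach, assuming the total number of draws N is known in advance, is to keep only the k most extreme draws on each side in a heap, with k = ceil(q * N). With q = 0.00005 and N = 1,000,000 that is only 50 draws per tail, which also shows why millions of draws are needed: even then, each tail quantile rests on about 50 values. This is a sketch of that idea, not necessarily the algorithm the comment had in mind; `streaming_tail_quantiles` is a hypothetical name.

```python
import heapq
import math
import random

def streaming_tail_quantiles(draws, n_total, q):
    """Empirical q and 1-q quantiles of a stream of draws in O(k) memory.

    Assumes n_total (the number of draws) is known up front. The lower quantile
    is the k-th smallest draw with k = ceil(q * n_total), the upper quantile the
    k-th largest; only the k most extreme draws on each side are kept.
    """
    # Small guard so floating-point error in q * n_total cannot round k up.
    k = max(1, math.ceil(q * n_total - 1e-9))
    low = []   # max-heap (stored negated) of the k smallest draws so far
    high = []  # min-heap of the k largest draws so far
    for x in draws:
        if len(low) < k:
            heapq.heappush(low, -x)
        elif x < -low[0]:
            heapq.heapreplace(low, -x)
        if len(high) < k:
            heapq.heappush(high, x)
        elif x > high[0]:
            heapq.heapreplace(high, x)
    return -low[0], high[0]

rng = random.Random(42)
n = 1_000_000
draws = (rng.gauss(0.0, 1.0) for _ in range(n))  # stand-in for MCMC output
lo, hi = streaming_tail_quantiles(draws, n, 0.00005)
print(f"0.00005 quantile ~ {lo:.3f}, 0.99995 quantile ~ {hi:.3f}")
```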

http://statmodeling.stat.columbia.edu/2016/09/24/a-break-in-the-thin-blue-line/#comment-314390

Looks interesting. I looked at their documentation but couldn’t figure out what they’re doing. I guess that’s standard with non-academic software: they want to keep their algorithm secret? Maybe we should get someone to build something similar using Stan.

Have you ever seen this app? https://www.getguesstimate.com/ I think it is pretty neat to play around with. It also raises the question: uncertainty for what purpose?

When I wrote about power pose before the recent Carney statement, what I wrote was that Cuddy was defending the work and Carney and Yap were keeping quiet. I wondered if Carney and Yap were not speaking up because they were hoping the whole thing would go away. After Carney’s statement came out, I wrote that Carney had changed her position but Cuddy was still defending the work. Yap remains quiet, but last time I checked, the power pose paper was still featured on his website.

You wrote recently about researchers explaining away failed replications by postulating all sorts of new modulators, and said that “This was what the power pose authors said about the unsuccessful replication performed by Ranehill et al.” That’s not quite true; only Amy Cuddy said that.

Dana Carney, the lead author of the original power pose paper, now has a statement on her website (http://faculty.haas.berkeley.edu/dana_carney/vita.html, “My position on ‘Power Poses'”) that begins:

“Reasonable people, whom I respect, may disagree. However since early 2015 the evidence has been mounting suggesting there is unlikely any embodied effect of nonverbal expansiveness (vs. contractiveness)—i.e.., “power poses” — on internal or psychological outcomes.

As evidence has come in over these past 2+ years, my views have updated to reflect the evidence. As such, I do not believe that “power pose” effects are real…”

The citation to the paper on her website is now immediately followed by a note:

“***This result failed to replicate in an adequately powered sample. See: Ranehill, Dreber, Johannesson, Leiberg, Sul, & Weber (2015). [.pdf] and a p-curve analysis also suggested the effect is not likely real [.pdf]”

I feel this blog could use more positive examples of researchers responding well to failed replications, and this is an excellent example of that.

68%, as it matches ±1 standard deviation of a normal distribution.

66%, as it is also easy to describe. Using the UN IPCC guidance note on uncertainty, you could call this the “likely range”. https://www.ipcc.ch/pdf/supporting-material/uncertainty-guidance-note.pdf
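Both suggestions are easy to check against the standard normal. This sketch uses Python's `statistics.NormalDist` and assumes normality, which is what makes ±1 SD correspond to about 68%.

```python
from statistics import NormalDist

std = NormalDist()
one_sd = std.cdf(1) - std.cdf(-1)  # probability mass within +/- 1 SD
z50 = std.inv_cdf(0.75)            # half-width, in SDs, of a central 50% interval
z66 = std.inv_cdf(0.5 + 0.66 / 2)  # half-width of a central 66% interval
z95 = std.inv_cdf(0.975)           # half-width of a central 95% interval
print(f"+/-1 SD covers {one_sd:.4f}")
print(f"central 50%: +/-{z50:.3f} SD, 66%: +/-{z66:.3f} SD, 95%: +/-{z95:.3f} SD")
```

So a central 66% interval is about ±0.95 SD, slightly narrower than the ±1 SD (68%) interval.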
