West: if the null is that s=0, the Bayes rule is based on E[s|n]^2/Var[s|n], equivalently E[r-b|n]^2/Var[r-b|n]. If, a priori, we are absolutely certain that b is really tiny and our prior has no strong dependence between b and r, this is approximately E[r|n]^2/Var[r|n]. If we just set b=0, then we want to test r=s=0, with a single observation of n=1. Under the uniform and the Jeffreys prior for s, the posterior mean equals the posterior variance equals the Bayesian Wald statistic, with values 2 and 1.5 respectively. In other words, I think your algebra is right.

Under both priors the bulk of the posterior's support is well within a couple of posterior SDs of the null – so no, the signal:noise ratio is not strong enough to trigger a “reject” signal. (A value of n=3 would trigger one, however.) For n=1 the posterior supports values near s=1 (sensibly) but on this scale the noise overwhelms that.
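To make these numbers concrete, here is a minimal sketch (my own check, not from the comment: it assumes a unit-exposure Poisson likelihood, so an improper Gamma-shape prior for s gives a Gamma(n + shape, 1) posterior; the statistic turns out not to depend on the exposure anyway):

```python
import math

def bayes_wald_statistic(n, prior_shape):
    """Bayesian Wald statistic E[s|n]^2 / Var[s|n] for a Poisson count n.

    With an improper Gamma(prior_shape) prior and unit exposure, the
    posterior for s is Gamma(n + prior_shape, 1): its mean and variance
    both equal the shape, so the statistic reduces to the posterior shape.
    """
    shape = n + prior_shape
    return shape ** 2 / shape  # = shape

def chi2_1_sf(x):
    """P(chi-squared with 1 df > x), using chi2_1 = Z^2 and erfc."""
    return math.erfc(math.sqrt(x / 2.0))

# n = 1: statistic is 2 under the flat prior (shape 1), 1.5 under the
# Jeffreys prior (shape 1/2); both sit below the usual 0.05 cutoff of
# about 3.84, so neither "rejects".
for label, a in [("uniform", 1.0), ("Jeffreys", 0.5)]:
    w = bayes_wald_statistic(1, a)
    print(label, w, chi2_1_sf(w))

# n = 3 does cross the threshold under the flat prior: statistic = 4,
# with tail probability below 0.05.
print(chi2_1_sf(bayes_wald_statistic(3, 1.0)))
```

The exposure-independence is why only the count n matters here: a Gamma(shape, t) posterior has mean shape/t and variance shape/t^2, so mean^2/variance is the shape regardless of t.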

If you don’t like this answer because you deem the available precision to be irrelevant to your testing decision, the loss function we’re using doesn’t express your utility; you want to answer a question that’s different to the one being addressed. (Perhaps your loss only depends on whether s=0 vs s>0?)

NB you might prefer looking at the problem on the Log[s] scale, with the null being some very small value of s.

---

Hi Keith, thanks for your nice comments. Here are some clarifications – happy to follow up more by email.

On my “later statement”; yes, I do want to make that distinction. If one wants a comprehensive summary, giving the full posterior is fine – much in line with lots of Andrew’s advice on this blog, I think. But sometimes that’s way too much detail to be practical, and a criterion is needed – a loss function – that describes how good or bad different cruder choices would be (e.g. the value of a point estimate, the yes/no of a test).

Is this completely separating the comprehensive summary from the cruder one? No, if we’re permitting subjunctive use of loss functions; i.e. a rational person who held THIS utility would do THIS, but with THAT utility would do THAT, etc etc. It’s okay, I think, to view comprehensive and crude as answers to different questions. But let’s say what those questions are, explicitly.

On using sign: it’s a bit opaque in the posted slides, but the other losses use signed decisions (only) with continuously-valued measures of effect size, albeit measures that don’t accelerate as wildly as the quadratic ones in the paper. I do think effect size should matter to some extent; if the signal we missed is a modest improvement over a sugar pill, that’s bad, but not as bad as missing the next penicillin.

---

After some thought, I believe my consternation comes from applying this Bayes rule to a Poisson counting problem I am working on and getting a nonsensical result.

* I have two Poisson processes whose total rate is r=s+b, where b is known to be very small while s remains unknown and could possibly be zero.

* Now I analyze a large amount of data (of length t) and obtain a count of n=1

* The likelihood of getting n=1 if r=b (i.e. s=0) is very small, say a p-value of 1e-6.

* The Bayes rule is then E[r-b|n]^2/Var[r|n] = E[s|n]^2/Var[s+b|n]

* Because b is so tiny, Var[s+b|n] ≈ Var[s|n] (this is where I think I am floundering)

* If p(s|n) is a gamma distribution, the Bayes rule equals the shape parameter of p(s|n). With an improper uniform prior that’s n+1=2; with a Jeffreys prior it’s n+1/2=3/2.

If I stick with the improper uniform prior for p(s), the corresponding alpha from the Wald test is 0.157. So despite the fact that getting even 1 count from b alone is incredibly unlikely, this test recommends I “conclude nothing” rather than reject the null (r=b). This seems bizarre to me. The most likely reason this result makes no intuitive sense is that I am not applying the test correctly.
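A short numeric sketch of that tension (the background expectation b·t = 1e-6 below is a hypothetical value chosen to reproduce the quoted p-value, not a number from the original problem):

```python
import math

# Hypothetical background expectation, chosen so that P(n >= 1 | r = b)
# matches the quoted p-value of about 1e-6.
bt = 1e-6
p_null = 1.0 - math.exp(-bt)  # frequentist p-value for seeing any count at all

# Bayes-Wald statistic for n = 1 under the improper uniform prior on s:
# the posterior for s is Gamma(n + 1), so E[s|n]^2 / Var[s|n] = n + 1 = 2.
n = 1
w = n + 1
alpha = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > 2) = 0.157...

# Tiny p-value against r = b, yet the Bayes-Wald criterion stays well
# above 0.05 and says "conclude nothing".
print(p_null, alpha)
```

The two numbers answer different questions: p_null asks how surprising any count is under the background alone, while the Bayes-Wald statistic asks whether the posterior for s is far from zero relative to its own spread.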

---

I would have preferred an explicit qualification of “either reports some scientific findings [now, given current awareness/evaluation of all relevant studies] — or gives no firm conclusions [for now].”

Now your later statement “if the inferential goal is a comprehensive summary of what is known, then reporting ‘nothing’ is inappropriate” makes perfect sense to me, but do you mean to separate the evaluation of evidence (comprehensive summary of what is known) completely from any “underlying decision [is] whether a report on some scientific quantity is merited, or not”?

Also, I would have _predefined_ the most appropriate loss function for statistical testing as one that only involves getting the sign correct or incorrect (as I think Tukey argued, we just want an indication of the direction of effect); you refer to that loss function in your slides.

Interesting that the much more ambitious goal of getting an accurate estimate specifies the loss function that leads to Bayesian analogs of the Wald test…

---

Using the same loss function but with a prior that has a “spike” at the null, i.e. what one might use when a point null is reasonable, the Bayes rule still relies on only the posterior mean and variance. So that’s not disconcerting, I think, unless for some reason one wants to insist that Bayesian tests only use the posterior probability of the null or the Bayes Factor.

But the test’s general large-sample agreement with default frequentist methods does go away when the prior has a “spike”. This won’t be disconcerting if interest lies only in the test’s Bayesian properties, and it’s also not disconcerting if only the test’s frequentist properties are key – though it’ll take more work than usual to figure out what those properties are.

However, getting disagreement between Bayesian and default frequentist methods is (by definition) disconcerting if one thinks these analyses should agree – and I think the fact that we all still call the Jeffreys-Lindley paradox a “paradox” suggests that lots of people do expect this agreement. So if you’re disconcerted by it, you’re not alone, but thinking carefully about what the various methods assume and the conclusions they draw should help unravel the issue.

---

While much is made of the problem of poorly defended point nulls, do the conclusions change if my null is far more defensible? From the argument within the paper, I don’t think they do. Unsure whether that should be disconcerting or not.
