Beyond Power Calculations: Some questions, some answers

Posted on August 28, 2019 9:00 AM by Andrew

Brian Bucher (who describes himself as “just an engineer, not a statistician”) writes:

I’ve read your paper with John Carlin, Beyond Power Calculations. Would you happen to know of instances in the published or unpublished literature that implement this type of design analysis, especially using your retrodesign() function [here’s an updated version from Andy Timm], so I could see more examples of it in action? Would you be up for creating a blog post on the topic, sort of a “The use of this tool in the wild” type thing?

I [Bucher] found this from Clay Ford and this from Shravan Vasishth and plan on working my way through them, but it would be great to have even more examples.

I promised to write such a post asking for more examples—and here it is! So feel free to send some in. I have a couple examples in section 2 of this paper.

After I told Bucher the post was coming, he threw in another question:

I’d also be curious whether you would apply this methodology in cases where there was technically no statistical significance. I’m thinking primarily of these two cases:

(a) There was no alpha value chosen before the study, and the authors weren’t testing a p-value against an alpha, but just reporting a p-value (such as 0.06) and deciding that it was sufficiently small to conclude that there was likely an effect and worth further experimentation/investigation. (Fisher-ian?)

(b) There was an alpha value chosen (0.05), and the t-test didn’t reject the null because the p-value was 0.08. However, in addition to the frequentist analysis, the authors generated a Bayes factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups, and, therefore, conclude a difference in groups.

Letter (a) is a decent description of the type of analyses that I often do (mostly DOEs), since I don’t use alpha-thresholds unless required by a third party.

Letter (b) is (basically) something from a paper that I’m analyzing, and it would be great if I could estimate the Type-S/M errors without violating any statistical laws.

I have my fingers crossed, because in your Beyond Power Calculations paper you do say,

If the result is not statistically significant, the chance of the estimate having the wrong sign is 49% (not shown in the Appendix; this is the probability of a Type S error conditional on nonsignificance)—so that the direction of the estimate gives almost no information on the sign of the true effect.

…so I do have hope that the methods are generally applicable to nonsignificant results as well.

Full disclosure, I [Bucher] posted a version of this question to stackexchange but have not (yet) received any comments.

My reply:

We were thinking of type M and type S errors as frequency properties. The idea is that you define a statistical procedure and then work out its average properties over repeated use. So far, we’ve mostly thought about the procedure which is “do an analysis and report it if it’s ‘statistically significant’”. In my original paper with Tuerlinckx on type M and type S errors (full text here), we talked about the frequency properties of “claims with confidence.”
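
A minimal sketch of such a frequency calculation, in the spirit of retrodesign() but not a copy of it. The function name design_sketch, its arguments, and the example inputs (true effect 0.3, standard error 3.3) are assumptions made for illustration, and the estimate is assumed to be normally distributed around the true effect. It returns the power, the type S error rate conditional on significance and conditional on nonsignificance, and a simulated exaggeration ratio (type M error):

```r
# Hypothetical helper: frequency properties of "report it if significant"
# for a true effect A > 0 estimated with standard error s.
design_sketch <- function(A, s, alpha = 0.05, n_sims = 1e5, seed = 1) {
  z <- qnorm(1 - alpha / 2)        # two-sided critical value
  p_hi <- 1 - pnorm(z - A / s)     # P(significant with the correct sign)
  p_lo <- pnorm(-z - A / s)        # P(significant with the wrong sign)
  power <- p_hi + p_lo
  p_neg <- pnorm(-A / s)           # P(estimate < 0), significant or not
  set.seed(seed)                   # fixes the simulation for reproducibility
  est <- rnorm(n_sims, mean = A, sd = s)
  sig <- abs(est) > z * s
  list(
    power = power,
    type_s_sig = p_lo / power,                     # wrong sign given significance
    type_s_nonsig = (p_neg - p_lo) / (1 - power),  # wrong sign given nonsignificance
    exaggeration = mean(abs(est[sig])) / A         # simulated type M ratio
  )
}

res <- design_sketch(A = 0.3, s = 3.3)
print(res)
```

All four numbers are averages over hypothetical repeated use of the procedure under an assumed true effect; none of them is an inference about one particular study.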

In your case it seems that you want inference about a particular effect size given available information, and I think you’d be best off just attacking the problem Bayesianly. Write down a reasonable prior distribution for your effect size and then go from there. Sure, there’s a challenge here in having to specify a prior, but that’s the price you have to pay: Without a prior, you can’t do much in the way of inference when your data are noisy.

49 thoughts on “Beyond Power Calculations: Some questions, some answers”

StanFloyd on August 28, 2019 11:01 AM at 11:01 am said:

Here are two examples:

https://surveyinsights.org/?p=8708

https://www.jmir.org/2017/11/e397/

Michael Nelson on August 28, 2019 2:31 PM at 2:31 pm said:

I read through the second example given by StanFloyd and I wonder if the authors misapplied the procedure. They report Type M and S errors twice, once for the literature-based expectation (delta = .37, Type M = 1.56) and for the ES observed in their study (d = .26, Type M = 2.13). Is this appropriate? It seems like the wrong way to use Type M error. Instead, they should have said something like “If our a priori delta = .37 is correct, we would expect to observe an effect of about .37/1.56 = .24, which is very close to what we actually observed.” Without a statement like that, and with them having computed two values for Type M, it won’t be clear which Type M value to use when interpreting their results. Or maybe the authors are implying that the “true” Type M rate is between 1.5 and 2, but they never say that. Actually, the only rationale they state for conducting the procedure is “Gelman and Carlin suggest” it. I mean, that’s good enough reason for me… :)

- Mark Webster on August 29, 2019 6:11 AM at 6:11 am said:
  
  As far as I can tell they don’t calculate errors for the observed interaction effect size, which is 0.7994. d = 0.26 is the lower bound of the confidence interval for the effect from the literature, and they also use the high-end value of d = 0.48. So they’ve used several possible effect sizes given previous literature, which seems OK. I’m guessing that their conclusion is based on all three cases of d * (type M) being smaller than their observed effect size, and the type S errors being small enough to make this simple scaling reasonable. I’m not sure that that’s how Gelman and Carlin intended the errors to be used, but it doesn’t look like an awful approach, if all you’re checking is whether your findings agree with previous literature re: significance, rather than the effect size itself.
  
Anoneuoid on August 28, 2019 3:00 PM at 3:00 pm said:

> the authors generated a Bayes factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups

Say the population is modeled as normally distributed with sd = 1, with all means equally likely a priori.

We call the model with mean = 0 H0, and all other possible models H1. Then, to get the Bayes factor, we calculate the likelihood of the data under H0 and the sum of the likelihoods under every possible mean besides 0 for H1. If we integrated over all possible means besides zero, shouldn’t the answer be infinity?

In R:

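set.seed(1234)
x = rnorm(10)                        # 10 draws from N(0, 1)
pDH0 = prod(dnorm(x, 0, 1))          # likelihood of the data under H0: mu = 0
delta = 10^-(0:5)                    # grid spacings, from 1 down to 1e-5

bf = data.frame(delta = delta, bf = NA)
for(i in seq_along(delta)){
  mu = seq(-10, 10, by = delta[i])   # grid of candidate means
  mu = mu[mu != 0]                   # drop mu = 0, which belongs to H0
  # sum of the likelihoods over the H1 grid
  pDH1 = sum(sapply(mu, function(m) prod(dnorm(x, m, 1))))

  bf$bf[i] = pDH1/pDH0
}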

Results approach infinity as smaller intervals are used:

> bf
delta bf
1e+00 3.110063e-01
1e-01 1.551517e+01
1e-02 1.641517e+02
1e-03 1.650517e+03
1e-04 1.651417e+04
1e-05 1.651507e+05

- Mark Webster on August 29, 2019 5:32 AM at 5:32 am said:
  
  You would need to take the mean of the H1 likelihoods, not the sum: the choice of delta affects the prior’s normalising constant.
  
  - Anoneuoid on August 29, 2019 8:16 AM at 8:16 am said:
    
    Thanks, but I don’t see why. If I want a composite hypothesis that mu is either 9 or 10, I would use this denominator: P(D|H10) + P(D|H9). If we are using a flat prior this constant should be in the numerator p(D|H0), and cancel out, right?
    
    Also, this is what I think of when you say “normalizing constant”: https://en.m.wikipedia.org/wiki/Normalizing_constant
    
    But I would never call that the prior’s normalizing constant. I have never heard of a prior itself having one. Do you mean something else?
    
    Basically I am just looking for the derivation of the correct way to handle this. When I looked it up I came across some pretty odd stuff about needing to use Cauchy priors for some reason, which looked like a fudge to me. And intuitively I would think that the probability that the mean is exactly zero should approach zero as we include more and more very similar alternative possibilities like 1e-6, -1.2e-5, etc.
    
    The Bayes factor in this case is essentially the posterior p(H0|D) with all priors cancelled out. The denominator is only slightly less because it is missing p(D|H0), but if we sum enough different terms of similar magnitude the loss of one should be negligible.
    
    - Mark Webster on August 29, 2019 10:00 AM at 10:00 am said:
      
      The prior has a normalising constant, because it’s a probability distribution. In any case, I don’t mean the hypothesis prior. I mean the prior distribution for mu, conditional on H1. Since you’ve set this to be uniform, there’s a normalising constant proportional to delta. More details below.
      
      You’re comparing the null likelihood L(x | H0) to the likelihood for H1, L(x | H1) = E(L(x | H1, mu)), where the expectation is over values of mu. This is equal to the sum of L(x | H1, mu) over values of mu, times the probability of that mu, given H1, i.e. sum_{mu} L(x | H1, mu) p(mu | H1). There are 20/delta possible values of mu, and the distribution is uniform, so p(mu | H1) = delta/20 for each possible mu. So the H1 likelihood is
      
      L(x | H1) = E(L(x | H1, mu)) = sum_{mu: p(mu | H1) > 0} L(x | H1, mu) * delta/20.
      
      There’s now an extra multiplicative factor proportional to delta, which should stop the likelihood going to infinity.
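      
      A sketch of this weighting, assuming the same simulated data and the same grid on [-10, 10] as the code earlier in the thread. The function name bf_weighted is hypothetical, and the grid stops at delta = 1e-3 only to keep the run short. Each nonzero grid point gets prior mass 1/(number of grid points), which equals delta/20:
      
      ```r
      # Same simulated data as the earlier snippet in this thread
      set.seed(1234)
      x <- rnorm(10)
      
      lik <- function(m) prod(dnorm(x, mean = m, sd = 1))  # L(x | mu = m)
      
      # Hypothetical helper: Bayes factor H1/H0 with a discrete uniform prior
      # on the nonzero grid points in [-10, 10] with spacing delta
      bf_weighted <- function(delta) {
        n <- round(10 / delta)
        mu <- setdiff(-n:n, 0L) * delta   # grid under H1, excluding mu = 0
        prior <- 1 / length(mu)           # equals delta / 20
        sum(sapply(mu, lik) * prior) / lik(0)
      }
      
      deltas <- 10^-(0:3)
      bf <- sapply(deltas, bf_weighted)
      print(data.frame(delta = deltas, bf = bf))
      ```
      
      With the weights in place, the ratio should level off as delta shrinks (near 0.08, which is the unweighted sums reported above times delta/20) instead of growing without bound.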
    - Anoneuoid on August 29, 2019 10:07 AM at 10:07 am said:
      
      Doesn’t that delta/20 also need to be in the numerator, or else the prior is not uniform?
    - Anoneuoid on August 29, 2019 10:48 AM at 10:48 am said:
      
      > I don’t mean the hypothesis prior. I mean the prior distribution for mu, conditional on H1.
      
      In this case the value of mu is the hypothesis. This is what I calculated:
      
      Starting with Bayes rule for p(H_0|data):
      
      Numerator:
      p(H_0)*p(data|H_0)
      
      Denominator:
      p(data) = p(H_0)*p(data|H_0) + p(H_1)*p(data|H_1) + … + p(H_n)*p(data|H_n)
      
      Then we use a uniform prior, so:
      
      p(H0) = p(H_1) = … = p(H_n)
      
      So the priors all cancel:
      
      Numerator:
      p(data|H_0)
      
      Denominator:
      p(data|H_0) + p(data|H_1) + … + p(data|H_n)
      
      Then call this the composite hypothesis:
      
      p(data|H_c) = p(data|H_1) + … + p(data|H_n)
      
      So we can write:
      
      Numerator:
      p(data|H_0)
      
      Denominator:
      p(data|H_0) + p(data|H_c)
      
      Then the posteriors for the two hypotheses (of “no difference” and “some difference”) are:
      
      p(H_0|data) = p(data|H_0)/[ p(data|H_0) + p(data|H_c) ]
      p(H_c|data) = p(data|H_c)/[ p(data|H_0) + p(data|H_c) ]
      
      The “normalizing constant” in the denominator of each cancels if we take the ratio to get the Bayes factor (actually the reciprocal of what was used originally):
      
      p(H_0|data)/p(H_c|data) = p(data|H_0)/p(data|H_c)
      
      Substituting back in for p(data|H_c):
      
      BF = p(data|H_0)/[p(data|H_1) + … + p(data|H_n)]
      
      Here what I call H_c was called H1 in the code, but I think it is clearer this way…
    - Mark Webster on August 29, 2019 11:20 AM at 11:20 am said:
      
      OK, I think we disagree here:
      
      > Then call this the composite hypothesis:
      > > p(data | H_c) = p(data | H_1) + … + p(data | H_n)
      
      I think this should be a mean, not a sum. For example, if you had a situation where p(data | H_k) = some constant b for all k, the above would give p(data | H_c) > 1 for n > 1/b, instead of p(data | H_c) = b.
      
      Taking the mean here would mean multiplying each value of bf in your original results by delta/20, which still gives you an increasing series, just not one that increases so dramatically.
    - Anoneuoid on August 29, 2019 1:48 PM at 1:48 pm said:
      
      > I think this should be a mean, not a sum.
      
      But the sum was already there from the very first step. All I did is aggregate it into a single term for the step you pointed out.
      
      The dividing by n you want to do is already incorporated into the uniform prior, which equals 1/n for all “hypotheses” (including H_0) and so cancels out:
      
      p(H_0) = p(H_1) = … = p(H_n) = 1/n
      
      > For example, if you had a situation where p(data | H_k) = some constant b for all k
      
      If I understand this example correctly… then “p(data | H_k) = some constant b for all k” is impossible. How is the likelihood going to be exactly the same for two normal distributions with different means?
      
      But I think that isn’t so important. It is the cancelled priors that are causing the confusion. The denominator is:
      
      p(data) = p(H_0)*p(data|H_0) + p(H_1)*p(data|H_1) + … + p(H_n)*p(data|H_n)
      
      When all the priors are equal to 1/n:
      
      p(data) = (1/n)*[ p(data|H_0) + p(data|H_1) + … + p(data|H_n) ]
      p(data)*n = p(data|H_0) + p(data|H_1) + … + p(data|H_n)
      
      Then subtract the term for H_0 from both sides to get the composite hypothesis:
      
      p(data|H_c) = p(data)*n – p(data|H_0)
      
      I could be wrong, but this gives the answer I find intuitively correct as well (see the discussion below with Daniel Lakeland).
    - Mark Webster on August 29, 2019 4:50 PM at 4:50 pm said:
      
      The prior probability for H_c is (n-1)/n, right, so
      
      p(data) = p(data | H_0) * 1/n + p(data | H_c) * (n-1)/n,
      
      or, to match your version,
      
      p(data | H_c) = p(data) * n/(n-1) – p(data | H_0) * 1/(n-1).
      
      We also know that
      
      p(data) = p(data | H_0) * 1/n + p(data | H_1) * 1/n + … + p(data | H_n) * 1/n,
      
      and therefore
      
      p(data | H_c) = p(data | H_1) * 1/(n-1) + … + p(data | H_n) * 1/(n-1).
      
      The example with all the probabilities being b isn’t meant to be an example for the normal distribution case, it’s an example because the normality is irrelevant to the composition of likelihoods.
      
      Also, Lakeland’s not saying anything that relates to Bayes factors, he’s talking about the probability of any one parameter value being zero in a continuous distribution. That applies to the prior, before we have any data. The largest Bayes factor would occur if your alternative hypothesis is that the mean is equal to xbar, but the factor would still be finite.
    - Mark Webster on August 29, 2019 5:00 PM at 5:00 pm said:
      
      Trying a simpler argument. Suppose you have no data. Then p(data | H_k) = 1, for any mean parameter k. Your algorithm would then claim that the Bayes factor against any single parameter value – not just zero – would tend to infinity as you approach a continuous prior. In other words, it would conclude that the empty dataset gives overwhelming evidence against any parameter value, in addition to the information in the prior. Does this seem reasonable?
    - Anoneuoid on August 29, 2019 5:18 PM at 5:18 pm said:
      
      > the normality is irrelevant to the composition of likelihoods.
      
      How so? Eg, prod(dnorm(x, 1, 1)) is different than prod(dnorm(x, 1.1, 1)) right? So how can p(data|H1) = p(data|H1.1) = a constant?
      
      > The largest Bayes factor would occur if your alternative hypothesis is that the mean is equal to xbar, but the factor would still be finite.
      
      This is a totally different situation. It isn’t a point hypothesis vs. a composite hypothesis of everything else.
    - Anoneuoid on August 29, 2019 5:22 PM at 5:22 pm said:
      
      > In other words, it would conclude that the empty dataset gives overwhelming evidence against any parameter value, in addition to the information in the prior. Does this seem reasonable?
      
      Yes, this is exactly what my intuition says should be the case.
    - Mark Webster on August 29, 2019 5:25 PM at 5:25 pm said:
      
      You think an empty dataset can give overwhelming evidence against a hypothesis, instead of no evidence at all?
    - Anoneuoid on August 29, 2019 5:38 PM at 5:38 pm said:
      
      Yes, if the hypothesis is a priori false given the assumptions being used, like hypothesizing 1=2. If 1=2, Bayes’ theorem would look different or not exist.
    - Mark Webster on August 29, 2019 5:43 PM at 5:43 pm said:
      
      But a Bayes factor doesn’t count the information from the prior, it’s the ratio between the prior and posterior odds i.e. the effect of the data only. If there’s no data, it should be equal to one.
    - Anoneuoid on August 29, 2019 5:53 PM at 5:53 pm said:
      
      To derive the Bayes factor you need to make certain assumptions; I figure somewhere in there it must imply this result.
    - Mark Webster on August 29, 2019 5:58 PM at 5:58 pm said:
      
      If the tested value is impossible, then sure, the Bayes factor is undefined. But mu = 0 isn’t impossible a priori, it just has a probability of zero, as do all the other possible values.
    - Anoneuoid on August 29, 2019 6:32 PM at 6:32 pm said:
      
      I’m not clear on the distinction you are trying to make between a set of assumptions leading to an outcome having zero probability vs “impossible”.
    - Daniel Lakeland on August 29, 2019 6:51 PM at 6:51 pm said:
      
      > I’m not clear on the distinction you are trying to make between a set of assumptions leading to an outcome having zero probability vs “impossible”.
      
      If there’s a set of appreciable size containing the given value whose total probability is zero, then this is a stronger notion than just “you won’t predict this one particular value”.
      
      For example: p(x) = {0 if x < 1; proportional to normal(0,1) if x >= 1}
      
      The entire infinite interval x < 1 has zero probability, so not only does, say, 0 have zero probability, but so does any region around 0 of half-width less than 1.
    - Mark Webster on August 30, 2019 10:43 AM at 10:43 am said:
      
      OK, forget the case where delta goes to zero. Just take delta as fixed, and say we have no data. Then mu = 0 has positive probability, and p(data | H_0) is defined as 1. The Bayes factor is then p(data | H_c) / p(data | H_0). But for the other point hypotheses, H_k for k != 0, we also have p(data | H_k) = 1. What should p(data | H_c) be?
    - Anoneuoid on August 30, 2019 11:25 AM at 11:25 am said:
      
      > Just take delta as fixed, and say we have no data. Then mu = 0 has positive probability, and p(data | H_0) is defined as 1.
      
      I’m not following this at all, unfortunately. It sounds like you are saying to consider the case where mu must be an integer (for example). Why would p(data|H_0) = 1 if we have no data?
      
      We were able to consider no data in the continuous case because p(data|H_0) necessarily became negligibly small relative to the sum of many hypotheses with very similar likelihoods. There is nothing like that going on here, so p(data|H_0) would be undefined if we had no data.
    - Daniel Lakeland on August 30, 2019 12:35 PM at 12:35 pm said:
      
      p(data | anything) is a function of the data value prior to observing data. It’s not a number. You can say however that:
      
      integrate(p(data|anything) ddata) = 1 for the integral over all possible data values.
    - Mark Webster on August 30, 2019 4:35 PM at 4:35 pm said:
      
      This is posterior to observing the data, but the data’s length is zero.
    - Daniel Lakeland on August 30, 2019 5:21 PM at 5:21 pm said:
      
      Exactly, after observing no data, the probability distribution over the parameters is the prior, and the probability over any new data point is the prior predictive, which as a density is a function of a free variable, namely whatever value for the data you want to plug in.
    - Mark Webster on August 30, 2019 5:35 PM at 5:35 pm said:
      
      Yes. My somewhat roundabout point is that zero-length data here means the Bayes factor would be one. That requires p(data | H_c) = 1, which it won’t be if you set it to be the sum of the likelihoods for all the sub-hypotheses H_k, since those are also all equal to one.
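      
      A small check of this zero-length-data argument, as a sketch under assumptions: the grid of integer means from -10 to 10 is chosen only for illustration, and it relies on the fact that R returns 1 for the product of an empty vector, so every point likelihood is 1.
      
      ```r
      x <- numeric(0)                                      # zero-length data
      lik <- function(m) prod(dnorm(x, mean = m, sd = 1))  # empty product is 1
      
      mu <- setdiff(-10:10, 0L)                  # fixed grid under H_c, delta = 1
      lik_h0 <- lik(0)
      
      bf_sum  <- sum(sapply(mu, lik)) / lik_h0   # composite likelihood as a sum
      bf_mean <- mean(sapply(mu, lik)) / lik_h0  # composite likelihood as a mean
      
      print(c(sum = bf_sum, mean = bf_mean))
      ```
      
      The sum version returns the number of grid points (20 here), while the mean version returns 1, which is the value a Bayes factor should take when the data carry no information.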
- Daniel Lakeland on August 29, 2019 10:32 AM at 10:32 am said:
  
  There’s no such thing as a probability distribution that’s uniform over the whole real line.
  
  And yes, if you have a continuum of possibilities for mu, then the probability that any one of them will be the correct one exactly goes to zero, just as if you have x ~ Normal(0,1) then the probability that x = 0 exactly is zero, as is true for any other exact value of x. In order to have a well defined probability you have to integrate over some interval, so the probability that x is in [-0.00001, 0.0001] is a nonzero number for example, but it’s related to the width of the interval.
  
  - Anoneuoid on August 29, 2019 11:02 AM at 11:02 am said:
    
    > then the probability that any one of them will be the correct one exactly goes to zero
    
    Yes, this is my intuition. That is why I believe the correct calculation of a Bayes factor in this case should result in infinity regardless of the data.
    
    Picking an interval that “may as well be zero for all practical purposes” could work.* I don’t use Bayes factors… just thought the original quote seemed off:
    
    > the authors generated a Bayes factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups
    
    * I’ll take your word for it for now, but it seems like you would be able to inflate your Bayes factor by using a less precise interval…
    
    - Daniel Lakeland on August 29, 2019 11:37 AM at 11:37 am said:
      
      > the authors generated a Bayes factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups
      
      Yes, if you have some discrete hypotheses, like “the value is 0” or “the value is 1” or “the value is -1” then you can generate a bayes factor for “the value is not 0”
      
      but if you have a continuous probability density then “the value is not zero” has probability 1 a-priori regardless of any data…
    - Anoneuoid on August 29, 2019 2:15 PM at 2:15 pm said:
      
      Can you shed some light on the difference between that and what they do here: https://statswithr.github.io/book/hypothesis-testing-with-normal-populations.html
    - Daniel Lakeland on August 29, 2019 4:51 PM at 4:51 pm said:
      
      Yes, they have two discrete models they’re checking, the model where mu is exactly m0, compared to the model where mu is unknown and has prior normal(m0,sigma^2/n0)
      
      I think the calculation you’re trying to do is comparing two possible subsets of mu under a *single* prior/model, but one subset is infinitesimally small. In other words, using nonstandard analysis where dmu is actually an infinitesimal number:
      
      p(data | mu=0) p(mu=0) dmu / sum(p(data | mu = m) p(mu=m) dmu, for m values from -M to +M step by dmu, with m not equal to 0)
      
      You’ll notice that the numerator of this quantity is a limited number p(data|mu=0) p(mu=0) multiplied by an infinitesimal number dmu, so the numerator is infinitesimal.
      
      The denominator, on the other hand, is an integral which is infinitesimally close to 1.
      
      The result will have to be infinitesimal, and the closest standard number to an infinitesimal number is the number 0.
    - Daniel Lakeland on August 29, 2019 4:54 PM at 4:54 pm said:
      
      Thinking along these nonstandard analysis lines, you can compare the calculation you linked to… in that calculation you can define two priors p1 and p2
      
      p1 = 1/dmu for values between -dmu/2 and dmu/2, that is an infinitely high spike of width dmu around 0….
      
      p2 = the normal distribution function normal(0,sigma^2/n0)
      
      then the numerator for their calculation is:
      
      p(data | mu=0) p1(mu=0) dmu = p(data | mu=0) * 1/dmu * dmu = p(data | mu=0)
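      
      For the two-model comparison in this sub-thread (mu exactly m0 versus mu ~ normal(m0, sigma^2/n0)), the Bayes factor has a closed form when sigma is known, because the sample mean is normally distributed under both models. A sketch under that known-sigma assumption; the function name bf_point_null and the example inputs are made up for illustration, and this is not the code from the linked book:
      
      ```r
      # Hypothetical helper: Bayes factor for H1 (mu ~ normal(m0, sigma^2 / n0))
      # against H0 (mu = m0), given the mean of n observations with known sigma
      bf_point_null <- function(xbar, n, m0 = 0, sigma = 1, n0 = 1) {
        m_h0 <- dnorm(xbar, mean = m0, sd = sigma / sqrt(n))               # xbar | H0
        m_h1 <- dnorm(xbar, mean = m0, sd = sigma * sqrt(1 / n + 1 / n0))  # xbar | H1
        m_h1 / m_h0
      }
      
      bf_at_null <- bf_point_null(xbar = 0, n = 10)  # sample mean at m0
      bf_far     <- bf_point_null(xbar = 1, n = 10)  # sample mean far from m0
      print(c(at_null = bf_at_null, far = bf_far))
      ```
      
      Because the point null is its own model with its own prior mass, the ratio stays finite: below 1 when the sample mean sits at m0, above 1 when it is far away.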
Nick Adams on August 28, 2019 5:08 PM at 5:08 pm said:

He’s right about type S errors being Fisherian.
A P-value against a null hypothesis of theta less than zero is an indirect measure (via modus tollens) of the risk of a type S error.

Ron Kenett on August 28, 2019 8:01 PM at 8:01 pm said:

I believe more credit needs to be given to S-type errors than is actually given. A clinical researcher is able to state a claim such as: “applying treatment A reduces the effect of B”. He is concerned about being wrong in that, in fact, A increases the effect of B. To consider this framework of presenting claims he can state his claims using meaning equivalence alternatives and also, what he is not claiming, with surface similarity alternatives. The S type error controls for meaning equivalence alternative statements that are wrong. The nice thing is that it involves the whole study design in that it “bootstraps” the S-type error. The even nicer thing is that clinical researchers can properly interpret such errors. In fact, and as stated by Gelman, it is about making “claims with confidence”. For an example see https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3035070.

- Anoneuoid on August 28, 2019 8:54 PM at 8:54 pm said:
  
  > A scientific experiment aims at identifying significant effects in a non-biased manner with maximum precision.
  
  No, it doesn’t… All “effects” will be significant if you keep increasing the sample size and don’t keep lowering the significance threshold to compensate.
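  
  A sketch of that sample-size argument, assuming a true standardized effect that is tiny but not exactly zero (0.01 here, chosen arbitrarily), a known standard deviation of 1, and a two-sided z-test at a fixed 5% threshold:
  
  ```r
  effect <- 0.01          # assumed tiny true standardized effect
  n <- 10^(2:6)           # increasing sample sizes
  z_crit <- qnorm(0.975)  # fixed two-sided 5% threshold
  
  # Power of a two-sided one-sample z-test with known sd = 1
  power <- pnorm(effect * sqrt(n) - z_crit) + pnorm(-effect * sqrt(n) - z_crit)
  print(data.frame(n = n, power = round(power, 3)))
  ```
  
  With the threshold held fixed, the probability of a significant result climbs from close to the 5% level toward 1 as n grows.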
  
  - Anonymous on August 29, 2019 3:50 PM at 3:50 pm said:
    
    Wow. Looking into the thread above I came across this:
    
    > In fact, if the critical value increases with the sample size suitably fast, then the disagreement between the frequentist and Bayesian approaches becomes negligible as the sample size increases.
    
    https://en.wikipedia.org/wiki/Lindley%27s_paradox
    
Andrew Timm on August 29, 2019 9:26 AM at 9:26 am said:

I’ve gotten an email or two from people who use the package in production with feature requests or questions; I can see if they’d want to share. I think one was in pharma and the other in manufacturing.

Both were situations where slightly exaggerated treatment effects could be a real problem. One mentioned a sort of downstream problem where a small exaggeration at step A snowballed into a much more significant problem down the road.

Anoneuoid on August 30, 2019 3:01 PM at 3:01 pm said:

Continuing from above:

> p(data | anything) is a function of the data value prior to observing data. It’s not a number. You can say however that:
>
> integrate(p(data|anything) ddata) = 1 for the integral over all possible data values.

I’m having trouble following this thread now. Can you clarify what you are responding to?

I think I laid out my reasoning which is based on a few principles of probability and basic algebra pretty clearly… it isn’t clear to me where (if anywhere) Daniel Lakeland and Mark Webster think I made an error.

I see that Mark Webster thinks I should be normalizing to the number of possible values, but this seems to be based more on his dislike of the implications of not doing so. I do not see how this could justifiably be incorporated into my calculations.

- Carlos Ungil on August 30, 2019 3:38 PM at 3:38 pm said:
  
  I think that when “the authors generated a Bayes factor of 2.0 and claimed this showed that a difference between the two groups was twice as likely as having no difference between groups” they were probably putting a mass of probability at mu=0. See for example https://www.ncbi.nlm.nih.gov/pubmed/29441460
  
  - Anoneuoid on August 30, 2019 8:09 PM at 8:09 pm said:
    
    I don’t think it really matters what prior you use, there will still be infinitely many very similar likelihoods to the one with mu = 0. In the continuous case the Bayes factor (“some difference” over “exactly zero difference”) should be infinite. So whatever they are calculating must be something else (perhaps they use an interval around zero).
    
    - Nick Adams on August 30, 2019 8:57 PM at 8:57 pm said:
      
      There are at least 6 different ways to calculate a Bayes factor (see Held and Ott “On P-values and Bayes factors”).
      I presume they are using BF = -e p log p, whose reciprocal is about 1.8 (roughly 2.0) when the p-value is 0.08.
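      
      The arithmetic behind that guess, as a sketch: -e p log p is usually described as a lower bound on the Bayes factor in favor of the null (for p < 1/e), so it is its reciprocal that would be quoted as evidence for a difference. The variable names are illustrative only.
      
      ```r
      p <- 0.08
      min_bf01 <- -exp(1) * p * log(p)  # bound on the Bayes factor for H0 over H1
      max_bf10 <- 1 / min_bf01          # corresponding bound in favor of H1
      print(c(min_bf01 = min_bf01, max_bf10 = max_bf10))
      ```
      
      This gives roughly 0.55 for the bound and roughly 1.8 for its reciprocal, which is in the neighborhood of the reported 2.0 but not equal to it.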
    - Anoneuoid on August 31, 2019 9:00 AM at 9:00 am said:
      
      I followed that to Edwards 1963, where it says:
      
      The example specified by the last two paragraphs has a sharp null hypothesis
      and a rather diffuse symmetric alternative hypothesis with good reasons
      for associating substantial prior probability with each. Although realistically
      the null hypothesis cannot be infinitely sharp, calculating as though it were is
      an excellent approximation. Realism, and even mathematical consistency, demands
      far more sternly that the alternative hypothesis not be utterly diffuse
      (that is, uniform from −∞ to +∞); otherwise, no measurement of the kind
      contemplated could result in any opinion other than certainty that the null
      hypothesis is correct.
      
      Edwards W, Lindman H, Savage LJ. 1963. Bayesian statistical inference for psychological research.
      Psychol. Rev. 70:193–242
      
      So it looks like I am correct. It is based on that “spike and slab” concept which is a fudge someone came up with because they disagreed with what Bayes rule was telling them: it is a waste of time to compare one exact prediction vs “anything else” (ie, NHST).
      
      Instead, they should be comparing the precise predictions derived from multiple explanations people have come up with, along with the associated measurement and theoretical uncertainties. If you do that, you won’t have these types of problems.
    - Carlos Ungil on August 31, 2019 3:28 AM at 3:28 am said:
      
      > In the continuous case the Bayes factor (“some difference” over “exactly zero difference”) should be infinite.
      
      If by “continuous case” you mean that the probability of mu being exactly zero (arbitrarily close to zero) is zero (infinitesimal) that’s precisely what I suggested that is NOT being assumed in the calculation of that Bayes factor.
      
      By the way, I didn’t find the paper referenced by Nick Adams but I found this one from the same authors: https://www.zora.uzh.ch/id/eprint/135381/1/final.pdf
    - Anoneuoid on August 31, 2019 8:33 AM at 8:33 am said:
      
      > If by “continuous case” you mean that the probability of mu being exactly zero (arbitrarily close to zero) is zero (infinitesimal) that’s precisely what I suggested that is NOT being assumed in the calculation of that Bayes factor
      
      Yes, it sounds like they are making some sort of contradictory assumption. Ie, that Mu is a continuous variable and also not a continuous variable at the same time. Or that mu is continuous everywhere except at exactly zero. Is there an example of something like this existing in nature?
      
      This sounds like a mathematical fantasy/fudge someone came up with to justify doing something that Bayes rule was telling them they shouldn’t be doing (checking for exactly zero difference between groups).
    - Carlos Ungil on August 31, 2019 12:10 PM at 12:10 pm said:
      
      > some sort of contradictory assumption. Ie, that Mu is a continuous variable and also not a continuous variable at the same time
      
      That’s called a mixture. Look it up.
      
      For an example of something like this existing in nature, you may appreciate this one from Haldane:
      
      “An illustration from genetics will make the point clear. The plant Primula sinensis possesses twelve pairs of chromosomes of approximately equal size. A pair of genes selected at random will lie on different chromosomes in 11/12 of all cases, giving a proportion x = .5 of “cross-overs.” In 1/12 of all cases, they lie on the same chromosome, the values of the cross-over ratio x ranging from 0 to .5 without any very marked preference for any part of this range, except perhaps for a tendency to avoid values very close to .5.”
      
      https://projecteuclid.org/download/pdfview_1/euclid.ss/1494489818
    - Anoneuoid on August 31, 2019 1:12 PM at 1:12 pm said:
      
      > A pair of genes selected at random will lie on different chromosomes in 11/12 of all cases, giving a proportion x = .5 of “cross-overs.”
      
      This assumes Mendel’s second law always holds, which does not appear to be the case, eg:
      
      > Because asymmetrical meiotic division is an almost universal characteristic of female meiosis and functional asymmetry of the spindle poles also appears to be a general feature, the frequency with which nonrandom segregation may take place is dependent on the frequency of functional heterozygosity at loci controlling interaction with the spindle (i.e., fulfillment of the third condition required for nonrandom segregation). Although the overall frequency of functional heterozygosity at loci interacting with the spindle (of which the most obvious are the centromeres of each and every chromosome) is unknown, minimum estimates may be derived in some systems. For example, nonhomologous Rob translocations are the most common chromosome abnormality in humans. These are observed with a frequency of approximately 1 per 1,000 meioses (Hamerton et al. 1975) and all Rob translocations appear subject to nonrandom segregation during female meiosis (see below and Pardo-Manuel de Villena and Sapienza 2001). One might argue that 0.1% functional heterozygosity is not “important” but one must remember that chromosome rearrangements are not the only way of creating such heterozygosity and that centromeres are not the only chromosomal structures that may interact with the spindle. Thus, the true level of this type of functional “centromere polymorphism” is likely to be substantially higher.
      
      https://www.ncbi.nlm.nih.gov/pubmed/11331939
      
      More generally, it’s suspected non-random segregation is very important in maintaining tissue stem cells. One daughter gets the older DNA and remains a stem cell, the other gets the newer DNA and goes on to differentiate into whatever functional tissue cell is needed: https://royalsocietypublishing.org/doi/10.1098/rstb.2010.0279
      
      But assuming the segregation of two chromosomes could be completely independent of each other, that is comparing two different hypotheses regarding the data generating processes. This is different from “the data was sampled from a normal distribution with unknown mean of either exactly zero vs something else.” That would be the same process with a different parameter.
    - Martha (Smith) on September 1, 2019 12:58 AM at 12:58 am said:
      
      Thanks, Anoneuoid, for the additional information on meiotic and mitotic asymmetries. Very interesting.

Statistical Modeling, Causal Inference, and Social Science
