Someone who knows that I hate the so-called Fisher exact test asks:

I was hoping you could point to a Bayesian counterpart or improvement to “Fisher’s exact test” for 2 x 2 categorical contingency tables with possibly very small numbers (too small to do a chi-square). I see that you had a blog post on it before (1), but there are several issues I’m unclear about:

(i) What would a full applied Bayesian analysis look like of this type of problem, in general? I have seen one beta-binomial like analysis but never in practical/applied examples. Any practical examples you may have for this, e.g. papers or code examples you’ve used in teaching, would be great.

(ii) What if we add the twist that the data from the two populations for our 2 x 2 test is paired? e.g. we have several male and several female patients, and the two conditions are drug / no drug. But, each male and female are paired as they are twins (which breaks the independence of the samples obviously.) How is this modeled from a Bayesian perspective?

(iii) Less important: when in practice is it OK to use Fisher’s exact test if you’re open to Bayesian analysis? ‘Never’ is a reasonable answer, but I’d like to understand the practical reasons why you think this. Finally, if all of our data counts are greater than 10, do you think it’s legitimate to use a chi-square?

My reply:

(i) The basic analysis is pretty simple, it goes like this:

y1 ~ Binomial (n1, p1)

y2 ~ Binomial (n2, p2)

We need a prior distribution on (p1,p2), and we usually assume that n1,n2 provide no information about p1,p2. (This latter point depends on the design of the study, but I’m keeping it simple here.) What’s a good prior distribution depends on the problem, but in many cases, a simple uniform distribution on (p1,p2) will be fine. Whatever your prior is, you then throw in the likelihood and you get posterior inference for (p1,p2). Draw 1000 simulations and then use these to get inference for p1-p2. That’s it. With moderate or large sample sizes, this is basically equivalent to the standard t-test.
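A minimal sketch of the recipe above in R, assuming independent uniform priors on p1 and p2. The counts are illustrative only (they are the left-handedness counts used in the comments below), and the variable names and seed are arbitrary choices. With a uniform prior, each posterior is Beta(y + 1, n - y + 1), so the simulations can be drawn directly:

```r
# Illustrative counts (assumption: borrowed from the left-handedness example
# in the comments; substitute your own data)
n1 <- 52; y1 <- 9    # group 1: number of trials, number of successes
n2 <- 48; y2 <- 4    # group 2: number of trials, number of successes

set.seed(1)          # arbitrary seed, only for reproducibility
n_sims <- 1000

# Uniform prior = Beta(1, 1), so the posterior for each p is Beta(y + 1, n - y + 1)
p1 <- rbeta(n_sims, y1 + 1, n1 - y1 + 1)
p2 <- rbeta(n_sims, y2 + 1, n2 - y2 + 1)
p_diff <- p1 - p2    # posterior simulations of p1 - p2

post_median   <- median(p_diff)
post_interval <- quantile(p_diff, c(0.025, 0.975))
prob_positive <- mean(p_diff > 0)   # simulation estimate of Pr(p1 > p2 | data)

print(post_median)
print(post_interval)
print(prob_positive)
```

A different prior only changes the two Beta parameter pairs, as long as it stays in the Beta family; a non-conjugate prior would need a different way of drawing the simulations.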

If you have many tables, you can set up a hierarchical model for the p’s. We have an example near the end of chapter 5 of Bayesian Data Analysis.

(ii) With paired data, you can fit a logistic regression. Call the data y_ij, where i=1 or 2 and j is the index for the pairing. Then you can model Pr(y_ij=1) = invlogit (a_i + b_j), with a hierarchical model for the b_j’s, something like b_j ~ N (mu_b, sigma_b^2), with weakly informative or flat prior distributions on mu_b, sigma_b.
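To make the structure of this model concrete, here is a small fake-data simulation in R. It only generates data from the model; it does not fit it. Every numerical value (number of pairs, a, mu_b, sigma_b) is a hypothetical choice for illustration, not something taken from a real study:

```r
set.seed(2)              # arbitrary seed, only for reproducibility

n_pairs <- 50            # hypothetical number of pairs
a       <- c(-0.5, 0.5)  # hypothetical a_1, a_2 (one per member of the pair)
mu_b    <- 0             # hypothetical mean of the pair effects
sigma_b <- 1             # hypothetical sd of the pair effects

# Pair-level effects: b_j ~ N(mu_b, sigma_b^2)
b <- rnorm(n_pairs, mean = mu_b, sd = sigma_b)

# One row per (i, j): i = 1 or 2 indexes the member, j indexes the pair
d <- expand.grid(i = 1:2, j = seq_len(n_pairs))

# Pr(y_ij = 1) = invlogit(a_i + b_j); plogis() is the inverse logit in R
d$prob <- plogis(a[d$i] + b[d$j])
d$y    <- rbinom(nrow(d), size = 1, prob = d$prob)

head(d)
```

Fitting the model to data in this layout needs a multilevel logistic regression. One option, which is an assumption here rather than something named in the post, is `lme4::glmer(y ~ factor(i) + (1 | j), family = binomial, data = d)`, which gives an approximate maximum-likelihood fit rather than full posterior simulations.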

(iii) The only case I could even imagine using the so-called Fisher exact test is if the data were collected so that the row and column margins were both pre-specified. The only example I can think of with this design is Fisher’s tea-tasting experiment. In all cases I’ve seen, at most one margin is preset by design. Also, I’d never do a chi-squared test in this setting. See chapter 2 of ARM for an example where I thought a chi-squared test was OK.

I know you dislike Bayes factors and the posterior probabilities derived from them, but it seems worth pointing out that such approaches do exist, e.g., this example from Bayesian Computation in R.

You as a statistician may like the recent election to the Moscow "Duma" (Parliament).

Of course, the winner is Putin's and the Moscow Mayor's "Edinaya Rossiya".

This is the chart

http://pics.livejournal.com/uborshizzza/pic/005dg…

Each dot is a Moscow district.

The X axis is the percentage of people who participated in the election in that Moscow district.

The Y axis is the share of people who voted for a given political party.

Blue dots are the winner, "Edinaya Rossiya".

Red dots are the "Russian communists", etc.

For example,

for a Moscow district with a 25% attendance rate (25% of registered voters took part in the election), around 15% voted for Edinaya Rossiya, 4.5% voted for the Communists, and slightly fewer voted for the other parties;

for a Moscow district with a 50% attendance rate (50% of registered voters took part in the election), around 40% voted for Edinaya Rossiya, 4.5% voted for the Communists, and slightly fewer voted for the other parties.

How would you explain it?

I worked the simple beta-binomial model Andrew suggests above for an example drawn from the Wikipedia article on contingency tables.

I lay out the model in a bit more detail and work through an example using R, up to computing posterior quantiles and plotting the posterior density of p1-p2. Here's the link:

http://lingpipe-blog.com/2009/10/13/bayesian-coun…

There's nothing like a small programming exercise in the afternoon to wake me up after lunch.

I expanded on Andrew's description with R code and a worked example from the Wikipedia page on contingency tables (for easy comparison):

http://lingpipe-blog.com/2009/10/13/bayesian-coun…

R is just so much fun.

In employment discrimination litigation you can get fixed marginals.

Say a company has 100 employees, of whom 20 are (old/black/female/disabled/gay … pick a group). The company is forced to lay off 30 employees due to reduced sales. So essentially, one margin is fixed by design at 80/20, and the other is fixed by design at 70/30. The issue is whether layoffs are assigned disproportionately to one of the two groups.

This is not to argue that the proposed Bayesian method is inappropriate, but just to suggest that the assumption of fixed margins is not always loony.
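The layoff example can be sketched in R. With both margins fixed, the number of group members among those laid off follows a hypergeometric distribution if layoffs are assigned without regard to group, and the one-sided Fisher exact test is the tail probability of that distribution. The observed count `k_observed` below is hypothetical, since the comment gives only the margins:

```r
n_total    <- 100   # employees
n_group    <- 20    # members of the group of interest
n_laidoff  <- 30    # layoffs
k_observed <- 10    # HYPOTHETICAL: group members among those laid off

# Expected number of group members laid off if layoffs ignore group
expected <- n_laidoff * n_group / n_total

# Pr(X >= k_observed) for X ~ Hypergeometric (both margins fixed)
p_tail <- phyper(k_observed - 1, m = n_group, n = n_total - n_group,
                 k = n_laidoff, lower.tail = FALSE)

# The same 2 x 2 table passed to fisher.test, one-sided
tab <- matrix(c(k_observed,           n_laidoff - k_observed,
                n_group - k_observed, n_total - n_group - (n_laidoff - k_observed)),
              nrow = 2,
              dimnames = list(c("group", "other"), c("laid off", "kept")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(expected)
print(c(hypergeometric = p_tail, fisher = p_fisher))
```

The two probabilities agree because, in the 2 x 2 case, the one-sided test is this hypergeometric tail.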

Teemo: Interesting example. I think the unusualness of it reinforces my point that typically it doesn't make sense to condition on both margins. I just get so frustrated that people do this conditioning without thinking about what they're doing. It's especially frustrating when it's done by so-called frequentists, whom I would think should be especially concerned with modeling the actual data-generation process.

Teemo beat me to it, since it's an example I use all the time, but the further argument, even in the case where only one margin is fixed, is that in most cases the fixing of the other margin is approximately ancillary, i.e., while it doesn't strictly give no information, it is minimally informative. Thus, the tension between the FET and the Bayesian approach is really a battle over which made-up information you are trying to keep from having much impact: the prior, or the information in the fixing of the second margin? Neither is really completely objective.

Andrew,

Have you seen this old article by Sander Greenland (The American Statistician. 45(3). 248-251. 1991) that argues fixed margins are sometimes justified on the basis of counterfactual inference? I think Neyman may have made a similar point, but I can't find that reference.

Jonathan:

My main problem with the so-called Fisher exact test is that it has practical problems; see the article by Agresti and Coull for more discussion. It's basically dominated by the simple difference-of-two-independent-binomials analysis. Really, all the so-called exact test has going for it is the presumption that it is "exact." My point is that, except in very rare circumstances, it's not exact at all. Or, to put it another way, it's an exact solution to a problem you'll never see.

The situation gets even worse when you get to more complicated problems. People have put huge amounts of time into the technically-demanding problem of computing p-values based on distributions of multiway contingency tables with fixed margins. But there's never really a thought as to why they're doing this. It's a solution in search of a problem, really.

Contrariwise, the Bayesian approach is direct, it's simple in the easy problems, and it easily generalizes to harder problems.

In the spirit of the nicely balanced response Pearl posted to this blog about a month ago regarding causal analyses:

There has been a lot of serious thought given to the motivation and properties of FET (perhaps most of the literature on higher-order asymptotics).

– It is the least wrong way to evade a nuisance parameter in exponential models, and it's the least assuming analysis for a two-group RCT with binary outcomes: groups were randomized and treatment had absolutely no effect (i.e., Fisher's strict null, so successes and failures are fixed).

Okay, the approach demonstrated in FET is indirect, it's hard even in the easy problems, and it almost never generalizes to harder problems.

But what's wrong with something that does work in simple cases (as long as the motivations are not misperceived)?

Such as the much simpler direct simulation of the posterior from the joint distribution for Bob's simple demonstration example (R code below)?

Keith

# DATA
n1 = 52 # men
y1 = 9 # left-handed men
n2 = 48 # women
y2 = 4 # left-handed women

# SIMULATION
I = 10000 * 500 # number of simulations from the joint distribution
p1 = runif(I,0,1) # draws of p1 from the uniform prior
possible.y1 = rbinom(n=I,prob=p1,size=n1) # data that could have been observed given p1
p2 = runif(I,0,1) # draws of p2 from the uniform prior
possible.y2 = rbinom(n=I,prob=p2,size=n2) # data that could have been observed given p2
joint = data.frame(p1=p1,p2=p2,y1=possible.y1,y2=possible.y2)
# keep only the draws whose simulated data match the observed data
posterior = joint[joint$y1==y1 & joint$y2==y2,]
dim(posterior)
posterior$diff = posterior$p1-posterior$p2

# OUTPUT
quantiles = quantile(posterior$diff,c(0.005,0.025,0.5,0.975,0.995))
print(quantiles,digits=2)
print("Probability higher % men than women lefties:")
print(mean(posterior$p1>posterior$p2))

The one-sided p-value of Fisher's exact test (FET) is equal to the posterior probability Pr(p1<p2|y1,y2) if we choose the priors p1 ~ Beta(0,1), p2 ~ Beta(1,0), an extreme choice that's really hard to justify for anyone, let alone for a non-Bayesian.

Due to this non-symmetric setup, the FET is extremely conservative, i.e., loath to reject the null: see e.g. Stephen Senn's article "Lessons from TGN1412". FET doesn't declare 6/6 events significantly higher than 0/2 events at the one-sided 2.5% level. The implied prior is thus very skeptical about the alternative, but of course this skepticism diminishes as the sample size gets larger.

Reference: Altham, P (1969), Exact Bayesian Analysis of the 2×2 Table, and Fisher's "Exact" Significance Test. JRSS (B) 31.2.261-269.
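The 6/6 vs 0/2 case can be checked numerically. The sketch below assumes the usual conjugate update, so a Beta(a, b) prior with y events in n trials gives a Beta(y + a, n - y + b) posterior; `prob_p1_less_p2` is a hypothetical helper written for this illustration. It compares the one-sided FET p-value with Pr(p1 < p2 | data) under the Beta(0,1), Beta(1,0) priors above and under uniform priors:

```r
y1 <- 6; n1 <- 6   # 6 events out of 6
y2 <- 0; n2 <- 2   # 0 events out of 2

# One-sided Fisher exact test for "group 1 has the higher event rate"
tab <- matrix(c(y1, y2, n1 - y1, n2 - y2), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Hypothetical helper: Pr(p1 < p2 | data) for independent Beta priors,
# computed as the integral of F1(x) * f2(x) over (0, 1)
prob_p1_less_p2 <- function(a1, b1, a2, b2) {
  integrand <- function(x) {
    pbeta(x, y1 + a1, n1 - y1 + b1) * dbeta(x, y2 + a2, n2 - y2 + b2)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

p_altham  <- prob_p1_less_p2(a1 = 0, b1 = 1, a2 = 1, b2 = 0)  # Beta(0,1), Beta(1,0)
p_uniform <- prob_p1_less_p2(a1 = 1, b1 = 1, a2 = 1, b2 = 1)  # uniform priors

print(c(fisher = p_fisher, altham = p_altham, uniform = p_uniform))
```

For this table the first two values are both 1/28 (about 0.036), above 0.025, while uniform priors give 1/120 (about 0.008).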

Thanks for the reference to the Altham paper, Jouni! It will make good reading material for my bus ride home.