Question on hypothesis testing

Mike Frank writes,

Hi, I’m a graduate student at MIT in Brain and Cognitive Sciences. I’m an avid reader of your blog and user of your textbook and so I thought I would email you this question in the hopes you have thoughts on it. I’m in a strange position in my research in that I do a lot of Bayesian modeling of cognitive processes but then end up doing standard psychology experiments to test predictions of the models where I have to use simpler frequentist statistical methods (which are standard in psychology, hard to publish without them) to analyze those data.

The basic question is how to compare binomial data from two different conditions in an experiment when there are multiple datapoints from each individual in each condition (so the trials are not independent). The simplest option seemed to me to be a chi-square test (e.g., compare 54/56 trials correct in one condition with 43/56 trials correct in the other, aggregating across participants). But I’m told this practice violates the independence assumption of the test. I’m not sure I totally understand why this is a problem here, but that may be a separate question entirely.

In contrast, what most psychologists do is calculate a percentage correct for each individual and then do a paired t-test between the two sets of means. But I’ve read that using standard ANOVAs or t-tests on this type of binomial data violates the assumption of normal distribution of the data and is invalid (and can lead to bad inferences in many situations).

More sophisticated people have recommended using a GLM with a logit link function so that it is appropriate to binomial data and then making it a mixed model which can include individuals (subjects) as a random effect. But if I have multiple comparisons between conditions that differ qualitatively (i.e., not along some particular continuum), it seems like I would need to run the GLM on different pairs of conditions and look for a significant effect of condition in each case, and that doesn’t seem particularly elegant either (although at least more appropriate). What I’d really like is just a simple hypothesis test like a chi-square or t-test but appropriate to the form of the data.

My reply:

The t-test comparing the means is correct. You compute the mean for each person and then do a person-level analysis. Thus, you’re not actually using the total percentage correct, you’re using the mean and sd across people for each condition.
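
To make the person-level analysis concrete, here is a minimal sketch in Python. The per-person counts are invented for illustration (8 hypothetical participants with 7 trials per condition, chosen only so that the totals match the 54/56 and 43/56 in the question), and the names `correct_a`, `correct_b`, and `paired_t` are assumptions of this sketch, not anything from the actual study.

```python
import math
import statistics

# Hypothetical data: number of correct trials per participant, per condition.
# These values are made up; they only sum to 54/56 and 43/56 as in the question.
TRIALS_PER_CONDITION = 7
correct_a = [7, 7, 7, 7, 7, 7, 6, 6]
correct_b = [6, 5, 7, 4, 6, 5, 5, 5]


def paired_t(x, y):
    """Return (mean difference, sd of differences, t statistic, df) for paired data."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample sd (n - 1 denominator)
    se_diff = sd_diff / math.sqrt(n)       # standard error of the mean difference
    return mean_diff, sd_diff, mean_diff / se_diff, n - 1


# Step 1: one proportion correct per person per condition.
prop_a = [c / TRIALS_PER_CONDITION for c in correct_a]
prop_b = [c / TRIALS_PER_CONDITION for c in correct_b]

# Step 2: person-level analysis on those proportions.
mean_diff, sd_diff, t_stat, df = paired_t(prop_a, prop_b)
print(f"mean difference in proportion correct: {mean_diff:.3f}")
print(f"sd of differences across people: {sd_diff:.3f}")
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The t statistic would then be compared with a t distribution on n - 1 degrees of freedom. The unit of analysis here is the person, so the effective sample size is the number of participants, not the number of trials.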

The chi-squared test is not so interpretable because it doesn’t give you differences in proportions, it only gives you a hard-to-interpret p-value.
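
As a rough illustration, the sketch below runs a Pearson chi-squared test on the pooled counts from the question and, separately, computes the difference in proportions. It assumes no continuity correction, and the function name `chi_square_2x2` is hypothetical. Pooling treats all 112 trials as independent, which is exactly the concern raised in the question.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Pooled counts from the question: 54/56 correct vs. 43/56 correct.
chi2, p_value = chi_square_2x2(54, 56 - 54, 43, 56 - 43)
diff_in_proportions = 54 / 56 - 43 / 56

print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
print(f"difference in proportions = {diff_in_proportions:.3f}")
```

The test output is a statistic and a p-value; the quantity of substantive interest, the difference in proportions, has to be computed separately.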

Logistic regression is also fine.
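
One way to write the logistic regression with a person-level random effect, as a sketch rather than the only possible specification, is below. The notation is an assumption of this sketch: $y_i$ is 1 if trial $i$ is correct, $j[i]$ is the person and $c[i]$ the condition for trial $i$.

```latex
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]} + \beta_{c[i]}\right),
\qquad
\alpha_j \sim \mathrm{N}\!\left(\mu_\alpha, \sigma_\alpha^2\right)
```

Under this setup all conditions can be fit in a single model, and a comparison between any two conditions $c$ and $c'$ is the difference $\beta_c - \beta_{c'}$ on the logit scale, rather than a separate fit for each pair.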

3 thoughts on “Question on hypothesis testing”

  1. I'm happy to see this email. I really love the work that comes out of the MIT Brain and Cognitive Sciences department, but I've always found it odd that the Bayesian models are typically checked with frequentist methods.

  2. If you want to make inferences about a population of people, then you should be thinking about your sample size in terms of how many people you have in the study. The issue you are running into with the chi-square test is that the total cell counts can be far larger than the number of subjects in the study, which is a sign that something is wrong.

    If you have a good size sample the use of ANOVA shouldn’t be discounted simply because of non-normality of the observations. ANOVA is wonderfully robust to deviations from normality if you have even a modest sample size. It is pretty cool that way.

    On a quick read it is not exactly clear what your concern is about using a GLM with random effects, but the issue looks like a design complication that should be considered in any analytical approach you take.

    If the GLM will give you an analysis closer to the way you are conceiving the research question than the ANOVA I would recommend trying to determine if your design can be accommodated in a GLM (and the ANOVA).

  3. I have completed my research involving 10 struggling readers. I gave a pretest before I started my research. I gave a posttest to the same struggling readers after three weeks of using manipulatives for reading intervention. I used a t statistic to analyze the results of my research. My reviewer said that the t statistic is only used for samples of 30 or more. What is the appropriate statistic to test my hypothesis:

    Ho1: There is no significant relationship between the use of reading manipulatives and phonemic awareness development.

    Ha1: There is a significant relationship between the use of reading manipulatives and phonemic awareness development.

    Thank you in advance for your help.

    Lilia Burton
