
Sort of multiple comparisons problem

Nick Allum writes:

I have an experiment where I want to test for differences between two randomized treatment groups where the treatment is length of the questionnaire (long/short). The hypothesis is that satisficing will occur in longer questionnaires because people get bored and want to finish the survey rather than take the time to answer questions carefully. I have about 7 multi-item scales and for each one I can generate various indicators of satisficing. For example, use of the middle category is computed by counting the items on which a person chooses the middle category of the multi-item Likert scale and dividing by the number of items in the scale. So I have 4 indicators like this (non-differentiation, middle category, don't-know (DK) rate, use of extreme categories) computed for each scale. I regress each of these 28 indices on treatment and get a coefficient and a standard error for each. As you’d expect, some are significant, some are not. Some have large differences between treatments, some smaller.

I could present all of these and make a stab at what I think they mean overall for the effect of questionnaire length on each of the indicators of satisficing. But then I thought: maybe I could treat each of the seven attitude scales as a repeated observation of the same thing (for example, the propensity to choose the middle category) and specify some sort of multilevel/random effects model that would summarise the overall effect of treatment on each of the 4 indicators in only four regressions rather than 28. So, my model has 7 observations (level 1) within each of 800 persons (level 2). The only predictor is treatment, which varies only between level 2 units (persons). There are no predictors that vary within persons. So it is a multilevel random-intercepts model with only level 2 predictors.

The question is: does this make any sense as a summary of all of those potential 28 separate regressions, and does it have any advantage at all over simply computing each person's mean over the seven scales for each of the four indicators and then regressing each of the resulting four means on treatment in four simple OLS regressions, one for each indicator?

I was thinking of it as a multiple comparisons problem, although I suppose one could treat the aggregation of any multi-item scale in the same way, but we generally don’t. I think this is sort of interesting, but I also have to finish my paper!
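To make the bookkeeping in the question concrete, here is a minimal Python sketch of the four indicators for one person on one scale, followed by the simpler of the two summaries (average each indicator over a person's scales, then compare treatment groups). Everything named here is an assumption for illustration: the function names, the use of `None` to code a don't-know answer, the 5-point default, and in particular the definition of non-differentiation (1 if all substantive answers are identical), which the email does not spell out. The example data are made up.

```python
from statistics import mean

DK = None  # hypothetical code for a "don't know" response


def satisficing_indices(responses, n_categories=5):
    """Four satisficing indicators for one person's answers to one scale.

    responses: list of ints in 1..n_categories, or DK for "don't know".
    """
    n_items = len(responses)
    middle = (n_categories + 1) / 2  # 3 on a 5-point scale; no match if even
    answered = [r for r in responses if r is not DK]
    return {
        "middle": sum(r == middle for r in answered) / n_items,
        "extreme": sum(r in (1, n_categories) for r in answered) / n_items,
        "dk": (n_items - len(answered)) / n_items,
        # Assumed definition: 1 if every substantive answer is the same.
        "nondiff": float(len(answered) > 1 and len(set(answered)) == 1),
    }


def person_mean(scales, indicator, n_categories=5):
    """Average one indicator over all of a person's scales."""
    return mean([satisficing_indices(s, n_categories)[indicator] for s in scales])


def treatment_effect(people, indicator):
    """Difference in group means of the person-level averages (long - short).

    people: list of (treatment, scales) with treatment "long" or "short".
    With a single binary predictor this difference equals the OLS slope.
    """
    by_group = {"long": [], "short": []}
    for treatment, scales in people:
        by_group[treatment].append(person_mean(scales, indicator))
    return mean(by_group["long"]) - mean(by_group["short"])


# Made-up data: two people, two four-item scales each.
example_people = [
    ("long", [[3, 3, 3, 3], [3, 2, 3, 4]]),
    ("short", [[1, 2, 3, 4], [5, 4, 2, 1]]),
]
print(treatment_effect(example_people, "middle"))
```

One point worth keeping in mind when comparing the two approaches: if every person answers all seven scales and treatment is the only predictor, a random-intercept model should give the same point estimate for the treatment coefficient as the regression on person means sketched here. Any gain from a multilevel model would then have to come from a richer structure, such as letting the treatment effect vary by scale.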

My reply:

First, I think that a clear graph of the different averages could do a lot for you. I’m thinking of a set of line plots: you can probably figure out something that’s a 2-way grid of plots, with a few lines in each plot. You could also indicate +/- 1 s.e. on each estimate with a light gray vertical bar.

After this, then, yes, I think you might get some gains by fitting a multilevel model and doing partial pooling toward some sort of additive estimate. I haven’t thought about your problem in detail, but this is the sort of general advice I would give.
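As a rough illustration of what partial pooling does to a set of scale-specific estimates, here is a small Python sketch. It is not a full multilevel fit: it is a two-stage, meta-analysis-style approximation (the DerSimonian-Laird moment estimator of the between-scale variance, followed by empirical-Bayes shrinkage), swapped in because it needs only the coefficients and standard errors from the separate regressions. The function name `partial_pool` and the input numbers are hypothetical.

```python
def partial_pool(estimates, std_errors):
    """Shrink per-scale treatment effects toward a common mean.

    estimates, std_errors: one coefficient and standard error per scale,
    e.g. from the separate regressions for a single indicator.
    Returns (mu, tau2, pooled): the overall effect, the estimated
    between-scale variance, and the partially pooled per-scale effects.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to pool")

    # Precision-weighted (fixed-effect) mean and heterogeneity statistic Q.
    weights = [1 / se**2 for se in std_errors]
    w_sum = sum(weights)
    mu_fixed = sum(w * b for w, b in zip(weights, estimates)) / w_sum
    q = sum(w * (b - mu_fixed) ** 2 for w, b in zip(weights, estimates))

    # DerSimonian-Laird moment estimate of between-scale variance, floored at 0.
    c = w_sum - sum(w**2 for w in weights) / w_sum
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    # Overall mean, now weighting by total (sampling + between-scale) variance.
    re_weights = [1 / (se**2 + tau2) for se in std_errors]
    mu = sum(w * b for w, b in zip(re_weights, estimates)) / sum(re_weights)

    # Each estimate moves toward mu; noisier estimates move further.
    pooled = [
        mu + tau2 / (tau2 + se**2) * (b - mu)
        for b, se in zip(estimates, std_errors)
    ]
    return mu, tau2, pooled


# Made-up coefficients and standard errors for seven scales, one indicator.
coefs = [0.05, 0.12, -0.01, 0.08, 0.20, 0.03, 0.07]
ses = [0.03, 0.04, 0.03, 0.05, 0.06, 0.03, 0.04]
mu, tau2, pooled = partial_pool(coefs, ses)
print(mu, tau2, pooled)
```

When the estimated between-scale variance is zero, every scale is pulled all the way to the common mean (complete pooling); as it grows, the pooled values approach the separate regression estimates. A model fit directly to the person-level data would estimate these pieces jointly rather than in two steps.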


  1. In my experience, the surveys I tend to fill out as quickly as possible are those with tedious, ridiculously long lists of Likert-type questions stacked together in a compact space. This is common in marketing surveys conducted online. If one had such a survey instrument in which respondents fill in bubbles close together (like an old Scantron test form), a simple method of initial analysis would be to create a graphic replicating the survey instrument in which each bubble is shaded according to the overall frequency of that response. Thus, for example, if respondents tended to choose the middle category, particularly towards the end of the survey, this bubble would be shaded progressively darker.

    Of course, if others are like me in such circumstances and tend to fill in the same bubble (I may choose the "somewhat happy" option, which would be neither a middle nor an extreme category, when describing my attitude toward 32 different household products) but vary in choosing different bubbles to stick with, then the patterns would be evident in a quick glance at our individual instruments but not necessarily in a graphic plotting the overall distribution of responses.

  2. Nick Allum says:

    That sounds like a useful way of presenting likert data, although mine are not all in the format you describe.

    I have started trying to create plots as Andrew suggests. I have alighted on a series of forest plots (often used in meta-analysis). I will be able to show pooled estimates from a multilevel analysis on these same plots.

  3. Keith O'Rourke says:

    As for "fitting a multilevel model and doing partial pooling toward some sort of additive estimate": especially if you can construct a meaningful, judgment-based aggregation score, the details in this paper might be helpful:

    Greenland S, O’Rourke K. On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics. 2001;2(4):463-471.

  4. Nick Allum says:

    Thanks for the reference. Very helpful.