Those darn cows: a question about analysis of a sort-of-randomized experiment

Juan Carlos writes,

I just have some simple questions regarding the 50 cows example you used on p. 222 of the BDA book, and I was wondering if you could give me some hints.

What would be your reaction if you read a study like the 50 cows experiment? More specifically, how would the decision to re-randomize affect the evaluation? I see this kind of design pretty often in education research, and I don’t have a definitive take on it. For example, I’ve recently read a paper where the authors randomly assigned students to treatment and control by gender and within schools. But then they checked whether they had equal representation between treatment and control by disability, ethnicity, and lunch status, and if not, they randomized again until they got a “good balance” on these three covariates.

I see in your book (p. 222) that you said “the treatment assignment is ignorable, but unknown.” How “bad” is this? Would it have any impact on the evaluation results? Would it be better to use a randomized block design (including disability, ethnicity, and lunch status)? Why would this be better?

I’ve talked with other statisticians around here, but none of them could give me a definitive answer. Then the 50 cows example came to mind, and I knew you’d be the right person to comment on this. Maybe you or your colleagues could discuss this issue on your blog; other people might be interested in the same topic …

My response: Don Rubin told us about the cow experiment in the class I took from him in 1985; I think he may have gotten the data during his visit to the University of Wisconsin, but I’m not sure. Anyway, the correct analysis is basically to do a regression of the outcome on treatment, also controlling for the variables used in the treatment assignment. In this case these are simply the pre-treatment variables given in the data table. It’s best if these variables are at least roughly balanced between the treatment groups, as this makes the inferences more robust to assumptions about the regression model. (We also discuss this issue in our regression chapter and in our incumbency advantage paper, and it has subsequently been labeled “double robustness” in some of the statistical literature.) Anyway, once you control for the variables used in the treatment assignment, it really doesn’t matter at all that the cows were re-randomized, or re-re-randomized, or whatever. The key is that the assignment was based on that information and nothing else (not, for example, on whether the cows looked healthy, if that information was not recorded). According to Rubin, the cow experiment was pretty clean in that the randomizers really had only that information available.
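
As a concrete sketch of that analysis (a minimal illustration, not the actual analysis from the book; the file name and covariate names below are hypothetical stand-ins for whatever pre-treatment variables appear in the data table):

```python
# A minimal sketch, assuming a data frame with one row per cow: the
# outcome, a 0/1 treatment indicator, and the pre-treatment variables
# that were used in the (re-)randomized assignment. Column and file
# names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

cows = pd.read_csv("cows.csv")

# Regress the outcome on treatment, controlling for the variables used
# in the treatment assignment; the treatment coefficient is then the
# estimated effect, conditional on the model being roughly right.
fit = smf.ols("outcome ~ treatment + initial_weight + age", data=cows).fit()
print(fit.params["treatment"])
print(fit.conf_int().loc["treatment"])
```

Rough balance on these covariates is what makes the estimate insensitive to the exact form of the adjustment.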

I think the same analysis would be ok in your school lunch study. Again, the sloppy randomization doesn’t really matter. A randomized block design would be fine too.
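
And to make concrete what “sloppy randomization” looks like and why it is ignorable here, a sketch of the re-randomization scheme described in the question; all names, the balance rule, and the tolerance are hypothetical:

```python
# Randomize within school-by-gender blocks, then re-draw until the
# checked covariates look balanced. The assignment depends on nothing
# beyond the recorded covariates, which is what makes it ignorable
# once those covariates are controlled for.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def balanced(df, covariates, tol=0.05):
    # Crude balance rule: treated/control means of each (binary)
    # covariate must agree to within tol.
    t, c = df[df["z"] == 1], df[df["z"] == 0]
    return all(abs(t[v].mean() - c[v].mean()) < tol for v in covariates)

def assign(df, covariates):
    while True:
        # Split each school-by-gender block roughly in half, at random.
        df["z"] = (df.groupby(["school", "gender"])["student_id"]
                     .transform(lambda s: rng.permutation(np.arange(len(s)) % 2)))
        if balanced(df, covariates):
            return df

students = assign(pd.read_csv("students.csv"),
                  ["disability", "minority", "free_lunch"])
```

The loop is harmless for the regression analysis above as long as the covariates it peeks at (disability, ethnicity, lunch status) are included in the model.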

4 thoughts on “Those darn cows: a question about analysis of a sort-of-randomized experiment”

  1. The sloppy randomisation doesn't matter? Well, you'd have to throw out all randomisation-based inference for a start.

    Isn't this an example of the good of the many vs. the good of the few? I.e., with proper randomisation you may get a poor randomisation for your particular study (e.g., all the males get assigned to one group), but over all experiments you have some pretty strong guarantees. If you start picking your randomisations based on some representativeness criterion then you'll do better for your study, but you lose the nice (exact) properties of randomisation-based inference. (And sure, you might not be using permutation tests, but it's nice to know that they're available, backing up your ANOVA, despite deviations from assumptions.)
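
    (For concreteness, a minimal sketch of the sort of permutation test referred to here, with hypothetical variable names; a real analysis would permute within the design's blocks:)

    ```python
    # Compare the observed difference in means with its distribution
    # under random re-labelings of the treatment indicator.
    import numpy as np

    rng = np.random.default_rng(1)

    def permutation_pvalue(y, z, n_perm=10_000):
        observed = y[z == 1].mean() - y[z == 0].mean()
        null = np.empty(n_perm)
        for i in range(n_perm):
            zp = rng.permutation(z)  # re-label treatment at random
            null[i] = y[zp == 1].mean() - y[zp == 0].mean()
        # Two-sided p-value under the sharp null of exactly zero effect.
        return np.mean(np.abs(null) >= abs(observed))
    ```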

  2. Hadley,

    We discuss this in Chapter 7 of Bayesian Data Analysis. What's important is the information used in assigning the treatments. If no information is used, then in fact the treatments _must_ be completely randomized; there is no other way to assign treatments that uses no information. In this case, we know what information was used in the treatment assignment so things are ok.

    Randomization does give you robustness (in expectation), though, as also discussed above (and in our Chapter 7).

    I'm generally not interested in permutation tests since they're based on a generally uninteresting null hypothesis of a treatment effect that is exactly zero or exactly some constant. Similarly for so-called exact properties.

    In any case, the key is to control for those pre-treatment variables in your analysis. And that can take some work, especially if you think there might be some nonlinearity or interactions. To my mind, that's the strongest motivation for doing a completely randomized experiment: that you can get away with a simple analysis afterwards. And that really can be a big deal. But it's not about the "randomization inference" for me, it's about the simplicity and interpretability.
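
    (To make the nonlinearity-and-interactions point concrete, a sketch continuing the hypothetical regression above; centering the covariate keeps the main treatment coefficient interpretable as the effect at the covariate's mean:)

    ```python
    import pandas as pd
    import statsmodels.formula.api as smf

    cows = pd.read_csv("cows.csv")  # hypothetical, as in the earlier sketch
    cows["weight_c"] = cows["initial_weight"] - cows["initial_weight"].mean()

    # Allow the treatment effect to vary with initial weight; the main
    # treatment coefficient is the effect at the mean initial weight.
    fit = smf.ols("outcome ~ treatment * weight_c + age", data=cows).fit()
    print(fit.params[["treatment", "treatment:weight_c"]])
    ```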

  3. I am currently working in Forensic Engineering. Within the subfield that our company works in, statistics is a joke. Plaintiffs see one or two instances of a problem and then immediately call for 100% replacement of these items across hundreds of locations. This is known as "the usual extrapolation". It has nothing to do with real statistics.

    But if you want to use statistics properly for this type of case, randomization is powerful because it's hard to argue with a random number generator.

    However, rarely can we afford a fully randomized block design either. Often there are several potentially important factors, such as manufacturer, batch, weather exposure, and so forth, and if we tried to block on all of them we'd end up with a cross product of 3×5×7×2 different combinations and either 1 or 0 items in each block :-P When it comes to this kind of situation, the best we can do is to pick the most important factor (perhaps weather exposure), block on that, and randomize the rest (a sketch appears at the end of this comment).

    Often we work with a dataset where someone else has sampled something, and we don't know what the criteria were for the selection of the units. Perhaps 10 windows were tested for leaks. Were they units where people complained about leaks? Did they have water stains near them? Did they all occur on the second floor under a particular type of roof parapet? We don't know. The typical behavior of plaintiffs is to seek out and test only things they know will fail.

    There's a big difference between when no information is used and when unknown information is used.
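
    (A sketch of that fallback, blocking on the single most important factor and randomizing within its levels; column names are hypothetical:)

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)

    def half_at_random(s):
        # Mark roughly half the units in this block for testing.
        return rng.permutation(np.arange(len(s)) % 2)

    units = pd.read_csv("units.csv")  # hypothetical inventory of items
    units["test"] = (units.groupby("weather_exposure")["unit_id"]
                          .transform(half_at_random))
    ```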

  4. I believe Fisher was arguing for simplicity and interpretability (specifically, the lessening of necessary assumptions) over efficiency in design (later in his career). Same reason many would avoid doing cross-over studies, which require the assumption of no carry-over effects …

    Some may find the following note by Stephen Senn of interest regarding the original post http://www.bmj.com/cgi/eletters/330/7495/843#1039

    Keith
