Seth on small-n and large-n studies

After reading Seth Roberts’s article on self-experimentation, I had a dialogue with him about when to move from individual experimentation to a full-scale controlled experiment with a large-enough n to obtain statistically significant results. My last comment said:

But back to the details of your studies. What about the weight-loss treatment? That seems pretty straightforward: drink X amount of sugar water once a day, separated by at least an hour from any meals. To do a formal study, you’d have to think a bit about what would be a good control treatment (and then there are some statistical-power issues, for example in deciding whether it’s worth trying to estimate a dose-response relation for X), but the treatment itself seems well defined.
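The statistical-power issue can be made concrete with a rough calculation. What follows is a minimal sketch, not part of the original exchange. It assumes a two-arm trial (sugar water versus some control), roughly normal weight change, and a two-sided z-test. The function names are illustrative, and every number in it (a 2 kg difference between arms, a 3 kg standard deviation) is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in means.

    delta: hypothetical true difference in mean weight change between arms
    sigma: hypothetical within-arm standard deviation of weight change
    n_per_group: participants in each arm
    """
    se = sigma * math.sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the tiny chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(abs(delta) / se - z_crit)


def n_per_group_for_power(delta, sigma, power=0.8, alpha=0.05):
    """Smallest n per arm reaching the target power under this normal approximation."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / abs(delta)) ** 2)


if __name__ == "__main__":
    # Hypothetical: a 2 kg effect of sugar water vs. control, sd of 3 kg.
    n_treat = n_per_group_for_power(delta=2.0, sigma=3.0)
    # Hypothetical: two doses whose effects differ by only 1 kg.
    n_dose = n_per_group_for_power(delta=1.0, sigma=3.0)
    print(n_treat, round(approx_power(2.0, 3.0, n_treat), 3))
    print(n_dose, round(approx_power(1.0, 3.0, n_dose), 3))
```

Because the required n grows roughly as 1/δ², distinguishing two doses whose effects differ by half as much takes about four times as many people per arm. That is one reason a dose-response study costs far more than a simple treatment-versus-control comparison.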

Seth replied as follows:

Here are some relevant “facts”:

Long ago, John Tukey said that he would rather have a sample of n = 3 (randomly selected) than Kinsey’s really large non-random samples. He did not explain how one would get a randomly selected person to answer intimate questions. Once one considers that point, Kinsey’s work looks a little better, because ANY actual sample will involve some compromise (probably a large one) with perfectly random sampling. Likewise, the closer one looks at the details of doing a study with n = 100, the more clearly one sees the advantages of smaller-n studies.
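Tukey’s preference can be read as a bias-versus-variance trade-off. Here is a minimal sketch under stated assumptions: the goal is to estimate a population mean, non-random selection shifts the large sample’s mean by a fixed amount, and the sizes of that shift (0.3 and 0.7 population standard deviations) and the large sample’s n = 10,000 are hypothetical, not Kinsey’s actual figures.

```python
import math


def rmse_of_mean(sigma, n, bias=0.0):
    """Root-mean-squared error of a sample mean as an estimate of the population mean.

    sigma: population standard deviation of the measured quantity
    n: sample size
    bias: systematic shift caused by non-random selection (0 for a truly random sample)
    MSE = bias**2 + sigma**2 / n, so only the variance part shrinks as n grows.
    """
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


def break_even_bias(sigma, n_small, n_large):
    """Bias at which a large non-random sample is exactly as accurate as a small random one."""
    return math.sqrt(sigma ** 2 / n_small - sigma ** 2 / n_large)


if __name__ == "__main__":
    sigma = 1.0  # measure everything in units of the population sd
    tukey = rmse_of_mean(sigma, n=3)  # random sample of 3, assumed unbiased
    for bias in (0.3, 0.7):  # hypothetical selection biases for the large sample
        large = rmse_of_mean(sigma, n=10_000, bias=bias)
        print(f"bias={bias}: large-sample RMSE {large:.3f} vs n=3 random {tukey:.3f}")
    print("break-even bias:", round(break_even_bias(sigma, 3, 10_000), 3))
```

With these made-up numbers, a bias of 0.3 sd leaves the large sample ahead, while a bias above roughly 0.58 sd hands the advantage to Tukey’s n = 3. Seth’s caveat fits the same formula: nonresponse in the “random” sample gives it a nonzero bias too, which narrows its advantage.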

How do the results of self-experimentation make their way in the world? An example is provided by blood-sugar testing for diabetics. Now it is everywhere: “the greatest advance since the discovery of insulin,” one diabetic told me. It began with self-experimentation by Richard Bernstein, an engineer at the time. With great difficulty, Bernstein managed to present his work at a scientific conference. It was then followed up by a British academic researcher, who began with relatively small-n studies. I don’t think he ever did a big study (e.g., n = 100). The benefits were perfectly clear with small n. From there it spread to become the norm. Likewise, I don’t think that a really large study of my weight-loss ideas will ever be necessary. The benefits should be perfectly clear with small n.

Fisher once said that what is really convincing is not a single study with a really low p value but repeated studies with p < .05. Likewise, I don’t think that one study with n = 100 is half as convincing as several diverse studies with much smaller n. It is so easy to believe that bigger is better (when in fact that is far from clear) that I wonder if it is something neurological: our brains are wired to make us think that way. I cannot remember ever hearing a study proposed that I thought was too small, and I have heard dozens of proposed studies that I thought were too large.

When I discussed this with Saul Sternberg, surely one of the greatest experimental psychologists of all time, he told me that he himself had made this very mistake: going too quickly to a large study. He wanted to measure something relatively precisely, so he did an experiment with a large n (20 is large in cognitive psychology). The experiment failed to reproduce the basic effect.
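Fisher’s remark about repeated studies can be given some rough arithmetic. This is a minimal sketch, assuming the studies are independent and using hypothetical p-values. The first function gives the chance that k independent studies all reach p < .05 when there is no real effect. The second is Fisher’s combined probability test, a related but distinct tool for pooling p-values across studies; it is not claimed to be what Fisher meant in the remark Seth quotes.

```python
import math


def prob_all_significant_under_null(k, alpha=0.05):
    """Chance that k independent studies all reach p < alpha when there is no real effect."""
    return alpha ** k


def fisher_combined_p(p_values):
    """Fisher's combined probability test.

    Under the null, X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k
    degrees of freedom; for even degrees of freedom the upper tail has a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


if __name__ == "__main__":
    # Three independent null studies all coming out "significant" at .05.
    print(prob_all_significant_under_null(3))
    # Hypothetical p-values from three small studies.
    print(fisher_combined_p([0.04, 0.04, 0.04]))
```

Neither number captures what makes diverse studies persuasive beyond independence. Studies that differ in population, method, and investigator guard against shared biases that no single p value, however small, can rule out. The arithmetic above simply assumes such biases away.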

P.S. Seth’s paper was also noted here.
See also here for Susan’s comments.