
Reference for the claim that you need 16 times as much data to estimate interactions as to estimate main effects

Ian Shrier writes:

I read your post on the power of interactions a long time ago and couldn’t remember where I saw it. I just came across it again by chance.

Have you ever published this in a journal? The concept comes up often enough and some readers who don’t have methodology expertise feel more comfortable with publications compared to blogs.

My reply:

Thanks for asking. I’m publishing some of this in Section 16.4 of our forthcoming book, Regression and Other Stories. So you can cite that.


  1. Michael Lugo says:

    Either I’m imagining the book sitting on my desk or this post was pre-written – the book is no longer “forthcoming”, it has come.

    • Andrew says:


      Yes, most of our posts are on approx 6 month delay; see for example here. And then some of the scheduled posts get bumped for more topical material, and some posts can even get bumped twice . . . It’s like when you submit a paper to a journal: once it’s accepted, you can never be quite sure when it will actually appear.

  2. Luigi Leone says:

    I was under the impression that how much one needs to increase the sample size is contingent on the shape of the interaction. For incremental interactions (in one group the effect is stronger than in the other group) I think 16X seems just right; for “knock out interactions” (a strong effect in one group, but a practically nil effect in the other) maybe even 4X could be ok. With the strongest interactions, those that occur with a main effect of about zero, and thus materialize because of opposite effects in the two groups, I was under the impression that samples of similar size (1X) may show adequate power. Am I enormously wrong?

    • Andrew says:


      Yes, that’s right. The rule of 16 is a starting point or rule of thumb. There are cases where the interaction is larger than the main effect, in which case the story is much different.

      • John H says:

        I think the 16x rule also depends on the coding scheme of the factors. For example, if using DOE +1,-1 coding rather than treatment coding, I don’t think the 16x rule holds up, but I’ve not done the simulation.


        • Don’t mean to be mean, but why not just do the simulations rather than ask?

        • Neil Diamond says:

          I made this comment on 30 June this year in another thread but will make it again:

          I missed the 16 discussion last year but if I had seen it I would have made a few points.

          The usual definition of an interaction between factors A and B is (at least for two level factors) the difference between the effect of A at high B and the effect of A at low B, divided by two. The division is to make all the standard errors the same.

          Using this definition, and if you assume that interactions are about half the size of main effects, 16 becomes 4.

          Also, for two level (physical) experiments, it is quite common to get the magnitude of the interaction approximately equal to the magnitude of the two main effects. This just means that one combination of the two factors is unusually high or low and the other three combinations give about the same response (Daniel, 1975, page 135).
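The arithmetic behind "16 becomes 4" can be sketched as follows. This is a minimal illustration, assuming a balanced 2x2 design with equal cell sizes and equal error variance in every cell. The names `variance_factor` and `sample_size_multiplier` and the contrast weights are illustrative choices, not anything taken from the post or the book.

```python
def variance_factor(weights):
    """Variance of a contrast of four cell means, in units of sigma^2 / n_per_cell."""
    return sum(w * w for w in weights)


def sample_size_multiplier(target, reference, size_ratio):
    """How many times more data `target` needs than `reference` to reach the same
    estimate-to-standard-error ratio, when the true value of `target` is
    `size_ratio` times the true value of `reference`."""
    return variance_factor(target) / variance_factor(reference) / size_ratio**2


# Cell order: (low A, low B), (high A, low B), (low A, high B), (high A, high B).
MAIN_EFFECT_A = [-0.5, 0.5, -0.5, 0.5]       # effect of A averaged over B
INTERACTION_FULL = [1.0, -1.0, -1.0, 1.0]    # (A effect at high B) - (A effect at low B)
INTERACTION_HALVED = [0.5, -0.5, -0.5, 0.5]  # the same difference divided by two

# Full-difference interaction that is half the size of the main effect.
full_half = sample_size_multiplier(INTERACTION_FULL, MAIN_EFFECT_A, 0.5)
# Halved interaction that is half the size of the main effect.
halved_half = sample_size_multiplier(INTERACTION_HALVED, MAIN_EFFECT_A, 0.5)
# Halved interaction that is a quarter the size of the main effect.
halved_quarter = sample_size_multiplier(INTERACTION_HALVED, MAIN_EFFECT_A, 0.25)

print(full_half, halved_half, halved_quarter)  # 16.0 4.0 16.0
```

Under these assumptions the first case gives 16 and the second gives 4. The third case describes the same cell means as the first, only expressed in the halved definition, and it gives 16 again: halving the definition also halves the size ratio.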

          • I don’t know about “usual” but I think this is a stupid definition and the excuse “to make the standard errors the same” is the kind of thing you’d expect from some classical math-stats text where no one cares about the purpose of the analysis because it’s all just formal calculation anyway.

            It’s pretty obvious to me that the *applied question of interest* is “how much does the effect change if you move from low B to high B” in which case the definition of interest to the applied person is just “the difference between the effect of A at high B minus the effect of A at low B” and this seems to be the definition Andrew uses, so when you redefine the scale you are also redefining the meaning (you’re saying that the physical outcomes are all 2x bigger than what Andrew assumed they were)

            • Anon says:

              Generally good advice you might want to try: Don’t be a jerk

              • Well, I wasn’t trying to be. But I am usually very frustrated with math-stats and probabilists and other book writers who work on abstract mathematical formula problems and design “definitions” because they make abstract formulas look good. I didn’t mean to imply anything specifically about Neil. He’s probably right that in some corner of the world this is the “usual” definition.

                But think about it from the perspective of someone treating patients… suppose you have a situation where under “low B” giving a drug makes a patient’s blood work say X amount better… and under “high B” the drug makes it X+Z better, and therefore we define the interaction effect as Z/2 ???

                It’s like labeling all the markings on the syringe 1/2 as big as they really are and telling people to just compensate for it by doubling whatever they really want to inject.

              • AllanC says:

                This is sound advice. But I do not believe it is violated by Daniel in his response.

                An assault on an idea is not nearly the same thing as questioning the ability of an individual. They may coincide, at times, but certainly they are distinct things.

                In scientific discourse it should be possible to suggest an idea is rubbish without the author of the idea taking personal offence. If we do not separate the idea from the individual that generated it I fear science will suffer, as a result of people unnecessarily holding back to spare feelings.

              • Thanks Allan, that was in fact the intention, to comment on that particular definition, not the person.

            • Andrew says:

              Just to step back a moment, under whatever definition we use, I agree with Neil that any statement about interactions depends on their magnitude. In the sorts of problems I’ve studied, I think the assumption that interactions (as I’ve defined them) are about half the size of main effects is a reasonable starting point, in that in so many problems people focus on estimating the average treatment effect, under the assumption that interactions are not so crucial. But I believe it could be different in different fields.

              • Yes, but what do you mean by “half the size of the main effect”? I think Neil’s point was that if you give

                A & low B together and get X

                A & high B together and get X + Z

                that he would call the interaction effect Z/2 and so if X and Z were the same size that “the interaction would be half the size of the main effect”

                Is that what you meant?

            • Carlos Ungil says:

              > It’s pretty obvious to me that the *applied question of interest* is “how much does the effect (of high vs low A) change if you move from low B to high B”

              Good for you. Other people have some stupid love for symmetry and care just as much about “how much does the effect (of high vs low B) change if you move from low A to high A”.

              • Carlos Ungil says:

                They may also be interested, for some stupid reason, in getting a measure of the effect of each factor on its own when the other factor is kind-of-undefined.

              • I don’t have problems with symmetry, I have problems with making computations about the world more complex and error prone. I’ve experienced enough people making say treatment mistakes or engineering design mistakes because of using the wrong units

                In this case Andrew made an assumption and someone came along and insisted that the definition of words he was using was “wrong” and meant that all the physical effects were twice as big as what he did in fact mean and that therefore the sample size needed was a factor of 4 smaller than Andrew’s calculation…

                This is just like coming along and insisting that although Andrew plainly measured everything in inches, the usual way to measure things is cm, and if we just copy his numbers across and put a cm next to them we can prove the area of the thing is much smaller than Andrew claimed.

              • Also as a general principle I think categorical effects should not be used for quantitative variables… voted for vs voted against… sure, use a categorical effect, but for “high potassium vs low potassium” use instead a continuous variable in g/ml

                So if you’re talking “voted for vs voted against” etc it’s weird to imagine the reference class as “halfway between”, which is a non-existent state of the world.

              • Of course, I could easily be missing the point… so correct me if I’m misinterpreting… But here’s an example of the kind of thing that I think is just weird about what I understand the proposal to be:

                Suppose you’ve got a base chemical, and there are two changes of interest, you can either add an additional OH group or not… and you can ionize it to -1 charge with an extra electron or not…

                Now you want to know something about a reaction rate under the different conditions… So you define your reference class as “having 1/2 extra OH group, and 1/2 extra electron” and then compare the various conditions as “adding or subtracting an extra 1/2 OH group, or adding or subtracting an extra 1/2 electron”

                That just seems wacky. But maybe I’m misinterpreting.

              • Carlos Ungil says:

                > Suppose you’ve got a base chemical, and there are two changes of interest, you can either add an additional OH group or not… and you can ionize it to -1 charge with an extra electron or not…

                Ok, say whatever quantity you’re interested in is 1 for the base chemical, 2 when you add an OH group, 3 when you add an electron and 4 when you add both the OH group and the electron.

                The usual approach would be to say that the main effect of adding an OH group is +1, the main effect of adding an electron is +2 and the interaction effect is 0. Do you have a better definition?

                It gets more interesting when there is an interaction effect. Say that the baseline is 1, when you add either one it is 3 and when you add both it is 4. Now the usual stupid definition gives a main effect for both of +1.5 and an interaction effect of -0.5. On average, adding one feature increases the property of interest by 1.5. But if the other feature is there it adds only 1 (1.5-0.5, from 3 to 4) while if it isn’t it adds 2 (1.5+0.5, from 1 to 3).
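The two numeric examples in this comment can be checked with a short sketch. It assumes the averaging convention described in the comment: each main effect is averaged over both levels of the other factor, and the interaction is half the difference between the two conditional effects. The function name `averaged_effects` and its argument names are hypothetical, chosen only for illustration.

```python
def averaged_effects(base, a_only, b_only, both):
    """Return (main effect of A, main effect of B, interaction) from four cell values.

    Main effects are averaged over the other factor; the interaction is half the
    difference between the effect of A with B present and with B absent.
    """
    a_without_b = a_only - base
    a_with_b = both - b_only
    b_without_a = b_only - base
    b_with_a = both - a_only
    main_a = (a_without_b + a_with_b) / 2
    main_b = (b_without_a + b_with_a) / 2
    interaction = (a_with_b - a_without_b) / 2
    return main_a, main_b, interaction


# A = add an OH group, B = add an electron.
additive = averaged_effects(base=1, a_only=2, b_only=3, both=4)
non_additive = averaged_effects(base=1, a_only=3, b_only=3, both=4)

print(additive)      # (1.0, 2.0, 0.0)
print(non_additive)  # (1.5, 1.5, -0.5)
```

In the second case the conditional effects are recovered as main effect plus or minus the interaction: 1.5 + 0.5 = 2 when the other feature is absent and 1.5 - 0.5 = 1 when it is present.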

              • Here’s what I understood the proposal to be..

                Suppose the base chemical has property 0, and if you add an electron it’s still 0…

                But if you add an OH group without an electron it goes to 1, whereas if you add an OH with an electron it goes to 2.

                Now the direct effect of the electron is 0…

                the direct effect of the OH group is 1

                the interaction effect of the electron is that it increases the effect of OH from 1 up to 2, which I would call an interaction effect of 1

                Now, as I understood the proposal we were going to *define* the word “interaction effect of the electron” to mean:

                “the difference between the effect of A at high B and the effect of A at low B, divided by two.”

                under this definition we’d have (2 – 1)/2 = 1/2 as the “interaction effect of the electron”

                I don’t see how that could make sense. So “to find out how much adding an electron caused the OH effect to increase” we should plug in electron = 1 and then ….

                multiply that by 2 arbitrarily because if we include that factor of 1/2 into the definition of the effect, it makes the standard error formula more beautiful.

                This is what I was finding extremely strange…

              • I guess what’s really being proposed is that we should code “absence of an electron” as -1 and “presence of an electron” as +1, and then the coefficient which represents the “effect of the electron” is 1/2 because it represents the change from the electron variable going from 0 to 1 (meaning we go from having “half an electron up to a full electron”), and when determining how big the effect of going from “no electron” to “one electron” we should realize that we’ve changed the variable “electron” from -1 (no electron) to +1 (we have one electron) so the variable changed by 2 units… so when we multiply by 1/2 we get the correct answer “1 unit”


                And my take on it is that when you have a variable that is naturally either “present” or “absent” it makes sense to code these as 1 and 0 because this is consistent with basic boolean algebra and almost everyone’s intuition about how to code “present” vs “absent”

                And when you have a small number of discrete counts of things, we should use 0, 1, 2 to mean 0,1,or 2 of those things… not for example -1, 0, 1 to mean “0, 1, or 2 things”

              • Chris Wilson says:

                Daniel, I have used -0.5/0.5 or -1/1 coding for binary variables in regression because the resulting coefficients, for instance in a 2X2 factorial design, can be interpreted as the ‘effect’ of moving from baseline to treatment (or present to absent, whatever) averaging over both levels of the other variable. Whether or not you use -0.5/0.5 or -1/1 just rescales things, and IMO isn’t worth getting hung up on. With 0/1 coding, your interpretation is conditional on holding the level of the other variable fixed.

              • Carlos Ungil says:

                I’m not sure about the relevance of your discussion about numeric coding schemes and half electrons. You could code it as “E” for electron and “e” for no electron (with similar coding for the presence/absence of an OH group “H” and “h”).

                In your example:

                eh = 0
                Eh = 0
                eH = 1
                EH = 2

                You seem to prefer the definition:

                (if h) effect of E = 0
                (if e) effect of H = 1
                interaction effect = 1

                Another definition is:

                (average) effect of E = 0.5
                (average) effect of H = 1.5
                interaction effect = 0.5

                If we consider these two definitions, arguably the latter makes more sense than the former in the context of Andrew’s example about “a treatment (that) could be more effective for men than for women” as the “main effect” he talks about is more likely to be the average effect than the effect for one of the subgroups.

                To link both examples, imagine sex is male when the OH group is present and female otherwise. The main effect would be 0.5. The effect for one subgroup would be main – interaction = 0, for the other main + interaction = 1.

                One could also define “interaction effect” as the difference between the effect on males (EH – eH = 1) and the effect on females (Eh – eh = 0) and it would be 1 (twice as much as with the previous definition). With that definition, the effect for one subgroup would be main – interaction/2 = 0, for the other main + interaction/2 = 1.
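The definitions compared in this comment line up with different numeric codings of presence and absence in a regression with an interaction term. The sketch below fits y = b0 + bE*xE + bH*xH + bEH*xE*xH exactly to the four cell values eh = 0, Eh = 0, eH = 1, EH = 2 under three codings. The helper names `solve` and `coefficients` are hypothetical, and the exact-fit setup (four cells, four coefficients, no noise) is an assumption made to keep the illustration small.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def coefficients(low, high):
    """Return [b0, bE, bH, bEH] when absence is coded `low` and presence `high`."""
    # Keys are (xE, xH); values are the cell outcomes eh, Eh, eH, EH.
    cells = {(low, low): 0, (high, low): 0, (low, high): 1, (high, high): 2}
    design = [[1, e, h, e * h] for (e, h) in cells]
    return solve(design, list(cells.values()))


zero_one = coefficients(0, 1)                            # 0/1 coding
plus_minus_half = coefficients(Fraction(-1, 2), Fraction(1, 2))  # -1/2, +1/2 coding
plus_minus_one = coefficients(-1, 1)                     # -1, +1 coding

for name, coefs in [("0/1", zero_one), ("-1/2,+1/2", plus_minus_half), ("-1,+1", plus_minus_one)]:
    print(name, [str(c) for c in coefs])
```

With 0/1 coding the coefficients are the conditional effects (0, 1) and the full difference (1). With -1/2, +1/2 coding they are the average effects (1/2, 3/2) and the full difference (1). With -1, +1 coding every coefficient is 1/4, 3/4, 1/4, and doubling them gives the 0.5, 1.5, 0.5 listed in the second definition.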

              • I guess I’d argue that “present / absent” has a natural coding of 1,0 whereas A vs B categories makes sense for a symmetric coding potentially. It’s more important to have a consistent coding that you specify than to claim a “one true coding” specified by mathematical convenience

                The thing that really got me was taking some interpretation in which Andrew does a calculation that is self-consistent, and then calling out a supposedly preferred definition of the units, maintaining the same numerical quantity as you transport the analysis to this new scale, and then claiming that this implies only 4x the sample size instead of 16, as if Andrew’s calculation were wrong.

                This is entirely equivalent to having Andrew say the size of his paper was 1×1 (implicitly measured in inches) but claiming that the one true way to measure distances is cm and 1cm x 1cm is a much smaller piece of paper than was originally claimed.

              • Carlos Ungil says:

                It’s a matter of convention.

                The thing that got me is the claim that one widely used convention is stupid because it’s pretty obvious to you that the applied question of interest is the value (EH-eH) – (Eh-eh). How do people dare to care about average effects and to use them as reference for their definition of interaction term?

                Anyway, the main problem with the “you need 16 times the sample size to estimate an interaction than to estimate a main effect” slogan is the implicit assumption that there is a fixed relationship between main effects and interactions – whatever their definitions.

              • Sorry, it’s hard to express things in comments in between doing other things…and sometimes I get too concrete, or too abstract in my explanations.

                I agree it’s a matter of choice, and I guess ultimately that was my point. Mathematicians sometimes like to assert that due to some symmetry argument in the *purely abstract mathematical* formulation that there’s basically a distinguished choice that everyone should use. What makes this obvious in the mathematical situation is that there is NO OTHER information which is relevant, because it’s pure math.

                If there are two choices, then -1, +1 preserves symmetry around 0, and -1/2, +1/2 preserves symmetry around 0 and makes the difference between the two equal to 1… these are all purely mathematical considerations, that make some sense in that context.

                But if you’re coding “has no electron” as -1/2 and “has an electron” as +1/2 you’re likely to create enormous confusion with say physicists to whom it’s perfectly obvious that the absence of an electron is 0 and the presence of an electron is -1 ;-)

                Or if you’re studying breast cancer, and almost everyone with this problem is female, you code female as 0 and male as 1, it’s the “unusual condition” considered as a perturbation compared to the usual case.

                The symmetry argument holds when there’s balance in the population for example, but if 99.7% of everyone has one particular condition, then making that condition 0 and the unusual condition 1 makes much more practical sense in interpretation.

                I’m pushing back against the abstract mathematical formulation being primal and something everyone should conventionally conform to. I’m particularly calling out insisting on some abstract mathematical formulation that reinterprets another person’s model and *changes the meaning* so that you come out with a different answer, but only because you assumed a different meaning to the assumptions.
