Statistical experiments and science experiments

The other day we had a discussion about a study whose conclusion was that observational studies provide insufficient evidence regarding the effects of school mask mandates on pediatric covid-19 cases.

My reaction was:

For the next pandemic, I guess much will depend on a better understanding of how the disease spreads. One thing it seems that we’ve learned from the covid epidemic is that epidemiological data will take us only so far, and there’s no substitute for experimental data and physical/biological understanding. Not that epi data are useless—for example, the above analysis shows that mask mandates have no massive effects, and counts of cases and deaths seem to show that the vaccines made a real-world difference—but we should not expect aggregate data to always be able to answer some of the urgent questions that can drive policy.

And then I realized there are two things going on.

There are two ideas that often get confused: statistical experiments and science experiments. Let me explain, in the context of the study on the effects of masks.

As noted above, the studies of mask mandates are observational: masks have been required in some places and times and not in others, and in an observational study you compare outcomes in places and times with and without mask mandates, adjusting for pre-treatment variables. That’s basic statistics, and it’s also basic statistics that observational studies are subject to hard-to-quantify bias arising from unmodeled differences between treatment and control units.

In the usual discussion of this sort of problem in statistics or econometrics, the existing observational study would be compared to an “experiment” in which treatments are “exogenous,” assigned by an outside experimenter using some known mechanism, ideally using randomization. And that’s all fine, it’s how we talk in Regression and Other Stories, it’s how everyone in statistics and related sciences talks about the ideal setting for causal inference.

An example of such a statistical experiment would be to randomly assign some school districts to mask mandates and others to a control condition and then compare the outcomes.

What I want to say here is that this sort of statistical “experiment” is not necessarily the sort of science experiment we would want. Even with a clean randomized experiment of mandates, it would be difficult to untangle effects, given the challenges of measurement of outcomes and given the indirect spread of an epidemic.

I’d also want some science experiments measuring direct outcomes, to see what’s going on when people are wearing masks and not wearing masks, measuring the concentrations of particles etc.

This is not to say that the statistical experiment would be useless; it’s part of the story. The statistical, or policy, experiment is giving us a sort of reduced-form estimate, which has the benefit of implicitly averaging over intermediate outcomes and the drawback of possibly not generalizing well to new conditions.
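To make the reduced-form point concrete, here is a toy simulation (every number in it is invented): a mandate raises mask-wearing, masks lower individual infection risk, and the reduced-form "effect of the mandate" implicitly averages over the compliance distribution, so it need not transport to a setting with different compliance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(mandate, compliance_rate):
    # A mandate raises the probability an individual wears a mask
    # (hypothetical mechanism; all parameters are made up).
    wears_mask = rng.random(n) < (compliance_rate if mandate else 0.1)
    # In this toy model masks cut individual infection risk from 5% to 2%.
    p_infect = np.where(wears_mask, 0.02, 0.05)
    return (rng.random(n) < p_infect).mean()

# Reduced-form "effect of the mandate" under two compliance regimes:
for compliance in (0.9, 0.3):
    effect = simulate(False, compliance) - simulate(True, compliance)
    print(f"compliance={compliance}: mandate reduces infection rate by {effect:.4f}")
```

The mechanistic effect of a mask (0.05 vs. 0.02 per wearer) is held fixed, yet the reduced-form estimate differs sharply between the two compliance regimes, which is exactly the generalization worry.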

My point is that when we use the term “experiment” in statistics, we focus on the treatment-assignment mechanism, which is fine as far as it goes, but it only guards against one particular sort of error, and it can be useful to step back and think about “experimentation” in a more general sense.

P.S. Also relevant is this post from a few months ago where we discuss that applied statistics contains many examples of causal inference that are not traditionally put in the “causal inference” category. Examples include dosing in pharmacology, reconstructing climate from tree rings, and item response and ideal-point models in psychometrics: all of these really are causal inference problems in that they involve estimating the effect of some intervention or exposure on some outcome, but in statistics they are traditionally put in the “modeling” column, not the “causal” column. Causal inference is a bigger chunk of statistics than might be assumed based on our usual terminology.

16 thoughts on “Statistical experiments and science experiments”

  1. Andrew – I believe the distinction you make here is related to the generalizability dimension. Statistical experiments provide estimates of coefficients, and from there one should be able to generalize to conditions within and outside the design space. Scientific experiments provide a different type of generalizability, one which relies on scientific knowledge.

    If we had a good grasp of assessing and reporting generalizability of findings, we would know how to combine these.

    Causal estimates carry an important generalizability property. Presumably a validated causal statement is more generalizable than a descriptive analysis. A study reproducing evidence of a causal effect reinforces this generalizability.

    The mask–COVID-19 studies, discussed in an earlier blog post, give us a chance to work through this issue. That post attracted many responses. What can we learn from all this regarding the assessment of findings’ generalizability? I raised some questions there which were left unanswered…

  2. I remember some of the papers I found most interesting were the restaurant outbreak case studies, specifically the one from South Korea where they measured the air currents in the room and had cctv to know how long and where people were exposed.

    I also think we haven’t quite had the ethical debate on challenge trials, especially when we expand beyond just using them to test vaccines, and start infecting and exposing volunteers to study the spread of the disease. I feel observational studies were quite limited in answering questions of presymptomatic/asymptomatic contagiousness, mask efficacy, air flow effects, surface spread, etc. Being able to fully observe and control the conditions from before exposure all the way through recovery seems valuable.

    Also, question for the commenters. Does anyone know of good papers on presymptomatic contagiousness? I remember being underwhelmed by the evidence early in the pandemic. I was a bit disturbed given it was a significant motivator for mask mandates and lockdowns.

  3. Nice post for motivating methodological diversity in fields that still give an outsized weight to statistical experiments.

    When I read the title, a slightly different distinction came to mind, related to different ways we can treat the results from a controlled experiment: in one case (which we could call the statistical experiment) we treat the experiment like a black-box noisy channel, and all we care about is what guarantees we can make if we repeat that process. Sort of like Blackwell’s use of the term experiment. But in the other (scientific experiment) case, we know we can’t escape sources of uncertainty, like how close the sampling process we used comes to taking a random sample from the population we want to generalize to, and so making guarantees about the information that is generated in the black-box sense is not so applicable.

    • Jessica:

      Yes, my above post is related to my concern that, in the standard approaches to causal inference in statistics, the treatment effect is considered to be a black box. This is what I’ve called the “take a pill” model of science. Causal identification, as usually presented in statistics, is important—we talk a lot about it in Regression and Other Stories—but often it’s kind of a distraction from more important questions of what’s actually happening in the world. Statistics as it is usually taught (including how I teach it, unfortunately!) is much more about proving that something works (in a black-box sense) than understanding how it works or how to come up with ideas that will work. There’s an implicit division of labor: science is about understanding the world and statistics is about checking that things really work. I think we can and should do better, and indeed we do in many subfields (pharmacology, item-response modeling, etc.); we just haven’t integrated this attitude into the core of statistics.

        • “statistics is about checking that things really work.”

        I wouldn’t characterize it that way.

        IMO statistics is about *quantifying the degree* to which some process occurs – as you stats geeks say *conditional on* – the method of measurement. For example we might calculate the degree to which junior high school kids smoke dope using survey data, school locker searches, dog smell detection or some other detection method. Each method might give us a different answer. Statistics can’t directly tell us which answer is correct without some other relationship between the calculated answer and the detection method.

        Frequently we can tell a given process does (or does not) occur either from observations or by tabulating the data without recourse to statistical calculation. What we can’t tell is the exact degree to which that process occurs.

        • Chipmunk:

          What I wrote was, “There’s an implicit division of labor: science is about understanding the world and statistics is about checking that things really work. I think we can and should do better…”

          That is, “statistics is about checking that things really work” is how statisticians and practitioners often seem to think. I agree with you that this attitude is mistaken.

  4. You might be interested in Jon Williamson’s talk on evidential pluralism at the latest Phil Stat Wars’ panel. He argues that causal inference is not synonymous with statistical inference, and a causal framework requires statistics (association studies) as well as science (mechanistic studies).

    It’s on https://phil-stat-wars.com/ under Session 4.

  5. The mask mandate example here doesn’t really work, because the real question of interest is not whether mask mandates reduce infection but whether masking does. So one ends up asking and answering the wrong question.

    The implicit assumption behind the question “do mask mandates reduce infection” is that a mandate will lead to compliance. One could study the question “do mask mandates reduce infection when compliance is high vs low”, e.g., by comparing (for example) Japanese and German or US populations (the Japanese tend to follow mandates more, at least anecdotally this seems to be true). But that is a different question again.

    And as people pointed out in the last post, if one really does want to know whether masks reduce infection, one would have to treat “is wearing a mask” as a continuous variable, say normalized between 0 and 1, where 0 means one is definitely not wearing a mask, and 1 means that one is wearing it as intended and in the designated public spaces (and the mask is FFP2). Seems like an impossible experiment to do even in principle. I guess some questions can only be answered by developing understanding of the process through sub-experiments. One could definitely do controlled experiments to study how particles move in space, how exposing people to viruses leads to infections (OK, maybe, not sure about this), and whether masks per se can block the entry of particles into the nasal passages etc. I could imagine setting up a process model like a weather prediction model to understand the process. I bet someone has already done all this.

    The masking issue is a good example where most experimental data looking at people wearing masks or not and infection rates in the aggregate will be basically useless.

    • I think Lindsay Marr et al. have done just the sort of mechanistic modeling and testing on other animals that you suggest. The results are quite clear. Well-fitted N95s filter almost all virus-bearing particles and that reduces the infection rate. From my son’s group at Emory, under ordinary circumstances (e.g. no massive doses doing intubations) the infection rate is linear in exposure and disease severity has little connection to exposure dose.

    • This is simply a more impactful example than most of a problem that infests virtually all social sciences research (including a lot of stuff in epidemiology not usually called “social science”). It’s the reification of self-reported, aggregated, and/or secondary data into whatever our model requires, even when it is obviously something else.

      We ask people things like how many servings a day of certain foods they eat, or we look at things like whether someone lives or works in a place with a mask mandate. Then we plug the resulting data into a model that assumes those things actually describe each individual’s behavior.

      Or we call a bunch of people on the phone, get answers from 1% of them, post-stratify, and derive population proportions of various beliefs. Every field has its own ways of covering its eyes and pretending it doesn’t know how bogus the actual data are. I’m not sure what the source of such magical thinking is among people plenty smart and educated enough to know better; probably the old adage about no one being as blind as the man whose salary depends on not seeing.

    • The mask mandate example here doesn’t really work, because the real question of interest is not whether mask mandates reduce infection but whether masking does. So one ends up asking and answering the wrong question.

      Eh, the real question of interest could be whether a mask mandate reduces infection, if you’re a decision maker weighing whether to mandate masks or not. If you’re that guy, it doesn’t really matter that masks would work if everyone complied properly when people don’t comply in practice; your only action is to mandate or not.

      Whether or not masks reduce infection and transmission is indeed the question of interest to individual potential mask wearers. There’s a kind of recursive paradox here, wherein data on mask mandates is being applied to the question of whether or not to comply with said mandates, even though the compliance rates also affect the data. So if nobody wanted to comply, the effectiveness looks low, and then everyone looks at the data and concludes masks don’t work so they don’t comply.
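      That feedback loop can be sketched as a toy dynamical system (all functional forms and constants here are invented): the measured effectiveness of a mandate scales with compliance, and compliance responds to what the data seem to show.

```python
# Toy model of the feedback loop: measured mandate effectiveness depends
# on compliance, and compliance responds to the measured effectiveness.
# All functional forms and constants are invented for illustration.

def run(trust: float, start: float = 0.1, steps: int = 50) -> float:
    compliance = start
    for _ in range(steps):
        effect = 0.6 * compliance              # measured effect scales with compliance
        compliance = min(1.0, trust * effect)  # people respond to the data
    return compliance

print(run(trust=2.0))  # self-reinforcing regime: compliance climbs to 1.0
print(run(trust=1.0))  # self-defeating regime: compliance collapses toward 0
```

      Depending on how strongly people respond to the data, the same underlying mask effect can produce either near-universal compliance or a collapse in which “the data show masks don’t work.”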

      • somebody –

        > Eh, the real question of interest could be if a mask mandate reduces infection…

        But how do you generalize about whether mask mandates reduce transmission (assuming you don’t know as a fact whether or not masks do or don’t reduce transmission)?

        You can’t – because compliance isn’t some kind of empirical question.

        So I think it’s important to note that absent definitive information about the physics of transmission, studying the impact of mask mandates is inherently limited and prolly useless as Shravan says.

      • Somebody: you wrote:

        “the real question of interest could be if a mask mandate reduces infection if you’re a decision maker questioning if you want to mandate masks or not.”

        So, if you are a decision maker and want to decide whether to institute a mask mandate, you look up the literature and find the study Andrew mentioned, which finds no correlation between mask mandates and infection reduction.

        What would you decide?

        If I were a decision maker, and if I already knew that masks reduce infection, I don’t even need a study on whether mask *mandates* reduce infections, I already know enough to decide whether to institute a mask mandate.

        If the decision-maker uses the null result in the study Andrew mentioned to decide to not institute a mandate, I would suggest firing them for statistical incompetence.

        On the other hand, if the study shows a significant correlation in the expected direction between mask mandates and reducing infection, the decision-maker might make the right decision to institute the mask mandate, but they didn’t really need that significant correlation to decide, if they already know that masks work.

        So, even for the decision-maker, whether mask mandates reduce infection is not going to be a useful guide for decision-making.

        PS I guess the third possible outcome is that the study shows that mask mandates increase infections significantly. I would put that study in the file drawer probably (My priors are just too tight on this one to be influenced by the data), or maybe publish it in PNAS or Nature or somewhere.

        • Well I wouldn’t use the above study because as commenters above have pointed out it doesn’t compare like for like.

          But, suppose you run a midwestern grocery store. You look around and see that at every other store with a mandate, 60% of people are taking their masks off when no one is looking and another 20% are wearing them below the nose or below the chin or whatever. In addition, there are frequent arguments between customers and store staff over mask compliance, and even some protests.

          * If I don’t institute the mandate, I get happy customers, but covid transmits in my store
          * If I institute the mandate, I get protests and fights and turn away customers, and people still get covid in my store at similar rates (especially because the benefits are likely highly nonlinear with masking rates)

          Yeah, I agree I would probably institute a mandate anyway. The theory that masks can reduce disease transmission and severity is strong, and the costs are almost zero, so as a matter of principle I’d support that with my mandate. But the cost to decision makers of instituting a mandate is not zero, so I can see why one would want to weigh that against the practical efficacy.

        • somebody –

          > * If I don’t institute the mandate, I get happy customers, but covid transmits in my store
          * If I institute the mandate, I get protests and fights…

          Sure, if you construct a scenario from the outcomes that support your conclusion and ignore any that wouldn’t, you can use it to support your conclusion.

          No mention of the potential benefit from staff and customers not getting ill. No mention of the benefit to customers and staff who appreciate working and shopping in a less risky environment.

        • Hi somebody,

          Even in your example, the question isn’t whether mask mandates reduce infection.

          If the shop owner wants to maximize income, the question they want to ask is: what will maximize my income, instituting a mask mandate or not instituting one? Here, some probabilities and their costs need to be estimated.

          – If you let poor customers possibly die due to corona, you won’t lose much income.
          – If you let rich customers possibly die, you could lose a lot of income.

          So I would need to know the customers’ income distribution, frequency of purchase, money spent in each visit, etc.

          I probably want to also factor in the potential impact of customer lawsuits (“you didn’t warn me that if I don’t wear a mask I could get corona, pay me 10 million dollars now”).

          If the shop owner wants to maximize something else, the calculus will be different, but it still never involves the question: do mask mandates reduce infection?
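          The calculus described above can be written as a tiny expected-profit comparison (every probability and dollar figure here is invented for illustration; the point is the shape of the decision, not the numbers):

```python
# Toy expected-profit comparison for the shop owner's decision.
# All probabilities and dollar amounts are hypothetical.

def expected_profit(mandate: bool) -> float:
    base_revenue = 100_000.0
    # A mandate drives away some customers and creates friction costs
    # (arguments, protests, staff time spent on enforcement).
    revenue = base_revenue * (0.95 if mandate else 1.0)
    friction_cost = 2_000.0 if mandate else 0.0
    # Risk of an in-store outbreak and its expected cost
    # (lost staff time, lawsuits, reputation).
    p_outbreak = 0.02 if mandate else 0.05
    outbreak_cost = 50_000.0
    return revenue - friction_cost - p_outbreak * outbreak_cost

for mandate in (False, True):
    print(mandate, expected_profit(mandate))
```

          Under these made-up numbers the decision turns on the whole cost structure: customer loss, friction, and outbreak exposure all enter, and shifting any one of them can flip the answer.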
