Discussion of uncertainties in the coronavirus mask study leads us to think about some issues . . .

1. Communicating uncertainty

A member of the C19 Discussion List, which is a group of frontline doctors fighting Covid-19, asked me what I thought of this opinion article, “Covid-19: controversial trial may actually show that masks protect the wearer,” published last month by James Brophy in the British Medical Journal.

Brophy writes:

Paradoxically, the publication last week of the first randomized trial evaluating masks during the current covid-19 pandemic and a meta-analysis of older trials seems to have heightened rather than reduced the uncertainty regarding their effectiveness. . . .

The DANMASK-19 trial was performed in Denmark between April and May 2020, a period when public health measures were in effect, but community mask wearing was uncommon and not officially recommended. All participants were encouraged to follow social distancing measures. Those in the intervention arm were additionally encouraged to wear a mask when in public and were provided with a supply of 50 surgical masks and instructions for proper use. Crucially, the outcome measure was rates of infection among those encouraged to wear masks and not in the community as a whole, so the study could not evaluate the most likely benefit of masks, that of preventing spread to other people. The study was designed to find a 50% reduction in infection rates among mask wearers.

Here’s what happened in the study:

Among the 4862 participants who completed the trial, infection with SARS-CoV-2 occurred in 42 of 2392 (1.8%) in the intervention arm and 53 of 2470 (2.1%) in the control group. The between-group difference was −0.3% point (95% CI, −1.2 to 0.4%; P = 0.38) (odds ratio, 0.82 [CI, 0.54 to 1.23]; P = 0.33).

And here’s how it got summarized:

This led to the published conclusion: “The recommendation to wear surgical masks to supplement other public health measures did not reduce the SARS-CoV-2 infection rate among wearers by more than 50% in a community with modest infection rates, some degree of social distancing, and uncommon general mask use. The data were compatible with lesser degrees of self-protection.”

As Brophy writes, this is an unusual way to summarize a study that had a non-statistically-significant result (in this case, an estimated reduction of 20% in infection rates with a standard error of 20%). Usually such a result would be summarized in a sloppy way as reflecting “no effect,” or summarized more carefully as being “consistent with no effect.” But it is a mistake to report a non-statistically-significant result as representing a “dubious treatment” or no effect. And that’s what Brophy is struggling with.
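To see where that “20% reduction with a standard error of about 20%” reading comes from, here is a minimal sketch in Python (using numpy and scipy) that reproduces the published odds ratio and interval from the raw counts; the paper’s exact interval method may differ slightly.

```python
import numpy as np
from scipy import stats

# DANMASK-19 counts as reported: 42/2392 infected in the mask arm, 53/2470 in control
mask_cases, mask_n = 42, 2392
ctrl_cases, ctrl_n = 53, 2470

# Odds ratio with a Wald interval on the log-odds scale
or_hat = (mask_cases / (mask_n - mask_cases)) / (ctrl_cases / (ctrl_n - ctrl_cases))
se_log_or = np.sqrt(1 / mask_cases + 1 / (mask_n - mask_cases)
                    + 1 / ctrl_cases + 1 / (ctrl_n - ctrl_cases))
lo, hi = np.exp(np.log(or_hat) + np.array([-1, 1]) * stats.norm.ppf(0.975) * se_log_or)

print(f"odds ratio {or_hat:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
# roughly 0.82 (0.54 to 1.23): an estimated reduction of about 20% with a
# standard error of about 20% on the log scale, consistent both with a
# substantial protective effect and with no effect at all.
```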

Brophy explains:

Incorrect interpretation of “negative” trials abounds, even among thoughtful academics, regardless of their personal beliefs about the studied intervention. . . . while the editorial accompanying the trial concludes that masks may work, it does so while also implying that the trial itself was negative, stating “. . . despite the reported results of this study, (masks) probably protect the wearer.”

I agree with Brophy’s criticism here. There should be no “despite” in that sentence, as the results of that study do not at all contradict the hypothesis that masks protect the wearer.

Brophy continues:

The results of DANMASK-19 do not argue against the benefit of masks to those wearing them but actually support their protective effect.

I guess I agree here too, but if you’re gonna say this, I think you should emphasize that the data are also consistent with no effect. Otherwise you risk misleading people in the other direction.

But ultimately there are no easy answers here. It’s similar to the struggles we have had when communicating probabilistic election forecasts. There’s also some interesting discussion in the comments section of Brophy’s article.

2. Experimental design and experimental reality

But there’s something else about this experiment that I hadn’t noticed at first which I think is also relevant to our discussion.

Recall the data summary:

Among the 4862 participants who completed the trial, infection with SARS-CoV-2 occurred in 42 of 2392 (1.8%) in the intervention arm and 53 of 2470 (2.1%) in the control group. The between-group difference was −0.3% point (95% CI, −1.2 to 0.4%; P = 0.38) (odds ratio, 0.82 [CI, 0.54 to 1.23]; P = 0.33).

A rate of 2% . . . that’s pretty low. The study began in April, 2020, a time when there was a lot of well-justified panic about coronavirus. Before all the lockdowns and social distancing, we were concerned about rapid spread of the disease.

I have two points here.

First, studies like this are usually designed to be large enough to have a good chance of yielding statistically significant results. I don’t know the details of the design of this study, but if they were anticipating, say, a 10% rate of infection rather than 2%, then this would correspond to a much larger number of cases and much more precision about the relative effect size.
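As a rough illustration of that precision point, here is a back-of-the-envelope sketch with assumed numbers (about 2400 participants per arm, a hypothetical 20% relative reduction, and expected cell counts plugged into the usual log-odds-ratio variance formula); this is not the trial’s actual power calculation.

```python
import numpy as np

def se_log_or(control_rate, rel_reduction, n_per_arm):
    """Approximate SE of the log odds ratio from expected cell counts."""
    p0 = control_rate
    p1 = control_rate * (1 - rel_reduction)
    a, b = p1 * n_per_arm, (1 - p1) * n_per_arm   # mask arm: cases, non-cases
    c, d = p0 * n_per_arm, (1 - p0) * n_per_arm   # control arm: cases, non-cases
    return np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

n = 2400                       # roughly the per-arm size of the trial
for rate in (0.02, 0.10):      # the observed ~2% rate vs a hypothetical 10% rate
    print(f"control rate {rate:.0%}: SE(log odds ratio) ~ {se_log_or(rate, 0.20, n):.2f}")
# about 0.22 at a 2% control rate vs about 0.10 at 10%: the same trial in a
# higher-incidence setting would pin down the relative effect far more tightly.
```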

Second, as Brophy notes, the main motivation for general mask use is not to protect the mask-wearer from infection but rather to protect others from being infected by the mask-wearer. In either case, you’d expect the overall effectiveness of masks to be higher in settings where there is more infection. Just as with “R0” and the “infection fatality rate” and other sorts of numbers that we’re hearing about, “the effectiveness of masks” is not a constant—it’s not a thing in itself—it depends on context. You’d expect masks to be much more effective if you’re in a busy city with lots of social interactions and regularly encountering infected people than if you’re a ghost, living in a ghost town.

These two points get lost in the usual way that this sort of study gets reported. The first point is missed because there is an unfortunate tendency not to think about the design once the data have been collected. The second point is missed because we’re trained to think about treatment effects and not their variation.

Comments

  1. My prior on mask-wearing is that it has to protect the wearer. At one end of the spectrum, we are protected simply by having hair in our noses. At the other end, we are highly unlikely to be infected through an N95 mask. In the middle, any barrier that creates a more tortuous path for the virus can capture larger infected particles. Is it really any more complicated than that?

    I have been searching for the evidence behind the meme that “general mask use is not to protect the mask-wearer from infection but rather to protect others from being infected by the mask-wearer” without any luck. If anyone has seen any studies that support that idea – rather than just restating it – I would love to read them!

    • > to protect others from being infected by the mask-wearer

      Just like you don’t really need an RCT to know why a mask would help the person wearing it, I don’t see why you’d need a similar study to know that a mask would act to protect those around. If you sneeze, cough, spit, etc., the mask will make it harder for those particles to get from your face to other folks around you.

      Whether mask wearing is *more* effective for yourself or people around you, that’s a hard question to answer in large part because of what Andrew points out above, that “effectiveness” is itself context dependent.

      And of course there are other important questions, like do particles carry the disease, how much would a mask stop particles of different sizes and from what direction, etc.

      But the physical principles behind the mask itself make clear why it would help protect both yourself and others, conditional on the answers to those questions.

      • It’s not hard to make an educated guess as to whether they are more effective on the way out or on the way in. The droplets on the way out tend to be big and then shrink very quickly before being inhaled as aerosols. Even good masks don’t work so well for ~0.3 micron particles. So if the mask fits well, it will work better for exhaling than for inhaling.
        For masks with mediocre fits, an opposite effect has been shown (I can’t locate the reference) due again to a simple physical effect. When you exhale you blow the mask away from your face, which can create leaks. Inhaling tends to seal it up. A good N95 stays sealed even on exhaling. A crappy mask leaks either way. But a so-so mask leaks more on exhaling.

        • I’ve read other factors as well. The first is that a mask, even if it doesn’t seal tightly, alters the flow of air, with the result that even the particles that pass through cleanly on the way out travel less far, which in the end will have an exponential effect: Maybe one person 8 feet away from an infected person doesn’t get infected where he might have otherwise, had the particles traveled farther, and maybe that person doesn’t infect the two people he might have infected otherwise, and so on.

          The second is that wearing a mask increases the humidity level behind the mask, which makes the droplets and the aerosolized particles (of course those terms are fairly arbitrary) heavier, so they travel less far for that reason as well.

          I don’t have the scientific knowledge to evaluate those claims but they seem logical to me at face value and they were promoted by people who had the expertise to weigh in with some credibility (unfortunately I don’t remember the references). I’d appreciate being given any evidence those concepts are wrong.

    • Matt Skaggs: Many of the studies on masks are based on examining dispersion (source control, or protecting others). The data I’m aware of that are most supportive of masks are observational and more consistent with source control.

      IMO — part of the problem is that proper use of masks for personal protection is more difficult and time consuming or expensive. And it’s easy to make mistakes (touching the front of the mask then your face or your food).

      Here’s an old letter (April 2020!) discussing it … I’m pretty sure I’ve seen other meta-analyses, but this does discuss why source control is especially appealing as a target. (It’s also obvious from first principles.)

      https://erj.ersjournals.com/content/early/2020/04/27/13993003.01260-2020

  2. Taleb argues that the study is too underpowered given the possibility of a non-negligible false positive rate, and that the study does provide significant evidence that masks protect the wearer based on PCR test results (PCR tests apparently have high specificity i.e. low false positive rate): https://fooledbyrandomnessdotcom.wordpress.com/2020/11/25/hypothesis-testing-in-the-presence-of-false-positives-the-flaws-in-the-danish-mask-study/

    Some argument about his methodology for the latter claim in the comments on that post (one of his calculations seems to disregard uncertainty in the estimate of the population probability that a non mask wearer would get a positive PCR test result), but the direction of his conclusions seems plausible to me.
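    A toy illustration of the direction of the false-positive point, with made-up rates (none of these numbers come from the trial or from Taleb’s post): if a test flags a constant fraction of uninfected people in both arms, the observed ratio is pulled toward 1, so the same trial has less power to detect a real effect.

    ```python
    # Hypothetical true infection rates and false-positive rates, chosen only to
    # show the direction of the effect; none of these numbers come from the study.
    true_mask, true_ctrl = 0.010, 0.015
    for fpr in (0.0, 0.005, 0.010):
        obs_mask = true_mask + (1 - true_mask) * fpr   # true infections plus false positives
        obs_ctrl = true_ctrl + (1 - true_ctrl) * fpr
        print(f"false-positive rate {fpr:.1%}: true ratio {true_mask / true_ctrl:.2f}, "
              f"observed ratio {obs_mask / obs_ctrl:.2f}")
    # 0.67 -> 0.75 -> 0.80: the higher the false-positive rate, the closer the
    # observed ratio sits to 1, which is why low-specificity antibody results
    # dilute whatever signal is there while high-specificity PCR results do not.
    ```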

    • I want to like Taleb, but he seems to be a little fast-and-loose with his calculations, and they always seem to favor his viewpoint. For example, he insists on using a one-tailed test to calculate p, which definitely presupposes that masking is beneficial. Also, he possibly double-counts the infections by adding the PCR+ and the hospital-diagnosed covid cases (the 5 vs 15 values he uses)—the study states those are and/or, so they are not mutually exclusive.

      Honestly, I’m not smart enough to know if he is right or if he’s just extremely confident.

      But I do agree with him that the antibody testing is insufficient to draw any conclusions.

  3. “I have been searching for the evidence behind the meme that “general mask use is not to protect the mask-wearer from infection but rather to protect others from being infected by the mask-wearer” without any luck. If anyone has seen any studies that support that idea – rather than just restating it – I would love to read them!”

    This appears to be driven by an oft-quoted study from years ago regarding the effectiveness of N95, surgical, and cloth masks in protecting wearers from respiratory diseases (specifically influenza) in a hospital setting.

    Notably, patients weren’t masked. It showed that cloth masks were essentially useless, surgical masks somewhat protective, and N95s very protective to the wearer. It’s pretty much the only real study that was out there.

    There are some good reasons to not extrapolate this study to the general public and covid 19. HCWs working full shifts in hospitals during flu season when some of the patients are there due to being severely ill with the disease isn’t a great analogue for a person going to the grocery store a couple of times a week for maybe 30-45 minutes shopping and other sources of infrequent and relatively short potential exposure. And asymptomatic infection is much less of a problem with the flu because symptoms present themselves much more rapidly than with covid 19 (generally no more than 24 hours rather than 3-5 days), so the chance of someone exposing you when they’re not aware they’re ill is much less.

    I think (but am not certain) that this study had a lot to do with the reluctance to suggest that cloth masks would protect people from covid-19. When there was greater awareness of the role of asymptomatic transmission, combined with the fact that cloth masks can reduce exposure from people shedding the virus in small droplets during exhalation, we started seeing recommendations that masks be worn to cut down on asymptomatic and presymptomatic transmission. OK, so maybe the wearer isn’t protected by a cloth mask (if the study mentioned above is considered relevant), but it will greatly reduce the odds of someone who is infectious but unaware of it transmitting the virus to others. So we’ll emphasize that.

    Also remember that the purpose of surgical masks, in particular, is to protect the patient being cut open from the surgeon and staff, not the other way around. Studies of these in surgical situations don’t look into whether or not they protect the wearer.

    Just some thoughts that may or may not be useful.

    • 2% isn’t “pretty low” considering the overall infection rate. In April and May, Denmark registered 9000 cases of Covid-19 infection (according to Wikipedia), and Denmark’s population is just under 6 million. That means the average infection rate was 0.15%.
      The problem with setting up such a study is obviously anticipating the likely infection rate in your study group.

    • “HCWs working full shifts in hospitals during flu season…isn’t a great analogue for a person going to the grocery store…”

      This is true but even without the mask you’re unlikely to experience significant exposure at a retail business unless you’re crowding into the bathroom with ten or fifteen other shoppers.

      This is what complicates the low-grade mask question: they work well in places where they are almost totally unnecessary (large well-ventilated buildings and outdoors) but not very well in places where they would actually be useful (places with less ventilation where people spend more time together in smaller areas and/or are breathing hard).

      • jim –

        >…they work well in places where they are almost totally unnecessary (large well-ventilated buildings and outdoors) but not very well in places where they would actually be useful (places with less ventilation where people spend more time together in smaller areas and/or are breathing hard).

        “Not very well” seems not very useful, to me. Also, “actually useful” seems a bit dubious.

        If they help reduce the risk of transmission, even if small overall but meaningful relatively, it can have an exponential effect that could be quite significant. If it helps to reduce the risk of infection, even if small, I’ll take it particularly given the exponential-ness of it.

        This isn’t a matter of absolute risk. There is no way to effectively eliminate risk absolutely, unless I start a colony on the moon (by myself, after disinfecting the entire spaceship and all the supplies – and even there I might not get everything right)

        It’s all about making judgements of relative risk. Lemme ask you, if you could be invisible and there were no external social pressure, and no one would know one way or the other, knowing what you know, would you choose to go to the supermarket with no mask, thinking that limiting your potential risk to others and their potential risk to you just ain’t worth the minor inconvenience?

        In fact, seems to me that masks work no more or less well depending on the location. It’s just that some locations are inherently more risky than others. And so the same degree of drop in risk looks different, relatively, in different situations.

        • “it can have an exponential effect”

          No, it can’t. When you stomp your feet in Kansas, are there earthquakes in the Hindu Kush? No. Small magnitude effects shrink and die. And that’s exactly what you see in the Danish study: an effect so small it couldn’t even be measured.

        • Jim –

          The marginal return from one person in a crowded supermarket not getting infected could mean that person not infecting grandma and everyone else at his house the next day for the Christmas party and then all the people attending that party at all the other parties that they all attend on New Year’s eve.

        • jim, I think you may have been interpreting Joshua’s “exponential effect” claim as if he were using the term ‘exponential’ incorrectly, which is the way most people use it. Here I think he meant it correctly, i.e. literally. If, without mask-wearing, each person infects r0 additional people on average, but with mask wearing each person infects (r0-delta) additional people, you have literally changed the exponent in the exponential growth of the virus. Perhaps you can even take it from some value greater than 1 to some value less than 1, in which case you get exponential decay instead of exponential growth.

          This is part of the theme of individual vs societal risk, of which there has been much general discussion recently. Suppose I think I have a 1% chance of getting infected if I live in such-and-such a way for the next two weeks and do not wear a mask, but a 0.8% chance if I live the same way and do wear a mask. From a personal standpoint I might consider this to be such a small difference in risk that I’m pretty indifferent. But a reduction of this magnitude could be huge at the societal level; the difference between a manageable rate of spread, and one that leads to overwhelmed hospitals and patients dying due to inadequate care.
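          A minimal sketch of that “changing the exponent” point, with made-up numbers (a toy branching model, not an epidemiological forecast):

          ```python
          # Toy branching model: made-up reproduction numbers, chosen only to show how
          # a modest per-person reduction changes the exponent of the growth curve.
          def cases_after(generations, r, seed_cases=100):
              return seed_cases * r ** generations

          r0 = 1.1                      # hypothetical reproduction number without masks
          for r in (r0, 0.8 * r0):      # a 20% per-person reduction with masks
              print(f"R = {r:.2f}: ~{cases_after(20, r):.0f} cases after 20 generations")
          # R = 1.10 gives ~673 cases from 100 seeds; R = 0.88 gives ~8. The same 20%
          # reduction that looks minor to an individual flips growth into decay.
          ```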

      • “This is what complicates the low-grade mask question: they work well in places where they are almost totally unnecessary (large well-ventilated buildings and outdoors) but not very well in places where they would actually be useful (places with less ventilation where people spend more time together in smaller areas and/or are breathing hard).”

        Would this be what’s known as an argument from assertion?

        • “an argument from assertion?”

          If you call reasoning from evidence and knowledge an argument from assertion, well, yes, you’re quite right.

          Here’s Michael Osterholm (Biden task force member) on masks:

          “Don’t, however, use the wearing of cloth face coverings as an excuse to decrease other crucial, likely more effective, protective steps, like physical distancing”

          “don’t use poorly conducted studies to support a contention that wearing cloth face coverings will drive the pandemic into the ground. But even if they reduce infection risk somewhat, wearing them can be important.”

          https://www.cidrap.umn.edu/news-perspective/2020/07/commentary-my-views-cloth-face-coverings-public-preventing-covid-19

          so many PhDs, so little ability.

        • jim,
          I suspect I am not alone in being unclear on what you are saying. If you’re saying that crowded indoor places are more dangerous than un-crowded outdoor spaces then sure, nobody is disagreeing with that.

          If you’re saying that masks trap particles, and thus reduce risk, outdoors but not indoors, that seems like a crazy statement. I doubt very much that that is what you’re saying. But then what _are_ you saying?

  4. Similarly, (as mentioned already) we think that nose hairs offer some protection from things like allergens, smoke, and maybe even small infectious particles by trapping them in mucus to be expelled by a sneeze or blow (or saline rinse). Has that ever been confirmed in a study with statistically significant results? I would imagine that it is a fairly small effect that would be difficult to detect, but if we forcefully removed people’s nose hairs for many many years and tracked their health, we would probably find a slight increase in diseases or allergy symptoms in them. Then I found this: https://pubmed.ncbi.nlm.nih.gov/21447962/

    In order to really detect and quantify the effect of mask wearing (on number and severity of infections), it would take a very difficult and large experiment.

    The mask thing is so funny because once you have any reasonable level of understanding of stochastic processes and the relevant biology, it is obvious that they offer some protection no matter who wears them and the most protection when everyone wears them.

  5. The simple truth, which this study completely verifies, is that while there is an obvious *logic* to mask-wearing, there is (a) no evidence that mask wearing does a lot of good; and (b) some evidence that it does *some* good. The case for mask wearing is that the costs are really low, so really low benefits are fine. Even a marginally negative benefit-cost test isn’t that bad… there are a lot more important things to worry about.

    The same logic, by the way, extends to hand washing and elbow-bumping. These aren’t (we think) significant transmission vectors, but the cost of washing your hands just isn’t very high and the cost of bumping elbows rather than shaking hands is, I’m pretty sure, 0.

    The problem here is all one of public relations, not science and not decision theory.

    • On a similar note, the high degree of uncertainty in these results should also induce a behavioral response. Willingness to pay to reduce uncertainty is increasing in uncertainty for any risk-averse individual. A rational person would be more likely to wear a mask as a result of the uncertainty alone.

    • “The simple truth, which this study completely verifies, is that while there is an obvious *logic* to mask-wearing, there is (a) no evidence that mask wearing does a lot of good; and (b) some evidence that it does *some* good.”

      The study doesn’t even address what is thought to be the major benefit of mask wearing – lowering source transmission. So it’s not clear how the phrase “completely verifies” anything applies, given that it doesn’t even address the whole issue.

      And there are plenty of other problems with the study, which the paper itself cheerfully summarizes.

      • I agree that the study doesn’t address the protection masks give others. But of course if that were *really* the main effect then all sorts of things we see in public addresses would be completely stupid. Why do news reporters wear them giving reports outside? Why should anyone wear them outside very much at all? Why should people leaving their house once a week ever wear them? Note that I’m not saying it doesn’t work both ways… I think it does… somewhat, and I agree the Danish study completely misses it. But note that even in a period of general mask wearing, a second wave has created worldwide hospitalization rates nearly as high as in the first, maskless wave. If masks protect so well, then how did that happen?

  6. The heavily qualified and “scholastic” manner of thinking: reduce *every* matter to a binary choice; fall into the nonsense of applying null-hypothesis significance tests in lieu of exposing oneself by taking a position. Corrupt “evidence” into a preposterously narrow category, indifferent to experience and findings of any sort, other than the crabbed, scholastic setup of the RCT. And guess what? The reported results merely increase confusion and cynicism and reduce the authority of whomsoever labors in such a grotesque fashion over a flea or a penny’s worth of “knowledge”. For by exerting such a phenomenal strain only to arrive, in the end, at a position of exaggerated diffidence precisely where leadership was most needed (and making their weak and neutered claims in a preposterous tone of near papal infallibility), the whole cast and color of the “authoritative” voice comes out as purely farcical! Saturday Night Live could not have done a better job in making this sort of “science” into a self-satirizing farce!

    It is classic! Worthy of Swift!

    In other words, while they hedge on any statement bearing on the concrete problem at hand (e.g. the utility of masks) they speak with overweening authority to make precise exactly how uncertain they are.

    They are tied in knots, forgetting Aristotle’s good advice, that we “look for precision in each class of things just so far as the nature of the subject admits”.

  7. Cochrane systematic review from November 2020:
    “Key message: We are uncertain whether wearing masks or N95/P2 respirators helps to slow the spread of respiratory viruses.
    Our confidence in these results is generally low for the subjective outcomes related to respiratory illness, but moderate for the more precisely defined laboratory-confirmed respiratory virus infection, related to masks and N95/P2 respirators. The results might change when further evidence becomes available.”

  8. Without having read the actual study, it seems apparent that the reported effect is the intent-to-treat effect, and NOT the local average treatment effect of mask wearing. Imperfect compliance and the behavioral effects of mask wearing are surely biasing the estimates of mask effectiveness, right?

    For sure, the ITT has policy relevance (despite the obvious external validity concerns). But nothing I’m seeing here is capable of shifting my priors on the ceteris paribus effect of masks on infection risk, unless I’m missing something.

    • I glanced through the original study, and this jumped out at me:

      “Based on the lowest adherence reported in the mask group during follow-up, 46% of participants wore the mask as recommended”

      If I’m understanding this correctly, the non-compliance rate was over 50%! This means that the reported estimates of mask effectiveness are *severely* underestimated! To wit, even if none of the people in the control group wore a mask properly, the effect of wearing a mask among those who wore one *because* of their random assignment to the treatment group (the “LATE”) is at least *twice* as large as the reported effect (−0.3%). Even a modest rate of mask wearing by the control group (these numbers aren’t reported) could potentially yield a very large and very significant effect!
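      A back-of-the-envelope version of that adjustment (the standard Wald/IV scaling, using the −0.3-point ITT difference and the 46% adherence figure quoted above; mask wearing in the control arm isn’t reported, so it is set to zero here purely for illustration):

      ```python
      # Wald / instrumental-variable scaling of the intention-to-treat effect.
      # Control-arm mask wearing is unreported, so it is assumed to be zero here;
      # any mask use in the control group would make the implied effect larger still.
      itt_risk_difference = -0.003       # -0.3 percentage points, as reported
      adherence_treatment = 0.46         # "wore the mask as recommended"
      adherence_control = 0.0            # assumption for this sketch
      late = itt_risk_difference / (adherence_treatment - adherence_control)
      print(f"implied risk difference among compliers: {late:.2%}")
      # about -0.65 percentage points, i.e. roughly twice the reported ITT effect.
      ```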

      In terms of the effectiveness of mask wearing in preventing Covid, the reported results can at best be thought of as an extreme lower bound on the true effectiveness.

      The conclusions that people are drawing from this paper seem to be completely out of line with what the reported point estimates actually suggest. What am I missing?

      • What you’re missing, MJ, is that they don’t claim to evaluate the effectiveness of mask-wearing; they claim to evaluate the effectiveness of _recommending_ mask-wearing. It’s right there in the conclusion: “The recommendation to wear surgical masks to supplement other public health measures did not reduce the SARS-CoV-2 infection rate among wearers by more than 50% in a community with modest infection rates, some degree of social distancing, and uncommon general mask use. The data were compatible with lesser degrees of self-protection.”

        There you go: the _recommendation_ to wear surgical masks…

        • And this is quite possibly a more useful thing to measure when you are talking about *policy* recommendations, because governments can mandate or recommend mask use, but they can’t (practically) check whether everyone is wearing them correctly.

          I think there is also a problem of conflating “if I wear a mask (properly) does it protect me, or protect me from infecting others, at a useful level?” [individual-level] vs. “how does mandating/recommending mask use affect the course of the outbreak?” [population-level]

          The first question can assume correct use of the mask (since it implies someone who takes it seriously) but the second cannot.

  9. The pomposity of it would be hilarious if it were not so destructive, of confidence, of morale: “We know next to nothing about X; but we shall quantify the uncertainty that accompanies our alleged next-to-nothing knowledge, so very precisely, with such care; so that you can be confident, dear reader, that what we are “doing” is _science_!”

    • rm bloom: I’ve said this before, but I think it bears repeating as it may be a counterpoint or counterpart to your point:

      It seems that telling people to embrace uncertainty is like telling cats not to embrace birds. Even those who pay lip service to that message will then indulge in certainty about uncertainty (2nd-order quantifauxcation*), e.g., as when they persist in calling CIs “confidence intervals” or “uncertainty intervals”, or when they insist on treating hopelessly muddied or ambiguous results as demonstrating qualitative differences in interventions or lack thereof.

      *See Stark “pay no attention to the model behind the curtain” at https://www.stat.berkeley.edu/~stark/Preprints/eucCurtain15.pdf
      – one of the few essays I encourage most everyone concerned with stat and modeling abuse to read and circulate widely. You can read most any section group independently of the rest; for motivation you might start with the group on pages 8-12, which is about the points that (with 20/20 hindsight) I most wish I had covered prominently in my teaching and writings.

      • Sander,

        What I like to say is not “embrace uncertainty” but “embrace variation and accept uncertainty.” With rare exceptions (for example, watching the plot of a story unfold), uncertainty is upsetting. But we need to accept it nonetheless. Variation, though, is pleasant, so that I think we can effortlessly embrace, once we realize it’s ok to do so.

      • Thank you for the excellent article! I cannot help myself: I always respond, as though on cue, to the description of subjective probability, such as the one which appears at the top of the paper, “The subjective theory is about what I think. It changes the subject from geometry or physics to psychology”. I ask, “Isn’t ‘what I think’ (about the coin) linked to what I have seen before (of coins and the outcomes of flips)?” I gin up in my head the best reference class I possibly can right there and then. It’s not ‘what I think’, it’s ‘what I think I’ve *seen* before’. Bayesianism is frequentism in disguise.

        • Sander, Rm:

          See Beyond subjective and objective in statistics (with discussion and rejoinder).

          Beyond this, I have problems with Stark’s article in that he offers a caricature of Bayesian statistics, and he also says, “there are practical issues in eliciting prior distributions, even in one-dimensional problems. In fact, priors are almost never elicited—rather, priors are chosen for mathematical convenience or habit.” This is warmed-over mush from the 1970s and entirely ignores the fact that just about all statistical procedures, Bayesian or otherwise, are chosen for “mathematical convenience or habit.” I’d add “computational convenience” too.

          Again, Stark writes “subjective (aka Bayesian),” which is just wrong.

          Look. If you don’t want to use Bayesian methods, fine. No need to talk about them at all. But don’t bullshit about them. If you don’t want to do the research and find out what modern Bayesian methods are like, then just recognize your ignorance!

          And I just loooove that he thinks that “Sir Francis Bacon’s triumph over Aristotle” should have put Bayesian inference to rest. Tell that to the psychologists, political scientists, economists, physicists, etc., who use Bayesian methods every day. Sorry, dude, can’t fit your pharmacokinetic model: Francis Bacon won’t let you do it.

        • Bacon: “…the human mind is more affected by affirmatives and actives than by negatives and privatives; whereas by right it should be indifferently disposed toward both….and therefore it was well answered by one who when the painting was shown to him hanging in the temple of such as had paid their vows upon escape from shipwreck, and was pressed to say whether he did not now acknowledge the power [of the gods], “Yes,” asked he in return, “but where are they painted that were drowned after paying their vows?”

          Aristotle: “look for precision in each class of things just so far as the nature of the subject admits”.

        • Rm:

          Francis Bacon is just fine. (I’m American so I’ll call him Francis Bacon, not Sir Francis Bacon or Lord Bacon or whatever.) My problem is using that as a justification to not do Bayesian data analysis. That’s just dumb. There are some problems where you can get by without using prior information, and there are some problems where prior information makes a big difference. Bullshit about subjectivity has nothing to do with this. Logistic regressions, differential equation models, splines, and all the other methods we use are subjective too but they can still be useful.

        • Thanks Andrew for the counterpoint. I think however you have missed the main points of Stark’s article. Its shortcomings in describing Bayesian approaches do not even slightly invalidate its criticisms of current modeling practices in our research settings. Did you read the pages 8-12 I recommended? The attack there is on the assumption of a data-probability model that cannot be justified from the physical reality, the same problem I attack in https://arxiv.org/abs/2011.02677. Those problems apply to both frequentist and Bayesian analyses.

          Excusing priors with “all other methods are subjective too” dodges our responsibility to provide a justification for each model and method we use based on ‘objective’ (empirically documented) or at least mutually agreed assertions, e.g., that trials in Denmark may be good for imputing encouragement effects in Sweden but not so good for imputing those effects in Alabama. And it dodges the fact that quantification of such qualitative statements (as done by a prior for effects) will inevitably introduce spurious information via the model form and (often unseen) hyperparameter settings. The resulting spurious precision may be unavoidable – software needs formulas, not vague qualitative assertions – but that’s no excuse for ignoring the problem. It arises in our sampling models too, but at least one can audit the study design and conduct to gauge the problem; elicited priors are at much higher risk of injecting delusional expert overconfidence into the mix.

          So I emphasize to you: In our reality, “embracing variation” becomes another slogan as empty as “embrace uncertainty” if it only leads to overly precise claims and formulas for variation or uncertainty (including overly precise statements of uncertainty about variation). Failure to constructively engage Stark-Freedman type criticisms thus aggravates the problem you have rightly labeled ‘uncertainty laundering’. So do deceptive sales labels like “uncertainty intervals” for intervals that don’t begin to capture warranted uncertainty. And so do so-called “confidence intervals” whose alleged coverage percent hinges on nonexistent randomization, and “posterior intervals” whose alleged coherency hinges on nonexistent exchangeability.

        • I agree with Sander in nearly all of his argument here. I don’t have anywhere near the depth of knowledge that you all have. I don’t bill myself as an expert. The questions I have raised over the last 20 years in other fields tally with the concerns that Sander Greenland addresses in his writings.

        • Sander:

          Here’s an example where we used Bayesian methods to generalize from one drug to another. Here’s an example where we used Bayesian methods to generalize from one state to another. Here’s an example where we used Bayesian methods to assess patterns of nonsampling error in opinion polls. These analyses are not perfect, as no analysis is perfect.

          Regarding pages 8-12 of Stark’s article: I recognize that many bad models are out there. Not just imperfect models, but models whose use can actively impede statistical understanding. Stark mentions problems with using the Poisson distribution, and this is a point that my coauthors and I make in Regression and Other Stories. We’re in agreement there. In other cases, much can be learned from models.

          To use a model does not require believing in the model. As I wrote in my paper with Hennig, I think the terms “subjectivity” and “objectivity” are not helpful when used in statistics, as each of these terms represents a set of different goals. I am not “excusing priors.” Priors are part of a probability model.

          I agree that it is a coherent stance for Stark to simply refuse to use probability models except for problems where there is a physical randomization as in coin flips. By following this rule, he limits the set of problems he can work on, but that is not a problem: there are enough problems in the world that involve physical randomization or physically defined probability models to keep him and his students busy for their whole careers. That’s fine. There’s another set of problems for which there is no physical randomization or physically defined probability models. These problems include the analysis of test scores in psychometrics, the study of drug effects using pharmacokinetics, the analysis of elasticities in economics, almost all of statistics in sports and business, and even much of the physical sciences, which make a lot of use of descriptive models and curve fitting.

          It’s fine for you and Stark and me to sit down and draw a sharp line—ok, a blurry line—between those examples that involve clearly justified probability models and those examples where the model is constructed. And it’s fine for you all to denote those latter examples as “engineering problems” or some such term to emphasize the need there for human model building. My colleagues and I have recently expressed our view that more attention needs to be paid to these steps of model building, validation, and expansion. Except in rare cases, a statistical model is a work in progress, not a platonic object. When considered as an engineering project, I think the right analogy is not a bridge but rather a flotilla that is continually being altered in response to changing conditions. (“Flotilla” isn’t quite right because it might imply a human opponent; I’ll try to think of a better analogy.)

          All this has very little to do with prior distributions, and I think Stark does his argument no favors by rehashing decades-old characterizations of Bayesian statistics which were wrong back in the 1970s and are even more inaccurate today. With regard to the points on pages 8-12 of his article, I agree there is value in specifying the assumptions of statistical models. I’ve seen many examples of statisticians justifying bad models by characterizing them as “objective” (as if, just because a model is given the label of objectivity, it must be correct or even useful) and I’ve seen many examples of statisticians justifying bad models by characterizing them as “subjective” (as if, just because a model represents one person’s belief, we should trust its results). I agree with Stark that the ultimate justification of a model has to come from reality, not a mathematical property that gets labeled as objective or a personal feeling that gets labeled as subjective.

          With regard to my slogan to accept uncertainty and embrace variation: I have demonstrated what is meant by these in my books and applied research articles. I think that accepting uncertainty and embracing variation is a much better approach than denying uncertainty and variation, which is what is done by many practitioners. For example, I’ve written about the ludicrousness of some public opinion researchers who want to analyze polls as if they are probability samples—a pretty ridiculous choice when response rates are below 10%. I think that probability modeling can be a very valuable tool for modeling survey data, but only when nonresponse is modeled too in some way. How exactly to model nonresponse is a challenge, but that’s life, whether we’re analyzing data on public opinion, or baseball, or pharmacology, or business, or anything else. We need both internal models of data and external calibration where possible.

        • Thanks Andrew – in reply to your long reply to my 2nd comment (which did not have a “Reply” button):

          First, I think you ended with a diversionary straw man when you stated “that accepting uncertainty and embracing variation is a much better approach than denying uncertainty and variation, which is what is done by many practitioners.” That’s a “mom and apple pie” statement that no one was arguing against (who does?). I was instead arguing that this kind of highlighting can become empty sloganeering if one does not address head on the kind of concerns about default models and conventional modeling strategies that Stark, me, you and many others have raised. And nowhere do I see it as more of a problem than in the false precision and sense of confidence about having captured uncertainty in the usual presentations of interval estimates, whether with “confidence” or “probability”.

          Then too, I am not sure of the point of your list of examples of your Bayesian analyses. I have plenty of examples of my own (well, they are semi-Bayesian in that nuisance parameters are left unmodeled to avoid problems such as those documented by Robins & Ritov SIM 1997 and Ritov, Bickel et al. Stat Sci 2014). You and I agree that Bayesian models are great tools, and (even) Stark would have to agree in principle given that Bayes procedures are frequentist-admissible. The problem is that as such modeling spreads thanks to (say) Stan and such, it ends up getting misused, just as with purely frequentist methods. I don’t see where anyone disagrees with that in principle; do you?

          The split comes with prior specification. The data-probability model can be built up from a narrative about the forces and mechanisms that produced (caused) the data – and Stark and I would agree that ought to be done in principle (again, see my https://arxiv.org/abs/2011.02677). This task mainly hinges on carefully studying the study itself: its design, its conduct, the snags in execution (like nonresponse) etc. Background information can enter but more in qualitative ways (e.g., ‘may we assume age effects are monotone, or not?’). Would you disagree?

          The problem with priors is that by definition they require summarizing background information in the form of a joint probability distribution on precise parameter values. That’s hard to do well even for experienced Bayesian statisticians, some of whom churn out analyses that make convenient prior-independence assumptions that are absurd in the face of what is actually in that background; I gave a simple then-common and perhaps still-common example in
          https://onlinelibrary.wiley.com/doi/abs/10.1111/0272-4332.214136.
          But, as with bad data models, framing assumptions as abstract properties of probability distributions makes it unlikely that context-expert collaborators will see that something absurd yet consequential is being assumed. Conversely, the statistician may rely on elicitations that are presented as if they were consensus if not fact, when they are really just overconfident expressions of local biases and delusions of consensus.

          So to be responsible about Bayesian methods, we have to face that statisticians often supply bad models, context experts often supply bad opinions, and the two can synergize to produce sophisticated modeling exercises that on close scrutiny are misleading junk (maybe a few of my own could qualify). I’ve found checking the background literature for myself is the only safe detector and preventive for both problems – something not enough statisticians have time to do to a meaningful extent. That’s not a problem of priors alone, but the more informative a prior, the more it requires a much deeper exploration of the background, demanding as it does quantification of very vague and highly correlated uncertainties.

          The point is, priors introduce even more avenues for assumptions of convenience to do damage, with less moderation by the data at hand. That’s why I think Bayesians have become the chief obstacle to final acceptance of Bayesian methods, for denying or not having taken responsibility for the additional complexities that arise in applications – not math complexities (those just require math), but complexities underlying the topic under study, which need to be recognized and taken account of in the prior.

          Your analyses may well be exemplars of how that should be done, and your book may be a bible for how to do it (although I would have to question it to the extent it relies on uncalibrated posterior predictive checks). But the issue Stark is on about is one I see too, and not limited to but intensified by Bayesian methods: How do we deal with the fact that most users don’t have the time or integrated expertise for translating between the study context and model to do what we would all find an acceptable modeling analysis. Stark’s paper gives some very real, important examples where that problem is clear enough, and I know I and I bet you have plenty more examples.

        • Some of the discussion in that linked article is interesting and thought-provoking for sure. But there are several passages that I personally find silly and/or off-base.

          Under climate models he says, “The IPCC wants to treat uncertainties as random” as something which is, prima facie, absurd. What I presume he means is that he objects to using probability to quantify uncertainty. This suspicion is confirmed a few sentences later where he writes, “Mixing measurement errors with subjective probabilities doesn’t work. And climate variables have unknown values, not probability distributions.”

          This is some hard core Frequentist Dogma here my friends…If this were correct, literally no hierarchical Bayesian analysis I have ever seen “works”.

          His explanation of one literal interpretation of what likelihood functions do – in the context of a wind turbine bird mortality analysis – is humorous and makes some important points. But as Andrew says, there is a very very narrow scope of problems for which his objections would not apply, especially since he rejects exchangeability arguments out of hand for some reason or another.

        • Yeah my reaction is like Andrew’s lol. Certainly there are problems with statistics and models, but I don’t think this paper is really demonstrating much that is actually helpful.

          It’s getting buried in things like epistemic vs. aleatory and whatnot. Which sure, they’re neat, but demonstrate the problem dude.

          When I got to “Avian-Turbine Interactions” I was pretty annoyed with it already, but I figured he might tell us how the person with the model at least messed up. Instead of doing that he just points out how under his own philosophy it’s not possible to do anything — which is incredibly unhelpful advice. Just tell us why the model was wrong and how wrong it was and how you know this! For all I know the model is working great, or badly, who knows!

          And like how does Stark even really know “People cannot even accurately judge how much an object weighs with the object in their hands. The direct physical tactile measurement is biased by the density and shape of the object—and even its color.”

          If it’s not based on a randomized control trial of all people and objects, I don’t know why he’d trust it. Anyway I gave up after page 14 (the climate thing, where yet again he critiques the existence of models without even doing the courtesy of showing where the model messes something up).

          I guess this is like the glass-half-empty version of all models are wrong. All models are wrong so we just give up.

      • It’s an interesting piece but I agree with Andrew that it is ridiculously unfair to the Bayesian approach. And this is not an irrelevant critique, as one of the main points seems to be that the unification of epistemic and aleatory uncertainties in a probabilistic framework is fundamentally impossible (not just difficult and often done wrong).

        > The theory of equally likely outcomes is about the symmetry of the coin. The frequency theory is about what the coin will do in repeated tosses. The subjective theory is about what I think.

        It doesn’t sound so bad when one writes “The subjective theory is about what I know”. That makes the “why should I care what your internal state of mind is” objection absurd.

        > There are arguments in favor of the Bayesian approach from Dutch book: if you are forced to cover all possible bets and you do not bet according to a Bayesian prior, there are collections of bets where you are guaranteed to lose money, no matter what happens. According to the argument, you are therefore not rational if you don’t bet in a Bayesian way. But of course, one is not forced to place bets on all possible outcomes.

        Of course, one is not forced to be rational.

        Unrelated to all things Bayesian, I also find the following fragment puzzling:

        > The usual model of tosses as fair and independent implies that all 2^n possible sequences of heads and tails in n tosses of a fair coin are equally likely, implying probability distributions for the lengths of runs, the number of heads, etc. If you compare this model to data, you will find that if the number of tosses is sufficiently large, the model does not fit accurately. There tends to be serial correlation among the tosses. And the frequency of heads will tend to be ‘surprisingly’ far from 50%. [footnote: While imprisoned by the Nazis in World War II, John E. Kerrich tossed a coin 10,000 times; it landed heads 5,067 times.]

        The footnote doesn’t seem to give an example of the model not fitting accurately when the number of tosses is large (p-value 0.16).

        • Yes, I don’t understand the coin footnote either. Aside from the fact that no sensible person actually believes a real coin is ideal, the WWII prisoner anecdote seems not to convey the lesson the author thinks it does. For 10,000 flips of an *ideal* coin, we’d expect 5067 heads or more about 9% of the time, I think, and we’d expect an excess of at least 67 heads *or* tails from 5000 about 18% of the time — not outlandishly improbable enough that I’d conclude that the coin is unfair.
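          A quick check of those tail probabilities, as a sketch using scipy’s exact binomial tail (the figures above are approximate):

          ```python
          from scipy import stats

          # Kerrich's 10,000 wartime tosses came up heads 5,067 times.
          n, heads = 10_000, 5_067
          p_one_sided = stats.binom.sf(heads - 1, n, 0.5)                 # P(X >= 5067)
          p_two_sided = p_one_sided + stats.binom.cdf(n - heads, n, 0.5)  # add P(X <= 4933)
          print(f"one-sided ~ {p_one_sided:.2f}, two-sided ~ {p_two_sided:.2f}")
          # roughly 0.09 and 0.18: nothing a fair-coin model has any trouble accommodating.
          ```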

      • I don’t follow Stark’s division between “aleatoric and epistemic uncertainties”, aka random processes and uncertainty arising from ignorance. His go-to example is the coin toss, which he characterizes as aleatory. But earlier, he remarks:

        For instance, coin tosses are not random. If you knew exactly the mass distribution of the coin, its initial angular velocity, and its initial linear velocity, you could predict with certainty how a coin would land.

        Enter the theory of nonlinear dynamic systems, aka chaos theory: even moderate uncertainty in measuring these parameters results in a seemingly random outcome via a thoroughly deterministic process! And with that, his distinction between randomness and uncertainty comes crashing down.

        The fun thing is that we’ve been able to model coin tosses as random quite successfully, even though the toss isn’t actually a “random mechanism”, but merely one that amplifies uncertainty. The strength of statistics lies in identifying similar processes (my coin toss is similar to yours, and all the ones that have come before), and using that similarity to reduce uncertainty without actually understanding the underlying process completely. (It’s been known that betting on a coin toss gives even odds long before chaos theory.) Obviously you have to consider whether this similarity holds, or you’ll be taken by a cheat who uses a double-headed coin — or by a researcher who chooses an unsuitable statistical model.

        And there’s obviously some uncertainty about how suited a particular statistical model is to the process it’s supposed to model. But I don’t believe that there is a fundamental difference between “randomness” and that kind of uncertainty.

        P.S.: Having read Feynman’s account of the Challenger investigation, and how probabilities for launch success were arrived at, this is so far departed from reality it’s funny:

        A big difference between PSHA and these other applications is even if you’ve never launched a vehicle to the moon before, the spacecraft is an engineered system and you have a pretty good idea of its properties, the environment it’s operating in, and so forth and so on.

        P.P.S: Andrew’s next blog post on Quine digs deeper at these issues, if you identify the model with the “man-made fabric” of knowledge: if all you ever do is adjust some parameters and never question the model, you may be making bad choices.

        • Mendel:

          Yes, the Stark article has the fatal combination of: (a) the author writing about something he doesn’t know about, (b) his not realizing he doesn’t know what he’s talking about, and (c) his not sending it to anyone who could’ve informed him about (a) and (b).

        • It is not surprising for Andrew – and others on this blog – to focus on Stark’s comments on Bayesian analysis. I won’t address that as I have nothing to contribute compared to everyone else. However, I viewed Stark in a much more general way. Note that his first example was cost-benefit analysis, something that I can speak to. And, I found his critique on point about that. The requirement that all benefits and costs be reduced to a single dimension is neither obvious nor innocent. As with your discussion about Bayesian analysis, there are plenty of examples of cost-benefit analysis done well, and there are many examples where it has been harmful, where the analysts have gone well beyond what their “science” really allows. I viewed Stark as raising these issues in a much more general way, across all disciplines. In that, I found it a worthwhile contribution. I think his overstating his position ultimately got in the way of his points. His views are so extreme that they cause everyone to think about how unfair his conclusions are. But, he is reacting to the automatic, and often subconscious, application of principles (like cost-benefit analysis and probability models) without an appreciation for their underlying assumptions. Of course, many people are not guilty of this. But much of our training has made us prone to the problem.

        • Thanks Dale for stating all that…

          So many of the preceding comments seemed to be blowing up Stark’s side attack on Bayesian methods as if it were the whole paper, rather than sticking to the central issues he raised and you and I both recognized as his main points. His paper is not about attacking Bayes or promoting frequentism, even if passages can be taken that way (and thus veered the discussion toward a political rally against it).

          Focusing on the paper’s shortcomings is unfair, as its main points are general. Those points are close to those many of us have made at times and bear repeating (as the literature is filled with examples like those it decried). Sure there are some passages we think are misguided, but overall the main points are well taken and well made for a general audience. As Rothman would often say, “Don’t let perfection be the enemy of the good” (or the Good, which is a related story…).

        • Sander:

          I don’t know how you can state so confidently what Stark’s paper is “about.” He spends some time in the paper with uninformed attacks on Bayes, attacks that he easily could’ve left out of the paper, or he could’ve run the paper by someone who actually does Bayesian data analysis to get a second opinion. He chose not to do so. I’d be happy if he were to re-release a paper without that stuff and then we could discuss the merits of what remains.

        • Andrew: My confidence comes from having read the paper carefully. The paper states what it is about on p. 1 and doesn’t even mention Bayes or subjective probability until page 3. Little of the paper is about Bayes. The main examples it attacks aren’t formal Bayesian analyses, and its recommendations and conclusions don’t mention Bayes either way but focus instead on basics that apply to all modeling.

          From p. 1 it’s about
          “the role of statistical models in science and policy. In a vast number of cases—most, perhaps—impressive computer results and quantitative statements about probability, risk, expected costs, and other putatively solid facts are actually controlled by a model behind the curtain, a model that nobody’s supposed to look at carefully or pay attention to. What appears to be impressive ‘science’ is in fact an artificial amplification of the opinions and ad hoc choices built into the model, which has a heuristic basis rather than a tested (or even testable) scientific basis… the models have the power to persuade and intimidate, but not the power to predict or to control…Relying on ungrounded, untested, ad hoc statistical models is quantifauxcation, a neologism for the process of assigning a meaningless number, then pretending that because the result is quantitative, it must mean something (and if the number has six digits of precision, they all matter). Quantifauxcation usually involves some combination of data, pure invention, invented models, inappropriate use of statistics, and logical lacunae. It is often involved in ‘informing’ policy.”

          Then, among its many recommendations at the end is
          “Be skeptical of hypothesis tests, standard errors, confidence intervals, and the like in situations that do not involve real randomness from random sampling, random assignment, or random measurement error. The numbers almost certainly result from ‘cargo-cult’ calculations and have little to do with how well the data support the conclusions.”

          The only inference I can draw from your obsessing over its comments on subjective probability is that you didn’t read through the entire paper carefully and dispassionately (although I expect you think you did). Instead you have failed to note that he has plenty of contempt for common misuse of frequentist modeling and that the paper is about a foundational problem that afflicts all types of statistical modeling for policy.

        • Sander:

          Stark writes that we should be skeptical of probability statements in “situations that do not involve real randomness from random sampling, random assignment, or random measurement error. The numbers almost certainly result from ‘cargo-cult’ calculations and have little to do with how well the data support the conclusions.”

          “Almost certainly,” huh?

          What the hell does he know about that? Where does his “almost certain” come from? Does it apply to the study of pharmacokinetics and toxicokinetics, where there is typically no random sampling or random assignment, and where the biggest uncertainties do not come from random measurement error? Does it apply to analysis of public opinion, where survey response rates are regularly below 10%? Does it apply to the study of environmental risks? Does it apply to the analysis of educational testing? Image reconstruction? Sports analytics? Business forecasting? Etc etc etc?

          As a person who happens to do statistics in areas where random sampling, random assignment, or random measurement error are not central, damn straight I think it’s b.s. for Stark to call this “quantifauxcation.”

          I’d just love it if the statistics department at the University of California were to issue an official statement that the statistics they teach does not apply to various problems in biology, psychology, business, image analysis, and so forth. Perhaps they could remove the name “statistics department” and change it to the “department of random sampling, random assignment, and random measurement error.” And make it reeeallly clear to their colleagues in other departments that they don’t want to teach any students who might possibly want to apply statistics to surveys with response rates below 100% or business forecasting, or differential equations in biology, or all sorts of other things.

          I’m fine with statistical models based on exact random sampling etc. being used in some limited set of problems, and of course I’m a big fan of the work of you (Sander) and others on modeling biases in measurement and estimation. I’m not fine with Stark trashing most of applied statistics based on, among other things, an uninformed conception of Bayesian inference, which happens to be one of the tools (along with cross-validation etc.) that we can use to handle these sorts of uncertainties.

          Also, to say I’m “obsessing” over Stark’s comments on subjective probability . . . that’s ridiculous. There’s lots about the article I don’t like. That’s not “obsessing.” I can contribute to the discussion of an article by pointing out where it went wrong. Similarly, if you see a mathematics article with 100 formulas and you see two that are clearly wrong, it makes sense to point them out. That’s not “obsessing”; it’s scholarly criticism. I trust that when I make mistakes in my published work, people will point out those mistakes. When they do so, I don’t say they’re “obsessing,” I thank them for showing me where I went wrong. There is no obligation in scholarly or scientific communication for readers of a paper to be “dispassionate.” We should just be accurate.

          I agree with you that Stark has some good points in his article and I hope he can extract them and write a new, shorter, version, with the good points and without the errors. It might help for him to bound his criticisms and make it clear that he does not consider all of psychometrics, pharmacokinetics, business forecasting, sports modeling, etc. to be “quantifauxcation.” He might also consider that statistical methods are widely used by the U.S. government and the opinion polling industry despite the fact that response rates are well below 100%.

        • The “misguided passages” can be a red flag. How much should we trust the examples and explanations about things we don’t know much about? While this doesn’t make the arguments in the rest of the paper wrong, it surely doesn’t help to build credibility either.

          This quote of Michael Crichton came to my mind:

          “Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray’s case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the “wet streets cause rain” stories. Paper’s full of them.

          In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.”

        • Andrew: This is getting more bizarre with each round – it’s turning into a “blind men describing an elephant” tale. And now you’ve brought in all the UC stat departments even though the paper has just one author from one of many UC campuses (Berkeley’s, the department that hired a black Bayesian named Blackwell back when blacks and Bayesians were almost nowhere to be found in American math or stat departments).
          I gave this quote from the paper:
          “Be skeptical of hypothesis tests, standard errors, confidence intervals, and the like in situations that do not involve real randomness from random sampling, random assignment, or random measurement error. The numbers almost certainly result from ‘cargo-cult’ calculations and have little to do with how well the data support the conclusions.”
          My point was that this quote was being critical of standard frequentist statistics. But you quoted it back as:
          “Stark writes that we should be skeptical of probability statements in “situations that do not involve real randomness from random sampling, random assignment, or random measurement error. The numbers almost certainly result from ‘cargo-cult’ calculations and have little to do with how well the data support the conclusions.” ”
          as if it were a frontal attack on our work with Bayesian methods for nonrandomized settings. But hypothesis tests, standard errors, and confidence intervals are standard frequentist methods that are forced on most researchers from Stat 1A class to publication time as if they are the core measures of uncertainty – or worse, are deployed as measures of support when their foundational logic is entirely refutational.
          So do you actually disagree with the actual quote? For that matter, do you actually disagree with your rewrite of it? I don’t, at least if I expand the start to
          “Be skeptical of probability statements until you have taken the time to go through their derivation carefully and can see their justification within the application context, especially in situations that do not involve real randomness from random sampling, random assignment, or random measurement error.”
          Do you disagree with that?

        • Hi Sander, speaking just for myself (obvs) I like Stark’s central message which seems to me to be “be skeptical of any and all models”. I think we could also formulate as, “beware the numerous assumptions invoked when applying ideas from probability and statistics to the real world”.
          What I see you driving at here is that all of our methods for characterizing/quantifying uncertainty (whether standard errors, or posterior distributions or whatever) rely on the invocation of certain assumptions in the background. I don’t think Andrew or anyone else disagrees.
          What’s interesting to me is that despite all his special vitriol for Bayes (yes, I see where he critiques classical stats too, but he clearly has it in for Bayes), this perspective is already there within the Lindley/de Finetti ‘subjectivist’ (I hate those terms but whatever) approach: “Probability does not exist!”

        • Sander:

          That’s great that UC Berkeley hired David Blackwell in the 1950s. That department continues to have some excellent researchers. I don’t really think they would issue a statement that the statistics they teach does not apply to various problems in biology, psychology, business, image analysis, and so forth, nor do I think they’re planning to rename themselves the “department of random sampling, random assignment, and random measurement error.” My point was that such statements are the logical implications of Stark’s article and his use of terms such as “almost certainly” and “quantifauxcation.”

          You rephrased Stark’s statement as:

          “Be skeptical of probability statements until you have taken the time to go through their derivation carefully and can see their justification within the application context, especially in situations that do not involve real randomness from random sampling, random assignment, or random measurement error.”

          I would remove “real randomness from” in your sentence. Otherwise I think it’s good, and I agree with it. If Stark had said that, and had not had all the anti-Bayesian stuff (which bothers me not so much by its attitude but rather because it’s so ill-informed), then I’d be cool with his article.

        • Chris: Thanks for those comments. For those who don’t know, not unlike Andrew, I spent 30 years writing about, defending, and teaching Bayesian methods (well, again, mostly semi-Bayes in recognition of the practical limitations of pure Bayes), all for nonrandomized settings. So the right question would be why, unlike Andrew, I would find Stark’s article mostly meritorious, despite what Andrew rightly points out is its unfair treatment of Bayes. You saw why: It doesn’t give any quarter to bad frequentist modeling either. Bad modeling has spread like a pandemic, likely driven by the advent of rapid user-friendly software (to which both Andrew and I have contributed) right there on our terabyte-drive laptops.

          I like Andrew’s suggestion that a revision of the paper to be more Bayes-accurate would be good, but I don’t see that as coming and the paper is still a good read. And Bayes deserves special treatment: As I said in my comments above, when we turn to Bayes methods or frequentist analogs like penalization, we take on the responsibility for careful contextual justification of the priors, along with the data model. It’s not really that hard to do technically, it just requires time to read the contextual literature in more detail than many can afford. Instead some Bayesian literature falls back on elicitation, which I see as a disaster in guaranteeing we mix into our results investigator prejudices, cognitive biases and statistical misconceptions (like claiming “this coefficient is almost certainly zero” because all past studies reported “no association” with p>0.05, when a summary interval shows that a substantial association is far more consistent with prior observations).

          So my point is that Bayes promoters like us should stop whining about unfair attacks, take responsibility for Bayesian as well as frequentist method abuse, and campaign for radical change in how all these methods are taught and enforced by journals.

          Academic/philosophical footnote: As for “probability does not exist”, I think a careful reading of de Finetti reveals a much more nuanced background for this catchy slogan (which I have repeated at times). It draws you into a fascinating thinker, but it alienates many who can point to the Born rule in quantum mechanics and the frequency behavior it predicts with staggering accuracy to say “you expect me to read this guy?” One answer is to respond with “yes, it leads into the QBist interpretation, which is one fair contender among the many”. And that’s exactly what the radical subjective interpretation of probability is, as are other Bayesian and frequentist interpretations in general: Fair contenders among many, each with many subgroups that have their own pros, cons, and uses. As always I argue that the key to better science and application is to be able to shift freely among these often-warring and yet often parallel perspectives.

        • Hi Sander, thanks for the thoughtful engagement. In terms of practice, IME in many/most applications it is not really all that hard to come up with a reasonable prior from a bare-bones regularization point of view. Moreover, rather than leading to more spurious results, my observation is that being forced to specify a prior, and having some kind of regularization in there (any kind!), leads to more conservative, sensible inference most of the time. For complex models, especially those suffering computational problems, I often find that the prior model specification is where I learn the most. It tells me, conditional on this model and these data, where some quantity of interest has to be for anything to be sensible, or to avoid multi-modality, or whatever. Sometimes this is something I really would rather not have be the case, and that other methods/procedures would have swept under the rug! Prior predictive modeling is a formalization of this stage that seems to me increasingly indispensable in complex applications.
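
          As a minimal illustration of what a prior predictive check can look like, here is a sketch in Python; the baseline risk and prior scales are hypothetical choices for illustration, not taken from any of the studies discussed here:

          ```python
          import numpy as np

          # Minimal prior predictive check: what does a prior on a log odds ratio imply
          # about the treated-group risk, before any data are used? All numbers here are
          # hypothetical illustrations.
          rng = np.random.default_rng(1)
          baseline_risk = 0.02                  # assumed control-group risk

          for prior_sd in (0.5, 10.0):          # weakly informative vs. very wide prior
              log_or = rng.normal(0.0, prior_sd, size=100_000)
              odds = baseline_risk / (1 - baseline_risk) * np.exp(log_or)
              implied_risk = odds / (1 + odds)
              lo, hi = np.percentile(implied_risk, [2.5, 97.5])
              print(f"prior sd {prior_sd:4.1f}: 95% of prior draws put the treated risk in "
                    f"[{lo:.4f}, {hi:.4f}]")
          ```

          The very wide prior, often sold as noninformative, puts most of its mass on wildly implausible risks, which is exactly the kind of thing that simulating from the prior surfaces before any fitting is done.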

          I believe it is reasonable to do some prior sensitivity analyses, especially where strong disagreements about some quantity of interest are well-grounded. Again, this undermines the interpretation of a Bayesian procedure as leading to *the one and only* posterior probability distro but rather it is *a* posterior distro, conditioned on whatever assumps one wants to make.

          I’m not really very sold on QBism, I just see probability as a formal tool for quantifying uncertainty. In my book, it is very powerful but not Universal or the only way to think about uncertainty, but then again I am an anti-Platonist (really anti-Pythagorean ;)) when it comes to math so…

        • > I’m not really very sold on QBism,

          It doesn’t seem fundamentally different from the usual representation of partial knowledge about the quantum state in the form of a density matrix.

          “Quantum measurement is nothing more, and nothing less, than a refinement and a readjustment of one’s initial state of belief.”

          That refinement is the Bayesian updating of our knowledge about the underlying state, the readjustment is the change in the state produced by the measurement. When we have a pure state, “we learn nothing new; we just change what we can predict as a consequence of the side effects of our experimental intervention. That is to say, there is a sense in which the measurement is solely disturbance.”

          I don’t say this is not the right way to look at the uncertainty in the description of a quantum system. Quite the contrary, it’s the right way to look at the uncertainty in the description of physical systems in general. But the “collapse” of a wave function describing a pure state into another wave function describing a new state following a measurement seems exactly the same as in the standard QM description.

        • Or maybe that was Quantum Bayesianism and QBism is something else… those quotes are from an old paper from Fuchs. I have to confess I understand less and less when looking at more recent articles and I don’t even think there is a “definitive” description of it.

        • My favorite example of the non-difference between aleatoric and epistemic uncertainties in some cases is one that everyone else seems to take as an example of the difference: the Ellsberg urn paradox. For those unfamiliar, the question is whether you require less of a discount for uncertainty when betting on the color of a ball pulled from an urn which is known to contain 50 black balls and 50 white balls than when betting on an urn which contains black and white balls in some unknown proportion. From the first time I ever heard this supposed paradox (in grad school), indifference between the two scenarios seemed so obvious to me as to make me question the rationality of people who demanded the bigger discount for epistemic uncertainty.
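
          For what it’s worth, a quick simulation (with a uniform prior over the unknown urn’s composition, chosen purely for illustration; any prior that is symmetric in black and white gives the same answer) makes the indifference concrete:

          ```python
          import numpy as np

          # Ellsberg-style comparison: a known 50/50 urn vs. an urn of unknown composition.
          # Under any prior on the composition that is symmetric in black/white (here:
          # uniform on 0..100 black balls, for illustration), the predictive probability
          # of drawing a black ball is 1/2, the same as for the known urn.
          rng = np.random.default_rng(0)
          n_sim = 200_000

          # Known urn: 50 black balls out of 100.
          known_draws = rng.random(n_sim) < 0.5

          # Unknown urn: composition drawn from a symmetric (uniform) prior each time.
          n_black = rng.integers(0, 101, size=n_sim)       # 0..100 black balls
          unknown_draws = rng.random(n_sim) < n_black / 100

          print("P(black), known urn:  ", known_draws.mean())
          print("P(black), unknown urn:", unknown_draws.mean())
          # Both are ~0.5, so a bet on black has the same expected value either way.
          ```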

        • Jonathan:

          I’ve written about the distinction between aleatoric and epistemic uncertainties. Perhaps I should post something again for people who haven’t seen it. The basic idea is that one can distinguish between these ideas, and intermediate states, by considering intermediate information. A hard-line position such as Stark’s can be appealing in some contexts, but for most of the problems of measurement, sampling, and causal inference that I’ve worked on, probabilities live in an intermediate zone. Stark’s principles apply to a very small principality within the kingdom of statistics, and they exclude just about all of sports analytics, most of business analytics, economic forecasting, polling, pharmacology, educational testing, image recognition, signal processing, and lots and lots more.

        • Mendel, more generally if you follow de Finetti at all, the distinction between aleatory and epistemic uncertainty is immaterial given exchangeability, which Stark dismisses out of hand (without a rationale).

  10. Well, whatever this study says, the case for mask wearing has always been marginal and this study hasn’t done anything to improve that. Should people wear masks? Do they impart some marginal degree of effectiveness? Yes – as Matt Skaggs said, at least some marginal degree of effectiveness is a reasonable prior. But can you hang out with covid-infected friends and work out and expect the mask to protect you? No.

    The simple reality is that aerosol virus particles are significantly smaller than the openings in most non-medical grade masks. Sure, any mask will at least temporarily reduce the flow in or out. But if the local air is saturated with aerosol virus particles, then a non-medical mask simply won’t be effective for very long.

    Daniel Lakeland argued here several times that the mask will be effective against outflow. I believe this claim was based on the presumption that the virus is transferred primarily in large droplets, and thus the mask prevents expectoration of such droplets. I don’t need a study to agree with that. However, if a person is emitting a high concentration of aerosolized particles that are significantly smaller than the weave of the mask, again, the mask won’t be effective for long.

    So, again, a low-grade mask is probably useful in that it’s cheap and has some marginal benefit. However, it’s surely not a panacea.

    The really unfortunate thing about this entire discussion is that we could have produced a lot of medical grade masks since June, but…we didn’t.

    • “we could have produced a lot of medical grade masks since June, but…we didn’t.”
      A scandal, part of the broader scandal of these years, which … absolutely beggars description!

    • “that aerosol virus particles are significantly smaller than the openings in most non-medical grade masks”

      That isn’t an impediment to the physics of how masks work. See e.g. this post I wrote, or better yet the things cited in it.

      Capture efficiency is high for very small particles, thanks to the wonders of Brownian motion.

      • Well I guess the study cited in this article seems to argue against that, because it showed no effect. You can quibble about whether it’s possible that under some contrived condition masks have a large effect. But under average conditions of the average person, they have no effect. This study couldn’t produce data that showed an effect.

        Let’s not get ourselves tied in knots being hopeful: medical personnel wear medical grade masks for a reason. Right?

        • Nearly all of the situations in which I have worn a mask are also situations where I have physically distanced myself at least 6 to 8 ft. I am speculating that most wearing masks also are mindful of their distance. Physical distance seems to be the critical variable.

          Then again, I may be making a point that is of little relevance to this thread. LOL

        • Yeah, my feeling on the masks is that they are worth using, but I’m not at all convinced that they are effective enough *to go into situations I wouldn’t otherwise go into*.

      • “That isn’t an impediment to the physics of how masks work”

        +1. Really liked the blog post you wrote! I knew about the electrostatic attraction, but the Brownian motion part was new to me.

        Based upon the physics, I’m surprised that cotton masks seem to do so little.

    • Speaking of incorrect priors!
      1. As others have pointed out “aerosol virus particles are significantly smaller than the openings” completely misrepresents the actual physics of how masks work, even non-electret masks. These aren’t sieves! Either inertial or Brownian deviation from streamline flow will cause impact and usually binding.
      2. You don’t seem to realize that there’s a major difference between the size distribution of exhaled and inhaled particles. At this scale evaporation very quickly shrinks the particles whenever the relative humidity is not close to 100%, i.e. almost always. So catching a bunch of big droplets on the way out drastically reduces the number of small aerosols present.

  11. Come on. Another problem is that people assigned to wear masks often chose not to. Only 46% of volunteers in the mask group told the researchers they followed all the rules about wearing masks in public, 47% said they “predominantly” wore their masks, and 7% said they didn’t follow the rules. The discussion above is counterfactual: what would the analysis have been had the people in the treatment group been compliant? They were not.

    PS my source for this is https://www.latimes.com/science/story/2020-11-20/face-masks-didnt-stop-coronavirus-spread-in-danish-clinical-trial

    • You beat me to it by a few minutes. The simple arithmetic of the Wald estimator suggests that the true effect of wearing a mask is much, much higher than the reported effect!

        • Correct. The estimate represents the average causal effect of asking someone to wear a mask, regardless of whether that person actually wears a mask or follows the researchers’ advice.

          The frustrating thing is that it appears that the researchers had the ability to calculate an estimate of the effect of wearing a mask (among those who followed the advice), but they chose not to calculate it, nor did they provide the data that we would need to calculate it for them. It’s a pretty dramatic failure, so much so that I suspect I must be missing something.

        • The researchers did report the infection rate among those in the mask group who complied fully (46%) as 2.0%. The source I’m using is https://www.acpjournals.org/doi/10.7326/M20-6817

          One thing that jumped out is that that’s a higher rate than in the intervention group overall. Using the paper, I think the rates by adherence are 2.0% for full compliance (22 out of ~1100), 1.6% for “predominantly” (18 out of ~1124), and 1.2% for those who did not wear a mask (2 out of ~167). This could be random variation, or it could be that categorising adherence based on the lowest of 4 weekly surveys, with only “exactly”, “predominantly”, and “no” as options, doesn’t accurately capture each participant’s level of compliance.
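
          A quick check of that arithmetic, using the approximate (“ish”) denominators quoted above rather than exact trial counts:

          ```python
          # Rough check of the adherence-subgroup infection rates quoted above.
          # Denominators are the approximate figures, not exact trial counts.
          subgroups = {
              "full compliance": (22, 1100),
              "predominantly":   (18, 1124),
              "did not wear":    (2, 167),
          }
          for name, (infected, n) in subgroups.items():
              print(f"{name:16s}: {infected:2d}/{n:4d} = {100 * infected / n:.1f}%")
          ```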

          For me the larger issue is that p values and statistical significance aren’t meaningful unless the methods are sufficiently robust. Here most of the analysis includes 18 out of the 95 infections, detected by antibody positivity, from participants who did not provide antibody results at the start (it’s unclear from the paper whether this applies to one or both tests). The home antibody tests had relatively low specificity relative to the observed positivity rate. The couple of weeks’ lag from infection to detectable antibodies means the infection-detection window contains a large period without the intervention. Taken together, it’s unclear what we can learn from the study apart from the importance of good trial design.

        • “The researchers did include the rate for those wearing the masks who complied fully (46%) as 2.0%.”

          That’s not a meaningful subsample, though. This group contains both “compliers” (those who change their behavior as a result of being in the treatment group) and the “always takers” (those who would wear a mask regardless of treatment). In order to assess the effect of masks, we need to estimate the change in infection risk among those who wore a mask because of their inclusion into the treatment group. We can recover this estimate if we knew the proportion of the untreated group that wore masks. Without that information, we cannot calculate the effect of masks.
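
          For concreteness, here is a minimal sketch of that Wald/instrumental-variables arithmetic. The infection rates and the 46% full-compliance figure are the ones quoted in this thread; the control-arm mask-wearing share is a made-up assumption, which is exactly the unreported quantity at issue:

          ```python
          # Wald (instrumental-variables) arithmetic for an encouragement design:
          #   LATE = (infection rate, encouraged arm - infection rate, control arm)
          #          / (mask-wearing share, encouraged arm - mask-wearing share, control arm)
          # The control-arm mask-wearing share below is an ASSUMPTION, not from the trial.
          risk_encouraged = 0.018   # infection rate in the arm encouraged to wear masks
          risk_control = 0.021      # infection rate in the control arm
          itt = risk_encouraged - risk_control          # intention-to-treat difference

          wear_encouraged = 0.46    # full compliance reported in the encouraged arm
          wear_control = 0.02       # assumed mask wearing in the control arm (not reported)

          late = itt / (wear_encouraged - wear_control)
          print(f"ITT difference: {100 * itt:+.1f} percentage points")
          print(f"Wald / LATE:    {100 * late:+.1f} percentage points, under the assumed compliance")
          ```

          The only point of the sketch is that the intention-to-treat difference gets scaled up by the difference in actual mask wearing between arms, so the unreported control-arm behavior determines how much larger the per-wearer effect could be.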

        • Do we know whether the people who complied more, complied more because they were more likely to be in circumstances where infection was more likely?

        • No, but randomized selection into the treatment group means that this won’t have an effect on the estimates. The concern here would be if mask wearing or treatment induced people to change their exposure to the virus. But while this might bias our estimates of the effect of masks on preventing transmission, the estimates would still be relevant for estimating the effect of masks as a policy intervention tool.

        • MJ –

          > No, but randomized selection into the treatment group means that this won’t have an effect on the estimates.

          I don’t quite understand. The sample was randomly selected – meaning that the treatment group was representative. But that doesn’t mean that there wasn’t different behavior within that sample, whereby within the treatment group those who would be at more risk chose to wear masks more. What am I missing? Did they do an analysis to control for that kind of effect – not sure what to call it, but maybe an interaction effect or the effect of a moderating variable?

        • Joshua:

          If I’m understanding you correctly, you’re concerned that compliance may be a function of underlying infection risk? That’s possible. I think the result would simply be a case of heterogeneous treatment effects. It wouldn’t affect the estimates of the “local average treatment effect” that I’ve been discussing because compliance is baked into that estimate, but it could be relevant to policy.

        • MJ –

          > It wouldn’t affect the estimates of the “local average treatment effect” that I’ve been discussing because compliance is baked in to that estimate…

          I was over-thinking myself into confusion. Sure, there’s no reason to think that compliance as a function of risk would be different across the arms of the study.

  12. > the data are also consistent with no effect

    Doesn’t the fact that the 95% CI includes +0.4% also imply that it’s consistent with a negative effect of masks?

  13. My take is that people (in general) have trouble grasping that a virus adheres to mathematical forms. That is, though people grasp the concept of modeling transmission, hospitalizations, and deaths, they have trouble grasping that this can be modeled because the virus itself follows rules. In this case, when I read the study, I thought: the virus attaches in crowds and it uses those attachments to spread through localized populations where there are sustained contacts. Then I thought: but that’s well known, because that’s how viruses always spread … so why is it hard to grasp that masks in public are not the same as longer-term exposure, and that the virus is constructed that way? It’s two ways of looking at the same field, right: you have contacts over a space and numbers in and out over time, which means a crowd will generate exposures which then spread through family/work/social groups.

    We’re currently living through a vastly predictable surge caused by people getting tired of isolation and doing both layers of group penetration: more exposure to crowds and more exposure to family/work/social groups, where longer exposure means that even if you wore masks, if someone is infected, it will spread through the related groups. That second part I think people get. I don’t see much sign that people grasp how simple it is: the same space divided by time and numbers. You can then assign susceptibility and other attributes.

    But then I can’t fathom why, if over 40% of all reported deaths are from or literally in nursing homes, we couldn’t devote effort to isolating them, because that is literally a case of the virus looking for an entry point into a group.

    So, to me, masks and handwashing prevent the spread of viruses from casual contact, something that is well known, but they don’t prevent the spread over prolonged contact and they don’t overall have much effect if people engage in their regular behavior, because the virus will attach to people so it can get into their localized longer-contact groups. And that would be consistent with the potential for a negative effect because, for example, you avoid the person coughing, but you stand next to or touch the same thing as the person wearing a mask, and of course the person coughing may have asthma (like I have cough-variant asthma and sometimes I cough) while the person wearing a mask is actually ill.

    IMO, we saw a perfect example of this form of mental disconnection in NYC: on the same day the Supreme Court ruled that NYC couldn’t single out religious worship for restriction, NYC held a staged ‘parade’, reduced in size but still involving way more than 1000 people. Yeah, they wore masks. They also rehearsed in groups for extended periods of time and passed around each other for extended periods. So, they literally held a spreader event that consisted of multiple spreader events.

    I know, for example, that TV shows are currently shooting using zones (you can’t cross out of your work zone), but they actually rely on $$: if someone in make-up or hair is exposed, that would shut down a show, which means you just threw nearly 200 people out of work for a few weeks, which matters in TV because most trades/crafts are paid when they work. People are scared to be the one who gets a production shut down. That could cost you your living. Is that standard applicable to NYC holding a parade and saying that’s safe because everyone wore masks and they tried to maintain distance rules? It clearly segments.

    Perhaps the problem is that people can’t see the math, that they can’t understand that the virus doesn’t think but you can attribute ‘thought’ to it by noting that it follows the same patterns as groups. It’s difficult, at least for me, to identify the exact level where the complication arises; we see a virus that exists as a physical thing and we see that when it exists in people we can project ‘virus in people’, but we don’t see (IMO as well) that the virus is treatable as a larger entity which interfaces statistically with the behaviors of its host populations and with their physical world. It survives on surfaces for a reason: it picks up a casual contact and that, in some cases, leads it into a group where it can grow.

    I have to say some things really bother me. Example: I see talk about vaccinating people in nursing homes … if you isolate even reasonably well, you inoculate the staff and keep out anyone not inoculated. Then you can inoculate where there’s an outbreak. You want to inoculate the conduits for transmission into groups, and one reason is that if you inoculate vulnerable groups, then you could (should?) see a surge in less vulnerable groups because they’ll increase the family/social/work behaviors that invite the virus into their local groups.

    Can you imagine actually testing the efficacy of just masks without behavior change?

    Anyway, this is a topic on which everyone has their 2 cents. As in, I don’t see lack of compliance as invalidating the findings: if mask wearing really mattered, you’d see it.

  14. MJ to Joshua:
    “If I’m understanding you correctly, you’re concerned that compliance may be a function of underlying infection risk?”

    Infection risk may go up with some who “comply” (because it makes them feel safer and so they take more risks); and infection risk may go down with others who “comply” (because it reminds them of the danger they face and so they take fewer risks). Who knows who is in which group? Maybe these two sub-categories are miscible and we float from one to the other and back and forth as the feeling of vulnerability waxes and wanes from day to day and week to week.

    • Right. It’s probably too much to ask of the data to calculate the ceteris paribus effectiveness of masks. But the authors could estimate the effect of wearing a mask on infection risk (soaking up the behavioral effects of mask wearing), which is probably the relevant policy question anyway. But the authors aren’t even doing this, even though it seems to be computable given their data! It’s very frustrating.

      • Is it possible to parcel out the efficacy of the various interventions and mitigations which seem plainly correlated with reversals in the case rate? For instance, I see reversal finally in North Dakota (where the charlatans must be taking a break from their otherwise tireless campaigns). But what’s driving the reversal? Is it simply that the size and frequency of group gatherings is reduced greatly? Can it be the masks? It’s surely something!

        My you-tube/charlatan-inspired friends assured me at the time that the first round of improvements (when case rates dropped in the late spring) were due to the natural propensity for epidemics to “peter out”. Some “expert”-of-the-month (a physics professor in Israel) saith. Oh, and to back it up, “Dr. Snow didn’t actually *cure* cholera, dontcha know!”.

        • Well pandemics — at least respiratory ones like this* — do “peter out” but not *that* fast — the last five (1889, 1918, 1957, 1968, 2009) lasted 1-2 years, not just a few months.

          It’s possible that people in the Dakotas did finally start taking it seriously. OTOH, the infection rate may actually be high enough there to be suppressing transmission; per-capita deaths are not as high as NY/NJ, but IFR is probably quite a bit lower (NY/NJ seem to have been outliers in the US), so % infected is likely higher.

          *the Black Death had recurring outbreaks for centuries, but that had a non-human-to-human vector via fleas…
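
          The implicit arithmetic in the IFR point above is just infected share ≈ cumulative deaths per capita divided by IFR. A sketch with made-up illustrative numbers (not actual NY/NJ or Dakota figures) shows how a lower IFR can imply a higher infected share even with fewer deaths per capita:

          ```python
          # Implied infected share ~= cumulative deaths per capita / infection fatality rate.
          # All numbers below are made-up illustrative values, not actual state figures.
          scenarios = {
              "higher-IFR place": {"deaths_per_capita": 0.0018, "ifr": 0.010},
              "lower-IFR place":  {"deaths_per_capita": 0.0012, "ifr": 0.004},
          }
          for name, s in scenarios.items():
              infected_share = s["deaths_per_capita"] / s["ifr"]
              print(f"{name}: implied infected share ~ {100 * infected_share:.0f}%")
          ```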

        • I guess the South Dakota model doesn’t even work in South Dakota?

          A few months ago you wrote:

          > Maybe hospitals would have been overloaded all over the place if the US had followed a Sweden/South Dakota model. I don’t think we know enough right now to determine where the truth falls on the spectrum between ‘hospitals would have been overloaded in a lot of places on a Sweden/South Dakota model’ and ‘only a few high-density parts of the US were ever at risk for overloaded hospitals’

          Where would you say truth is more likely to fall on the spectrum between

          “hospitals would have been overloaded in a lot of places on a South Dakota model”

          and

          “only a few high-density parts of the US were ever at risk for overloaded hospitals”

          given that hospitals are overloaded in South Dakota, the archetypical example of a low-density part of the US on a South Dakota model?

        • And of course, had the hospitals in places like South Dakota been overloaded earlier on, the outcomes would have likely been relatively worse.

        • >>given that hospitals are overloaded in South Dakota

          Are they? The South Dakota department of health page suggests otherwise (34.9% available hospital beds, 26.0% available ICU beds, as of today’s update.)

          Those statewide numbers aren’t necessarily incompatible with an individual hospital being overloaded, though (but surely patients are being transferred?)

          It’s also interesting that SD (which did probably less than any other state) peaked at lower hospitalizations-per-million than New York did. I wonder if that’s just an age effect (since NY was hit before they really knew, older people weren’t more careful, whereas they were in SD) or a density effect (the surge was spread out more so a lower peak of at-one-time hospitalizations) or something else?

        • > OTOH, the infection rate may actually be high enough there to be suppressing transmission;

          On what basis do you say that – given that you have to control for behavioral changes to make that statement?

          > per-capita deaths are not as high as NY/NJ, but IFR is probably quite a bit lower (NY/NJ seem to have been outliers in the US), so % infected is likely higher.

          It seems that the decrease in IFR has more or less flattened out since August. And, it may go back up to the extent that now hospitals are flooded and healthcare workers overwhelmed and people are hesitant to seek medical help.

        • >>On what basis do you say that

          Well, I did say “may actually be” (not “is”). There has to be a limit somewhere (reinfection looks to be rare enough to be irrelevant at the population level), and the number of cases per population and the % positive at peak IIRC suggest an infection rate much higher than say NYC in spring.

          And things are now improving without any real measures being put in place (and the weather is still getting colder — now that Sweden has had a large second spike, perhaps their improvement in early summer was seasonality)?

          Of course, that could just be due to individual behavior… I don’t have a good picture of how seriously people in SD are taking it.

          Really, I think both effects have to be in play. Obviously even in South Dakota *some* people are taking it seriously, and obviously more people are immune now than were 2 months ago.

          I suppose the real question is whether the change since say 2 months ago is *primarily* behavioral or *primarily* immunity. I’m kind of skeptical that that many people have changed their minds, given what I see in my area (where we actually do have a mask mandate) — people’s views on COVID seem to be fairly fixed — but who knows.

          >>It seems that the decrease in IFR has more or less flattened out since August.

          Sure, but I was talking about March-April vs. later. The IFR may well be basically the same now as in the July-August summer surge, but still well below the Northeast (especially NYC) in March-April.

  15. I note that this is not the only COVID study where the control infection rates were very low. The same is the case with the vaccine tests. This has two implications. The first is the lower power of the study, as both control and treatment numbers are so low. Another concern is that the people who agreed to be part of the study seem to have a substantially lower infection rate than the general public. I can think of several stories here, mostly that they are more concerned about COVID and thus more careful to avoid infection in many ways. The implication is that they might not be a good indicator of the population as a whole. The result of this study may be that “among people who are particularly careful not to get infected, the additional effect of wearing a mask is not significant”
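
    As a rough illustration of the power point, here is a normal-approximation calculation assuming a ~2% control-arm infection rate and roughly 2400 participants per arm; the candidate effect sizes are illustrative, not taken from the trial protocol:

    ```python
    from math import sqrt
    from statistics import NormalDist

    # Approximate power of a two-arm trial (two-sided alpha = 0.05, normal approximation)
    # with ~2400 per arm and a ~2% control-arm infection rate. Effect sizes are illustrative.
    n = 2400
    p_control = 0.021
    z_alpha = NormalDist().inv_cdf(0.975)

    for reduction in (0.50, 0.20):
        p_treat = p_control * (1 - reduction)
        se = sqrt(p_control * (1 - p_control) / n + p_treat * (1 - p_treat) / n)
        z = (p_control - p_treat) / se
        power = NormalDist().cdf(z - z_alpha)
        print(f"{int(reduction * 100)}% reduction: power ~ {power:.2f}")
    ```

    With so few infections, a study of this size has reasonable power only for a very large effect; for more modest reductions the power is low, which is one way of stating the concern above.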

    • Another concern is that the people who agreed to be part of the study seem to have a substantially lower infection rate than the general public.

      This is not true for this study.
      In April and May, Denmark registered 9000 cases of Covid-19 infection (according to Wikipedia), and Denmark’s population is just under 6 million. That means the average infection rate was 0.15%.
      ~2% is the infection rate observed in the study; it’s more than ten times higher, not “substantially lower”.
