What do I think of this Bayesian analysis of the origins of covid?

A colleague links to this recent article, “A Bayesian Assessment of the Origins of COVID-19 using Spatiotemporal and Zoonotic Data,” written by economist Andrew Levin, and asks, “What do you think of this analysis? Is it sound? Does it mean a sensible person should be very convinced it was a lab leak?”

Here’s the abstract of the paper in question:

This paper uses Bayesian methods in conjunction with spatiotemporal and zoonotic data to evaluate the odds ratio for two hypotheses regarding the origin of the COVID-19 pandemic, namely, an accidental laboratory leak of a chimera virus or the transmission of a natural virus from an infected wildlife mammal. The overall Bayes factor is decomposed into 4 components: (1) the odds that the outbreak would occur in the People’s Republic of China (PRC); (2) the odds that the outbreak would occur in Wuhan, conditional on its location in PRC; (3) the odds of observing the spatiotemporal pattern of confirmed COVID-19 cases with no known link to the specific wholesale market where wildlife mammals were being sold, conditional on the outbreak taking place in Wuhan; and (4) the odds of observing the spatiotemporal pattern of confirmed vendor cases at that market, conditional on the outbreak taking place in Wuhan. These four conditional Bayes factors are estimated as 2.3:1, 20:1, 27:1, and 12:1, respectively, and hence the overall odds ratio is 14,900:1, indicating overwhelming evidence in favor of the hypothesis that the pandemic resulted from an accidental lab leak. This conclusion is robust to alternative specifications of the detailed statistical analysis.
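As a quick arithmetic check on the abstract, the four conditional Bayes factors can be multiplied together and the product converted into a posterior probability. This is only a sketch: the multiplication is valid only if each factor really is conditional on the evidence already counted, and the prior odds below are illustrative assumptions, not values taken from the paper.

```python
from math import prod

# Conditional Bayes factors reported in the abstract (lab leak : zoonosis).
bayes_factors = [2.3, 20, 27, 12]

# Overall Bayes factor is the product of the conditional factors.
overall_bf = prod(bayes_factors)


def posterior_probability(prior_odds, bayes_factor):
    """Convert prior odds and a Bayes factor into a posterior probability."""
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


print(f"overall Bayes factor: {overall_bf:.0f}")

# Hypothetical prior odds, chosen only to show how much the prior matters.
for prior_odds in (1.0, 0.01, 0.0001):
    p = posterior_probability(prior_odds, overall_bf)
    print(f"prior odds {prior_odds}: posterior P(lab leak) = {p:.4f}")
```

The product is 14,904, which the abstract rounds to 14,900. Even a Bayes factor this large leaves the conclusion dependent on the prior odds, which is one reason the individual factors deserve scrutiny.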

My answer to my colleague’s question posed above is, I don’t know.
I corresponded with the author of that paper as well as the author of the paper linked here. It’s hard to compute Bayesian probabilities for this problem, not so much because of the priors but because of the likelihood, which is the probability of the data given the model. One problem is the selection of what is considered to be data; the other is that the model (“lab leak” or “wet-market leak” or whatever) is not clearly specified–in statistics jargon, these are “composite hypotheses.” This is not a criticism of this particular paper per se; it’s just a general difficulty with this sort of analysis. It’s not clear to me that Bayesian inference is the right way to attack this sort of problem. But I’ve been intimidated by the technical biological details in all these analyses so I haven’t looked at them personally.
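The composite-hypothesis difficulty can be made concrete with a small sketch. A composite hypothesis has no single likelihood: its likelihood is a weighted average over its sub-hypotheses, so the Bayes factor depends on weights that someone has to choose. Every number below is invented for illustration, and the function name `composite_likelihood` is hypothetical.

```python
def composite_likelihood(sub_likelihoods, weights):
    """Likelihood of a composite hypothesis: weighted average over sub-hypotheses."""
    total = sum(weights)
    return sum(w / total * lik for lik, w in zip(sub_likelihoods, weights))


# Hypothetical P(data | sub-hypothesis) values, invented for this example.
zoonosis_subs = [0.001, 0.05]  # e.g. market spillover, some other zoonotic route
lab_subs = [0.02, 0.02]        # e.g. lab-modified virus, unmodified virus

lab_lik = composite_likelihood(lab_subs, [1, 1])

# The same data give different Bayes factors as the sub-hypothesis weights shift.
for weights in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    zoo_lik = composite_likelihood(zoonosis_subs, weights)
    print(f"zoonosis weights {weights}: BF (lab : zoonosis) = {lab_lik / zoo_lik:.2f}")
```

With these made-up inputs the Bayes factor moves from favoring one hypothesis to favoring the other purely because of the weights, without any change in the data.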

94 thoughts on “What do I think of this Bayesian analysis of the origins of covid?”

  1. I’ve tried to explain and frame Levin’s analysis here.

    https://michaelweissman.substack.com/p/big-news-on-covid-origins

    That also includes a couple of points where I disagree with Levin.

    On the composite hypothesis issue- yep, absolutely. I try to do a lowest-order account of that in my own analysis by dividing zoonosis into market-spillover vs. all other types, since a bunch of the evidence (tending both ways) is specifically about the market sub-hypothesis. Also I divide lab leaks into lab-modified or not.

    https://michaelweissman.substack.com/p/an-inconvenient-probability-v57

    Levin’s analysis is specifically about the much-touted market version, not about other less-discussed less specific zoonotic accounts.

  2. Anything claiming to be “Bayesian” but not doing actual Bayesian statistics (fitting models to data and then doing an inference from a posterior distribution, justifying all steps especially the prior) is untrustworthy. I have never seen a convincing example of it, and I don’t think there can be.
    A prominent string theorist argued for the existence of a multiverse with this approach (94% “probability that the multiverse hypothesis is correct”). This was considered a funny joke (rather than a serious argument), even by other string theorists.

    If you want to make an argument for a lab leak, then don’t use numbers that you pulled out of a hat.

    Here is a joking take on it: https://www.smbc-comics.com/comic/bayesian

  3. It’s not easy to assess this stuff since it isn’t science. But I would have thought an expert in Bayesian analysis would be competent to address the underlying “philosophy” of this approach even if the technical biological details are tough. In fact I think this is a problem for public understanding of science, since it suggests (to me) the possibility of dressing up pseudoscience with a veneer of “sciency”-sounding arguments in a long rambling discourse (562 references!) to fool the public, and being allowed to get away with it.

    Can’t help thinking that this approach is similar to those old “Intelligent Design” “arguments”, especially its use for specified events with very low likelihoods – like evolutionary events. As far as I can see, the features of the Covid19 virus are compatible with a natural origin and there is pretty robust evidence for where the virus might have come from and how the particularly (apparently) controversial features might have arisen. These events are improbable in a statistical sense but that’s evolution for you – unlikely things happen and then the probability collapses to 1. Or it could be a lab-leak with a collapsed probability of 1.

    Here’s my cynical view. For whatever reason that I don’t understand, this controversy has become a (typically for the US) factionalised circus with a manufactured “left” (scientists, apparently) created by groups that benefit from cheating the public out of knowledge that would be useful for them (public) to make informed choices. We know that’s the case (a manufactured “left” equated, in this case, to scientists) since Michael Weissman tells us so: “I think the left, particularly the scientifically oriented side of the left, is not doing our cause any favors by tying our reputation to claims that look highly likely to be false…” (it’s not clear who “our” refers to, but it certainly doesn’t include me). Since science generally has a strong evidence base, especially as we move through and past the periods of uncertainty within which misrepresentation and conspiracy theorising thrive, anti-science agendas need some way to counter scientific evidence. Global warming anti-science produced quite a bit of pseudoscience (often from emeritus professors, but mostly engineers rather than physicists), and someone has hit on the delightful wheeze that “sciency”-sounding stuff can be dressed in a “Bayesian” wrap to face down the science.

    Actually, the original Richard Muller assertion of a lab leak based on the “unlikelihood” of a particular twin-Arg coding sequence was a sort of “Bayesian” analysis without the numerology – contrive your “prior” (very unlikely this nucleotide sequence would occur naturally), observe the extant reality (the nucleotide sequence is there) and hey presto, it’s a watertight lab-leak deduction! The sad thing about all of this is that it seems (to me) to be an attempt to shut down rational discourse of the sort that we all really need access to.

    • I agree with this! It seems like a great cover for motivated reasoning: use some Bayesian analysis to come to your preferred conclusion. What is missing here is any input from relevant experts (virologists, genomics, epidemiologists, etc) to inform the analysis.

    • Chris:

      I disagree with your statement that this stuff “isn’t science.” It’s some sort of approximate scientific model. It’s not deductive reasoning such as we learn in intro physics class, where we can work out the motion of a pulley or whatever, but it’s still science: it’s reasoning about the outside world based on hypotheses, models, and experiment.

      I do agree that I, as an expert in Bayesian analysis, should be competent to address this approach. It would just take effort on my part. Understanding an analysis done by others takes work! I do it a lot in this blog, so you might say it comes naturally to me. But it doesn’t happen by itself. It takes concentration. To put this in terms that I’ve used in other posts, I can just reel off a comment like this–it’s coming out of my inner chatbot, as it were–but to figure out what’s going on in these covid analyses, I’d have to put in some intense focus regarding these covid models, something I just don’t really feel like doing. If it seemed important enough, or if someone were to pay me enough, or ask me nicely enough, or whatever, I could imagine doing it, just as I could imagine going out to the track right now and running four miles, something I could also do and which would be good for me and then I’d feel better about myself after etc. Instead, though, I’m writing this comment and doing five pull-ups, both of which feel like a pleasant small contribution (to the discourse and to my health, respectively).

      Your second paragraph is a good summary of why I didn’t feel ready to evaluate these covid-odds probabilities: as you point out, much depends on how the hypotheses and data are specified, and that was the point where I was getting stuck and gave up. The Bayesian analyses made me uncomfortable, not for any political reason but because I couldn’t quite pin down what were the hypotheses and data.

      Finally, I disagree with your cynical view. Or, let me say, I’m sure your cynical view applies to some prominent people who have inappropriately politicized covid (I’d say that some politicization is appropriate, for example when appropriating credit and blame for various good and bad decisions made at the time). But I think it’s reasonable for people to put these analyses out there, in a form that epidemiologists can critique.

      • Andrew –

        But I think it’s reasonable for people to put these analyses out there, in a form that epidemiologists can critique.

        Sure, it’s reasonable, but there’s a resulting problem.

        Imagine a scenario where a high profile economist puts out a Bayesian analysis of climate change being a hoax, and says it’s highly probable. And imagine that analysis is based on an assumption that goes against the existing scientific evidence.

        So there’s a ton o’ publicity around that analysis, amplified on Joe Rogan and picked up by Trump who says “Some scientists are saying that climate change is a hoax.”

        Or here’s an example where a Civil Engineering academic posts a published theory on a high profile blog that climate scientists have the causality reversed regarding CO2 in the atmosphere and climate change:

        https://judithcurry.com/2023/09/26/causality-and-climate/

        The difficulty here as a system is that it then becomes incumbent on climate scientists to go around playing whack-a-mole.

        I don’t know of, and don’t advocate for, any way to prevent this. But there’s an increasingly troublesome systemic issue. In some ways breaking down preexisting barriers to communication of science is great. But it’s also, imo, on balance having a harmful impact.

      • Fair enough – it would be useful though to have some informed view of the use of Bayesian analysis for comparing “probabilities” for disparate events that must have quite different probabilistic underpinnings.

        For example there is some evidence that the Covid19 sequence might have arisen as a result of genetic recombination amongst precursor viral species (or even precursor viral and infecting bacterial species). The likelihood of the particular recombination event(s) that produced the exact Cov19 viral sequence is small in itself, but it seems to me that this likelihood is small in the same way that producing a specified deal from a 52 card deck is small – there are gazillions of potential recombination encounters in the wild that might eventually interface with human populations and the likelihood of recombination events will actually be high in the same way that the probability of producing some deal from a 52 card deck is high – after all some deal must be produced and this only appears unlikely if a particular deal has been prespecified. The probabilistic assessment of a “lab-leak” is of quite a different nature and it’s not obvious how one can meaningfully rank these “probabilities”.
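The card-deck analogy above can be put in numbers. The sketch below contrasts the probability of an outcome named in advance with the probability that some outcome occurs, and then shows how a rare event becomes likely given enough independent opportunities. The per-encounter probability and the number of encounters are hypothetical placeholders, not estimates of anything biological.

```python
from math import comb, expm1, factorial, log1p

# Probability that a shuffle yields one full-deck ordering named in advance.
p_prespecified_ordering = 1 / factorial(52)

# Probability that a 5-card deal yields one hand named in advance.
p_prespecified_hand = 1 / comb(52, 5)

# Probability that a shuffle yields *some* ordering: every outcome counts.
p_some_ordering = 1.0


def p_at_least_one(p_single, n_trials):
    """P(at least one success in n independent trials), computed stably."""
    return -expm1(n_trials * log1p(-p_single))


print(f"P(prespecified ordering) = {p_prespecified_ordering:.3e}")
print(f"P(prespecified 5-card hand) = {p_prespecified_hand:.3e}")
print(f"P(some ordering) = {p_some_ordering}")

# Hypothetical: a one-in-a-billion event given ten billion opportunities.
print(f"P(at least one rare event) = {p_at_least_one(1e-9, 1e10):.4f}")
```

The point of the contrast is that a tiny probability for the specific observed outcome says little on its own. What matters is how that probability compares across hypotheses.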

        I might also add, addressing those familiar with the nitty gritty details of this controversy (which I am not) that there are some features of the Covid19 sequence that seem to me to argue against a lab-leak. For example the glycosylation sites are rather odd from a lab-leak perspective. Maybe someone who has studied this controversy could indicate whether there is any evidence that the Wuhan virologists engaged in introducing glycosylation sites and what the evidence is for this – Michael Weissman perhaps if you read this?? I’m curious to explore this from the scientific point of view.

        • Chris wrote: “I might also add, addressing those familiar with the nitty gritty details of this controversy (which I am not) that there are some features of the Covid19 sequence that seem to me to argue against a lab-leak.”

          I don’t understand. Is it not the case that one lab-leak scenario is that the following occurred: (1) researchers gather a sample of a virus and take it to their lab; (2) researchers make many copies of the virus; and (3) researchers drop a flask/leave a door open/fail to properly decontaminate/etc and the virus, essentially unchanged from its wild configuration, is released into the community?

          Here’s a discussion of how such releases can occur: https://kffhealthnews.org/news/article/lab-leak-biohazard-wastewater-book-excerpt-pandoras-gamble-alison-young/

          Bob76

        • Bob76 –

          I don’t understand. Is it not the case that one lab-leak scenario is that the following occurred: (1) researchers gather a sample of a virus and take it to their lab; (2) researchers make many copies of the virus; and (3) researchers drop a flask/leave a door open/fail to properly decontaminate/etc and the virus, essentially unchanged from its wild configuration, is released into the community?

          If that’s what occurred, would you not expect to see an early outbreak centered at that lab, and among lab workers, and their families?

          It seems there is some scientific dispute as to whether the early outbreak was centered at the market, and I’m not able to really analyze that dispute, but as far as I know there’s no clear indication of an early outbreak centered at the lab or among lab workers and their contacts.

        • Bob76- That was indeed a highly plausible hypothesis early on.
          It’s been superseded by ones involving some lab modification, mainly for two reasons.
          1. The very recent sudden insert of an unusually coded FCS. That fits known lab plans quite well but is a poor fit for zoonotic evolution. (This is a real factor but often overstated in LL cases.)
          2. The striking similarity of a restriction enzyme segment pattern to the ones used in artificial chimeric viruses. The pattern is found in ~1/600 natural coronaviruses. By itself that at first seemed too shaky to use. Then further drafts of the DEFUSE proposal were found, including plans with stunning similarity to the observed pattern. 6 segments, check. Commercial vendors which means each segment should be under 8knt, check. Recognition sites left in for making future swaps, check. Continued WIV use of BsmBI/BsaI enzyme pair. Sort of check- the draft budget mentioned only BsmBI.
          So sequences fiddled with in the planned way have become more probable than straight gathered sequences.

        • Regarding Weissman’s reference to the restriction sites, this was a poorly received paper by authors who also think that SARS1, Omicron, Ebola and a host of other epidemics are lab-engineered or lab-leaked. Even pro-lab leak scientists (Francois Balloux and Alina Chan) find this paper to be weak. As Alex Crits-Christoph describes in the series of comments below, not only are the restriction sites (sequences marking where the genome is cut by the molecular scissors) found in other naturally occurring viruses, unrelated nearby sites are also found to be occurring naturally. “For the engineering hypothesis, this would have to imply that someone not only modified the RE sites to match natural viruses, but also unrelated nearby sites as well – an even more ludicrous proposition that I do not think even these authors can defend.” https://www.biorxiv.org/content/10.1101/2022.10.18.512756v1#comment-6019875558
          Faced with this, lead author Bruttel resorted to saying that the other genomes were published by Chinese researchers to cover up their tracks with regards to SARS2. For good reason, leading coronavirologist Ralph Baric called this work “pathetic” in his Congressional testimony.

        • Response to Michael at 11:59. I was responding to the statement that some sequence info argues against a lab leak. My point was that I expect that labs contain viruses that are identical/highly-similar to their wild cousins. So, finding wild-like virus in a lab leak seems quite possible.

          I don’t know enough to get into issues like the FCS. I understand how it can be regarded as evidence for the lab leak hypothesis.

          Re Joshua at 11:15.

          I agree that it would be more likely that someone often at or near the leaking lab would be infected than would a person who only walked by once (on his way to the market). Of course, if you look at the Fort Detrick event that I cited, it was the people downstream in Frederick, Maryland who were exposed, not the lab workers. Similarly, if soot particles covered with virus went up a lab’s chimney the human exposure might occur miles away.

        • BW- Those factors, generally describable as either multiple comparison issues or as uncertainty about P(pattern|LL) were in fact serious and account for why I did not initially use the RE pattern for a BF. After Kopp FOIA’d the DEFUSE drafts, the situation changed. Instead of a diffuse set of lab possibilities that might overlap the observed pattern, it became a well-specified lab plan that hit the zoonotically rare pattern right on the dot. That’s when I started using it for a BF, quite conservatively. Your points are seriously out-of-date considering this firm new evidence.

          I’m not particularly interested in the ad hominem arguments since the sequences speak for themselves.

        • Appending the following to Weissman’s comment about the six segments because this was posed as a question to Baric in his Congressional testimony.

          So the first thing, what these are – these lines describe naturally occurring BsmBI sites in the SARS coronavirus 2 genome. Now, one of the first things you notice is that those same sites are present in many of the bat strains that exist. So if they are engineered, if you use them to engineer SARS2, they wouldn’t normally be in the same location in the bat strains.

          The second thing is, they do count six pieces, but one of the pieces is about 8 KB and the other is about 300 base pairs. If you look at any of the molecular clones that I’ve engineered, with SARS, they’re usually 5. KB apart, so that you have five or six KB pieces that you can work. Having a tiny little piece like that, if I looked at it, that would irritate me, like, to no end, and we would silence it one of those sites. And then separate this, so that the fragments are of equal size. The first size piece is also too small, and so it leaves larger pieces, and the larger clones are unstable with passage.

          So you would want it more equally distributed, unless there was a region that was super toxic. If there was a toxic region, then you would have a little piece. There’s no toxic site there.

          So this is biostatistical BS, in my opinion. And they come up and say that the pattern here is unique, and they do that by comparing most of the pattern to clade 2 and clade 1B coronaviruses. So the statistical number that they have for the ones that are far away is much more, and it gives them statistical power to make the claim that it was engineered.

          And it’s a pathetic piece of work. By the way, you can see how I engineered the SARS-CoV-2 genome since it’s published, and you will see that it’s completely different than this.

          So the claims Weissman is making are these. That there are six segments shows that the virus was engineered. But it was engineered in a way that Baric, as a coronavirus engineer, would consider improbable. Yet there are naturally occurring patterns that match the SARS2 virus. And by extension, Baric is in on the conspiracy. And per Bruttel, the Chinese researchers who published the other genomes, and incidentally ahead of Bruttel’s paper, are in on the conspiracy with the WIV as well in anticipation of an eventual such paper.

        • BW- Here’s another way to look at the issue, capturing the actual chronology. Instead of calculating P(pattern|LL)/P(pattern|ZW) let’s calculate
          P(DEFUSE draft|LL)/P(DEFUSE draft|ZW) since the draft was observed last.
          All those comments you mention ridiculed the idea that any lab would put together a chimera in anything like that way. How likely was it WIV had even been thinking of chimeras? If they were, wouldn’t they almost certainly have used the easier no-see’m approach, wiping out the recognition sites? I checked before posting my first version with strong zoonosis proponent Friedemann Weber and he emphasized the improbability of leaving the sites in. He wrote that maybe they’d be left in if the plan was to keep swapping new things in but that was a hassle and would require making sure that the sequences didn’t disrupt the AA coding.
          So I didn’t use any factor along these lines.

          Then much later the DEFUSE draft plans came out to:
          Use 6 RE segments
          for repeatedly making new swaps
          while being careful not to disrupt AA coding.

          What are the odds that exactly those plans would show up for a Wuhan project if the observed pattern wasn’t planned? All those guys who ridiculed the whole idea made it very clear that under the Z hypothesis they thought the likelihood of there being any such plans was negligible.

          BTW, unlike for some other features there’s no natural selection argument to confound things. There’s strong selection for this pattern in a lab chimera scenario but, unlike FCS features, it’s entirely transparent to natural selection.

        • Weissman seems to be saying that they left the restriction sites in. And I’m saying that SARS2 has restriction site sequences identical to other viruses in nature. It wasn’t left in in SARS2 any more than it was left in in the other viruses. There must be something about the lab leak position I can’t understand here, or it involves an even more elaborate conspiracy. Most likely the latter.

        • BW- As I discuss in painful detail in my blog, even if one were to grant that all the RE sites appearing in related sequences that happened to be present or absent in SC2 occurred naturally, one still gets a low probability that the right type of pattern would emerge. It’s bigger than 1/600 = 0.0017 but less than 0.028. I ignore all simulations that give less than the empirical 1/600. You only get to 0.028 by starting with a specially selected Crits-Christoph alleged ancestral sequence. Starting with a common ancestor found without trying to maximize this probability gives 0.0066. In my Bayes factor I use an intermediate value, 0.0065, that should be reasonably close.

          As for Baric’s tastes, it seems this wouldn’t have been his favorite recipe, especially for a one-time sequence. Most likely he didn’t make it. For anyone who wanted to swap things in and out piece by piece it helps to have all the BsmBIs on one side and the BsaIs on the other, so you can do some swaps without digesting the whole thing. The segment in between, the one that irritates Baric, would only be reachable by double digestion, so anyone explicitly planning repeated swaps would want it to be small and boring.
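To see how much the resulting Bayes factor depends on the inputs, here is a minimal sketch using the three values of P(pattern | zoonosis) quoted in this comment. The comment does not state a value for P(pattern | lab leak), so the numerators below are hypothetical placeholders, and `bayes_factor` is an illustrative helper rather than anything from the cited analysis.

```python
def bayes_factor(p_given_ll, p_given_z):
    """Ratio of likelihoods: P(pattern | lab leak) / P(pattern | zoonosis)."""
    return p_given_ll / p_given_z


# Values of P(pattern | zoonosis) quoted in the comment.
p_pattern_given_z = {
    "empirical 1/600": 1 / 600,
    "value used in the comment": 0.0065,
    "quoted upper bound": 0.028,
}

# Hypothetical P(pattern | lab leak) values; assumptions for illustration only.
assumed_p_pattern_given_ll = (0.05, 0.2, 1.0)

for label, p_z in p_pattern_given_z.items():
    factors = [bayes_factor(p_ll, p_z) for p_ll in assumed_p_pattern_given_ll]
    formatted = ", ".join(f"{bf:.0f}" for bf in factors)
    print(f"P(pattern|Z) = {p_z:.4f} ({label}): BF = {formatted}")
```

Under these assumptions the factor ranges over more than two orders of magnitude, so the choice of both numerator and denominator needs explicit justification.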

        • Weissman makes a concession that the SARS2 ‘engineering’ is not to Baric’s taste and so he probably didn’t do it. Yet, apparently Bruttel and co discovered a rare pattern. The comments to the paper are revealing. “One criterion for IVGA is even spacing of cut sites, yet the SARS-CoV-2 BsaI/BsmBI fingerprint includes a tiny 643nt fragment within nsp13/helicase. This region is identical to SARS except for a single I->V mutation, so why would anyone engineering this virus want to separate this fragment to then do nothing to it? I prefer to believe this is just a cherry-picked pair of restriction enzymes and random noise. One could argue that this is similar to the less even iWIV1 cuts (Figure 2B), but iWIV1’s shortest fragment is still much larger than SARS-CoV-2’s and is only so short because of plasmid instability reported by Zeng et al. This unstable region is, however, very far from SARS-CoV-2’s short fragment, so again I see no reason why anyone would need this short fragment for engineering.”
          So, it is not just Baric who finds the very short fragment problematic for an engineered virus. But this is how Weissman applies Bayesian reasoning. Lab leak preferred narratives get high weighting. The vast number of factors favoring natural origin get low weighting. Two words describe this approach – motivated reasoning.

        • The Bruttel paper argues that the maximum fragment size in SARS2 is unusually small. A second comment addresses this issue: “The distribution of random fragment lengths is a beta distribution. (Roach JC. Random subcloning. Genome Res. 1995 Dec;5(5):464-73. doi: 10.1101/gr.5.5.464. PMID: 8808467.) Maximum fragment length is not a robust statistic – it has high variance.” In response, co-author Washburne resorts to special pleading about maximum fragment size being a relevant statistic only for engineered viruses. But at the same time, this special pleading was not used to address that there is that tiny fragment that an engineer would not choose. This is purely motivated reasoning and data mining. At this point, all of the other background of the authors also becomes relevant – calling multiple epidemics lab engineered or lab leaked, impersonating mainstream authors in anonymous paper reviews etc.
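The quoted remark about maximum fragment length can be explored with a small simulation. This is a toy model, not the method of either the paper or its critics: it assumes cut sites fall uniformly at random along a linear genome, uses a round genome length of 30,000 nt as an assumption, and takes five cuts so that there are six fragments as discussed in this thread.

```python
import random
import statistics

GENOME_LENGTH = 30_000  # assumed round figure for a coronavirus-sized genome
N_CUTS = 5              # five cuts give six fragments


def fragment_lengths(rng, genome_length=GENOME_LENGTH, n_cuts=N_CUTS):
    """Cut a linear genome at distinct uniformly random positions."""
    cuts = sorted(rng.sample(range(1, genome_length), n_cuts))
    edges = [0] + cuts + [genome_length]
    return [b - a for a, b in zip(edges, edges[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
max_lengths = [max(fragment_lengths(rng)) for _ in range(10_000)]

# quantiles(..., n=20) returns 19 cut points; the first and last are the
# 5th and 95th percentiles.
percentiles = statistics.quantiles(max_lengths, n=20)
print(f"mean longest fragment: {statistics.mean(max_lengths):.0f} nt")
print(f"standard deviation:    {statistics.stdev(max_lengths):.0f} nt")
print(f"5th to 95th percentile: {percentiles[0]:.0f} to {percentiles[-1]:.0f} nt")
```

Real restriction sites are not uniformly distributed, so the spread printed here only illustrates how variable the longest fragment can be under a simple null model.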

      • Andrew, but science is always at least partly inductive right? All observations are samples of sorts. We can’t be sure the next observation will be similar. Even Newton’s laws had to be modified eventually.

        I guess we are counting on you as a Bayesian expert to eventually come up with an opinion. I think your “composite hypothesis” criticism is very telling: a good way to stack the deck.

        As you say, analyzing this is a formidable task. It seems to me virologists and all people who work with viruses have to weigh in. Even the author says you can’t reach a conclusion on one study, more is required.

        This is a working paper. I guess a blind review is now impossible.

        Good luck with this!

    • Chris- You seem to be suspicious about my political alignment. FEC donation records are accessible. So are records of the international SDI boycott, an article I co-wrote for Bull Atom Sci, and of Barbara Boxer’s nominees for a Nobel Peace Prize. Or records of my draft conviction and a subsequent post-prison NYT op-ed. Or a series of blogs on Daily Kos.

      • I really don’t have any suspicions about your political alignment, Michael – in any case what does “political alignment” have to do with science? It’s more a suspicion about the dreary manufacture of “sides” and then labelling scientists who publish properly evidenced research as “left” as if the science of Covid origins is one that must be politicised.

        Actually, whenever I see the labelling of individuals as “left” or “leftist” I’m reminded of what Robert Hughes wrote way back in 1993 in his book “Culture of Complaint”:

        “In the last fifteen years American conservatives have had a complete, almost unopposed success in labelling as left-wing ordinary agendas and desires, that in a saner polity, would be seen as ideologically neutral”.

        That was over 30 years ago and here we are still :)

        Anyway – I’ve prolly posted too much on this thread and will give everyone a break now

        • Chris- We’re agreed on this. I absolutely can’t stand the politicization of this scientific issue. I’m just trying to face the fact that it has become intensely politicized, although not with 100% alignment. That fact stares you in the face once you get involved in this scientific question. I don’t like denying facts either about viral origins or about social behavior, although the social ones are trickier to understand.

          E.g. how did defending research that Obama banned and Trump restarted get to be left-coded? But that’s a topic for fuzzy speculations another day.

    • Actually there are tabulations of the fraction of RR’s that are coded CGGCGG. In Asian betacoronaviruses it’s less than 1/10,000. So that would be an extremely strong piece of evidence if that CGGCGG were in a special LL region (it is, FCS) and there were no independent reasons to think that FCS was special under Z. There are, however, independent reasons to think that the FCS is not a normal part of the sequence even under Z: the lack of AA changes from relatives in nearby S2 and the failure of the initial strains to have the D614G mutation that’s heavily favored once the FCS is there.

      So instead of (100000/a few) this LL-favoring BF becomes much more modest. A very recent one-step insert of a 12 nt FCS sequence is obviously right down the LL alley but (conditional on selection) not so crazy for Z either. I end up with about a factor of 7 for this feature by looking at CGGCGG sequences in potential sources of transcript-switching inserts.

    • Chris: My speculation regarding the reason, is that a certain class of pundits have taken to promoting the “lab leak” issue as a way to establish their centrist, moderate credentials. It’s specifically much less about the scientific merits of the idea, than the political circus as a social process. It’s these pundits using it to “signal” that they’re not doctrinaire liberals, but instead are heterodox independent-minded thinkers. That they’re not in the right-wing fever-swamp, but also are untainted by the disease of reflexive close-mindedness which is presumed to afflict said doctrinaire liberals. It’s a very “safe” topic to do this, as compared to, say being “open-minded” about (forgive me for mentioning this) allegations that the Democrats stole the 2020 Presidential election and all the associated junky Bayesian statistical arguments in favor of that idea. Though who knows, under Trump II, we might now see a revival of those arguments too!

      • Seth:

        I disagree, at least when it comes to the two people who sent me these particular articles. I don’t know either of these people personally; it’s just my judgment from my correspondence with them that they are driven by a mixture of a sense of importance of the topic (wanting to minimize the chances of new outbreaks of new diseases) and annoyance at how the covid origins issue has been discussed. I’m not saying there are no political motives; I just see no evidence that the political motives are to position themselves in some way. These two people are not pundits.

        If you’re talking about a stance taken by a politician, or a news media figure, or some sort of public intellectual like Niall Ferguson, that sort of motivation could make some sense, but I don’t see it here. Also, it took a lot of work for them to write those papers; to satisfy the goal of positioning themselves as centrists, it would be enough for them to just take a position regarding covid origins and maybe write a page on the topic.

        An example of position-taking without any work would be law professor and public intellectual (I’d say “would-be public intellectual” but it seems that he has a lot of influence!) Adrian Vermeule, who promoted 2020 election denial conspiracy theories without going to the trouble of trying to evaluate them.

        • Andrew: I was responding to what Chris said about “For whatever reason that I don’t understand, this controversy has become a (typically for the US) factionalised circus …”, and giving my speculation as to the reason for the circus. It’s pundit-driven, in a way somewhat analogous (though not 100% identical, I’m just talking about media frenzy) to how the Satanic pedophile ring stories led to an armed guy invading a pizza parlor. I’m not at all asserting that every single person involved is such a pundit, most especially the very rare people independently writing long analysis. Indeed, that armed guy was apparently personally totally sincere in his investigation of the issue, at great personal cost to himself. And note, he did perform an empirical investigation (granted, not a good one, but he apparently tried to the best of his ability).

          For illumination of how I see “centrist” punditry driving the issue, I’m drawing a contrast with the 2020 election fraud claims. Even though there’s very fertile ground to write “heterodox” statistical investigations there, you just don’t see – or at least I just don’t see – a *widespread* similar type of “centrist” argument claiming that it’s imperative for the left to restore the trust Americans have in their electoral system, which then involves facing up to the “fraud leak” which might have taken place and caused the pandemic, err, stolen the result. Again, *widespread* – not that this never happens, but comparatively, it’s not a viewpoint that’s seen outside of fringes.

          And, sigh, of course every analysis – lab leak, election fraud, Satanic pedophile ring – needs to be evaluated on its merits. But they don’t take place in a vacuum either.

        • Seth:

          I disagree with your claim that there is “fertile ground to write ‘heterodox’ statistical investigations” supporting the claim of massive election fraud in 2020. Serious people have looked into such claims and there’s nothing there.

          Also, I don’t think that anyone would claim that “it’s imperative for the left to restore the trust Americans have in their electoral system.” Knowledgeable people on the left, right, and center recognize the possibility of major problems in the electoral system, and it would not be good to inappropriately “restore trust.” There’s no evidence of serious fraud in 2020, but that shouldn’t make us confident that there will be no problems in the future. In addition, there are many problems in the electoral system other than fraud. Remember 2000, when they didn’t count all the votes in Florida? Or various cases of vote suppression? These aren’t fraud, exactly, but they are problems. And there could be major fraud in the future. Vigilance is important.

          Similarly with pandemic outbreaks. There’s debate about what happened with covid in 2019, but everyone’s concerned about future outbreaks, which everyone recognizes could come from various sources, including labs and wild animals.

        • Andrew: I think you’ve misread the intent of my comment. I meant it along the lines of, imagine an alternate world where the ground facts are the same, but for some social reason, many non-fever-swamp pundits want to prove their “independent thinker” bona-fides by criticizing the viewpoint that the 2020 election was accurate, by using the same argument types deployed for lab leak discourse – e.g., “close-minded” people who want to shut off “debate” are bad, we must always be intellectually open, that review which showed no evidence is just one paper and it’s flawed this way (“biased writer” is always a good one), what about this other piece of alleged evidence, on and on. In such an alternate world, I’d contend many people who are not pundits would be inspired to generate extensive analysis “suggesting” fraud, and that’s what I meant by “fertile ground”. Not that the claims are accurate, but that it’s relatively easy to work up a poor statistical “proof” (that’s of course a big topic of this blog!) from various election data. Remember, it doesn’t matter how many are debunked – there’s always another one which can be done. Which of course must be carefully evaluated on its merits …

    • There have subsequently been several CGG-CGG insertions and even a CGG-CGG-CGG insertion (all in frame coding arginines) during the pandemic. Probably more — I haven’t looked for a year and a half at least. That’s only looking at one site with common insertions that have pretty blah impacts on viral fitness. If the SARS2 insert happened all at once, it would probably be subject to much stronger selection.

      This shouldn’t be surprising. CGG is prone to trinucleotide repeat expansion. That’s what causes Fragile X Syndrome. People who watch medical dramas on TV probably know about CAG repeats that cause Huntington’s disease; similar thing. CGG and CAG both encode for amino acids (arginine and glutamine) that are polar and often found in disordered loops — exactly the sorts of things you expect to find in RNAs encoding diverse olfactory receptors and so on. And if you go hunting for possible donors for a 12-nucleotide insert in host RNAs, that’s what you tend to find… too many candidates that work well to ever be certain one is right.

      The rarity of CGG in bat coronavirus genomes (and in human coronavirus genomes, unsurprisingly) is the product of centuries of mutation through cycles of viral replication. Comparing an insert that was acquired probably not many years prior to the pandemic to genomic compositions in equilibrium is something that wouldn’t fly in Intelligent Design circles. If it weren’t for the appeal of spooky mad scientist fantasies, folks would realize how tiny this evolutionary gap is and how you don’t need a God to fill it. The most plausible lab leak scenario has always just been an accident with a natural virus.

      David Baltimore considered the idea and rejected it quickly because it was a dumb idea. Same story for Kristian Andersen and a BamHI restriction endonuclease site in the SARS-CoV-2 genome. Some people still hang on to this stuff for 5 years not because they are good ideas, but because they sound good to people who haven’t thought about it at all.

  4. Can you share where you got the “lab leak is a low likelihood event” idea from? I hope somewhere more specific than “tv/movies”.

    That seems to be a key point of misinformation, so we should track it to the origin.

    • The misinformation is the “lab leak is a high probability event”. See what relevant experts in virology, etc think of the issue rather than economists and other non-biological ‘experts’ say.

      • Not at all, but there is nothing preventing you from informing yourself, as I did back in 2020, and have shared on this blog many times since (including only a few weeks ago).

        But I am not interested in convincing of anything here. Rather I would like to know where this “low likelihood” idea is coming from.

        So far there are two responses from apparent “believers” with no source.

        Is it really just people who think movies/tv are reality? That’s been my working assumption.

        • Anoneuoid –

          How many labs studied SARS-CoV-2 during the pandemic?

          How many outbreaks do you see associated with those (thousands?) of labs doing that work over 5 years?

          What was your predictive model for some of those labs to have a leak? How well did the unfolding reality match your prediction?

        • The misinformation is the “lab leak is a high probability event”. See what relevant experts in virology, etc think of the issue rather than economists and other non-biological ‘experts’ say.

          What expert told you this?

          For a comment below, I was able to easily find multiple experts and data saying it is common.

          Like some kind of leak happens monthly (or even more frequently). Coronaviruses in particular leak every year or two. This includes *even covid itself* which has leaked at least twice since we knew about it.

          Where did you get this idea it is rare? Was it tv/movies or did someone tell you this?

    • ‘Can you share where you got the “lab leak is a low likelihood event” idea from? I hope somewhere more specific than “tv/movies”.’

      It’s interesting that you ask the question in that way, because a lab leak is what I’d expect to see from tv/movies. How many B-movies have there been where the “monster” is a product of human folly, e.g. radiation, toxic waste, mad science experiments, etc.

      • Right. And since we don’t see monsters/zombies all the time, it’s presented as some rare, exceptional event.

        And just look at the responses, it’s like asking 10 yr olds where they heard a curse word.

        Really, where did this idea come from?

      • I hate to do this but there are lots of genuine pre-2020 priors from expert sources. So I’m just cutting and pasting from my blog here because so many refuse to read it. This passage describes general priors, another is more specific to WIV.

        ****
        Now let’s turn to lab leaks. The number of labs doing risky research has grown dramatically in recent decades. For example, Demaneuf and De Maistre show a growth of a factor of ten in the number of BSL-3 labs in China between 2000 and 2020. The book Pandora’s Gamble amply documents that pathogen lab leaks are common, including in the US. A more recent summary describes over 300 lab-acquired infections and 16 lab pathogen escapes over a two-decade period. These are almost always caught before the diseases spread. Nevertheless, in 2006, the World Health Organization warned that the most likely source of new outbreaks of SC1 would be a lab leak, confirming that the danger of lab leaks was large according to consensus expert opinion. In 2012, Klotz and Sylvester warned of lab leak pandemic dangers in a Bulletin of the Atomic Scientists article.

        There’s an important caveat, however. So far as we know, all of the past epidemics that came from labs (e.g. 1967 Marburg viral disease in Europe, 1979 anthrax in Sverdlovsk, 1977 influenza A/H1N1) were caused by natural pathogens. That’s not surprising, since until recently nobody was doing much pathogen modification in labs. The main modern method was only patented in 2006 by Ralph Baric, who was to have done the chimeric work on bat coronaviruses under the DEFUSE proposal. Without lab modification, only ZW and ZL would be viable hypotheses.

        We know, however, that lots of modifications are underway now in many labs. In the same year Anthony Fauci conceded the possibility that such research might cause an ”unlikely but conceivable turn of events …which leads to an outbreak and ultimately triggers a pandemic”. The dangers were perceived as substantial enough for the Obama administration to at least nominally ban funding research involving dangerous gain-of-function modifications of pathogens.

        When that ban was lifted under Trump in 2017, Marc Lipsitch and Carl Bergstrom raised alarms. Lipsitch wrote: “ [I] worry that human error could lead to the accidental release of a virus that has been enhanced in the lab so that it is more deadly or more contagious than it already is. There have already been accidents involving pathogens. For example, in 2014, dozens of workers at a U.S. Centers for Disease Control and Prevention lab were accidentally exposed to anthrax that was improperly handled.” Bergstrom tweeted a similar warning. Ironically, Peter Daszak, head of the EcoHealth Alliance, who became extremely dismissive of the lab leak possibility after Covid hit, gave a talk in 2017 warning of the “accidental &/or intentional release of laboratory-enhanced variants”.

        Similar warnings came from China. In 2018 a group of Wuhan scientists, mostly from WIV, wrote “The biosafety laboratory is a double-edged sword; it can be used for the benefit of humanity but can also lead to a ‘disaster.’ ” An extensive 2015 NIH-sponsored “Risk and Benefit Analysis of Gain of Function Research” concluded that lab modified coronaviruses could risk “increasing global consequences” by “several orders of magnitude.”

        Perhaps the most authoritative work came from the Global Preparedness Monitoring Board which issued a prescient report from the Johns Hopkins Center for Health Security. Although that report’s many authors include at least one who has emphatically ridiculed any thought that SC2 could have come from accidental release, on Sept. 10, 2019, just before the pandemic started or was known to start, the GPMB report warned

        Were a high-impact respiratory pathogen to emerge, either naturally or as the result of accidental or deliberate release, it would likely have significant public health, economic, social, and political consequences. Novel high-impact respiratory pathogens have a combination of qualities that contribute to their potential to initiate a pandemic. The combined possibilities of short incubation periods and asymptomatic spread can result in very small windows for interrupting transmission, making such an outbreak difficult to contain…

        Biosafety needs to become a national-level political priority, particularly for countries that are funding research with the potential to result in accidents with pathogens that could initiate high-impact respiratory pandemics.

        It is hard to see how such warnings would make sense if expert opinion held that the recent probability of a dangerous lab leak of a novel virus was negligible. For at least the last decade the prior probability P0(LL) of escape of a modified pathogen has not been negligible.
        ****

        • Here are some more I found after about a 30 min search:

          From Jan. 1, 2015, through June 1, 2020, the University of North Carolina at Chapel Hill reported 28 lab incidents involving genetically engineered organisms to safety officials at the National Institutes of Health, according to documents UNC released to ProPublica under a public records request. The NIH oversees research involving genetically modified organisms.

          Six of the incidents involved various types of lab-created coronaviruses. Many were engineered to allow the study of the virus in mice. UNC declined to answer questions about the incidents and to disclose key details about them to the public, including the names of viruses involved, the nature of the modifications made to them and what risks were posed to the public, contrary to NIH guidelines.

          […]

          April 2020: A UNC scientist underwent 14 days of self-quarantine at home after a mouse bite caused potential exposure to a strain of SARS-CoV-2, the virus that causes COVID-19, that had been adapted for growth in mice.

          https://www.propublica.org/article/here-are-six-accidents-unc-researchers-had-with-lab-created-coronaviruses

          On Dec. 11, the CECC verified that an assistant researcher, case No. 16,816, had contracted COVID while working in a P3 (Biosafety Level-3) facility.

          https://www.taiwannews.com.tw/news/4382708

          “Nearly every SARS case since the original epidemic has been due to lab leaks — six incidents in three countries, including twice in a single month from a lab in Beijing,” writes Dr. Zeynep Tufekci for The New York Times.

          In 2007, foot-and-mouth disease, a virus able to “devastate livestock,” escaped via drainage pipe leak from a U.K. lab with the “highest biosafety rating,” reports Tufekci. And don’t think America hasn’t made its own mistakes. In 2012, the U.S. Centers for Disease Control and Prevention reported “11 laboratory-acquired infections across six years,” typically in BSL-3 labs, a biosafety rating one step down from the maximum BSL-4. “In each instance,” writes Tufekci, exposure was not realized until “lab workers became infected.”

          https://theweek.com/coronavirus/1001991/lab-leaks-have-happened-before-and-probably-more-often-than-youd-think

          DR. GOTTLIEB: That’s right. These kinds of lab leaks happen all the time, actually. Even here in the United States we’ve had mishaps. And in China, the last six known outbreaks of SARS-1 have been out of labs, including the last known outbreak, which was a pretty extensive outbreak that China initially wouldn’t disclose that it came out of lab. It was only for- it was only disclosed finally by some journalists who were able to trace that outbreak back to a laboratory.

          https://www.cbsnews.com/news/transcript-scott-gottlieb-face-the-nation-05-30-2021/

          At biological research facilities across the United States and around the world, hundreds of safety breaches happen every year at labs experimenting with dangerous pathogens.

          https://www.theguardian.com/commentisfree/2023/may/30/lab-leaks-shrouded-secrecy

          In the fall of 2019, workers at a veterinary research center in the northwestern Chinese city of Lanzhou began to fall ill with a disease that caused fever, muscle aches, and other symptoms. Workers at a nearby plant that made brucellosis vaccines had been using expired disinfectant to treat waste gas; the gas was contaminated with aerosolized Brucella bacteria and wafted on southeast winds to the research facility. Eventually over 10,000 people were infected with the disease, which can cause long-term illness. This was just one of 16 times a pathogen escaped from a laboratory setting between 2000 and 2021, according to a new study in The Lancet Microbe.

          An international team of researchers looked for all the cases of infections acquired in a laboratory or times a pathogen accidentally “escaped” from a laboratory setting. They found 309 laboratory-acquired or -associated infections from 51 pathogens; eight of these cases were fatal, including one of “mad cow” disease. The 16 incidents they found of a pathogen escaping a lab setting included well-publicized accidents such as the time where a West Nile researcher became infected with the first SARS virus in 2003 after handling contaminated samples in Singapore. He went on to expose 84 contacts and risked re-igniting the 2002-2004 SARS epidemic, by then quiet in Singapore.

          https://thebulletin.org/2023/12/a-new-study-reports-309-lab-acquired-infections-and-16-pathogen-lab-escapes-between-2000-and-2021/#post-heading

          Laboratory-acquired infections (LAIs) and accidental pathogen escape from laboratory settings (APELS) are major concerns for the community. A risk-based approach for pathogen research management within a standard biosafety management framework is recommended but is challenging due to reasons such as inconsistency in risk tolerance and perception. Here, we performed a scoping review using publicly available, peer-reviewed journal and media reports of LAIs and instances of APELS between 2000 and 2021. We identified LAIs in 309 individuals in 94 reports for 51 pathogens.

          https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(23)00319-1/fulltext

        • Michael –

          It is hard to see how such warnings would make sense if expert opinion held that the recent probability of a dangerous lab leak of a novel virus was negligible.

          Was their finding that the risk of “a” lab leak would be negligible, or that the probability of a lab leak as the origin of SC-2 was negligible?

          It seems that there were labs experimenting with SC-2 during that pandemic. Do you know of outbreaks associated with that experimentation?

          Above, there’s a link provided where Gottlieb talks about outbreaks of SARS from lab leaks. How were those leaks clearly established? Was there something different about the evidence for those leaks that allowed for a definitive determination, in contrast to SC-2?

        • Joshua-
          “Was their finding that the risk of “a” lab leak would be negligible, or that the probability of a lab leak as the origin of SC-2 was negligible?”
          We’re discussing priors here. All the official sources said the priors were substantial. Some, like Lynn and Klotz and Lipsitch, quantified that. Since these were genuine priors they can’t refer specifically to SC2.

          “It seems that there were lab experimenting with SC-2 during that pandemic. Do you know of outbreaks associated with that experimentation?”
          Is that a typo for SC-1? There were about 10 separate SC-1 lab leaks that caused human illness and, IIRC, one death.

          “Above, there’s a link provided where Gottlieb talks about outbreaks of SARS from lab leaks. How were those leaks clearly established?”
          At the point those outbreaks occurred there was no SARS-1 circulating. The small number of people affected included lab workers and families. I forget if sequencing was also used but I think it was in some cases.

          ” Was there something different about the evidence for those leaks that allowed for a definitive determination, in contrast to SC-2?”
          Yes, at least two things. SC-1 is much more virulent so it’s harder to miss the initial cases. The lower R0 and near absence of asymptomatic transmission allowed SC-1 to be controlled by non-pharmaceutical interventions. That left essentially no background to obscure the obvious signal of the lab leaks.

          That’s why WHO concluded that the remaining SC-1 danger was primarily from new lab leaks, not recurrent zoonosis.

        • Michael –

          , Since these were genuine priors they can’t refer specifically to SC2….

          So then I’m confused by this statement of yours:

          It is hard to see how such warnings would make sense if expert opinion held that the recent probability of a dangerous lab leak of a novel virus was negligible.

          Which I took to be referencing the origins of SC-2, specifically. Seems there’s no incongruity in saying that the probability of SC-2 originating from a lab leak could be negligible even if probabilities of any viruses leaking anywhere aren’t negligible – contingent on the technical specifics of SC-2.

          That wasn’t a typo. It is my impression that many labs were experimenting with, or researching, SC-2 during the pandemic. Am I wrong about that? If I’m not, given that there were many, has there been evidence of (localized) outbreaks of SC-2 associated with those labs? Obviously, since there would be a background rate of spread it might be hard to identify localized outbreaks associated with a lab leak but I would think there might be some way to do so.

          The lower R0 and near absence of asymptomatic transmission allowed SC-1 to be controlled by non-pharmaceutical interventions.

          Makes sense. So then wouldn’t the higher R0 and asymptomatic spread make it more likely that if there were a lab leak with SC-2, you would see a large collection of unusual illness from infections associated with the lab, and workers at the lab, and hospitals near the lab, and communities surrounding the lab, very early in the pandemic?

          Anoneuoid –

          That’s quite an extensive list of lab leaks! Now I’m certainly no statistician but it seems to me that just talking about the numerator without specifying the denominator is of limited value. IOW, it has limited value in informing the probabilities of a specific leak. If the numerator is big but the denominator is enormous, wouldn’t that LOWER the probabilities that this one, specific virus being worked on in a lab, would start a pandemic?

          At any rate, in your 30 minutes of searching what did you find out about the denominator?
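The numerator-versus-denominator point can be made concrete with a rough sketch. The numerator below is the count of reported escapes from the Lancet Microbe review quoted above; the lab counts in the denominator are purely hypothetical placeholders, since no source in this thread gives them, and the 22-year window is an assumption about how "between 2000 and 2021" is counted. Reported escapes are presumably an undercount, so this illustrates the arithmetic, not an estimate.

```python
# Numerator from the Lancet Microbe scoping review quoted above:
# 16 reported pathogen escapes between 2000 and 2021.
escapes = 16
years = 22  # ASSUMPTION: 2000 through 2021 counted inclusively

# ASSUMPTION: hypothetical numbers of labs active per year.
# These are placeholders for illustration, not figures from any source.
hypothetical_lab_counts = [100, 1_000, 10_000]


def escapes_per_lab_year(n_escapes, n_labs, n_years):
    """Crude rate: reported escapes divided by total lab-years of exposure."""
    return n_escapes / (n_labs * n_years)


for n_labs in hypothetical_lab_counts:
    rate = escapes_per_lab_year(escapes, n_labs, years)
    print(f"{n_labs:>6} labs -> {rate:.2e} reported escapes per lab-year")
```

The same numerator gives per-lab rates that differ by two orders of magnitude across these placeholder denominators, which is why the denominator question matters.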

      • Joshua- I cite specific papers from before 2020 giving estimates of yearly probabilities of human transmissible viruses for labs under a range of conditions.

        The main relevant feature of SC2 is that it had a very high R0 right from the start. Given that WIV planned the work at BSL-2, I think the leak probability approaches 1. The more questionable probability is that their planned chimera would get such a high R0. But similar or smaller odds apply to human transmissible natural viruses.

        SC2 spread rapidly through ordinary social paths, e.g. the Biogen meeting in Boston. Your argument that later after other labs started working on it we’d see leaks from them standing out and being documented is certainly creative!

        The only obvious way to detect a later leak above background would be by special sequence features. I’ve seen claims that omegas might be from some further lab work but see no particularly good reason to believe them since by then there were millions of immunocompromised people generating interesting variants. There was also therapeutic molnupiravir use generating weird variants.

        • Michael –

          Your argument that later after other labs started working on it we’d see leaks from them standing out and being documented is certainly creative!

          Why thank you. :-)

          Although, I didn’t make such an argument, actually. I asked for information as to that question. I don’t make arguments about SC-2 origins much, because I lack the knowledge or the brains to do so. But I do ask questions.

          The only obvious way to detect a later leak above background would be by special sequence features.

          Wouldn’t a super-spreader pattern be an indication of a leak? If probabilities of a leak from labs working on SC-2 are high, and there were many such labs doing that around the world for 5 years, wouldn’t we expect to see super-spreader events associated with labs – widespread outbreaks among lab workers, their families, medical facilities nearby, etc?

          Certainly there were many super-spreader events identified (you mentioned one in Boston), so there was some ability to identify such events against a background rate. And more specifically, my question was about such a widespread association EARLY ON, with the lab at WIV – presumably before there was a widespread background rate of spread. Again, given the high R0 and the severity of infections (especially early on), wouldn’t an outbreak clearly associated with WIV have been likely to occur and be discovered? I doubt that there was sequencing done to demarcate all those super-spreader events. As I understand it, a major plank in many of the lab-mediated spillover arguments was that the animal market was, in fact, just such a super-spreader event. So how would a leak starting an outbreak be differentiated from a super-spreader event?

  5. One thing that makes me wary of a lot of purportedly Bayesian arguments is that I’ve seen them used in apologetics and anti-apologetics. Take John Earman’s “Hume’s Abject Failure” which is supposed to counter Hume’s “Of Miracles,” or Richard Carrier’s use of Bayes’ Theorem to try to argue that Jesus of Nazareth never existed at all, not even as a mundane human around whom legends were built. It’s easy to fool oneself into thinking one has a solid argument based on math when one really just has a “mathy” argument that depends on shaky premises.

  6. Many of the points raised above are discussed in my blog with empirical supporting evidence. These include:

    How often do viruses leak from labs? Lots of data. E.g. WHO officially considered lab leaks the main danger for reawakening SARS-1 based on how often it was leaking.

    How often do nasty pandemics start in nature? Obviously that’s something we count, rather than saying “gazillions of tries and tiny chance on each try” and then ending up with uselessly huge error bars.

    On glycosylation: Not my area and I forget if previous WIV publications screwed around with that. The WIV’s DEFUSE proposal definitely included explicit plans to fiddle with the glycosylation near the receptor binding motif.

    On the post hoc detailed pattern: That can be and has been used by both sides to argue against the other. Of course the proper way is to balance the details required for both sides. For the most part, details then drop out of the likelihood comparison. One also has to be careful to condition on observation, i.e. zoonotically rare features that increase transmissibility are not so rare in observed major diseases. Some LL types fail to do that.

    I see two broad ways in which such probabilistic estimates go wrong, in addition to more specific problems. One is that people love to do specific precise calculations of exact models, getting pseudo-precise factors by ignoring the fuzzier uncertainty around the choice of model. The other is to hide behind that fuzzy uncertainty to pretend that nothing can be estimated, and therefore one should just stick with one’s priors.

    I’ve seen the former type of error used by both sides. The latter one tends to be used more by the zoonosis types.

    Part of why my big blog is so hard to read is that I try to include both the empirical data and the fuzziness for each factor in a crude hierarchical way. The fuzziness (which brings the LL odds down from extreme values) is needed to be realistic. It strikes people who are used to doing school homework problems as “unscientific”. Tell it to Fermi.
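To illustrate the contrast between a pseudo-precise product of factors and one that carries some fuzziness, here is a minimal sketch. It uses the four point Bayes factors quoted in the paper's abstract (2.3, 20, 27, 12) and attaches an assumed, purely hypothetical uncertainty of half an order of magnitude to each. That uncertainty is treated as independent log-normal noise, which is a simplifying assumption and not the hierarchical treatment described in the comment above.

```python
import math
import random

# Point-estimate Bayes factors quoted in the paper's abstract.
point_bfs = [2.3, 20.0, 27.0, 12.0]

# ASSUMPTION: hypothetical 1-sigma uncertainty on each factor, in log10 units
# (0.5 means "known to within roughly a factor of 3"). Not from any source.
LOG10_SIGMA = 0.5


def combined_log10_bf(bfs):
    """Multiplying Bayes factors is the same as adding their logs."""
    return sum(math.log10(b) for b in bfs)


def simulate_interval(bfs, sigma, n_draws=20_000, seed=0):
    """Perturb each log factor independently; return a 90% interval (log10)."""
    rng = random.Random(seed)
    draws = sorted(
        sum(math.log10(b) + rng.gauss(0.0, sigma) for b in bfs)
        for _ in range(n_draws)
    )
    return draws[int(0.05 * n_draws)], draws[int(0.95 * n_draws)]


point = combined_log10_bf(point_bfs)
lo, hi = simulate_interval(point_bfs, LOG10_SIGMA)
print(f"point estimate: {10 ** point:,.0f}:1")
print(f"90% interval:   {10 ** lo:,.0f}:1 to {10 ** hi:,.0f}:1")
```

With four factors the assumed per-factor uncertainties add in quadrature on the log scale, so the interval on the combined odds spans several orders of magnitude even though the point product looks precise. Correlated or model-level uncertainty, which this sketch ignores, would change the width.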

    • On glycosylation: Not my area and I forget if previous WIV publications screwed around with that. The WIV’s DEFUSE proposal definitely included explicit plans to fiddle with the glycosylation near the receptor binding motif.

      Thanks Michael, do you have a link for that “DEFUSE proposal” and if it’s a grant application, can u indicate where in the document it discusses glycosylation plans?

    • > The fuzziness…is needed to be realistic. It strikes people who are used to doing school homework problems as “unscientific”. Tell it to Fermi.

      Fermi generated predictions for empirical data within a factor of 10. Your analysis, as far as I can tell, generates nothing.

      • Anon- Among the later-confirmed actual predictions that early LL types (not me) made were:
        1. That there were plans in Wuhan to work on coronaviruses similar to SC2
        2. Those plans would include adding an FCS
        3. Those plans would include making 6-segment chimeras with segments easily available commercially.
        None of these were predicted by Z.

        • Even assuming those are correct, none of those depend on the specific odds ratio, so why care about the numbers at all? It’s just pseudo-precision. Unless you have numerical predictions.
          This is a general problem with pseudo-Bayesian analysis.

        • By “work on coronaviruses similar to SC2” you are referring to things with relatively low similarity to SARS-CoV-1. This is a huge number of viruses that includes many things no more similar to SC2 than SARS-CoV-1 is.

          On the other hand, labs in multiple cities elsewhere in China were actually working with viruses similar to SARS-CoV-2!

          Sequencing one found in a pangolin: https://www.mdpi.com/1999-4915/11/11/979 — this remains one of the most closely related viruses to SARS-CoV-2 and was the favorite lab leak theory until BANAL-20-52 was sequenced.

          Growing another in suckling rat brains: https://pmc.ncbi.nlm.nih.gov/articles/PMC6135831/ — this is the basis for another popular lab leak theory from Li-Meng Yan et al.

          The most popular lab leak theories these days have something to do with vaccine research in Beijing.

          It would’ve been even more of an eerie coincidence had SARS-CoV-2 emerged in one of the cities hosting that kind of research rather than Wuhan.

          “Adding an FCS” that doesn’t appear in nature isn’t something that’s in DEFUSE. It’s not particularly well written so it’s easy for conspiracy theorists to misread it like that, I guess. It talks about looking for cleavage sites and other features in sampled viruses. It talks about looking for things one mismatch away from a functional FCS. It doesn’t talk about inserting 12 nucleotides no one’s ever seen before into a virus no one’s ever seen before. If this is a reasonable experimental plan, you’ll have no problem finding a precedent for it somewhere else in the entirety of scientific literature.

    • “It strikes people who are used to doing school homework problems as “unscientific” ”

      Two points:
      1 – Dude, really not cool.
      2 – Is that how you see the readers of this blog?

  7. I don’t get it. If this analysis, and the analysis Andrew blogged about the other week, are that great, the authors should be tripping over themselves to submit to a high impact factor journal asap and have it subjected to peer review. Just like the rest of us chumps do. Or perhaps I have missed something??

    • Anon:

      Setting aside judgments about the quality of these particular papers, it seems very reasonable for researchers to post their analyses on blogs and preprint servers in order to get feedback before submitting to journals. Other times, we submit an article to a journal and also post it online. Feedback is good, also the journal reviewing process takes a while so it can be good to get your results out there. I have a bunch of unpublished papers here. I’d be happy for them to get the benefit of peer review and be published in high-impact journals; in the meantime, I post them, chump that I am!

  8. I left a brief critique of the paper on Twitter:
    https://x.com/tgof137/status/1886597906897232256

    In short, it’s possible to come to exactly the opposite conclusion while looking at the same data — you can come up with a bayes factor of more than 15,000 in the opposite direction.

    Which of those 2 bayesian analyses is correct? The answer depends entirely on the quality of the data analysis, and has nothing to do with that final step where you just multiply a few numbers together to get the answer.
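
The multiplication step itself is easy to sketch. Here is a minimal Python illustration using the four conditional Bayes factors quoted in the paper's abstract. The `shrink` helper and the square-root used below are hypothetical choices, included only to show how sensitive the product is to the inputs; they are not taken from either analysis.

```python
from math import prod

# Conditional Bayes factors (lab leak : zoonosis) as quoted in the abstract.
levin_factors = [2.3, 20.0, 27.0, 12.0]

def overall_bayes_factor(factors):
    """Multiply conditional Bayes factors into one overall odds ratio."""
    return prod(factors)

def shrink(factors, power):
    """Hypothetical sensitivity check: raise each factor to `power`.

    A power below 1 pulls every factor toward 1 (weaker evidence).
    """
    return [f ** power for f in factors]

overall = overall_bayes_factor(levin_factors)
weakened = overall_bayes_factor(shrink(levin_factors, 0.5))

print(f"overall odds: {overall:,.0f} : 1")
print(f"each factor square-rooted: {weakened:,.0f} : 1")
```

The arithmetic is trivial; all of the substance sits in how each input factor was estimated.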

    • While it is much more fashionable for the lab leak side of the covid origins debate to make their case via multiplying fractions, there are several people who have made an effort to analyze the covid origins question via Bayesian analysis and have come to the conclusion that zoonosis is more likely. Here are links to some of them:

      Eric Stansifer:
      https://ermsta.com/r/covid_decision_20240217.pdf

      Will Van Treuren:
      https://tinyurl.com/yrvzp8mk
      https://tinyurl.com/4xbutw2k

      Daniel Filan:
      https://docs.google.com/document/d/1qzLC55jRfdS55oSqXJZTFItsvFsawWgNlgLxWqhCuyo/edit?tab=t.0#heading=h.p8ipjsffq6wq

      Scott Alexander:
      https://www.astralcodexten.com/p/practically-a-book-review-rootclaim

      David Johnston:
      https://docs.google.com/spreadsheets/d/1IWZzk9m1LrmMdXw0nBVx4GS6piGZ8EsI6BKVqO2Ltuw/edit?gid=0#gid=0

      The most common opinion of experts is that covid has a natural origin:
      https://gcrinstitute.org/covid-origin/

      I’m not sure that any scientist supporting a natural origin of covid has published a full bayesian analysis of the topic, however — they tend to focus more on high quality analyses of the data, rather than guessing at fractions and multiplying them together.

      You can still find some threads that illustrate how scientists think. Here’s one from Trevor Bedford, in early 2020, that sketches out some broad reasons why scientists thought covid was likely natural:
      https://x.com/trvrb/status/1230634351794089984

      Bedford pointed to 5 factors about the virus, of which 4 pointed towards a natural origin and one (the location in Wuhan) pointed weakly towards a lab origin. That continues to be true today — the first 4 factors continue to point towards a natural origin, while the location of the outbreak in Wuhan lends weak evidence towards the possibility of a lab leak.

      You can add more factors to that calculation, but I don’t think that anything that’s been discovered since 2020 would flip the conclusions. The evidence for zoonosis at Huanan market only got stronger, and there have been no strong arguments made showing that the virus was engineered.

      Generally speaking, Bayesian analyses that argue for a lab leak do so not by adding good arguments in favor of engineering, but by omitting evidence, such as ignoring all the data indicating a market outbreak.

      • I address those closely linked analyses in Appendix 1 of my blog. They all share at least two errors.

        In each case they obtain Z-favoring odds by combining a huge likelihood factor (from Peter’s analysis) that is exclusively connected to the HSM market story with a likelihood factor for Wuhan that is only appropriate for the non-market versions. (See the point Andrew made about how you have to be careful with composite hypotheses!) Even using a generous estimate of the market Wuhan likelihood factor shifts all of these an order of magnitude toward LL.

        Even without that shift, each one of these would favor LL except for the enormous likelihood factors (>= 1000) derived from Peter’s analysis of the early cases cluster near the HSM. That analysis already had huge problems, as described in these J. Royal Stats Soc A publications:

        https://doi.org/10.1093/jrsssa/qnae021
        https://doi.org/10.1093/jrsssa/qnad139

        Now Levin’s work (despite its problematic factor of 12 from vendor stall analysis) shows that conventional epidemic modeling of the same home location case data that Peter used does not only not give Peter’s enormous Z factor, it gives a substantial LL factor! The key differences are that Levin did not discard the case timing data and did not invent a special new unverified analysis method for the occasion.

      • Peter- I’m glad you linked to the Feb 2020 Bedford tweets because I’d been looking for them unsuccessfully. They’re literally sketchy, as in using a hand-drawn schematic sketch of probability densities.
        The argument tho sketchy wasn’t so bad at the time. Every one of its points has since collapsed.

        1. Virus group: the one planned in DEFUSE and other WIV grants, even to the approximate % difference from SC1. As for the Proximal Origin paper he cites, we now know from FOIA’d messages that its authors didn’t believe it even after it was published.

        2. RBD: good for humans, doesn’t work so well in any of the intermediate hosts. AA sequence evolved slowly in humans compared to after any other species jump.
        Other hosts: none found and none even confirmed as plausible. No trace in 80,000 animal samples.

        3. Market cases: weak at best. See Levin. Also other cities (Beijing…) where the first cluster was at a wet market even tho that couldn’t have been the spillover site.

        4. Internal market data: whoops, Bloom found the wrong sign of correlation!

        5. Wuhan location: He concedes this favors LL.

        So the Bedford tweets beautifully illustrate how a reasonable person could initially lean toward zoonosis. They also clarify why a reasonable person would change their mind as compelling new data came in relevant to all the points as well as to some other factors not considered at the time.

        • I disagree on every point.

          1. Bedford says that WIV1 would be the most likely leak. A better way to phrase this would be that the average lab leak would have a known backbone. And, in the DEFUSE grant, that’s exactly what they propose — putting novel spikes into WIV1 and SHC014 backbones. Now you can imagine that they skipped that step and went straight to making novel full length clones, but you need to put a probability on it, because it’s more likely that a lab leak would use a known backbone. Your bayesian analysis neglects this factor.

          2. As of early 2020, the lab leak theory said the RBD was engineered and the natural origin theory said it was natural. Then we found the RBD in pangolin viruses, and the LL theory said that SARS2 is a pangolin/bat virus chimera. Then we found the RBD in bats. So the early natural origin theory won.

          Of course there’s a new lab theory saying that part was found in nature but other parts of the virus are engineered. You can try to argue that the odds are now equal, but that’s still wrong because your analysis ignores the odds of finding that RBD in nature, given X secret viruses found.

          3. Levin’s analysis of Wuhan case data is terrible — p-hacked algorithms and multiple logic errors. See my thread linked above. Many people have tried to debunk the Worobey and Pekar papers but no one has yet succeeded.

          4. Internal market data supported zoonosis weakly in 2020 and supports it even more strongly today, now that we have DNA from covid susceptible animals comingled with covid positive samples. Bloom offers no reasonable argument why correlation would be required or expected. And we may have another RNA analysis published soon offering more evidence that certain animals were infected.

          5. The location in Wuhan is still one of the few arguments that offer any support at all for the lab leak theory.

          The problem here is not that scientists are ignoring Bayesian reasoning. It’s that you see all the evidence in a completely different way from scientists.

          I don’t know how to resolve that argument, but I don’t think more spreadsheets with fractions will help. No one will agree on which factors to count, or which fractions to give them.

          I am confident that science will converge on the correct answer, given enough time, even if that answer is lab leak. If the true answer is zoonosis, though, I have zero confidence that some people will stop making lab leak theories.

        • Michael –

          Peter says above:

          The problem here is not that scientists are ignoring Bayesian reasoning. It’s that you see all the evidence in a completely different way from scientists.

          This relates to the most central aspect of my questions for you. I’ll extract the implausibility (imo) of many scientists across many institutions in multiple countries colluding to hide the truth (and lie) about what is almost certainly the most important event ever, involving the field where they’ve invested their careers studying and researching and experimenting. I’ll extract it because I can’t quite tell whether you’re saying they colluded and lied or not (unlike with people like Alina Chan, who crossed the conspiracy theory bridge at some point).

          But still I’m left with a question as to how you evaluate the probabilities that a plurality of the people with the most relevant experience and expertise would reject your analysis on technical terms (when presumably you lack a similar contextualized expertise and experience that would help inform perspective).

          That’s not a fallacious “appeal to authority.” I don’t think they’re necessarily right because of their experience and expertise. It’s possible that someone without the most relevant background could have insight they lack (group think and all of that). But I think there’s a probability aspect here – where someone proposes explanations for highly technical issues that most of those with the most relevant background say is “implausible.”

          So what I don’t understand is how someone could present a comprehensive probability assessment without even attempting to address that aspect of the probabilities array.

          (It only gets more questionable when people reference aspects like the authors of PO saying things they putatively don’t believe, based on extracting words like “implausible” – which can have a variety of meanings – assigning only one meaning to it (i.e., a meaning akin to “impossible”) and ignoring what they said in other places where they explicitly acknowledged a role for uncertainty).

          Again, this parallels so much of what I’ve seen in the climate change wars, where very smart people with related expertise present probabilistic technical arguments about why climate scientists are wrong about climate change. Since I’m neither smart nor expert enough to evaluate their technical arguments, I put into my own probability assessment the question of whether so many climate scientists would be either dumb as rocks or colluding to lie when they say that ACO2 presents a risk we should address. Of course it’s possible, as many “skeptics” argue, that all those climate scientists are lying or stuck in “group think.” But how probable is that?

        • peter- I’m sure this is long past getting tedious for all but the most OCD readers, but let me just answer two sentences.

          “Bloom offers no reasonable argument why correlation would be required or expected.” Except he shows that 4/5 actual animal coronaviruses do show positive correlations with their hosts. That means it’s moderately unlikely for SC2 to fail to do so if it came from a host. Just plain empirical odds, no fancy theory.

          “Many people have tried to debunk the Worobey and Pekar papers but no one has yet succeeded.” Stoyan & Chiu and I have published papers in a well-respected statistical journal debunking Worobey. These are so far not answered in any reviewed publication.
          Three major coding errors in Pekar were found by a pubpeer contributor, with their corrections reducing the Bayes factor from 60 to 4. Pekar et al finally acknowledged these errors and published a correction.
          What about the other Pekar errors? Here the logic is so simple and obvious that I think any remaining readers can grasp it directly. To quote from my blog:

          ***
          The Bayesian analysis of Pekar et al. calculates conditional probabilities of different observations for the N=1 and N=2 hypotheses, with more specific results required for N=1. Specifically, only the N=1 hypothesis is required to give “the mutation separation and relative clade size”. This error was described by He and Dunn almost immediately after the Pekar paper came out, with a recent updated follow-up. I cannot emphasize enough how serious this error is. It is obviously essential that the same observations be used for each hypothesis. For example, if one were to update the odds for deciding which of two suspects committed a burglary, it would not be correct to use the ratio P(drives car|suspect 2)/P(drives blue Toyota|suspect 1). That would be a fundamental error in logic, precisely analogous to the Pekar et al. error.
          ***
          How big a numerical effect does correcting that obvious logic error have? The simulations on pubpeer show that even after tweaking parameters to favor the Pekar conclusion the odds go from 4/1 to less than 1/5. In other words, the conclusion reverses if one just uses basic logic.
          This one is not a judgment call or a modeling issue. It’s basic logic applied to Pekar’s own model.
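
To make the burglary analogy concrete, here is a small Python sketch with made-up probabilities. Every number below is hypothetical and chosen only for illustration; nothing here comes from Pekar et al.'s model or from the pubpeer simulations. It compares the mismatched ratio, where one suspect only has to match a coarse observation, with the ratio computed on the same specific observation for both.

```python
# Hypothetical probabilities for two burglary suspects (illustration only).
p_drives_car = {"suspect_1": 0.9, "suspect_2": 0.9}
# P(drives a blue Toyota | drives a car), also made up.
p_blue_toyota_given_car = {"suspect_1": 0.05, "suspect_2": 0.01}

def p_blue_toyota(suspect):
    """Probability of the specific observation: drives a blue Toyota."""
    return p_drives_car[suspect] * p_blue_toyota_given_car[suspect]

# Mismatched: coarse observation for suspect 2, specific observation for suspect 1.
mismatched = p_drives_car["suspect_2"] / p_blue_toyota("suspect_1")

# Consistent: the same specific observation for both suspects.
consistent = p_blue_toyota("suspect_2") / p_blue_toyota("suspect_1")

print(f"mismatched ratio (suspect 2 : suspect 1): {mismatched:.1f}")
print(f"consistent ratio (suspect 2 : suspect 1): {consistent:.2f}")
```

With these invented inputs the mismatched ratio favors suspect 2 while the consistent ratio favors suspect 1, so asking less of one hypothesis than of the other can by itself flip the direction of the odds.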

        • Michael –

          peter- I’m sure this is long past getting tedious for all but the most OCD readers, but let me just answer two sentences.

          Anyone who finds it tedious can just not read it. What’s the point of having a topic like this opened to public discussion only to have people worry about being tedious if they discuss it?

        • “Except he shows that 4/5 actual animal coronaviruses do show positive correlations with their hosts.”

          If this is something that’s important to you, you should spend more time looking at the figure this comes from. It’s Fig 3 in this paper for people following along — https://academic.oup.com/ve/article/10/1/vead089/7504441

          Worth thinking about how much you’re reading into one sample, where that sample comes from, how the virus (maybe viruses) on that machine differs from SARS-CoV-2, and how that makes it difficult to compare the data. It’s a virus with relatives sampled because they give dogs, raccoon dogs, and foxes catastrophic diarrhea, which is probably how that virus was transmitted. And, again, it’s one sample.

        • Michael – I find it amusing that you are trying to derive a bayes factor against zoonosis from Jesse Bloom’s paper which says things like:

          “the fact that SARS-CoV-2 does not show an association with a plausible host animal species does not disprove the possibility of SARS-CoV-2-infected animals. But if SARS-CoV-2 had shown similar patterns to some of the other animal coronaviruses in the 12-January-2020 samples, that would have provided moderate circumstantial evidence in favor of SARS-CoV-2-infected animals—but this is not the case”

          At best, this argument would give you a null result, a bayes factor of 1, not some factor leaning against a zoonotic origin.

          Bloom makes 2 major claims in his paper.
          The first is that the number of SARS-CoV-2 reads are too low, in the wildlife shops, to be evidence of infection.
          The second claim is that the number of SARS-CoV-2 reads in these samples is not correlated with the number of reads of animal DNA in these samples.

          But those 2 claims are in contradiction with each other! If your complaint is that the numbers are too small and random to analyze, then you can’t also complain that you can’t find clear patterns in them. The paper notes that Bloom’s desired correlations were not found for other viruses which had a low read count. For instance:

          “In contrast, the quantitative content of SARS-CoV-2 and rat CoV Lucheng-19 is low in all 12-January-2020 samples, and to the extent that there are associations with animal genetic material, they are weak and contingent on how the data are quantified (e.g. linear versus log scale)”

          That doesn’t even get into the other methodological flaws that Bloom has made in various analyses here, such as trying to combine all the January 1st and January 12th samples into a single correlation, even though the read counts tended to be much higher for the January 1st samples.

          Absent another market environment to calibrate this whole correlation analysis, it’s hard to know what we should even expect to find. Consider the over-all hypothesis Bloom states early in the paper:

          “For instance, if the amount of SARS-CoV-2 material was strongly associated with the amount of material from raccoon dogs, that could suggest that raccoon dogs were infected. On the other hand, if the amount of SARS-CoV-2 was more associated with material from animals that are not infectable (say largemouth bass), then that would simply suggest that SARS-CoV-2 was spread widely around the market environment by January 2020.”

          Notice that Bloom states that a positive correlation is required to support the raccoon dog hypothesis, but also states that a positive correlation with largemouth bass would simply be meaningless. When he ran his full market analysis, the top 5 correlations were with bass, cow, pig, catfish, and sheep. Humans did not crack the top 5 list. Should we conclude that fish, not humans, were the dominant host for covid at the market? Or does this just mean that his style of correlation analysis is not as useful of a tool as he thinks it is?

          If you do find this particular analysis enlightening, I suppose I’d be willing to accept your bayes factor of 1 on the value of the samples within the market. Hell, one of the 2 Rootclaim judges even gave the market samples a bayes factor of 1, and thought they did not lean strongly enough towards either hypothesis. The within market data is not quite as clear as the case data outside the market (which is an obvious bullseye right on the market).

          If you’re going to try to pretend that all evidence favoring a market origin can be disposed of, because of Bloom’s paper, I think that’s vastly overreaching and you need to find much better arguments.

          I would personally weight the within market data as leaning towards zoonosis, as we can also look beyond correlations and see other patterns within the market (i.e. the carts and cages infected for shop 6/29, the infected drains for shop 6/29, the patterns of case spread, the history of illegal citations of that shop, etc.). But I accept that Bayesian analysis is inherently subjective, and not every judge of the matter will get the same numbers for every single piece of data.

        • The whole correlation thing doesn’t survive a simple thought experiment. You can take a perfectly sterile room and put someone alone in it for two weeks with a course of a SARS-CoV-2 infection in the middle. Swab random surfaces in the room at random times throughout this period. You’re almost certain not to find any correlation between human and SARS2 reads with sufficient sampling, because human nucleic acid molecules are deposited at a fairly constant rate throughout the two weeks, while the rate of depositing SARS-CoV-2 molecules varies by several orders of magnitude over the course of an infection.

          The expected correlation between human and SARS-CoV-2 reads is roughly zero. This isn’t just a thought experiment, either. Someone could test it with data from the challenge trial and model the expectations for infections in Huanan market.
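
The thought experiment can be sketched as a toy simulation in Python. This is only an illustration under strong assumptions that are not in the comment above: the shedding curve, the noise levels, and the read scales are all hypothetical, each swab is assumed to reflect only recently deposited material, and the two signals are assumed to share no common per-swab factor such as total material collected.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def viral_shedding(day):
    """Hypothetical shedding curve: peaks on day 7 and is four orders of
    magnitude lower at day 0 and day 14."""
    return 10 ** (4.0 - 4.0 * abs(day - 7.0) / 7.0)

def simulate_swabs(n_swabs, seed=0):
    """Swab at random times; human signal is constant on average,
    viral signal follows the shedding curve. Both get independent noise."""
    rng = random.Random(seed)
    human, virus = [], []
    for _ in range(n_swabs):
        day = rng.uniform(0.0, 14.0)
        human.append(1000.0 * rng.lognormvariate(0.0, 0.5))
        virus.append(viral_shedding(day) * rng.lognormvariate(0.0, 0.5))
    return human, virus

human, virus = simulate_swabs(5000)
r = pearson(human, virus)
print(f"Pearson correlation, human vs. viral signal: {r:+.3f}")
```

Under these assumptions the correlation comes out close to zero. Adding a shared per-swab factor, or letting material accumulate on surfaces, would change the answer, so this illustrates the argument as stated and does not test it.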

  9. Perhaps I should be more literal and verbose, less tongue in cheek with my comments

    I think both of our points can be true. Note that I didn’t suggest he or anyone else shouldn’t blog about the issue. No problem with that. And I know all too well how peer review can be both slow and frustrating, and a far from perfect filter.

    A lot of people seem to care a fair bit about the issue under discussion. At most levels, I honestly don’t give a hoot one way or the other – we can’t change the past, and I’m too jaded to be surprised by anything in the world anymore. But on another level, it’s obvious that gathering and dissecting information of the origin of any massively traumatic event probably would be useful information for the future.

    So I think it’s fair to wonder whether and when a publication might emerge, given the gravity of the event and the ability to lessen the likelihood of a similar outbreak happening again. Not many of us have the time to read through a hundred page pdf or the like, unless we are deeply invested in the topic. But the rest of us, the wider public and policy makers included, might well be interested in a distilled paper-sized version. Isn’t that how we should go about big important sciencey stuff like this?

    And thanks for joining me in the chump club, Andrew. It’s an honour

    • At most levels, I honestly don’t give a hoot one way or the other – we can’t change the past, and I’m too jaded to be surprised by anything in the world anymore.

      These pathogens are still being collected into these labs near major population centers (and some also modified in various ways, but it isn’t really relevant).

      Even if you don’t care about infectious disease per se, it is clear that can be used as a reason for authoritarians to shut down businesses, tell people where to go, what to wear, and so on. This is not something you care about?

      • “Even if you don’t care about infectious disease per se, it is clear that can be used as a reason for authoritarians to shut down businesses, tell people where to go, what to wear, and so on. This is not something you care about”

        Again (but this time to a different responder), I said no such thing. If you care to read the next sentence I wrote:

        “But on another level, it’s obvious that gathering and dissecting information of the origin of any massively traumatic event probably would be useful information for the future”

        …you would see the crux of my point. If this is so important, which it seems like it clearly is, then shouldn’t this be distilled, submitted for publication, peer reviewed, and so forth? That’s how it works in our industry, no? The IPCC or IPBES reports, for instance, are not based on blogs.

        What’s wrong with wondering if and when this might happen?

        • There was already plenty of published evidence *before* covid that these labs are leaking all over the place.

          The only real difference after covid is people (multiple in these blog comments alone) are spreading misinformation that such leaks are rare. Before that people mostly just didn’t discuss it.

          Move the labs to islands/deserts and make it like a military deployment, ie the staff quarantines for a month before coming home.

          The researchers studying this aren’t going to stop doing dangerous stuff. It isn’t in their nature.

          Look at what it took to stop them from using their mouths to pipette (suck up liquid into a tube, to transfer from place to place) dangerous pathogens:
          https://apps.dtic.mil/sti/tr/pdf/AD0640356.pdf

  10. This is a working paper, right? Not yet been peer reviewed. Shouldn’t people who work with viruses be the ones to ultimately decide this issue, if it can be decided? Are we really going to decide this on the basis of mathematics? To his credit, the author says more research is needed, we shouldn’t decide on the basis of one paper. With so much politicizing, it is vitally important.

  11. I am sympathetic to Bayesian thinking on this question because I think it helps you think through questions like “given there was a leak from the lab, what are the odds that the virus would spread at a public space soon after” or even “given one lab worker accidentally caught covid, what are the odds they would have been sick enough to seek medical attention before spreading it to another person”

    I can’t find the paper I’m thinking of, but remember a paper coming out in 2021 or 2022 that claimed to “prove” the wet market hypothesis because they confirmed all the people who are first confirmed to have Covid all got it after going to the market or coming into contact with someone who went to the market. It spent pages and pages on this, but didn’t once address if this was where the virus actually spilled over. You could similarly do a study showing that all the first people in Massachusetts known to catch Covid all went to the Biogen conference, but that has nothing to do with origins. Just where it was first spread enough that it could be detected.

    • I think the paper you’re curious about is this one: https://www.science.org/doi/10.1126/science.abp8715, though it doesn’t claim to “prove” the wet market hypothesis, instead arguing “the Huanan market was the epicenter of the COVID-19 pandemic.”

      More subtly, I’m not sure about your claim that “all the first people in Massachusetts known to catch Covid all went to the Biogen conference, but that has nothing to do with origins. Just where it was first spread enough that it could be detected.” Of course such data wouldn’t prove the Biogen conference was where the virus originated, however it would accurately identify the Biogen conference as the early *Massachusetts* epicenter.

      To the extent that the epicenter is unrelated to the origin of the virus, I suppose it’s conceivably possible that two different lab workers went to the market with slightly different genetic variations of Covid, infected individuals at the market in such a pattern that it mirrors zoonotic transfer, but didn’t cause outbreaks near where they lived, children went to school, or the population outside the WIV.

    • Your recollection is wrong. The paper is here — https://www.science.org/doi/10.1126/science.abe3261 — the earliest case in Boston was confirmed 1/Feb/2020. The earliest USA sequence with the mutation associated with the Biogen conference is a sample collected in early March; the earliest sequence with that mutation was from a sample from France in late February.

      A relevant thing from that paper is dating the most recent common ancestor of the Biogen lineage to mid-February. The same can be done for early-pandemic sequences in Wuhan. Michael Weissman is all over this comments section declaring what is and isn’t “bullshit” and his website currently cites favorably “estimate the spillover time via the MRCA giving a value ‘between August and early October 2019’.” In that sentence, “August” comes from the number 2019.58 in Fig 3 of this paper — https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0301195 — I am not going to declare whether it’s “bullshit” or not, but it’s 100% wrong that SARS-CoV-2 and the bat virus BANAL-20-52 share a common ancestor in August 2019, which is what that figure shows!

  12. Most of the core reasoning used for the HSM market case involved spatial statistics, evolutionary modeling, etc. It was not mostly virological.

    The experts in spatial statistics (Stoyan and Chiu) say that part is bullshit. (Levin uses standard modeling to confirm and quantify that.)

    The Pekar paper on “phylogenetics” claimed that evolutionary patterns showed a natural origin. It was based entirely on complex Bayesian mathematical modeling, since found to be filled with both math and coding errors. The legendary leader in computational biology of evolution, Nick Patterson, has blogged that it’s obviously a lab leak.

    Virology star Jesse Bloom has not weighed in on the overall evaluation but has shown that key nucleic acid data that have been described in press releases as proving a market origin actually point away from a market origin. (One of Levin’s factors quantifies that fairly small effect.)

    Math is just a tool used on all sides of the question. It makes assumptions, arguments, and errors more explicit and transparent.

    • This comment by Scott Alexander is appropriate in this context.
      “I don’t like getting in fights, and boy was this a fight.

      And I don’t like making sweeping generalizations about The Nature Of Pseudoscience – it’s too likely to be incredibly embarrassing if it later turns out I was one in the wrong.

      But I do feel like there’s a method going on here. It’s nothing sinister, just that the lab leak people have 100x more zealots and energy, and there are some strategies that make sense in that position, which no single individual necessarily chooses, but which are very noticeable from the other side.

      The most glaring is the constant focus on “as of one minute ago, the case for the opposite side is in SHAMBLES”. The “as of one minute ago” makes it hard to trust institutional consensus or published papers – what if they just haven’t caught up to the new evidence? The “in SHAMBLES” is always a couple of papers that have “now debunked” the best papers of the other side. These come out on a regular schedule. They’re usually by people in unrelated fields – the ones I saw on COVID origins were by computer scientists, physicists, and agricultural scientists. They’re usually either preprints, or published in weird journals in unrelated fields. But they sure do look like Scientific Papers and have lots of equations in them, and they always end with “…and therefore this one peripheral argument in So-And-So Et Al 2020 is wrong.”

      Once you collect, I don’t know, ten of these, you can spam a bunch of opposing discourse with “This didn’t even consider these ten new papers, all converging upon the fact that this case has now been debunked”. The very prestigious researchers who wrote the original paper probably won’t respond, because they don’t have time to respond to pre-prints by agricultural scientists. So it does kind of look, to an outsider, like all of the top papers of the side with more institutional support are debunked. Even if you spend hours and hours talking to the scientists involved and trying to figure out the flaws, it doesn’t matter, because there will be a new set of papers like that a few weeks later.”

      https://www.astralcodexten.com/p/highlights-from-the-comments-on-the-5d7
      —-

      Aside from that, Worobey and Debarre show that Stoyan/Chiu is an exercise in ‘how tight can I make my kernel bandwidth so that even the location of the London cholera epidemic of the past won’t be successfully identified’.

      • Re: “This didn’t even consider these ten new papers, all converging upon the fact that this case has now been debunked”

        This is what this lab-leak johnny-come-lately argued on X — https://x.com/sigridbratlie/status/1876642607717077108 — citing 3 papers while whining to Angela Rasmussen, “This is simply not true! I am pointing to papers in recognized scientific journals, all of which you are surely aware of. Below are a few examples that contradict your conclusions. Is your main strategy to pretend that legitimate contradictory science doesn’t exist?”

        Angela Rasmussen is an author of at least one paper that specifically cited and discussed all three of the cited examples of papers she was allegedly ignoring.

        Re: Stoyan/Chiu… they test the nonsense hypothesis that the region in which the centroid of case coordinates is estimated to lie necessarily includes the location of the outbreak (they also apparently made some mistakes with coordinates). This is obviously nonsense… if ascertainment is 100%, the region will shrink and almost certainly not include the actual outbreak site… if you knew the location of all cases circa the end of February, the centroid would be somewhere in the middle of Wuhan, Italy, and Iran perhaps.

        I sent them a note pointing out as much and they ignored it, instead pedantically ridiculing Worobey et al for daring to use 1000 bootstrap samples rather than 999.

    • Micheal –

      The legendary leader in computational biology of evolution, Nick Patterson, has blogged that it’s obviously a lab leak.

      Virology star Jesse Bloom

      You have said you don’t want to appeal to authority, but you’ve essentially done it before and now you’re doing it again.

      As a matter of probabilities, their qualifications matter, but so do those who have the most directly relevant expertise and experience, and you kind of just wave all of that away with a nod towards conspiracy ideation.

      So you want to count the “star” quality of some “experts” but broadly dismiss the expertise of probably a plurality of scientists who have the most relevant domain knowledge.

      In one sense that’s fine. It’s all part of the process. But I don’t see how you can do that, and lay claim to a probabilistic assessment, without even attempting to quantify the probabilities that only the experts you agree with are trustworthy (stars even!) while the majority of scientists with relevant expertise are just remarkably wrong (and unwilling to admit it, and I guess lying about what they actually believe).

  13. There’s a lot I could say about this analysis. The most significant thing with respect to the numbers that are multiplied together to come up with the 14,900 ratio is the spatiotemporal analysis. This part consists of two likelihood ratios (27.1, 11.95) multiplied together to favor lab leak roughly 324-to-1.
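
    As a quick arithmetic check, here is a minimal sketch that multiplies the component Bayes factors as rounded in the abstract and then the two spatiotemporal ratios quoted in this comment. The dictionary keys and variable names are made up for illustration; only the numbers come from the abstract and the comment.

```python
import math

# Component Bayes factors as rounded in the paper's abstract
# (lab leak : natural origin). Key names are illustrative only.
abstract_factors = {
    "outbreak_in_prc": 2.3,
    "outbreak_in_wuhan": 20.0,
    "unlinked_cases": 27.0,
    "vendor_cases": 12.0,
}
overall = math.prod(abstract_factors.values())

# The two spatiotemporal likelihood ratios as quoted in this comment.
spatiotemporal = 27.1 * 11.95

# Share of the total log-evidence carried by the spatiotemporal part.
log_share = math.log(spatiotemporal) / math.log(overall)

print(f"overall odds from rounded factors: {overall:,.0f} to 1")
print(f"spatiotemporal part alone: {spatiotemporal:.1f} to 1")
print(f"share of log-evidence: {log_share:.0%}")
```

    On the log scale the two spatiotemporal factors carry more than half of the total evidence, which is why they are the part most worth scrutinizing.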

    First, the analysis of the distribution of unlinked cases within Wuhan discounts the fact that these cases were near the largest wildlife market in town (Huanan market). Furthermore, the fact that the outbreak was recognized in the first place because of cases linked to this market is not considered [1]. If you take a population-weighted distribution of random potential locations for an outbreak to start, the centrality of cases around Huanan market is roughly a 1 in 100 coincidence. If you consider the odds of a random person in Wuhan visiting Huanan market specifically, rather than a location in its neighborhood, on a random day, it’s roughly a 1 in 1000 coincidence [2]. Instead of a likelihood ratio on the order of 0.01-0.001, we get 27.1 due to an ad hoc model in which a lab leak means more randomly distributed cases on one side of a river.

    The fact that a few cases unlinked to the market reside across the river, but hardly any linked cases do, isn’t particularly consistent with a lab leak across the river. But it’s perfectly consistent with the fact that people rarely commute across the river to visit this market. This isn’t a point of speculation; there’s plenty of data on how people moved about in Wuhan in papers modeling disease transmission and the effects of mitigation and eventual lockdown.

    Where is the model comparing whether unlinked cases are better described by proximity to Huanan market than by proximity to the Wuhan Institute of Virology?

    Second, the analysis of the distribution of cases within this market effectively discounts the fact that these cases were on the same half of the same level of the market where wildlife sales were concentrated. What are the odds of that happening by chance for a random introduction? Roughly 1 in 4. What are the odds of that happening for a wildlife introduction? Pretty close to 1 in 1. Instead of a Bayes factor of 0.25, we get a Bayes factor of 11.95 because one reasonable model (early cases predict locations of subsequent cases) is compared to an unreasonable model (everyone is infected by raccoon dogs).
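
    The Bayes factor in the paragraph above is just a ratio of two likelihoods. A minimal sketch, using the rough probabilities stated in this comment (they are the commenter's estimates, not values from the paper) and hypothetical variable names:

```python
# Rough probabilities stated in the comment (not taken from the paper).
p_cluster_given_random = 1 / 4     # cases land on the wildlife half by chance
p_cluster_given_wildlife = 1.0     # "pretty close to 1 in 1"

# Bayes factor for a random introduction relative to a wildlife introduction.
bayes_factor = p_cluster_given_random / p_cluster_given_wildlife

# Vendor-case factor quoted from the paper, for comparison.
paper_factor = 11.95
gap = paper_factor / bayes_factor

print(f"Bayes factor from the comment's estimates: {bayes_factor}")
print(f"paper's factor is larger by a factor of about {gap:.0f}")
```

    The point of the comparison is the size of the gap: the choice of which two models to compare moves this one component by well over an order of magnitude.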

    Lastly, another factor left out of this analysis is that all of the genomic diversity of SARS-CoV-2 known to exist in Dec 2019 was found in Huanan market (in the same corner of the market where wildlife susceptible to SARS-CoV-2 was sold). This was certainly also true in some Wuhan hospitals at the time samples were taken (1/Jan/2020), but that might be the only other place in the world. There’s no evidence that this was the case anywhere else.

    [1] To preempt someone complaining about ascertainment bias: there are also other datasets beyond case maps in the WHO report that demonstrate that this part of Wuhan was the epicenter of the outbreak.

    [2] Based on Wuhan’s population and the WHO report’s figure of 10,000 visitors per day.
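
    The estimate in [2] can be reproduced with one division. This is a sketch under a stated assumption: the comment does not give a population figure, so the value of roughly 11 million used below is an approximation supplied here, and the constant names are illustrative.

```python
# Assumption: approximate population of Wuhan; not stated in the comment.
WUHAN_POPULATION = 11_000_000
# Daily visitor figure attributed to the WHO report in footnote [2].
DAILY_MARKET_VISITORS = 10_000

# Chance that a randomly chosen resident visits the market on a given day,
# treating every visitor as a distinct Wuhan resident (a simplification).
p_visit = DAILY_MARKET_VISITORS / WUHAN_POPULATION
one_in = 1 / p_visit

print(f"about 1 in {one_in:,.0f}")
```

    Under that assumption the result is the same order of magnitude as the "roughly 1 in 1000" quoted above.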
