“Positive Claims get Publicity, Refutations do Not: Evidence from the 2020 Flu”

Part 1

Andrew Lilley, Gianluca Rinaldi, and Matthew Lilley write:

You might be familiar with a recent paper by Correia, Luck, and Verner, who argued that cities that enacted non-pharmaceutical interventions earlier / for longer during the Spanish Flu of 1918 had higher subsequent economic growth. The paper has had extensive media coverage – e.g. it has been covered by the NYT, Washington Post, The Economist, The Boston Globe, Bloomberg, and of course NPR.

We were surprised by their results and tried to replicate them. To investigate the question further, we collected additional data to extend their sample from 1899 to 1927. Unfortunately, from extending their sample backwards, it seems that their headline results are driven by pre-existing differential growth patterns across US cities. We found that NPI measures in 1918 explain growth between 1899 and 1914 just as well as 1914 to 1919 growth. Further, we also found that the 1914-1919 growth result is mostly driven by estimated city-level population growth that happened from 1910-1917. The approaches that we have taken to deal with these spurious results leave us finding only a noisy zero; we cannot rule out either substantial positive or negative economic effects from NPIs.

Here you can find our comment as well as the data we used and the replication code.

First off, I don’t find this depressing. We all make mistakes. If Correia et al. made a mistake on a newsworthy topic and it got lots of press coverage, then, sure, that’s too bad, but such things happen. It’s a sign of cheer, not depression, that a paper appearing on 26 Mar 2020 gets shot down on 2 May. A solid review in 5 weeks—that’s pretty good!

Part 2

Now it’s time for me to look at the articles. First, the Correia et al. paper:

It seems funny that they use 1914 as a baseline for an intervention in 1918. Wouldn’t it make more sense to use the change from 1917 to 1921, or something like that? I guess that they had to make use of the data that were available—but I’m already concerned, given that more than half of any change happened before the treatment even occurred.

Beyond that, the pattern in the graph seems driven by a combination of relatively low mortality and high growth in western cities. I could imagine lots of reasons for both of these factors, having little or nothing to do with social distancing etc.

And check this out:

Just pretend they did the right thing and had the y-axis go down to 0. Then you’ll notice two things: First, yeah, the flu in 1918 really was a big deal—almost 4 times the death rate compared to earlier years. Second, it was only 4 times the death rate. I mean, yeah, that’s horrible, but only a factor of 4, not a factor of 10. I guess what I’m saying is, I hadn’t realized how much of a scourge flu/pneumonia was even in non-“pandemic” years. Interesting.

OK, now to the big thing. They have an observational study, comparing cities with more or less social distancing policies (“school closure, public gathering bans, and isolation and quarantine”). They make use of an existing database from 2007 with information on flu-control policies in 43 cities. And here are the pre-treatment variables they adjust for:

State agriculture and manufacturing employment shares, state and city population, and the urban population share are from the 1910 census. . . . State 1910 income per capita . . . State-level WWI servicemen mortality . . . annual state-level population and density estimates from the Census Bureau. City-level public spending, including health spending . . . City and state-level exports. . . . the importance of agriculture in each city’s state. . . . we also present tests controlling for health spending and total public spending per capita in a city. . . . 1910 [state] agriculture employment share, the [state] 1910 urban population share, and the 1910 income per-capita at the state level. . . . log of 1910 population and the 1914 manufacturing employment to population ratio. However, unlike in our analysis on the effect of the 1918 Flu on the real economy, here we control for past city-level mortality as of 1917
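For concreteness, this style of adjusted cross-city regression amounts to something like the sketch below. This is my illustration, not the paper’s code; the file and column names are hypothetical stand-ins for their variables, and I’m including only a few of the controls.

```python
# Illustrative OLS of 1914-1919 manufacturing growth on NPI intensity plus a
# few of the 1910-era controls quoted above. All names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cities.csv")  # hypothetical city-level dataset

fit = smf.ols(
    "mfg_growth_1914_1919 ~ npi_days_1918"
    " + log_pop_1910 + agri_share_1910 + urban_share_1910 + income_pc_1910",
    data=df,
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(fit.summary())
```

With 43 cities and this many candidate controls, any specification like this is going to be fragile, which is part of the point below.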

They discuss and dismiss the possibility of endogeneity:

For instance, local officials may be more inclined to intervene if the local exposure to the flu is higher . . . An alternative concern is that interventions reflect the quality of local institutions, including the local health care system. Places with better institutions may have a lower cost of intervening, as well as higher growth prospects. There are, however, important details that suggest that the variation across cities is unrelated to economic fundamentals and is instead largely explained by city location. First, local responses were not driven by a federal response . . . Second . . . the second wave of the 1918 flu pandemic spread from east to west . . . the distance to the east coast explains a large part of the variation in NPIs across cities . . .

That’s the pattern that I noticed in the top graph above. I’m still concerned that the cities on the west coast were just growing faster already. Remember, their outcome is growth from 1914 to 1919, so more than half the growth had already happened before the pandemic came.

They also look at changes in city-level bank assets from October 1918 to October 1919. But I’m not so thrilled with this measure. First, I’m not quite sure what to make of it. Second, using a baseline of October 1918 seems a bit too late, as the pandemic had already started by then. So I’ll set that analysis aside, which I think is fine. The abstract only mentions manufacturing output, so let’s just stick with that one. But then I really am concerned about differences between east and west.

They report:

Reassuringly for our purposes, other than differences in the longitude and the variation in the local industry structure, there are no observable differences across cities with different NPIs.

Anyway, here are their main city-level findings. I think we’re supposed to focus on the estimates for 1919, but I guess you can look at 1921 and 1923 also:

The confidence intervals for manufacturing output exclude zero and the intervals for manufacturing employment in 1919 include zero; I guess that’s why they included manufacturing output but not manufacturing employment in the abstract. Fair enough: they reported all their findings in the paper but focused on the statistically significant results in the abstract. But then I’m confused why in their pretty graph (see above) they showed manufacturing employment instead of manufacturing output. Shouldn’t the graph match what’s in the abstract and the news reports?

In any case, I remain concerned that the cities in the west had more time to prepare for the pandemic, so they implemented social distancing etc., and they were growing anyway. This would not say that the authors’ substantive hypothesis—social distancing is good for the economy—is wrong; it just says that the data are hopelessly confounded so they don’t really answer the question as implied in the paper.

They do some robustness checks:

These controls are included to account for other time-varying shocks that may be correlated with local NPI policies. . . . city-level manufacturing employment and output growth from 1909 to 1914 to control for potential city-level pre-trends. . . . census region fixed effects . . . longitude. . . . state WWI servicemen mortality per capita . . . city population density . . . health and total public spending per capita in 1917 . . . city-level estimates of total exports and exports per capita in 1915.

They don’t adjust for all of these at once. That would be difficult, of course. I’m just thinking that maybe linear regression isn’t the right way to do this. Linear regression is great, but maybe before throwing these cities into the regression, you need to do some subsetting, maybe separate analyses for slower and faster-growing cities? Or some matching. Maybe Lowell and Fall River weren’t going to increase their manufacturing base anyway?
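To sketch what I mean by matching (my illustration, nothing from the paper): pair each high-NPI city with the low-NPI city that is closest on pre-period growth, then compare outcomes within pairs. Column names are hypothetical.

```python
# Rough nearest-neighbor matching on the 1909-1914 pre-trend, followed by a
# within-pair comparison of 1914-1919 growth. Hypothetical column names;
# controls can be reused across pairs (matching with replacement).
import numpy as np
import pandas as pd

df = pd.read_csv("cities.csv")
cutoff = df["npi_days_1918"].median()
treated = df[df["npi_days_1918"] >= cutoff]
control = df[df["npi_days_1918"] < cutoff]

diffs = []
for _, t in treated.iterrows():
    # nearest control city on the pre-trend covariate
    j = (control["growth_1909_1914"] - t["growth_1909_1914"]).abs().idxmin()
    diffs.append(t["growth_1914_1919"] - control.loc[j, "growth_1914_1919"])

print("matched mean difference in growth:", np.mean(diffs))
```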

At this point you might say that I’m just being an asshole. These researchers wrote this nice paper on an important problem, then I had a question about geography, but they had already addressed it in their robustness checks, so what’s my complaint? Sure, not all the numbers in the robustness table have p-values less than 0.05, but I’m the last person to be a statistical significance nazi.

All I can say is . . . sorry, I’m still concerned! Again, I’m not saying the authors’ economic story is wrong—I have no idea—I just question the idea that this analysis is anything more than correlational. To put it another way—I could also easily imagine the authors’ economic story being correct but the statistical analysis going in the opposite direction. The tide of the correlation is so much larger than the swells of the treatment effect.

To put it yet another way, they’re not looking for a needle in a haystack, they’re looking for a needle in a pile of needles.

Again, this is not meant as a slam on the authors of the article in question. They’re using standard methods, they’ve obviously put in a lot of work (except for those bizarre stretchy maps in Figure 3; I don’t know how that happened!), the paper is clearly written, etc. etc. Nothing improper here; I just don’t think this method will solve their problem.

Interlude

And that’s where the story would usually end. A controversial paper is published and gets splashed all over NPR, but people have questions. I read the paper and I’m unsatisfied too. Half the world concludes that if I’m not convinced by the paper, it’s no good; the other half believes in baseball, apple pie, Chevrolet, and econometrics and concludes that I’m just a crabapple; and the authors of the original paper don’t know what to think, so they wait for their referee reports to decide how to buttress their conclusions (or maybe to reassess their claims, but, over the years, I’ve seen a lot more buttressing than I’ve seen reassessing).

Part 3

But then there was that email from Lilley et al., who in their paper write:

Using data from 43 US cities, Correia, Luck, and Verner [2020] find that the 1918 Flu pandemic had strong negative effects on economic growth, but that Non Pharmaceutical Interventions (NPIs) mitigated these adverse economic effects. . . . We collect additional data which show that those results are driven by population growth between 1910 and 1917, before the pandemic. We also extend their difference in differences analysis to earlier periods, and find that once we account for pre-existing differential trends, the estimated effect of NPIs on economic growth is a noisy zero; we can neither rule out substantial positive nor negative effects of NPIs on employment growth.

“A noisy zero” . . . I kinda like that. But I don’t like the “zero” part. Better, I think, to just say that any effects are hard to ascertain from these data.

Anyway, here’s a key graph:

We see those western cities with high growth. But check out San Francisco. Yes, it was in the west. But it had already filled out by 1900 and was not growing fast like Oakland, Seattle, etc. So maybe no need to use lack of social distancing policies to explain San Francisco’s relatively slow post-1917 growth.

Lilley et al. run some regressions of their own and conclude, “locations which implemented NPIs more aggressively grew faster than those which did not both before the policy implementation, and afterward.” That makes sense to me, given the structure of the problem.
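To make the pre-trend logic concrete, here’s a minimal sketch of that kind of placebo comparison. This is my illustration, not their code; the file and column names are hypothetical stand-ins for their replication data.

```python
# Placebo check: regress growth over a window that 1918 NPIs cannot have
# caused on 1918 NPI intensity, and compare with the treatment window.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cities.csv")  # hypothetical: one row per city

# Placebo outcome: growth over 1899-1914, before the pandemic.
placebo = smf.ols("growth_1899_1914 ~ npi_days_1918", data=df).fit(cov_type="HC1")

# Actual outcome: growth over 1914-1919, the window Correia et al. use.
actual = smf.ols("growth_1914_1919 ~ npi_days_1918", data=df).fit(cov_type="HC1")

# If the placebo coefficient looks about as strong as the "actual" one, the
# headline result is plausibly a pre-existing trend, not a treatment effect.
print(placebo.params["npi_days_1918"], actual.params["npi_days_1918"])
```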

I have not examined Lilley et al.’s analysis in detail, and it’s possible they missed something important. It’s hard for me to see how the original strong conclusions of Correia et al. could be salvaged, but who knows. As always, I’m open to evidence and argument.

There’s the potential for bias in my conclusions, if for no other reason than that the authors of the first article didn’t contact me; the authors of the second article did. It’s natural for me to take the side of people who email me things. I don’t always, but a bias in that direction would be expected. On the other hand, there could be selection bias in the other direction: I expect that people who send me things are going to expect some scrutiny.

The larger issue is that there seem to be more outlets for positive claims than negative claims. “We found X” can get you publication in a top journal and major media coverage—in this case, even before publication. “We didn’t find X” . . . that’s a tougher sell. Sure, every once in a while there’s some attention for a non-replication of some well-known finding, but (a) such events are not so common, and (b) they still require the prior existence of a celebrated positive finding to react to. You could say that this is fine—positive claims should be more newsworthy than negative claims—and that’s fair enough, as long as we come in with the expectation that those positive claims will often be wrong, or, at least, not supported by the data.

40 thoughts on ““Positive Claims get Publicity, Refutations do Not: Evidence from the 2020 Flu””

  1. When I initially heard the claim on NPR, the story cited growth during 1919. That could be explained merely by the cities with tougher restrictions having suffered a more severe contraction of economic activity by the end of 1918, which then rebounded in 1919. That would not be the type of result that makes the economic effects of the restrictions desirable.

    Unless one has data that starts from the beginning of the pandemic, I can’t see how one has a hope of getting useful insights.

  2. Andrew,

    > The larger issue is that there seem to be more outlets for positive claims than negative claims. […] You could say that this is fine—positive claims should be more newsworthy than negative claims—and that’s fair enough, as long as we come in with the expectation that those positive claims will often be wrong, or, at least, not supported by the data.

    Well said. Thank you!

    As a current PhD student, I see troubling consequences of positive claims being celebrated over negative ones. It encourages fledgling researchers to “drink their own Kool-Aid” and always push for positive results regardless of the consequence — that is what will advance their career, after all, since graduation, post-docs, awards, etc., depend on the number of and repute of published results.

    I’ve seen this for one of my own colleagues — they are disheartened to the point of tears because a major hypothesis of their dissertation proved null. It therefore won’t see flashy publishing, local news, etc. It wouldn’t surprise me if many ignore it as a “one off” mistake or something similar.

    Yet, the null hypothesis is very important. And the research is high-quality and rigorous. Without going into full details (to protect privacy), it clearly shows that a major assumption in our field is not accurate and needs further evaluating/improving.

    In my opinion, the “positivity” or “negativity” of a result should not matter. Null hypotheses can be just as important as confirmed ones. Sometimes more important. Acting otherwise promotes the wrong attitude toward science, IMO.

    • +1

      Although, it’s probably more accurate to say that the null hypothesis is very UNimportant, being implausible most of the time, which leads to the same conclusion: the size and direction of an estimate do not determine its importance and ought not determine its publishability.

    • Twain said,
      “As a current PhD student, I see troubling consequences of positive claims being celebrated over negative ones. It encourages fledgling researchers to “drink their own Kool-Aid” and always push for positive results regardless of the consequence — that is what will advance their career, after all, since graduation, post-docs, awards, etc., depend on the number of and repute of published results.

      I’ve seen this for one of my own colleagues — they are disheartened to the point of tears because a major hypothesis of their dissertation proved null. It therefore won’t see flashy publishing, local news, etc. It wouldn’t surprise me if many ignore it as a “one off” mistake or something similar.”

      I’ve seen this phenomenon in undergraduates doing a class research project, as well as in Ph.D. students analyzing their research data. The concept that the purpose of research is *not* to “prove the hypothesis” (or as we say in mathematics, “conjecture”), but that it is to figure out, at least to some extent, what is really going on, is hard for many people to accept — I guess it’s to some extent just accepting that reality isn’t necessarily what we want it to be.

  3. The cluster of cities with high pre-1917 growth and almost all strict NPIs does probably scotch any attempt to quantify the benefit. Still, I think you can probably say that if you ignore all the cities with >0.3 population growth, there’s not much evidence that NPIs have a large *negative* effect on economic growth.
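    A quick way to eyeball that on the replication data might be something like this sketch (hypothetical column names standing in for whatever the files actually use):

    ```python
    # Drop the fast growers, then check the remaining NPI/growth relationship.
    import pandas as pd

    df = pd.read_csv("cities.csv")  # hypothetical replication data
    slow = df[df["pop_growth_1910_1917"] <= 0.3]
    print(slow[["npi_days_1918", "growth_1914_1919"]].corr())
    ```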

      • Yes! At the very least, 1918 studies need to be taken with a lot of caution in looking to see what if anything they can tell us about current conditions. (Things can change a lot in more than a century!)

        • What this can do is make us reconsider our intuition.
          We see short-term bad economic effects and intuit that long-term economic effects are also going to be bad.
          Having an anecdotal example that this needn’t be the case is useful in helping us question this intuitive assumption.
          The conundrum here is that many people have this false intuition, and thus any readjustment of it needs to get widespread attention, and the only way to achieve that is to produce a sensational finding.

          The outcome of that finding, even if overstated, is a net good:
          — nobody will judge the present economic situation based on 1919 data anyway
          — but now, nobody can say “it’s going to be bad” uncontradicted, without having done analysis
          — this intuition readjustment gives people hope
          — the hope is justified, because even a “noisy zero” justifies it

          • I’m not quite sure this is what you should be taking from the latest paper. One, its finding on economic growth is noisy, which does not actually justify hope. Two, the differences in the economies of 1918 and 2020 are huge. As noted by the second paper, population growth is an important predictor of economic growth, and the population is not growing like it was in 1918 (https://www.census.gov/prod/99pubs/99statab/sec31.pdf provides some population growth figures for the largest 75 cities, but google does just as good of a job displaying overall population). The economy was much more heavily focused on agriculture and rural life, with about 1/2 the population not living in cities (~80% now live in an urban environment). And it seems that advocates of lockdowns are proposing longer lockdowns than what occurred in 1918. And weighted by population, I imagine that the lockdowns that are occurring now are affecting much more of the population than in 1918. None of that inspires hope.

          There is also the small matter of what people think will happen after restrictions are loosened. This is something I’ve tried to find out about the Spanish flu, but have not gotten a good answer. All the studies looking at NPI used in 1918 look only at the second wave of the flu and don’t consider the mortality from the 3rd wave (it appears that the second paper only controls for 1917 mortality (I can’t open the Correia paper) and doesn’t consider mortality from the 1918 flu, which seems wrong). What we really want to know is if NPI is effective at preventing a resurgence in the long run. It doesn’t seem that NPI is a strong predictor of when the first wave ended or the third wave began. Is it the case that there was no fourth wave because the US had reached herd immunity after the third wave? If the solution is herd immunity (either through spread or through vaccination), then that should inform our policy. At this point, suppression seems to be off the table.

        • OTOH 2020 populations are much more able to work from home than 1918 populations. Population growth in 1918 might have been a predictor of economic growth, but looking at the low-growth cities shows little negative economic impact of NPIs. We also have improving availability of testing, which would give us a better handle on any resurgence, at least in theory.

        • Regarding working at home: it’s not clear to me that more people have the ability to work at home relative to the past, and for those people who can’t work from home, it’s not clear whether the 1918 work environment was less or more conducive to the spread of disease. How many people relative to today lived above/near where they worked (shopkeepers, blacksmiths, druggists, doctors, lawyers, etc.)? It’s possible that the share today is larger, but I just don’t think it’s that obvious.

          Regarding population growth: even lower population growth cities still had remarkable population growth relative to what cities have today. So we don’t have the offsetting growth today that occurred in 1918.

          And again, even if we have testing, it tells us how many people have it, but not whether the disease will just run through the population until herd immunity is reached.

  4. Correia et al. deserve our praise: they released their paper in draft form, which allowed it to go through open-source peer review. And clearly, Correia et al. released enough detail about their analysis and data that it could be replicated and extended. Issuing a press release at the same time is questionable, as it preempts serious vetting, but it also apparently draws criticism to the work faster. Also, they have a good argument for contributing to an active and urgent debate over quarantine measures. At the very least, it’s better than just having your institution issue a press release claiming important discoveries in a paper currently under peer review at such-and-such journal.

    Not that this post is slamming the authors–I just want to make sure that the open-source theme running through this blog doesn’t get obscured by the equally-important media criticism theme. Speaking in generalities, you say, “A controversial paper is published…” and “I’ve seen a lot more buttressing than I’ve seen reassessing.” In general, that’s true, but here the paper wasn’t published; a draft was released for scrutiny. And while most authors have reacted defensively or dismissively to criticism, these authors deserve the benefit of the doubt. They’ll certainly have to deal with it now that this second paper is out, which is how the process should work.

    • Michael:

      I agree. As I wrote above: “It’s a sign of cheer, not depression, that a paper appearing on 26 Mar 2020 gets shot down on 2 May. A solid review in 5 weeks—that’s pretty good!”

      I hope that the next step is Correia et al. writing a revision, maybe in collaboration with Lilley et al., explaining why the data are not sufficient to evaluate their hypothesis, and also that the New York Times and NPR run correction stories, again not “blaming” Correia et al., just recognizing that research is difficult and that in social science there has sometimes been a problem with people being too quick to believe that causal identification + statistical significance = discovery.

      • I completely agree. To be clear, my comment about it being depressing wasn’t a comment on the work at all, just an inside joke in reference to your earlier blogs about NPR coverage of statistical findings.

      • I too agree that this is how things are supposed to work. And it is good that people rush out papers on coronavirus, and not be too scared of being embarrassed by fatal flaws.
        What is also important, though, is to quickly acknowledge fatal flaws. Otherwise, people spend too much of their time trying to figure out whether it’s really a flaw or not. A quick acknowledgement is also in the authors’ interest. For example, if they’d responded with an “Oops—the grad students are right; please forget our paper ever existed” within 24 hours, you wouldn’t have people like Andrew Gelman explaining the flaws in their article in great detail to a large audience of smart academics.

    • I agree that it’s appropriate to praise the release of the paper in draft form, but isn’t it a little strong to simply praise the authors here?

      The issues identified by Lilley, Rinaldi and Lilley, and expanded on a bit by Andrew, are all pretty obvious. If this were a doctoral thesis, I’d expect a modestly competent advisor to say “are you really comfortable with conclusions being drawn from 1914 data? and aren’t you concerned that inherent growth patterns could be influencing the results you’re attributing to suppression?”.

      • Joseph said, “If this were a doctoral thesis, I’d expect a modestly competent advisor to say “are you really comfortable with conclusions being drawn from 1914 data? and aren’t you concerned that inherent growth patterns could be influencing the results you’re attributing to suppression?”.”

        If this were a doctoral thesis, I’d expect a modestly competent advisor (or at the very least someone on the student’s committee!) to ask these questions at the stage of the dissertation proposal.

      • I made my comment because I wanted to underline that Andrew’s scolding in the post wasn’t over Lilley et al’s mistakes, it was over bad science reporting and the kind of bad behavior he “usually” encounters from authors. Someone skimming the post might interpret the Interlude section, by itself, as Andrew lumping the mistaken in with the bad, so I wanted to emphasize that the authors did everything they were supposed to do, at least up to this point.

        Of course they weren’t supposed to make obvious mistakes, but they’re engaging in a process to catch those mistakes prior to publication, and that’s really all we can ask. It’s not very constructive to ask people to only make non-obvious mistakes.

          • You mean Correia et al., right?

          How is it not constructive to ask people to take steps to avoid obvious mistakes?

          Any researcher with even a passing acquaintance with the reality of the popular press would know how these claims would be received and presented to the public. You seem to be venerating form over function here: the fig leaf of a pre-print covers for the apparent failure to engage in even cursory peer review of their approach before publicly sharing their findings.

          If you don’t want to make a value judgment on the quality of their work, that’s fine; I’m happy to do it. But to ignore its poor quality and then judge the authors praiseworthy because they didn’t compound things by also failing to document sufficiently to allow for the review by Lilley et al., well, I think that takes it a bit too far.

  5. Andrew –

    FWIW,

    Methinks you left out “is wrong” as in…

    > This would not say that the authors’ substantive hypothesis—social distancing is good for the economy [IS WRONG] —it just says that the data are hopelessly confounded so they don’t really answer the question as implied in the paper.

  6. To me this looks like more correlation/causation done without a look under the hood. We should look at the actual specific industrial activity before trying to tie it to any specific political action. The cities that occupy the good zone on the graph are clustered on the Pacific. Seattle looks really good on the graph, but the Seattle of that time was very different from the one we know. This was a Seattle without Microsoft or Starbucks, and Boeing didn’t start till 1916. The Pike Place Market, started in 1907, was in its infancy. The population was 350,000. I believe logging, fishing, and the port were the economy. The same goes for Portland, Spokane, and Oakland. I think social distancing and crowd dilution are quite proper today, but I doubt that sanitation measures drove the demand for lumber that fueled the Pacific Northwest industries immediately after WW I.

    • oncodoc said, “I believe logging, fishing, and the port were the economy [for Seattle]. The same goes for Portland, Spokane, and Oakland. I think social distancing and crowd dilution are quite proper today, but I doubt that sanitation measures drove the demand for lumber that fueled the Pacific Northwest industries immediately after WW I.”

      Looking for cities I know something about: Grand Rapids (almost hidden in the cloud somewhere around (.17, .30)) was heavily involved then in logging, the furniture industry, and trade across Lake Michigan (from the port of Saugatuck) with Chicago — which sounds quite similar to the Pacific coast cities oncodoc points to.

      • But also, around 1917, automobile companies were recruiting people from Grand Rapids to move to Detroit to work in the car factories. My guess is that Detroit was less susceptible to disease spread than Grand Rapids: In Grand Rapids, the river flooded regularly, whereas in Detroit, a worker at the Chrysler factory could afford to buy a new shotgun house, with large back yard, on high ground.

  7. Andrew –

    > The larger issue is that there seem to be more outlets for positive claims than negative claims. “We found X” can get you publication in a top journal and major media coverage—in this case, even before publication. “We didn’t find X” . . . that’s a tougher sell.

    I wonder about this. Not with respect to publication rates in journals (and then, as a direct function of that, news reports on articles that have been published), but with respect to the direct signal in reporting. At some level, “Previous report on economics of pandemics debunked” is effectively a “positive” claim for Fox or MSNBC, depending on the claim that’s been debunked.

    In fact, I would guess that someone like you is more likely to put up a post debunking claims than confirming claims, and often debunkers tend to have a fairly high media profile.

    I think we all have a bias in that regard – we tend to have a selective attention distribution mechanism, and we remember the positive or negative findings as being more prominent because of our predisposition. I wonder if there have been actual attempts to quantify the proportion of positive vs. negative findings that get picked up in the media, and, if it leans positive, whether that is merely a function of the foundational bias in journal publications towards positive findings.

    Either way, I’m reasonably certain that a signal of “positive” bias would be swamped by the signal of partisan bias, where various outlets use the political implications of findings as selection criteria more so than the positive vs. negative aspect of the findings.

  8. Maybe this is a quibble, but this kind of result is only ‘positive’ because it reinforces the desired current methods, and thus is ‘virtue signaling’. The problem with virtue signaling is that it tends to reflect a distorted perception: skepticism is demoted as a combination of unconscious and conscious behaviors warp methods to generate results that buttress the desired result. This blog has dealt with that effect more times than I can count.

    It also affects how we read. What is the model behind the idea that cities with stricter methods did better afterwards? Is it that fewer people died? Or is it that somehow having stricter methods somehow signaled to people that, yes, this is where you want to grow your business, where you want to settle, etc.? Does that make any sense? Do people think that way: you know, Philadelphia had a big parade and then people died so I’m trusting LA because they didn’t allow a big parade? In 1918? When there wasn’t even radio? When there were barely roads between cities? When you had to take a train or a boat to get to LA from Philadelphia? Or am I supposed to believe that LA proved itself so much better at government that this fact made LA grow. It wasn’t stuff like the film industry moving there so suddenly people all over saw California – most for the first time – seeing that it was warm all year, etc.? Really? What is the mechanism by which they assume growth would be occurring differently because of what they see as major differences in approach to an epidemic in that era?

    Because a study like this is virtue signaling, it gets a free pass at the model level. Look: this study proves there is a benefit to what we’re doing, if only we stick it out, so we should keep locking people up! Isn’t that great!

    • Jonathan:

      I don’t think the paper is “virtue signaling”; it’s just econometrics. It’s econometricians doing the econometrician thing, which is to estimate the causal effect of an intervention or exposure. I call it a positive result not because of its policy implications or its virtue implications, but because they are claiming strong empirical evidence of an effect. Had they found the opposite result—that social distancing etc. had bad long-term economic effects—I would’ve called that a positive result as well.

      • Can we reverse the causality?
        From the plots, it looks like city governments that had been able to provide the conditions for strong economic growth also thought it sensible to impose NPIs strongly and early.
        Hypothesis: good governments do NPIs; bad governments may or may not.
        That also gets rid of the “backwards in time” causality that you’ve been highlighting.

    • To expand on what Andrew said, he’s contrasting positive claims with refutations, and a refutation doesn’t mean the claim is necessarily wrong; it means that the claim is unsupported.

      Positive claims get the press.

      I don’t really know why I’m commenting; Andrew covered it above and in the final paragraph of his post.

  9. When I was a personality and social psychologist not so long ago, the better journals and their reviewers were explicit that they favored (1) positive results, (2) striking results, and (3) multiple studies. Replications and failures to replicate were not appreciated. But authors could mention replications and failures to replicate briefly in articles that contained 1-3 above. Perhaps researchers in other fields could strive likewise. On one hand it might lack timeliness (it would depend on having some positive, striking, multiple-studies-based results to report). Perhaps that could be addressed by having an outlet strictly for replications and failures to replicate. The downside there would be that publishing in it would lack career benefits, so who would spend their energy/time/resources doing it (grad students, maybe), other than the usual people with a Mission!

    • If someone is on a mission to test and replicate previous work, more power to them.

      It’s not clear why social science and psychology researchers feel that results from a single study are forever etched in stone once they appear in print. In physics, chemistry, biology, and geology, people test previous work many times over using different contexts and approaches. It’s standard operating procedure. No one likes to see their work undone. But everyone loves to be the person who uncovers an error or mistake.

      • They don’t necessarily feel that single-study results are etched in stone. What they feel is that “journal pages are a scarce resource.” In particular, their own journal.

  10. I have 2 questions/comments: first, glancing through the Lilley paper, I noticed that the mortality control used in the regressions was from 1917, and the paper doesn’t seem to control for 1918 mortality. This seems wrong, as the benefit of NPI is to reduce mortality.

    Second, presumably the appropriate measure of effectiveness of NPI is the mortality of the 1918 flu relative to the average year. If NPI is effective, then those cities with earlier and longer NPI should have a lower relative mortality. It appears from a scatter of the days and speed of NPI that those cities that acted earlier also engaged in longer NPI. When you graph the scatter of speed of NPI against relative mortality (1918 mortality / 1917 mortality; I’m just using the data provided by Lilley for illustration) there is only a slight negative relationship, whereas the days of NPI has a slightly positive relationship with relative mortality. A regression gives these coefficients: relative_mortality = (-0.05) * Days_of_NPI + (0.01) * Speed_of_NPI. Obviously, you want to do a more sophisticated analysis, but just from this, I’m wondering why we think 1918 provides evidence that NPI is effective at reducing overall mortality.
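    Roughly, the back-of-envelope regression described above amounts to this sketch (hypothetical column names standing in for the Lilley et al. data):

    ```python
    # Sketch of the two-variable OLS described above. Column names are
    # hypothetical stand-ins for the Lilley et al. replication data.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("cities.csv")
    df["relative_mortality"] = df["mortality_1918"] / df["mortality_1917"]

    fit = smf.ols("relative_mortality ~ npi_days + npi_speed", data=df).fit()
    print(fit.params)  # the text above reports roughly -0.05 and +0.01
    ```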

    • I think you’ll have to define what effective means.

      Any analysis will probably have to contend with heteroskedasticity, and duration of NPI is probably more vulnerable to reverse causation (a city that is more severely hit will tend to want to maintain NPI for longer, and less effective NPI will also probably result in longer periods of NPI). This will be a pretty tricky question to answer with this data. Also, I’m not sure whether the mortality ratio is a better measure than the mortality difference.

        • Effective presumably would mean you have fewer deaths relative to what would have occurred. So if the Spanish flu would have raised overall mortality by a certain factor (with some variation), then implementing earlier and longer NPI would reduce that factor, and those cities that did so should have seen flu deaths increase by a smaller factor. I don’t know why the mortality difference would be preferred. When looking at changes in other things (e.g. GDP), ratios tell a more informative story. But running the same analyses using the mortality difference doesn’t make it better. Early action seems to be thought of as better, but those cities that moved quickly also had longer NPI. The 2007 JAMA paper (https://jamanetwork.com/journals/jama/fullarticle/208354) looked at different types of NPI, and chopped up the data via an ANOVA table enough to find an association between types of NPI and excess mortality (measured as the difference, not the ratio). But when you chop up 43 cities enough, I don’t really think it shows much. And I think the ratio of mortality (rather than difference) is more appropriate.

        • What I meant in terms of definition of effective is whether you require a specific effect size to count as effective. Fitting a GAMLSS model will give you a significant effect from early NPIs, but I don’t know if the effect is large enough to count as effective for you.

          That some cities had both early and long NPIs is something in the data (though cities that had *early and short* NPIs also existed and did especially well) but I don’t know how one would interpret that necessarily. Consider for example that a city that adopts lockdown early would be forced to keep its lockdown up for longer because a neighbouring city that closed late caused the infection to persist – its longer NPI isn’t really a sign of ineffectiveness in this case, but rather would be the fault of the other city.

          Basically I don’t think there’s really enough in the data to look at the effects of ending NPIs early.
