Information or Misinformation During a Pandemic: Comparing the effects of following Nassim Taleb, Richard Epstein, or Cass Sunstein on twitter.

So, there’s this new study doing the rounds. Some economists decided to study the twitter followers of prominent coronavirus skeptics and fearmongers, and it seems that followers of Nassim Taleb were more likely to shelter in place, and less likely to die of coronavirus, than followers of Richard Epstein or Cass Sunstein. And the differences were statistically significant.

When I first heard about that, I thought, cool, someone tracked down the behavior of all these twitter followers, I guess maybe they did a sample of some sort and then contacted them directly? This got me concerned about nonresponse bias. But, no, they didn’t go that route: these were economists, remember? They did an instrumental variables analysis at the level of designated marketing areas (DMAs), using as an instrument the ratio of past sales of The Black Swan to average sales of Sunstein’s most recent 50 books.

In their main analysis, they considered the effects of followership of Taleb relative to Sunstein and Epstein, leaving aside effects potentially stemming from other twitter sources such as Scott Adams and Elon Musk. Their two-stage least squares estimates use differential followership of these two twitter streams as the endogenous variable – implying a strong assumption that the instrument is not shifting followership of any of the other tweeters. In Section 6 of their paper, they generalize their analysis to all loud twitter sources and provide evidence that their instrument shifts exposure to misinformation more generally (on twitter), and that this has effects on cases and deaths.
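
In case it helps to picture the machinery, here is a minimal sketch of that kind of two-stage least squares setup. The data are simulated and the variable names (follower_gap, book_ratio) are mine, not the authors’; the exercise only recovers the right answer if you believe the instrument is unrelated to everything else that matters.

```python
# A toy 2SLS setup with simulated DMA-level data; made-up names, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # number of DMAs

confounder = rng.normal(size=n)            # unobserved; drives both followership and deaths
book_ratio = rng.normal(size=n)            # the instrument (assumed exogenous in this sketch)
follower_gap = 0.8 * book_ratio + 0.5 * confounder + rng.normal(size=n)
deaths = 2.0 * follower_gap + 1.5 * confounder + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(n), follower_gap])
Z = np.column_stack([np.ones(n), book_ratio])

beta_ols = ols(deaths, X)[1]               # naive OLS: biased upward by the confounder

follower_hat = Z @ ols(follower_gap, Z)    # first stage: endogenous variable on the instrument
beta_2sls = ols(deaths, np.column_stack([np.ones(n), follower_hat]))[1]  # second stage

print(f"true effect: 2.0, OLS: {beta_ols:.2f}, 2SLS: {beta_2sls:.2f}")
# 2SLS lands near 2.0 only because the simulated instrument really is exogenous.
```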

Efforts to contain a pandemic depend crucially on citizens holding accurate beliefs. At the onset of the pandemic, widely followed twitter accounts differed in the extent to which they portrayed the coronavirus as a serious threat to the United States. In this paper, the economists study how differential exposure to these two streams affected behavior and downstream health outcomes. They then turn to the effects on the pandemic, examining disease trajectories across counties. They first show that, controlling for a rich set of county-level demographics (including the local market share of twitter), greater local followership of Epstein and Sunstein relative to Taleb is associated with a greater number of COVID-19 cases starting in early March and a greater number of deaths resulting from COVID-19 starting in mid-March.

Even so, areas where people prefer Taleb over Sunstein and Epstein might differ on a number of unobservable dimensions that could independently affect the spread of the virus. Thus, to identify their effect of interest, they employ an instrumental variable approach, using an instrument that shifts relative followership of the two streams yet is plausibly orthogonal to local preferences for the two streams and to any other county-level characteristics that might affect the virus’ spread.

I was skeptical at first, but then I saw that:

– They had an identification strategy.

– Many of their key results had p-values of less than 0.05.

– Some of their comparisons had p-values of more than 0.05: some of these fit their story and were marginally significant; others did not fit their story and were null results.

– They used robust standard errors.

– They had robustness checks.

So, all good. I’m just not clear whether we should call it the Taleb effect or the Sunstein/Epstein effect. That’s the trouble with these causal comparisons: there’s no baseline.

Also, a couple people pointed me to this paper. I’m a bit skeptical, but given what’s already been found about the health risks of following law professors on twitter, I guess all things are possible.

57 thoughts on “Information or Misinformation During a Pandemic: Comparing the effects of following Nassim Taleb, Richard Epstein, or Cass Sunstein on twitter.”

  1. I was skeptical at first, but then I saw that:

    – They had an identification strategy.

    – Many of their key results had p-values of less than 0.05.

    – Some of their comparisons had p-values of more than 0.05: some of these fit their story and were marginally significant; others did not fit their story and were null results.

    – They used robust standard errors.

    – They had robustness checks.

    So, all good

    It’s not 100% obvious, and I have been here for well over a decade and know a lot about what Andrew thinks. But I’m pretty sure this is strongly ironic, and Andrew’s real feelings are that this kind of thing is just like all the other kinds of things where an identification strategy and a p less than 0.05 are taken to mean TRUTH… which is why it’s “Filed Under … Zombies” at the bottom.

  2. I think we need to do an analysis to determine whether readers of this blog commit more or fewer statistics mistakes than the general population.

    Listen closely, I have the perfect strategy for this analysis:
    1. Andrew Gelman should have the IP addresses of all the users who read this site.
    2. We can use those IP addresses to find the zip codes of those readers.
    3. We can then compare the pool of reader zipcodes to the pool of zipcodes of authors of retracted papers.
    4. If there is a negative correlation between the two sets of zip codes, we are golden. Readers of this blog commit fewer statistics mistakes.

    Let this comment serve as a pre-registration of this study. Coming soon to a PNAS near you.

  3. I hereby dub this Gelman’s Law: an application of Poe’s Law to econometrics.

    Poe’s Law (https://en.wikipedia.org/wiki/Poe%27s_law) says “Without a winking smiley or other blatant display of humor, it is utterly impossible to parody a Creationist in such a way that someone won’t mistake for the genuine article.” Gelman’s version seems to be that without a clear indicator of the author’s intent, it is impossible to create a causal claim so absurd that, so long as it has an identification strategy and statistical significance, it cannot be mistaken by some economists for the Truth.

  4. These kinds of IV analyses strike me as just as “black box” as any machine learning algorithm. Gather a bunch of non-experimental data, say a prayer for the multitudes of uncheckable assumptions, churn it through the 2SLS machine and look! Out pops a point estimate and, God willing, p < 0.05. What does the point estimate actually mean (especially if any of those assumptions were wrong)? Who does it actually apply to? What sort of checkable predictions can we make from this model? Who knows? But hey, it's got "causal effect" stamped on it and that's all that matters.

    • It isn’t a black box, it’s just a method of adjusting for endogeneity. When the assumptions are untrue (and there is really only one uncheckable assumption: that there is no correlation between the instrument and the error term), then of course the numbers won’t make any sense. From the way Prof. Gelman describes it I’d be very skeptical of the instruments, but that’s not the method’s fault. The whole study seems like a stretch, but IV is just a misused tool in this case.

      • This is probably dumb. If we wanted to blur the lines between experimental and observational science even more than econometrics already does, though, would there be any way to force non-correlation between an instrument and the error term? For example, imagine the government rolling a set of dice and declaring, on one day out of every hundred, that everyone must stay at home today. Then we would have a society that is in some sense more readily amenable to this sort of analysis. Does this idea make sense, in theory if not in practice?

        • Kent:

          It’s hard to imagine the government doing this, but companies do this sort of A/B testing all the time. For example, see the previous post!

        • Also government does do this at times, not during Covid but in other instances: some of the most famous IV papers use things like the draft lottery during the Vietnam war or similar institutional setups in other countries.
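
          To see why a literal lottery does the trick, here is a toy simulation (numbers invented, nothing to do with the paper): the coin flip is independent of everything else by construction, so a simple Wald ratio recovers the effect that the naive comparison gets wrong.

```python
# Toy encouragement design: a randomized "stay home" lottery as the instrument.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

caution = rng.normal(size=n)               # unobserved trait: cautious people stay home AND do better
lottery = rng.binomial(1, 0.5, size=n)     # the dice roll: random, so unrelated to caution
stay_home = (0.8 * lottery + caution + rng.normal(size=n) > 0.5).astype(float)
outcome = 1.0 * stay_home + 0.7 * caution + rng.normal(size=n)   # true effect of staying home is 1.0

naive = outcome[stay_home == 1].mean() - outcome[stay_home == 0].mean()

reduced_form = outcome[lottery == 1].mean() - outcome[lottery == 0].mean()
first_stage = stay_home[lottery == 1].mean() - stay_home[lottery == 0].mean()
wald = reduced_form / first_stage          # IV estimate: reduced form over first stage

print(f"naive comparison: {naive:.2f}, IV (Wald): {wald:.2f}")   # naive overstates; Wald is near 1.0
```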

    • A sizeable chunk of the econometrics literature seems to be looking at how poorly IV performs in a wide variety of settings, and yet so many applied economics papers just use any old, hardly plausible instrument and say, hey, here’s the causal effect. To be fair, it is the same in medicine and epidemiology, we’ve got tons of good stats articles discussing appropriate methods but most people just stick everything in a logistic regression model and say, hey, here’s your odds ratio, job done. At least machine learning is coming along to save the day.

    • Not a TON of IV papers, but a few IV papers use machine learning to uncover an instrument. Some of them are upfront about it, others are very much not. My field is suddenly very skeptical of the method, but in this paper at least the IV is reasonably well defended. The authors are really sharp, but the problem with the paper is that there are multiple potential weaknesses, not just the instrument. First, it IS a kind of weak instrument, or at least not super well argued yet. Second, the data ARE kind of a weird sample, and it’s perfectly fair to question this kind of survey data (which, oddly, I thought was more of an economists-in-Poli-Sci thing than a mainstream econ thing, but a lot of the lead author’s work is survey work). The argument against the weakness of the instrument is that they used another instrument and got roughly the same results, so who knows. I think it’s a good identification strategy with kind of weak data, right now.

  5. Given that your April Fools post fooled some people, I think this one is far too subtle, excellent though it is. Maybe you need to more clearly spell out the connection between following certain Twitterers and wearing red? When it’s cold, of course…

  6. The only thing that is missing is optimal policy design. Why vaccinate 65+ year olds first if you can go after the real high-risk group, i.e., Sunstein twitter followers?

  7. > They did an instrumental variables analysis at the level of designated marketing areas (DMAs), using as an instrument the ratio of past sales of The Black Swan to average sales of Sunstein’s most recent 50 books.

    Would that be a valid instrumental variable? It doesn’t seem exogenous as people will choose to buy one book or the other, it’s not forced on them. In the linked paper, people have no control over what’s on air at primetime.

    • Carlos:

      For something to be a valid instrument, it’s not enough that it is assigned exogenously. If the assignment is correlated with important variables that are not included in the model, you’re still in trouble. This was the problem with all those studies that used rainfall or distance from major rivers or other such geographical predictors as instruments. Exogenous != random.
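
      Here is a toy version of the rainfall problem (all numbers invented): the instrument is not chosen by anyone, but it tracks an omitted geographic variable that affects the outcome directly, so the IV estimate of a true-zero effect comes out far from zero.

```python
# Toy example: an "exogenous" instrument that is still invalid because it is
# correlated with an omitted variable. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

geography = rng.normal(size=n)                   # omitted variable (think rainfall, density, latitude)
instrument = geography + rng.normal(size=n)      # not chosen by anyone, but tied to geography
treatment = instrument + rng.normal(size=n)
outcome = 0.0 * treatment + 1.0 * geography + rng.normal(size=n)   # true effect of treatment is zero

iv_estimate = np.cov(instrument, outcome)[0, 1] / np.cov(instrument, treatment)[0, 1]
print(f"IV estimate of a true-zero effect: {iv_estimate:.2f}")      # roughly 0.5, not 0
```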

      • I don’t say it’s enough. I say that what you wrote is not even that. If you want, replace “Would that be a valid IV?” with “Could that be a valid IV?”. It seems a bit unfair. You can do better :-)

        • Carlos:

          Oh, sure, I wouldn’t consider the above post to be a serious review or critique of the linked paper. If you want such a review, you’ll either have to pay me at my hourly rate or do it on your own time!

      • Thank you, I actually learned something from that.

        I take the study to show that the TV host and his audience form a group that is statistically defined by a shared likelihood to produce/watch the same show, and to hold the same views about the epidemic, as evidenced by infection rates.

        The question is, is the viewership in that group because the TV station schedules that show at prime time, or did the TV station choose that show for that slot because it aligns best with its viewers?

        In the latter case, the null hypothesis might be that the hosts talk the way they talk because they pander to their respective audiences. If you can’t disprove this reversal of causality with the data you have, you can’t support your conclusion.

        • Mendel:

          Rather than framing this in terms of a null hypothesis, I’d rather say that there are many many impacts on behavior, and it would take more than this to convince me that whatever showed up in this data represents an effect of watching the show rather than one of many other factors that happens to correlate with watching the show.

        • I’ve not read the paper (beyond a cursory look at the abstract) but I think that the idea is that the TV station cannot select “a” timeslot because the viewers span several time zones: they get exposed to different things depending on how far east or west they live.

  8. I observe that some experts seize the Twitter platform better than others, and so lead in the anchoring effect. The challengers to a seized-upon anchor have a tougher row to hoe in that regard. I think this is also very much a function of the complicated emergent attitudes towards expertise itself during the last 30 years.

    Some subsets of experts/non-experts have simply more fluid and crystallized intelligence. And it is suggested that the crises of knowledge are the result of the candidates recruited to universities. John Rawls told his colleagues that without more creative eclectics in the mix, academics would stagnate. Serge Lang also held to that view. I believe Richard Posner and the late Ron Dworkin would agree with that. These two are of a caliber that is hard to cultivate. David Kennedy is one that I do admire as he has had ‘skin in the game’, given his astoundingly rich career in various institutions. I loved his book The Dark Side of Virtue. A tour de force.

    I have been frustrated by the dynamics between Nassim Taleb, Cass Sunstein, Philip Tetlock, and Richard Thaler b/c they have produced such interesting scholarship. I have read much of it. And they are all thoughtful thinkers when you talk to them.

    However, I must suggest that a good deal of the basis for their scholarship was the generation of scholars before them. I am a huge citation tracker.

    Moreover, I think cliques form, and followers follow their leaders, sometimes blindly. I believe those are the bigger problems. The goings-on behind the scenes in Boston were really worthy of a Monty Python script.

    What it comes down to is people skills as well, in nearly every facet of life. And one’s maturity, if that is even an aspiration.

    • Casper:

      With an observational study, you’re comparing exposed to unexposed people. The concern is that there are important pre-treatment differences between these groups. An “identification strategy” is not magic; ultimately you need to make these comparisons.

      To put it another way, there are no “vague spooky reasons” here. If researchers want to go from a raw comparison to a causal inference, it’s up to them to make the convincing case. I don’t need a specific reason to not be convinced. Burden of proof is on the person trying to do the convincing.

      • Precisely. I don’t trust their IV, not so much because I can come up with a (plausible) confound but more because I generally distrust shift-share instruments (which, as Andrew correctly points out, is what their identification boils down to despite the discussion of sunsets). The onus is on them to prove that the instrument works, not on the readers to concoct an alternate narrative. In fairness, I do have to admit that the authors do an impressive amount of robustness, certainly more so than I’ve come to expect in a working paper. If I were to buy their instrument (which I don’t), the only glaring omission to me would be showing leave-outs and bootstraps: especially in early March, they need to convince me that their results aren’t being driven by a few outliers.

          • The problem as a researcher when you run into “I don’t buy it” is it’s really hard to tell whether someone reasoned themselves into the “I don’t buy it” position or whether they’ve just constructed a Potemkin edifice of reason around a gut feeling because they don’t like the paper generally. And, hey, sometimes people have good logical reasons for disliking a paper even if their initial reaction is unfair. The problem with IV methods is that you’re mostly arguing the logic, not really the math. And, uh, sometimes people find these IVs by accident or they’re just bad at logic anyway, and it’s hard to tell which is happening in the paper.

          That’s not the case here: we have a set of people who are really quite good at arguing their case, logically, but the face validity on the instrument is super weak. They try to get around this by instrumenting in slightly different ways and getting similar results, but I actually think it works against them a little bit: it made me back up and wonder how much faith in the initial instrument they actually have. That more than anything made me take a closer look.

        • This is a great point, Joe, and I think absolutely correct for IVs in general. In this particular case, however, I think we see the paper differently. The intuition for the instrument is fairly simple: if you believe in shift-shares (which I don’t, but which for some reason beyond my understanding most economists do) then you like the instrument, especially because it goes one step beyond a standard shift share by predicting the leave-out viewership rather than taking it as fixed. The time zone logic is sort of neat. Yet if you don’t believe in shift-shares, then there’s essentially nothing they can do to convince you that their identification is valid, no matter how many balance checks, robustness checks, or permutation tests they dress it up with. If they had written up the OLS results and stopped there, I would have been quite positive, especially given the amount of robustness for OLS alone (I liked the coefficient plots…call me a sucker). It’s really the overselling on causality I have an issue with.

        • Joe:

          Yeah, and one way to focus this question is to imagine that you are reviewing this paper for a journal such as PNAS or APSR. What to do? If a journal sent me the paper to review, I’d probably let myself off the hook by saying that I don’t find the paper convincing so they should get another reviewer. But what if I had no choice and had to review it? There’s a norm that says that if a paper makes important and novel claims, and you can’t find a specific problem with the work, you should recommend acceptance. This seems fair, as the alternative is to allow rejection for no good reason at all! But the result is that journals get filled up with papers that are written professionally, have all the right formatting and buzzwords, but which make no sense. Recall ESP, himmicanes, ages ending in 9, beauty and sex ratio, and all the rest.

          It’s a Cantor’s diagonal problem. We know how to find the flaws in all the old bad papers, but new papers keep coming with new flaws. And the challenge is that not all the new papers have flaws! This one might really be on to something.

          Ultimately there’s no substitute for external replication. In the meantime, I feel under no obligation to believe a claim derived from a bank-shot statistical analysis, even if the paper in question follows enough of the rules that I don’t want to put in the effort to figure out exactly what went wrong (or, less likely but possible, to convince myself that their analysis is really doing what it says it’s doing).

          Gotta write more on this general issue.

        • “Bank shot” analysis is such a good term. Reading the paper it’s like they ran the analysis and are surprised themselves that it “worked,” perhaps even more so that it was robust to a bunch of alternative specifications. I personally don’t feel like you’re being unfair at all here in being skeptical. It might be a situation where even a pretty good paper (we shall see if it is?) is due a good amount of skepticism!

          If I were reviewing this, and let’s be clear I’m not (I can do IV methods because I had to, but it’s not at ALL my wheelhouse), I’d want the data and the code, to be quite honest, and I’d torture it every which way to make sure that this is a good way to tell this particular story. There’s no reason I can see why that won’t happen: it doesn’t seem like the data are particularly sensitive. It just seems all too convenient.

        • Hey Andrew,

          Would you mind elaborating on the heuristics you’re using for why you don’t believe the claims?

          As I understand it you don’t ‘buy’ their arguments. And I understand why you don’t want to read the paper thoroughly to nitpick when you have a general sense that the main conclusions are wrong.

          What red (or orange) flags are you seeing so far? For example: is it the effect size (too large, so probably not true) or the use of 2SLS (those never work, and any papers that use them are suspect)? It could help readers (myself included) build a better BS-meter without deep-diving on papers.

        • Let me elaborate a little bit:

          When I read the headline of the results I was very skeptical. I read (thoroughly skimmed) the paper and thought they did a solid analysis with the data available to them.

          Do you think one could have done a better analysis? Or do you think it’s not even worth trying to answer this question with the data presented?

          Thanks,
          Sam

        • I’m obviously not Andrew, but one major red flag for me: the study is ostensibly about the CAUSAL effect that consuming certain media can have on COVID-19-related health outcomes, but they don’t actually have any data about who watched the bad media, who had the bad health outcomes, and to what extent those groups actually overlap! All of the data are at the county level (at least that was my understanding; please correct me if I’m wrong).

        • I have my doubts about the paper as well, but I don’t think this in particular is a big problem for them — the authors’ causal claim is not that Hannity causes viewers to get sick and die, it’s that Hannity causes counties to experience more cases and deaths. After all, given that the coronavirus is contagious, we would expect that non-cautious behavior among Hannity’s viewers would translate to a greater number of cases and deaths not only among those viewers, but also in those viewers’ area. I agree that they need to be much more clear in the exposition though.

        • ” the authors’ causal claim is not that Hannity causes viewers to get sick and die, it’s that Hannity causes counties to experience more cases and deaths. After all, given that the coronavirus is contagious, we would expect that non-cautious behavior among Hannity’s viewers would translate to a greater number of cases and deaths not only among those viewers, but also in those viewers’ area.”

          You’re right, it was sloppy of me to frame it only in terms of what happens to Hannity viewers. The causal chain of Hannity -> Non-Cautious Behavior Among Hannity Viewers -> Greater Spread of Virus among Hannity and Non-Hannity viewers -> Greater Number of Cases and Deaths seems perfectly plausible. But even under this expanded causal chain, viewing Hannity can only cause a case if a Hannity viewer actually contracts the virus. Non-cautious behavior by itself does not spontaneously create coronavirus. So to truly evaluate this claim, you would at a minimum need to know infection status at the viewer level, no?

          My point is not that it’s unreasonable to think that the misinformation spread by Hannity and others was harmful. I certainly think it was harmful! My point is simply that there is a major piece of the causal chain (presence/absence of the coronavirus) that is completely missing from the authors’ data.

        • Oh, absolutely. I don’t think showing it among viewers is a vital step (i.e. them including it would be neither necessary nor sufficient to persuade me of the study’s validity) but I absolutely agree that it would make the paper much more convincing if they could.

        • Sam:

          I guess the biggest thing is that there can be systematic pre-treatment differences between people who watch one TV show or the other, differences that are not captured in the pre-treatment variables included in the regression. Or, if you use a geographic instrument, there can be systematic pre-treatment differences between places that watch more of show A, compared to places that watch more of show B, differences that are not captured in the pre-treatment variables included in the regression.

          It’s the usual concern with observational studies. One of my problems with “identification strategies” is that they can lead people to turn off their observational-study brains. We’ve seen this in some of the regression discontinuity studies we’ve discussed on the blog, most notoriously the air-pollution in China study but lots more too. It’s a kind of magic trick: everyone’s staring at the instrument or the forcing variable or whatever and forgetting to check that there are no major unexplained pre-treatment differences between the groups.

          Also, in answer to your other question: If I were studying this, I’d recommend starting with the simple comparison, then looking at differences between places that had more viewing of one show than the other, then adding predictors to the regression, etc. Basically, to think of it like an observational study. I’m skeptical that much can be learned here—there are just too many important differences between geographical areas that will just happen to be correlated with what show people watch—but, who knows, it could be worth a try. I think one problem here is the norm in social science where it’s not enough just to do exploratory work, you have to frame things as proof, or at least strong evidence.
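
          To make that concrete, here is the kind of simple pre-treatment balance check I have in mind, sketched with invented county-level variables rather than anything from the paper:

```python
# Sketch of a pre-treatment balance table for high- vs low-exposure areas (fake data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 3000
counties = pd.DataFrame({
    "exposure_gap": rng.normal(size=n),              # e.g., relative viewership of show A vs show B
    "median_age": 38 + 8 * rng.normal(size=n),
    "pop_density": np.exp(rng.normal(5, 1, size=n)),
    "pct_over_65": 16 + 4 * rng.normal(size=n),
})

high = counties["exposure_gap"] > counties["exposure_gap"].median()
covariates = counties.drop(columns="exposure_gap")

balance = covariates.groupby(high).mean().T
balance.columns = ["low exposure", "high exposure"]
balance["std. diff"] = (balance["high exposure"] - balance["low exposure"]) / covariates.std()
print(balance)   # large standardized differences are the pre-treatment gaps to worry about
```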

        • > My point is simply that there is a major piece of the causal chain (presence/absence of the coronavirus) that is completely missing from the authors’ data.

          That doesn’t invalidate the reasoning. Causal chains often have unobserved elements. I think epidemiologic and public health research often looks at the effects of interventions on outcomes at the population level.

        • Andrew,

          Thanks for your response! I see where you’re coming from now.

          Also: `It’s a kind of magic trick` sounds like a good headline for a blog post :)

          Sam

      • “I don’t need a specific reason to not be convinced. Burden of proof is on the person trying to do the convincing.”

        Andrew, but you do have to have grounds for skepticism as well.

        Fisher, for instance, took that position to the point of unreasonableness in the smoking and cancer debate. At some point you run out of plausible alternative stories to explain the observed data.

        Here, for example, you raise the problem of confounding due to unmeasured pre-treatment differences. Say we do a proper sensitivity analysis and find that the results are robust to most plausible ranges of confounding. Then I argue the burden of proof is on you to come up with an alternative story for your skepticism. I’m saying this as a general statement; I don’t think it would be the case in this study in particular.

        So, if you want to say that your prior is strong enough, and the stakes low enough, that you don’t even bother, that’s fine. But there must be some caution in generalizing this burden-of-proof argument.

        • Carlos:

          See above. I think the thing to do is to seriously look at pre-treatment differences between people who watched each show, and at pre-treatment differences between areas where more or fewer people watched each show, and then try to adjust for these differences. That is, start with the simple comparison, recognize that there will be problems, look carefully for these problems, and then adjust for them if you can. That’s kind of the opposite approach of finding an identification strategy, getting an answer, and then running a bunch of robustness checks to show that the answer holds up.

          I’m starting from the attitude that I have an observational comparison which, with luck and effort, I can interpret causally. What’s done in this paper (and in lots of similar work in social science) is to start with a shot at getting the causal inference and then do some tests to preserve that causal interpretation. Sometimes this can work, if the path is direct enough and the effect is large enough, but in general I don’t think this will work.

          Smoking and cancer is a huge effect, and it does seem like Fisher was off on that one. But . . . explaining differences in coronavirus spread in different U.S. cities? There are huge effects here which are obviously not caused by which Fox news show people are watching. The 1950s analogy for cancer studies would not be smoking, it would be whether people read Time or Life, for example. And cancer researchers in the 1950s had better things to do than use instrumental variables to compare the death rates of Time readers to Life readers.

          I have sympathy for the authors of the above-linked paper. They’re doing what they’ve been trained to do, the thing that is respected in social science academia and the news media. There’s no reason they should listen to me, rather than to their peers in economics and political science departments, journal editors, etc. But I still think they’re going about things wrong. And I don’t think their work is of the smoking-cancer sort, where you have to go out on a limb to be skeptical of it.

        • Andrew, thanks. In general, I agree with the feeling of your quote:

          “One of my problems with “identification strategies” is that they can lead people to turn off their observational-study brains.”

          And regarding this paper in particular, my prior is actually based on a similar feeling.

          That said though, the post and some parts of the comments may come across as an overly skeptical approach to observational studies in general, and that can spill over to other cases when it matters.

        • “I think the thing to do is to seriously look at pre-treatment differences between people who watched each show, and at pre-treatment differences between areas where more or fewer people watched each show, and then try to adjust for these differences. That is, start with the simple comparison, recognize that there will be problems, look carefully for these problems, and then adjust for them if you can. That’s kind of the opposite approach of finding an identification strategy, getting an answer, and then running a bunch of robustness checks to show that the answer holds up.”

          Andrew, I ask this out of interest and for my own edification, not to be confrontational. But at least on my reading, this is basically what the authors do, far more so than in a traditional IV paper. They show pre-treatment differences in their Figure A.2, they discuss the empirical challenges, and then they show OLS coefficient estimates under all combinations among a large set of controls (Figure 5). After documenting that the OLS results seem to be robust, they proceed to the IV. I’m very curious as to what you would have done differently, or if you simply feel that the question cannot be convincingly answered.

  9. My guess is the reason you see all of those robustness plots (Figure 5, Figure 11, some of the appendix plots) is that they were trying to convince themselves that the model really works, and then they said “If we needed all of these plots to convince ourselves, readers will need them too.” But that strategy may have backfired.

  10. These studies must be just for amusement. If we compare geographic areas that watch the weather on NY city stations with areas that watch the weather on Cheyenne, Wyoming stations, we would find that watching the weather in NY City is much more dangerous to your health. Meaning nothing.

    Hard to believe that last paper referenced comes from the University of Chicago Economics Department. I guess everybody wants to get on the Freakonomics bandwagon.

    • Zbicyclist:

      I can’t tell if you’re kidding with the reference to the University of Chicago Economics Department, but they’re notorious for believing just about anything that comes from a quantitative analysis! Two classic examples are here and here. Neither of these is from Levitt.

      • You caught the sarcasm.

        But this brings up something that’s a bit of a mystery. We often see studies identified as “Harvard researchers” or “a study by the University of Wisconsin” as if that institutional affiliation is very relevant.

        Yes, the average quality of research from, say, Columbia is probably higher than that from the University of Southern North Dakota at Hoople, but the variance around that mean is large. The actual institutional affiliation isn’t really that much of a quality indicator.

        I can understand why university PR departments issue press releases this way, but that doesn’t mean media outlets have to play along.

  11. When anybody uses 2SLS, the question I always have is: how many instruments did they try before they got one they liked? If they tried more than one, the significance tests are meaningless. In my experience, authors never say how many instruments they tried, and without such a statement, the research should be considered meaningless.
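
    To see how much that matters, here is a toy simulation (nothing to do with this paper): screen twenty junk instruments, keep whichever reduced form looks most significant, and the nominal 5% test fires roughly 64% of the time.

```python
# Toy illustration of instrument shopping: the "best" of 20 junk instruments
# clears p < 0.05 far more often than 5% of the time. All data simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, n_candidates, n_sims = 500, 20, 1000
hits = 0

for _ in range(n_sims):
    y = rng.normal(size=n)                   # outcome with no causal structure at all
    p_values = []
    for _ in range(n_candidates):
        z = rng.normal(size=n)               # a junk candidate instrument
        r, p = stats.pearsonr(z, y)          # reduced-form "significance" of this candidate
        p_values.append(p)
    if min(p_values) < 0.05:
        hits += 1

print(f"share of runs where the best instrument looks significant: {hits / n_sims:.2f}")
# roughly 1 - 0.95**20 ≈ 0.64, even though every instrument is pure noise
```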
