If the outbreak ended, does that mean the interventions worked? (Jon Zelner talk tomorrow)

Jon Zelner speaks tomorrow (Thurs) at 1pm:


In this talk Dr. Zelner will discuss some ongoing modeling work focused on understanding when we can and cannot infer that interventions meant to stop or slow infectious disease transmission have actually worked, and when observed outcomes cannot be distinguished from selection bias.

Dude’s an epidemiologist, though, so better check his GRE scores before going on. Also, test his markets to check that they’re sufficiently thick and liquid (yuck)!


  1. An Epidemiologist says:

    What’s the deal with Epidemiologists and GREs? I’m telling Sander Greenland about this!

  2. Renzo Alves says:

    Why does his GRE score matter? Wouldn’t his CV be a better guide?

  3. RW says:

    Why do people beat up on Cowen? He started a conversation and even publishes responses that go against some of his assumptions. I don’t see the big deal, although it is funny :)

    • Andrew says:


      Come back to me with your GRE score and some measure of the thickness and liquidity of your markets, and then we can talk. Otherwise, I’m concerned that you do not sufficiently grasp that long-run elasticities of adjustment are more powerful than short-run elasticites.

      • Justin says:

        I noticed the Columbia stats department requires applicants to take the GRE. Why?

        • Andrew says:


          We require applicants to take the GRE because it provides valuable information about some very useful skills. I’m a big fan of standardized testing. Test scores aren’t the only useful piece of information, but they’re useful and relatively inexpensive.

          • Joshua says:

            Andrew –

            I’m a bit surprised to read that comment.

            What do you think the predictive validity of the GRE is? For example, how well do you think the scores of applicants predict grades in graduate school, or graduation rates, or career success, or the ability of students to reason their way through conditional probability?

            How reliable do you think it is? If someone scores a 500 on the first test, what do you think the likely range would be the 2nd time they take it?

            My own bias is against standardized testing, except to the extent that I much prefer criterion-referenced testing to norm-based testing. A student’s GRE score tells you almost nothing as a teacher about that student’s strengths and weaknesses, gaps in knowledge (or expertise), intrinsic focus on their learning as opposed to focus only on grades, etc.

            • Joshua says:

              An anecdotal story. I once had an experience at a high-falutin university that colored my view of GREs. Their graduate education program was highly ranked, but they wanted to increase the rank further. So they weighed different options, and ultimately decided to raise the minimum GRE score for incoming students.

              It struck me that, at an institution focused on quality of education, they made the program better evaluated without really changing anything about the educational program itself. They didn’t really do much to change the students’ educational experience (sure, you could argue that raising GRE minimums enhanced the quality of the student body – but even at best that was only along one, rather narrow dimension).

            • Andrew says:


              I’m not an expert on this, but my impression was that the GRE does measure useful skills. Regarding reliability, I googled *gre test retest reliability* and found this from ETS which claimed high reliability. That said, I guess that cheating will only be increasing.

              I agree with you that there’s a lot that the GRE doesn’t tell you. The GRE is only one piece of information used in our admissions process, and we do sometimes admit strong candidates with relatively low GRE scores.

              • Joshua says:

                Andrew –


                Yah, I guess it depends on what you consider high reliability. Obviously, ETS has a dog in the fight. I think I’ve read something like .90…but I’ve also read that something like a 500 on the first test is likely to result in a range of 400-600 on a second test. So I’m not saying it’s totally without value, but that its value should be considered in that light. My sense is that its real value is just as a marker for evaluating incoming students along one dimension – and mostly in the sense that someone who scores at the high end of the spectrum would likely have better educational outcomes than someone who scores at the low end.

                I think it’s probably not unlike what they’re finding about funding applications – where you can probably evaluate the really good ones and the really bad ones pretty reliably (in terms of people rating them the same in multiple iterations) but not so much for the much larger band in the middle.
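Joshua’s numbers above can be sanity-checked against the classical-test-theory formula SEM = SD·√(1 − reliability). A back-of-the-envelope sketch, where the old 200-800 scale with a per-section SD of 100 and a reliability of .90 are assumptions for illustration, not ETS-published figures:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1 - reliability)

def retest_interval(score, sd, reliability, z=1.96):
    """Approximate 95% range for a retest score (ignores regression to the mean)."""
    e = sem(sd, reliability)
    return (score - z * e, score + z * e)

# Assumed old-scale parameters: per-section SD = 100, reliability = 0.90
lo, hi = retest_interval(500, sd=100, reliability=0.90)
print(f"SEM = {sem(100, 0.90):.1f}, approximate 95% retest range = ({lo:.0f}, {hi:.0f})")
```

On these assumptions the band is roughly 440-560; a 400-600 band like the one Joshua recalls would correspond to a lower reliability, around 0.74.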

              • Joshua says:

                Andrew –

                > The GRE is only one piece of information

                I don’t disagree with that. My point is that I think that often, its value is over-rated by the people who use it. It also, for me, kind of fits into the category of metrics that people like because they put a number on something – without actually evaluating very carefully whether it’s a number that really measures what it is that you want to measure.

              • Joshua says:

                Andrew –

                > That said, I guess that cheating will only be increasing….

                Another, related anecdote. At that same university, they became concerned about the high level of cheating. So they had a lot of meetings, and decided that they needed to enforce stricter methods for preventing cheating. Not once, in any of the discussions, did I hear anyone suggest that maybe the problem was that their educational program resulted in students who thought that there was value in cheating.

                As for the GRE aspect – the whole real-world versus lab-based assessment of cheating is, obviously, an issue for validity, and one that is probably not incorporated into how ETS evaluates validity. In a lab paradigm, you’d never catch the impact of cheating that exists in the real world. As you probably know, there are entire industries in China built around helping students cheat – and of course, the coaching aspect here in the States would necessarily affect validity, in the sense that students who get tutoring on doing well on the test might show a signal in their performance that would not really show up in their real-world performance on related tasks.

              • Curious says:

                How many GRE tests do you believe have been administered over the past 20 years?

                Why do you think ETS reports reliability metrics on a sample of < 2000?

                Does that raise a red flag for anyone else?

              • Curious says:

                And please tell me this claim is a joke:

                “To use the SEM of score differences, multiply the value by 2. Score differences exceeding this value are likely to reflect real differences in ability at approximately a 95 percent confidence level.”

                There is no way they have any evidence to support that claim other than wishful thinking.
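If “the value” in the quoted passage means the SEM of score differences (which is √2 times the single-score SEM), then doubling it is just a z ≈ 2 normal-theory band; whether score gaps beyond that band “reflect real differences in ability” is the substantive question Curious is raising. A minimal sketch of the arithmetic, with SD = 100 and reliability = 0.90 as placeholder assumptions rather than ETS figures:

```python
import math

sd, reliability = 100, 0.90                  # placeholder assumptions, not ETS figures
sem = sd * math.sqrt(1 - reliability)        # standard error of measurement
sem_diff = sem * math.sqrt(2)                # SE of a difference between two scores
threshold = 2 * sem_diff                     # the "multiply the value by 2" rule

print(f"SEM = {sem:.1f}, SEM of differences = {sem_diff:.1f}, threshold = {threshold:.0f}")
# On these assumptions, only score gaps of roughly 90 points or more clear the bar.
```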

              • Andrew says:


                Controlled studies are typically done on small populations. You might as well ask why a clinical trial of a heart procedure includes only 1000 patients, given that there are millions of heart patients in this country.

                I’m not trying to defend everything that ETS did and said in this study, and, yes, they have an interest in saying positive things about their test, but I think it’s standard practice to test reliability in controlled studies. The goal in this sort of study is to isolate reliability of measurement, separated from other issues such as changes in ability from one test-taking period to another.

              • Curious says:

                I will readily acknowledge that scoring highly on these tests is a clear demonstration of previous academic achievement – that’s what the test is intended to measure (not intelligence, nor IQ), in a standardized way that grades themselves are not. Still, I think interpreting them as more than a crude metric will more likely result in biased decisions against groups we would rather not see experience more bias than they already do.

                That said, these data do not come from a tightly controlled experiment. It is a simple calculation of the Kuder-Richardson reliability formula (I would pay money to see a real test-retest reliability across administrations, rather than split-half, that was even close to those numbers), which could be done on any and all of their data. It could be done on every single administration they’ve conducted, by location, to show variation across those, if they were willing. It could be standardized reporting that is posted on their website and shared with the public, and in my opinion it should be, given their footprint in higher education. But it’s not. It is as difficult to get real data out of ETS as it is from pharmaceutical RCTs. They could publicly post predictive validity for grades and attrition, year by year, simply by creating a required feedback loop.

                And I will repeat that there is no way they have strong evidence to support their claim of clear differences in ability based on measurement-error intervals as small as the ones they’ve posted, given the crudeness of the metric.
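For reference, the Kuder-Richardson formula (KR-20) that Curious mentions is an internal-consistency coefficient computed from a single administration’s item-level right/wrong data – which is exactly why it is not a substitute for true test-retest reliability across administrations. A minimal sketch on made-up 0/1 response data (the function and data are illustrative, not ETS’s actual computation):

```python
import numpy as np

def kr20(items):
    """KR-20 internal consistency for a (respondents x items) 0/1 matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                       # number of items
    p = items.mean(axis=0)                   # proportion correct per item
    item_var = (p * (1 - p)).sum()           # sum of item variances
    total_var = items.sum(axis=1).var()      # variance of total scores (population)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Made-up responses: 6 test-takers, 5 items, roughly Guttman-ordered
responses = [[1, 1, 1, 1, 0],
             [1, 1, 1, 0, 0],
             [1, 1, 0, 0, 0],
             [1, 0, 0, 0, 0],
             [1, 1, 1, 1, 1],
             [0, 0, 0, 0, 0]]
print(f"KR-20 = {kr20(responses):.2f}")  # -> 0.83 for this toy data
```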

            • Martha (Smith) says:

              Joshua said,
              “What do you think the predictive validity of the GRE is? For example, how well do you think the scores of applicants predict grades in graduate school, or graduation rates, or career success, or the ability of students to reason their way through conditional probability?”

              I would guess that the answer depends on the graduate school and the field. ;~)

              • Martha (Smith) says:

                Some background:
                Many years ago, I was (for 4 years) on the committee that produced a list of the top n (I think about 10) applicants for NSF graduate fellowships in math, based on quality of the application (GRE scores, letters of recommendation, anything else relevant –e.g., winning the Putnam exam). The committee only ranked the applicants — the choice of who would be offered the fellowships was made by NSF, taking into account not just the rankings the committee gave, but also things like requirements that the fellowships be geographically balanced. My recollection is that virtually all of the people on the ranked “top candidates” list had GRE Advanced Math scores above 900.

              • Curious says:

                Any possibility the GRE scores were available to all who weighed in?

              • Martha (Smith) says:

                Curious: I’m not clear on what you’re asking — in particular, I’m not sure what you mean by “all who weighed in”.

          • Pretty sure there is a typo – Andrew meant “test scores AREN’T the only useful…”

          • jim says:

            “I’m a big fan of standardized testing. Test scores aren’t the only useful piece of information, but they’re useful and relatively inexpensive.”

            Wow! A fan of standardized testing!!!! I guess at least you have Salman Rushdie to talk to. So that’s why you have a blog, you’re in the witness protection program!

    • Chris Wilson says:

      Here are some more boomerangs from Cowen’s list:

      d. What is their overall track record on predictions, whether before or during this crisis?

      [CW: pot meet kettle]

      e. On average, what is the political orientation of epidemiologists? And compared to other academics? Which social welfare function do they use when they make non-trivial recommendations?

      [CW: what social welfare functions do pro big-business economists use?]

      g. How well do they understand how to model uncertainty of forecasts, relative to say what a top econometrician would know?

      [CW: mmmhmmm]

      i. How many of them have studied Philip Tetlock’s work on forecasting?

      [CW: pass]

      I want this list to be deeply ironic, but I fear it is not. It is cringe-worthy!

      • Carlos Ungil says:

        [CW: pot meet kettle]

        To be fair, his post is quite critical of forecasts from economists as well. And his conclusion before going into those “rude questions” is quite reasonable: “On this list, I think my #1 comes closest to being an actual criticism, the other points are more like observations about doing science in a messy, imperfect world. In any case, when epidemiological models are brandished, keep these limitations in mind. But the more important point may be for when critics of epidemiological models raise the limitations of those models. Very often the cited criticisms are chosen selectively, to support some particular agenda, when in fact the biases in the epidemiological models could run in either an optimistic or pessimistic direction.”

        For what it’s worth, I also find it mildly disturbing that our host digs into the CVs of the authors of recent covid papers to see if they have at least a master’s in epidemiology or if they are just public health graduates or, God forbid, economists. Now, if they were psychologists I would share his concern :-)

        • Andrew says:


          I think it’s completely reasonable when reading a scientific paper to look at the backgrounds of the authors. In the case of that Stanford paper, it’s not like I went, “Hey, the author list includes 0 statisticians and only 2 epidemiologists; therefore I don’t like it!” Rather, I went, “Hey, this paper has serious problems. Even though the authors are from Stanford! What happened? OK, let’s look at the author list to have some sense of what went on.” I don’t see why you consider it disturbing that I did this.

          • Carlos Ungil says:

            I was being dramatic, but your first appraisal was “a bunch of doctors and med students,” which didn’t seem a good reflection of reality either. The first author (“a professor of medicine—I think that means he’s a doctor, not an epidemiologist. His graduate degrees are an MD and a masters in health services.”) doesn’t have an epidemiology degree but otherwise is not far from being a “professional epidemiologist”.

            Actually I was mostly thinking of your remark about the IHME, which even in the amended form reads “In addition to the doctors, economists, and others who staff the aforementioned institute, this university also has a biostatistics department” when, as was pointed in a comment, there is also a bunch of epidemiologists there (with epidemiology PhDs and all).

            It’s reasonable to look at the background of the authors, but it’s also good to remember that a background is not just the degree someone got. There was also a discussion at some point about whether Fauci had the required credentials to talk about research, not having a PhD (but I think the general agreement was that he obviously did, and that holding a PhD in social science wasn’t necessarily better).

            • Andrew says:


              My quick assessments of the authors’ backgrounds may have been incomplete. But, again, I was not using their backgrounds to judge the work. I was judging the work based on the work, and then using their backgrounds to try to understand what went wrong.

              • Carlos Ungil says:

                Fair enough. Cowen is also trying to understand what’s wrong with epidemiologists :-)

    • Dale Lehman says:

      I’ll take your question seriously. Cowen is very good at what he does – he (sometimes) raises interesting questions, cites interesting work, and generates discussion of and attention to many issues. What I object to is how he does it. There is little in the way of editing what he cites, unlike Andrew and this blog. So much is written (especially in these COVID times), and so much of it is garbage, that just citing anything someone writes and provoking “discussion” for its own sake strikes me as adding noise to an already overly noisy environment. Mention anything about race, age, online education, the future of work, politics, etc., and you will get inundated with comments, most of which are meaningless, and often nasty/harmful. If you measure success by the number of comments, then this is surely successful. If you measure it by quality of contribution to understanding, then I have serious questions. I used to read Marginal Revolution regularly because I often found references to things I would not easily find on my own. But I increasingly ignore it, because the ratio of noise to signal has gotten too high.

      This is related to the reason I had asked Andrew to retract his post referring to John Fund (and note that the post got more comments than any I’ve ever seen on this blog). It wasn’t that Ferguson’s work wasn’t worth serious discussion. Many of the points raised were quite thoughtful and interesting. But I hate the idea that such serious discussion was in any way tied (even superficially) to Fund’s hatchet job. Similarly, we could have an interesting and serious discussion about GRE scores, but I’d hate for that to be associated with the inane post by Cowen.

      • Jonathan (another one) says:

        I’m here to defend Cowen without defending the comments, which have unfortunately (over the last three years, not just the COVID era) descended into the unreadable. I *still* think he links to stuff I wouldn’t have otherwise found. Is a lot of it crap? Sure! But it really takes very little time to discover it’s crap compared to the time I’d have had to spend finding it in the first place.

        Remaining is the issue of whether Cowen’s own thoughts about the things he chooses to link to are of declining value. I think not, but only if you think of Cowen as some sort of Nabokovian unreliable narrator, in which his opinions need to be filtered through your own brain — and that’s good! He still occasionally provides me with iconoclastic readings of the common wisdom, which is worthwhile (at least to me) whether you buy the particular take or not.

        All that said, the comments are unreadable, and I stopped reading them, with very rare exceptions, about a year ago. And the very rare exceptions have told me that the situation is getting worse.

        • Andrew says:


          Yeah, I like Cowen and Tabarrok’s blog. It’s in the blogroll! I don’t agree with everything they write, but that’s ok. They probably don’t agree with everything I write, either. The commenters on that site . . . yeah, they’re pretty bad. I guess they have some value in that they sometimes point out things that Cowen and Tabarrok have missed in their posts, but lots of the commenters are just insults, you get some racism, all sorts of things. It’s sad to see.

  4. Jonathan (another one) says:

    So I watched it. The lesson was that in the presence of heterogeneity in transmissibility, and with a focus on the most severe outbreaks, one risks confounding the effect of interventions with regression to the mean. That much I already knew, which is why I was interested… What I was hoping to see was a way to disentangle the two.
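The confound described in the talk is easy to reproduce in a toy simulation: give locations heterogeneous transmissibility, condition on the most severe early outbreaks, and later counts at those same locations fall with no intervention at all. A sketch under made-up assumptions (gamma-distributed rates, Poisson counts), not Zelner’s actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Heterogeneous transmissibility: each location has its own underlying rate,
# and early/late case counts are noisy Poisson draws around that same rate.
rates = rng.gamma(shape=2.0, scale=10.0, size=n)
early = rng.poisson(rates)
late = rng.poisson(rates)          # no intervention: same rate in both periods

# Condition on the most severe early outbreaks (top 5%)
severe = early >= np.quantile(early, 0.95)
print(f"severe locations, early mean count: {early[severe].mean():.1f}")
print(f"same locations, later mean count:   {late[severe].mean():.1f}")
```

The drop from the first number to the second is pure selection plus noise – regression to the mean – which an analyst could mistake for an intervention effect; the talk’s question is how to distinguish the two.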
