Wanted: Statistical success stories

Bill Harris writes:

Sometime when you get a free moment, it might be great to publish a post that links to good, current exemplars of analyses. There’s a current discussion about RCTs on a program evaluation mailing list I monitor. I posted links to your power=0.06 post and your Type S and Type M post, but some still seem to think RCTs are the foundation. I can say “Read one of your books” or “Read this or that book,” or I could say “Peruse your blog for the last, oh, eight-ten years,” but either one requires a bunch of dedication. I could say “Read some Stan examples,” but those seem mostly focused on teaching Stan. Some published examples use priors you no longer recommend, as I recall. I think I’ve noticed a few models with writeups on your blog that really did begin to show how one can put together a useful analysis without getting into NHST and RCTs, but I don’t recall where they are.

Relatedly, Ramiro Barrantes-Reynolds writes:

I would be very interested in seeing more in your blog about research that does a good job in the areas that are most troublesome for you: measurement, noise, forking paths, etc., or that addresses those aspects so as to make better inferences. I think after reading your blog I know what to look for to see when some investigator (or myself) is chasing noise (i.e., I have a sense of what NOT to do), but I am missing good examples to follow in order to do better research. I would consider myself a beginning statistician, so examples of research that is well done and addresses the issues of forking paths, measurement, etc., would help. I think blog posts and the discussion that arises would be beneficial to the community.

So, two related questions. The first one’s about causal inference beyond simple analyses of randomized trials; the second is about examples of good measurement and inference in the context of forking paths.

My quick answer is that, yes, we do have examples in our books, and it doesn’t involve that much dedication to order them and take a look at the examples. I also have a bunch of examples here and here.

More specifically:

Causal inference without a randomized trial: Millennium villages, incumbency advantage (and again)

Measurement: penumbras, assays

Forking paths: Millennium villages, polarization

I guess the other suggestion is that we post on high-quality new work so we can all discuss not just what makes bad work bad, but also what makes good work good. That makes sense. To get this started, you can point me to some good stuff to post on.

20 Comments

  1. Marcus Crede says:

    No paper is perfect but I really like the recent Psych. Science paper by Orben and Przybylski.
    https://journals.sagepub.com/doi/pdf/10.1177/0956797619830329

    Open data, preregistered, replication across three data sets, very good attempt at careful measurement, and demonstration of forking path issues via specification curves.

  2. jd says:

    I’m a bit too new to contribute examples, but I really like the idea of seeing high-quality new work highlighted to see what makes the “good work good”. I agree with the people who wrote in that examples with poor quality or errors are great to learn from, but it would be helpful/encouraging to see good work as well.

    Slightly related to this – it doesn’t take long to perceive that there is a lot(!) of disagreement regarding data analysis (no shocker, but a bit surprising if you are new to the field). I get the idea that any analysis can be criticized from a variety of parties and directions, and there is not always an agreed-upon best way to do something. Because of this, it does seem more difficult to pin down “good work” than “bad work,” as perhaps even the “good work” can be found lacking on some point. Perhaps that’s not entirely true, but at least that’s my perception.

    So, “good work” as defined/pointed-out by you and the readers of this blog would be interesting.

    The examples like the Millennium Villages project are helpful. (Obviously examples like the vignettes in ‘rstanarm’ and ‘brms’ are helpful too, but that’s not quite the same thing.)

  3. MikeM says:

    Here’s a paper I wrote some 20 years ago, showing the benefits of visualizing data instead of blindly running it through a statistical package. https://www.academia.edu/589936/Visualizing_homicide

  4. Dale Lehman says:

    I have not read the paper (and don’t intend to). I also have not tried to obtain the data – presumably available, though you need to register to get it (I’m not sure if that is automatic or not). But I am puzzled by the data description – the Ireland data is listed as N=4,573 after data exclusions. It goes on to say that the Ireland data included 2,514 boys and 2,509 girls, and that “After data exclusions, 4,573 adolescents were included in the study.” I can’t seem to locate any description of what these “data exclusions” were. I have not read the supplementary material, nor have I looked at the code – so, perhaps you can save me the trouble and tell me who was excluded and why.

    • Dale Lehman says:

      This was a response to Marcus above.

    • Anonymous says:

      Dale,
      I don’t know what the data exclusions were. I just offered up this paper as an example of a paper that does many of the things that are discussed on Andrew’s blog “correctly” (e.g., replication, large samples, an attempt at good measurement, preregistration of analyses, multiverse analysis and specification curves, etc.).

  5. jrc says:

    Figure IV in this paper (The Digital Provide) is one of my absolute favorites of all time – no tests, no p-values, clearly interpretable data, and a clear result replicated across space and time within the same natural experiment.

    http://creative.colorado.edu/~muna0394/CaseStudies/Readings/Week%206/watermark.pdf

    One thing I like about this is that it also breaks almost all of the rules (except for careful measurement): it wasn’t pre-registered, it was almost certainly something conceived after the data was collected, there are a million ways to code/organize/represent the data…and none of that matters in terms of how convincing it is.

    I think of this as an example where “You don’t need much statistics when you have good measurement and good variation and a good idea what you are looking for.” Of course, the undisputed heavyweight champion of elegant and convincing statistics is Snow’s Table IX on cholera prevalence by water supply company (reprinted as Table 1 in Freedman’s “Statistical Models and Shoe Leather”) – you don’t need a p-value when you have hundreds of thousands of observations, careful measurement, and a real, big effect.

    http://psychology.okstate.edu/faculty/jgrice/psyc5314/Freedman_1991A.pdf

    • Anoneuoid says:

      One thing I like about this is that it also breaks almost all of the rules (except for careful measurement): it wasn’t pre-registered, it was almost certainly something conceived after the data was collected, there are a million ways to code/organize/represent the data…and none of that matters in terms of how convincing it is.

      What are you being convinced of?

      We assign price based on time of sale rather than time of exchange, i.e., prices for sales via beach auction are assigned to the time of auction, whereas sales via mobile phone are assigned to the time when the sale was arranged, not when the fish were delivered

      So the figure shows us that delaying delivery for some time after the sale stabilizes prices. The delay reduces “FOMO”, so buyers and sellers are making more rational assessments of the value.

      • I think we are being convinced that prices converge rapidly to a low volatility consensus price when communications devices are widespread.

        • Anoneuoid says:

          That seems to be their favorite explanation.

          Another would be one guy started arbitraging between the different markets as soon as he could get data good/cheap enough to make it profitable. In that case you don’t need the widespread part.

          • jrc says:

            Could be, but I don’t think one guy (per region) was doing it alone. Figure III shows that all the boats were getting cell phones, and Table II shows that fishermen started selling at non-local markets only after the cell towers turned on. So if someone was doing the arbitrage mostly on his own, it wasn’t just him buying up the fish in one place and transporting them elsewhere – he was directing the fishing boats themselves where to go (presumably over the phone). But either way, it is hard to see how the new ability to communicate wasn’t, in some sense, causing the price dispersion to drop to almost zero, regardless of whether you think that was driven by one smart guy or driven by each boat independently (or some combination of the two).

            I don’t claim you have to believe their underlying search model – I personally think it is helpful in thinking about the problem but don’t think it has to map directly to describing the motivations of individual actors. But this result was the first one that allowed me to really see the extent to which information is necessary for well-functioning markets… and for how long inefficient markets will persist when information isn’t easily and cheaply disseminated. You don’t even have to agree with that (or you can think it is so obvious that development economists should never have lost sight of that basic insight/assumption of economic theory) to think the graph does a pretty convincing job of showing that the appearance of cell phones led to huge decreases in price dispersion (and, from the other tables, waste).

            • Indeed, they show that it takes around 20-30 weeks to reach 100% adoption, but volatility drops within the first 10 weeks or so; probably 20% adoption is enough, since it means every boat with 5 people on it could have a phone. Information is critical to good markets, and info asymmetry is where all the action is in the US: education, medical, and tech all work by “monetizing” commodity products into confusing packages where decision making is intentionally hard for the consumer. Prosumer markets are much more about paying for specific resources rather than getting flummoxed by marketing BS.

            • Anoneuoid says:

              Figure III shows that all the boats were getting cell phones, and Table II shows that fishermen started selling at non-local markets only after the cell towers turned on

              […]

              the graph does a pretty convincing job of showing that the appearance of cell phones led to huge decreases in price dispersion

              So then the game becomes figuring out what else corresponded with the construction of a cell tower that could explain this.

              How about:

              1) construction of cell towers either leads to or is associated with an influx of cash to a region
              2) people pay more for fish since cash supply increases
              3) fishermen accumulate savings that allows them to risk taking days off when demand is low, buy luxury items like cell phones, and stop ripping off desperate customers

              • jrc says:

                Sure…all that stuff just had to happen 3 times corresponding almost exactly with the turning-on of 3 different zones of cell phone towers.

                My point isn’t that this research is “perfect” – no research is perfect. But I do think it is really good and convincing non-experimental empirical work that does not rely on p-values or one source of variation and does an incredible job of visually presenting the data in a useful and digestible way. And I’m not sure you can ever really ask much more from empirical social science research…but maybe you think I should be expecting more?

            • Anoneuoid says:

              Sure…all that stuff just had to happen 3 times corresponding almost exactly with the turning-on of 3 different zones of cell phone towers.

              I’d expect the influx of money to have generally similar effects, just as you would expect easier communication to have generally similar effects.

              But my point is only not to focus on a single favorite explanation, especially not one devised after the fact. Bayes’ rule tells us we can only judge one possible explanation relative to the others. I don’t get the sense from that paper that much thought was given to other explanations.
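That “relative to the others” point is easy to make concrete with a toy calculation. Every prior and likelihood below is invented purely for illustration (these are not numbers from the paper):

```python
# Two candidate explanations for the same drop in price dispersion.
# All numbers are hypothetical, chosen only to show the mechanics:
# Bayes' rule scores an explanation against the alternatives considered.
priors = {"widespread phones": 0.5, "cash influx": 0.5}         # P(H)
likelihoods = {"widespread phones": 0.08, "cash influx": 0.02}  # P(data | H)

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posterior = {h: unnormalized[h] / total for h in unnormalized}

# "widespread phones" ends up at 0.8 not because 0.08 is large in
# absolute terms, but because the only alternative on the table fits
# the data worse. Add a third explanation that fits well, and the 0.8
# would drop.
```

The sketch only shows the mechanics: the posterior for the favorite explanation is only as high as the pool of alternatives is weak, and an explanation nobody wrote down is never penalized or rewarded.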

    • JRC

      Your first link points to how a very good description can be so valuable. Serge Lang mentioned in his book Challenges that a very good fiction writer, and perhaps an artist, can capture elements that mathematicians, statisticians, and social scientists cannot. Of course, it is assumed that the writer or artist has a scientific bent. Lang was a mathematician, as you all know. Because of his activism, especially on HIV/AIDS, Lang would consult writers who had few or no conflicts of interest. He thought that some individuals have better diagnostic skills, regardless of their credentialing.

      Raymond Hubbard’s defense of abduction would support this conjecture above, judging from reviews of his book Corrupt Research.

      The trick is to capture some of it visually and by evaluating the data through multiple approaches.

  6. Thanatos Savehn says:

    Stray note: Looking at these thoughtful responses reminds me to be grateful for the islands, such as this, of well-curated and cultivated blogs that I happen across way out here upon the sundering seas of inquiry and occasional discovery.

  7. Jonathan (another one) says:

    How about this p-value-less, calibrated-to-relative-importance-of-Type-I-and-Type-II-error article that implements a method that could have gotten Harold Shipman arrested and saved 130 women’s lives? https://academic.oup.com/intqhc/article/15/1/7/1797060

  8. Ramiro says:

    Thank you very much for this posting. I will look at the examples you mentioned and encourage Andrew and anybody else to post and discuss high quality work in order to learn about “what makes good work good”!
