
On deck through the rest of the year (and a few to begin 2018)

Here they are. I love seeing all the titles lined up in one place; it’s like a big beautiful poem about statistics:

  • After Peptidegate, a proposed new slogan for PPNAS. And, as a bonus, a fun little graphics project.
  • “Developers Who Use Spaces Make More Money Than Those Who Use Tabs”
  • Question about the secret weapon
  • Incentives Matter (Congress and Wall Street edition)
  • Analyze all your comparisons. That’s better than looking at the max difference and trying to do a multiple comparisons correction.
  • Problems with the jargon “statistically significant” and “clinically significant”
  • Capitalist science: The solution to the replication crisis?
  • Bayesian, but not Bayesian enough
  • Let’s stop talking about published research findings being true or false
  • Plan 9 from PPNAS
  • No, I’m not blocking you or deleting your comments!
  • “Furthermore, there are forms of research that have reached such a degree of complexity in their experimental methodology that replicative repetition can be difficult.”
  • “The Null Hypothesis Screening Fallacy”?
  • What is a pull request?
  • Turks need money after expensive weddings
  • Statisticians and economists agree: We should learn from data by “generating and revising models, hypotheses, and data analyzed in response to surprising findings.”
  • My unpublished papers
  • Bigshot psychologist, unhappy when his famous finding doesn’t replicate, won’t consider that he might have been wrong; instead he scrambles furiously to preserve his theories
  • Night Hawk
  • Why they aren’t behavioral economists: Three sociologists give their take on “mental accounting”
  • Further criticism of social scientists and journalists jumping to conclusions based on mortality trends
  • Daryl Bem and Arthur Conan Doyle
  • Classical statisticians as Unitarians
  • Slaying Song
  • What is “overfitting,” exactly?
  • Graphs as comparisons: A case study
  • Should we continue not to trust the Turk? Another reminder of the importance of measurement
  • “The ‘Will & Grace’ Conjecture That Won’t Die” and other stories from the blogroll
  • His concern is that the authors don’t control for the position of games within a season.
  • How does a Nobel-prize-winning economist become a victim of bog-standard selection bias?
  • “Bayes factor”: where the term came from, and some references to why I generally hate it
  • A stunned Dyson
  • Applying human factors research to statistical graphics
  • Recently in the sister blog
  • Adding a predictor can increase the residual variance!
  • Died in the Wool
  • “Statistics textbooks (including mine) are part of the problem, I think, in that we just set out ‘theta’ as a parameter to be estimated, without much reflection on the meaning of ‘theta’ in the real world.”
  • An improved ending for The Martian
  • Delegate at Large
  • Iceland education gene trend kangaroo
  • Reproducing biological research is harder than you’d think
  • The fractal zealots
  • Giving feedback indirectly by invoking a hypothetical reviewer
  • It’s hard to know what to say about an observational comparison that doesn’t control for key differences between treatment and control groups, chili pepper edition
  • PPNAS again: If it hadn’t been for the jet lag, would Junior have banged out 756 HRs in his career?
  • Look. At. The. Data. (Hollywood action movies example)
  • “This finding did not reach statistical significance, but it indicates a 94.6% probability that statins were responsible for the symptoms.”
  • Wolfram on Golomb
  • Irwin Shaw, John Updike, and Donald Trump
  • What explains my lack of openness toward this research claim? Maybe my cortex is just too damn thick and wrinkled
  • I love when I get these emails!
  • Consider seniority of authors when criticizing published work?
  • Does declawing cause harm?
  • Bird fight! (Kroodsma vs. Podos)
  • The Westlake Review
  • “Social Media and Fake News in the 2016 Election”
  • Also holding back progress are those who make mistakes and then label correct arguments as “nonsensical.”
  • Just google “Despite limited statistical power”
  • It is somewhat paradoxical that good stories tend to be anomalous, given that when it comes to statistical data, we generally want what is typical, not what is surprising. Our resolution of this paradox is . . .
  • “Babbage was out to show that not only was the system closed, with a small group controlling access to the purse strings and the same individuals being selected over and again for the few scientific honours or paid positions that existed, but also that one of the chief beneficiaries . . . was undeserving.”
  • Irish immigrants in the Civil War
  • Mixture models in Stan: you can use log_mix()
  • Don’t always give ’em what they want: Practicing scientists want certainty, but I don’t want to offer it to them!
  • Cumulative residual plots seem like they could be useful
  • Sucker MC’s keep falling for patterns in noise
  • Nice interface, poor content
  • “From that perspective, power pose lies outside science entirely, and to criticize power pose would be a sort of category error, like criticizing The Lord of the Rings on the grounds that there’s no such thing as an invisibility ring, or criticizing The Rotter’s Club on the grounds that Jonathan Coe was just making it all up.”
  • Chris Moore, Guy Molyneux, Etan Green, and David Daniels on Bayesian umpires
  • Using statistical prediction (also called “machine learning”) to potentially save lots of resources in criminal justice
  • “Mainstream medicine has its own share of unnecessary and unhelpful treatments”
  • What are best practices for observational studies?
  • The Groseclose endgame: Getting from here to there.
  • Causal identification + observational study + multilevel model
  • All cause and breast cancer specific mortality, by assignment to mammography or control
  • Iterative importance sampling
  • Rosenbaum (1999): Choice as an Alternative to Control in Observational Studies
  • Gigo update (“electoral integrity project”)
  • How to design and conduct a subgroup analysis?
  • Local data, centralized data analysis, and local decision making
  • Too much backscratching and happy talk: Junk science gets to share in the reputation of respected universities
  • Selection bias in the reporting of shaky research: An example
  • Self-study resources for Bayes and Stan?
  • Looking for the bottom line
  • “How conditioning on post-treatment variables can ruin your experiment and what to do about it”
  • Trial by combat, law school style
  • Causal inference using data from a non-representative sample
  • Type M errors studied in the wild
  • Type M errors in the wild—really the wild!
  • Where does the discussion go?
  • Maybe this paper is a parody, maybe it’s a semibluff
  • As if the 2010s never happened
  • Using black-box machine learning predictions as inputs to a Bayesian analysis
  • It’s not enough to be a good person and to be conscientious. You also need good measurement. Cargo-cult science done very conscientiously doesn’t become good science, it just falls apart from its own contradictions.
  • Air rage update
  • Getting the right uncertainties when fitting multilevel models
  • Chess records page
  • Weisburd’s paradox in criminology: it can be explained using type M errors
  • “Cheerleading with an agenda: how the press covers science”
  • Automated Inference on Criminality Using High-tech GIGO Analysis
  • Some ideas on using virtual reality for data visualization: I don’t really agree with the details here but it’s all worth discussing
  • Contribute to this pubpeer discussion!
  • For mortality rate junkies
  • The “fish MRI” of international relations studies.
  • “5 minutes? Really?”
  • 2 quick calls
  • Should we worry about rigged priors? A long discussion.
  • I’m not on twitter
  • I disagree with Tyler Cowen regarding a so-called lack of Bayesianism in religious belief
  • “Why bioRxiv can’t be the Central Service”
  • Sudden Money
  • The house is stronger than the foundations
  • Please contribute to this list of the top 10 do’s and don’ts for doing better science
  • Partial pooling with informative priors on the hierarchical variance parameters: The next frontier in multilevel modeling
  • Does racquetball save lives?
  • When do we want evidence-based change? Not “after peer review”
  • “I agree entirely that the way to go is to build some model of attitudes and how they’re affected by recent weather and to fit such a model to “thick” data—rather than to zip in and try to grab statistically significant stylized facts about people’s cognitive illusions in this area.”
  • “Bayesian evidence synthesis”
  • Freelance orphans: “33 comparisons, 4 are statistically significant: much more than the 1.65 that would be expected by chance alone, so what’s the problem??”
  • Beyond forking paths: using multilevel modeling to figure out what can be learned from this survey experiment
  • From perpetual motion machines to embodied cognition: The boundaries of pseudoscience are being pushed back into the trivial.
  • Why I think the top batting average will be higher than .311: Over-pooling of point predictions in Bayesian inference
  • “La critique est la vie de la science”: I kinda get annoyed when people set themselves up as the voice of reason but don’t ever get around to explaining what’s the unreasonable thing they dislike.
  • How to discuss your research findings without getting into “hypothesis testing”?
  • Does traffic congestion make men beat up their wives?
  • The Publicity Factory: How even serious research gets exaggerated by the process of scientific publication and reporting
  • I think it’s great to have your work criticized by strangers online.
  • In the open-source software world, bug reports are welcome. In the science publication world, bug reports are resisted, opposed, buried.
  • If you want to know about basketball, who ya gonna trust, a mountain of p-values . . . or that poseur Phil Jackson??
  • Quick Money
  • An alternative to the superplot
  • “Quality control” (rather than “hypothesis testing” or “inference” or “discovery”) as a better metaphor for the statistical processes of science
  • Whipsaw
  • Using Mister P to get population estimates from respondent driven sampling
  • “Americans Greatly Overestimate Percent Gay, Lesbian in U.S.”
  • Looking for data on speed and traffic accidents—and other examples of data that can be fit by nonlinear models
  • Pseudoscience and the left/right whiplash
  • The time reversal heuristic (priming and voting edition)
  • The Night Riders
  • Why you can’t simply estimate the hot hand using regression
  • Stan to improve rice yields
  • When people proudly take ridiculous positions
  • “A mixed economy is not an economic abomination or even a regrettably unavoidable political necessity but a natural absorbing state,” and other notes on “Whither Science?” by Danko Antolovic
  • Noisy, heterogeneous data scoured from diverse sources make his metanalyses stronger.
  • What should this student do? His bosses want him to p-hack and they don’t even know it!
  • Fitting multilevel models when predictors and group effects correlate
  • I hate that “Iron Law” thing
  • High five: “Now if it is from 2010, I think we can make all sorts of assumptions about the statistical methods without even looking.”
  • “What is a sandpit?”
  • No no no no no on “The oldest human lived to 122. Why no person will likely break her record.”
  • Tips when conveying your research to policymakers and the news media
  • Graphics software is not a tool that makes your graphs for you. Graphics software is a tool that allows you to make your graphs.
  • Spatial models for demographic trends?
  • A pivotal episode in the unfolding of the replication crisis
  • We start by talking reproducible research, then we drift to a discussion of voter turnout
  • Wine + Stan + Climate change = ?
  • Stan is a probabilistic programming language
  • Using output from a fitted machine learning algorithm as a predictor in a statistical model
  • Poisoning the well with a within-person design? What’s the risk?
  • “Dear Professor Gelman, I thought you would be interested in these awful graphs I found in the paper today.”
  • I know less about this topic than I do about Freud.
  • Driving a stake through that ages-ending-in-9 paper
  • What’s the point of a robustness check?
  • Oooh, I hate all talk of false positive, false negative, false discovery, etc.
  • Trouble Ahead
  • A new definition of the nerd?
  • Orphan drugs and forking paths: I’d prefer a multilevel model but to be honest I’ve never fit such a model for this sort of problem
  • Popular expert explains why communists can’t win chess championships!
  • The four missing books of Lawrence Otis Graham
  • “There was this prevalent, incestuous, backslapping research culture. The idea that their work should be criticized at all was anathema to them. Let alone that some punk should do it.”
  • Loss of confidence
  • “How to Assess Internet Cures Without Falling for Dangerous Pseudoscience”
  • Ed Jaynes outta control!
  • A reporter sent me a Jama paper and asked me what I thought . . .
  • Workflow, baby, workflow
  • Two steps forward, one step back
  • Yes, you can do statistical inference from nonrandom samples. Which is a good thing, considering that nonrandom samples are pretty much all we’ve got.
  • The Night Riders
  • The piranha problem in social psychology / behavioral economics: The “take a pill” model of science eats itself
  • Ready Money
  • Stranger than fiction
  • “The Billy Beane of murder”?
  • Red doc, blue doc, rich doc, rich doc
  • Working Class Postdoc
  • UNDER EMBARGO: the world’s most unexciting research finding
  • Setting up a prior distribution in an experimental analysis
  • Walk a Crooked Mile
  • It’s . . . spam-tastic!
  • The failure of null hypothesis significance testing when studying incremental changes, and what to do about it
  • A debate about robust standard errors: Perspective from an outsider
  • Stupid-ass statisticians don’t know what a goddam confidence interval is
  • Forking paths plus lack of theory = No reason to believe any of this.
  • (Spammed by Google Ventures): Turn your scatterplots into elegant apparel and accessories!
  • Your (Canadian) tax dollars at work

And a few to begin 2018:

  • The Ponzi threshold and the Armstrong principle
  • I’m with Errol: On flypaper, photography, science, and storytelling
  • Politically extreme yet vital to the nation
  • How does probabilistic computation differ in physics and statistics?
  • “Each computer run would last 1,000-2,000 hours, and, because we didn’t really trust a program that ran so long, we ran it twice, and it verified that the results matched. I’m not sure I ever was present when a run finished.”


We’ll also intersperse topical items as appropriate.

P.S. Some of the items on the above list have changed since the original posting on 20 Jun 2017. That’s cos I got mugged by the news media and I’ve changed or eliminated various posts that I thought might make people angry or might give ammunition to people who want me to look bad. Self-censorship, that is. You can view this as unfortunate in that I’m removing solid material out of fear of specious attacks, or you can view it as a positive development in that I’m responding to feedback and providing posts with more content and less bile. I think it’s a bit of both.


  1. Maurits says:

    This is very unfortunate, because there are quite a few that I want to read RIGHT NOW instead of waiting for them to be published. Are these in (future) chronological order?

  2. Simon Gates says:

    How on earth do you manage to write so much? And keep the quality so high? It’s a fantastic achievement – thank you!

  3. What fun! I look forward to the reading and anticipation.

  4. Anonymous says:

    Wine + Stan + Climate change = ?

    Drunken climate modelling?

  5. Ben Prytherch says:

    I’m impatient enough that I Googled “This finding did not reach statistical significance, but it indicates a 94.6% probability that statins were responsible for the symptoms.” Looks like the language was corrected, and all record of the previous wording wiped from existence:

    Now I feel like I’m doing the equivalent of peeking inside the wrapping of a present before Christmas.

  6. Hernan Bruno says:

    Is this how blogging is supposed to work?

  7. Angus says:

    Looking forward to “Stupid-ass statisticians don’t know what a goddam confidence interval is”

    I’ve suspected that sometimes people confuse the 95% confidence interval of a point estimate with 95% quantiles too.

    From Wikipedia-
    While results vary slightly across reputable studies, the consensus is that the mean human penis, when erect, is in the range 12.9–15 cm (5.1–5.9 in) in length with a 95% confidence interval of (10.7 cm, 19.1 cm) or, equivalently (4.23 in, 7.53 in) — that is, it is 95% certain that the true mean is at least 10.7 cm but not more than 19.1 cm.

    Strikes me as some big measurement (hur) error.
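To make the distinction in the comment above concrete, here is a minimal sketch in Python. It assumes simulated, made-up measurements (the mean of 14 and standard deviation of 2 are illustrative values, not taken from any study) and compares a 95% confidence interval for the mean with the central 95% range of individual values.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical data: 1,000 draws from a normal distribution.
# The mean (14.0) and SD (2.0) are illustrative assumptions only.
sample = [random.gauss(14.0, 2.0) for _ in range(1000)]

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

# 95% confidence interval for the MEAN (normal approximation):
# its width shrinks like 1 / sqrt(n).
se = sd / math.sqrt(n)
ci = (mean - 1.96 * se, mean + 1.96 * se)

# Central 95% range of INDIVIDUAL values: the 2.5% and 97.5% quantiles.
# quantiles(n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
cuts = statistics.quantiles(sample, n=40)
spread = (cuts[0], cuts[-1])

print("95% CI for the mean:     ", ci)
print("central 95% of the data: ", spread)
```

With a sample this large, the interval for the mean is far narrower than the range covering 95% of individuals, so a reported "confidence interval" that is wider than the spread of the study means is more plausibly a quantile range.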

  8. Jordan Anaya says:

    Well it appears I’m going to make it onto this blog for something other than pizzagate.

    One underappreciated fact about my recent pizzagate paper is that I rewrote some of the rpsychi functions to recalculate ANOVAs (I previously just called rpsychi with rpy2 but that is kind of slow and I might want to run these functions on a large scale at some point).

    So the “two_way” function here: might be the only Python based implementation, I don’t know. I also don’t know how many people would find it of use, as it is mainly used to make sure people are calculating ANOVAs correctly, which I guess people previously took for granted (of course the ANOVA could be calculated *correctly* but the reported sample sizes for that row of the table could be inaccurate or the means or SDs could be reported incorrectly).
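As a rough illustration of the kind of check described above, here is a minimal sketch of recomputing a one-way ANOVA F statistic from reported group means, standard deviations, and sample sizes. This is not the "two_way" function mentioned in the comment, nor the rpsychi code; the function name `one_way_f` and the example numbers are hypothetical, and the one-way case is used only because it is the simplest instance of the idea.

```python
def one_way_f(means, sds, ns):
    """Recompute a one-way ANOVA F statistic from summary statistics.

    means, sds, ns: per-group means, sample standard deviations, and sizes.
    Returns (F, df_between, df_within).
    """
    k = len(means)
    total_n = sum(ns)

    # Grand mean is the sample-size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n

    # Between-group sum of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))

    # Within-group sum of squares, recovered from the reported SDs.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical reported values for three groups of three observations each.
f_stat, df1, df2 = one_way_f(means=[2.0, 3.0, 6.0], sds=[1.0, 1.0, 1.0], ns=[3, 3, 3])
print(f_stat, df1, df2)
```

If the F recomputed this way disagrees with the published F by more than rounding can explain, then the reported means, SDs, or sample sizes in that row deserve a second look.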

    • Andrew says:


      I’m a big fan of the Anova concept but not of the particular calculations of classical Anova; see this 2005 paper for my perspective. What I really need to do is perform an Anova-like analysis for a real example in Stan; this could be a template for others to follow when doing their applied analysis.

      (I understand the value of what you’re doing for forensic purposes; moving forward, though, I think we can do much better than classical Anovas for analyzing structured data.)

    • Andrew says:


      I followed the link. It may be good advice being offered there, but I can’t imagine non-scientists being able to answer questions such as, “8. Do results answer the SPECIFIC QUESTION(S)?”, considering that even trained scientists have difficulty with this one. Recall that National Academy of Sciences member Susan Fiske got bamboozled by the air rage, himmicanes, and ages-ending-in-9 papers, and some number of JPSP reviewers fell for Daryl Bem’s terrible, terrible ESP paper. My point is not to pick on those particular reviewers, but rather to point out the challenge of asking non-scientist laypeople to do an evaluation that real-world scientists can’t do, even when reading papers in their fields of study.
