How statistics is used to crush (scientific) dissent.

Posted on June 11, 2019 12:03 PM by Andrew

Lakeland writes:

When we interpret powerful as political power, I think it’s clear that Classical Statistics has the most political power, that is, the power to get people to believe things and to change policy or alter funding decisions. Today Bayes is questioned at every turn and ridiculed for being “subjective,” with a focus on the prior or on modeling “belief.” The people currently in power to make decisions about resources are predominantly users of Classical-type methods: hypothesis testing (straw-man NHST specifically) and, to a lesser extent, maximum likelihood fitting and, in econ, difference-in-differences analysis, synthetic controls, robust standard errors, and so on, all based on sampling theory and typically without mechanistic models.

The alternative is hard: model mechanisms directly, use Bayes to constrain the model to the reasonable range of applicability, and do a lot of computing to get fitted results that are difficult for anyone without a lot of Bayesian background to understand, and that specifically make a lot of assumptions and choices that are easy to question. It’s hard to argue against “model free inference procedures” that “guarantee unbiased estimates of causal effects” and etc. But it’s easy to argue that some specific structural assumption might be wrong and therefore the result of a Bayesian analysis might not hold…

So from a political perspective, I see Classical Stats as it’s applied in many areas as a way to try to wield power to crush dissent.
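What "model mechanisms directly, use Bayes to constrain the model" can look like in the smallest possible case is sketched below. Everything here is a hypothetical assumption for illustration: a first-order decay mechanism y(t) = y0 * exp(-k * t) with y0 = 10 known, Gaussian measurement noise with known sd 0.5, a lognormal prior on the rate k, and simulated data. A brute-force grid posterior in base R stands in for the Stan-style computation a real analysis would use.

```r
set.seed(1)

# Hypothetical mechanism: first-order decay y(t) = y0 * exp(-k * t)
y0 <- 10        # assumed known initial amount
sigma <- 0.5    # assumed known measurement-noise sd
k_true <- 0.3   # rate used only to simulate fake data

t_obs <- c(0.5, 1, 2, 4, 6, 8)
y_obs <- y0 * exp(-k_true * t_obs) + rnorm(length(t_obs), sd = sigma)

# Grid of candidate decay rates
k_grid <- seq(0.01, 2, length.out = 2000)

# Prior: lognormal with median 0.5, encoding "rates of order 0.1 to 2 are plausible"
log_prior <- dlnorm(k_grid, meanlog = log(0.5), sdlog = 1, log = TRUE)

# Log-likelihood of the data under the mechanistic model, for each candidate k
log_lik <- sapply(k_grid, function(k) {
  sum(dnorm(y_obs, mean = y0 * exp(-k * t_obs), sd = sigma, log = TRUE))
})

# Unnormalized log posterior, then normalize over the grid
log_post <- log_prior + log_lik
post <- exp(log_post - max(log_post))
post <- post / sum(post)

# Posterior summaries
post_mean <- sum(k_grid * post)
cdf <- cumsum(post)
ci_95 <- c(k_grid[which(cdf >= 0.025)[1]], k_grid[which(cdf >= 0.975)[1]])

cat("posterior mean of k:", round(post_mean, 3), "\n")
cat("95% interval for k:", round(ci_95, 3), "\n")
```

Every choice above (the functional form, the known y0 and sigma, the prior) is exactly the kind of structural assumption the quoted passage says is easy to question.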

My reply:

Yup. But the funny thing is that I think that a lot of the people doing bad science also feel that they’re being pounded by classical statistics.

It goes like this:
– Researcher X has an idea for an experiment.
– X does the experiment and gathers data, would love to publish.
– Because of the annoying hegemony of classical statistics, X needs to do a zillion analyses to find statistical significance.
– Publication! NPR! Gladwell! Freakonomics, etc.
– Methodologist Y points to problems with the statistical analysis, the nominal p-values aren’t correct, etc.
– X is angry: first the statistical establishment required statistical significance, now the statistical establishment is saying that statistical significance isn’t good enough.
– From Researcher X’s point of view, statistics is being used to crush new ideas and it’s being used to force creative science into narrow conventional pathways.

This is a narrative that’s held by some people who detest me (and, no, I’m not Methodologist Y; this might be Greg Francis or Uri Simonsohn or all sorts of people.) There’s some truth to the narrative, which is one thing that makes things complicated.
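The step from "a zillion analyses" to "the nominal p-values aren’t correct" can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: there is no true effect, the researcher has five independent outcomes to choose from (a hypothetical stand-in for the many analysis choices available), and a finding gets reported if any t-test gives p < 0.05. For independent tests the chance of at least one "significant" result is 1 - 0.95^5, roughly 0.23, well above the nominal 0.05.

```r
set.seed(2019)

n_sims <- 2000      # number of simulated studies
n_per_group <- 20   # subjects per arm
n_outcomes <- 5     # hypothetical number of outcomes the researcher can pick from

any_significant <- replicate(n_sims, {
  group <- rep(c(0, 1), each = n_per_group)
  # Every outcome is pure noise, so the true treatment effect is zero
  p_values <- sapply(seq_len(n_outcomes), function(j) {
    y <- rnorm(2 * n_per_group)
    t.test(y[group == 1], y[group == 0])$p.value
  })
  any(p_values < 0.05)
})

rate <- mean(any_significant)
cat("nominal rate per test: 0.05\n")
cat("simulated rate of at least one p < 0.05:", rate, "\n")
cat("theory for independent tests:", round(1 - 0.95^n_outcomes, 3), "\n")
```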

126 thoughts on “How statistics is used to crush (scientific) dissent.”

Richard on June 11, 2019 at 12:40 PM said:

One difficulty, I imagine, is that the statistical methodologists aren’t stationary. It’s not that the rug is deliberately being pulled out from under hapless researchers; the methodologists are looking at problems that arose from old research and revising their methodological prescriptions. Unfortunately, Researcher X is a social scientist, not a statistician, and their statistical knowledge might be 10-15 years out of date, so they see something that’s been fine for most of their career suddenly become Bad Science, which I imagine is more than a little frustrating (as well as worrying, since it’s not necessarily just hitting your current research; it threatens to retroactively nullify everything you’ve done before as well).

- Andrew on June 11, 2019 at 12:51 PM said:
  
  Richard:
  
  Yes, exactly.
  
  From my perspective, researcher X is using out-of-date methods to get wrong conclusions, and methodologist Y is just trying to be helpful.
  
  But from X’s perspective, the methodologists have been giving contradictory messages. Researcher X doesn’t necessarily distinguish between methodologist Y, who criticizes p-hacking and null hypothesis significance testing, and some earlier methodologist Z who recommended p-hacking and journal editor J who required statistical significance for publication.
  
  - Chris Wilson on June 11, 2019 at 1:51 PM said:
    
    My observation is that strongly worded papers, especially on methodological issues, from people such as yourself, can go a long way in helping X and J discern between Y and Z :) In other words, I can argue all day on logical grounds, or even produce mathematical derivations or proofs by simulation, but it has about 1/10 the impact of a peer-reviewed paper by an eminent statistician. Sad, but that’s how this works…
    
  - Chris Wilson on June 11, 2019 at 1:57 PM said:
    
    My other observation is that there are a lot of old-school ‘Z’ methodologists still hanging around who have spent decades helping researchers troubleshoot intricate NHST problems in SAS or whatever. Suggesting it’s better to skip all that and go straight to fitting models in Stan does not always go over so well.
    
- Anoneuoid on June 11, 2019 at 4:35 PM said:
  
  their statistical knowledge might be 10-15 years out of date
  
  Most of these complaints have been known since at least the 1960s, so I don’t think being “out of date” is a productive way to frame the issue. All that has changed is that collecting data and running analyses have become much cheaper, so the already existing problems became impossible to ignore, even for the biggest offenders.
  
The naked statistician on June 11, 2019 at 12:54 PM said:

Lakeland seems to suggest that frequentists are using their power to oppress Bayesians, but in recent years, Bayesians have become just as oppressive (and may I say arrogant) in their areas of influence, e.g. machine learning, foundations of statistics. I am not a frequentist but I generally find frequentists have a more flexible attitude in situations where being flexible is desirable.

- gec on June 11, 2019 at 1:25 PM said:
  
  I think a good analogy might be that current Bayesian culture is a bit like the old days of the internet. Because it is such a small world, it is easy for the loudest voices to crowd out and oppress those with alternative opinions within the Bayesian sphere. And I agree that there are such people, and like most Loud Voices they tend to be underinformed (e.g., the current trendy view of Bayes factors as the only way to ever do inference).
  
  But I think Lakeland’s point is that, just like those early internet users, their views don’t really matter much outside their domain. The people who decide what gets funded/published are largely adherents to classical stats and have no interest in disputes among the Bayesian fringe.
  
  But like Richard says above, those people do not have the time/expertise to appreciate why new methods might be preferable to those they had been taught. So my impression is that the Loud Voices among the Bayesian crowd think they are helping by presenting an overly simplistic kind of cure-all that Those In Power might have an easier time understanding.
  
  - Daniel Lakeland on June 11, 2019 at 5:12 PM said:
    
    That’s a pretty good summary +1
    
    - Keith O'Rourke on June 12, 2019 at 7:37 AM said:
      
      Yup, maybe even worth upgrading to a Todd talk https://www.youtube.com/watch?v=vvtp-dKfbco
      
      At around 1:30 replace the topic with Bayes Workflow and then the detractor pipes in with “what I think he means to say is Bayes factors always work and fully quantify all uncertainty!”
- Daniel Lakeland on June 11, 2019 at 5:45 PM said:
  
  I actually don’t think it’s the statisticians who wield the power; it’s the subject-matter experts in various fields, especially the ones who have built large, successful careers off analysis of experiments and observational data using the predominant paradigms of the 1980s, 1990s, and 2000s, which are largely sampling-theory-based methods or unregularized regressions: stuff you can do easily in a canned way in Stata, SAS, or similar.
  
  - Sameera Daniels on June 11, 2019 at 6:50 PM said:
    
    Daniel,
    
    Yes subject matter experts wield power.
    
    - Keith O'Rourke on June 12, 2019 at 7:38 AM said:
      
      That was Stephen Goodman’s take in one of his talks.
    - Sameera Daniels on June 12, 2019 at 3:10 PM said:
      
      Keith,
      
      By happenstance, I share some of the same viewpoints as Stephen Goodman. We don’t have the same background though. Temperament similar perhaps.
Z on June 11, 2019 at 12:58 PM said:

“It’s hard to argue against “model free inference procedures” that “guarantee unbiased estimates of causal effects” and etc. But it’s easy to argue that some specific structural assumption might be wrong and therefore the result of a Bayesian analysis might not hold…”

Lakeland, I really don’t think these methods need to be in competition. If somebody wants to estimate the effect of an intervention and they think they can adjust for confounding better than they can model the entire system mechanistically, shouldn’t they do that? And if you think they got the wrong answer but you can model the system mechanistically, you should try that. Or if there are other types of questions you want to try to answer with a complex mechanistic model that can’t be probed by standard causal inference methods, go for it! It’s true that I won’t believe that you specified your mechanistic model correctly or even well enough in most cases (just as I typically believe there’s confounding in causal analyses of observational data) but it’s always good to get different estimates of the same or related quantities that depend on different assumptions.

As someone who works in causal inference, I’ve often felt stymied by collaborators (usually doctors) who don’t get that it’s possible to estimate an effect without modeling its every mediating pathway. I’ve come to think of this mechanistic view as the sort of dominant intuitive paradigm that it’s difficult to disabuse laymen of. It’s funny to hear that my “camp” can also make people feel stymied and be perceived as the unthinking default that must be overcome. I think maybe we all mainly notice when we run into resistance and come to think of the resistance viewpoint as dominant.

- Anoneuoid on June 11, 2019 at 5:04 PM said:
  
  If somebody wants to estimate the effect of an intervention and they think they can adjust for confounding better than they can model the entire system mechanistically, shouldn’t they do that?
  
  I don’t believe this is possible. Unless you have what you believe to be an approximately correct model (includes all the relevant variables, etc) then your estimates are just arbitrary numbers. The predictions of such models could still be useful though.
  
  E.g., is the treatment effect 1.1751 or -0.3799?
  
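  set.seed(12345)
  treatment = c(rep(1, 4), rep(0, 4))  # first four units treated, last four controls
  gender1 = rep(c(1, 0), 4)            # gender coded one way...
  gender2 = rep(c(0, 1), 4)            # ...and the opposite way (gender2 = 1 - gender1)
  result = rnorm(8)                    # pure noise: no true treatment effect
  
  summary(lm(result ~ treatment*gender1))
  summary(lm(result ~ treatment*gender2))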
  
  - Anoneuoid on June 11, 2019 at 5:35 PM said:
    
    I haven’t looked at the series of posts on it closely, so maybe something like that is in there. But actually, if I gave a final exam on regression that might be the only question on it.
    
  - Z on June 12, 2019 at 2:44 PM said:
    
    “Unless you have what you believe to be an approximately correct model (includes all the relevant variables, etc)…”
    
    Right, for the simple case of a point exposure an approximately correct *predictive* model for treatment given confounders (those are the only “relevant” variables you need to include) will be sufficient. You could even use a double robust method where your effect estimate will be correct if either a *predictive* model for the outcome given treatment and confounders or a model for treatment given confounders is correct. A correctly specified mechanistic model is not needed. For example, to adjust for confounding by indication to estimate the comparative effectiveness of drug A compared to drug B, it would suffice to know all variables that doctors consider when deciding between A and B. You need not have a model for the mechanism of action or pharmacokinetic profiles of the drugs. I’m not saying it’s trivial to identify all (or at least all the important) confounders, but it’s often easier than arriving at an answer by modeling a full process (which is frequently an impossible task to do even approximately well).
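Z’s double-robustness claim can be made concrete with one standard doubly robust estimator, augmented inverse probability weighting (AIPW). This is a minimal sketch, swapped in for illustration and not necessarily the method Z had in mind: simulated data with one hypothetical confounder x that drives both treatment a and outcome y, a true treatment effect of 2, and a correctly specified logistic model for treatment given x. The AIPW estimate should land near 2 both with a deliberately wrong outcome model (one that omits x) and with a correct one, while the naive difference in means is biased.

```r
set.seed(42)
n <- 10000

# Simulated data (hypothetical): x confounds treatment a and outcome y
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.8 * x))  # larger x -> more likely to be treated
y <- 2 * a + 1.5 * x + rnorm(n)     # true treatment effect = 2

# Naive comparison ignores confounding
naive <- mean(y[a == 1]) - mean(y[a == 0])

# Treatment (propensity) model: correctly specified logistic regression
ps <- fitted(glm(a ~ x, family = binomial))

# AIPW estimate, given outcome-model predictions mu1 and mu0 for every unit
aipw <- function(mu1, mu0) {
  mean(mu1 + a * (y - mu1) / ps) - mean(mu0 + (1 - a) * (y - mu0) / (1 - ps))
}

# Outcome model 1: deliberately wrong (omits the confounder x)
bad_fit <- lm(y ~ a)
est_bad_outcome <- aipw(
  predict(bad_fit, newdata = data.frame(a = rep(1, n))),
  predict(bad_fit, newdata = data.frame(a = rep(0, n)))
)

# Outcome model 2: correctly specified
good_fit <- lm(y ~ a + x)
est_good_outcome <- aipw(
  predict(good_fit, newdata = data.frame(a = rep(1, n), x = x)),
  predict(good_fit, newdata = data.frame(a = rep(0, n), x = x))
)

cat("naive difference in means:", round(naive, 3), "\n")
cat("AIPW, wrong outcome model:", round(est_bad_outcome, 3), "\n")
cat("AIPW, right outcome model:", round(est_good_outcome, 3), "\n")
```

Note that no mechanism of action appears anywhere: the sketch only needs the variables that drive treatment choice, which is the point Z makes about confounding by indication.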
    
    - Andrew on June 12, 2019 at 2:53 PM said:
      
      Z:
      
      Just because it brings up one of my pet peeves: the term “double robust” is relatively new, but the idea is an old one in causal inference; for example, it came up in this 1990 paper of ours on estimating the incumbency advantage.
      
      My point here is not to claim priority on the concept but rather the opposite, to point out that the idea (if not the name) of double robustness was already so basic that it arose in this routine applied project. I guess it’s a good thing that, more recently, researchers came up with the phrase “double robustness,” as this motivated more careful theoretical study of this issue.
    - Z on June 12, 2019 at 9:54 PM said:
      
      Andrew, I looked at the paper but I wasn’t able to find a part that I could identify as double robustness. Are there two models such that if at least one of them is correct but not necessarily both you would get an unbiased estimate of incumbency advantage? I think I might be missing something just because the notation isn’t exactly what I’m used to.
    - Andrew on June 12, 2019 at 10:03 PM said:
      
      Z:
      
      Yes, the two models are: (1) the linearity of the vote share given incumbency, incumbent party, and previous vote share, and (2) random assignment of the treatment (incumbency or open seat). Neither model is perfect, but neither is so unreasonable. Either model alone would be enough for the inferences to be valid. And this was something I was thinking about when formulating and fitting the model.
    - Andrew on June 12, 2019 at 10:32 PM said:
      
      P.S. I remember thinking a lot about this double robustness thing when doing the research and writing that paper. I think the reason why we didn’t talk about it explicitly in the final article was that it was hard to say much about the topic in a formal way. The approximate accuracy of the linear model reduced our dependence on the assumption of random assignment, and the approximate accuracy of random assignment reduced our dependence on the assumption of linearity—that was clear, and we put a lot of effort into examining the approximate accuracy of the random assignment assumption, where that assumption failed, and how serious this failure would be for our estimates. The closest thing we have to an explicit discussion of double robustness is when we wrote: “incumbents do not base their decision of whether to seek reelection on their vote total in the previous election. This fortunately makes our results fairly insensitive to the assumption that the modeled relationship is linear.”
    - Z on June 13, 2019 at 10:13 AM said:
      
      I get it now, cool!
    - Anoneuoid on June 12, 2019 at 3:01 PM said:
      
      For example, to adjust for confounding by indication to estimate the comparative effectiveness of drug A compared to drug B, it would suffice to know all variables that doctors consider when deciding between A and B
      
      I wholeheartedly disagree. The “drug effectiveness” is just an arbitrary number in this case that can change if you plug in other variables or change the structure of the model.
      
      Did you look at that simple simulation? Which is the true treatment effect, 1.1751 or -0.3799? Your scenario is no different, treatment vs control is the same thing as treatment1 vs treatment2.
    - Z on June 12, 2019 at 9:01 PM said:
      
      Ha, got it, this is a dead end.
    - Anoneuoid on June 13, 2019 at 1:13 AM said:
      
      It’s a pretty simple question… but most people don’t want to answer it because that would mean acknowledging a huge problem exists.
    - Daniel Lakeland on June 13, 2019 at 9:39 AM said:
      
      Your point is well taken: you have to understand how the model is coded to interpret coefficients. I think misunderstanding this situation is pretty common, but I do think you’ll find that predictions are invariant, so suppose you want to understand the treatment effect for, say, whichever gender has gender1 = 0:
      
      m1 = lm(result ~ treatment*gender1)
      m2 = lm(result ~ treatment*gender2)
      
      # effect of treatment at gender1 = 0, computed from predictions rather than a coefficient
      predict(m1, newdata=data.frame(treatment=1, gender1=0)) - predict(m1, newdata=data.frame(treatment=0, gender1=0))
      #        1
      # 1.175054
      
      Interpreting coefficients is much harder when decisions are being made by the software under the hood. If you code a model in Stan by hand, it’s much easier to understand what the coefficients mean.
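The claim that predictions are invariant while coefficients are not can be checked directly. A minimal sketch that re-creates Anoneuoid’s data (same seed, with gender2 written as 1 - gender1, which equals rep(c(0, 1), 4)) and compares the two fits: the fitted values agree, while each model’s treatment coefficient answers a different question, namely the treatment effect at gender1 = 0 in m1 and at gender2 = 0 (that is, gender1 = 1) in m2.

```r
set.seed(12345)
treatment <- c(rep(1, 4), rep(0, 4))
gender1 <- rep(c(1, 0), 4)
gender2 <- 1 - gender1   # same variable, opposite coding
result <- rnorm(8)

m1 <- lm(result ~ treatment * gender1)
m2 <- lm(result ~ treatment * gender2)

# Both models span the same column space, so their fitted values match
same_fit <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m2))))

# Treatment effect within each gender, computed from m1's predictions
eff_g0 <- unname(predict(m1, data.frame(treatment = 1, gender1 = 0)) -
                 predict(m1, data.frame(treatment = 0, gender1 = 0)))
eff_g1 <- unname(predict(m1, data.frame(treatment = 1, gender1 = 1)) -
                 predict(m1, data.frame(treatment = 0, gender1 = 1)))

# Each model's "treatment" coefficient is one of these conditional effects
coef_m1 <- unname(coef(m1)["treatment"])
coef_m2 <- unname(coef(m2)["treatment"])

cat("fitted values identical:", same_fit, "\n")
cat("effect at gender1 = 0:", round(eff_g0, 4), "| treatment coef in m1:", round(coef_m1, 4), "\n")
cat("effect at gender1 = 1:", round(eff_g1, 4), "| treatment coef in m2:", round(coef_m2, 4), "\n")
```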
    - Z on June 13, 2019 at 10:44 AM said:
      
      Daniel has illustrated your mistake nicely. I hope this experience will make you reconsider in the future assigning high confidence to the proposition that you’ve discovered a very simple flaw in the foundations of a mature field built by people who value mathematical rigor.
    - Anoneuoid on June 13, 2019 at 11:14 AM said:
      
      Yes, predictions of such models are fine. The problem is only with interpreting the coefficients. So I don’t think my point was addressed at all. In fact, I have discussed the same thing with Daniel before and know he agrees with that.
    - Daniel Lakeland on June 13, 2019 at 11:34 AM said:
      
      Anoneuoid’s point about interpreting coefficients does still stand. My impression is that a lot of “statistics education” in applied departments is really about how to use default software models in a way that gives some basic idea of how to make the coefficients interpretable. The deeper questions, whether we understand what we are doing and whether it’s logically valid at a deeper level, are never something you can get to in a semester or two.
    - Z on June 13, 2019 at 1:01 PM said:
      
      Oy, ok, here’s the full answer to your question, certain not to satisfy you but hopefully useful for anyone who’s followed along this far for some reason. If the predictions are the same under the two models, the estimate of the causal effect of treatment will be the same under the two models. Your mistake was in thinking that the estimate of the causal effect of treatment was the coefficient of treatment in the model. That is not the case. The estimate of the causal effect of treatment would be obtained from those models using the G-formula. The counterfactual expectation under treatment=1, for example, would be E[Y(1)] = E_x[E[Y|A=1,X=x]], where the inner conditional expectation is the prediction from *either* of your models and the outer expectation over X can be estimated by just taking the mean of the inner conditional expectations over your data. We then estimate the effect E[Y(1)]-E[Y(0)] by just taking the difference of the two estimates of the counterfactual expectations. Code for doing this in your example is below. (I used a large sample size so that you can see that not only is the estimate identical under both models, it’s also approaching the correct answer of 0 for this data generating process.)
      
      set.seed(12345)
      treatment = c(rep(1, 10000), rep(0, 10000))
      gender1 = rep(c(1, 0), 10000)
      gender2 = rep(c(0, 1), 10000)
      result = rnorm(20000)
      mod1 = lm(result~treatment*gender1)
      mod2 = lm(result~treatment*gender2)
      # copies of the data with everyone set to treated (newdata1) or untreated (newdata0)
      newdata1 = data.frame(cbind(treatment,gender1,gender2))
      newdata1$treatment = 1
      newdata0 = data.frame(cbind(treatment,gender1,gender2))
      newdata0$treatment = 0
      # G-computation: mean predicted outcome under treatment minus mean under control
      mean(predict(mod1,newdata=newdata1))-mean(predict(mod1,newdata=newdata0))
      # [1] 0.001616497
      mean(predict(mod2,newdata=newdata1))-mean(predict(mod2,newdata=newdata0))
      # [1] 0.001616497
      
      So you did provide an effective warning against interpreting regression coefficients wrongly by interpreting them wrongly yourself. Luckily, causal inference tells us how not to fall into traps like this.
    - Anoneuoid on June 13, 2019 at 2:08 PM said:
      
      Your mistake was in thinking that the estimate of the causal effect of treatment was the coefficient of treatment in the model. That is not the case.
      
      1) I didn’t interpret the coefficients at all, the question was how to interpret them. You calculated something different than everyone else and still didn’t provide an interpretation of the coefficients that everyone is looking at.
      
      2) I have never seen anyone do a linear regression followed by those extra steps. Can you give a real-life example of someone doing that?
    - Z on June 13, 2019 at 2:36 PM said:
      
      >1) I didn’t interpret the coefficients at all, the question was how to interpret them. You calculated something different than everyone else and still didn’t provide an interpretation of the coefficients that everyone is looking at.
      
      You said “is the treatment effect 1.1751 or -0.3799?” These numbers are the coefficients of treatment in the two regression models. You named these as the two options for what the “treatment effect” might be equal to. Forgive me for reading that as you interpreting the regression coefficient as a treatment effect… Also, the discussion was about the ability to estimate causal effects, and you presented the difficulty of interpreting those regression coefficients as evidence that causal effect estimates are difficult to interpret. If you didn’t think that those coefficients were what one would use to estimate effects, what was the point of your whole example?
      
      >2) I have never seen someone do a linear regression and then those extra steps ever. Can you give a real life example of someone doing that?
      
      The regression plus those “extra steps” is called G-computation, in this case to estimate the effect of a simple point exposure treatment. It can also be used to estimate the effects of time-varying treatments. Here’s what appears to be a didactic tutorial: https://academic.oup.com/aje/article/173/7/731/104142. And here’s a “real life” application for a time-varying treatment, also somewhat didactic: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2786249/. (Since this debate was never about the merits of “how people tend to do causal inference without mechanistic models” but rather about how causal inference ought to be done, I don’t count it as a strike against non-mechanistic causal inference that there are admittedly regrettably few instances of practitioners using methods such as G-computation when they should. But everyone in the field of causal inference knows of it and wouldn’t make that regression mistake.)
    - MoreAnonymous on June 13, 2019 at 2:44 PM said:
      
      +1 to Z’s useful “full answer to your question” and the simple simulation that they provide in it.
      
      Anoneuoid and Daniel, having seen many of your comments on this site, I think both of you would actually find it joyful to learn about causal inference. Given the debates that go on (and on) between the different factions in statistics, it may seem that the causal inference literature would be unwelcoming to people with your mindset and perspective. Actually, however, a lot of the causal inference literature was written specifically for people like you — for people who are disturbed by unsettled issues in the underpinnings of usual statistical practice and who can’t just ‘go along’ like most others do. This is especially true of the DAG-based literature. You are the target audience. Seriously.
      
      Based on your prior comments, I’d recommend ‘Counterfactuals and causal inference’ by Morgan and Winship, which provides a nice overview. Or, if you want to read someone who is absolutely as irritated by problems in usual statistical practice as you are, try Judea Pearl’s work: not ‘The Book of Why’, but maybe ‘Causal Inference in Statistics’ or ‘Causal inference in statistics: An overview’.
    - Andrew on June 13, 2019 at 3:43 PM said:
      
      To start on causal inference, I recommend chapters 9 and 10 of my book with Jennifer. To start on the literature of causal inference, I recommend these two articles of mine:
      Causality and statistical learning and
      Experimental reasoning in social science.
    - Z on June 13, 2019 at 4:08 PM said:
      
      I personally like in addition to those other references the Hernan and Robins book freely available online here: https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
      
      And I second MoreAnonymous’s sentiment that causal inference is tailor-made for outsiders who think rigorously, and I regret taking a confrontational tone in this debate.
    - Nope on June 13, 2019 at 5:11 PM said:
      
      Jesus H.. Anoneuoid in here ripping on classical statistics and causal inference and he’s never heard of the potential outcomes framework. Wowza.
    - Daniel Lakeland on June 13, 2019 at 5:20 PM said:
      
      Z seems to miss the point entirely. Anoneuoid’s question was a trick question, posed on purpose to raise the issue of whether a coefficient has a proper causal interpretation. The correct answer is that all the outcomes come from the same RNG, so there is no causal effect; furthermore, the interpretation of the coefficient depends on things that are implicit in the code that fits the model, and on assumptions about the process that are not directly present in the model.
      
      It’s not like Z doesn’t know this, or that Anoneuoid thinks there really is a causal effect; after all, he designed the example specifically to have 0 causal effect… the problem here is a complete lack of communication, because both parties are making deep background assumptions … that are completely different.
    - Daniel Lakeland on June 13, 2019 at 6:41 PM said:
      
      I’ll try to bridge the gap here as best I can from my phone while watching my kids play pingpong…
      
      Anoneuoids background and interests are biomedical. In biomedical sciences there is a strong tendency to do all the terrible things we read about here at this blog… straw man NHST, interpretation of regression coefficients as causal estimates without explicitly acknowledging the assumptions needed, models without any indication of external validity used to estimate medical treatment effects etc etc.
      
      Z and Nope and others come from an econ background. The hot trend in econ is to go around looking for some pile of data about a problem and then apply an “identification strategy” that turns your classical statistics estimator into an estimator of a causal effect (it’s basically unheard of to use Bayes in econ, afaict). The basic assumption is that some external factor comes along and causes a change in something, and then some strategy is used to estimate the difference between what did happen and what would have happened; this is the causal estimate. The econ literature is very proud to have figured out this basic thing and is stamping out estimates of everything it can think of… At the same time, it’s absolutely typical for people to claim, as has been done here, that mechanistic modeling is impossible, even though getting causal estimates requires having identified all confounding variables and having built accurate prediction models incorporating all of these confounders. So these practitioners implicitly believe that they understand the problem well enough to identify all relevant variables, while having exactly zero idea of how those variables work…
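One widely used strategy of the "what did happen versus what would have happened" kind is difference-in-differences, also named at the top of the post. A minimal sketch on simulated data, with every number hypothetical: the key assumption (parallel trends) is that, absent the intervention, the treated group would have changed over time by the same amount as the control group, so the control group’s change stands in for the treated group’s counterfactual.

```r
set.seed(7)
n <- 4000

# Hypothetical repeated cross-sections: treated vs control units, before vs after
treated <- rep(c(0, 1), each = n / 2)
post <- rep(c(0, 1), times = n / 2)

# Assumed data-generating process: groups differ in level (3), both share a
# common time trend (1.5), and the intervention adds 2 only when treated and post
y <- 10 + 3 * treated + 1.5 * post + 2 * treated * post + rnorm(n)

cell_mean <- function(g, t) mean(y[treated == g & post == t])

# Treated group's change minus control group's change (the borrowed counterfactual trend)
did_means <- (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# The same number from a regression with an interaction term
did_lm <- unname(coef(lm(y ~ treated * post))["treated:post"])

cat("DiD from cell means:", round(did_means, 3), "\n")
cat("DiD from lm():", round(did_lm, 3), "\n")
```

If the parallel-trends assumption fails, both numbers are biased in the same way; the regression form adds no protection, it is the same estimator.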
    - Nope on June 13, 2019 at 8:22 PM said:
      
      Thanks for bridging the gap Daniel, really took on a fair view of both sides there.
    - Z on June 13, 2019 at 9:59 PM said:
      
      Daniel,
      
      1) I’m almost certain there’s nothing to be learned about causal inference from Anoneuoid’s example. The funny ways regression coefficients can behave, sure, but not causal inference. He thought the example broke standard methods of causal inference (He says: “It’s a pretty simple question… but most people don’t want to answer it because that would mean acknowledging a huge problem exists”). I showed that actually, no, a huge problem does not exist. The standard approach of G-computation gives you the right answer that the effect is 0 with no ambiguity. So what’s left? Insight from pondering that causally meaningless regression coefficient under alternative gender codings like a Zen koan?
      2) I’m not from an econ background. I know nothing about econ and have trouble reading papers in econ jargon.
      3) I like Bayes. Bayes vs not Bayes is pretty much orthogonal to mechanistic vs not mechanistic, though I agree it’s typically more straightforward to fit complex generative models in a Bayesian framework.
      4) I gave the example earlier of only needing to know which variables a doctor considers when making treatment decisions to adjust for confounding and estimate the effects of treatment strategies. These are easy enough to identify by talking to doctors without understanding the pharmacokinetics or cellular level mechanisms of action of the treatments in question. I will not argue with you over whether it’s true that if you know these variables and have lots of data you can estimate causal effects, that’s just math. Yes, in practice, you don’t know all these variables or they’re not all measured, and you need to fit predictive models to small data sets that won’t be perfect and you won’t perfectly adjust for confounding. On the issue of whether you’re likelier to get close to a causal effect by adjusting for the confounders you can measure or trying to model human biology, we can agree to disagree.
      5) I think I need to stop coming back to this thread, I’ve spent way too much time on this today
    - Andrew on June 13, 2019 at 10:27 PM said:
      
      Z:
      
      Don’t feel bad. I spend lots of time on this blog too! But I think we do good by airing these discussions. At least, I would’ve loved to have such a resource to read when I was a student.
    - MoreAnonymous on June 13, 2019 at 10:38 PM said:
      
      Daniel, The impression that I get from your comments is that neither you nor Anoneuoid have an introductory-level understanding of the theory behind causal inference. Basic elements of this theory include average treatment effects (ATEs), potential outcomes, and conditional independencies in causal graphs. Morgan and Winship is a good introductory book for self-learning.
      
      I truly hope you try to learn causal inference because I think you would like it a lot. Many of your comments center on the question, “OK, so that’s a regression coefficient. But what does it MEAN?” You invoke this question as a challenge, as though it were imprudently ignored by the theory of causal inference. But it’s the opposite. Your question is at the heart of what causal inference seeks to answer. Your question is the same kind of question that some of the founders of causal inference started by asking! To me, the answers that causal inference provides to these questions are deep, beautiful, and yes… subject to important limitations.
      
      Another way to look at this is the following: So you have a mechanistic model. You think it is close to reality. For any variable in your model, x, there is a mechanistic function f(u) that determines the value of x from the values of some variables u. Those variables u might include variables other than x, lagged values of x itself, and noise terms. The DAG-based causal inference literature is a principled study of what we can learn about the parameters of the mechanistic model if we don’t have access to a lot of the information we’d like. For example, maybe we don’t know the functional form of any of the mechanistic functions, or we only know the functional forms of some of them, or we are not quite sure about the variables that do and do not contribute to u, or we don’t have any data on some of the variables in the model (hidden variables), or for some of the variables our data are limited to one level of the variable, but we would like to generalize our results to all levels (for example, maybe we only have data on hospital patients, but we want to estimate some parameter that applies to both hospitalized patients and members of the general population). Causal inference seeks to tell us what we can and cannot discover, given the major limitations that affect the information we have access to and the assumptions about our mechanistic model that we are or are not willing to make. If causal inference tells us that some facet of the model is indeed estimable, then the process of estimating it sometimes boils down to a regression coefficient from a regression model that has been set up in just the right way. Causal inference tells us how to set up the regression and how to interpret the resulting coefficient.
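A minimal Python sketch of that framing, offered as an illustration rather than anything a commenter posted: a toy mechanistic model in which each variable is set by its own function of its parents plus noise, with a binary confounder u driving both the treatment x and the outcome y. The coefficients (2.0, -5.0) and probabilities are invented. The naive treated-versus-untreated comparison gets even the sign wrong, while stratifying on u and averaging over P(u) (backdoor adjustment) recovers the effect built into the mechanism.

```python
import random

random.seed(1)

TRUE_EFFECT = 2.0  # causal effect of x on y, built into this hypothetical mechanism

def simulate(n):
    """Each variable is set by its own mechanism f(parents, noise)."""
    data = []
    for _ in range(n):
        u = 1 if random.random() < 0.5 else 0               # confounder
        p_treat = 0.8 if u == 1 else 0.2                     # u raises the chance of treatment
        x = 1 if random.random() < p_treat else 0            # treatment
        y = TRUE_EFFECT * x - 5.0 * u + random.gauss(0, 1)   # u also lowers the outcome
        data.append((u, x, y))
    return data

def mean(values):
    return sum(values) / len(values)

def naive_difference(data):
    """Treated minus untreated mean outcome, ignoring u."""
    return (mean([y for u, x, y in data if x == 1])
            - mean([y for u, x, y in data if x == 0]))

def adjusted_difference(data):
    """Backdoor adjustment: within-stratum differences weighted by P(u)."""
    total = 0.0
    for level in (0, 1):
        stratum = [(x, y) for u, x, y in data if u == level]
        diff = (mean([y for x, y in stratum if x == 1])
                - mean([y for x, y in stratum if x == 0]))
        total += diff * len(stratum) / len(data)
    return total

data = simulate(20_000)
print(round(naive_difference(data), 2))     # about -1: the wrong sign
print(round(adjusted_difference(data), 2))  # close to TRUE_EFFECT = 2.0
```

Here the adjustment works only because the sketch assumes u is the sole confounder and is measured; that assumption is exactly what the thread is arguing about.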
      
      PS. I agree with all of Z’s points 1-5 above. And I work in biomed, not econ. Z… thanks for your time investment, I too am stopping.
    - Daniel Lakeland on June 13, 2019 10:40 PM said:
      
      Z, sorry for mischaracterizing your background… Nevertheless, you are from a background where G-computation is standard… Anoneuoid is from a background where that is *literally unheard of*. I’m at Cold Spring Harbor these days, where my wife is teaching mouse biology methods and I’m sailing and fishing with my kids. The grad students sitting behind me at lunch today got to talking about replicability and the P value article in Nature… The discussion was all about how they decided to divide 0.05 by the number of tests they ran to “improve replicability”, and how they consulted with statisticians about their data, the statisticians told them to run some “Type II ANOVA”, and they couldn’t figure out what that was, so they pressed some buttons, generated some numbers, sent it off for review, and hoped no one complained… One student said he’d never replicated anything in his life and didn’t plan to start. Another said they had a friend who did their PhD work on something that seemed promising, but later, when they tried to replicate the basic initial findings, none of it turned out to be consistent. Some said they mostly just go by whether they can detect a difference by eye, and figure that’s more accurate than all the statistics anyway.
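Dividing 0.05 by the number of tests is the Bonferroni correction. A minimal sketch of the arithmetic, assuming m independent tests of hypotheses that are all truly null: without correction the chance of at least one false positive grows quickly with m, while the alpha/m threshold keeps it at or below alpha. This controls false positives only; it does nothing about the replication failures the students describe.

```python
def familywise_error(alpha_per_test, m):
    """P(at least one false positive) among m independent tests of true null hypotheses."""
    return 1 - (1 - alpha_per_test) ** m

alpha = 0.05
for m in (1, 5, 20):
    uncorrected = familywise_error(alpha, m)
    bonferroni = familywise_error(alpha / m, m)  # the "divide 0.05 by the number of tests" rule
    print(f"m={m}: uncorrected {uncorrected:.3f}, alpha/m {bonferroni:.3f}")
# for m = 20 this is about 0.64 uncorrected versus about 0.049 with alpha/m
```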
      
      So what’s “standard” to you is like a million miles from what’s standard to Anoneuoid.
      
      Anoneuoid’s point was that in fields he’s familiar with it’s standard to generate some kind of experimental data, run a regression, and interpret a statistically significant coefficient as evidence for a certain-sized *causal* effect. But few biologists are even aware of the pitfalls, such as how, in a model with interaction terms, the coefficient can vary with seemingly irrelevant information like whether males are coded 1 and females 0 or females 1 and males 0…
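A small Python sketch of the coding pitfall, using invented cell means and assuming a model with a treatment-by-sex interaction (without the interaction, recoding sex would not change the treatment coefficient). In this saturated model the coefficient on T is the treatment effect in whichever group is coded 0, so swapping the coding swaps its value. Averaging each person's predicted outcome with treatment minus without, the prediction-contrast idea behind G-computation, gives the same answer under either coding.

```python
# Hypothetical cell means of an outcome y by treatment T (0/1) and sex.
CELL_MEANS = {
    (0, "male"): 10.0, (1, "male"): 11.0,
    (0, "female"): 12.0, (1, "female"): 16.0,
}

def fit_saturated(reference):
    """Coefficients (a, b, c, d) of y ~ a + b*T + c*S + d*T*S, with S = 0 for `reference`.

    With two binary regressors and their interaction the model is saturated,
    so OLS reproduces the cell means and the coefficients are contrasts of them.
    """
    other = "female" if reference == "male" else "male"
    a = CELL_MEANS[(0, reference)]
    b = CELL_MEANS[(1, reference)] - a                       # effect of T when S = 0
    c = CELL_MEANS[(0, other)] - a
    d = CELL_MEANS[(1, other)] - CELL_MEANS[(0, other)] - b  # extra effect of T when S = 1
    return reference, (a, b, c, d)

def predict(model, treated, sex):
    reference, (a, b, c, d) = model
    s = 0 if sex == reference else 1
    return a + b * treated + c * s + d * treated * s

def average_predicted_effect(model, people):
    """Mean over people of predicted y if treated minus predicted y if untreated."""
    diffs = [predict(model, 1, sex) - predict(model, 0, sex) for sex in people]
    return sum(diffs) / len(diffs)

people = ["male"] * 30 + ["female"] * 70  # hypothetical study population
for reference in ("male", "female"):
    model = fit_saturated(reference)
    print(f"S=0 is {reference}: coefficient on T = {model[1][1]}, "
          f"average predicted effect = {average_predicted_effect(model, people)}")
# coefficient on T: 1.0 under one coding, 4.0 under the other;
# average predicted effect: 3.1 under both
```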
      
      In a field where this basic step isn’t fully and broadly understood, I’m sure you can see that there could be whole labs where all the research from the past decade might be called into question once this problem is unearthed.
      
      The fact that people like you know what the problems are and how to address them isn’t a reason to think that things are done correctly in all fields, or that there isn’t a major disaster looming when it’s discovered that the actual causal effect of a particular mutation on irritable bowel disease is not only different in magnitude from the one everyone has based the last decade of their lives on, but also of the opposite sign!
    - Daniel Lakeland on June 13, 2019 11:20 PM said:
      
      MoreAnon, it’s easy for all of us to misunderstand one another’s backgrounds. It’s also clear you’ve misunderstood what is more of a Socratic method: I ask questions to try to elicit answers, not because I don’t know the answers.
      
      In actual fact I have read some causal inference literature and do know the basics of what it teaches. For example, I have Andrew’s book on regression and causal inference, I have read both papers he linked above, and I’ve even debated Pearl on this blog (he seemed to be incapable of accepting the idea that Bayesian probability is not an observable quantity and does not have to converge to long-run frequency). The basic fact is that I don’t routinely need those tools. I don’t usually build models as DAGs. If I have uncertain functional forms, I provide informative priors and do Bayesian calculations; if the posterior doesn’t concentrate, I look for alternative kinds of data to help inform the model.
      
      If you’d like to see work I’ve done recently you can look at this in biology:
      
      https://elifesciences.org/articles/29144
      
      The agent-based model involved is capable of reproducing the observed behavior of growing rib cells under a variety of combined knockout genotypes, and the parameters involved are so well separated by a by-eye comparison with observational data that no fancy fitting methods are really needed. Identifying and estimating a precise number for any given parameter is largely irrelevant. The model predicts that we will observe differences in particular measurements. Subsequently we took those measurements, and indeed they show differences of the kind predicted, which can be identified with Bayesian estimation procedures and correspond to the estimates we used. In the end no one cares about the particular numbers; what matters is the consistent patterns of behavior. The purpose of modeling is insight, not numbers, to paraphrase Hamming.
      
      Building that quantitative model allowed biologists to understand a decade of data collection and analysis and put to bed their confusion over why things happened the way they did because it altered their basic way of thinking about the problem and allowed them to “play with” the predictions using an intuitively understandable formulation.
      
      In the future I hope to develop more formal computational methods for fitting this type of model, but at its core it’s the description of the process that matters far more than the particular numbers being plugged in. After modeling, a process that seemed confusing and unintuitive to biologists became intuitive and easily explained…
      
      I believe the same kind of thing can be done relatively broadly in biology, econ, sociology, demographics, whatever. The key thing is to try in the first place. Today, some people are trying, but they are basically the dissenters.
    - Anoneuoid on June 14, 2019 7:03 PM said:
      
      “And here’s a ‘real life’ application for a time-varying treatment, also somewhat didactic: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2786249/. (Since this debate was never about the merits of ‘how people tend to do causal inference without mechanistic models’ but rather about how causal inference ought to be done, I don’t count it as a strike against non-mechanistic causal inference that there are admittedly regrettably few instances of practitioners using methods such as G-computation when they should. But everyone in the field of causal inference knows of it and wouldn’t make that regression mistake.)”
      
      Yes, as I thought. No one is doing this stuff you claim is right. Everyone has been misinterpreting their regression coefficients for many years. And the issue with interactions is just one minor one. The bigger one is that the model needs to be correctly specified, e.g., including all the correct variables and no incorrect ones.
    - Daniel Lakeland on June 14, 2019 11:33 PM said:
      
      Anoneuoid, yes in many areas of science people fit regressions and treat the coefficients as if they were causal estimates. And Nope and Z and others are correct that this has been shown to be wrong and the right method is to compare predictions in two different scenarios under some restricted models and data collection.
      
      But it goes deeper, as you point out. To get causal inference you need to have identified all the variables that could have caused the difference, and have measurements of them, and not have any extra variables that could be statistically dependent but non causal. Then you have to specify a model capable of predicting accurately, even though typically you see assumptions that mechanistic structure is impossible to correctly specify. So basically it’s ok to get some very restricted model applicable to just your dataset, like a linear regression because the range of variation is small, a sort of Taylor series for example…
      
      The end result is basically a situation where you have a number and a hammer to wield against the “correlation is not causation” mantra, but typically zero ability to interrogate the model for how causation might play out in alternative scenarios. In particular suppose there is another important causal variable, but it was constant for the data you have… then the estimate of the number for the dataset could be realistic while predictions for other places, times, cultures, industries, species, etc fail to replicate because these other non constant variables now come into play.
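A minimal sketch of that failure mode, with an invented mechanism y = x(1 + 2z): in the available data z happens to be fixed at 0, so a regression of y on x estimates the effect of x exactly for that data, yet the estimate is off by a factor of three in a setting where z = 1. Nothing in the first data set could reveal this.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs, with an intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def mechanism(x, z):
    """Hypothetical mechanism: the effect of x depends on a second cause z."""
    return x * (1.0 + 2.0 * z)

xs = [i / 10 for i in range(11)]

# In the data we have, z is constant (z = 0 for every observation).
estimated_effect = ols_slope(xs, [mechanism(x, z=0.0) for x in xs])

# Elsewhere (another place, time, species...) z = 1.
effect_elsewhere = ols_slope(xs, [mechanism(x, z=1.0) for x in xs])

print(round(estimated_effect, 6), round(effect_elsewhere, 6))  # 1.0 3.0
```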
    - Anoneuoid on June 15, 2019 6:42 AM said:
      
      Yeah, it’s like Deming said: if you want to draw a conclusion about the well-defined population you sampled from (an enumerative study), that is one thing.
      
      But it is a mistake to then extrapolate those conclusions to other circumstances (analytic study) without domain knowledge, ideally in the form of a quantitative mechanistic model. No amount of slapping more math onto the same information is going to help you there.
      
      And asking doctors what variables they think are important is not going to get you there either. For one thing, the variables they are taught are important are the ones statisticians have been plugging into their regressions. And those have been chosen largely because they are convenient and available (gender, race, age, etc.).
      
      The stats guys think the MDs have way more understanding of the human body than the actual extremely rudimentary one we have in reality… Then the MDs think the stats guys have a mathemagical power to extract conclusions from the very limited info they’ve been provided.
      
      And anyway, I never care about an average effect… I care about what will happen if I personally receive a treatment. How well does that RCT average effect correspond to what we should expect if grandma with 4 comorbidities who is already on a dozen pills gets put on a new drug?
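A toy illustration of the gap between an average effect and what happens to a particular person, with invented individual effects: if every individual effect could be seen, the average would be positive even though 30% of people are made worse off. In a real trial only one potential outcome per person is observed, so the individual effects themselves are not available; the grandma question amounts to asking which of these groups she resembles.

```python
# Hypothetical individual treatment effects on some health score (higher is better).
individual_effects = [3.0] * 70 + [-4.0] * 30

average_effect = sum(individual_effects) / len(individual_effects)
share_harmed = sum(1 for e in individual_effects if e < 0) / len(individual_effects)

print(average_effect, share_harmed)  # 0.9 0.3
```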
    - Daniel Lakeland on June 15, 2019 7:20 AM said:
      
      Right, and though I think biology is MORE complicated by interactions than social science, it is at least more experimental. We can build mechanistic models in biology, and it’s successful. Can we do it in social science? I’m sure the answer is yes, but many social scientists give up from the start; I think it’s actually taught as a mantra in early undergrad classes…
      
      On the other hand, in the social sciences you may often only care about average effects. For example, if you change the tax code, certainly some people will benefit and some will lose out, but we can’t have a different tax code for each citizen… Yet you might want to know whether reducing income tax and then adding a sales tax would in total increase or decrease homelessness… and if you get an estimate for, say, Utah, you might figure that Colorado would be at least similar or something… So I do think the paradigm of reduced-form causal inference in the social sciences has its place; I just strongly disagree with the idea that it should be the dominant activity.
    - Chris Wilson on June 15, 2019 9:33 AM said:
      
      Good discussion all around. I just want to echo the comment about the unproductive interplay between limited domain understanding and mathemagical statz. The following will sound a bit arrogant, but oh well. I remember a med student on a clinical rotation trying to bro-splain to me how the Framingham risk score works. It was an appalling mess of an explanation! I also had to remind him that BMI doesn’t work so well for risk prediction in an athletic population: I have 98th percentile VO2max and strength from rowing and lifting, yet BMI puts me as ‘overweight’ lol. Anyway, he was young and learning, but I have encountered subtler variants of these problems in many physicians. Point is, tons of research and practice in biology is a mess from a causal insight/process-based model point of view…
    - Anoneuoid on June 15, 2019 1:11 PM said:
      
      Looking at that g-formula paper I see these are the variables they considered:
      
      - Non-modifiable: Age; Period/Calendar year; Parental history of myocardial infarction; Smoking prior to 1980; Oral contraceptive use prior to 1980; BMI at age 18 years; Baseline smoking; Baseline physical activity; Baseline diet score; Baseline alcohol; Baseline BMI
      - Directly modifiable: Multivitamins; Aspirin; Statins; Post-menopausal hormones; Smoking; Physical activity; Diet score; Alcohol
      - Indirectly modifiable: BMI; High blood pressure; High cholesterol; Diabetes (confirmed); Angina; Stroke (confirmed); CABG; Cancer; Menopause; Osteoporosis
      
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2786249/
      
      Then from that they determine that “the 20-year risk of CHD” [coronary heart disease] is, e.g., 18% lower if you quit smoking.
      
      So what happens to that value if you include how wealthy they are or become? I see the Nurses’ Health Study does not ask about that: https://www.nurseshealthstudy.org/participants/questionnaires
    - Daniel Lakeland on June 15, 2019 2:52 PM said:
      
      You don’t get something for nothing. The calculations will give you average effects across the population your data represents. If you want a conditional estimate, you’d have to collect that data and fit a model. If you then want an average estimate across, say, a range of pre and post income groups, you need to work that out separately…
      
      So if you want the kind of stuff that you and I usually seem to want, you wind up needing to put in the effort to model mechanisms.
    - Anoneuoid on June 16, 2019 2:33 PM said:
      
      “You don’t get something for nothing. the calculations will give you average effects across the population your data represents. If you want a conditional estimate you’d have to collect that data and fit a model. If you then want an average estimate across say a range of pre and post income groups you need to work that out separately…
      
      “so if you want the kind of stuff that you and I usually seem to want you wind up needing to do the effort to model mechanisms.”
      
      I didn’t mean anything so deep. Just add in another variable that is correlated with smoking and CHD, and now that coefficient for smoking (or g-stat, or whatever) will be different from 18%. They don’t have that variable, so they just ignore it.
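For ordinary least squares, how much the coefficient moves can be written down exactly with the standard omitted-variable identity: the slope of y on x alone equals the slope on x from the regression that also includes z, plus (slope on z) × (slope from regressing z on x). A sketch with simulated data; calling z "wealth" and x "smoking" is purely illustrative, and all coefficients are invented.

```python
import random

random.seed(2)

def centered(values):
    """Subtract the mean, so the slopes below include an intercept implicitly."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def simple_slope(x, y):
    """OLS slope of y on x alone (with intercept)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def two_regressor_slopes(x, z, y):
    """OLS slopes of y on x and z together (with intercept), via the 2x2 normal equations."""
    xc, zc, yc = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    det = sxx * szz - sxz ** 2
    return (sxy * szz - szy * sxz) / det, (szy * sxx - sxy * sxz) / det

n = 5000
z = [random.gauss(0, 1) for _ in range(n)]        # hypothetical "wealth"
x = [-0.5 * zi + random.gauss(0, 1) for zi in z]  # hypothetical "smoking", lower with wealth
y = [1.0 * xi - 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

b_short = simple_slope(x, y)              # z omitted
b_x, b_z = two_regressor_slopes(x, z, y)  # z included
delta = simple_slope(x, z)                # slope of the omitted z on x

print(round(b_short, 2), round(b_x, 2))           # roughly 1.8 versus roughly 1.0
print(abs(b_short - (b_x + b_z * delta)) < 1e-9)  # True: the identity holds in sample
```

The identity says nothing about which of the two slopes is "causal"; that depends on assumptions about the mechanism, which is the point of contention in the thread.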
      
      That is why I say these “effect sizes” are arbitrary.
    - Daniel Lakeland on June 16, 2019 10:13 PM said:
      
      Well, the causal inference world doesn’t accept the idea that you can just throw variables into a regression and get causal inference, AFAIK. The basic definition of a causal effect is the difference between what happened and what would have happened if one of the variables had been actively changed. So you need a method for producing counterfactual estimates of individual cases under the changed condition. The causal effect estimate is model-dependent, just as Bayesian probabilities are, so you need some theory that limits what is acceptable as a causal variable.
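The counterfactual definition can be sketched as a missing-data problem. In a simulation, unlike in real life, both potential outcomes of every unit can be written down, so the true average effect is known; randomized assignment then reveals only one outcome per unit, and the difference in observed means estimates that average. The distributions below are invented for illustration.

```python
import random

random.seed(3)

# Only in a simulation are both potential outcomes of each unit known.
units = []
for _ in range(10_000):
    y0 = random.gauss(0, 1)           # outcome if untreated
    y1 = y0 + random.gauss(1.0, 0.5)  # outcome if treated; individual effects vary around 1.0
    units.append((y0, y1))

true_ate = sum(y1 - y0 for y0, y1 in units) / len(units)

# Randomized assignment reveals one potential outcome per unit; the other is missing.
treated, control = [], []
for y0, y1 in units:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)

estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(round(true_ate, 2), round(estimate, 2))  # both close to 1.0
```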
      
      But this level of discussion is rare in many fields, and lots of people throw things into regressions and claim causality.
    - Daniel Lakeland on June 16, 2019 10:26 PM said:
      
      https://www.stat.columbia.edu/~gelman/arm/chap9.pdf is a good discussion of the basic background.
      
      The unifying idea is basically missing data / counterfactual estimation. This is easiest when you have a mechanistic model, but it can be done with non-mechanistic regression models. It’s not always the case that the effect size is equal to a coefficient. The general case is just the difference in predictions for individuals, given a model with some theory that is accepted by the user.
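A small sketch of that last point, assuming a hypothetical fitted logistic regression with invented coefficients: the treatment coefficient (-0.7 on the log-odds scale) is the same for everyone, but the effect defined as the difference in predicted risk with and without treatment differs from person to person, and its average is yet another number.

```python
import math

def predicted_risk(treated, age, b0=-6.0, b_treat=-0.7, b_age=0.08):
    """Hypothetical fitted logistic model: P(event) = 1 / (1 + exp(-linear predictor))."""
    eta = b0 + b_treat * treated + b_age * age
    return 1.0 / (1.0 + math.exp(-eta))

ages = [40, 50, 60, 70, 80]  # hypothetical individuals

# Effect for each person = difference in predictions under the two scenarios.
risk_differences = [predicted_risk(1, a) - predicted_risk(0, a) for a in ages]
average_effect = sum(risk_differences) / len(risk_differences)

print([round(d, 3) for d in risk_differences])  # in this example the magnitude grows with age
print(round(average_effect, 3))
```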
AV on June 11, 2019 1:01 PM said:

Has it become harder to argue coincidence versus causality?

Anonymous on June 11, 2019 3:19 PM said:

Quote from above: “So from a political perspective, I see Classical Stats as it’s applied in many areas as a way to try to wield power to crush dissent.”

Just like the next statistical perspective, and/or application, and/or method will do when it somehow “gets in office”?

If so, perhaps it’s then (at least partly) a case of “same sh#t, different day”, and/or “the king is dead, long live the king!”.

All the while, perhaps it’s nobody’s fault that decisions get made based on possibly flawed analyses and/or methods…

Here are “Brothers Osborne” with “It ain’t my fault” with a possibly illustrative video concerning the (gist of) above:

https://www.youtube.com/watch?v=E5RDEXpc8OY

- Anonymous on June 11, 2019 4:50 PM said:
  
  Quote from above: “All the while, perhaps it’s nobody’s fault that decisions get made based on possibly flawed analyses and/or methods…”
  
  I think a substantial part of me not wanting to become a researcher is that i think i “don’t get” statistics, and i don’t want to depend on others (too much) to possibly perform the statistical analyses for me.
  
  Statistics is mostly just incomprehensible to me: i think it just doesn’t suit my brain and/or way of thinking.
  
  I can clearly remember me asking a “statistical expert” to help with, and check, my analyses for my 1st paper, because i didn’t feel comfortable deciding whether i did the “correct” things.
  
  I can also clearly remember me saying to my professor that i wasn’t comfortable performing some principal component analyses (or something like that) for a certain project, because i just didn’t feel i knew what i was truly doing.
  
  At certain points in time i have felt that i made the “right” decision concerning science-related stuff, and i think the above things might be examples of that.
  
  It’s also interesting to think about in some way i think. For instance:
  
  # Should researchers engage in things that they really don’t fully understand, like some statistical analysis?
  # Is a researcher even capable of deciding whether they truly understand things?
  # If a researcher thinks he/she understands (or doesn’t) things, is that really the case?
  # Is depending on the so-called “expertise” of others truly a possible “solution” in instances where a researcher doesn’t fully understand, and/or is capable of doing, certain things?
  # Is it scientifically “responsible” for a researcher to “trust” a so-called “expert” when that researcher is not able to truly determine whether the “expert” is even an expert, and/or whether the “expert” says and does things that are “correct”?
  
  - Anoneuoid on June 11, 2019 5:12 PM said:
    
    # Should researchers engage in things that they really don’t fully understand, like some statistical analysis?
    # Is a researcher even capable of deciding whether they truly understand things?
    # If a researcher thinks he/she understands (or doesn’t) things, is that really the case?
    # Is depending on the so-called “expertise” of others truly a possible “solution” in instances where a researcher doesn’t fully understand, and/or is capable of doing, certain things?
    # Is it scientifically “responsible” for a researcher to “trust” a so-called “expert” when that researcher is not able to truly determine whether the “expert” is even an expert, and/or whether the “expert” says and does things that are “correct”?
    
    All you need to care about is:
    
    1) Are the results reproducible by independent groups (i.e., are they similar enough for all practical purposes)?
    2) Do the predictions of the model fit data collected after the model was developed?
    
    The rest is irrelevant. Use haruspex to come up with your theories for all I care.
    
    - Martha (Smith) on June 11, 2019 5:36 PM said:
      
      To save others some typing: Haruspex: a diviner in ancient Rome basing his predictions on inspection of the entrails of sacrificial animals.
    - Anonymous on June 11, 2019 5:41 PM said:
      
      “Use haruspex to come up with your theories for all I care”
      
      I did not know the word “haruspex”, so i looked it up. A definition i found reads as follows: “(in ancient Rome) a religious official who interpreted omens by inspecting the entrails of sacrificial animals”
      
      After reading your comment, i was reminded of a recent comment i made on this blog about “imposter syndrome”. I think i commented something like how i never understood this term, and wondered whether having an “imposter syndrome” might be largely related to the characteristics of the actual job, or activity, or whatever that one is feeling an “imposter” about. It seemed to me that this “imposter syndrome” might be largely felt by people with jobs, activities, etc. that are hard to “objectively” evaluate. I subsequently wondered if there were many carpenters, tennis players, or blacksmiths that ever had an “imposter syndrome”.
      
      I think the term “expert” might be related to that reasoning, which may be why i also don’t really understand, and/or agree with, the term “expert”. I think i like it when there is some “objective” way to determine whether someone is “good” at something. In those cases, i reason there is no need for the word “expert”. In (most?) other cases, the usage of the word “expert” seems vague, and possibly misleading, and possibly incorrect to me.
      
      Perhaps (at least for me) listening to a “statistical expert” is kind of like listening to a “haruspex”. And performing statistical analyses is kind of like inspecting the entrails of sacrificed animals.
      
      I think i’d rather stick to things i (at least think i) understand better…
    - Anoneuoid on June 11, 2019 6:06 PM said:
      
      If you are doing things right there is no reason your work should be difficult to objectively evaluate though.
      
      1) Can you describe what you did well enough for others to replicate it?
      2) Can you make accurate predictions about future data?
      
      What is hard to objectively evaluate about that? The only difficulty is for people who *do not* want to do those things but still be treated as if they are doing science.
    - Anonymous on June 11, 2019 6:49 PM said:
      
      “What is hard to objectively evaluate about that?”
      
      Perhaps you are describing (a vision of) the scientific process, or science, in a general manner. I was thinking about a specific part of science that is the topic of the blogpost: statistics (and statistical “experts”).
      
      I, for instance, cannot “objectively” evaluate possible statistical expertise when scientists propose, or discuss, a certain statistical analysis, or method. I also cannot understand most statistical analyses, or even the discussions about them, so i also can’t objectively evaluate these things.
      
      I note that this may not be the case with other “experts”, like a car mechanic who fixes my broken car, or a top tennis player. I can at least in some way “objectively” see that 1) my car is now working again, and that 2) Rafael Nadal won “Roland Garros” for the 12th time. Even though i may not understand how the motor of the car works, or how to exactly hit a topspin ball, i can see the results of people who can, and are good at it. I don’t think this is the case concerning statistical “experts”, and statistics.
      
      (As a side note, i have wondered whether “making accurate predictions about future data” could be a criterion for evaluating scientists. This is because i reason being able to make accurate predictions about future data might be directly related to (the strength of) theory evaluation, and formulation, which i think could be crucial in science.
      
      I also reason (strong) theory evaluation, and formulation, might in turn directly relate to the scientist’s capabilities concerning (logical) reasoning, and perhaps other types of thinking (e.g. concerning the design of an experiment), which might be crucial in science as well.
      
      I note that i also think that evaluating scientists this way can easily lead to folks wanting to be “correct” in predicting things which could lead to all kinds of possible negative consequences. I also note that making “inaccurate” predictions about future data may be an important part of science as well, and may not be “bad”. And finally, i note that i can’t think of examples where scientists are, or have been, “objectively evaluated” concerning whether or not they could make accurate predictions about future data…)
    - anoneuoid on June 11, 2019 7:02 PM said:
      
      “I note that this may not be the case with other ‘experts’, like a car mechanic who fixes my broken car, or a top tennis player. I can at least in some way ‘objectively’ see that 1) my car is now working again, and that 2) Rafael Nadal won ‘Roland Garros’ for the 12th time. Even though i may not understand how the motor of the car works, or how to exactly hit a topspin ball, i can see the results of people who can, and are good at it. I don’t think this is the case concerning statistical ‘experts’, and statistics.”
      
      Yes, performing engineering feats is a great way to convince people you know what you are talking about. This goes back to the time of Archimedes at least: https://www.hellenicaworld.com/Greece/Technology/en/Syracusia.html
      
      “I note that i also think that evaluating scientists this way can easily lead to folks wanting to be ‘correct’ in predicting things which could lead to all kinds of possible negative consequences.”
      
      Like what?
      
      “And finally, i note that i can’t think of examples where scientists are, or have been, ‘objectively evaluated’ concerning whether or not they could make accurate predictions about future data…)”
      
      There are many examples of this. The most famous is probably Einstein and the apparent position of stars during an eclipse. In some areas (e.g., properly used machine learning) it is literally an everyday phenomenon.
    - Anonymous on June 11, 2019 7:10 PM said:
      
      “Like what?”
      
      Ehm, perhaps only sticking to well understood theories and phenomena to “accurately predict future data” (e.g. predicting an apple will fall due to gravity, then predicting a pear will fall due to gravity, then predicting a banana will fall due to gravity, etc.)
      
      Or, using all kinds of questionable research practices to find, and/or publish, only “statistically significant” findings that you “predicted” from the start (like has possibly (probably?) happened a lot in the past decades).
    - Anoneuoid on June 11, 2019 7:24 PM said:
      
      “Like what?”
      
      “Ehm, perhaps only sticking to well understood theories and phenomena to ‘accurately predict future data’ (e.g. predicting an apple will fall due to gravity, then predicting a pear will fall due to gravity, then predicting a banana will fall due to gravity, etc.)
      
      “Or, using all kinds of questionable research practices to find, and/or publish, only ‘statistically significant’ findings that you ‘predicted’ from the start (like has possibly (probably?) happened a lot in the past decades).”
      
      Yes, sorry. Obviously these predictions need to be meaningfully different from those derived from other explanations…
    - Anonymous on June 11, 2019 7:41 PM said:
      
      “There are many examples of this. The most famous is probably Einstein and the apparent position of stars during an eclipse. In some areas (eg properly used machine learning) it is literally an every day phenomenon.”
      
      I was thinking about “evaluation” in the (short-term?) context of getting hired, getting a promotion, receiving tenure, etc.
      
      My comment above replying to your question “like what?” also reflects that.
      
      I have never heard of researchers being evaluated when they apply for tenure, or a promotion, concerning whether or not they were able to accurately predict future data in their papers. Except if one interprets counting the number of published papers with probably mostly only “statistically significant” findings (due to the file-drawer effect) as an evaluation of the researcher’s ability to accurately predict future data.
    - Anoneuoid on June 11, 2019 8:19 PM said:
      
      “I was thinking about ‘evaluation’ in the (short-term?) context of getting hired, getting a promotion, receiving tenure, etc.
      
      “My comment above replying to your question ‘like what?’ also reflects that.
      
      “I have never heard of researchers being evaluated when they apply for tenure, or a promotion, concerning whether or not they were able to accurately predict future data in their papers. Except if one interprets counting the number of published papers with probably mostly only ‘statistically significant’ findings (due to the file-drawer effect) as an evaluation of the researcher’s ability to accurately predict future data.”
      
      Afaict, the primary purpose of modern academia seems to be acting as a jobs program where the offspring of upper middle class people will accept much lesser pay during their most productive years as a way to offset inflation. So, I wouldn’t take any current practices as representative of science.
    - Sameera Daniels on June 11, 2019 8:36 PM said:
      
      Re: Afaict, the primary purpose of modern academia seems to be acting as a jobs program where the offspring of upper-middle-class people will accept much lesser pay during their most productive years as a way to offset inflation. So, I wouldn’t take any current practices as representative of science.
      —-
      
      Not sure that has been the primary purpose of modern academia. There have, though, been questions for a century or more as to what is representative of science. Certainly such questions surfaced in the 60s.
      
      However, among the researchers I have been with, most strike me as coming from affluent suburban backgrounds. That has been my experience at the World Bank, in the development fields. In fact, David Kennedy wrote a rather compelling account of his experiences in The Dark Sides of Virtue.
    - Anoneuoid on June 12, 2019 8:19 AM at 8:19 am said:
      
      Not sure that has been the primary purpose of modern academia.
      
      From the perspective of the people who control the purse strings (primarily the US Congress) I believe it is.
    - Keith O'Rourke on June 12, 2019 7:57 AM at 7:57 am said:
      
      You are raising very thoughtful concerns.
      
      Anoneuoid is pointing to the final evaluation of some work: can others replicate it, and does it do something new?
      
      But you have to find a good route to get to such work, which I think is what you are asking about. Many of the statistical “experts” you might choose to work with are unlikely to help you get to such work. Unfortunately, being very knowledgeable about your field of study may not help you make a good choice about which statistician to listen to or work with.
      
      Some considerations in this regard were raised in this past post https://statmodeling.stat.columbia.edu/2018/01/23/better-enable-others-avoid-misled-trying-learn-observations-promise-not-transparent-open-sincere-honest/
      
      Now, I have heard that in Europe those with adequate funding hire three different statisticians to analyse their data. Getting multiple opinions is a good idea, but apparently what these folks do is take the analysis with the most publishable results and hide the other two from public view. That will likely fail Anoneuoid’s final test.
    - Sameera Daniels on June 12, 2019 3:16 PM at 3:16 pm said:
      
      Very interesting article that I’ll read when I return this evening. Thanks Keith.
    - Nope on June 11, 2019 10:18 PM at 10:18 pm said:
      
      What if it’s not about fitting the data, man? I have a model that predicts people in a hospital will be sicker than those not in a hospital. It fits the data great, in sample and out of sample. Have I learned anything about the causal effect of health care? Nope. Fit ain’t everything, a fact that seems to be lost on Bayesian types.
    - Andrew on June 11, 2019 10:21 PM at 10:21 pm said:
      
      Nope:
      
      OK, now this has devolved into trash talking. Prediction can be valuable for its own sake, also it can be useful to see where prediction fails, as that can motivate model improvements. We call that posterior predictive checking. Prediction only tells you about causality, or science, or whatever, if that’s in the model being fit. That’s the case whatever method of inference is being used, Bayesian or otherwise.
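Andrew mentions posterior predictive checking. Below is a minimal sketch in plain Python (stdlib only). The Poisson likelihood, the Gamma prior, and the count data are illustrative assumptions, not anything from the thread. The check draws a rate from the posterior, simulates a replicated dataset of the same size, and records how often a chosen test statistic of the replicated data is at least as large as the observed one. A share near 0 or 1 flags a specific way the model fails to predict the data.

```python
import math
import random
import statistics

def draw_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def posterior_predictive_pvalue(y, test_stat, a=1.0, b=1.0, n_sims=2000, seed=1):
    """Poisson model with a Gamma(shape=a, rate=b) prior on the rate.

    Returns the share of replicated datasets whose statistic is >= the observed one.
    """
    rng = random.Random(seed)
    post_shape, post_rate = a + sum(y), b + len(y)  # conjugate Gamma posterior
    t_obs = test_stat(y)
    exceed = 0
    for _ in range(n_sims):
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)  # gammavariate takes a scale
        y_rep = [draw_poisson(lam, rng) for _ in y]
        if test_stat(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_sims

# Made-up counts: mostly zeros plus a few large values (overdispersed).
counts = [0, 0, 0, 0, 0, 1, 0, 0, 12, 0, 0, 9, 0, 0, 0, 15, 0, 0, 1, 0]
p_variance = posterior_predictive_pvalue(counts, statistics.pvariance)
p_zeros = posterior_predictive_pvalue(counts, lambda ys: sum(v == 0 for v in ys))
print(p_variance, p_zeros)
```

Both shares come out near zero. A single Poisson rate cannot produce both this many zeros and this much spread, which would point toward something like a zero-inflated or negative-binomial model.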
    - DC on June 12, 2019 9:46 AM at 9:46 am said:
      
      But Andrew, @Nope would rather just assume the prediction holds (i.e., skip the ‘boring’ stuff like actually making sure you can model hospitals, illness, local population dynamics, etc.) and then jump right to explaining the sexy ‘causal effect of health care’. After all, all the ‘real science’ published in PPNAS and elsewhere is done this way… But really, how can one study the causal effect of something without having taken steps to model the full process adequately?
    - Daniel Lakeland on June 12, 2019 10:08 AM at 10:08 am said:
      
      +1, though I don’t think you have to model the “full process”. Even making an effort to include a couple of the most important factors would be an improvement over running some synthetic control procedure in parallel on 12 cores and calling it a day when the sign and magnitude of the results come out the way you want, thereby “proving” that hiring more janitors at hospitals reduces hospital-acquired infections by 38% or whatever.
    - Nope on June 12, 2019 11:04 AM at 11:04 am said:
      
      DC, Andrew, Daniel: does the phrase “exogenous variation” mean anything to you? I’m failing to see how modelling the mechanism gets you around the fundamental endogeneity problems that plague most causal inference questions. Also, I’m still waiting on those successful structural modelling applications in social science…
    - Andrew on June 12, 2019 2:14 PM at 2:14 pm said:
      
      Nope:
      
      I don’t see what your objection is to what I wrote above. You write: “I’m failing to see how modelling the mechanism gets you around the fundamental endogeneity problems that plague most causal inference questions.” But I wrote above: “Prediction only tells you about causality, or science, or whatever, if that’s in the model being fit.”
      
      Regarding your “still waiting on those successful structural modelling applications in social science,” I wrote elsewhere in this thread: “I take Hmmm’s point that there are no actual success stories of structural modeling in the social sciences, or at least not a lot.”
      
      So I think you’re not recognizing the subtleties in my position. As sometimes happens, you’re arguing against a straw man without realizing that we’re in agreement on these issues.
    - Daniel Lakeland on June 12, 2019 5:22 PM at 5:22 pm said:
      
      Andrew, others: it’s certainly not my area of expertise to provide a review of successes in mechanistic modeling within the social sciences. But I think if you are going to find any such successes, you first need to define what it means to succeed, and second, you should look for people doing agent-based modeling, because the technique is well suited to exploring mechanism.
      
      For me, a mechanistic model succeeds if it provides insight into the dynamic way that social systems work, and is able to reproduce qualitative and quantitative observed behaviors for a range of parameters and in a variety of conditions in which behavior varies in the real world.
      
      Googling a bit, I found the following very surface leads for those interested in chasing down more:
      
      RAND has done a bunch of work on social behavioral learning that they claim successfully describes vaccination adoption, breast cancer screening, and a few other areas of application:
      
      https://www.rand.org/content/dam/rand/pubs/research_reports/RR1700/RR1768/RAND_RR1768.pdf
      
      Here’s someone investigating flood risk and insurance using agent-based approaches that make quantitative and qualitative predictions about how changing parameters will affect future flood insurance payouts:
      
      https://jasss.soc.surrey.ac.uk/20/1/6.html
      
      Here’s a model in development for how social decision making affects traffic and congestion in urban areas
      
      https://www.sciencedirect.com/science/article/pii/S2352146515002677
      
      Here’s a lit review on ABMs in land use studies
      
      https://www.sciencedirect.com/science/article/pii/S2095263512000167
      
      Here is some research on financial markets and boom-and-bust cycles, including income inequality and financial system control such as Federal Reserve-type monetary policy:
      
      https://ideas.repec.org/p/fce/doctra/1527.html
      
      I’m just googling and reading abstracts… but it hardly seems to me that this area is either nonexistent, or a complete failure.
    - Nope on June 12, 2019 6:46 PM at 6:46 pm said:
      
      ^Andrew. I suppose my general objection is that you seem to paint a negative view of “reduced form” work. It is my opinion that at the present moment in social science, these are the only viable methods at researchers’ disposal to do quality causal inference work. Of course, these methods can be abused and misused, as the psychological literature has shown. The economics literature would have a lot of issues exposed as well if it were subjected to a mass replication exercise. But that being said, there is really high quality reduced form work that has been done.
      
      I suppose I just can’t relate to some of the commenters who say a DiD estimate is meaningless. If there is a “soft model” to go along with it, reasonable robustness checks, and very believable variation being used in the data, I don’t see where the issue is. It’s pretty easy for Lakeland et al. to just say this work is crap and that you need to “model the mechanism”, but I don’t see where this gets you. And further, because he can’t point to any examples where this has actually been effective outside of physics, it’s hard for me to have faith in what he is saying.
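For readers outside economics: the simplest 2×2 difference-in-differences (DiD) estimate is the treated group's before/after change minus the control group's before/after change. In this minimal sketch the numbers are made up. It shows how the estimate removes both a fixed level gap and a shared time trend, and how it silently absorbs any trend the treated group would have had anyway, a failure of the parallel-trends assumption that the "very believable variation" Nope mentions has to rule out.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """2x2 DiD: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes: common time trend +2, treated group starts 5 higher, true effect +3.
control_pre, control_post = [10, 11, 9], [12, 13, 11]
treated_pre, treated_post = [15, 16, 14], [20, 21, 19]
estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)  # 3

# If the treated group also had its own extra trend of +2 (parallel trends fails),
# the same arithmetic attributes that trend to the treatment.
biased = diff_in_diff(treated_pre, [v + 2 for v in treated_post], control_pre, control_post)  # 5
```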
    - Daniel Lakeland on June 12, 2019 9:28 PM at 9:28 pm said:
      
      Nope: “to do quality causal inference work”
      
      What does this mean to you? I mean, give an example of quality reduced-form work. I’d like to see what that means to you. I suspect that it will involve a point estimate of some quantity at some point in time… Like the effect of policy A on the rate of consumption of thing X at time T in country C was 1.2 units…
      
      To me this is like celebrating that you successfully measured the rate of change of the water level in reservoir R at time T as 1.2 feet per week… Yay
    - Nope on June 13, 2019 11:49 AM at 11:49 am said:
      
      Haha wow Daniel, you are really showing how clueless you are with respect to social science research. Reduced form work in economics still generally has a model, especially in current work. Your concern is just internal vs. external validity. Yes, this is well known and is obviously the main tradeoff between reduced form and structural work. My issue with structural work is that while in theory it gives you perfect external validity (i.e., you have the correct model of the world!), I think the internal validity of the estimates is very low.
      
      Again, this approach would work well in physics, where there are actually consistent laws that govern the physical world… but no such universal laws really exist in social science. Of course, there is supply and demand which is somewhat of a law, but there aren’t laws that give precise (and correct) predictions across every setting with humans interacting. In any case, I suggest you actually read some good economics empirical work. It’s true that if you read this blog you will have a very negative view of social science research, because Andrew tends to attack the low-hanging fruit of social psychology research, and perhaps rightly so as it is terrible. But economics is a different animal altogether, with its own set of issues.
      
      Here is a great recent labour paper that encapsulates a lot of what current labour research is about:
      
      (mix of structure and reduced form):
      Isaac Sorkin: https://f247968a-a-62cb3a1a-s-sites.googlegroups.com/site/isaacsorkin/papers/sorkin_revealedpreference.pdf?
      
      Look at anything Josh Angrist or Alberto Abadie have done. It’s amazing to me you lump in the careful empirical work they do with p-hacked social psychology research. It’s unbelievable. Please educate yourself a bit before talking about things on which you don’t have a clue!
    - Daniel Lakeland on June 13, 2019 1:39 PM at 1:39 pm said:
      
      Nope: there’s no need to be rude here. I’ve never claimed to be an expert in Economics, I’m an interested bystander who is generally unimpressed with the specific quantitative works I’ve read. I have a number of particular interests that I occasionally read things about, I don’t generally read economics literature broadly. My interests include:
      
      1) Infrastructure development and investment
      2) Healthcare effectiveness
      3) US welfare systems and poverty traps
      4) US monetary policy, business cycles, tax policy, and welfare policy, and their effects on wealth inequality, on income among the bulk of citizens, and on overall economic productivity.
      5) Measurement of productivity, typical levels of family welfare, and poverty.
      6) Copyright, patent, trademark, monopoly, oligopoly, collusion, technology, rent seeking, and competition.
      7) Education effectiveness and productivity
      8) Environmental issues
      
      Here’s what I typically find, and certainly there are reasonable alternatives, some people do other kinds of work. But this is my typical complaint:
      
      a) Economists consistently repeat the assumption you seem to make: “no such universal laws really exist in social science.” Because of this they completely ignore very simple science, like conservation of energy or mass, or biological reality. This renders some of their models dead from the start. We had an example here on the Mekong delta: https://statmodeling.stat.columbia.edu/2018/08/23/problems-published-article-foot-security-lower-mekong-basin/
      
      In fact, basic physics and biology and psychology and cultural issues and legal requirements etc. all apply to economics, and there are often very good reasons to include hard-science facts within economic models, for example in environmental economics, healthcare effectiveness, welfare and poverty, or third-world development. People need a certain amount of calories to perform certain kinds of labor and a certain amount of protein, fat, etc. to maintain health; children need different things while growing; transportation to and from work affects people’s ability to work; etc. etc. Agent-based models can incorporate these facts; equilibrium-based linear regression equations generally can’t. Agent-based models have external validity insofar as the effects they model are real constraints on the system; DiD regressions don’t. All the problems that plague, say, randomized controlled trials of drugs also plague most of the pseudo-random external-shock DiD regression-type work, plus a bunch more.
      
      b) Microecon papers often focus on availability bias: find some source of info where there’s some kind of exogenously imposed change and estimate the effect of it. This is independent of questions of whether we should care about this particular effect at this particular time, whether the question has any hope of “external validity” or otherwise, whether the entire process was a snapshot of a particular dynamical system at a particular time that will never again exist in the universe… etc. External validity is *literally* the entire reason for doing science. If we want to know the temperature outside we can put out a thermometer… It’s only if we want to be able to predict the weather that we need to develop weather models… And we *do* want to predict, especially in Economics related to policy such as environmental, traffic/infrastructure, education, monetary policy, student loans etc.
      
      c) Where there is theory, it often has very basic problems. For example, modeling something that’s obviously a dynamic process as if it were an equilibrium process described by algebraic equations relating the variables, rather than, say, differential equations describing dynamics that change over decades. The typical case is often even worse: everything is taken to be linear except a few things that maybe get non-theory-based polynomials or splines.
      
      Equilibrium assumptions are only valid when the equilibration time is very short compared to the observational timescale. For example, the dynamic effect of Ronald Reagan-era tax code changes on wealth concentration is plausibly still playing out today, and we’re only now reaching the equilibrium level 40 years or so later. Anyone attempting to measure tax code effects on the wealth distribution at any point in the past would essentially be generating random numbers dominated by short-term fluctuations in asset prices, etc. Why? Because the code affected day-to-day issues, but also the types of investments companies made, the financial strategies they have used, and inheritance issues affecting families an entire generation later (the NYT exposé of Donald Trump’s taxes, for example). In my experience it is rare to see anyone address the general principle that many, if not most, things are not in equilibrium at any given time.
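Daniel's condition (the equilibration time must be short relative to the observation window) can be illustrated with the simplest dynamic model, first-order relaxation `dx/dt = (x_eq - x) / tau`, whose solution is `x(t) = x_eq + (x0 - x_eq) * exp(-t / tau)`. This sketch assumes that linear form and a made-up adjustment time of 15 years; real economic adjustment need not be first-order. It shows what a snapshot would report if it were read as the new equilibrium.

```python
import math

def fraction_equilibrated(t, tau):
    """Share of the eventual shift reached after time t, for dx/dt = (x_eq - x) / tau."""
    return 1.0 - math.exp(-t / tau)

def simulate_relaxation(x0, x_eq, tau, t_end, dt=1e-3):
    """Forward-Euler integration of the same ODE, as a check on the closed form."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * (x_eq - x) / tau
    return x

# Hypothetical: a policy shifts the long-run level by 10 units, with a 15-year adjustment time.
true_shift, tau = 10.0, 15.0
for years in (5, 15, 40):
    apparent = true_shift * fraction_equilibrated(years, tau)
    print(f"after {years:2d} years, an 'equilibrium' snapshot shows {apparent:.1f} of {true_shift}")
```

With these made-up numbers, a snapshot after 5 years shows under a third of the eventual shift, and even after 40 years it is still about 7% short.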
      
      So, if I complain about economics, it’s because I care about it, and when I read something about an issue I care about, I rarely find the results satisfying at a meta-level of “this provided good guidance for understanding the world at a generalizable level”.
    - Andrew on June 13, 2019 2:38 PM at 2:38 pm said:
      
      Daniel, Nope:
      
      I think I stand between you two on this issue.
      
      On one hand, I do think that reduced-form analyses in social science (econ and otherwise) can be valuable. Just as pure descriptive work (e.g., Red State Blue State) can help our understanding of the world, so can what one might call descriptive causal inference: reduced-form estimates from clean experiments and observational studies supported only by vague theory.
      
      At the same time, I’ve seen a lot of innumeracy among social scientists (including economists) and a general attitude that
      
      vague theory + identification + statistical significance = certainty.
      
      An example that will be familiar to readers of this blog is the regression discontinuity study of air pollution in China. Sure, this paper was published only in PNAS, not a real social science journal, but (a) PNAS papers often seem to get more attention and respect than papers in real journals, (b) that particular paper received uncritical major media coverage, and (c) the authors of that paper included several economists who never got around to retracting it.
      
      So, sure, reduced form analysis can be valuable, but numeracy is a requirement.
    - Daniel Lakeland on June 13, 2019 10:16 PM at 10:16 pm said:
      
      Andrew, I fully agree with your point about numeracy, realistic uncertainty, and the like. And I also think reduced form estimates of things have their important place… When the question is very inherently important, merely measuring an effect is useful.
      
      What I disagree with strongly is the general idea that reduced-form estimates are all that’s possible, or that it’s a good use of resources to gold-rush into measuring all the things to publish papers and get tenure, or that mechanistic models are hopeless, or that structural econ models as practiced today are indicative of the hopelessness of mechanism…
      
      The econ models I’ve seen today feel divorced from important scientific insights from, say, biology, physics, engineering, medicine, dynamical systems, systems engineering, ecology, and other areas that could make the theories much more realistic. Working in a vacuum and failing to explain things doesn’t mean mechanism is impossible. And it’s notoriously difficult to anticipate the emergent high-level behaviors exhibited by agent-based models, so it’s not surprising that economists find it difficult to incorporate mechanism.
      
      My opinion is that the gold rush to estimate all the causal things from as many public and private datasets as possible leads to irrelevance at a time when economics is extremely relevant. While economists fiddle with the causal effects of smoking bans on alcohol sales or whatever, people are burning the economy to the ground with trade wars and renewed calls for rent control and forgiving all the student loans, health care is still a disaster, and higher education has grown in price faster than any other good, etc. etc. To paraphrase Keynes, economists have set the bar too low: we don’t want to know merely that after the storm the ocean is flat, or that a decade ago, when we passed some minor law about primary education, it caused some mild reduction in tooth decay among the poorest 10% of kids or whatever.
      
      Reduced-form estimates are interesting but far too low a bar. Econ’s physics envy, built on the claim that 99.9% of physical behavior can be explained, is bullshit: predicting the weather even 36 hours ahead is very tough, understanding earthquake onset is very tough, describing climate sensitivity is very tough, but we don’t throw up our hands and just go around describing the causal effect of particular chalks or cue materials on the outcomes of billiards matches…
133 on June 11, 2019 5:01 PM at 5:01 pm said:

My personal experience suggests that often the problem is not the statistics but the rush to publish.
In my opinion, Bayesian analysis pushes you to define clear assumptions and hypotheses about the modeled systems. If you think the model fits the data well, then it’s your duty to test the consequences of these assumptions, and this can only be done through independent and orthogonal experiments. In the life sciences, unfortunately, it’s very common to see people stop at the significance step and make general claims based on their assumptions. They call this “making a story”. I call this telling fairy tales. People must accept that statistics only suggests possible models, which must then be tested; otherwise we are not talking about science but about divination.

- Martha (Smith) on June 11, 2019 5:39 PM at 5:39 pm said:
  
  “In the life sciences ….They call this “making a story”. I call this telling fairy tales.”
  
  Seems to be even more common in the social sciences.
  
- Raj on June 11, 2019 9:12 PM at 9:12 pm said:
  
  We need more debunkers of link/association claims. A recent story: https://www.sciencemag.org/news/2019/06/talk-hand-scientists-try-debunk-idea-finger-length-can-reveal-personality-and-health
  
  - Andrew on June 11, 2019 9:41 PM at 9:41 pm said:
    
    Raj:
    
    Yes, we discussed a particularly ridiculous example of this finger-length stuff here. Checking the link . . . this was over two years ago! Funny, I remember it as being more recent.
    
    - Martha (Smith) on June 11, 2019 10:14 PM at 10:14 pm said:
      
      Great quote from the link:
      
      “Bad form to put something in the title of the paper that’s not actually being measured.”
      
      Duh.
hmmm on June 11, 2019 9:02 PM at 9:02 pm said:

Great to see: Lakeland going off again, not knowing what he’s talking about. If you are trying to do causal inference, modelling the mechanism directly doesn’t magically give you results that can be interpreted causally. The problem with structural modelling (which, by the way, there is a lot of in economics) is that if your model assumptions aren’t correct, then your results can’t be interpreted. Model “fit” is often not that important if the goal is causal inference: you are just picking up meaningless correlations, who cares if your model fits the data really well? The classical way of doing things, like DiDs, has weaker assumptions required, and much more interpretability.

Please point to some actual success stories of structural modelling in the social sciences. News flash: there aren’t any. And I want to see something where a reasonable causal estimate was obtained, not just some predictive model that fits the data well. I really just can’t understand this view that causal estimates, for policy evaluation say, can be obtained in a reliable fashion by directly modelling the mechanism. Certainly this could be used in conjunction with some reliable reduced form estimates (indeed, this is how most structural economics papers are set up), but by themselves these estimates are largely useless. It is simply not possible to understand what assumptions are really being imposed on the data, and what data variation is being used to come up with your causal estimates. Whereas with simpler reduced form models, this is very possible.

- Daniel Lakeland on June 11, 2019 11:17 PM at 11:17 pm said:
  
  > If you are trying to do causal inference, modelling the mechanism directly doesn’t magically give you results that can be interpreted causally
  
  Well, nothing happens by magic, so there’s that. But if you have a causal model for a process and you fit it, then you can interpret it causally and make causal predictions. The nice thing is that if you find these causal predictions don’t hold, then you know your model is wrong. You seem to think that’s a bug, whereas I think it’s a feature. The lack of success in social science is damning of social science theoreticians, not an argument that we should abandon the concept of theory.
  
  The problem I typically see with these policy evaluations is like what happened with the study of coal-heating air pollution near China’s Huai River,
  
  https://statmodeling.stat.columbia.edu/2013/08/05/evidence-on-the-impact-of-sustained-use-of-polynomial-regression-on-causal-inference-a-claim-that-coal-heating-is-reducing-lifespan-by-5-years-for-half-a-billion-people/
  
  or with the gun control study a while back:
  
  https://statmodeling.stat.columbia.edu/2016/03/11/why-this-gun-control-study-might-be-too-good-to-be-true/
  
  and https://statmodeling.stat.columbia.edu/2016/03/17/kalesan-fagan-and-galea-respond-to-criticism-of-their-paper-on-gun-laws-and/
  
  No amount of invoking mathematical theories of reduced-form estimation is ever going to make those results correct. The second one dodges the question, and the first one treats a policy-induced discontinuity as magic.
  
  - Andrew on June 11, 2019 11:29 PM at 11:29 pm said:
    
    Daniel:
    
    That’s a good response, but I take Hmmm’s point that there are no actual success stories of structural modeling in the social sciences, or at least not a lot.
    
    I guess what I’d say is that successful causal inferences in the social sciences are typically associated with what one might call “soft mechanistic modeling.” For example, I have a couple of papers estimating the effect of incumbency in congressional elections. We don’t have a step-by-step structural model, but we have a sort of soft model that says that incumbents should do better than non-incumbents for various reasons including name recognition, constituency service, etc. A full structural model would attempt to separate these effects, and we haven’t tried to do that, but it’s not like our model is a black box, either. We have some justification for it. Similarly with economic models, for example estimating the elasticity of some input on some output. Basic economic theory typically implies elasticities between 0 and 1, and the basic theory isn’t always directly applicable, but again I’d consider this a sort of soft mechanistic model.
    
    These soft models apply to discredited science too. Brian Wansink had lots of theories motivating his claims regarding eating behavior, and the power pose researchers had theories on hormones etc. It turned out these theories didn’t work—or at least they didn’t produce the definitive data patterns originally claimed in those research papers. So I’m not claiming that soft theories are some sort of panacea. Again, I take Hmmm’s point that we’re rarely, if ever, “directly modeling the mechanism” in social science; rather, we have soft theories along with some data collection that is somehow connected to the theories.
    
    The one place I know of in social science where there’s something close to modeling of the mechanism is in psychometrics and educational testing, where there is sometimes a model of what bits of knowledge and skill are necessary for a student to get a question correct, and what is needed to instill these capacities in the student. But I agree that most social science doesn’t look like that.
    
    - Daniel Lakeland on June 12, 2019 5:49 AM at 5:49 am said:
      
      The fact that it doesn’t isn’t necessarily an indication that it shouldn’t. Can we point to reduced-form estimates that successfully do much in a convincing way, rather than mainly providing noise and heat?
      
      I’ve been reading up on agent-based models, mainly for biology applications, but there were a few really nice social examples: Thomas Schelling’s racial segregation model and more modern variants, for example. I started working on a model for rent control back when Phil posted that stuff about YIMBY. I didn’t have time to get very far, but I was convinced it would be a productive way to study the Bay Area dynamic, and whether and under what circumstances building marginally more housing would cause the distribution of rents to fall, rise, or whatever. It was clear the tool was at least capable of representing the complexity.
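Since Daniel mentions Schelling's segregation model, here is a minimal agent-based version to show what exploring mechanism looks like in code. It follows a common textbook simplification rather than Schelling's original rules: two groups on a wrap-around grid, each agent looks at its eight neighbours, and an agent whose share of like neighbours is below a tolerance threshold moves to a random empty cell. Grid size, vacancy rate, threshold, and seed are arbitrary assumptions.

```python
import random

def like_share(grid, r, c):
    """Share of occupied neighbours (8-cell, wrap-around) in the same group as cell (r, c)."""
    n = len(grid)
    occupied = [grid[(r + dr) % n][(c + dc) % n]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and grid[(r + dr) % n][(c + dc) % n] is not None]
    if not occupied:
        return 1.0  # an isolated agent is treated as content
    return sum(v == grid[r][c] for v in occupied) / len(occupied)

def mean_similarity(grid):
    n = len(grid)
    shares = [like_share(grid, r, c) for r in range(n) for c in range(n) if grid[r][c] is not None]
    return sum(shares) / len(shares)

def schelling(n=30, empty_frac=0.1, threshold=0.4, steps=50, seed=0):
    rng = random.Random(seed)
    grid = [[None if rng.random() < empty_frac else rng.choice("AB") for _ in range(n)]
            for _ in range(n)]
    start = mean_similarity(grid)
    for _ in range(steps):
        unhappy = [(r, c) for r in range(n) for c in range(n)
                   if grid[r][c] is not None and like_share(grid, r, c) < threshold]
        if not unhappy:
            break
        empties = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is None]
        rng.shuffle(unhappy)
        for r, c in unhappy:
            if like_share(grid, r, c) >= threshold:
                continue  # earlier moves this round already made this agent content
            i = rng.randrange(len(empties))
            er, ec = empties[i]
            grid[er][ec], grid[r][c] = grid[r][c], None
            empties[i] = (r, c)  # the vacated cell is now available to later movers
    return start, mean_similarity(grid)

before, after = schelling()
print(f"mean share of like neighbours: {before:.2f} -> {after:.2f}")
```

With a threshold of 0.4 (agents content to be a local minority), runs like this typically end far more clustered than they start, which is the emergent behavior Schelling highlighted.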
      
      Another area that should get more attention is population structured models. For example looking at the evolution of income inequality using age structured integro-differential equations. Policies affect different ages differently, and different generations are exposed to different policies, transfers between generations affect different groups differently. You can structure on age and SES of parent and grandparent and then model the evolution of wealth.
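A discrete-time version of the age-structured idea can be sketched in a few lines: each year every cohort ages by one year, its wealth earns a return, and it adds an age-specific saving amount. Everything here (60 age classes, a 3% return, a flat saving schedule, and a hypothetical policy that raises saving only for the youngest 20 ages) is an illustration, not a calibrated model. Because only cohorts born after the change have lived entirely under it, aggregate wealth keeps drifting for roughly a full lifespan before it settles.

```python
def step(wealth_by_age, saving_by_age, rate):
    """Advance one year. Each cohort ages by one year, earns `rate` on its wealth,
    and adds the saving for its new age; the oldest cohort exits and a new cohort
    enters at age 0 with zero wealth."""
    aged = [w * (1 + rate) + s for w, s in zip(wealth_by_age[:-1], saving_by_age[1:])]
    return [0.0] + aged

def run(wealth, saving, rate, years):
    for _ in range(years):
        wealth = step(wealth, saving, rate)
    return wealth

AGES, RATE = 60, 0.03  # hypothetical: 60 age classes, 3% annual return
old_saving = [1.0] * AGES
new_saving = [1.5 if age < 20 else 1.0 for age in range(AGES)]  # hypothetical policy for the young

steady_old = run([0.0] * AGES, old_saving, RATE, AGES)  # every cohort replaced: steady state
steady_new = run(steady_old, new_saving, RATE, 10 * AGES)

# Aggregate wealth in each year after the policy change.
wealth, path = steady_old, [sum(steady_old)]
for _ in range(AGES):
    wealth = step(wealth, new_saving, RATE)
    path.append(sum(wealth))
```

A cross-sectional comparison taken a few years after the change would see only part of the eventual effect, because older cohorts still carry wealth accumulated under the old schedule.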
      
      A friend who comments here is an economist who studies third world development, and we’ve discussed using biological growth models of bones to study child growth data. We haven’t had a chance to do it, but it’s clear that the question of why certain children do poorly can be addressed through biological inputs and the choices different cultures make about providing them, including resource constraints, cultural norms, and education/knowledge…
      
      That social sci doesn’t do this sort of thing is more an indictment of social sci than a reason to dismiss it IMHO.
    - hmmmm on June 12, 2019 11:13 AM at 11:13 am said:
      
      Daniel, you cannot just fit a causal model. Generally this model will have made some assumption about exogeneity of an error term, and you almost certainly won’t have variation in your model that satisfies this. For example, to look at the YIMBY question, it’s very clear to me that to get far on that problem you will need some “shock”, either a supply or demand shock, to identify anything structural in your system. To model the endogeneity problem explicitly, to me, is kind of a cop out. You realize you don’t have some variation that mimics an experiment, so instead you will just make assumptions until you get identification. This is simply not the right way to do causal inference, imo.
    - Z on June 12, 2019 2:50 PM at 2:50 pm said:
      
      +1
    - Daniel Lakeland on June 12, 2019 3:48 PM at 3:48 pm said:
      
      > Daniel, you cannot just fit a causal model.
      
      Or what? The identification police will arrest me? If I have a model, I can plug in data, run a fit, and get a posterior distribution.
      
      Indirectly, what your comment shows is that you and I have *completely* different ideas of what the purpose of research is. I assume you would find a statement like “under certain circumstances X we’d predict that doing Y would cause effect Z, and under circumstances A we’d predict that doing B would cause effect C, but we aren’t able to determine whether X or A holds in the real world because observational data don’t allow us to identify the existing conditions” useless. After all, you can’t get a fancy econ professorship with that kind of thing, right?
    - hmmm on June 12, 2019 6:19 PM at 6:19 pm said:
      
      I can tell you are not up to date on current economics research. If you think that a good reduced form economics paper resembles the crap that is published in PPNAS, then you have been misled, good sir. The reason policymakers use reduced form work over structural work is because the reduced form work (in economics) is replicable and interpretable. Structural work is neither (from experience, I have been unable to replicate structural papers with the code in front of me).
    - hmmm on June 12, 2019 6:23 PM at 6:23 pm said:
      
      No, I’d be perfectly fine with that statement you made. What I’m not fine with is this weird world you seem to live in where you can just fit models until the cows come home and somehow learn something causal about the world. In my view, at the end of the day you need some exogenous variation or shock to the system under consideration. No matter how fine you make your model of the housing market, you will be fundamentally screwed as far as estimating supply or demand elasticities goes unless you have a demand or supply shock to work with. How do the Bayesian methods get around that? They don’t.
    - Daniel Lakeland on June 12, 2019 9:14 PM at 9:14 pm said:
      
      See, I just don’t think the purpose of research is pure measurement of effects. That can be ONE purpose, but then I’d much rather get measurements of fundamental quantities than effective quantities. To draw an analogy with something I know at a deep level: would you rather know the viscosity of a fluid, or the effect of letting one liter of the fluid run through a pipe of a particular diameter and length? (The velocity is such and such, the vessel drains in such and such time… Who cares?)
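A minimal sketch of the viscosity analogy, assuming laminar flow of a Newtonian fluid in a straight circular pipe (the Hagen–Poiseuille law). The pipe sizes, pressure drop, and viscosity are hypothetical illustrative values. It shows the distinction in the comment: a flow rate measured in one pipe is an “effective” number tied to that geometry, while the viscosity backed out of it is a fundamental quantity that predicts flow in pipes that were never measured.

```python
import math

def poiseuille_flow(radius_m, length_m, pressure_drop_pa, viscosity_pa_s):
    """Volumetric flow rate (m^3/s) for laminar flow in a circular pipe."""
    return math.pi * radius_m**4 * pressure_drop_pa / (8 * viscosity_pa_s * length_m)

def infer_viscosity(radius_m, length_m, pressure_drop_pa, flow_m3_s):
    """Invert the same law: recover viscosity (Pa*s) from one measured flow rate."""
    return math.pi * radius_m**4 * pressure_drop_pa / (8 * flow_m3_s * length_m)

# Hypothetical "experiment": pipe A, 5 mm radius, 1 m long, 1 Pa pressure drop,
# fluid with a viscosity roughly that of water (1e-3 Pa*s).
measured_flow_a = poiseuille_flow(0.005, 1.0, 1.0, 1e-3)

# Back out the fundamental quantity from the single effective measurement.
mu_hat = infer_viscosity(0.005, 1.0, 1.0, measured_flow_a)

# Use it to predict a different pipe B (10 mm radius, 2 m long).
predicted_flow_b = poiseuille_flow(0.010, 2.0, 1.0, mu_hat)

print(f"inferred viscosity: {mu_hat:.3e} Pa*s")
print(f"flow ratio B/A: {predicted_flow_b / measured_flow_a:.2f}")
```

Doubling the radius multiplies the flow by 16 and doubling the length halves it, so pipe B carries 8 times the flow of pipe A; the pipe-A flow rate alone could not tell you that without the model.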
      
      I’m not primarily a person who reads econ papers, but I’d rather read a paper about an ABM that reproduces a variety of investment behaviors in poor countries, and explains how those decisions depend on a variety of factors, than a paper that accurately estimates the causal effect of a marginal year of education on the adoption rate of e-scooters in southern parts of Vietnam in the year 2017 or whatever. The point is that the net effect of a large number of very specific conditions is very rarely of any real interest by itself… What would be the perturbation effect of a surplus of lentil production on this estimate? You’ll never know. Next year the environment may be so different that your estimate is worthless for any purpose at all. This is routine in marketing research. It’s a mistake to think that we should care strongly about specific numbers. As Hamming said, the purpose of computing is insight, not numbers.
    - hmmm on June 12, 2019 9:31 PM at 9:31 pm said:
      
      ^Daniel, that’s fine. Could be interesting. But I thought we were talking about empirical work here, which that is not.
    - Daniel Lakeland on June 12, 2019 10:04 PM at 10:04 pm said:
      
      > ABM that reproduces a variety of … etc etc
      
      How do we know it reproduces these things? We compare predictions to observed data in different situations. The existence of the data and the comparison makes it empirical; if the model can make predictions about the future that we can check once the observations can be made, all the better…
      
      I’d love to see a reduced-form estimation paper that makes some kind of reduced-form prediction, so that later in the future we can in fact see whether the prediction comes true… do you have an example? The problem with the whole exogenous-shocks, DID-estimates, etc. paradigm is that even if it gives the right answer for the past (a questionable but at least conceivable situation), the lack of a reasonable model makes it impossible to know whether the result generalizes to the future in a world with vast numbers of changing conditions.
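For readers unfamiliar with the DID estimates mentioned above, here is a minimal sketch of the textbook 2×2 difference-in-differences calculation on group means (the numbers are hypothetical). It makes the comment’s point concrete: the output is a single number for one population and one period, interpretable causally only under the parallel-trends assumption, and nothing in the calculation says how it would change under different conditions.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means.

    Change in the treated group minus change in the control group.
    Interpretable as a causal effect only if, absent treatment, both groups
    would have followed parallel trends.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical group means of some outcome before and after a policy change.
effect = did_2x2(treated_pre=10.0, treated_post=15.0,
                 control_pre=9.0, control_post=11.0)
print(effect)
```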
      
      What is the health effect of opening a coal power plant in a town in China? You might estimate it for town T in year Y, given a decade of hospital records, but does that mean opening a similar plant in town T2 in year Y+N should have a similar effect? Town T2 is on the coast and has different prevailing winds, a different demographic, some technology differences in the power plant, and a different healthcare system; there are alternate policies on asthma that the government passed in the intervening years; the power plant runs at lower capacity due to lower demand caused by tariffs imposed by Trump… etc. etc. The old number means nothing, or maybe it means something… we just don’t know.
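A toy illustration of the transfer problem, with every number and functional form hypothetical: excess cases are assumed to scale linearly with plant emissions, capacity factor, the share of time the plume passes over the town, and population, through a single dose-response slope. Copying town T’s per-capita effect to T2 gives a very different answer from keeping the slope estimated in T and plugging in T2’s own conditions. The sketch still assumes the slope itself transfers (same demographics, same healthcare system), which is exactly the kind of assumption that would need separate checking.

```python
from dataclasses import dataclass

@dataclass
class Town:
    emissions_t_per_yr: float  # plant emissions at full capacity (hypothetical)
    capacity_factor: float     # fraction of capacity actually used
    downwind_fraction: float   # share of time the plume passes over the town
    population: int

def exposure(town: Town) -> float:
    """Toy exposure index: emitted tonnes reaching the town, per year."""
    return town.emissions_t_per_yr * town.capacity_factor * town.downwind_fraction

def predicted_cases(town: Town, slope: float) -> float:
    """Toy linear dose-response: excess cases = slope * exposure * population."""
    return slope * exposure(town) * town.population

# Town T: inland, plant running hard; 400 excess cases observed (hypothetical).
town_t = Town(1000.0, 0.8, 0.5, 100_000)
observed_cases_t = 400.0
slope_hat = observed_cases_t / (exposure(town_t) * town_t.population)

# Town T2: coastal winds carry the plume offshore, and lower demand cuts output.
town_t2 = Town(1000.0, 0.5, 0.1, 100_000)

naive_t2 = observed_cases_t / town_t.population * town_t2.population
model_t2 = predicted_cases(town_t2, slope_hat)
print(f"naive transfer: {naive_t2:.0f} cases, mechanistic: {model_t2:.0f} cases")
```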
    - Daniel Lakeland on June 12, 2019 10:15 PM at 10:15 pm said:
      
      > just fit models until the cows come home and somehow learn something causal about the world
      
      The thing I think you’re missing is that I’m not claiming that any model you fit automatically gives you causal inference… I’m claiming that if you build a mechanistic model that reproduces important causal relationships that actually hold in the world, then when you fit it to pretty much any data at all you will increase your knowledge of causality.
      
      How do you know that your model really IS reproducing these causal relationships? That’s a hard question that requires a lot of model checking and experimental evaluation, cross-validation with alternative datasets, measurement of auxiliary quantities simultaneously predicted by the model, and so on. You can’t just run some linear regression and call it a day.
    - hmmm on June 12, 2019 12:00 PM at 12:00 pm said:
      
      Also, Daniel, this isn’t even how it’s done in economics. I actually think there is too much theory in applied micro fields now. The typical paper in labour economics will have a big theory section, and the empirical section will often be either purely structural or the soft mechanistic modelling Andrew refers to above. Economics publications in top journals look nothing like PPNAS or whatever that junk is that gets cited here all the time. You have to understand that economics is not physics: the “R-squared” in physics is like 99.9%; in economics it’s 2%. That is, no models do a good job in economics, and they probably never will; the systems are just too complex. This should give you pause before applying your physics methods to social science work.
    - Daniel Lakeland on June 12, 2019 1:30 PM at 1:30 pm said:
      
      Every time I hear that kind of stuff I think of this: https://statmodeling.stat.columbia.edu/2012/01/28/the-last-word-on-the-canadian-lynx-series/
    - Anoneuoid on June 12, 2019 1:33 PM at 1:33 pm said:
      
      I’m not familiar with the economics literature, but for a biomed topic I was told something similar (it’s more complicated than physics, etc.). Then I found that not only was this untrue, but people had already come up with full-blown mathematical models, derived from first principles, back in the 1930s that could explain what I was seeing. But the same people didn’t care; they just wanted to know if there were significant differences anyway.
      
      So I wouldn’t be surprised if there was a similar illness at the center of economics research.
    - DC on June 12, 2019 2:03 PM at 2:03 pm said:
      
      +1. Also, one can have reasonable models of phenomena even though the topic is ‘more complex than physics’. The choice isn’t just between explaining 99.9% vs. 2%, nor is it really necessarily about variance explained for that matter. Yes, there’s instability, uncertainty, and just plain noise in lots of investigations of human behavior and social systems (I work on quant and qual problems in this area, certainly nowhere near a physicist); but, it seems to me that *this is literally the point of trying to build better models*, i.e., so that we can try to better understand such complexity and use our imperfect models to poke holes in what we think we know (e.g., see Andrew’s many comments, blogs, and points in papers about the value of continuous model expansion and predictive checks).
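A small simulation in the spirit of the point that this isn’t necessarily about variance explained, with hypothetical parameters: the data are generated by exactly the linear model that is then fitted, so the model is correct by construction, yet the R-squared comes out at only a few percent because most of the variation is irreducible noise. The slope, the mechanistic quantity of interest, is still recovered well.

```python
import random

def simulate_r2(n=100_000, slope=2.0, noise_sd=3.0, seed=1):
    """Fit y = a + b*x by ordinary least squares to data drawn from that same model.

    Returns (estimated slope, R-squared).
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]                      # x ~ Uniform(0, 1)
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]    # true model + noise

    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx

    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b, 1.0 - ss_res / ss_tot

slope_hat, r_squared = simulate_r2()
print(f"slope estimate: {slope_hat:.2f}, R-squared: {r_squared:.3f}")
```

With these settings the signal variance is 4/12 against a noise variance of 9, so the expected R-squared is roughly 0.33 / 9.33, about 3.6%, even though the fitted model is the true one.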
Jonathan (another one) on June 11, 2019 9:36 PM at 9:36 pm said:

To me, it’s not so much about statistics crushing dissent as it is that statistics can almost never be used to compel belief. As practiced, the vast bulk of published wok is: (a) I have a theory; (b) the data is consistent with that theory (so far no statistics needed); and (c) the data is far more consistent with my theory than with a particular version of not-my-theory that I have selected to show you. It should be unsurprising that (c), while helpful when well performed, does not actually compel *belief.* Frankly, (a) and (b) by themselves do a lot more to compel belief than (c) does, except in the very best work. By the same token, however, doubts about the methodology employed in (c) rarely compel disbelief… they merely negate (c) as a rhetorical device. (a) and (b) survive intact for the most part, other than the metaproof: “If that’s the best statistical test you can muster, your data must be weak.” (That is the argument I usually hear for using 0.05 as a cutoff p-value: given the opportunities inherent in forking paths and p-hacking, if you can’t easily get below 0.05 you have nothing.)
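A back-of-the-envelope sketch of the forking-paths arithmetic behind that last point, assuming (unrealistically) that each extra analysis is an independent test of a true null hypothesis: the chance that at least one of k analyses crosses p < 0.05 is 1 - 0.95^k. Real forking paths are correlated, so the actual inflation is typically smaller than this, but it still grows quickly with k.

```python
def prob_any_significant(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null tests gives p < alpha)."""
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:2d} analyses -> {prob_any_significant(k):.2f}")
```

Under this independence assumption, twenty looks at pure noise give roughly a 64% chance of at least one “significant” result.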

- Jonathan (another one) on June 11, 2019 9:37 PM at 9:37 pm said:
  
  And people publish work, not woks.
  
Anonymous on June 11, 2019 10:11 PM at 10:11 pm said:

On the contrary, with the increasing popularity of Bayesian statistics, researchers now have an extra forking path. If the NHST produces p > 0.05, then a Bayesian method can be applied.

- 133 on June 12, 2019 5:27 AM at 5:27 am said:
  
  This is not true. With Bayesian methods you are forced to explain your assumptions clearly (just showing which distributions you chose says a lot), so other people can understand your hypotheses. Also, with Bayesian methods you end up with an estimate of the uncertainty about rejecting the null. This is much more informative than a p-value. Getting contradictory results from frequentist and Bayesian methods does not correspond to reality unless one screws up the models.
  
  - DC on June 12, 2019 9:36 AM at 9:36 am said:
    
    @133, it is admirable that you gave this some reflection, but folks like ‘Anonymous’ here never seem to stop saying these same things over and over with no justification. Plenty of fair critiques of Bayesian theory and inference exist, but the kind of half-baked critique written here typically seems to come from folks who have spent little to no time actually using Bayesian methods in their own work.
    
  - MichiganWater on June 12, 2019 8:29 PM at 8:29 pm said:
    
    This is absolutely true. Here’s a perfect example from a critique (by me) of a paper published in the field of exercise science, looking at the effect of training volume on muscle hypertrophy (among other things).
    
    The authors performed a frequentist analysis and failed to demonstrate a statistically significant difference between a medium-volume group and a high-volume group. No worries, though! We’ll still be able to make a substantive scientific claim that the high-volume group was better, because Bayes Factors to the rescue! Two of the four metrics they tested had Bayes Factors of around 2.3 in favor of a difference between groups (2.34 and 2.35). [Oh, the other two metrics had BFs in favor of no difference (1.51 and 1.67), but since that’s not exciting, we can just ignore those, right?]
    
    Here’s the long, long version:
    https://www.reddit.com/r/AdvancedFitness/comments/9ina5h/the_schoenfeld_volume_study_results_do_not/
    
    Anonymous is exactly right. Can’t get there with a frequentist analysis using alpha=0.05? No problem. We’ll just go down another path.
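To put the reported Bayes Factors on a more familiar scale, here is a minimal sketch that converts a Bayes Factor into a posterior probability of a difference, assuming 1:1 prior odds (an assumption for illustration, not something the paper states). Posterior odds are the Bayes Factor times the prior odds; the BFs reported in favor of no difference are inverted so that all four values are on the same scale.

```python
def posterior_prob(bf10: float, prior_prob_h1: float = 0.5) -> float:
    """Posterior P(H1 | data) from a Bayes Factor BF10 and a prior P(H1)."""
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# BF10 values favor a difference; BF01 values favor no difference,
# so they are inverted (BF10 = 1 / BF01) before converting.
for label, bf10 in [("BF10 = 2.34", 2.34), ("BF10 = 2.35", 2.35),
                    ("BF01 = 1.51", 1 / 1.51), ("BF01 = 1.67", 1 / 1.67)]:
    print(f"{label}: P(difference) = {posterior_prob(bf10):.2f}")
```

Under even prior odds, BFs near 2.3 move the probability of a difference from 50% to about 70%, and the other two metrics move it down to roughly 37% to 40%: modest evidence either way.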
    
    - Keith O'Rourke on June 13, 2019 7:57 AM at 7:57 am said:
      
      Thanks – this brings us back to the comment thread in which Daniel’s comment (the topic of the post) appeared.
      
      Always good to recycle blog content ;-)
      
      https://statmodeling.stat.columbia.edu/2019/06/09/question-9-of-our-applied-regression-final-exam-and-solution-to-question-8/#comment-1058027
      
      “Agree – it’s the ‘later on, practitioners’ that should be the primary focus of applied statistics courses.
      
      That is, ‘what will people repeatedly do’ later, given what they should have learned in the course.
      
      And if the course won’t or can’t cover Bayesian Workflow adequately, pointing them to Bayes may well do more harm than good.
      
      Bayes has to be practiced safely!”
    - DC on June 13, 2019 8:39 AM at 8:39 am said:
      
      And, how many Bayesian statisticians do you think would find such a claim justified? I can think of zero. If your answer is none or very few, then I think my point still fully holds. As with any frequentist procedure, there will be people who do Bayes poorly and people who do it well; just because you can pull examples of misuse of Bayes factors (which, much like many others on this blog, I do not use or advocate for) doesn’t prove a larger point.
    - Chris Wilson on June 13, 2019 9:04 AM at 9:04 am said:
      
      At the risk of some kind of No True Scotsman fallacy, this kind of thing is not really Bayesian in spirit. This is what happens when you mash up the desire for classical NHST-style hypothesis testing, with Bayesian methods. The Bayesian approach here would be to estimate the various differences between groups, using partial pooling wherever possible, and quantify the relevant uncertainties.
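A minimal sketch of the partial-pooling idea, assuming a normal-normal model with a fixed between-group standard deviation `tau`. In a full Bayesian analysis `tau` would get its own prior and be estimated, and the uncertainty in the common mean would be propagated rather than plugged in. The estimates and standard errors below are hypothetical. Each group’s estimate is pulled toward the precision-weighted common mean, and the output is a posterior mean and standard deviation per group rather than a yes/no test.

```python
import math

def partial_pool(estimates, std_errors, tau):
    """Normal-normal partial pooling with a fixed between-group sd `tau`.

    Returns the estimated common mean and, per group, a (posterior mean,
    posterior sd) pair conditional on that common mean.
    """
    # Weight each group by its total variance (sampling + between-group).
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    mu_hat = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    pooled = []
    for y, se in zip(estimates, std_errors):
        precision = 1.0 / se**2 + 1.0 / tau**2
        mean = (y / se**2 + mu_hat / tau**2) / precision
        pooled.append((mean, math.sqrt(1.0 / precision)))
    return mu_hat, pooled

# Hypothetical estimates and standard errors for four related comparisons.
raw = [2.0, 1.5, -0.5, 0.5]
ses = [1.0, 1.0, 1.0, 1.0]
mu_hat, pooled = partial_pool(raw, ses, tau=0.5)
for y, (m, s) in zip(raw, pooled):
    print(f"raw {y:+.2f} -> pooled {m:+.2f} (sd {s:.2f})")
```

A smaller `tau` means stronger pooling toward the common mean; as `tau` grows large, each group’s estimate approaches its raw value.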
    - 133 on June 13, 2019 12:37 PM at 12:37 pm said:
      
      What you say proves nothing except that bad practice doesn’t depend on the tool.
      
      If you use your smartphone to hammer a nail, and after destroying it you start using your head, that’s not a problem with smartphones or heads.
      What you say has nothing to do with “Bayesian is an additional way to achieve bad purposes”.
      You should have said, “wrongly applied methods are ways to achieve bad purposes”.
    - MichiganWater on June 13, 2019 8:35 PM at 8:35 pm said:
      
      I’m not sure why there was a disconnect with other commenters, but Keith got my point exactly right. Bayes has to be practiced safely! When you add a Bayesian approach as an option, or a somewhat(?)-Bayesian approach in this case, there’s the potential for shenanigans by using it as another forking path, as stated by Anonymous above. I took 133 to be saying that researchers couldn’t do the thing that Anonymous said they could do, so I gave an example where the researchers did exactly what Anonymous said might happen. Nothing in my response was intended to be negative on Bayesian methods (or classical, for that matter), so perhaps there was some context concerning a ‘larger point’ that I was missing that contributed to this confusion regarding my comment.
      
      And, DC, I wouldn’t be worried about Bayesian statisticians finding a claim like the above justified, because they’d know what they’re talking about. But they aren’t the target audience of such scientific papers. Isn’t the concern that _researchers_ will accept claims when there isn’t a sufficient justification, especially when they are seeing new, unfamiliar methods of analyzing data?
      
      Chris, not sure I get a vote, but aye. I think it was from one of Richard McElreath’s lectures that I got the idea that if you’re not getting a posterior distribution to work with, then you’re not really doing Bayes. That seems reasonable to me, but I also see a lot of stuff being produced that uses Bayes Factors (papers, blogs, JASP, etc.) as the key thing to interpret, so maybe the Bayes tent is bigger than I think it is.
Zad Chow on June 12, 2019 2:18 PM at 2:18 pm said:

“– X is angry: first the statistical establishment required statistical significance, now the statistical establishment is saying that statistical significance isn’t good enough.”

I’ve seen this… even today. I was recently having a discussion on Twitter about why the number needed to treat is a problematic statistic, and an exchange with someone (who I believe is a physician) ended with him/her saying the following:

“P values are notoriously misleading. Relative risk is vague and unintuitive. Absolute risk is just the NNT flipped over. Short of everybody meticulously reading the supplement to every study, what do you suggest?”

https://twitter.com/samblack/status/1138794108821094400
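For reference, a minimal sketch of how the quantities in that exchange relate, using hypothetical risks: the relative risk is the ratio of event rates, the absolute risk reduction (ARR) is their difference, and the number needed to treat is 1/ARR, which is the sense in which the NNT is the absolute risk “flipped over”. The same relative risk can correspond to very different NNTs depending on the baseline risk.

```python
def risk_summary(control_risk: float, treated_risk: float):
    """Return (relative risk, absolute risk reduction, number needed to treat).

    A negative ARR means the treatment increases risk; the magnitude of 1/ARR
    is then conventionally reported as a number needed to harm.
    """
    rr = treated_risk / control_risk
    arr = control_risk - treated_risk
    nnt = float("inf") if arr == 0 else 1.0 / arr
    return rr, arr, nnt

# The same halving of risk at two hypothetical baseline risks.
for control, treated in [(0.20, 0.10), (0.02, 0.01)]:
    rr, arr, nnt = risk_summary(control, treated)
    print(f"baseline {control:.2f}: RR = {rr:.2f}, ARR = {arr:.3f}, NNT = {nnt:.0f}")
```

Both scenarios have a relative risk of 0.5, but the NNT is 10 at a 20% baseline risk and 100 at a 2% baseline risk.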

- Anoneuoid on June 12, 2019 2:56 PM at 2:56 pm said:
  
  I’d suggest collecting this Group A vs Group B type data to begin with. There are times it makes sense, but mostly it provides little valuable info.
  
  - Anoneuoid on June 12, 2019 3:02 PM at 3:02 pm said:
    
    Typo, that should be:
    
    not collecting this Group A vs Group B type data to begin with
    
    - Martha (Smith) on June 12, 2019 3:46 PM at 3:46 pm said:
      
      But often comparing results in Group A and those in Group B is exactly the purpose of the study.
    - Anoneuoid on June 12, 2019 4:41 PM at 4:41 pm said:
      
      Yes, I mean making that the purpose of the study is misguided to begin with. Instead of looking for differences they should be looking for “universalities” and then coming up with theoretical explanations for those.
    - Martha (Smith) on June 12, 2019 10:44 PM at 10:44 pm said:
      
      Please give some examples of what you consider “universalities”, and also explain why you think people should be looking for them, and in what types of research questions these are relevant. (For example, it seems to me that to compare drugs A and B, one needs to compare results in people who have taken drug A with people who have taken drug B, for the same medical condition and under the same other possibly relevant conditions.)
    - Anoneuoid on June 13, 2019 1:10 AM at 1:10 am said:
      
      I made a previous post to this blog with more links, which I can’t find at the moment, but e.g.:
      
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2007940/
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916857/
      https://link.springer.com/article/10.1007/BF02289265
    - Daniel Lakeland on June 13, 2019 8:25 AM at 8:25 am said:
      
      Sometimes we just need to measure something. This is a very low-level kind of research, but it’s important. Comparing two drugs, for example: you have to do that sometimes. I would argue that it typically comes late in the process of understanding your drug, when you want to measure effectiveness in real-world conditions.
      
      But let’s take an example where universality is more important. I read an RCT report on acupuncture. They compared acupuncture to antihistamines and steroids for control of allergies. In my opinion, if you understand many of the mechanisms of allergy and its control, then to study acupuncture you should administer it and measure quantities like inflammatory molecules, the number of immune cells in the bloodstream, the release of immune-mediating compounds, and so forth.
      
      Instead the design was typical: administer acupuncture and see if there is a difference in how often people choose to use their medications, or if they rate their symptoms differently. They used noninferiority tests methodology; it was, in my opinion, a complete waste of time. Even if there was a causal effect on drug use and ratings, you have zero knowledge about whether it would persist, whether it was largely psychological, or whether it actually might reduce physical effects such as downstream long-term problems caused by hyperactivity of the immune system.
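For readers unfamiliar with the noninferiority methodology mentioned above, here is a minimal sketch of the usual confidence-bound version, assuming a normal approximation, a higher-is-better outcome, and a prespecified margin (all numbers hypothetical). The new treatment is declared noninferior if the lower confidence bound for (new - reference) lies above -margin. For a lower-is-better outcome, such as how often medication is used, flip the sign of the difference first. The output is a yes/no about one summary difference, with nothing about mechanism.

```python
from statistics import NormalDist

def noninferiority(diff, se, margin, alpha=0.025):
    """One-sided noninferiority check via a normal-approximation bound.

    diff:   estimated mean(new) - mean(reference); higher is better
    se:     standard error of diff
    margin: largest acceptable loss relative to the reference (> 0)
    Returns (is_noninferior, lower_bound).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # about 1.96 for alpha = 0.025
    lower = diff - z * se
    return lower > -margin, lower

# Hypothetical symptom-score differences (new minus reference), margin 0.5.
print(noninferiority(diff=-0.1, se=0.1, margin=0.5))
print(noninferiority(diff=-0.3, se=0.2, margin=0.5))
```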
Sameera Daniels on June 16, 2019 5:00 PM at 5:00 pm said:

Re: Instead the design was typical: administer acupuncture and see if there is a difference in how often people choose to use their medications, or if they rate their symptoms differently. They used noninferiority tests methodology, it was, in my opinion, a complete waste of time. Even if there was a causal effect on drug use and ratings you have zero knowledge about whether it would persist, whether it was largely psychological, or whether it actually might reduce physical effects such as downstream long term problems caused by hyperactivity of the immune system.
—-

Excellent, Daniel.

Your hypotheses would then pertain to many other allopathic RCTs too, not simply to a non-allopathic treatment.


Statistical Modeling, Causal Inference, and Social Science

126 thoughts on “How statistics is used to crush (scientific) dissent.”
