What are my statistical principles?

Jared Harris writes:

I am not a statistician but am a longtime reader of your blog and have strong interests in most of your core subject matter, as well as scientific and social epistemology.

I’ve been trying for some time to piece together the broader implications of your specific comments, and have finally gotten to a perspective that seems to be implicit in a lot of your writing, but deserves to be made more explicit. (Or if you’ve already made it explicit, I want to find out where!)

My sense is that many see statistics as essentially defensive — helping us *not* to believe things that are likely to be wrong. While this is clearly part of the story, it is not an adequate mission statement.

Your interests seem much wider — for example your advocacy of maximally informative graphs and multilevel models. I’d just like to have a clearer and more explicit statement of the broad principles.

An attempted summary: Experimental design and analysis, including statistics, should help us learn as much as we can from our work:
– Frame and carry out experiments that help us learn as much as possible.
– Analyze the results of the experiments to learn as much as possible.

One obstacle to learning from experiments is the way we talk and think about experimental outcomes. We say an experiment succeeded or failed — but this is not aligned with maximizing learning. Naturally we want to minimize or hide failures and this leads to the file drawer problem and many others. Conversely we are inclined to maximize success and so we are motivated to produce and trumpet “successful” results even if they are uninformative.

We’d be better aligned if we judged experiments on whether they are informative or uninformative (a matter of degree). Negative results can be extremely informative. The cost to the community of missing or suppressing negative results can be enormous because of the effort that others will waste. Also negative results can help delimit “negative space” and contribute to seeing important patterns.

I’m not at all experienced with experiment design, but I guess that designing experiments to be maximally informative would lead to a very different approach than designing experiments to have the best possible chance of yielding positive results, and could produce much more useful negative results.

This approach has some immediate normative implications:

One grave sin is wasting effort on uninformative experiments and analysis, when we could have gotten informative outcomes — even if negative. Design errors like poor measurement and forking paths lead to uninformative results. This seems like a stronger position than just avoiding poorly grounded positive results.

Another grave sin is suppressing informative results — whether negative or positive. The file drawer problem should be seen as a moral failure — partly collective because most disciplines and publishing venues share the bias against negative results.

I was going to respond to this with some statement of my statistical principles and priorities—but then I thought maybe all of you could make more sense out of this than I can. You tell me what you think are my principles and priorities based on what you’ve read from me, then I’ll see what you say and react to it. It might be that what you think are my priorities are not my actual priorities. If so, that implies that some of what I’ve written has been misfocused—and it would be good for us to know that!

38 thoughts on “What are my statistical principles?”

  1. > designing experiments to be maximally informative
    RA Fisher’s take at the end of his career was to minimize the assumptions needed for analysis and make those that are unavoidable more likely to not be very wrong.

    You seem to want to design them to find out what’s most wrong with the assumptions in order to discern less wrong assumptions.

    Also, maybe any study is just a way of better informing how to do the next study, or of deciding to settle for what has been learned so far.

    And ESP has been established beyond any reasonable doubt and all scientists just have to accept that.

  2. Hmm. I don’t think “forking paths” are “design” errors, they happen during analysis…

    I think the bits about maximizing information seem consistent with your philosophy.

    I’ve not tried to think before about what your overall philosophy might be, but off the cuff I’d say you seem to be aiming to make your assumptions explicit and get your model as accurate as possible.

    • Dogen said,
      “I don’t think “forking paths” are “design” errors, they happen during analysis…”

      I think “forking paths” can be seen as the result of a big design error: namely, letting the “design” change as a result of the analysis.

    • Speaking of the garden of forking paths, here is an argument that it’s not something researchers stroll down by accident, but rather something deliberately cultivated in even the median paper. Alvaro de Menard (unlike Andrew) seems to think being a good, conscientious person is enough because such a person wouldn’t publish obvious crap otherwise, but hardly anyone is that.

      • >Alvaro de Menard (unlike Andrew) seems to think being a good, conscientious person is enough because such a person wouldn’t publish obvious crap otherwise, but hardly anyone is that.

        The way I see it people mostly follow the incentives of the system they’re in. They are currently incentivized to behave badly, but that doesn’t mean they do it accidentally. Their personal goodness is a secondary factor at best (perhaps more relevant in outright fraud cases, but those are a rounding error).

  3. I won’t try to articulate your principles (or even mine). But Jared’s attempt, while there is much to agree with, seems too narrow. I don’t think it is all about experimental design/evidence. You have said a lot about observational data in the past, and I’m not sure it makes sense to interpret all observational data as experimental data – that would make all observations sort of “poor” experiments. Perhaps that is a fruitful way to think of things, but in economics we attach somewhat more importance to observed things than merely one observation out of the many possibilities. I wonder what you, and others, think about that – or is it just going down a rabbit hole?

      • I would not regard observational studies as “poor experiments” — because I would not classify them as experiments. Experiments are important, but observational studies can also contribute to our knowledge. They can, for example, give information that helps in designing experiments. They might also give useful information for priors in Bayesian analysis.

      • The need for control and randomization in clinical studies is mostly specific to that type of study. It arises because:

        1) humans (and really all life forms) are highly variable test subjects;
        2) the scale of treatment effects can range continuously from zero to very large;
        3) the number and magnitude of adverse effects is often unknown;
        4) people’s lives could be at stake;
        5) there is the possibility of researcher bias.

        Observation is still a valid form of learning though: that’s what the CDC is doing with case reports on COVID-19.

        Actually, I wonder if the very strong bias for RCT has prevented the medical community from grasping the substantial evidence for aerosol transmission of COVID-19.

        Suddenly I get your thing about RCTs being problematic, Andrew. People are trained to think we can’t learn anything from observation. But we can and we have to.

  4. Statistics is hard! Any individual will make mistakes, lack data, or make a faulty assumption. Show your work and explicitly state each assumption in an analysis, preferably by building a full model, because a full model lets you show all of the chosen effects and adjust for them appropriately. Then share that analysis with open-source code, so that anyone who wants to follow your reasoning can see exactly how it came to be.

    Know why you are performing the analysis. If the point of the model is to help you make decisions, then you should be just as interested in the variance parameters and the uncertainty of your estimates as in the point estimates, and you should use all of those parameters to make an optimal decision. If the point of the model is communication, then statistics is REALLY hard! You can do a disservice if you do not adequately represent your uncertainty and instead emphasize point estimates or hard conclusions.
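    To make the decision point concrete, here is a minimal sketch (my own illustration, not anything from this thread; the loss function and the “posterior draws” are made up) of choosing an action by averaging a loss over posterior draws instead of plugging in a point estimate:

    ```python
    # Hypothetical decision analysis: pick a stocking level q that minimizes
    # expected loss under posterior uncertainty about demand. Averaging the loss
    # over posterior draws uses the whole distribution, not just the mean.
    import numpy as np

    rng = np.random.default_rng(0)
    demand_draws = rng.lognormal(mean=3.0, sigma=0.5, size=4000)  # stand-in for posterior draws

    def loss(q, demand, overstock_cost=1.0, shortfall_cost=4.0):
        # Asymmetric loss: running short is four times worse than overstocking.
        return overstock_cost * np.maximum(q - demand, 0) + shortfall_cost * np.maximum(demand - q, 0)

    candidates = np.linspace(0, 80, 161)
    expected_loss = [loss(q, demand_draws).mean() for q in candidates]
    best_q = candidates[int(np.argmin(expected_loss))]

    print(best_q, demand_draws.mean())  # the optimal q is not the posterior mean
    ```

    With an asymmetric loss, the optimal choice lands well away from the posterior mean, which is exactly why the whole distribution matters.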

  5. I don’t know, and I’ve been working at Columbia for over a decade now. It would be easier if the question were “What are Andrew’s *scientific* principles and priorities?”, so we could talk about making data / code public, acknowledging mistakes, etc. If the question were “What are Andrew’s causal inference principles?”, that would be harder but you could start with Rubin’s principles and make some amendments. If I were to try to narrowly list Andrew’s statistical / probability principles, it would be something like

    1. Probability must have a Frequentist justification. This is perhaps more of an anti-principle than a principle, in the sense that Andrew has a principled opposition to Cox-like theorems that attempt to provide a non-Frequentist justification for probability.
    2. The model is wrong. Not just in the “wrong but possibly useful” sense: Andrew tries to make inferences assuming the model(s) is / are wrong, in contrast to what almost all Bayesians do, which is make inferences conditional on the model being right after having done as much as they can to make the model right.
    3. Folk theorem. Almost nothing Andrew refers to as a theorem is an actual theorem, but the conviction that computational problems are often indicative of modeling problems is closer to a principle than, for example, the Pinocchio Principle.
    4. All effects vary. In this sense, concepts like an average treatment effect are artificial because no one’s individual treatment effect is average. Similarly, all time-series have an infinite number of break points, etc. This principle motivates hierarchical modeling and continuous model expansion (a toy partial-pooling sketch follows this list).
    5. Don’t dichotomize. I guess this is another anti-principle, but it is behind Andrew’s opposition to rejecting a null hypothesis, claiming that a study has been replicated, Bayes factors, and so on.
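    To make principle 4 concrete, here is a toy partial-pooling sketch (my own illustration, not Andrew’s code; the group structure, variances, and seed are made up): each group’s noisy mean gets shrunk toward the overall mean, and the hierarchical estimate typically beats both no pooling and complete pooling.

    ```python
    # Toy partial pooling with known variances (a simplifying assumption of this sketch).
    import numpy as np

    rng = np.random.default_rng(42)
    n_groups, n_per_group = 8, 5
    mu, tau, sigma = 0.5, 1.0, 2.0   # grand mean, between-group sd, within-group sd

    true_effects = rng.normal(mu, tau, size=n_groups)   # effects vary by group
    data = rng.normal(true_effects[:, None], sigma, size=(n_groups, n_per_group))

    no_pool = data.mean(axis=1)                      # each group estimated on its own
    complete_pool = np.full(n_groups, data.mean())   # one estimate for everyone
    se2 = sigma**2 / n_per_group
    weight = tau**2 / (tau**2 + se2)                 # how much to trust each group's own mean
    partial_pool = weight * no_pool + (1 - weight) * data.mean()

    for label, est in [("no pooling", no_pool),
                       ("complete pooling", complete_pool),
                       ("partial pooling", partial_pool)]:
        rmse = np.sqrt(np.mean((est - true_effects) ** 2))
        print(f"{label:16s} rmse = {rmse:.3f}")
    ```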

  6. This is an interesting thread, not so much for trying to write the screenplay of “Being Andrew Gelman” as for placing his and our concerns in a larger (epistemological?) context.

    I don’t think it all boils down to one single principle, but there is an overall logic.

    I agree with Jared that a focus on learning, and not just avoiding error, is part of the story. But this may itself be an outcome of a deeper commitment to complexity and uncertainty. One aspect of this is skepticism toward broad-brush claims, average treatment effects, etc. It’s a complex world with many competing forces at work, and each local situation, maybe each individual, is conjunctural. Another aspect is that uncertainty is baked in, so to speak. We can reduce it a bit by better measurement and other practices, but a lot of it is simply going to be there. Rather than trying to blot it out in favor of binaries like accepted/rejected hypotheses, we should accept it for what it is, a part of our reality. If we think the world is complex and uncertain in this way, prioritizing learning over time and across a community makes a lot of sense.

    Another consequence is that we need to be more forgiving toward error (statistics is hard etc.) and less forgiving toward dissembling and hubris.

    Maybe (not sure about this) still another consequence is that, in looking at all the stages in measurement, research design, and ultimate analysis that have to grapple with complexity and uncertainty, we should put less emphasis on elaborate statistical methods that in a pristine theoretical world would give us slightly more precise or less biased inferences. In the face of all that we might get wrong, sometimes a little common-sense skepticism is better than a cleverly deduced test statistic. Effective graphical representation can be a big help here.

    On another level, I sense that there is concern that the institutions that shape research — academic clubs and hierarchies, grant-making, journalism outlets that popularize results — generate perverse incentives.

    • Several good points.
      Also, the phrase “uncertainty is baked in” is a good way to describe an important point that is often neglected (or, worse yet, swept under the rug).

  7. Andrew:

    What you write about varies from moment to moment, but here are the key features that you seem to espouse. I wouldn’t call them “statistical” principles. Just principles of the scientific endeavor, often expressed through the application of statistics:

    1) Quality experimental measurement. Bad data can’t be fixed by analysis.

    2) Understanding what a given method can and can’t do, and specifically minding the problems that bedevil the NHST approach.

    3) People should stop making strong – some might say ridiculous – claims on weak – some might say virtually worthless – data generated with a questionable method.

    4) Honesty is good but not an excuse for sloppiness or inappropriate applications of method.

    5) Fraud happens. When it happens it should be called out and the people who commit it should be held responsible.

    6) Criticism is good and should be far more common.

    I don’t think that’s exhaustive, but it’s a pretty good start.

  8. Andrew:

    This is what I think of you, but it is not a principle.

    You are honest with your work and are a true Bayesian. You love what you do. Keep doing it, because you are doing a great service to the field as well as to many learners.

  9. The long version is already articulated in papers like Gelman & Shalizi (2013), but one way I conceptualize this community, which in my mind is linked to your personal mission although they’re not identical, is:
    This is a place to share the vision — and help scientific communities apply it to their fields — that statistics is not a recipe that yields objective truth, but rather a self-consistent, principled, and transparent way to reason about uncertainty so that scientists (broadly defined) can understand what is and is not known, facilitating learning.

  10. I wouldn’t say true Bayesian. It seems you worry about the long-run properties of procedures and their calibration, especially in the context of model selection. So it seems you don’t particularly care about the likelihood principle or about always applying strictly Bayesian principles. A strict Bayesian would just condition on the data observed and not worry about data that could have been observed.

    On the other hand, you obviously accept most aspects of the Bayesian approach, especially for parameter inference. So I’d say your approach is a hybrid of Bayesian ideas, frequentist/long-run error control, and what you consider common sense.

    • Andrew (other):

      Worrying about “long-run properties of procedures and calibration of them” seems to be a controversial topic. Your “true” Bayesian will argue that Bayes makes frequentist/long-run error-control ideas, and what you consider common sense, superfluous and something that should be disregarded.

      Now, it is true that if the prior and data model are essentially unquestionable (e.g., breast cancer detection), that holds deductively. However, that is almost never the case in most research.

      What concerns me now is how to raise this point when someone is giving a talk on Bayesian analysis to a non-expert committee and arguing that it displaces all concerns about error control. Even in breast cancer detection, most people seem to want to know test sensitivity and specificity in addition to positive and negative predictive probabilities. Often the presenter will respond that a skeptical or vague prior will finesse the concern (along with a number of references). But with multiple parameters, the implications of such priors may not be very clear. On the other hand, a point-mass prior of one on no effect is a very skeptical prior for which there will be no updating, but for which the type one error can be assessed (possibly as a function of other nuisance parameters).

      Another way to see that there are no problems if the prior and data-generating model are above question: the average coverage of the credible intervals equals the posterior probability. So, on average at least, this automatically provides good error rates. But again, that depends on there being no concerns about the prior and data model.
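      A quick simulation of that claim (my own sketch, using a deliberately simple conjugate normal model with made-up settings): when the parameter really is drawn from the prior and the data really come from the model, a 90% credible interval covers the true parameter in about 90% of replications, so average coverage matches the nominal posterior probability.

      ```python
      # Average coverage of 90% credible intervals when prior and data model are correct.
      import numpy as np

      rng = np.random.default_rng(1)
      reps, n, sigma, prior_sd = 50_000, 5, 1.0, 1.0
      z = 1.6449  # normal quantile for a central 90% interval

      theta = rng.normal(0.0, prior_sd, size=reps)            # parameters drawn from the prior
      y = rng.normal(theta[:, None], sigma, size=(reps, n))   # data drawn from the model
      post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)     # conjugate normal posterior
      post_mean = post_var * y.sum(axis=1) / sigma**2
      covered = np.abs(theta - post_mean) <= z * np.sqrt(post_var)

      print(covered.mean())  # close to 0.90, but only because prior and data model are right
      ```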

  11. I really appreciate Jared’s question. I believe there is great value in each of us spending more time considering that we do have principles that guide us in our work even if we can’t easily write them down (or choose not to). I think science could benefit from more talk about “Principles of Statistical Practice” or “Philosophies of Statistical Practice” and greater acknowledgement of how individual they can be — as well as how difficult it is to articulate them given the many levels to be considered within the theory and practice of Statistics.

    I have thought a lot about my own Principles of Statistical Practice. What stands out to me is this — the most important information I gather to understand my principles comes from identifying practices/decisions/behaviors that do NOT align with my principles. It comes from paying attention to the things that make me cringe (even if I haven’t articulated a broader principle the practice is violating). That is, I know when my principles are violated even if I can’t satisfactorily explain the underlying principle being violated. I see your blog as doing something similar — your efforts at pointing out things that don’t align with your principles provide a great deal of information about your principles without you having to state them explicitly. I understand the desire to want a nice, clean list of principles and then to check different practices against it, but I just don’t see that as the way things work. It is messier and more cyclical than that. In some ways, writing them out explicitly can force simplification that may encourage unnecessary rigidity, or at least less willingness to consider nuance. Which brings me to what I really wanted to say —

    I see your blog as a space that allows for nuance. Not providing a single strong guiding position or list of principles is okay by me, even if it can feel uncomfortable. Maybe “refusing to ignore nuance” can be considered a high level statistical principle itself. When I think of scientists I most respect, that seems to be a common underlying principle.

  12. Some of your principles, as inferred by me (feel free to correct any of it):
    –Embrace uncertainty and variability.
    –Don’t dichotomize continuous variables.
    –Related to the preceding: it’s usually better to model natural phenomena (at least in the social sciences) with continuous rather than discrete variables.
    –Also related to the preceding, but more general: don’t throw away information during inference.
    –Bayes’s formula is good for inference inside models, but not for evaluating models.
    –The best way of evaluating a model is through simulating fake data from the posterior predictive distribution and plotting the simulated data against the actual data (a minimal sketch appears after this list).
    –Priors based on symmetry/entropy considerations are generally very inferior to priors based on scientific knowledge.
    –The prior is a part of the model, just like the likelihood, and can be revised in light of data, just like the likelihood.
    –Complex models are generally better than simple models (because the world is complex).
    –Hierarchical models are generally better than non-hierarchical models (because partial pooling of data is better than complete pooling or no pooling).
    –Measurement is generally more important than statistical inference.
    –Statistics is hard and it’s OK to make mistakes.
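    As a concrete version of the fake-data item above (a minimal sketch of my own; the data are made up, and plugging in point estimates stands in for a full posterior), a posterior predictive check simulates replicated datasets from the fitted model and compares a test statistic to its observed value. In practice one would plot the replications against the observed data:

    ```python
    # Toy posterior predictive check: does the observed maximum look typical of
    # maxima simulated from the fitted normal model?
    import numpy as np

    rng = np.random.default_rng(7)
    y = np.array([2.1, 2.4, 1.9, 6.8, 2.2, 2.5, 2.0, 2.3])  # hypothetical observed data

    mu_hat, sigma_hat = y.mean(), y.std(ddof=1)  # crude plug-in fit (stands in for posterior draws)

    n_rep = 5000
    y_rep = rng.normal(mu_hat, sigma_hat, size=(n_rep, y.size))  # replicated datasets
    t_obs, t_rep = y.max(), y_rep.max(axis=1)

    # If the model were adequate, t_obs would look typical among the t_rep values.
    print("posterior predictive p-value (max):", np.mean(t_rep >= t_obs))
    ```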

    • Whether or not they are Andrew’s principles, they sure sound like good principles to me. (But I’m open to disagreement backed up by sound reasoning or counterexample.)

  13. I think you already wrote a summary of your statistical principles, though it’s long and doesn’t boil down to one catchy phrase: https://statmodeling.stat.columbia.edu/2009/05/24/handy_statistic/

    My personal guiding rule is that I’m relatively stupid. There’s always somebody who knows more and does things better than me in any specific area. That’s what motivates me to leave a paper trail and document each step as well as I can, so I can reevaluate it after some time, even with faulty memory and reasoning, and can get feedback more easily.

  14. Not exactly your principles, but how would you feel about the following becoming the standard model for data-analytic research:

    (1) The dataset is randomly split into two parts. The researcher is blinded to one part.
    (2) The researcher then performs whatever exploratory analysis on the other part that they see fit to do.
    (3) The researcher then pre-registers a piece of code.
    (4) The researcher is then unblinded to the held-out part of the dataset. They run the code as-is, and report the full suite of results.
    (5) Any change made to the code after unblinding is disclosed as a protocol deviation.

    This model has the virtue of allowing researchers to dive deep into their data, as they (rationally) feel they must, while preserving the nominal statistical properties of their inferences, conditional on the exploratory part of the data. It also ensures analytic reproducibility (due to code publication), and ensures the full set of results from the inferential part of the data are reported, so readers can make their own judgments about issues like multiplicity.

    The main shortcomings would be how to deal with small samples, and how to verify blinding. I’m not sure much can be done about small samples, since one either rigorously adheres to a fixed plan that is almost surely suboptimal given little prior information, or one explores the data to a point where any well-behaved inference is impossible. As to the blinding issue, I’m OK with it not always being verifiable, since this still turns a statistical problem into a problem of outright fraud. Fraud is both more rare and easier to punish severely than subtly dubious statistical practices. Finally in cases where this just isn’t possible (e.g., the researcher is reanalyzing data that’s already been analyzed before), it just isn’t possible, but we can downgrade the evidentiary value of such research accordingly.
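    A bare-bones sketch of steps (1)–(4) above (my own illustration; the file names, seed, and analysis function are all hypothetical):

    ```python
    # Hypothetical split-and-blind workflow. Step 1: split once with a recorded
    # seed; the holdout file stays unopened during exploration. Step 3: the
    # "pre-registered" analysis is a frozen function. Step 4: after unblinding,
    # run that function unchanged on the holdout half.
    import numpy as np
    import pandas as pd

    def split_and_blind(path="full_data.csv", seed=20210101):
        df = pd.read_csv(path)
        idx = np.random.default_rng(seed).permutation(len(df))
        df.iloc[idx[: len(df) // 2]].to_csv("exploratory_half.csv", index=False)
        df.iloc[idx[len(df) // 2 :]].to_csv("holdout_half.csv", index=False)

    def preregistered_analysis(df):
        # Frozen after exploration; any later change is a disclosed protocol deviation.
        return df.groupby("treatment")["outcome"].mean()  # hypothetical columns

    # Exploration uses only exploratory_half.csv; then, after unblinding:
    # results = preregistered_analysis(pd.read_csv("holdout_half.csv"))
    ```

    Freezing the analysis before unblinding is what preserves the nominal statistical properties of the holdout results; anything changed afterward is reported as a deviation.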

    The other piece I’d add, to deal with publication bias and selection on results, is:

    (1) Get rid of journals, and replace papers with blog posts.
    (2) Have a (moderated) comments section function as post-publication review.
    (3) Use collaborative filtering to help readers sift through the ocean of garbage to find the bits of good stuff.

    Not saying this would fix all the problems with science, but it would seem to fix many of the purely statistical problems. Would you have any major concerns about this alternative paradigm?

  15. “Visualize uncertainty, embrace variation, and proceed with humility; to such a seeker will G-d reveal his leaves; to those who seek elseways, a life confined among the garden walls, as if in darkness” – jrc on ag

  16. The correct answer is David Freedman’s law of conservation of rabbits: “If you want to pull a rabbit out of the hat, you have to put a rabbit into the hat.”

  17. I’ll start with “if I were in Andrew’s place I’d…”, which means that it will be some kind of strange mix between what I think Andrew thinks and what I think Andrew could think is fine of what I think. So…

    If I were in Andrew’s place I’d start with what it’s all about: Finding things out about reality, more precisely about a reality that we cannot construct at will, that may force us to take it into account because there’s a good chance that we will get into trouble if we ignore it.

    That we cannot control it means particularly that we have to be self-critical. We should not accept results because we like them, or even because some method we like and often trust has given us this result. We should be aware that reality may be unforgiving if we get things wrong, even if we are brilliant at convincing others that we are right. We should not be easily convinced. We should ask about whatever result and model, what can go wrong with this, and can we check whether it has gone wrong, and if so, how? A good result is a result that still stands if we and others make the best efforts to make it fall.

    Data are key, but they are at the same time under- and overestimated. They are underestimated in the sense that they often hold much stronger information than what can be used assuming a certain model, namely informing us about issues with the model, giving us ideas about a better model, and then showing issues with that better model too (sometimes the best use of models is to show us how reality deviates from them… oops, I can’t remember Andrew having said that, it may be my own;-). They are also underestimated regarding the ability of new data to make a model, or conclusions based on previous data, fall. The case is not closed once a good-looking result has been achieved on a certain dataset, because new data, even seemingly from the same process, may very well throw that result into doubt or even falsify it.

    Data are overestimated when we think that data can be trusted, tell the whole story, or that all decisions required for modelling, estimation, and uncertainty assessment can be made based on the data alone. It has to be questioned how the data were obtained, and it has to be accepted that many decisions in data analysis (e.g., several issues regarding the setup of a prior, but also modeling and method choices in frequentist analysis) cannot be inferred simply from the data, but require background knowledge, and sometimes almost arbitrary decisions among several at first sight equally valid alternatives (occasionally some experimentation can help, but not always).

    We should not trust methods that are sold as “objective” because all decisions are apparently made by the data and not by the user; usually this means that a method designer has made some decisions for the user, without proper knowledge of the situation in which the user will apply the method, and therefore not very reliably. Nor should the user be tempted into using a method just because it doesn’t require them to make decisions; see Andrew’s and my paper “Beyond subjective and objective in statistics”.
    http://www.stat.columbia.edu/~gelman/research/published/gelman_hennig_full_discussion.pdf

    Data are also overestimated when we think that data, even good data, are the key to good results and insights. Data won’t get us anywhere if we can’t ask proper, interesting questions that the data can actually address, and if we don’t know the background, how the data were obtained, and all kinds of things that are already known about our topic of interest.

    Honesty is very, very important. First we need to be honest with ourselves – “forking paths” is about forgetting (not always consciously) what exactly we did in order to arrive at a result, and that exactly what is so easily forgotten may invalidate it. If we are dishonest, we cannot learn from criticism insofar as the criticism is about what we said rather than what we did. Honest work will be improved; dishonest work will be pointlessly defended. We have to be prepared to stand corrected and to look like an idiot at times (of course we may still defend what we are really convinced of, despite being self-critical). Reality is a force stronger than us, so we have to adapt if we get it wrong. And whether we got it wrong is a matter of reality and new data, not of whether we did all the right things recommended in textbooks.

    We need to be open about uncertainty; the first key is probability modelling, as it allows us to quantify uncertainty. However, there is always modelled uncertainty and uncertainty about the model. We need to question the model, and we should try hard to incorporate additional sources of uncertainty into a new, improved model, which will estimate uncertainty more accurately but will still be vulnerable to certain issues and have entry points for improvement. Once more, this often cannot be done in an automatic, “objective” way; it requires user ideas and decisions, particularly regarding formalising background knowledge. Not incorporating a source of uncertainty in a model just because we don’t know exactly how to do it and justify it will give reality an opportunity to bite us.

    Enough for now.

  18. I wrote something that doesn’t appear for some reason. I thought that was because I forgot to add my email address, but with the email address it doesn’t appear either. What about this one? Test, test…

  19. I would love to see a statement of Andrew’s principles, because there’s an aspect of his approach that seems contradictory or puzzling (I assume because I’m missing something). I’ll try and explain by describing my own (confused and partial) understanding of experimental design and data analysis.

    I assume that experimentation starts with the assumption that there is a hidden structure in the domain being investigated. In this hidden structure, some variables are causally connected to other variables (e.g. sunlight allows plant growth, nutrients allow plant growth, frost stops plant growth, there is an interaction between sunlight and nutrients etc) and other variables are not connected in that way. These hidden connections also have hidden parameters (some level of sunlight is optimal, frost is only damaging below a certain temperature etc). The aim in experimentation is to approximate or reveal this hidden structure in some way.

    Given this, there are three different categories of experiment: exploratory experiments, confirmatory experiments, and parameter estimation. Each type of experimentation asks a different question, and so is analysed in a different way.

    Exploratory experiments ask: what type of causal relationships could potentially hold in this domain? What candidate structures are worth considering further? Given questions like this, a good experimental design is one that provides a lot of information, and suggests a range of possible causal structures to the experimenter. A good form of analysis for this type of experiment is one which doesn’t miss possibly worthwhile or informative relationships in the data. Post-hoc analysis, data-dredging etc are all good here, because the aim is to identify candidate structures that may be worth considering further. False positives, or type 1 errors, aren’t a problem here (because we are exploring, searching for suggestive patterns) but false negatives are (we don’t want to reject relationships that may be worth exploring further). Because a good exploratory experiment is designed to investigate lots of possible relationships, the “garden of forking paths” or “multiple comparisons” problems are unavoidable. This means that an exploratory experiment simply cannot tell us whether a given relationship is “real” (is actually a part of the hidden causal structure).

    Confirmatory experiments come next, and address this problem. These experiments ask “does this apparently plausible relationship actually hold?”. Of course, with finite data we can never prove with certainty that a relationship holds; so instead we ask “can I convince even a hardened skeptic that this relationship is very likely to hold?”. This question has a yes-or-no answer (relative to the skeptic’s degree of skepticism, or required significance), and seems important, at least to me (thinking a given domain had a certain causal structure when in fact it didn’t would mean that we had misunderstood the domain). These experiments are defensive, and so are necessarily the opposite of exploratory experiments: “garden of forking paths” and multiple comparison issues have to be avoided in this experimental design, false negatives are not a problem but false positives are, data-dredging is bad, and so on.

    Confirmatory experiments tell us what the causal structure of a given domain is, bit by bit (subject to our degree of skepticism in accepting their results). The final type of experiment assumes that a given model of the hidden causal structure is in fact correct, and asks: given observed data, what parameter values of this model are most likely to hold? Since these experiments involve a form of exploration (in the space of parameter values for the assumed model), they seem to be subject to some of the “forking path” issues of the first, exploratory type of experiment. Since these parameter-fitting experiments assume the model (the structure) as a prior, to me they seem to depend on separate confirmatory or hypothesis-testing results to tell us that the model is “correct”. Fitting a model to data with no separate evidence telling you that the model’s structure is probably correct will lead a skeptic to say “sure, the model fits the data: but that’s just because you made the model with the data in mind. Remember the garden of forking paths? Noticing a pattern in a complex set of data doesn’t mean that the pattern is real: every complex set of data will have some noticeable patterns.”

    Given this picture of experimentation and analysis, the thing I would love to know is: what are Andrew’s principles for selecting or constructing a model of a given domain? How does he convince himself and others that the model’s structure (approximately) captures (some of) the true structure of the target domain?

    • FIn:

      I’m not sure. Your question is related to a general issue in statistical theory of all kinds, Bayesian and otherwise: there’s a lot about testing and comparing models, and inference within a model, and using models for predictions, and robust models, and even some model-free theory, but not much on putting together the models in the first place. Model-free theory doesn’t really help, because this just shifts the question from “where does the model come from?” to “where does the method come from?”

      This question arises in applied science as well. It’s standard practice to evaluate a treatment using a randomized experiment and some sort of statistical analysis, but how did the researchers decide on the treatment in the first place? That’s often framed as being outside of formal statistics or science.
