What are my statistical principles? What are yours?

This question came up a year ago, and I’m still not sure what my principles are!

Let’s start with this comment from Megan Higgs:

There is great value in each of us spending more time considering that we do have principles that guide us in our work even if we can’t easily write them down (or choose not to).

I [Megan] have thought a lot about my own Principles of Statistical Practice. What stands out to me is this — the most important information I gather to understand my principles comes from identifying practices/decisions/behaviors that do NOT align with my principles. It comes from paying attention to the things that make me cringe (even if I haven’t articulated a broader principle the practice is violating). That is, I know when my principles are violated even if I can’t satisfactorily explain the underlying principle being violated.

I understand the desire to want a nice, clean list of principles and then to check different practices against it, but I just don’t see that as the way things work. It is messier and more cyclical than that. In some ways, writing them out explicitly can force simplification that may encourage unnecessary rigidity, or at least less willingness to consider nuance. Not providing a single strong guiding position or list of principles is okay by me, even if it can feel uncomfortable. Maybe “refusing to ignore nuance” can be considered a high level statistical principle itself.

And this from Christian Hennig:

Start with what it’s all about: Finding things out about reality, more precisely about a reality that we cannot construct at will, that may force us to take it into account because there’s a good chance that we will get into trouble if we ignore it. A good result is a result that still stands if we and others make the best efforts to make it fall.

Data are key, but they are at the same time under- and overestimated. They are underestimated in the sense that they often hold much stronger information than what can be used assuming a certain model, namely informing us about issues with the model, giving us ideas about a better model, and then showing issues with that better model too. Data are overestimated when thinking that data can be trusted, tell the whole story, or that all decisions required for modeling, estimating, uncertainty assessment can be made based on the data alone. It has to be questioned how data were obtained, and it has to be accepted that many decisions in data analysis cannot be inferred simply from the data, but require background knowledge, and sometimes almost arbitrary decisions among several at first sight equally valid alternatives.

Reality is a force stronger than us, so we have to adapt if we get it wrong. And whether we get it wrong is a matter of reality and new data, not of whether we did all the right things recommended in textbooks.

We need to be open about uncertainty; the first key is probability modeling, as it allows us to quantify uncertainty. However, there is always modeled uncertainty and uncertainty about the model.

Now it’s my turn to talk. A difficulty with this sort of discussion is that it often seems to oscillate between procedural advice (Be open to the possibility of error. Don’t cheat. Share your data. Respect substantive theory.) and low-level details (Don’t use hypothesis tests. Graph your data. Use informative priors.), all of which seems like the wrong level of abstraction. There’s our workflow paper, but that’s kind of too specific to be called a set of principles.

So we’re still not there.

21 thoughts on “What are my statistical principles? What are yours?”

  1. Having read your blog, one underlying theme seems to be “Analyze everything you have, and explore the data as thoroughly as you can.”

    This might be contrasted with a common principle that other people (but not you) hold, which would be “Prespecify your tests, and narrow the scope of inference only to what you can prespecify.”

    So your approach (it seems to me) has more in common with, say, multiverse analysis or exploratory data analysis, while the other has more in common with pre-registered null hypothesis testing.

    I think both approaches try to deal with the garden of forking paths and human-bias issues, but they rest on different underlying principles for addressing them.

    It’s sort of like: do you care more about finding the best model to describe the data, or about whether you were able to correctly guess the outcome before seeing the data?

    • Aren’t these two sides of the same coin, though?
      You do exploration to come up with a hypothesis, and then you design and pre-register a study and analysis to confirm that finding.
      You do that because your aim isn’t to find something interesting in the data, but because you want to find something interesting in the real world, and the data are just the means to get there.

      • I’d say a very common set of statistical principles that people hold is the distinction between exploratory and confirmatory research, with the latter requiring preregistration. Underlying that is another principle: a general distrust that analysts will refrain from p-hacking, with a corollary belief that exploratory research is therefore unable to ever create generalizable knowledge. So, another common principle is that only pre-registered, confirmatory research should be trusted.

        I think this is a coherent set of statistical principles one could apply (and might be the necessary solution in an environment where many people are motivated by publication instead of getting it right) but it is not the only approach one might take.

        However, I think that a different statistical ethos might argue that thorough, transparent exploratory analysis (e.g., lots of plots, clearly laying out all the analyses tried) can be just as good and can still generate knowledge that is useful in the real world.

        I think an informative question to ask yourself is which you’d trust more (a) a pre-registered test of a subpar analysis strategy or (b) a more rigorous analysis done post-hoc in an exploratory fashion.

        I think that tells you something of your statistical principles.

        • which you’d trust more (a) a pre-registered test of a subpar analysis strategy or (b) a more rigorous analysis done post-hoc in an exploratory fashion.

          I think that’s an unfair dichotomy because there is no reason for the pre-registered analysis to be worse than the exploration. You might as well compare (a) a pre-registered test with a well-thought-out analysis strategy or (b) an unsystematic exploration that threw ad-hoc methods at the data until something “significant” fell out.

          I think a fair question would be to assume a competent statistician and then ask if you’d trust more an exploratory analysis of a large data set the researcher happened to have lying around, or a pre-set analysis of a smaller set of data that the researcher gathered to answer a specific question.

        • Sure, let’s go with that example. It might be best to set aside the preregistration bit, because that’s really more about increasing transparency and trust in the analyst, so far as I can tell.

          In your example you’re balancing whether you care more about smaller standard errors (i.e., a large sample) or a purer test of the hypothesis. So I guess it comes down to whether you value internal validity (i.e., the new data gathered for a specific purpose) or narrower confidence/credible intervals for estimating the effect (the big, pre-existing data). If the small dataset were sufficiently small, with really wide confidence intervals, I’d probably trust it less than the big one, because I think that small samples really limit what we can learn. But if the “small” sample is still, say, >200, I’d probably trust it more; there are diminishing returns for larger samples (see the simulation sketch below, after this thread).

          Another example in the same ballpark (but not the same) would be if you think measurement error is a bigger problem than poor internal validity. I think the big split in my subfield (social/personality psychology) was that the personality psychologists thought reducing measurement error was a bigger deal, but the social psychologists thought internal validity was more important. So you see big N longitudinal studies in personality research, and multiple small sample experiments in social psych more regularly.

          So, in my own principles, I think I value low measurement error and large sample sizes more than I value internal validity. I think that is different than the exploratory vs. confirmatory distinction. Does that kind of make sense?

          (I’m really just trying to engage with the question and break down what I think is most important when analyzing and interpreting results, when tradeoffs are necessary. Obviously, some excellent studies can have it all, but it’s interesting to think about what you would be willing to let slide first if resources or pragmatic concerns limited you).
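
To make the tradeoff in the thread above concrete (precision from a large sample vs. attenuation from measurement error, plus the 1/sqrt(n) diminishing returns of adding data), here is a minimal simulation sketch. It is only an illustration: the true slope, noise levels, and sample sizes are arbitrary assumptions, not anything taken from the discussion above.

```python
# A minimal, hypothetical simulation of the tradeoff discussed in the thread
# above: a small, cleanly measured sample vs. a large sample with measurement
# error in the predictor. All numbers (true slope, noise levels, sample sizes)
# are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_slope = 0.5               # assumed "true" effect of x on y
n_small, n_large = 200, 20000  # small clean study vs. big noisy dataset
sims = 500                     # number of simulated studies per scenario

def fitted_slope(n, x_noise_sd):
    """Simulate y = true_slope*x + noise, observe x with measurement error,
    and return the least-squares slope of y on the observed x."""
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(scale=1.0, size=n)
    x_obs = x + rng.normal(scale=x_noise_sd, size=n)
    return np.polyfit(x_obs, y, 1)[0]

small_clean = np.array([fitted_slope(n_small, 0.0) for _ in range(sims)])
large_noisy = np.array([fitted_slope(n_large, 1.0) for _ in range(sims)])

print("small & clean: mean slope %.3f, sd %.3f" % (small_clean.mean(), small_clean.std()))
print("large & noisy: mean slope %.3f, sd %.3f" % (large_noisy.mean(), large_noisy.std()))
# Typical result: the small clean sample is roughly unbiased but variable;
# the large noisy sample is very precise but attenuated toward zero
# (classical attenuation: the slope shrinks by about 1 / (1 + x_noise_sd**2)).

# Diminishing returns of sample size: the standard error of a mean scales
# like 1/sqrt(n), so quadrupling n only halves it.
for n in (50, 200, 800, 3200):
    print("n = %4d: se of a mean of unit-variance data ~ %.3f" % (n, 1 / np.sqrt(n)))
```

On a typical run the large noisy-measurement sample gives a tight but biased (attenuated) estimate, while the small clean sample is roughly unbiased but variable; which of those you would rather have is essentially the tradeoff being debated in this thread.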

  2. For some reason, readers of his blog have often seemed to denigrate Edward Tufte’s work. But, many of his quotes apply beautifully in my opinion. Just replace “graphical excellence” with “statistical excellence” and consider some of these:

    “Graphical excellence is the well-designed presentation of interesting data – a matter of substance, of statistics, and of design.

    Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.

    Graphical excellence is that which gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

    Graphical excellence is nearly always multivariate.

    And graphical excellence requires telling the truth about the data.”

    Or:

    “What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult – that is, the revelation of the complex.”

  3. So funny to read things we wrote a while ago and barely remember writing! I still agree with everything in it, though I’m not sure if it’s realistic or a bit of a cop-out. Regardless, I still can’t articulate a clean set of statistical principles. I agree with Andrew that any list I start quickly feels too procedural and superficial – and goes against my sense of statistical principles.

    • I agree with the thought that nuance is important in defining good statistical practices, but I’m always bothered by any variation on “I can’t define hard-core pornography, but I know it when I see it.” Many of us are asked to teach good statistical practices and if we can’t articulate the principles of good statistical practice, our students should be deeply suspicious. I think Feynman had it completely right on this (see url below) – if you can’t explain it to a child, you don’t understand it. So far, I don’t understand it.

      https://en.wikipedia.org/wiki/Feynman_Technique#:~:text=The%20Feynman%20Technique%20is%20a,concise%20thoughts%20and%20simple%20language.

    • Fair enough. I think your comment reflects the nuances of how different people think about “principles” (in general and relative to statistical inference) and the exercise of writing down a set of guiding principles that feels complete. For example, I could write down a few very broad principles, like “consider the nuances of particular problems,” “work to justify all assumptions and generalizations,” “don’t pretend as if statistical analysis rids a problem of uncertainty,” but it is difficult to choose a level at which to make the list that I would feel comfortable identifying as my “set of principles”. The fact I don’t (yet) have a clean set of articulated “principles” does not mean I don’t have principles or that I don’t have lists of good statistical practices (in line with my principles) that I share with students and collaborators.

      I often find lack of understanding, or at least under-appreciation of the depth and nuances of inference, connected to the desire for a nicely packaged list of rules or practices that can be followed by others. I believe that rules-of-thumb and checklists often do more harm than good because they discourage thinking and engagement with nuances and challenges. However, I do believe lists of good statistical practices can be helpful (e.g., “always plot your data”) – as long as they are context specific and acknowledge challenges and nuances. To me, “principles” are not the same as “practices.”

  4. > Start with what it’s all about: Finding things out about reality

    That doesn’t sound totally right. Maybe that’s what statistics is about, but saying all == statistics seems limiting here.

    > A difficulty with this sort of discussion is that it often seems to oscillate between procedural advice

    (And from the original message from Jared)

    > > I am not a statistician but am a long time reader of your blog

    > > One grave sin is wasting effort on uninformative experiments and analysis, when we could have gotten informative outcomes — even if negative

    > > Another grave sin is suppressing informative results — whether negative or positive. The file drawer problem should be seen as a moral failure

    These points seem to be less about procedural or details of stats and more about how stats interfaces with everything else going on.

    So stuff like — what is good enough? When to spend an hour on stats vs. something else? Are you getting anything in return?

    In this theme, blog-post-a-day seems like a statistical principle. It’s statistically themed, so that’s the first part, and then it’s a principle because you prioritize posting/blogging against a bunch of other stuff you could be doing.

    • Ben:

      I’m not sure what you mean by “prioritize posting/blogging against a bunch of other stuff you could be doing.” If each year I write 400 blog posts, teach 100 classes, give 40 lectures, publish 25 papers, and write 0.5 books, what is being prioritized? Blog posts because there are more of them? Books because each one is bigger? 0.5 of a book might be more effort than 400 posts, I’m not sure.

      • I don’t really know how such a prioritization would work (like writing down a function to optimize), but since it happens, we may as well call it a principle?

        Why does it get done? Well maybe it’s one of your statistical principles? So a principle would be sort of an axiomatic thing, from which other things follow.

        But I’m just trying to make a definition away from the details-of-statistics sort of things. It seems like there’s a lot there already.

        My thought when I read this this morning was statistical principles probably have something to do with how I balance statistics with the rest of my job; this isn’t necessarily a statistical calculation. And I p-hacked the original post to find the file drawer comment — that’s a journal/researcher/economics sorta incentive thing, which isn’t necessarily a statistical calculation either. Then I thought about the blog itself — why is it here? That’s sorta the line of thought at least.

  5. It strikes me that one of the difficulties in coming up with a set of statistical principles is that we use statistics for a lot of different things. Statistics is used to describe, summarize, and communicate structure in data. But statistics is also employed to help us figure out the meaning of data in terms of estimating parameters in a theory-driven model. That’s basically the classic distinction between “descriptive” and “inferential” statistics. But while I think many people could come to some agreement on principles of good communication (Dale’s Tufte quotes seem reasonable to me, for example), I don’t think we’ll easily agree on principles for the other uses of statistics.

    Well, maybe we could agree on principles about how to most efficiently estimate parameters of a model, but the principles involved in designing and testing the model and interpreting the meaning of the resulting parameters, those seem more elusive. Those principles would depend on knowledge of the domain of the model, the purpose of the model in context, etc. In painting, there are (probably) principles for how best to mix pigments, clean brushes, etc., but I doubt many could, in a satisfying way, enumerate the principles required to make good art.

  6. Interesting perspective.

    My take on this is that, assuming we aim at generating information quality, the 8 principal considerations for assessing this are:
    1. Data Resolution
    2. Data Structure
    3. Data Integration
    4. Temporal Relevance
    5. Chronology of Data and Goal
    6. Generalizability
    7. Operationalization
    8. Communication

    These are principles that help design and assess statistical work.

    For a recent example on applying this see: https://www.researchsquare.com/article/rs-892584/v1?fbclid=IwAR1V76YsOh20bQw7us6an5EnzrpxLRJUVSD4o8jzxj8vClyK-5MOD0QBN-c

    For more on information quality see http://infoq.galitshmueli.com/

  7. I came across the article at https://academic.oup.com/jcem/article/96/7/1911/2833671 yesterday, while trying to understand a physician’s reasoning for prescribing a particular Vitamin D pill. It does seem relevant to the current discussion on this blog; I’d be interested in people’s comments on the methodology.

    First, a quote explaining the (strange to me) notation involving a number followed by circles with or without pluses: “The Task Force also used consistent language and graphical descriptions of both the strength of a recommendation and the quality of evidence. In terms of the strength of the recommendation, strong recommendations use the phrase “we recommend” and the number 1, and weak recommendations use the phrase “we suggest” and the number 2. Cross-filled circles indicate the quality of the evidence, such that ⊕○○○ denotes very low quality evidence; ⊕⊕○○, low quality; ⊕⊕⊕○, moderate quality; and ⊕⊕⊕⊕, high quality. The Task Force has confidence that persons who receive care according to the strong recommendations will derive, on average, more good than harm. Weak recommendations require more careful consideration of the person’s circumstances, values, and preferences to determine the best course of action. Linked to each recommendation is a description of the evidence and the values that panelists considered in making the recommendation; in some instances, there are remarks, a section in which panelists offer technical suggestions for testing conditions, dosing, and monitoring. These technical comments reflect the best available evidence applied to a typical person being treated. Often this evidence comes from the unsystematic observations of the panelists and their values and preferences; therefore, these remarks should be considered suggestions.”

    • It is primarily the communication of “experts’” views to folks who mostly do not do research themselves, or do so only occasionally.
      “The objective was to provide guidelines to clinicians for the evaluation, treatment, and prevention of vitamin D deficiency ”

      But prior to communicating the views, they have to be clearly discerned. There does seem to be a presumption in the literature on designs for the display of information (like Dale pointed to) that discernment of the views is fairly unproblematic, or at least someone else’s problem.

      I think these two roles need to be better separated even if they are often conflated – some interesting views (even misguided ones) may easily make for salient graphs.

  8. I’m not sure if this is what you’re going for, but I have thought some about how to ground my work as a statistician in some basic principles. I came up with truth, service, and humility. Truth because the goal of statistical analysis is to learn something about reality. Service because we rarely do statistics for its own sake but to assist and enlighten other endeavors. And humility because we quantify uncertainty and this leads to an understanding of what we don’t know in addition to what we do. As a follower of Jesus, these principles are meaningful on a different level as well, as they can easily be grounded in the example and teaching of Christ the King.

  9. Ah, I was waiting for some time for Andrew to get back to that “statistical principles” posting. Very nice. Also quite a good job done condensing my original comment down to 30% or so but keeping the essential points clear. My comment was meant to somehow synthesize Andrew’s (as far as I know it) view and my own. I’m still pretty happy with what I wrote, and I think that I largely avoided broad procedural advice as well as low-level details; I rather tried to sketch a general attitude with which to do things.

  10. By the way, I also enjoy how philosophically realist it sounds, actually coming from a constructivist with anti-realist leanings (like for example Bas van Fraassen).

    • Christian:

      I agree it is nicely condensed.

      The points you make about the data are important and often overlooked.

      The first paragraph does look a lot like a summary of C. S. Peirce’s views, though it perhaps misses defining truth as the (ideal, possibly never attained) limit of the process in which “we and others … make the best efforts to make it fall” and it stops falling (something that can never be ascertained).
