Christian Hennig writes (see here for context):
Statistics is hard. Well-trained, experienced and knowledgeable statisticians disagree about standard methods. . . .
The 2021 [American Statistical Association] task force statement states: “Indeed, P-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature.” I do not disagree with this. Probability models assign probabilities to sets, and considering the probability of a well chosen data-dependent set is a very elementary way to assess the compatibility of a model with the data. . . .
Still, considering the P-value as “among the best understood”, it is remarkable how much controversy, lack of understanding, and misunderstanding regarding them exist. Indeed there are issues with tests and P-values about which there is disagreement even among the most proficient experts, such as when and how exactly corrections for multiple testing should be used, or under what exact conditions a model can be taken as “valid”. Such decisions depend on the details of the individual situation, and there is no way around personal judgement.
I do not think that this is a specific defect of P-values and tests. The task of quantifying evidence and reasoning under uncertainty is so hard that problems of these or other kinds arise with all alternative approaches as well.
This is well put. Hennig continues:
A much bigger problem is the tension between the difficulty of statistics and the demand for it to be simple and readily available. Data analysis is essential for science, industry, and society as a whole. Not all data analysis can be done by highly qualified statisticians, and society cannot wait with analysing data for statisticians to achieve perfect understanding and agreement. On top of this there are incentives for producing headline grabbing results, and society tends to attribute authority to those who convey certainty rather than to those who emphasise uncertainty. . . .
Another important tension exists between the requirement for individual judgement and decision-making depending on the specifics of a situation, and the demand for automated mechanical procedures that can be easily taught, easily transferred from one situation to another, justified by appealing to simple general rules . . .
P-values are so elementary and apparently simple a tool that they are particularly suitable for mechanical use and misuse. To have the data’s verdict about a scientific hypothesis summarised in a single number is a very tempting perspective, even more so if it comes without the requirement to specify a prior first, which puts many practitioners off a Bayesian approach. As a bonus, there are apparently well established cutoff values so that the number can even be reduced to a binary “accept or reject” statement. Of course all this belies the difficulty of statistics and a proper account of the specifics of the situation.
As said in the 2016 ASA Statement, the P-value is an expression of the compatibility of the data with the null model, in a certain respect that is formalised by the test statistic. As such, I have no issues with tests and P-values as long as they are not interpreted as something that they are not. . . . It seems more difficult to acknowledge how models can help us to handle reality without being true, and how finding an incompatibility between data and model can be a starting point of an investigation how exactly reality is different and what that means. . . .
And then:
As statisticians we face the dilemma that we want statistics to be popular, authoritative, and in widespread use, but we also want it to be applied carefully and correctly, avoiding oversimplification and misinterpretation. That these aims are in conflict is in my view a major reason for the trouble with P-values, and if P-values were to be replaced by other approaches, I am convinced that we would see very similar trouble with them, and to some extent we already do.
Ultimately I believe that as statisticians we should stand by the complexity and richness of our discipline, including the plurality of approaches. We should resist the temptation to give those who want a simple device to generate strong claims what they want, yet we also need to teach methods that can be widely applied, with a proper appreciation of pitfalls and limitations, because otherwise much data will be analysed with even less insight. Making reference to the second quote above, we exactly need to “contradict ourselves” in the sense of conveying what can be done, together with what the problems of any such approach are.
That’s what we try to do in Regression and Other Stories!
The “second quote above” referred to close to the end was a student feedback that a colleague of mine once received for teaching a method and then also its limitations: “The lecturer contradicts herself.”
“Another important tension exists between the requirement for individual judgement and decision-making depending on the specifics of a situation, and the demand for automated mechanical procedures that can be easily taught, easily transferred from one situation to another, justified by appealing to simple general rules . . .”
Now it’s not clear to me whether Hennig wrote this or not, but I just wanted to point out that the statisticians on this blog have consistently denied that this tension exists. My take is that (1) the vast majority of scientific studies could be easily bucketized and analyzed using a standard statistical method, (2) any paper that introduces a new stats method should be about that method and not also about something else, and (3) the current free-for-all that has followed the disavowal of p-values is bad for science and is the fault of a statistical community that cannot seem to advance beyond bickering and does not want to do the heavy lifting of developing and endorsing standardized methods!
> My take is that (1) the vast majority of scientific studies could be easily bucketized and analyzed using a standard statistical method,
My take is every study involves a different process than every other study and hence every study needs a stats analysis tuned to the knowledge of what is going on in the study of interest.
There is no such thing as good canned statistics
You can can the statistics when the experiment is the same each time… Measuring contaminants in the water, doing blood tests in a lab, etc.
“Now it’s not clear to me whether Hennig wrote this or not” – why not? Andrew says so and also provides a link. ;-)
Matt:
Regarding your last sentence: I can’t speak for others, but as an author of three textbooks, a co-developer of statistical software, and the coiner of the phrase, “Statistics is the science of defaults,” I think I’ve done my part of the “the heavy lifting of developing and endorsing standardized methods!”
What is truth? What is evidence? When those questions become simple, then statistics may be simple. I don’t think that will ever happen – nor should it. These questions are about critical thinking, and this is difficult, ambiguous (to some extent), and requires context and judgement. Perhaps the failure is our educational system that tries to parcel statistics into discrete courses that emphasize specific mechanical skills. Then we build an entire edifice (academia and its associated research industry) on this misguided over-specialization.
I find this to be a really challenging problem in teaching ‘methods’ courses: If you don’t give the ‘plug-and-chug’ version, many people don’t feel like they’re learning statistics/methods. If you make it too high-level and conceptual, they don’t get enough hands-on feel to be confident doing it on their own. In my experience, though, teaching/being taught the mechanics of analysis in a classroom setting (i.e. in R/Stan etc.) has never been all *that* helpful since it is such a trial and error kind of process. But I don’t know if that reflects a self-selected world view since these are the sorts of things I like to do for fun and I know that most students don’t have the time/interest to engage in the kind of self-study that feels essential to me.
Devil’s Advocate:
Who cares? As long as everyone agrees that p<0.05 signifies something to talk about, I'm under no further obligation to understand p-values.
G:
The problem is that it is a mistake for people to “agree that p<0.05 signifies something to talk about." P-values are noisy data summaries that sometimes can yield a useful interpretation and sometimes can't.
The Devil’s Advocate doesn’t find this argument convincing. All measurements are by definition noisy. What are your standards for ‘Usefulness?’
G:
First, no, not all measurements are noisy, “by definition” or otherwise.
Second, I realize I can’t convince everyone. When Freakonomics promoted that paper, “Beautiful parents have more daughters,” I was unhappy because that research was nothing but noise mining—they might as well have just been casting horoscopes or reading the entrails of sheep—but they didn’t care. Noise mining is fun: it allows you to make all sorts of dramatic conclusions in the name of social science.
Anyway, yeah, I’ll make my arguments and do my best work, write textbooks, etc. I recognize that the world is full of people who won’t listen.
I think that Andrew and I differ on the meaning of the words “measurement” and “noisy”….
“sometimes can yield a useful interpretation and sometimes can’t.”
Ha, that’s great!!! Oh, man, too much brutal honesty. You’re hurting people (‘s incomes and egos and ahem probably even their raison d’etre)
Here’s the True Title for every statistics book:
Title: A Book on the Sometimes-It-Works-But-We-Don’t-Know-When Method of Inquiry
SubTitle: An Alternative Method To Science
Sub Sub Title: Suitable for Generating Equivocal Evidence for Just About Anything
Sub Sub Sub Title: (Once it’s Called “Evidence” It Can’t Be Ignored No Matter How Stupid The Method By Which It Was Obtained)
Explosion Callout One: No Knowledge Required!!!
Explosion Callout Two: Learn In Two Minutes!!!
Explosion Callout Three: Make Fun Colorful Plots!!
Explosion Callout Four: Sometimes Get A Correct Answer!!
Explosion Callout Five: Great For Peddling Bullshit!!
Explosion Callout Six: Create Your Own Fake Universe of Belief!!
I foresee separate editions for different markets: Academia (with special section on scoring grants); Policy Advocates (Industry & NGO); Industry Insiders (How to Fool Your CEO into Thinking Data Analysis Actually Works); Company IPOs (Why Your New Data Analysis Firm Has Amazing Secret Sauce Unlike The Other 5000 Data Analysis Firms With IPOs In The Last Year)
I guess we should regulate a plastic wrapper to prevent adjacent books from getting greasy.
Anon:
That’s just rude and I don’t appreciate it. I’ve written several books and hundreds of articles about how to apply statistical methods in various contexts. In the above comment, I was very specifically responding to someone’s remark that seemed to imply that it’s ok if “everyone agrees that p<0.05 signifies something to talk about." My point is not that "p<0.05" is always a useless statement but rather that it often cannot be interpreted the way many people want it to be. If you want more, you can read some of my books and articles for many many examples.
BTW my prior on the prevalence of this POV in biomedical research is 90%.
Statistics may be hard, but it has been my experience that most ‘consulting’ statisticians bring no more statistical expertise than a decent bench scientist and also fail to learn enough about the subject at hand. They might enhance a clinical study with an MD who has half a semester of biostatistics, but not by much. The gap between that level and actual expert practice seems to be huge.
The case against p-values seems to require that you already believe the case against p-values before you believe the arguments against p-values.
May I just note that the constant denigration of astrology in this blog has convinced me to study it, because any time an academician gets worked up about pseudoscience it probably means there’s something interesting going on?
With astrology, can you allow that it is a convenient way to attach stories to constellations so I can remember them and find them and better situate myself spatially in the universe? Plus, why do the stories seem to fit my particular situation at least as often as “scientific” stories?
If you think things that academics denigrate are actually good, I suggest you start with a perpetual motion machine. Once you get that working, you can use it to power your astrological studies. Let us know how it goes!
Are p values well-understood? Perhaps there are thousands who do understand them well, but I would wager that for every person who understands them, there are 10 who misunderstand them. So, I would argue that p values are among the “worst understood” concepts in statistics, perhaps even among concepts overall.
The question is how hard those who misunderstand them have tried. I’d say there is a potential to understand them well, which isn’t realised by most people, at least partly for the reasons given in my text. You find misunderstandings of pretty much all other statistical concepts if enough people use them.
The same tension existed for Calculus circa 1670. It required enormous art and tact to use, which meant few were able to. Today though, it’s taught to millions of freshmen worldwide. That change happened because the foundations of the subject were sorted out; fleshed out, corrected, and simplified.
Statistics is stuck in the “art and tact” phase because the foundations are a mess. So that natural progression from an “esoteric tool available to few” to “common tool available to many” never happened. It was derailed mostly by the Bayesian-Frequentist debacle of the 20th century.
Statistics textbooks are left in the awkward position of trying to pretend it’s a “common tool for many”, like Freshman calculus is, because it should be by now.
This won’t change soon though. Most Statisticians deny there’s much to be done on the foundations of statistics, in part because they don’t see a good way to improve it, and in part because they personally can use statistics well enough to get by. Each Statistician has their own little way of muddling through it all. Indeed, their paycheck often depends on them being able to muddle through it while most others cannot.
I believe modelling of uncertainty is essentially different from and more difficult than calculus. Calculus is deterministic and every result can easily be verified or falsified. Probabilities are about what could’ve happened other than what actually happened, and therefore refer to some extent to what is essentially unobservable.
I disagree for a bunch of reasons, but will just say that Calculus circa 1670 was at least as difficult to connect to the real world, and get right, as statistics is today. That changed, and there’s no reason to think statistics is inherently incapable of a similar change.
Thanks though for illustrating the complacency I was trying to get at in my last paragraph!
As long as you don’t give your “bunch of reasons” we cannot discuss. I do agree that there is much to be done about the foundations of statistics and teaching of statistics, and I try to do my bit (some of which you can find if you look around, though chances are you won’t agree with much of it), so accusing me of complacency is off the mark. Even if we want to improve we need to know our limitations.
Perhaps I was being too verbose. How about this?
“Statistics teaching is screwed up because Statistics is screwed up.”
Fix the latter, and the former won’t be a problem. It’s worked for every other subject.
I often use, and refer to Andrew’s quote “people want certainty, and they cannot have it”. Christian is describing the general attitude towards statistics, in science and in the public at large. It is always surprising to realize how disconnected statistics academia is from the realities in business and industry.
What Christian describes is several times accentuated in business and industry. Your role as a statistician is to help decision makers. This requires a life cycle engagement, from problem elicitation to communication of findings.
The pandemic forced statisticians to look carefully in the mirror and decide how, when and where to contribute. The tension between (i) a sense of responsibility that the statistics disciplines need to play a role in pandemic management and (ii) the gap between decision makers and the language of statistics produced a bridge that had to be crossed.
We recently published a review of our experience on this in Israel. See https://rdcu.be/cLMKZ
Three comments on Christian’s blog:
1. Statisticians are not alone in the analytic playground. We need to wake up to this new reality.
2. I like the Applications-Mathematics-Computation triangle in the Efron Hastie CUP book on Computer Age Statistical Inference (p. 448). We need to move away from the Mathematics corner.
3. Such a move requires openness to a discussion on methods to present and generalize findings (not only using parametric estimates or posteriors). For example, dealing with verbal claims (using plain language).
Gelman and Carlin’s S type error is useful when working with verbal representations of claims. I was glad to see it referred to in https://onlinelibrary.wiley.com/doi/full/10.1002/sim.9406. See my proposals in https://dl.acm.org/doi/abs/10.1007/s11192-021-03914-1
PS In the paper last quoted in my response the typesetters introduced a typo, sin type should be S type. A corrected version is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2355474
The great hope of statistics for me is that it _not_ be restricted to a small group of very highly qualified people but should be available – at some level – to anybody with a case to make or a problem to solve. See for example “Box on Quality and Discovery”. It would be both inefficient and unjust if the ability to obtain support from data was limited to those with a decade of specialised education or access to and familiarity with specialised computer software.
This discussion might benefit from a clearer (louder?) separation of statistical *methods* from statistical *thinking*. Whenever I hear someone trashing p-values, it kicks off a little internal dialogue where I have to remind myself that the speaker couldn’t possibly be saying that the *idea* of a Neyman interval is a bad way to think about a data set’s compatibility with a given hypothesis. To me, the concept of a Neyman interval is a core element of statistical thinking (yeah, yeah, from a frequentist point of view…), and p-values for a simple mean or a regression model are just examples that help illustrate the idea (and which can sometimes be informative in reality).
I recently read (skimmed) a report from a group called the World Weather Attribution initiative, which tries to inform the question of the extent to which global warming played a role in the latest weather disaster. For example, the Pacific Northwest heat dome of 2021 saw temperatures elevated about 30 deg F above their normal range for over a week. This appears to be the only time this has ever happened, at least since before the last ice age or something like that, so it’s the only data point of its kind. Anyway, one of their go-to tools is running meteorological models (presumably with different seeds or something like that), both with older historical parameters and with newer ones (which reflect the ~1 deg F warming we’ve seen so far), and they then check how frequently something like this happens under the two sets of assumptions. I can’t speak to the realism of the whole endeavor, but the general framework strikes me as something that is very natural to anyone who has absorbed the idea of Neyman intervals (am I giving Neyman too much credit, BTW?). My point is that the practicing scientists who developed the approach needed a good grounding in statistical thinking, but a statistics-is-hard mastery of a bunch of named methods could have been more hindrance than help.
There’s obviously nothing novel in my point – Andrew’s books bend over backwards to illustrate fundamental ideas through accessible simulations. But it does seem like the separation between fundamental ideas and implementation details gets lost in translation in a lot of discussion on the hardness of statistics (and probably in a lot of teaching/learning of statistics as well).
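The two-scenario comparison described in this comment can be sketched in a few lines. This is only a toy illustration, not the World Weather Attribution method: the normal distribution, the shift of 0.5, the threshold of 3.0, and the function name `exceedance_frequency` are all assumptions made up for the example.

```python
import numpy as np

def exceedance_frequency(mean, sd, threshold, n_runs, rng):
    """Fraction of simulated runs whose peak value reaches the threshold."""
    peaks = rng.normal(loc=mean, scale=sd, size=n_runs)
    return float(np.mean(peaks >= threshold))

rng = np.random.default_rng(0)
n_runs = 200_000

# Hypothetical settings: same threshold, two assumed "climates".
historical = exceedance_frequency(0.0, 1.0, threshold=3.0, n_runs=n_runs, rng=rng)
warmed = exceedance_frequency(0.5, 1.0, threshold=3.0, n_runs=n_runs, rng=rng)

print(f"frequency under historical settings: {historical:.5f}")
print(f"frequency under warmed settings:     {warmed:.5f}")
if historical > 0:
    print(f"ratio of frequencies: {warmed / historical:.1f}")
```

The point of the sketch is the framework rather than the numbers: the same event is counted under two sets of assumptions, and the comparison of frequencies is what gets interpreted.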
I should have brushed up my history on Wikipedia before I posted. The core idea I was talking about was the general null hypothesis approach (i.e., applying whatever modeling tools you have to check how consistent the observed data is with the null hypothesis, and doing so by operating your tools under the assumption that the null hypothesis is correct). Apparently this basic idea dates back to at least the early 1700s, when rejecting the null hypothesis sounded like this: “From whence it follows, that it is Art, not Chance, that governs.” Advanced age notwithstanding, I think this remains a core element of statistical thinking, and I’m not sure there are very many core ideas that I would consider similarly fundamental.
Also, I really wish I had taken Hennig’s ASA quote, that the p-value is just an “expression of the compatibility of the data with the null model, in a certain respect that is formalized by the test statistic,” as my starting point because it succinctly hints that the details of formalization in a particular test statistic are sometimes beside the point.
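One way to make “compatibility in a certain respect” concrete is a simulation-based p-value in which the test statistic is a replaceable argument. The sketch below is a minimal illustration under assumptions: the helper name `monte_carlo_p_value`, the standard normal null model, and the constructed data set are all hypothetical, and the p-value is one-sided (statistic at least as large as observed).

```python
import numpy as np

def monte_carlo_p_value(observed, simulate_null, statistic, n_sims, rng):
    """Share of null-model data sets whose statistic is at least the observed one."""
    t_obs = statistic(observed)
    t_null = np.array([statistic(simulate_null(rng)) for _ in range(n_sims)])
    # Add-one correction keeps the estimate strictly positive.
    return float((1 + np.sum(t_null >= t_obs)) / (1 + n_sims))

# Hypothetical data: symmetric around zero, with two extreme values.
observed = np.concatenate([np.linspace(-1.5, 1.5, 28), [6.0, -6.0]])

def null_model(rng):
    # Assumed null model: independent standard normal observations.
    return rng.normal(size=len(observed))

rng = np.random.default_rng(1)
p_mean = monte_carlo_p_value(observed, null_model, np.mean, 5000, rng)
p_max = monte_carlo_p_value(observed, null_model, np.max, 5000, rng)

print(f"p-value with the mean as test statistic:    {p_mean:.4f}")
print(f"p-value with the maximum as test statistic: {p_max:.4f}")
```

The same data and the same null model are compared in two different respects. The mean sees nothing unusual because the extremes cancel, while the maximum flags a value the null model would almost never produce.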
This is an interesting point. I believe though that what is hard is more than just technical details; even more or less basic statistical thinking is hard to some extent, particularly when it comes to the role that models play. One hard issue is the interpretation that is to be given to the rejection (i.e. incompatibility with data) of a model. Usually the substantial interpretation of a model relies on some but not all technical characteristics of the model, and incompatibilities are usually interpreted as having refuted the substantial meaning of the model, but may in fact be due to details of the model that do not contribute to that meaning. Whether or not this is the case depends on the specific situation in ways that are hard to grasp for the non-specialist and often even for the specialist. (Similarly, results of any Bayesian analysis may rely on some details of the model specification that are not properly grounded in prior information and may be ignored when interpreting results.)
That makes sense, and is a good warning for me that my impulse to separate “fundamental ideas” from “implementation details” risks breezing past some seriously important issues. In my first read of the original post, I think I missed the significance of the statement about how “models help us handle reality without being true,” and your comment here helps me see it better. It’s hard to imagine teaching this kind of subtlety without a concrete model (or two or three) in hand, but I keep gravitating to the idea that the problem is largely that course curricula tend to fill up with formal methods at the expense of focused emphasis on more basic ideas.
Your point that formal tests don’t distinguish between rejecting the substantial meaning of a model versus rejecting some implementation detail is a good one. Some insights on this distinction may be implicit in a typical regression course segment on model building, but the issue seems to need explicit emphasis and discussion if students are to see how the problem hides in things like simple mean models or vastly complex climate models.
My main point is that distinctions like this can be accessible to pretty much anyone at a sort of common sense level. The entry point can be as simple as comparing results from a simple mean model and a couple of simple least-squares models that each attempt to control for a single covariate, or comparing results from a few different flavors of a complex climate model. Once you get into it, you can’t help but also notice the role of multiple comparisons.
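The entry point described in this comment might look like the following sketch. The data are simulated, so the “true” treatment effect of 1.0, the covariates `x1` and `x2`, and the helper `treatment_coefficient` are all assumptions invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical data: treatment is more likely when x1 is high, and x1 also
# raises the outcome, so a raw comparison of means mixes the two effects.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # unrelated to treatment assignment
treated = (x1 + rng.normal(size=n) > 0).astype(float)
y = 1.0 * treated + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def treatment_coefficient(y, treated, covariate=None):
    """Least-squares coefficient on `treated`, optionally adjusting for one covariate."""
    columns = [np.ones_like(y), treated]
    if covariate is not None:
        columns.append(covariate)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

raw = treatment_coefficient(y, treated)         # equals the difference in group means
adj_x1 = treatment_coefficient(y, treated, x1)  # controls for x1 only
adj_x2 = treatment_coefficient(y, treated, x2)  # controls for x2 only

print(f"simple mean comparison: {raw:.2f}")
print(f"adjusting for x1:       {adj_x1:.2f}")
print(f"adjusting for x2:       {adj_x2:.2f}")
```

Three defensible-looking analyses of the same data give different answers, and only the simulation setup reveals which one is close to the assumed effect. With real data that knowledge has to come from the subject matter.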
And, while these basic ideas of statistical thinking are hard enough that they require time and focused attention to master, I just don’t think they’re hard in a way that puts them out of reach of anyone or that requires effort disproportional to their value for future scientists (or newspaper readers, for that matter).
Josh: You won’t know how difficult it is to explain the role that models play until you try, and if you find out why or succeed let us know.
My recent attempt with a quote from Christian in the next comment.
C. Hennig quote “The fact that P-values (and statistical reasoning in general) regard idealised models that are different from reality seems to be hard to stomach and easy to ignore; contrarily sometimes this is interpreted as testifying the uselessness of P-values (or frequentist statistical inference in general). It seems more difficult to acknowledge how models can help us to handle reality without being true, and how finding an incompatibility between data and model can be a starting point of an investigation how exactly reality is different and what that means. For this, a test gives a rough direction (such as “the mean looks too large”), which can be useful, but is certainly limited as information.”
The primary purpose of statistics is to make future sense of current observations given we need to best anticipate what to expect in the future. More informed expectations should allow one to interact with the world with less frustration and surprise. It’s all about profiting from observations in hand as well as those that can be obtained.
This purpose of statistics may be hard for many to discern given the plethora of weird formulas, procedures and rules usually encountered in most courses, talks and books on statistics. It seems like just complicated mathematics. But those are just some of the many tools that can be useful to achieve more informed expectations, if and only if they are clearly understood for that purpose.
In order to understand the purpose of statistical methods, the why, how and what for, an adequate grasp of empirical scientific inquiry is required. Fortunately that is just everyday thinking made more explicit and controlled. Current neurological research even suggests scientific inquiry happens in all animals. To survive they need to obtain informed expectations about what will happen in their environment. However, that likely only becomes conscious (explicit) and controlled in humans. Briefly, sensations about their environment need to be internally used to represent aspects of the environment and then those representations interpreted as a model of the environment in order to select a survival enhancing response. Eat or run.
Now, the almost insurmountable challenge statistics has to address is that all observations are about the past, but we need to deal with the future. The past is dead. It is helpful to be aware of the past, but we need informed expectations of what will happen in the future, and for many interesting and important things we know the future will not be exactly like the past. Below we offer some metaphors that might make this challenge clearer, as well as explicate how statistics addresses it.
Metaphors represent or present a parallelism in something that then attempts to transport an understanding from a more familiar area (a [schema](https://en.wikipedia.org/wiki/Schema_(psychology)) ) to a less familiar area. In statistics, the attempt is to transport an understanding of what to make of a familiar experience such as seeing shadows to what to make of outputs from a statistical analysis.
Think of learning about an object – just from the shadows it casts – while being unable to look directly at the object. We see those shadows but really are only interested in what is actually casting them. In statistics, observed experimental results are the shadows and we want to get some sense of what will cast them in the future – informed future expectations. The metaphorical parallel: we don’t take shadows as real but rather just connected to something that is real that we can’t see; the sample mean observed in a study is not the true mean but just a noisy estimate (shadow) of the true mean that we can’t “see” that informs what our future expectations should be.
Analytical chemistry may be another helpful metaphor for statistics. Various machine readings are made of a chemical present, say in water in a test tube, that has been spiked with an exactly known amount (just that amount is added). How do the noisy machine readings (e.g. InfraRed spectroscopy) relate to the spiked amount known to be present, which again usually cannot be directly seen or measured once put into the test tube? The machine’s readings are the shadows here. The metaphorical parallel: machine readings are only informative when situated within the distribution of differences from the spiked amount that would be repeatedly observed (quantified schema); observed data should only be taken as informative in terms of what would be repeatedly observed. That is understood in the context of the future sampling distribution (quantified schema such as a predictive distribution). So to use observations to get informed future expectations, a sense of what would be repeatedly observed is required.
Scientific (experimental) statistics is not really like analytic chemistry. No one can simply spike, say, a known rate at which rats will develop cancer. So statistics must make use of abstract representations of such possible realities – these are just possible worlds with set cancer rates. Such abstract possible worlds are purely mathematical. An abstraction we make to be exactly what we want it to be.
Now, the sole purpose of mathematics is to understand some or all hypotheses concerning the forms of relations in some abstract construction. Often referred to as the implications of the assumptions or model, in the end all one can do is discern the “reality” of an abstract possible world if it was as we imagined it to be.
All mathematical knowledge thus has a hypothetical structure: if such and such entities and structures are supposed to exist, then this and that follows. But in statistics we want to learn about the actual world so we can choose actions that are unlikely to lead to frustration and surprise. However we are stuck having to use mathematics to discern what would repeatedly happen. Recall one can only put one’s foot in the same river once. However the mathematics is just an initial step that is often an impediment rather than helpful for most. Statistics is not mathematics even though it must use mathematics. Fortunately, with modern computation the mathematics for applied statistics need not be hard to understand.
Now, any reliable way to discern the hypothetical structure of an abstraction is acceptable mathematics and fortunately most of the mathematics used in applied statistics can be recast visually into diagrams. With these diagrams, possible future observations can be simulated from abstract possible worlds and the need for formulas and mathematical analysis can be avoided, while clearly discerning what would repeatedly happen in such a possible world. So the mathematics impediment can be avoided but we still are only learning about abstract possible worlds. How do we address the real world with what was learned in such abstractions?
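As a rough sketch of simulating what would repeatedly happen in an abstract possible world, consider possible worlds with set cancer rates. Everything numerical here is an assumption chosen for illustration: the study size of 50 rats, the three candidate rates, and the binomial model itself.

```python
import numpy as np

rng = np.random.default_rng(4)
n_rats = 50          # hypothetical study size
n_repeats = 10_000   # number of times the study is "repeated" in each world

mean_counts = {}
for cancer_rate in (0.05, 0.10, 0.20):  # hypothetical possible worlds
    # Each draw is the number of rats developing cancer in one repeated study.
    counts = rng.binomial(n=n_rats, p=cancer_rate, size=n_repeats)
    lo, hi = np.percentile(counts, [2.5, 97.5])
    mean_counts[cancer_rate] = float(counts.mean())
    print(f"rate {cancer_rate:.2f}: mean count {counts.mean():.1f}, "
          f"central 95% of repeats from {lo:.0f} to {hi:.0f}")
```

No formula is needed to see which counts each possible world tends to produce; an observed count can then be judged against those simulated repeats.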
The first judgment required is to define abstract possible worlds that are not too different from our world in respect to aspects that affect what we are trying to learn about. (Picasso joke*) But that is always fallible as we never really know how the world actually is. We just can’t “see” it as it is but only receive sensations about it. So we iterate, remake and rejudge possible worlds that we construct and simulate from – do the repeated observations look like they could have come from our world? If not, modify the possible world we defined until the repeated observations do look like they could have come from our world. Then we have to make an unavoidable bet that the current possible world we defined is close enough to our world. At least for adequately learning about some aspect of our world to some degree of error. If it is, we have what statistician George Box coined a useful but still wrong model (representation). Unfortunately the only real signal we will ever get about our choice of a possible world is when we later notice it has misled us.
Fortunately statistics does have a secret weapon that statisticians have become more aware of and are making much more use of these days, in large part due to ever increasing computational speeds. Simulation, the same technique we outlined above to remove mathematical impediments. It is a secret weapon because we can simulate observations from possible world A and then analyze those simulated observations assuming we believed they came from possible world B. Experience the effectiveness of a wrong model. Here it will be known just how wrong possible world B is compared to possible world A. This is because we have simulated from possible world A so that A is the “true” world and hence we can assess how that wrong model B of a given degree retarded what we learned about possible world A. Possible world A is a surrogate for the real world that was fallibly analyzed by the statistician who assumed the simulated observations were from possible world B.
Now, statistics almost exclusively uses probability models to represent possible worlds with regard to what would be repeatedly observed with what frequency. With more space and perhaps highly motivated readers this could be worked up into thought experiments to show how [diagrammatical reasoning](https://en.wikipedia.org/wiki/Diagrammatic_reasoning) can provide a vivid view of the logic of statistics. Where that logic is simply understood as using probability models to explicitly or implicitly represent a possible world in a mathematical form in order to discern what would be repeatedly observed in that possible world. From this it will be clearer that statistics in general can be seen as formalizing ways to learn from observations using mathematics rather than being mathematics.
@Josh: I may be misunderstood as saying that statistics should be left to highly trained experts (it may not be you who misunderstand me this way, but it has occurred in this thread before, and your “I just don’t think…” makes me wonder).
This is not what I want to say, even though I can see how I can be read in this way. For me it is a given that data are analysed and statistics is applied not only by experts, and I don’t believe anybody else is doomed to get things wrong. When I work with students or advisees, I try to be always optimistic about what they can understand, and for sure we should make our best attempt to teach and convey our discipline. I do believe that statistics is very hard; in fact it is also hard for us as experts (which we often don’t like to admit), but that doesn’t mean we or anyone else shouldn’t try. I’m not using this in an elitist manner to separate experts in the know from ignorant non-experts; rather I think that part of the proper understanding is to understand the limits of what statistics can do, and the limits of our own understanding and knowledge. This, I’d think, is within reach at any level.
Not sure whether you followed Andrew’s link to my full posting, but it starts with this quote:
“I work on Multidimensional Scaling for more than 40 years, and the longer I work on it, the more I realise how much of it I don’t understand. This presentation is about my current state of not understanding.” (John Gower, world leading expert on Multidimensional Scaling, at a conference in 2009)
Hennig quote “The fact that P-values (and statistical reasoning in general) regard idealised models that are different from reality seems to be hard to stomach and easy to ignore; contrarily sometimes this is interpreted as testifying the uselessness of P-values (or frequentist statistical inference in general). It seems more difficult to acknowledge how models can help us to handle reality without being true, and how finding an incompatibility between data and model can be a starting point of an investigation how exactly reality is different and what that means. For this, a test gives a rough direction (such as “the mean looks too large”), which can be useful, but is certainly limited as information.”
The primary purpose of statistics is to make sense of current observations for the future, given that we need to anticipate as well as we can what to expect. More informed expectations should allow one to interact with the world with less frustration and surprise. It’s all about profiting from observations in hand as well as those that can be obtained.
This purpose of statistics may be hard for many to discern given the plethora of weird formulas, procedures and rules usually encountered in most courses, talks and books on statistics. It can seem like just complicated mathematics. But those are just some of the many tools that can be useful for achieving more informed expectations, if and only if they are clearly understood for that purpose.
In order to understand the purpose of statistical methods, the why, how and what for, an adequate grasp of empirical scientific inquiry is required. Fortunately that is just everyday thinking made more explicit and controlled. Current neurological research even suggests scientific inquiry happens in all animals. To survive they need to obtain informed expectations about what will happen in their environment. However, that likely only becomes conscious (explicit) and controlled in humans. Briefly, sensations about their environment need to be used internally to represent aspects of the environment, and those representations then interpreted as a model of the environment in order to select a survival-enhancing response. Eat or run.
Now, the almost insurmountable challenge statistics has to address is that all observations are about the past, but we need to deal with the future. The past is dead. It is helpful to be aware of the past, but we need informed expectations of what will happen in the future. And for many interesting and important things, we know the future will not be exactly like the past. Below we offer some metaphors that might make this challenge clearer, as well as explicate how statistics addresses it.
Metaphors present a parallel that attempts to transport an understanding from a more familiar area (a [schema](https://en.wikipedia.org/wiki/Schema_(psychology))) to a less familiar area. In statistics, the attempt is to transport an understanding of what to make of a familiar experience, such as seeing shadows, to what to make of outputs from a statistical analysis.
Think of learning about an object – just from the shadows it casts – while being unable to look directly at the object. We see those shadows but really are only interested in what is actually casting them. In statistics, observed experimental results are the shadows and we want to get some sense of what will cast them in the future – informed future expectations. The metaphorical parallel: we don’t take shadows as real but rather just connected to something that is real that we can’t see; the sample mean observed in a study is not the true mean but just a noisy estimate (shadow) of the true mean that we can’t “see” that informs what our future expectations should be.
Analytical chemistry may be another helpful metaphor for statistics. Various machine readings are made of a chemical present, say in water in a test tube, that has been spiked with an exactly known amount (just that amount is added). How do the noisy machine readings (e.g. infrared spectroscopy) relate to the spiked amount known to be present, which again usually cannot be directly seen or measured once put into the test tube? The machine’s readings are the shadows here. The metaphorical parallel: machine readings are only informative when situated within the distribution of differences from the spiked amount that would be repeatedly observed (quantified schema); observed data should only be taken as informative in terms of what would be repeatedly observed. That is understood in the context of the future sampling distribution (a quantified schema such as a predictive distribution). So to use observations to get informed future expectations, a sense of what would be repeatedly observed is required.
Scientific (experimental) statistics is not really like analytical chemistry. No one can simply spike, say, a known rate at which rats will develop cancer. So statistics must make use of abstract representations of such possible realities – these are just possible worlds with set cancer rates. Such abstract possible worlds are purely mathematical: an abstraction that we make to be exactly what we want it to be.
Now, the sole purpose of mathematics is to understand some or all hypotheses concerning the forms of relations in some abstract construction. This is often referred to as working out the implications of the assumptions or model. In the end, all one can do is discern the “reality” of an abstract possible world if it was as we imagined it to be.
All mathematical knowledge thus has a hypothetical structure: if such and such entities and structures are supposed to exist, then this and that follows. But in statistics we want to learn about the actual world so we can choose actions that are unlikely to lead to frustration and surprise. However, we are stuck having to use mathematics to discern what would repeatedly happen. Recall that one can only put one’s foot in the same river once. Yet the mathematics is just an initial step, and for most people it is often an impediment rather than a help. Statistics is not mathematics even though it must use mathematics. Fortunately, with modern computation the mathematics for applied statistics need not be hard to understand.
Now, any reliable way to discern the hypothetical structure of an abstraction is acceptable mathematics, and fortunately most of the mathematics used in applied statistics can be recast visually into diagrams. With these diagrams, possible future observations can be simulated from abstract possible worlds, and the need for formulas and mathematical analysis can be avoided while still clearly discerning what would repeatedly happen in such a possible world. So the mathematics impediment can be avoided, but we are still only learning about abstract possible worlds. How do we address the real world with what was learned in such abstractions?
The first judgment required is to define abstract possible worlds that are not too different from our world with respect to aspects that affect what we are trying to learn about. (Picasso joke*) But that is always fallible, as we never really know how the world actually is. We just can’t “see” it as it is but only receive sensations about it. So we iterate, remaking and rejudging the possible worlds that we construct and simulate from: do the repeated observations look like they could have come from our world? If not, we modify the possible world we defined until the repeated observations do look like they could have come from our world. Then we have to make an unavoidable bet that the current possible world we defined is close enough to our world, at least for adequately learning about some aspect of our world to some degree of error. If it is, we have what statistician George Box called a useful but still wrong model (representation). Unfortunately, the only real signal we will ever get about our choice of a possible world is when we later notice it has misled us.
Fortunately, statistics does have a secret weapon that statisticians have become more aware of and are making much more use of these days, in large part due to ever-increasing computational speeds: simulation, the same technique outlined above for removing mathematical impediments. It is a secret weapon because we can simulate observations from possible world A and then analyze those simulated observations as if we believed they came from possible world B. We get to experience the effectiveness of a wrong model. Here it is known just how wrong possible world B is compared to possible world A, because we simulated from possible world A, so A is the “true” world. Hence we can assess how much a wrong model B, wrong to a given degree, hindered what we learned about possible world A. Possible world A is a surrogate for the real world, fallibly analyzed by a statistician who assumed the simulated observations were from possible world B.
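A minimal Python sketch of this A-versus-B exercise, under assumptions that are not in the comment itself: world B is a normal model analyzed with a standard 95% t-interval for the mean, world A is either that same normal world or a skewed lognormal world, and the sample size, seed and number of repetitions are arbitrary illustrative choices. It only shows the mechanics of scoring a wrong model against a known "true" world.

```python
import math
import random
import statistics

T_CRIT_DF19 = 2.093  # two-sided 95% Student-t critical value, 19 degrees of freedom
N = 20               # observations per simulated study (hypothetical choice)


def t_interval_covers(sample, true_mean):
    """Analyze a sample as if it came from a normal world (world B)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - T_CRIT_DF19 * se <= true_mean <= m + T_CRIT_DF19 * se


def coverage(draw, true_mean, reps=4000, seed=1):
    """Fraction of simulated studies whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = sum(
        t_interval_covers([draw(rng) for _ in range(N)], true_mean)
        for _ in range(reps)
    )
    return hits / reps


# World A equals world B: normal observations, so the analysis model is right.
cover_right = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# World A is skewed (lognormal) but is still analyzed as if it were normal.
# The mean of a lognormal(0, 1) distribution is exp(0.5).
cover_wrong = coverage(
    lambda rng: rng.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5)
)

print(f"coverage when A matches B: {cover_right:.3f}")
print(f"coverage when A is skewed: {cover_wrong:.3f}")
```

Because A is known exactly, the gap between the two coverage figures is a direct measure of how much the wrong model B costs for this particular question.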
Now, statistics almost exclusively uses probability models to represent possible worlds with regard to what would be repeatedly observed with what frequency. With more space and perhaps highly motivated readers this could be worked up into thought experiments to show how [diagrammatical reasoning](https://en.wikipedia.org/wiki/Diagrammatic_reasoning) can provide a vivid view of the logic of statistics, where that logic is simply understood as using probability models to explicitly or implicitly represent a possible world in a mathematical form in order to discern what would be repeatedly observed in that possible world. From this it will be clearer that statistics in general can be seen as formalizing ways to learn from observations using mathematics rather than being mathematics. However, we can provide a brief overview or introduction.
That’s very elaborate to be put as comment 32 or so under a blog post, but I’m honoured that you put it under this one. I’m not sure if I’d be as optimistic as you seem to be about “avoiding mathematical analysis” – ultimately the workings of diagrams and simulations you seem to have in mind here rely on mathematics, don’t they? How far can the understanding of these “items” go without knowledge of the underlying mathematics? Don’t get me wrong, I’m all in favour of using diagrams and simulations to make statistical reasoning more accessible and comprehensible, yet there are limits to this, and I suspect at some point a deeper delve into the maths may still be required.
While I can’t speak for Keith, my own thinking about pedagogy has evolved to be similar to his. My impression is that mathematical analysis comprises a kind of “omnibus simulation”. Imagine you want to know how likely it is for the mean in one experimental condition to be greater than in another. One way to come by this estimate is to construct a simulation of how you think the data in each condition would come about (so considering the variability among cases, measurement error, plausible effects, etc.) and run this many times. Another way is to use analytic expressions for the variability, measurement error, effect distribution, and then use this to derive the borders of the region of the parameter space in which your condition holds. The mathematics tells you how an entire set of simulations would turn out, but you could (at least, in principle) find that out by simulating directly.
Both ways eventually lead you to the same understanding, although they have pluses and minuses. The simulation approach is more flexible and direct, but can be very inefficient if there are many parameters. The analytic approach can be more comprehensive and sometimes faster, but requires that predictions be derivable analytically, which restricts the kinds of models you can consider.
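As a rough illustration of the two routes, here is a Python sketch that estimates the chance that one condition's sample mean exceeds the other's, once by simulation and once analytically. The setup is hypothetical (normal outcomes, equal standard deviation, and made-up values for `EFFECT`, `SIGMA` and `N`); it is not taken from any analysis discussed in this thread.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical setup: two conditions, normal outcomes, common standard deviation.
EFFECT = 0.5   # true difference in means (condition 1 minus condition 2)
SIGMA = 1.0    # standard deviation of individual observations
N = 10         # observations per condition


def prob_by_simulation(reps=20000, seed=2):
    """Simulate many experiments and count how often mean 1 beats mean 2."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        mean1 = fmean(rng.gauss(EFFECT, SIGMA) for _ in range(N))
        mean2 = fmean(rng.gauss(0.0, SIGMA) for _ in range(N))
        wins += mean1 > mean2
    return wins / reps


def prob_by_analysis():
    """Use the fact that mean1 - mean2 ~ Normal(EFFECT, SIGMA * sqrt(2 / N))."""
    sd_diff = SIGMA * (2 / N) ** 0.5
    return 1 - NormalDist(EFFECT, sd_diff).cdf(0.0)


simulated = prob_by_simulation()
analytic = prob_by_analysis()
print(f"simulation: {simulated:.3f}  analysis: {analytic:.3f}")
```

The analytic line is only available because the normal assumptions make the difference of means tractable; the simulation would keep working if `rng.gauss` were swapped for a messier data-generating process.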
In my view, simulation and mathematical analysis are two roads to the same understanding of how the parameters of a system relate to its behavior. Given that both are useful, I think it is important for students to get training in both approaches, although I think it is often better to start with simulation. But the ultimate form by which you represent your understanding of the system need not itself be in terms of simulation or math.
I agree, except for this: “But the ultimate form by which you represent your understanding of the system need not itself be in terms of simulation or math.” I am not sure what you mean by it in a statistical context.
A representation is an abstraction, and to best grasp its implications one should do diagrammatic reasoning involving some form of math, either simulation or algebraic analysis. A vague grasp might come by some form of translation or analogy to a re-representation whose implications are better grasped? An example might be trying to intuitively reason out whether Monty opening a door is informative in the Monty Hall problem.
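For the Monty Hall example, a short Python simulation sketch under the standard assumptions (three doors, the host knows where the car is and always opens a goat door other than the contestant's pick); the function names, seed and repetition count are illustrative only.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty knowingly opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, reps=20000, seed=3):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(reps)) / reps


rate_switch = win_rate(switch=True)
rate_stay = win_rate(switch=False)
print(f"switch: {rate_switch:.3f}  stay: {rate_stay:.3f}")
```

Writing out the `opened` line forces one to state how Monty chooses, which is exactly the part that intuition tends to skip.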
> the ultimate form by which you represent your understanding of the system need not itself be in terms of simulation or math
Here, I was trying to emphasize that simulation and mathematics are tools for communicating with others and ourselves, for getting stuff “outside the head” where it is visible for all to see. I realize “representation” is an overloaded term, but I was trying to distinguish between external representations (like sim and math) and internal cognitive representations “inside the head”, as it were. So you might have some cognitive representation of a system that can be expressed in math or in the ability to construct a simulation, but maybe it is also expressed via ability to control the system (e.g., driving a car, kicking a field goal, etc.).
> to best grasp its implications one should do diagrammatic reasoning
When you say diagrammatic reasoning, are you thinking specifically of flow charts? Like bubbles connected with arrows?
I feel like diagrams can often be super misleading, even in systems where it seems like they would be useful (is information being sent from A to B? or B to A? or both? does an arrow from A to B mean A pushes information to B or B pulls information from A? or just mostly one of those two?).
My gut feeling here is I’d like pseudo-code better, though for the diagrams I know are confusing I don’t have clear pseudocode to convince myself that is better.
Anyway, are your diagrams actual literal graphical diagrams or are you including code/pseudocode as part of this?
gec – thanks, yes I am thinking of external representations as who knows what is going on in the head.
Ben – this would be a start: https://en.wikipedia.org/wiki/Diagrammatic_reasoning However, other media for working out the implications of abstractions, such as code and algebra, may be better for some abstractions or for some people.
There are many examples of misleading proofs; Andrew has claimed responsibility for providing at least one of those.
Ah I see — thanks Keith!
With regard to the use of math in general, statistical or otherwise, it would be useful for statisticians to understand the history of efforts to understand the geometry and dynamics of the solar system over time. The Copernican revolution wasn’t about new mathematical techniques. It was about a better conceptual model of the solar system – which required people to accept the previously unthinkable premise that the earth moves.
The point is that the math was consistent with several different concepts, but it couldn’t find the answer on its own. Once the correct conceptual model was in play, eventually refinements in measurement and math confirmed it. But without the correct conceptual model the math couldn’t do the job.
Ultimately whatever you’re modeling, without accurate conceptual models that are based on accurate observations of relationships, you’re doomed. Statistics alone doesn’t answer questions and it never will.
> the history of efforts to understand the geometry and dynamics of the solar system over time
Great example of scientific reasoning informed by statistics.
> very elaborate to be put as comment
I did provide a warning ;-)
> I suspect at some point a deeper delve into the maths may still be required.
More maths will always be helpful, at least up to some point. Somehow what I write seems to suggest to many that I am being dismissive of math. Rather, just like gec below, I think that simulation is the better place to start.
> workings of diagrams and simulations you seem to have in mind here rely on mathematics … How far can the … without knowledge of the underlying mathematics?
Some mathematicians/philosophers argue that diagrammatical reasoning, or experimenting on diagrams, is math, and some even that it is the highest form of math. It was widely disparaged until recently, when some showed it could be as rigorous as any form of math once the diagrammatical reasoning is fully developed. See Reasoning with Diagrams. Editors Amirouche Moktefi and Sun-Joo Shin. https://link.springer.com/book/10.1007%2F978-3-0348-0600-8 Given these arguments, it can go all the way, though that may not be the most effective/efficient way.
Wow, I really like the shadow analogy! But as you describe it, it’s a little too simple.
In many statistical applications, the shadow is potentially caused by dozens of entities in close proximity, potentially ranging in size from ants to mountains. In many applications, scientists don’t know how many entities and don’t know their size, nor the relative position of the light casting the shadow. In others, scientists may know some of the entities and have a rough idea of the size, but still don’t know how many additional entities or their size. The challenge is to separate the component and if possible the exact outline of the total shadow that belongs to each entity.
Using this analogy we can think of, for example, a drug trial as inserting a known entity into an unknown mix and attempting to distinguish its shadow. We know the entity casts a shadow (the drug shows some effectiveness in simple chemical tests or roughly analogous contexts), but we don’t know how large the shadow will be relative to the other entities in the mix. If the entity is a relative elephant and the mix is a relative group of rodents, we can identify the entity and claim some success in the trial. However, if the entity is a deer and the mix is a group of elk (slightly larger than deer), we won’t be able to identify its unique shadow.
The other key point in your discussion:
“In order to understand the purpose of statistical methods, the why, how and what for, an adequate grasp of empirical scientific inquiry is required.”
Excellent. Yes. Without that, methods are irrelevant.
Thanks.
I found the shadow analogy awkward to make increasingly realistic, not that it could not be done.
But at some point you likely will want to move on to abstract possibilities that some of us sometimes refer to as fake worlds. Those allow you far more flexibility.
One way to make the shadow analogy more realistic is to note that least-squares regression is literally the projection (a type of shadow if ever there was one!) of the dependent variable onto the space spanned by the explanatory variables. This doesn’t really fit the analogy as you’ve set it up though because in the least-squares projection experimental results would be “real” and their projection onto explanatory variables would be the shadows. Your set-up puts it the other way around, which kind of makes me think of structural equation modeling, though I don’t know enough about SEM to say anything of substance on that.
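The projection claim can be checked numerically. The following Python sketch uses made-up data and a single explanatory variable plus an intercept (all values are hypothetical); it fits least squares with the textbook closed-form slope and intercept and then verifies that what is left over is orthogonal to the space the fit lives in.

```python
import random

rng = random.Random(4)
x = [rng.uniform(0, 10) for _ in range(50)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical data

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form simple least-squares estimates.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]      # the "shadow" of y
residual = [yi - fi for yi, fi in zip(y, fitted)]  # what the shadow leaves out

# Orthogonality: residuals have zero inner product with each spanning vector.
dot_const = sum(residual)
dot_x = sum(r * xi for r, xi in zip(residual, x))

# Pythagoras: squared length of y splits into fitted plus residual parts.
len_y = sum(yi ** 2 for yi in y)
len_parts = sum(fi ** 2 for fi in fitted) + sum(r ** 2 for r in residual)

print(f"residual . 1 = {dot_const:.2e}, residual . x = {dot_x:.2e}")
```

Both inner products come out at floating-point zero, which is what "projection onto the space spanned by the explanatory variables" means in practice.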
Josh: Yes, you have repositioned the shadow analogy into the universe of mathematics where one can discern the “reality” of an abstract possible world if it was as we imagined it to be.
I left the shadow metaphor squarely in the universe of existences – some thing casts the shadow and the shadow blocks physical light. The analytic chemistry metaphor starts with something put into a test tube and the machine readings are connected to that (an index). Then I move to the universe of mathematical abstractions – no “direct connection” to anything in the universe of existences.
The whole challenge, or raison d’être, of statistics is to somehow traverse the universes of mathematical abstractions and existences: the need to make sense of observations for the future.
Now C. S. Peirce argued for three kinds of reals: the universe of existences that everyone accepts, the universe of mathematical abstractions (less mystical? than Platonic realism), and the universe of “would be’s”, which I currently understand (incorrectly?) as “would be understood as it should after potentially infinite adequate enquiry”, or fully comprehensible. Both are just ideals that may never be realized.
To me, that is what is needed to understand statistical practice – fully.