Is “abandon statistical significance” like organically fed, free-range chicken?

The question: is good statistics scalable?

This comes up a lot in discussions on abandoning statistical significance, null-hypothesis significance testing, p-value thresholding, etc. I recommend accepting uncertainty, but what if it’s decision time—what to do?

How can the world function if the millions of scientific decisions currently made using statistical significance somehow have to be done another way? From that perspective, the suggestion to abandon statistical significance is like a recommendation that we all switch to eating organically fed, free-range chicken. This might be a good idea for any of us individually or with small groups, but it would just be too expensive to do on a national scale. (I don’t know if that’s true when it comes to chicken farming; I’m just making a general analogy here.)

Even if you agree with me that null-hypothesis significance testing is almost always a bad idea, that it would be better to accept uncertainty and propagate it through our decision-making process rather than collapsing the wavefunction with every little experiment; even if you agree that current practices of reporting statistically significant comparisons as real and non-significant comparisons as zero are harmful and impede our scientific understanding; even if you’d rather use prior information in the steps of inference and reporting of results; even if you don’t believe in ESP, himmicanes, ages ending in 9, embodied cognition, and all the other silly and unreplicated results that were originally sold on the basis of statistical significance; even if you don’t think it’s correct to say that stents don’t work just because p was 0.20; even if . . . etc. . . . even if all that, you might still feel that our proposal to abandon statistical significance is unrealistic.

Sure, sure, you might say: it’s fine for researchers who have the luxury to propagate their uncertainty, but what if they need to make a decision right now about which ideas to pursue next? Sure, sure, null-hypothesis significance testing is a joke, and Psychological Science has published a lot of bad papers, but journals have to do something, they need some rule, right? And there aren’t enough statisticians out there to carefully evaluate each claim. It’s not like every paper sent to a psychology journal can be sent to Uri Simonsohn, Greg Francis, etc., for review.

So, the argument goes, yes, there’s a place for context-appropriate statistical inference and decision making, but such analyses have to be done one at a time. Artisanal statistics may be something for all researchers to aspire to, but in the here and now they need effective, mass-produced tools, and p-values and statistical significance are what we’ve got.

My response

McShane, Gal, Robert, Tackett, and I wrote:

One might object here and call our position naive: do not editors and reviewers require some bright-line threshold to decide whether the data supporting a claim is far enough from pure noise to support publication? Do not statistical thresholds provide objective standards for what constitutes evidence, and does this not in turn provide a valuable brake on the subjectivity and personal biases of editors and reviewers?

We responded to this concern in two ways.

First:

Even were such a threshold needed, it would not make sense to set it based on the p-value given that it seldom makes sense to calibrate evidence as a function of this statistic and given that the costs and benefits of publishing noisy results vary by field. Additionally, the p-value is not a purely objective standard: different model specifications and statistical tests for the same data and null hypothesis yield different p-values; to complicate matters further, many subjective decisions regarding data protocols and analysis procedures such as coding and exclusion are required in practice and these often strongly impact the p-value ultimately reported.

Second:

We fail to see why such a threshold screening rule is needed: editors and reviewers already make publication decisions one at a time based on qualitative factors, and this could continue to happen if the p-value were demoted from its threshold screening rule to just one among many pieces of evidence.

To say it again:

Journals, regulatory agencies, and other decision-making bodies already use qualitative processes to make their decisions. Journals are already evaluating papers one at a time using a labor-intensive process. I don’t see that removing a de facto “p less than 0.05” rule would make this process any more difficult.
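As an aside, the point in the quoted passage that different model specifications and statistical tests for the same data and null hypothesis yield different p-values is easy to demonstrate. Here is a minimal sketch with made-up numbers (the data are purely illustrative, not from any study):

```python
# Same made-up data, same null hypothesis ("no difference between groups"),
# three reasonable test choices, three different p-values.
from scipy import stats

a = [1.2, 2.3, 1.9, 3.1, 2.8, 2.0, 1.4, 2.6]
b = [2.1, 3.0, 2.9, 3.8, 3.3, 2.7, 2.2, 3.5]

_, p_pooled = stats.ttest_ind(a, b)                            # pooled-variance t-test
_, p_welch = stats.ttest_ind(a, b, equal_var=False)            # Welch's t-test
_, p_rank = stats.mannwhitneyu(a, b, alternative="two-sided")  # rank-based test

print(f"pooled t: p = {p_pooled:.4f}")
print(f"Welch t:  p = {p_welch:.4f}")
print(f"rank sum: p = {p_rank:.4f}")
```

None of these is the “wrong” test; the point is simply that the reported p-value depends on analysis choices made by the researcher.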

48 Comments

  1. Jon Baron says:

    As a journal editor, I sort of half agree. However, p-values to me are descriptively useful. For standard statistics like t-tests and correlations (possibly 90% of what I see), I understand what they mean. I try to keep up with other approaches, but my experience is much more limited, and I worry that things like Bayes factors, as used, have problems on the same order as those of standard p-values. In these ways, I think I am a “typical journal editor”.

    That said, I do not insist on p-values if something else convinces me that the claims are consistent with the results, and, yes, I frequently reject papers (or conclusions) that have very low p-values on other grounds, including doubts about the credibility of the effects reported. And I am trying to discourage authors from filling up every sentence of the “Results” section with what look to me like splotches on the page, trying to satisfy every possible “best practice” with statistics, p-values, effect sizes and confidence intervals. Likewise correlation tables with asterisks, as if there were 20 different hypotheses being tested.

    • Do you endorse a lowered threshold as explicated by Benjamin et al., “retire statistical significance” as by Valentin Amrhein & Sander Greenland, or “abandon statistical significance” as by Gelman & McShane?

      Somewhere there is a chart that lists the strengths & drawbacks of each choice.

    • Matt Skaggs says:

      Jon Baron wrote:

      “…p-values to me are descriptively useful. For standard statistics like t-tests and correlations (possibly 90% of what I see), I understand what they mean. I try to keep up with other approaches, but my experience is much more limited, and I worry that things like Bayes factors, as used, have problems on the same order as those of standard p-values.”

      There is a key point here, the “language” of statistics. For many years, operas just had to be written in Latin, which few understood.

      The last thing we want is for pointless statistical razzle-dazzle to improve the odds of getting a paper published. What happens when a competent journal editor looks at the statistical treatment and decides that it is way too complicated to parse?

      One of the most frustrating experiences of my career was having an intentionally simple model criticized and rejected in favor of a more sophisticated model. The problem was that my model worked really well at achieving the stated goal, while the rival model included measurement results that were very noisy and hard to get, but made no demonstrable improvement to the accuracy. I could just see it in the decision-maker’s eyes. They barely understood what the models were doing, but did not want to reveal their ignorance by picking the simpler model. It did not go well. The extraneous measurements were expensive and budgets were tight. A couple years later we quietly started using my model.

      With so much variation in statistical comprehension in academia, the last thing fields like psychology need is statistical sophistication creep. How can you open things up to alternative approaches while avoiding the sophistication creep?

  2. Reminds me of when surgeons complained of having to wash their hands, given they had a pedicure earlier in the week.

    Came across this passage from “Six Persistent Research Misconceptions” by Kenneth J. Rothman (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061362/), which seems to have a lot of explanatory power regarding the persistence of substandard methods in academia.

    “Why do such important misconceptions about research persist? To a large extent these misconceptions represent substitutes for more thoughtful and difficult tasks. … These misconceptions involve taking the low road [to achieving and maintaining academic prestige in methodology] , but when that road is crowded with others taking the same path, there may be little reason to question the route.”

  3. Noah Motion says:

    I like the organic, free range chicken analogy, and I want to push it a little bit further by pointing out that organic, free range chicken is much more scalable if people also cut down on how much chicken they eat. By analogy, statistical-analysis-as-uncertainty-propagation should be more scalable if people also cut down on how much inferential statistical analysis they do.

    • P-luck, p-luck, p-luck bard in the hen yard for two hens squaring off while we trynna decide who rules the rooster. lol

    • Phil says:

      Seconding Noah here. I’ve had this discussion with other people who have said that treating animals humanely is impossible: there’s not enough farmland to provide them a decent amount of space, so we have to keep them crammed into buildings: required amount of space = number of animals x space per animal, and if the number of animals stays the same while the space per animal goes way up, that’s not going to work.

      But this is just the ‘all else equal’ fallacy again. The reason people do ‘factory farming’ is that it is a much cheaper way to raise a given amount of meat than traditional farming (as it used to be called; now that factory farming has been dominant for decades, I suppose raising free-ranging animals outdoors is ‘alternative’ or something). If all animals were raised humanely, meat would be a lot more expensive so people would eat less of it. There wouldn’t be nearly as many animals, so although the space per animal goes way up, the number of animals goes way down. I don’t know which way the product would go, but it would not go up nearly as much as it would under the assumption that the number of animals stays constant.

      And at any rate I don’t see how, if everyone eats organic chicken, the result can be “too expensive at a national scale”: if raising such-and-such number of chickens organically is ‘too expensive’ then that number of chickens will not be raised.

      • Andrew says:

        There’s gotta be a word for this, when a concept is used as a comparison point of an analogy, but then there’s a debate about the comparison point.

        When writing this post, I wasn’t thinking about having a discussion regarding free-range chicken—it was just my convenient analogy—but, yeah, this all makes sense.

      • Yeah, but then there’s the question of you gotta get the calories some way or another. Right now, a fixed number of calories taken from vegetables will be much more expensive than for chicken… so if the price of chicken goes up, and you have to keep your calories constant to live, then just the price of food goes up, the demand for chicken doesn’t drop until it gets expensive enough that eating a ton of veggies is cost competitive… at which point people may be dying from lack of calories… or not. I haven’t done the calculation, but I know if you want to eat 1000 calories of zucchini it will require setting aside several hours to stuff it into your gut…that’s a lot of zucchini at 21 calories per 100g… I’m not sure I can eat 5 kg of zucchini a day, and it’s maybe $1.50/lb so it’ll cost you $16.

        Chicken is 165 cal/100g and costs about $1.75/lb, so 600 g ≈ 1000 cal costs you $2.31.

        • Martha (Smith) says:

          Who the heck would think of eating zucchini for calories?? It’s known for being a low-calorie food (hence the current popularity of “zoodles”).

          A more reasonable source for calories from vegetables: potatoes — a onetime staple food. They have about 360 calories per lb and cost about 75 cents per lb, so 1000 calories would cost a bit less than $2.15 — a little less than your calculation for chicken. And having a little less than a pound of potatoes at each of three meals is quite feasible — however, they don’t have a lot of protein (but a little more than zucchini).

          For more protein, you probably need to go to beans — I’ll let you do the calculations.
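          Taking up the invitation to do the calculations: a few lines of code can check these back-of-envelope numbers. The prices and calorie densities below are just the ballpark figures quoted in this thread, not authoritative nutrition data.

```python
# Back-of-envelope cost per 1000 kcal, using the figures quoted in the thread.
LB_TO_G = 453.6  # grams per pound

def cost_per_1000_kcal(kcal_per_100g, dollars_per_lb):
    """Dollars needed to buy 1000 kcal at a given calorie density and price."""
    kcal_per_lb = kcal_per_100g * LB_TO_G / 100
    return 1000 / kcal_per_lb * dollars_per_lb

foods = {
    "zucchini": (21, 1.50),                   # ~21 kcal/100 g, ~$1.50/lb
    "chicken": (165, 1.75),                   # ~165 kcal/100 g, ~$1.75/lb
    "potatoes": (360 / LB_TO_G * 100, 0.75),  # quoted as ~360 kcal/lb, ~$0.75/lb
}

for name, (kcal_per_100g, price) in foods.items():
    print(f"{name:8s}: ${cost_per_1000_kcal(kcal_per_100g, price):5.2f} per 1000 kcal")
```

This reproduces the thread’s figures: roughly $16 per 1000 kcal for zucchini versus a bit over $2 for chicken or potatoes.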

          • Well, I just picked a vegetable I liked… but there’s no question that if you want calories from veggies they will come from starchy foods, grains, tubers…

            Obviously it’s possible to be a vegetarian, but I’m not convinced it’s possible to be a healthy vegetarian at lower cost. Of course, part of that is all the subsidies going to ranchers and so forth; in the absence of all market manipulation by the govt, perhaps veggies would be cheaper.

          • Noah Motion says:

            Also, I think people in the US, Canada, Western Europe, and probably a bunch of other heavily meat-eating cultures are pretty far from needing to worry about survival-level caloric intake.

          • I prefer pea protein over soy b/c the latter is claimed to have estrogenic effects. Not fully up on the literature tho.

          • Noah: if you tripled the price of their meat and they just cut it out, not so much.

            My point was more like this: if the price of umbrellas goes up a thousand fold, you can just walk around in the rain… but if the price of food goes up a thousand fold people die, you can’t just forgo food. This is a reason why I think both food and shelter should be the core of any poverty index.

          • Also, the “healthiness” of a typical American vegetarian’s diet is due to the fact that they don’t primarily eat beans, rice, and potatoes all day long. That diet is more like what you’d see in rural Mexico or some such thing.

            On the other hand, suppose we convert to free range chicken farming and the price of chicken goes from $1.50 to $7 a pound… Now hundreds of millions of people will convert to vegetarian diets, and the price of grain, potatoes, corn, and also all the healthy variety produce such as spinach and collard and avocado and kale and etc will go up considerably as will the expansion of land use for vegetables.

            On such a large scale, it’s hard to understand where that would leave the diet and prices and production levels. It could ultimately be a fine thing, but it could also be a disaster in the short term.

            • Carlos Ungil says:

              > the “healthiness” of a typical American vegetarian’s diet is due to the fact that they don’t primarily eat beans, rice, and potatoes all day long

              What’s “unhealthy” about those?

              • I don’t think it’s unhealthy to eat those things, I do think it’s unhealthy to eat those things and not much else.

                This whole conversation is about the difference between a small scale change and a large scale change. In the small scale: if one person decides to become a vegetarian, it doesn’t affect the price of anything at all… they have lots of broccoli and kale and green beans and spinach to choose from to add onto their main sources of calories, which will inevitably be grains and tubers and legumes and oily fruits like avocado or olives or whatever…

                On the large scale, if we simply stop “factory farming” animals, the effect is clearly more drastic. There is *no question* that we have 7 B people on the earth today because super-cost-efficient farming enables us to feed them all. If we go to “organic free range” meat would the carrying capacity of the earth drop from say upwards of 7B down to say 4.5B or 2B or something so that 2-5 B people would die over the next couple decades?

                I don’t think it’s trivial to answer that. What we can say is that heading back to say 1900’s type agriculture, the population of the world was about 1.5B and it kinked up sharply as industrialized agriculture took off, also that the fraction of people employed in farming plummeted from upwards of 50% of many countries, to the single digit percentages today.

                https://en.wikipedia.org/wiki/World_population

                https://ourworldindata.org/employment-in-agriculture

                Large scale changes are nothing like small scale changes, but it’s useful to consider what the constraints on the system are. You can’t forgo food is a pretty strong constraint on what is possible at the large scale.

              • Carlos Ungil says:

                It seems that the word you were looking for was “exclusively”, not “primarily”.

              • Well, primarily to me means something like maybe more than 80 percent, exclusively means exactly 100%… and I think even getting 80 or 90% of your food from rice, beans, and potatoes would be unhealthy.

                Obviously, getting 80 or 90% of your food from chicken would be unhealthy too… but getting 35% from chicken and 45% from a mixture of rice, beans, and potatoes, and 20% from a mixture of green vegetables and fruit would be pretty good. Eliminating the chicken and replacing it with rice, beans, and potatoes leaving you at 80% rice, beans, and potatoes, and 20% mixed veg and fruit… I think that would be a significant decrease in quality.

                If in this hypothetical elimination of chicken you went to say 55% rice, beans and potatoes, and 45% mixed veg and fruit it might well be a health benefit, but now you are trying to get a lot more calories from mixed veg and fruit, and it’ll be both a larger volume to get the calories, and a lot more money.

              • To bring it all back to the topic at hand: if you eliminate rubber-stamp NHST and go 100% artisanal Bayesian modeling, there is no constraint that tells you that you have to keep the total volume of papers coming out the same, unlike the constraint that you have to keep the total calories per person constant with food.

                We could dramatically scale back the bullshit, spend more time on quality research, and get dramatically better research outcomes with a lot less noise. On the other hand, the number of jobs for people who are good at “playing the academic game” using the “p=0.02 so it’s real” technology would plummet. That would be a good thing for the world, but is obviously not something those people are going to roll over on easily, and those people have a bunch of political power today.

              • Martha (Smith) says:

                Daniel said,

                ” I think even getting 80 or 90% of your food from rice, beans, and potatoes would be unhealthy.”

                How do you define “80 or 90% of your food”?

                Have you ever heard of a “well balanced diet?” (See e.g., https://www.nhs.uk/live-well/eat-well/ or https://www.nutritionaction.com/daily/what-to-eat/the-grandparents-diet/)

              • Daniel Lakeland says:

                I was talking about calorie fraction…

                One way to describe my point is that there are two important constraints: one is total calories, and the other is the need to maintain variety. When you remove chicken you lose both calories and variety, so it’s not OK to just make up the calories with the cheapest available food per calorie, because then you don’t maintain the variety.

  4. There’s nothing the slightest bit healthy for humans or animals in calling for a ban on thresholds for testing, statistical or nonstatistical. There is no testing and no falsification (even of a statistical sort) without them. See my post https://errorstatistics.com/2019/11/14/the-asas-p-value-project-why-its-doing-more-harm-than-good-cont-from-11-4-19/
    “The ASA’s P-value Project: Why it’s Doing More Harm than Good”. It is a companion post to my previous “On Some Self-Defeating Aspects of the ASA’s 2019 Recommendations on Statistical Significance Tests” 

    Yet you claim to be a falsificationist!

    Do you infer there’s a flaw in your model when you find a very small P-value? Or not? What if you find a small P a few times? If yes, then you’re inferring from small P-values that there’s evidence of a genuine problem in your model. So you can’t at the same time claim it’s nonsensical.

    In a world without P-value thresholds, the eager investigator will still need to arrive at a small P-value to claim to have evidence the observed difference couldn’t readily have arisen by chance fluctuations. Else he is saying, even though you can frequently generate even larger differences than mine by chance alone, I claim my observed difference indicates it did not come about by chance alone. So he’ll still need to “spin” or ransack (now unblinded) data to come up with a post hoc factor that attains nominally small P-values–only now we’re barred from holding him accountable for cheating.

    • Andrew says:

      Deborah:

      You write, “There’s nothing the slightest bit healthy for humans or animals in calling for a ban on thresholds for testing, statistical or nonstatistical.”

      Whoever you’re arguing with here, it’s not me! I’ve never called for a ban on anything.

      Getting to the specifics of your comments: I think that there are settings where p-values can be useful. I still don’t see the need for a binary threshold. Thresholding just adds noise—in many cases, a lot of noise.

      Anyway, my collaborators and I are not trying to “ban” this; we’re just recommending what I consider to be superior alternatives, and we’re pointing out flaws in the logic and practice associated with p-value thresholding (or null hypothesis significance testing more generally).
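      To illustrate the “thresholding adds noise” point, here is a small simulation (illustrative numbers only, not from any real study): with a modest true effect measured noisily, the estimates that happen to clear the p < 0.05 screen systematically exaggerate the true effect (the type M error problem).

```python
# Illustrative simulation: a modest true effect measured with noise, screened
# at p < 0.05. Estimates that pass the screen overstate the true effect.
import random
import statistics

random.seed(1)
true_effect, se, n_sims = 0.2, 0.5, 100_000

estimates = [random.gauss(true_effect, se) for _ in range(n_sims)]
significant = [est for est in estimates if abs(est) / se > 1.96]

share_sig = len(significant) / n_sims
exaggeration = statistics.mean(abs(e) for e in significant) / true_effect
print(f"share significant: {share_sig:.3f}")
print(f"average |significant estimate| is {exaggeration:.1f}x the true effect")
```

Only a small fraction of replications clear the threshold, and those that do overstate the effect severalfold, which is exactly the noise a hard screen injects into the published record.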

      • RE: ANDREW:
        ‘Journals, regulatory agencies, and other decision-making bodies already use qualitative processes to make their decisions. Journals are already evaluating papers one at a time using a labor-intensive process. I don’t see that removing a de facto “p less than 0.05” rule would make this process any more difficult.’
        ——–
        These ‘qualitative’ processes should be evaluated even more transparently. I like the proposal bandied about by a few experts to ‘live stream’ the decision processes. Such an endeavor does call into debate who should have access to such a live stream. Clearly, consumers of information and patients are major stakeholders. After all, who are we to privilege? And why?

        Some experts contradict themselves by suggesting that non-experts need professional and academic training to be included. This is especially curious to hear when someone without professional & academic training can forge some of the same insights that have been circulating for decades. And no very complex statistical training is required for such inclusion either. Where I think some outsiders can contribute is in critically thinking through some of the illustrations provided by experts, b/c these illustrations may not be as relevant as deemed.

      • Anonymous says:

        As you know, I have opposed binary classifications that ignore the actual outcome for donkey’s years. Statistical significance tests needn’t be abused in the way you aver, where a stat sig effect is taken as evidence for a causal or other substantive claim, and nonsignificance is taken as a 0 effect. It is a mistake as well to confuse a pre-designated threshold, set taking account of discrepancies of interest as well as other background information, with a binary report. As a leading voice in “abandon” significance, you encourage the view that assessing the P-value for the task of distinguishing real from spurious effects should be dropped. What is your alternative method for this task?

        • Andrew says:

          Deborah:

          For most of the problems I’ve seen, I don’t think that “distinguishing real from spurious effects” is a useful goal. For example, does the famous “power pose” study represent a real or a spurious effect? I don’t think there’s a good answer here. Presumably, power pose has some effects, positive for some people and negative for others. We can make statistical statements such as, If the average effect is X, with variation Z, then we’d expect a study of N people to estimate X within an accuracy of . . . etc. But I don’t see the need or benefit for a method for doing the impossible task of distinguishing real from spurious effects.

          If I did want to do something similar to distinguishing real from spurious effects, I’d do it by modeling the potential real effects and performing inference on them, not by looking at a tail-area probability with respect to a null hypothesis of zero that I don’t believe.

    • Andrew says:

      Deborah:

      There’s something in your comment that perhaps we can discuss. You say: no threshold, no falsification. I disagree. I think I can falsify all the time without a threshold. That said, my statistical falsification is not deterministic. I think you’re making a mistake by trying to create a deterministic form of falsification from statistical data. Even in the famous Rosencrantz and Guildenstern example, the falsification is only probabilistic.

      To put it another way, I think you’re too eager to collapse the wavefunction, as it were. To me, thresholding is an artificial step that might sometimes be done for convenience but has no general value.

      • Anonymous says:

        The physicist’s approach to significance testing is given by Jaynes here: https://bayes.wustl.edu/etj/articles/what.question.pdf
        in section 4 (“what is our rationale”) on the 6th page, carrying on for a couple of pages.

        Key quote:

        “It is in the criterion for retaining H0 that we seem to differ; contrast the physicist’s rationale with that usually advanced by statisticians, Bayesian or otherwise. When we retain the null hypothesis, our reason is not that it has emerged from the test with a high posterior probability, or even that it has accounted well for the data. H0 is retained for the totally different reason that if the most sensitive available test fails to detect its existence, the new effect can have no observable consequences. That is, we are still free to adopt the alternative H1 if we wish to; but then we shall be obliged to use a value of lambda so close to the previous lambda that all our resulting predictive distributions be indistinguishable from those based on H0.”

        The same page includes an example of not using thresholds you might find agreeable.

        The assumption on this blog is that scientific inference is something inherently difficult and that’s why we’re in the current quagmire. In reality the average scientist had no trouble making scientific inferences before the rise of frequentist statistical testing and the philosophy that went along with it.

        • Andrew says:

          Anon:

          1. The Jaynes quote is similar to something that I have said, which is that rejection of a null hypothesis doesn’t tell us very much (as we already know ahead of time that our hypotheses are false), but non-rejection does tell us something. Interestingly, this is pretty much the opposite of what is taught in statistics classes and it’s pretty much the opposite of how statistical hypothesis tests are usually applied.

          2. It may be that for well-defined physics models, scientific inference is easy. For models in social and environmental science, it’s not so easy at all!

        • jim says:

          “In reality the average scientist had no trouble making scientific inferences before the rise of Frequentist Statistical testing and the philosophy that went along with it.”

          LOL!!! :))))

          That’s because never before was there a method that, with horrible method abuse and poor lab work, became even *more* likely to produce a publishable result! :))))

          Suddenly I see the sense behind mostly killing the P cut-off. You can still keep P as a useful method under appropriate circumstances. But P < 0.05 shouldn't be a universal equation for publication nor any kind of stamp of scientific validity.

      • Deborah G. Mayo says:

        Andrew: My last comment accidentally went out without any name. But it only repeats what I and others have said before. Using statistical significance tests for the task of distinguishing genuine from spurious effects needn’t be done in the fallacious manner wherein a stat sig result is taken as evidence for a substantive or causal claim. (The latter hasn’t been probed by the stat test.) I don’t know what your alt method is for this piecemeal task.
        I’m only talking of falsifying statistically–that’s why tests use a P-value threshold. When you falsify, if you ever do, it’s because given outcomes are taken as incompatible with a claim–even though the data aren’t logically inconsistent with it.

      • Justin Smith says:

        “To me, thresholding is an artificial step that might sometimes be done for convenience but has no general value.”

        In Boole’s “The Laws of Thought”, he put forth that we learn from and function in the world through the ‘law of thought’ X^2=X or X(1-X)=0, where X is a class and the operations are class operations. That is, he is essentially saying dichotomania is a requirement.

        Of course, the Statistics Police were on the case immediately! See https://dichotomania.com/blog/f/banned-book-8

        Justin

        • Anonymous says:

          Justin,

          Had you considered mixing in some humor with the sarcasm?

          How about “Protesters demanded Boolean Algebra be banned chanting ‘Hey, hey, ho, ho, dichotomous thinking probably has to go!'”

  5. Zad Chow says:

    Wow. ~450 comments on the original discussion.

  6. A.G.McDowell says:

    What worries me is somebody saying. “Well of course, if you use the simplistic p-value analysis seen in this paper there would appear to be a serious problem. In fact, we would be seriously morally derelict if we didn’t do something immediately, at some cost to ourselves. However, we all know that p-values are utterly simplistic, and we have found this much more sophisticated analysis which, in our qualitative judgement, shows that the case for the paper has not been proven. So we reject it and – thank goodness for us – don’t have to do anything.”

  7. Emmanuel Charpentier says:

    Andrew,

    What would you think about using a Bayes factor for assessing the weight of evidence for/against an alternative to a hypothetical “null model” (or, better, a function expressing said Bayes factor as a function of prior distribution parameters for the excluded factor) in the classical case of nested, one-parameter “testing”?

  8. Andrew says:

    Emmanuel:

    I’ve never seen an example where this makes sense to me, but I’m open to the possibility that this method could work.

  9. Ron Kenett says:

    The what and the how
    ——————–

    This post by Andrew seems to reverse the apparently belligerent position of some against significance and p-values.

    Specifically it states: “How can the world function if the millions of scientific decisions currently made using statistical significance somehow have to be done another way? From that perspective, the suggestion to abandon statistical significance is like a recommendation that we all switch to eating organically fed, free-range chicken.”

    Absurdly, the free-range chicken became the main topic of discussion, but let me set that curiosity aside.

    In Hebrew there is a proverb saying: “A clever person solves a problem. A wise person avoids it.”

    Several statisticians who signed petitions and made contributions to ASA II (as Mayo labeled this second move by ASA) are apparently clever, but do not seem wise.

    As Andrew writes, these types of self-destructing statements reflect (to me) a misguided academic view of reality. Academia should prove more responsible and avoid the problems described at the beginning of this post. Avoiding means being “wise,” not just “clever.”

    Discussion should be encouraged, and divergent views should be given a platform to be expressed. Improvements are possible. Focus on what to do, not on what not to do. Severe testing, per Mayo, is one example. My contribution on finding generalisations via alternative representations is another; see https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3035070. I know that the approach of setting a boundary of meaning (BOM) works because clinicians are using it and understand it. The Bayes factor I tried to explain to them was often met with cynical remarks. Statistics that is not used is not worth much. Again, it can be considered “clever,” but it is not necessarily “wise.”

    I believe that the reality test which gets us to avoid problems needs to be taken more seriously.

    • To think that the conversation above was about chicken was to miss the point of the whole thing. The point was Andrew gave an analogy where someone moves from one scalable practice to another non-scalable practice… and we discussed the factors which might be similar and different. For example, with food, you have two kinds of constraints: people need a certain number of calories per day for survival, and people need a certain amount of variety, and so they can’t simply substitute the cheapest alternative source of calories.

      The ways in which this is similar to and different from the switch from NHST to bespoke Bayesian analysis are:

      1) Without a sufficient quantity of NHST, there are vast populations of scientists who would “die off,” because there are people, the “carnivores,” who can only survive in science by “eating the cheap meat” of NHST. However, unlike a famine die-off of a population, getting a lot of people who are not really contributing to knowledge out of science would be a good thing (the Wansinks of the world). We can radically scale back what we are doing today, do 100% bespoke, thoughtful science, and be better off for it.

      2) There is no real variety requirement: we can do 100% bespoke analysis, and this won’t cause us to lose out on essential nutrients. Bespoke analysis is like a high-quality health-food diet with whole grains, leafy vegetables, meats, fish, seafood, etc. It is, in and of itself, a fully balanced way to do science; it’s just not cheap, because it requires experts to think about and design experiments, data collection, and analyses for each different question. However, we’ve been fooling ourselves that we can substitute this “healthy balanced diet” with Ho-Hos and Twinkies, which is what NHST is the intellectual equivalent of… it doesn’t work.

      • Justin says:

        “However, we’ve been fooling ourselves that we can substitute this “healthy balanced diet” with Ho-Hos and Twinkies, which is what NHST is the intellectual equivalent of… it doesn’t work.”

        That sure is a fun mantra to repeat ad infinitum, but it did in fact work for some recent and past Nobel Prize winners, and for Google’s analysis of the data demonstrating quantum supremacy, to give some quick examples.

        Justin
        http://www.statisticool.com

        • Anonymous says:

          Another guy who posts links to blogs covered in advertising, in this case really covered in it: several 4-inch-by-8-inch banner ads on a big landing page, where you have to click the word “content” to get to another page covered in large ads, where a couple of different words you can click lead to more pages with enormous ads and links to pages with a sentence or so on them…

  10. Ron Kenett says:

    Daniel – I do not think you understood my comment on the free-range chicken; it was meant to be ironic. The discussion should indeed be on your points 1) and 2). You have the “is” model and the “should” model. Focusing on the “should” without offering a bridge to the “is” is counterproductive. That was my point.

    Stating that “we’ve been fooling ourselves that we can substitute this “healthy balanced diet” with Ho-Hos and Twinkies, which is what NHST is the intellectual equivalent of… it doesn’t work.” seems to ignore this.

    For some suggestions regarding such a bridge see https://blogisbis.wordpress.com/2019/11/12/a-pragmatic-view-on-the-role-of-statistics-and-statisticians-in-modern-data-analytics/

    Well, I read your link and found a long list of self-promotions, basically a chunk of your CV, names dropped willy-nilly, and a blog covered in advertising, which at the time I clicked included sensational advertisements using photographs of a woman with severe craniofacial deformities, as if she were some sort of circus geek.

      I see your picture here is also basically you promoting your book on “Information Quality.” I’ll let the irony of that sink in.

      • Ron Kenett says:

        “sensational advertisements utilizing photographs of a woman with severe craniofacial deformities as if she were some sort of circus geek” – no idea what you are talking about. This is not in the material I provided.

        “promoting your book on Information Quality” – yes, I feel information quality is an important topic to promote.

        Sorry you could not connect with the ideas, methods, and examples I provided. They represent years of experience and lessons learned in a wide range of applications. I looked at your website; it does look interesting.
