Here are some examples of real-world statistical analyses that don’t use p-values and significance testing.

Joe Nadeau writes:

I’ve followed the issues about p-values, significance testing, and the like, both on blogs and in the literature. I appreciate the points raised, and the pointers to alternative approaches. All very interesting, provocative.

My question is whether you and your colleagues can point to real world examples of these alternative approaches. It’s somewhat easy to point to mistakes in the literature. It’s harder, and more instructive, to learn from good analyses of empirical studies.

My reply:

I have lots of examples of alternative approaches; see the applied papers here.

And here are two particular examples:

The Millennium Villages Project: a retrospective, observational, endline evaluation

Analysis of Local Decisions Using Hierarchical Modeling, Applied to Home Radon Measurement and Remediation

41 thoughts on “Here are some examples of real-world statistical analyses that don’t use p-values and significance testing.”

    • Since first encountering this blog a few months ago, I’ve often been confused by the complete dismissal of anything with even a whiff of “significance testing” as it applies to actually getting research results published in any sort of mainstream literature.

      I finally surmised that even some of the most vocal opponents of the conventional requirements are forced to compromise by presenting what they feel are the most principled possible “significance test” equivalents. Perhaps if the jeremiads continue long enough, those of us down in the trenches of day-to-day statistical practice will be able to suggest no-significance-testing options to our colleagues in some future studies.

      But at the moment, I see no such options in my own working life. If even Gelman et al. are forced to do “excluded zero,” then no new day has dawned as of yet.

      • Yes, my experience was that all anyone cares about is “was it significant?” They said it was “too hard/complicated” to do anything else. Academia is almost totally populated by people doing pseudoscience at this point.

    • Michael:

      Indeed. See this post from last year, where I wrote:

      It took us three years to do this retrospective evaluation, from designing sampling plans and gathering background data to designing the comparisons and performing the statistical analysis.

      At the very beginning of the project, we made it clear that our goal was not to find “statistically significant” effects; rather, we’d do our best and report what we found. Unfortunately, some of the results in the paper are summarized by statistical significance. You can’t fight City Hall. But we tried our best to minimize such statements.

    • Just to elaborate further: Our analysis used alternative approaches. It did not use p-values or significance testing. In order to communicate to certain audiences (including one of the journal reviewers), we translated some of our results into significance-testing language.

        • Evan:

          I would not say that our analysis “used” p-values and significance testing. Our analysis was based on estimation. We did include those statistical significance statements: they were a summary of our analysis that we presented to make a reviewer happy. I would’ve preferred not to include this summary, as it can be confusing, but it seemed like too much trouble to argue with the editors over it, and possibly the summary would be useful for communicating to certain audiences.

    • Maybe I’m just not very smart, but it looks like the Millennium Villages paper has statements of statistical significance in the abstract.

      I’m really confused right now.

    • Andrew — I’m sorry to say this, but reading this blog first convinced me that statistical significance testing is bad and then convinced me that I should use statistical significance testing. The problem is that many who advocate against significance use it themselves, including recently and including in their best papers.

      Your 2018 Millennium Villages paper is a good example. Reading through it once, I think it is an excellent paper, with a level of quality that I would be so proud to achieve. However, it centres around the significance of the findings based on thresholds. The abstract includes text like:

      * “The project was estimated to have no significant impact on the consumption-based measures of poverty, but a significant favourable impact on an index of asset ownership.”
      * “impact estimates for 30 of 40 outcomes were significant (95% uncertainty intervals [UIs] for these outcomes excluded zero)”
      * “Impacts on nutrition and education outcomes were often inconclusive (95% UIs included zero).”
      * “the project had significant favourable impacts on agriculture, nutrition, education, child health, maternal health, HIV and malaria, and water and sanitation.”

      The findings are also described in the main text with significance-based language. Significant results are starred in the table.

      So the paper makes many of the “statistical significance” mistakes that have been argued against on this blog, and that are trashed by opponents of statistical significance on Twitter.

      Based on the tone of criticism surrounding statistical significance, I had viewed “avoiding statistical significance” as an important moral issue in statistics. But I increasingly see it more like a choice between buying organic or non-organic produce — buying organic is better when feasible, but buying non-organic is not all that big a deal. Does anyone disagree with this perspective?

      • There is no saving academia in my opinion. So I am done. If I want to do science it is always self-funded from here on out.

        My guess is that is going to keep going like this until we have another reformation and then enlightenment type event. Even today you can see the Catholic Church still exists and has exceptional power, but greatly reduced from its former glory. I imagine these people wasting their time with statistical significance will be like that in the future. NIH, FDA, CDC, etc (the whole government-academia complex) will still exist and make their proclamations but most people will ignore them.

        • Quote from above: “There is no saving academia in my opinion. So I am done.”

          I recently came to the exact same two conclusions and/or thoughts.

          It feels good for my mind and soul that others feel and think similarly regarding this.

          In that light, i thank you for this comment.

        • You have mentioned your disillusionment with academia before. I completely understand it.

          A cautionary note though. I remember being very upset when I first discovered that some segments of society are corrupt (I’m speaking broadly here). I was very upset at those particular segments and thought those particular segments were particularly evil and most other segments were non-evil. But as time went on, I found more and more segments that were also dishonest/corrupt/hypocritical/etc. Even segments that I found to be basically useful and rather honest had corrupt elements and influences and practices.

          Nowadays, I assume corruption and dishonesty are almost everywhere (but not everywhere) and I try to think of it as part of the human condition.

          So, my point is to be careful about completely jettisoning academia. Maybe there is something worthwhile in or near academia. Going somewhere else could be just as corrupt. (To be clear, I find most of academia to be bogus. Only the highest levels of research seem non-bogus. You can make a good living by successfully playing the academic game though, and it is a comfy life (it sure beats working!), but the bogus feeling will likely always be there.)

          Personal relationships are another matter. You can find integrity there, and you only need a few … maybe only one. When you do, “Grapple them to thy soul with hoops of steel”. (And children change everything. Almost everything of importance in the world will then fit into a minivan. Then the bogosity of the world outside the family becomes less important and more of a means to an end.)

        • >Deep Bow to Terry of Wisdom<

          Corruption IS everywhere and everyone IS full of shit. Yet despite all my rage…

          But even though academics is f'd up, there are still plenty of good things happening. If nothing else, we're learning what not to do. And disillusionment with academics is relative – all of us thinking about careers in academics hold the practice in such high esteem that, when it doesn't meet our lofty standard, it's devastating – even though it probably has a higher proportion of thoughtful quality people than most careers.

          But there IS a lot of great stuff out there to do!

        • I appreciate the comments by Terry and Jim to bring some (possibly needed) nuance in this all. I appreciate it for other readers, but not for myself. Not anymore.

          This blog is the only “scientific” thing i still sometimes (want to) participate in, because it usually at least leaves me with a sense of sanity after reading, and/or participating, in a discussion.

          I have tried to “focus on the positive” and tried to help “improve” matters. I now feel (sections of) all this “let’s improve science” stuff is possibly just as corrupt, unscientific, and incapable as your “career academics”.

          I also lost faith in the utility of even writing papers to try and express some of these things, and/or to try and come up with better things. I concluded that the impact of papers is largely due to who you know, how “hip” people think you are, and, more recently, how many twitter-followers you have, etc. This is all possibly reinforced by the fact that i don’t want to publish “official” papers anymore, because i don’t want to participate in the journal/editor/peer-reviewer “let’s make publishers rich” scheme.

          I reason that whether this is all truly the case or not, the fact that i think, and/or feel, this way is reason enough to stop with all this science stuff and go do something else.

        • “that i think, and/or feel, this way is reason enough to stop with all this science stuff and go do something else.”

          Definitely! There’s tons of cool stuff to do. Lots of work for stats people in tech; oh my, Amazon is dying to hear from you!

          On the other hand, if you really don’t like academia, go somewhere else and do something different. It’s a big world out there. What do I know, and who can predict the future with any confidence?

        • Let’s make a pact that whoever gets a billion dollars first (or whatever has the equivalent purchasing power in the future) gives the other a research grant.

        • Quote from above: “Let’s make a pact that whoever gets a billion dollars first (or whatever has the equivalent purchasing power in the future) gives the other a research grant.”

          Haha!

          On a serious note:

          1) i don’t want to participate in science anymore, for real. And i also never really cared about doing research myself. I only care(d) about “solving a problem”, or “solving a puzzle”. I think that’s because that’s how my brain works sometimes. Once i think i “got the solution”, i don’t care very much who executes the research, or idea, or plan, etc. I just care that it is being done. So, you can give your money to someone else should you one day get it, and

          2) i think it’s a scientist’s responsibility to not waste resources and money. Perhaps it’s therefore not surprising that i only spent about half of the (whopping!) 500 euros that were available to me for the only research i ever did: the study that i had to do to get my university diploma.

          I also cringe every time i see researchers spend tens or hundreds of thousands of dollars (or the equivalent) on the next “big project” that is often not even in line with what i view as “good” science, and i wonder whether it’s worth all the money that they got to do it.

          So, if i ever were to get billions of dollars, and decided to spend it on science, i would hand out very small grants to many researchers. I would ask them how they think they could use relatively small grants to optimally perform Psychological research, starting by finding out what the goals of science even are, and how to achieve them.

          I hope that that would lead to researchers taking things into account such as optimally formulating and testing theories, incorporating replication, focusing on the size and relative importance of effects and not “statistical significance”, etc. Or, perhaps asking these types of questions could lead to providing some possibly useful clarity concerning the possible goals, principles, values, and responsibilities of science and scientists.

          I would hereby show them my idea of how to possibly optimally perform Psychological science (https://statmodeling.stat.columbia.edu/2017/12/17/stranger-than-fiction/#comment-628652), and say that this is the best i could come up with, and that i hope it could be a possible start to reason and think about things, should it be useful.

      • AnonymousCommentator:

        You write:

        Based on the tone of criticism surrounding statistical significance, I had viewed “avoiding statistical significance” as an important moral issue in statistics.

        I think you’re putting the morality in the wrong place here. As I see it, the moral issues are mostly in the consequences: we want to use statistics to save lives, allow people to live life more happily and productively, etc., wasting minimal resources when doing so. Good statistics helps us use our data more efficiently, it helps us incorporate more data into our analyses, and it helps us gather data more effectively. This can be a big deal! It’s not that significance testing is itself a moral problem; it’s that in many cases we can do better work by using more flexible methods. In many cases, doing an analysis based on statistical significance (or null hypothesis testing more generally) is essentially throwing away data. But it depends on the example.

        Regarding the Millennium Villages paper, as I wrote in comments above, our analysis did not use p-values or significance testing. I would not say that it “centres around the significance of the findings based on thresholds.” Rather, in order to communicate to certain audiences (including one of the journal reviewers), we translated some of our results into significance-testing language.

        If anyone wants to “trash” that aspect of our paper on twitter (as you put it), they should feel free to do so! I agree with such hypothetical trashers that our paper would be better off without the stars and the significance-based language, which pretty much just add a layer of noise to the analyses that we did.

        As I wrote in comments above, it seemed like too much trouble to argue with the editors over this, and possibly the summary would be useful for communicating to certain audiences. There are tradeoffs.

        • Good points. Sometimes you just need to talk to your audience in language they more or less understand, rather than in language that may be technically more accurate but will glaze their eyes over and not convey a meaningful message to them. (Half a loaf is better than none?)

        • Andrew — Thank you for responding; I appreciate it. Currently, you are probably the most famous opponent of statistical significance in the world. If you and other famous statisticians are not comfortable arguing with reviewers about statistical significance, then nobody will be. No matter how intense the public rhetoric about abandoning statistical significance, it won’t be abandoned.

          If I could make a positive suggestion: When there is a worthwhile issue that almost no individual is willing to stand up for alone, sometimes group action can be achieved through a clever kind of pledge. The pledge is of the type, “If more than X% of people in our group sign this pledge, then everyone who signed the pledge will agree to abide by its conditions. Until X% of people sign the pledge, each person may do whatever they please.”

          An example of this is the National Popular Vote Interstate Compact, which is meant to effectively bypass the electoral college. The compact is signed by U.S. states. Each signatory agrees that, once the signatories collectively account for an absolute majority of votes in the U.S. electoral college (270 votes), all signatories will cast their electoral college votes for whoever wins the U.S. popular vote, instead of following the current electoral college system. Currently, signatories to the compact account for 72% of the 270 electoral college votes necessary for the compact to come into force.

          In the case of statistical significance, the pledge would be more like, “If at least X% of tenured statistics professors at top universities and at least X% of editors at top journals sign this pledge, then all signatories to this pledge will agree to only use statistical significance in those cases where they believe it is actually statistically warranted, and will NOT add assessments of statistical significance in response to reviewers’ or co-authors’ comments.”

          Then when asked to add statistical significance, authors can say “I can’t — I signed this pledge. Look, it was signed by all these important people too.”

          I’m not in a position to know whether that pledge would or would not be achievable in the statistics community, but it is at least a thought.

          PS
          Apologies if I missed it, but I don’t see any text in the 2018 Millennium Villages article or supplement that explains how your classical results are different from usual statistical significance. In the article, it appears that evaluated impact and target attainments are judged inconclusive if 95% uncertainty intervals contain zero and are judged significant/attained if 95% uncertainty intervals exceed zero. Though I like the paper a lot, I don’t see a real difference from p<0.05 thinking, as Evan notes above.
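
          A minimal numerical sketch (Python, with made-up numbers) of the point being made here: under a normal approximation, “the 95% interval excludes zero” and “two-sided p < 0.05” are the same decision rule, just reported in different language.

            # Sketch only: hypothetical estimate and standard error, not from any paper.
            from scipy import stats

            estimate, se = 0.21, 0.12      # made-up effect estimate and its standard error
            z = estimate / se              # z-statistic under a normal approximation

            lower = estimate - 1.96 * se
            upper = estimate + 1.96 * se
            excludes_zero = (lower > 0) or (upper < 0)

            p_two_sided = 2 * (1 - stats.norm.cdf(abs(z)))

            print(f"95% interval: ({lower:.3f}, {upper:.3f}); excludes zero: {excludes_zero}")
            print(f"two-sided p-value: {p_two_sided:.3f}; p < 0.05: {p_two_sided < 0.05}")
            # The two booleans always agree: the interval check and the threshold
            # test are one decision rule in two vocabularies.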

        • I don’t see it… Who wants to spend their time getting a bunch of people stuck in their ways to sign a petition when they could be analyzing data and such?

          There is just little overlap between the people who love science and data and those who would not find petitioning to be one of the most boring/annoying tasks imaginable.

        • > famous statisticians are not comfortable arguing with reviewers about statistical significance, then nobody will.
          It is a slow iterative process – Rome did not fall apart in one day, though in some places it fell apart rather fast.

          Peter and Sander point to places/instances of faster progress.

          However, I believe there is a serious issue of inertia at play. Those who are best able to advocate against significance are highly experienced successful researchers. Hence, they have established habits of being successful in academia as it was and is.

          Losing those habits takes time and sometimes costly sacrifice. Unfortunately those sacrifices often involve their students and junior colleagues. So picking premature battles likely will limit their ability to successfully continue to advocate against significance.

          Of course, a perhaps bigger worry is a small subset whose success has gone to their heads, so they feel that they can do no wrong, that everyone should trust them as they are trustworthy by fiat, and that they know how to fix things with new rules. But strangely those new rules do not apply to people like them…

        • I can only guess who you count as the ‘small subset.’ Some members of the subset do seem to redeem themselves at some point after a heady disagreement.

          RE: Those who are best able to advocate against significance are highly experienced successful researchers. Hence, they have established habits of being successful in academia as it was and is.

          Me: That is true. If they can translate their experience into plain English, that would be even more effective. Resorting to technical elaborations has been the pattern, from my own observation. In other words, the experienced researchers communicate with other experienced researchers, clients, and students. That’s a narrow audience pool. Advocacy is much about being able to communicate statistics well to wider audiences: prospective consumers of products & services.

          Martha makes a similar point above.

        • Actually, I was just paraphrasing some posts I did a couple of years ago – the subsets change over time and subject matter.

          Those posts were mostly about reaching a broader audience (which I referred to as the masses):

          First – https://statmodeling.stat.columbia.edu/2017/05/16/higher-credence-masses-washed-wonder-woman/
          Second – https://statmodeling.stat.columbia.edu/2017/05/24/take-two-laura-arnolds-tedx-talk/

          p.s. In the second, someone privately suggested I was mistaken about my interpretation of _our_ and _us_ in “they did not blame reporters or careerist researchers but instead claimed it was _our_ fault and that _we_ needed to be prepared to contribute to it being resolved”. They suggested _our_ and _us_ referred to philanthropists and policymakers, not the wider public.

  1. I was proud of avoiding p-values in this quantitative paper on lethal overdose trends in the United States:
    https://www.peterphalen.com/publications/FentanylrelatedoverdoseinIndianapolisEstimatingtrendsusingmultilevelBayesianmodels.pdf

    The peer reviewers were satisfied with our presentation… In framing my statistical methods and results, I used this article by McElreath and Koster as an incredibly helpful example/guide to follow, which Nadeau may find useful as well:

    https://escholarship.org/content/qt2qk1v7rk/qt2qk1v7rk.pdf

    McElreath and Koster make no mention of p-values despite the heavy quant focus…

    One strategy that I’ve found useful is to explicitly frame methods/results in the language of estimation. So, replace “we wanted to test whether emotion dysregulation is related to suicidality in people with psychosis”, with, “we wanted to estimate the relationship between emotion dysregulation and suicidality in people with psychosis.” Consistently writing in this way can steer reviewers away from their learned expectation of a table of p-values.
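
    As a concrete (if toy) illustration of that framing, here is a minimal Python sketch that reports an estimated relationship with a 95% interval instead of a test decision. The data are simulated and the variable names are stand-ins; this is not code from any of the papers above.

      # Estimation-language reporting: point estimate plus a bootstrap 95% interval.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 200
      dysregulation = rng.normal(size=n)                       # simulated predictor
      suicidality = 0.3 * dysregulation + rng.normal(size=n)   # simulated outcome, assumed slope 0.3

      def slope(x, y):
          return np.polyfit(x, y, 1)[0]

      boot = np.array([
          slope(dysregulation[idx], suicidality[idx])
          for idx in (rng.integers(0, n, n) for _ in range(2000))
      ])
      est = slope(dysregulation, suicidality)
      lo, hi = np.percentile(boot, [2.5, 97.5])

      # Write-up sentence: "We estimate the relationship to be ... (95% interval ... to ...)"
      print(f"estimated slope: {est:.2f} (95% interval {lo:.2f} to {hi:.2f})")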

  2. The chances that Andrew would misuse significance tests in the way they’ve been misused are <.001. I think I understand the meaning of p values and find them somewhat useful (p<.05 << p<.001 << p<.000000001, etc.).

    So I'd prefer that he include such results, perhaps with appropriate warnings.

      > The chances that Andrew would misuse significance tests in the way they’ve been misused are <.001. I think I understand the meaning of p values and find them somewhat useful (p<.05 << p<.001 << p<.000000001, etc.).

      People who understand what statistical significance means don’t use it… You have mathematicians on one side who don’t understand empirical science, then poorly trained researchers on the other side who don’t understand math or logic.

  3. About a year ago (https://statmodeling.stat.columbia.edu/2018/09/21/psychology-researcher-uses-stan-multiverse-open-data-exploration-explore-human-memory/), Andrew was kind enough to share here some work that I’ve done on human memory, which avoids p-values and null hypotheses and is entirely Bayesian (implemented in Stan):

    https://osf.io/es463/

    We did make some discrete decisions regarding how many dimensions to include in our factor analyses, but these are tempered by the “multiverse”-style analyses in the Appendices which examine the consequences of different choices at that stage.
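
    For readers unfamiliar with the idea, a generic multiverse-style sketch in Python (not the code from the paper above): rerun the same analysis under each combination of discrete choices and compare the resulting estimates, so readers can see how much the conclusion depends on those choices. The choices and the analysis here are hypothetical placeholders.

      from itertools import product
      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(size=300)
      y = 0.4 * x + rng.normal(size=300)   # simulated data for illustration

      def analysis(x, y, trim, log_transform):
          # one "universe": optionally trim extreme x values, optionally transform y
          keep = np.abs(x) < np.quantile(np.abs(x), trim)
          yy = np.log1p(y[keep] - y[keep].min()) if log_transform else y[keep]
          return np.corrcoef(x[keep], yy)[0, 1]

      results = {
          (trim, log_transform): analysis(x, y, trim, log_transform)
          for trim, log_transform in product([0.95, 0.99, 1.0], [False, True])
      }
      for choice, r in results.items():
          print(choice, round(r, 3))
      # If the estimates are similar across universes, the conclusion does not
      # hinge on any one of the discrete choices.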

    We were fortunate to have a very supportive set of reviewers (with one odd exception who was not so much opposed to non-traditional stats as he/she was to reading our manuscript at all) and importantly a good editor as well. I also want to thank the editorial staff at the APA publications office for dealing with our myriad non-standard tables and figures—I suspect a lot of the momentum behind old stats is that there is already a standard pipeline for turning those analyses into publishable units.

    I agree with Peter above who suggests framing everything in terms of estimation rather than discrete causal identification so that no one expects to see a p-value. Our work was explicitly correlational (and exploratory, though what research isn’t?), so this wasn’t a big conceptual leap in our case, but I think it is still a good rhetorical strategy.

    P.S.: I should point out that I’m not a strict partisan for Bayes—I believe in the right tool for the right job and it just happens that Bayesian methods often do a better job of addressing the questions that I have. Other questions might be better answered by model comparisons (which may or may not be Bayesian) or even by hypothesis tests (e.g., “is my model any good at all?”).

    • +1 for the phrase “rhetorical strategy”.

      People in this thread are occasionally conflating “science” with “scientific writing”, and the conflation is making it hard to problem-solve.

      The overarching strategy that we’ve mostly seen from anti-p-value researchers is:
      (1) do the actual science without p-values;
      (2) use p-values (or some substantive equivalent) in the resulting scientific article;
      (3) tell everyone what you *actually* did in blogs and on twitter to make sure that the right scientific methods still get disseminated.
      That’s basically what Andrew did in his Millennium Villages Project paper. He performed good, p-value-less analyses but ended up writing up the article using the language of NHST and statistical significance, which “pretty much just add[ed] a layer of noise to the analyses.” The science didn’t involve p-values or NHST, but the scientific writing basically did (sorry Andrew–most of your articles are pure, but I think that was a bad example).

      A different strategy would be:
      (1) do the actual science without p-values;
      (2) don’t use p-values (or their substantive equivalents) in the resulting scientific article.
      Pulling off step 2 means learning a new way of writing and a new way of presenting a scientific argument.

      I think the primary obstacle to doing scientific writing without p-values or NHST language is that none of us were taught how to write this sort of article well (or at all) in University. Articles like Gregory Cox’s or McElreath and Koster’s, linked above, are important because they can serve as a good model for how to structure scientific articles convincingly without resorting to p-values or NHST language. In this respect, it’s their style/structure that’s important, not the science that was actually done. I was able to use McElreath and Koster’s article as a pretty direct template to structure mine even though I was working in a totally different scientific domain.

      Sidenote: another scientific writing strategy that I’ve found useful when ridding myself of p-values is to use lots of figures, and to replace any tables of estimates with figures. Peer-reviewers expect tables to include asterisks indicating statistical significance. They’re less likely to expect this from figures. Also, figures are able to convey a *ton* of complex information, which can itself help the reader intuit that p-values wouldn’t be a useful addition. For an example of this, see figure 3 of my paper on lethal overdoses linked to above–that information could have been presented as a table of estimates with asterisks indicating significance, but I think peer reviewers would have felt silly asking for it to be presented in that format (and none of them did).
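
      A small matplotlib sketch of that kind of figure, with invented outcome names and numbers: each outcome gets a point estimate and a 95% interval, so magnitude and uncertainty are visible at a glance and there is nothing for an asterisk to attach to.

        import matplotlib.pyplot as plt

        # Invented outcomes, estimates, and 95% interval endpoints (illustration only).
        outcomes = ["poverty", "assets", "nutrition", "education", "child health"]
        estimates = [0.05, 0.32, 0.10, 0.02, 0.21]
        lower = [-0.08, 0.15, -0.04, -0.11, 0.05]
        upper = [0.18, 0.49, 0.24, 0.15, 0.37]
        ys = list(range(len(outcomes)))

        fig, ax = plt.subplots(figsize=(5, 3))
        ax.errorbar(estimates, ys,
                    xerr=[[e - l for e, l in zip(estimates, lower)],
                          [u - e for u, e in zip(upper, estimates)]],
                    fmt="o", capsize=3)
        ax.axvline(0, linestyle="--", linewidth=1)   # reference line at zero, not a decision rule
        ax.set_yticks(ys)
        ax.set_yticklabels(outcomes)
        ax.set_xlabel("estimated impact (95% interval)")
        fig.tight_layout()
        plt.show()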

      • It’s most useful in my opinion to avoid (in these meta-discussions) saying “p-value” at all. Stick to “significance testing” for clarity of intent.

        I understand that it is far, far better to use a workflow that avoids any possibility of significance-test-based discard of potential analysis paths. And if we strenuously do that and simply apply an after-the-fact wallpaper of significance testing to the results of a significance-testing-agnostic workflow, then no harm done. Except…

        We are inevitably obfuscating what was actually done. When everyone in the world, except for a tiny (mostly Bayesian) elite, lets the direction of their analysis be determined in large part by significance testing, then whatever we write with significance tests attached will be assumed to reflect that conventional practice.

        Now this need not be true if we are clear, even aggressive, in the way we present our analysis methodology. But I’m thinking that such caveats will mostly act as dog whistles to that same elite.

        I know this sort of talking out of both sides of our mouth is necessary at this point in history. But whatever eventual possibility there is of bringing future generations of statisticians and researchers into a significance-testing-free environment has to be massively undercut by continuing to see various significance-testing commonplaces prominently featured in the work of the most vocal objectors to significance testing per se.

        • Yeah. Step (3), where you tell the internet that you *actually* did your analyses without significance testing and only had the p-values in your write-up to make peer-review easier, is going to reach just those people who follow you online. And your followers are mostly on board already…

          (also yeah: my post reads more clearly if you simply replace instances of “p-value” or “NHST” with “significance testing.”)

        • Peter and Brent —
          In the process you explain, anti-p-value researchers are choosing whether to describe their results as statistically significant after seeing the results of their analyses. As you point out, when results are favourable and statistically significant, then researchers are incentivised to use significance-based language because reviewers and audiences like it, and because it can be very difficult to publish in top journals otherwise.

          But what happens when the results aren’t favourable and significant? Now the incentives are different…

          * If results are favourable and non-significant, researchers are incentivised to argue that the statistical significance paradigm is flawed and that their new treatment/intervention should be recommended — irrespective of the presence or absence of statistically significant benefit.

          * If results are unfavourable and non-significant, researchers are incentivised to apply the language of statistical significance and describe their new treatment/intervention as “statistically comparable” or “statistically consistent” with the comparator.

          * If results are unfavourable and significant, researchers are incentivised to argue that the statistical significance paradigm is flawed and that their new treatment/intervention should not be ruled out.

          So the ability to choose between “significance-free language” and “significance-based language” can be more dangerous than either approach alone, at least in some situations.

          I worry that we will see more and more of this because many doctors and pharma companies, at least, will definitely follow the incentive scheme described above when given permission to avoid the language of statistical significance in some situations, but permission to use it in other situations. The whole process doesn’t even require conscious planning — it can arise organically and without premeditation as a result of the pressures of reviewer comments and the desire for publication.

          The end result will be more useless or harmful (but profitable) treatments.

      • Peter:

        Sure, but the Millennium Villages paper is just one example, an example where I was just one of many coauthors, it was a big project, and I didn’t feel comfortable derailing the publication over this particular issue. I’ve published hundreds of applied analyses, and I’m pretty sure that most of them don’t use p-values (or the equivalent) at all. At least, in recent years I’ve tried to be careful to not engage in significance testing in the analysis or the writeup.

        So please don’t judge my practice based on this one paper, which was a big collaboration on which I was only one part. I did insist before the project began that statistical significance could not be our goal, and we would’ve published our findings pretty much the same way and with pretty much the same publicity had they gone otherwise.

        • Hey Andrew! Totally hear you. I meant what I said about most of your articles being pure / significance-free, and almost put some of my favorite examples into that comment, but left them out for some reason. Here’s one of yours that I like because it’s explicitly a test but still manages to avoid the language of significance testing: http://www.stat.columbia.edu/~gelman/research/published/w14103.pdf

          I hope I don’t sound like I’m on a high horse–I have articles in press right now that are full of p-values, including at least one first-author where I don’t have the excuse of pressure from collaborators. It’s hard to avoid significance testing, which makes these conversations that much more valuable to me.

          (also, some version of this comment is either lost or awaiting moderation, so apologies if this is a repost)

        • This is why I think “purity” is, for the moment, too high a standard to hold everyone to. I think we ought to write different papers for different audiences, rather than just tweeting/blogging about the good stuff. It should just be understood that the paper you get published in the journal that requires/is biased toward p-values is just one take. If you have the reputation or influence or your work is significant enough to persuade journals to let you exclude p-values, or if you can afford to submit only to journals that do allow it, that’s wonderful, but it can’t be the standard. In this particular case, you could negotiate, in advance, with your collaborators for the option to write up the best version of the analysis as a separate paper. There would be no need to “defend” your “purity.” I also think this isn’t just a stopgap, it can be a way to change institutions and practices: if everyone knows that the “real” paper is on arXiv, journals will either adopt the same standards or become marginalized.

        • “I also think this isn’t just a stopgap, it can be a way to change institutions and practices: if everyone knows that the “real” paper is on arXiv, journals will either adopt the same standards or become marginalized.”

          Interesting possibility.

        • I’d go further and say that “purity”—where I am taking this to the extreme to mean “only certain statistical practices are allowed”—is not even a standard to which we should aspire. The NHST framework has led us down a lot of unproductive roads not because it is immoral or untenable on its own, but because it has been used irresponsibly to support claims that it cannot logically uphold and has driven the design of experiments that are ill-suited to answer the real research questions scientists have.

          This gets to my use of the phrase, “rhetorical strategy” and the theme of communicating science. What statistics you use and how you report them should, ideally, be driven by your specific research question, i.e., “you [the audience] should come to the conclusion X because analysis F applied to data D yields result Y.”

          Obviously the scientist needs to make sure their data D are capable of supporting analyses that produce results that are capable of saying anything about X, which is nontrivial since X is usually about theoretical constructs that are not directly observed (and this isn’t just in psychology or social science, I’m looking at you, electrons!).

          Coupled to this, the scientist needs to understand their data generating process well enough to construct their analysis F such that when F is applied to any data (not just the observed data D) it is capable of yielding results that can speak to conclusion X. Here’s where I think stats education has failed many scientists by teaching a set of off-the-shelf methods applicable to specific settings rather than general model-building tools that can be applied more broadly. But this is also why I think “purity” is not so great, because it limits the tools at the scientist’s disposal to construct well-tuned analysis procedures for answering specific research questions.

          Purity, of course, also limits the final stage, where rhetoric comes in: the audience needs to be able to understand why X is the only (or one of just a few) states of the world that would allow F(D) = Y. So it is also important to be able to use tools the audience can understand BUT that still logically allow for conclusion X. Here’s where things get tricky, though, because it is easy to fall into a trap of constructing the argument for X based on analysis F but then swapping out F for analysis G but keeping X the same—and there are very few cases where that doesn’t break the logic. I think this latter trap is what Peter, Anon, and other commenters are worried about.

  4. If anyone is looking for examples of real reports that do not use significance tests in a journal that welcomes any other established analysis approach, try looking at the journal Epidemiology, founded by Kenneth Rothman in 1990. They restrict use of P-values to fit tests and other uses where estimation of an intelligible parameter is unfamiliar. Responding to protests, they invited a challenge to that restriction, which was given in the trilogy on pp. 62-78 of the journal in 2013:
    Greenland S, Poole C (2013). Living with P-values: Resurrecting a Bayesian perspective.
    Gelman A (2013). P values and statistical practice.
    Greenland S, Poole C (2013). Living with statistics in observational research.
    They also recently published an article on using expected CI behavior rather than power for sample-size calculations (a minimal sketch of the general idea appears below):
    Rothman KJ, Greenland S (2018). Planning study size based on precision rather than power. Epidemiology, 29, 599-603. doi: 10.1097/EDE.0000000000000876.
