Is causality as explicit in fake data simulation as it should be?

Sander Greenland recently published a paper with a very clear and thoughtful exposition on why causality, logic and context need full consideration in any statistical analysis, even strictly descriptive or predictive analysis.

For instance, from the concluding section: “Statistical science (as opposed to mathematical statistics) involves far more than data – it requires realistic causal models for the generation of that data and the deduction of their empirical consequences. Evaluating the realism of those models in turn requires immersion in the subject matter (context) under study.”

Now, when I was reading the paper I started to think about how these three ingredients are, or should be, included in most if not all fake data simulation. Whether one is simulating fake data for a randomized experiment or a non-randomized comparative study, the simulations need to adequately represent the likely underlying realities of the actual study. One only has to add simulation to this excerpt from the paper: “[Simulation] must deal with causation if it is to represent adequately the underlying reality of how we came to observe what was seen – that is, the causal network leading to the data”. For instance, it is obvious that sex is determined before treatment assignment or selection (and should be so in the simulations), but some features may not be so obvious.
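To make that concrete, here is a minimal sketch (my own toy example, with made-up variable names and effect sizes) of generating fake data for a randomized trial in causal order: the baseline covariate first, then the treatment assignment, then the outcome.

```r
# Fake data for a randomized trial, generated in causal order.
set.seed(1)
n <- 200
sex   <- rbinom(n, 1, 0.5)                      # determined before treatment
treat <- rbinom(n, 1, 0.5)                      # randomized, so independent of sex
y     <- 1 + 0.5 * sex + 1.0 * treat + rnorm(n) # outcome generated last
summary(lm(y ~ treat + sex))$coefficients
```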

Once someone offered me a proof that the simulated censored survival times they generated, where the censoring time was set before the survival time (or some weird variation on that), would meet the definition of non-informative censoring. Perhaps there was a flaw in the proof, but the properties of repeated trials we wanted to understand were noticeably different from those obtained when survival times were generated first and censoring times were then generated and applied. Done that way, simulations likely better reflect the underlying reality as we understand it, and others (including our future selves) are more likely to be able to raise criticisms about it.
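A minimal sketch of that second, more realistic ordering (the exponential rates below are arbitrary placeholders): generate the survival times, generate independent censoring times, and only then apply the censoring.

```r
# Non-informative censoring by construction: survival first, then censoring.
set.seed(2)
n <- 500
t_event  <- rexp(n, rate = 0.10)            # true survival times
t_censor <- rexp(n, rate = 0.05)            # censoring times, drawn independently
time     <- pmin(t_event, t_censor)         # what would actually be observed
status   <- as.numeric(t_event <= t_censor) # 1 = event observed, 0 = censored
mean(status)                                # proportion of events observed
```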

So I then worried about how clear I had been in my seminars and talks on using fake data simulation to better understand statistical inference, both frequentist and Bayes. At first, I thought I had been, but on further thought I am not so sure. One possibly misleading footnote I gave on the bootstrap and cross-validation likely needs revision, as it did not reflect causation at all.


Everything that can be said can be said clearly.

The title, as many may know, is a quote from Wittgenstein. It is one that has haunted me for many years. As a first year undergrad, I had mistakenly enrolled in a second year course that was almost entirely based on Wittgenstein’s Tractatus. Alarmingly, the drop date had passed before I grasped that I was supposed to understand (at least some of) the Tractatus to pass. That forced me to re-read it numerous times. I did pass the course.

However, I now think the statement is mistaken – at least outside mathematics, in subjects where what is being said is an attempt to say something about the world, that reality that is beyond our direct access. Here some vagueness has its place or may even be necessary. What is being said is unlikely to be exactly right. Some vagueness may be helpful here, in the same way that sheet metal needs to stretch to be an adequate material for a not quite fully rigid structure.

Now, what I am trying to say more clearly at present is how diagrammatic reasoning – experiments performed on diagrams, as a choice of mathematical analysis of probability models utilizing simulation – will enable more people to grasp statistical reasoning better. OK, maybe the Wittgenstein quote was mostly click bait.

My current attempt at saying it in 4 paragraphs:


Rethinking Rob Kass’ recent talk on science in a less statistics-centric way.

Reflection on a recent post on a talk by Rob Kass has led me to write this post. I liked the talk very much and found it informative, perhaps especially for its call to clearly distinguish abstract models from brute force reality. I believe that is a very important point that has often been lost sight of by many statisticians in the past. As evidence of that, I would point to the many who hold up Box’s quote “all models are wrong, but some are useful” as being insightful, rather than as something already at the top of most statisticians’ minds.

However, the reflection has led me to think Kass’ talk is too statistics-centric. Now, Kass’ talk was only about 25 minutes long while being on a subtle topic. It is very hard to be both concise and fully balanced, but I believe we have a different perspective and I would like to bring that out here. For instance, I think this statement – “I [Kass] conclude by saying that science and the world as a whole would function better if scientific narratives were informed consistently by statistical thinking” – would be better put as saying that statistics and the statistical discipline as a whole would function better if statistical methods and practice were informed consistently by purposeful experimental thinking (AKA scientific thinking).

Additionally, this statement – “the essential flaw in the ways we talk about science is that they neglect the fundamental process of reasoning from data” – seems somewhat dismissive of science being even more fundamentally about the process of reasoning from data, with statistics being a specialization for when data are noisy or vary haphazardly. In fact, Steven Stigler has argued that statistics arose as a result of astronomers trying to make sense of observations that varied when they believed what was being observed did not.

Finally, this statement – “the aim of science is to explain how things work” – I would rework into: the aim (logic) of science is to understand how experiments can bring out how things work in this world, by using abstractions that are themselves understood by using experiments. So experiments all the way up.

As usual, I am drawing heavily on my grasp of writings by CS Peirce. He seemed to think that everything should be thought of as an experiment, including mathematics, which he defined as experiments performed on diagrams or symbols rather than on chemicals or physical objects. Some quotes from his 1905 paper What pragmatism is: “Whenever a man acts purposively, he acts under a belief in some experimental phenomenon. … some unchanging idea may come to influence a man more than it had done; but only because some experience equivalent to an experiment has brought its truth home to him more intimately than before…”

I do find thinking of anything one can as an experiment to be helpful. For instance, in this previous post discussion led to a comment by Andrew that “Mathematics is simulation by other means”. One way to unpack this, by thinking of mathematics as experiments on diagrams or symbols, would be to claim that calculus is one design of an experiment while simulation is just another design. Different costs and advantages, that’s all. It is the idea of being experimental, and experimenting as appropriately as one can, that is fundamental. Then sorting out “most appropriately” would point to economy of research as the other fundamental piece.
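A toy contrast between such designs (my own example, nothing deep): the expectation of the square of a standard normal draw can be obtained by calculus (it is exactly 1), by numerical integration, or by simulation, each with its own costs and error properties.

```r
# Three "experimental designs" for E[Z^2], Z ~ Normal(0, 1).
# Calculus gives exactly 1; the other designs approximate it.
integrate(function(z) z^2 * dnorm(z), -Inf, Inf)$value  # numerical experiment
set.seed(3)
mean(rnorm(1e6)^2)                                      # simulation, with Monte Carlo error
```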


Some possibly different experiences of being a statistician working with an international collaborative research group like OHDSI.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author.

Back at the end of March, I thought it would be a good idea to let folks here know about early research efforts being launched on Covid19 by OHDSI in their study-a-thon.

So, almost six months later, we are where we are.

I was an observer on that study-a-thon and some of the work done afterwards. I did not actually work with the group, but just watched and listened, and so my views may be somewhat uninformed.

However, it occurred to me that many statisticians might like to be more aware of the opportunities that such international research groups might offer. Most, I think, work at a single institution where the analyses they get to be involved in have a single data set (at least at any one point in time), a single research group involved in the project, and the unfortunate pressure to get something published in a journal. Eventually the research encounters the usual less than adequate peer review from journal editors and reviewers. Then, if post-publication review occurs, some involved in the research “demand” that the wagons be circled and all members remain inside.

Well, that was my career at some places when I was in academia. On the other side, those in the research group can only (easily) work with statisticians who are in their institution or otherwise available to them. Those statisticians may not have much expertise in what is specifically needed for the project, or be able to easily draw on the expertise of other statisticians.

Recently, in a series of talks by members of OHDSI at the virtual JSM2020, some real differences from the above in the opportunities for statisticians seemed apparent. Briefly: rather than a single data set there are multiple sources of data sets; summaries of the separate results are made available for contrast and comparison; the researchers are often from multiple institutions around the globe; there is a methodological group that can be drawn on for specific expertise, with code on github; and peer review by others in OHDSI not directly involved can potentially be part of the process. Still, there seems to be that unfortunate pressure to get something published in a journal, at least for many in OHDSI.

These may be just my impressions, but I think statisticians would benefit from knowing more about groups like OHDSI. I am sure there are more such groups, and I am expecting more in the future.

There are three talks at this link listed below. For those who signed up for JSM, the accompanying talks should be available until the end of August.

August 2020
2020 Joint Statistical Meetings
Session: The OHDSI Collaboration: Generating Reliable Evidence from Large-Scale Healthcare Data

Patrick Ryan – Janssen, Columbia University
The OHDSI Collaboration: Mission, Accomplishments, and the Road Ahead
Presentation Slides [Main message: Scientific harmony is achieved through collaboration, not randomization?]

David Madigan – Northeastern University
OHDSI Methods for Causal Effect Estimation
Presentation Slides [Main message: A new approach that is reproducible, systematized, open source and at scale?]

Marc Suchard – UCLA
Large-Scale Evidence Generation in a Network of Databases (LEGEND) Methodology and the Hypertension Study
Presentation Slides [Main message: A substantive case study?]


Some things do not seem to spread easily – the role of simulation in statistical practice and perhaps theory.

Unlike Covid19, some things don’t seem to spread easily, and the role of simulation in statistical practice (and perhaps theory) may well be one of those.

In a recent comment, Andrew provided a link to an interview about the new book Regression and Other Stories by Aki Vehtari, Andrew Gelman, and Jennifer Hill. The interview covered many aspects of the book, but the comments on the role of fake data simulation caught my interest the most.

In fact, I was surprised by the comments in that the recommended role of simulation seemed much more substantial than I would have expected from participating on this blog. For at least the last 10 years I have been promoting the use of simulation in teaching and statistical practice with seemingly little uptake from other statisticians. For instance my intro to Bayes seminar and some recent material here (downloadable HTML from google drive).

My sense was that those who eat, drink and dream in mathematics [edit] see simulation as awkward and tedious. But maybe that’s just me – though statisticians have published comments very similar to this [edit]. Aki, Andrew and Jennifer, however, seem to increasingly disagree.

For instance, at 29:30 in the interview there are about 3 minutes from Andrew arguing that all of statistical theory is a kind of shortcut to fake data simulation, and that you don’t need to know any statistical theory as long as you are willing to do fake data simulation on everything. However, it is hard work to do fake data simulation well [building a credible fake world and specifying how it is sampled from]. Soon after, Aki commented that it is only with fake data simulation that you have access to the truth in addition to data estimates. That to me is the most important aspect – you know the truth.
Also at 49:25 Jennifer disclosed that she changed her teaching recently to be based largely on fake data simulation, and is finding that having the students construct the fake world and understand how the analysis works there provides a better educational experience.
Now, in a short email exchange Andrew did let me know that the role of simulation increased as they worked on the book, and Jennifer let me know that there are simulation exercises in the causal inference topics.
I think the vocabulary they and others have developed (fake data, fake world, Bayesian reference set generated by sampling from the prior, etc.) will help more people see why statistical theory is a kind of shortcut to simulation. I especially like this vocabulary and recently switched from fake universe to fake world in my own work.
However, when I initially tried using simulation in webinars and seminars, many did not seem to get it at all.
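For readers who have not seen it done, here is a minimal sketch of the kind of fake data simulation being talked about (an arbitrary normal-mean example of my own): because we set the truth ourselves, we can check directly how often a standard interval captures it.

```r
# Set the truth, generate fake data, analyse, repeat -- then check against the truth.
set.seed(4)
truth <- 2
covered <- replicate(5000, {
  y  <- rnorm(30, mean = truth, sd = 3)   # one fake data set from the fake world
  ci <- t.test(y)$conf.int                # the analysis we want to understand
  ci[1] <= truth && truth <= ci[2]        # did the interval capture the truth?
})
mean(covered)                             # should be close to 0.95
```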
p.s. When I did this post I wanted to keep it short and mainly call attention to Aki, Andrew and Jennifer’s (to me increasingly important) views on simulation. The topic is complicated, more so than I believe most people appreciate, and I wanted to avoid a long complicated post. I anticipated doing many posts over the coming months, and the comments seem to support that.
However, Phil points out that I did not define what I meant by “fake data simulation” and I admittedly had assumed readers would be familiar with what Aki, Andrew and Jennifer meant by it (as well as what I meant). To _me_ it is simply drawing pseudo-random numbers from a probability model. The “fake data” label emphasizes that it is an abstraction, used to represent haphazardly varying observations and unknowns. This does not exclude any Monte-Carlo simulation but just emphasizes one way it could be profitably used.
For instance, in the simple bootstrap, real data is used but the re-sampling draws are fake (abstract) and are being used to represent possible future samples. So here the probability model is discrete, with support only on the observations in hand and with probabilities implicitly defined by the re-sampling rules. So there is a probability model and simulation is being done. (However, I would call it a degenerate probability model, given the loss of flexibility in choices.)
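In code, the simple bootstrap is exactly that kind of simulation from a (degenerate) discrete probability model, with equal probability on each observed value – a sketch with an arbitrary small data set:

```r
# The simple bootstrap: simulation from a discrete probability model
# with support only on the observations in hand.
set.seed(5)
y <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.7)                  # the real data in hand
boot_means <- replicate(10000, mean(sample(y, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))                 # percentile interval for the mean
```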


More than one, always more than one to address the real uncertainty.

The OHDSI study-a-thon group has a pre-print An international characterisation of patients hospitalised with COVID-19 and a comparison with those previously hospitalised with influenza.

What is encouraging with this one, compared with yesterday’s study, is the multiple data sources and almost too many co-authors to count (take that, Nature editors).

So there is an opportunity to see the variation, and some assurance that many eyes had an opportunity to see and question the protocol and the study work.

Results: 6,806 (US: 1,634, South Korea: 5,172) individuals hospitalised with COVID-19 were included. Patients in the US were majority male (VA OMOP: 94%, STARR-OMOP: 57%, CUIMC: 52%), but were majority female in HIRA (56%). Age profiles varied across data sources. Prevalence of asthma ranged from 7% to 14%, diabetes from 18% to 43%, and hypertensive disorder from 22% to 70% across data sources, while between 9% and 39% were taking drugs acting on the renin-angiotensin system in the 30 days prior to their hospitalisation. Compared to 52,422 individuals hospitalised with influenza, patients admitted with COVID-19 were more likely male, younger, and, in the US, had fewer comorbidities and lower medication use.

Now, it may be important to note that none of the authors had direct access to the very confidential patient data. They write analysis scripts which the data holders run (separately) and return data quality diagnostics and the summaries reported in the paper. There is some ability to query the study here, as well as access the protocol and code.

Now this is a live entity: the scripts can be run by any data holder, at least after the data have been transformed into a standard format. Hopefully that can be done for the enterprise electronic health record (Sunrise Clinical Manager; Allscripts) reporting database from yesterday’s study.

If not, why not?

p.s. CS Peirce quote – “one man’s [group’s] experience is nothing if it stands alone. If he sees what others cannot see, we call it hallucination” (CP5.402n)

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author.


Usual channels of clinical research dissemination getting somewhat clogged: What can go wrong – does.

A few weeks ago I was an observer on the OHDSI Covid19 study-a-thon (March 26 – 29): four days of intensive collaboration among numerous clinical researchers working with previously established technology to enable high quality research, with data access to up to 500 million patients.

Current status here.

This is a good summary of what happened: “I am extremely proud to see what our community accomplished, but we are well aware that this is merely the beginning stage of a long research agenda,” said George Hripcsak, MD, MS, the Vivian Beaumont Allen Professor and Chair of the Columbia Department of Biomedical Informatics. “Our international network is committed to continuing work in this area until this pandemic has ended.” [no bolding in the original].

In clinical research the devil is in the details and about a week after the study-a-thon, the first pre-print was ready to share (April 5).  Yeah!

Submitted to medRxiv, its appearance was delayed by almost a full week :-(

Not ideal!

Now, dozens of other studies by the group need to be finalized and disseminated, but it is worrisome that the usual channels of clinical research dissemination seem to be getting somewhat clogged. What is likely happening is that anyone and their brother who can get their hands on 50 or so patients’ data are quickly doing some sort of analysis and submitting a not-so-high-quality paper.

Now, I am biased, but this international group has access to unbelievably large data sets and a tested-out methodology for doing better than average studies – cooperatively.

As always, the initial dissemination of a study’s results needs some time for other experts to digest it, raise concerns about what might be wrong and suggest ways to mitigate that. Delays in this process are regrettable.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.


Update: OHDSI COVID-19 study-a-thon.

I thought a summary in the section below might be helpful, as the main page might be a lot to digest.

The OHDSI Covid 19 group re-convenes at 6:00 (EST I think) Monday for updates.

For those who want to do modelling: you cannot get the data, but must write analysis scripts that data holders will run on their computers, returning the results. My guess is that this might be most doable through here, where custom R scripts can be implemented that data holders might be able to run. Maybe some RStan experts can try to work this through.
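To give a sense of that workflow (a made-up sketch, not OHDSI’s actual tooling): the analyst writes something like the function below, a data holder runs it against their local data, and only the aggregate summary leaves the site.

```r
# Sketch of a script a data holder could run locally, returning only aggregates.
run_local_analysis <- function(df) {
  data.frame(
    n          = nrow(df),          # number of patients at this site
    n_events   = sum(df$event),     # count of the outcome of interest
    event_rate = mean(df$event),    # crude event rate
    mean_age   = mean(df$age)       # a simple characterization summary
  )
}
# write.csv(run_local_analysis(local_patient_data), "site_summary.csv")
# (local_patient_data is a hypothetical data frame held only by the site)
```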


And the band played on: Low quality studies being published on Covid19 prediction.

According to Laure Wynants et al., Systematic review and critical appraisal of prediction models for diagnosis and prognosis of COVID-19 infection, most of the recently published studies on prediction of Covid19 are of rather low quality.

Information is desperately needed but not misleading information :-(

Conclusion: COVID-19 related prediction models for diagnosis and prognosis are quickly entering the academic literature through publications and preprint reports, aiming to support medical decision making in a time where this is needed urgently. Many models were poorly reported and all appraised as high risk of bias. We call for immediate sharing of the individual participant data from COVID-19 studies worldwide to support collaborative efforts in building more rigorously developed and validated COVID-19 related prediction models. The predictors identified in current studies should be considered for potential inclusion in new models. We also stress the need to adhere to methodological standards when developing and evaluating COVID-19 related predictions models, as unreliable predictions may cause more harm than benefit when used to guide clinical decisions about COVID-19 in the current pandemic.

OHDSI COVID-19 study-a-thon.

The OHDSI COVID-19 study-a-thon started early on Thursday morning – 3 am for me.

The wrap up session – of the START of the Odyssey that needs to continue – will be available at 7 pm eastern time / EDT.

This will give anyone who might be able to contribute to a worldwide collaboration to enable better decision making and research on Covid19 a sense of what has happened so far.

I’ll add the link when I get it – or if anyone commenting gets it first, please share it here.

Sorry for the delay – this is the link: https://www.ohdsi.org/covid-19-updates/

Slides are now available. Other groups will be re-running the analyses on other data providers’ data and pointing out what they learned.

PRESENTATIONS WITHIN THE #OHDSICOVID19 WRAP-UP CALL (Full Slidedeck)

Introduction – Daniel Prieto-Alhambra and Patrick Ryan (Slides)
Literature Review – Jennifer Lane (22:00 • Slides)
Data Network In Action – Kristin Kostka (26:10 • Slides)
Phenotype Development – Anna Ostropolets (31:38 • Slides)
Clinical Characterization of COVID-19 – Ed Burn (42:10 • Slides)
The Journey Through Patient-Level Prediction – Peter Rijnbeek (50:12 • Slides)
Prediction #1: Amongst Patients Presenting with COVID-19, Influenza, or Associated Symptoms, Who Are Most Likely to be Admitted to the Hospital in the Next 30 Days? – Jenna Reps (56:55 • Slides)
Prediction #2: Amongst Patients at GP Presenting with Virus or Associated Symptoms with/without Pneumonia Who Are Sent Home, Who Are Most Likely to Require Hospitalization in the Next 30 Days? – Ross Williams (1:08:42 • Slides)
Prediction #3: Amongst Patients Hospitalized with Pneumonia, Who Are Most Likely To Require Intensive Services or Die? – Aniek Markus (1:15:25 • Slides)
Estimation #1: Hydroxychloroquine – Daniel Prieto-Alhambra (1:23:32 • Slides)
Estimation #2: Safety of HIV/HepC Protease Inhibitors – Albert Prats (1:31:24 • Slides)
Estimation #3: Association of Angiotensin Converting Enzyme (ACE) Inhibitors and Angiotensin II Receptor Blockers (ARB) on COVID Incidence and Complications – Daniel Morales (1:36:58 • Slides)
#OpenData4COVID19 – Seng Chan You (1:45:32 • Slides)
The Journey Ahead – Patrick Ryan (1:50:28 • Slides)
Questions & Answers – Daniel Prieto-Alhambra, Peter Rijnbeek and Patrick Ryan (2:08:15)

We need to practice our best science hygiene.

Of course I am not referring to hand-washing and social distancing, but rather to heightened social interactions among those now engaged, or who can get engaged, in trying to get less wrong about Covid19.

That is, being open about one’s intentions (the purpose of the effort), one’s methods and one’s data and data sources.

For instance, these data sources: Canada testing and results, US testing and results,

and some information on ongoing trials (which underlines the need for good expertise and advice).

I know, conjectures and opinions can be helpful, but I would suggest comments here be limited to data sources, methods to analyse those data sources and, of course, trial designs.

p.s. We each need to find where our particular mix of skills will be most useful and join in there if and when we can. I am currently on standby where I work, so I won’t know exactly what I will be working on. In light of this, I am trying to get a scan of where good clinical research/evaluation material and advice might be located.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author.


Just some numbers from Canada

One of my colleagues posted this link yesterday to a shiny app giving Covid19 testing and results for all provinces in Canada. Seems to match all other sources I have heard from.

About 43,000 tests and 600 positive. The cumulative graphs of cases by province indicate that Alberta is currently having the fastest increases.

Hopefully the numbers will get on a github at some point.

Anything similar in the US yet, giving total and by state testing and results?

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author.

Attempts at providing helpful explanations of statistics must avoid instilling misleading or harmful notions: ‘Statistical significance just tells us whether or not something definitely does or definitely doesn’t cause cancer’

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

Getting across (scientifically) profitable notions of statistics to non-statisticians (as well as fellow statisticians) ain’t easy.

Statistics is what it is, but explaining it as what it ain’t just so it is easy to understand (and thereby likely to make you more popular) should no longer be tolerated. What folks take away from easy-to-understand but incorrect explanations can be dangerous to them and others. Worse, they can become more gruesome than even vampirical ideas – false notions that can’t be killed by reason.

I recently came across the explanation quoted in the title of this post in a youtube video a colleague tweeted: How Not to Fall for Bad Statistics – with Jennifer Rogers.

The offending explanation of statistics as the alchemy of converting uncertainty into certainty occurs at around 7 minutes. Again, “Statistical significance just tells us whether or not something definitely does or definitely doesn’t cause cancer.” So if you were uncertain if something caused cancer, just use  statistical significance to determine if it definitely does or definitely doesn’t. Easy peasy. If p > .05 nothing to worry about. On the other hand, if p < .05 do whatever you can to avoid it. Nooooooo!

Now, if only a statistician were giving such a talk, or maybe even a highly credentialed statistician – but at the time Jennifer Rogers was the Director of Statistical Consultancy Services at the University of Oxford, an associate professor at Oxford, and still is vice president for external affairs of the Royal Statistical Society. And she has a TEDx talk listed on her personal page. How could statistical significance have been gotten so wrong?

OK, at another point in the talk she did give a correct definition of p-values, and at another point she explained a confidence interval as an interval of plausible values [Note from Keith – a comment from Anonymous led me to realise that I was mistaken here regarding an interval of plausible values being an acceptable explanation. I now see it as totally wrong and likely to lead others to believe the confidence interval is a probability interval. More explanation here.] But then, at around 37 minutes, she claimed for a particular confidence interval “I would expect 95% of them between 38 and 66”, where she seems to be referring to future estimates or maybe even the “truth”. Again, getting across (scientifically) profitable notions of statistics to non-statisticians (as well as fellow statisticians) ain’t easy. We all are at risk of accidentally giving incorrect definitions and explanations. Unfortunately those are the ones folks are most likely to take away, as they are much easier to make sense of and seemingly more profitable for what they want to do.

So we all need to speak up about them and retract ones we make. This video has had almost 50,000 views!!!

Unfortunately, there is more to complain about in the talk. Most of the discussion about confidence intervals seemed to be just a demonstration of how to determine statistical significance with them. The example made this especially perplexing to me, given that it addressed a survey to determine how many agreed with an advertisement claim – of 52 surveyed, 52% agreed. Now, when I first went to university, I wanted to go into advertising (there was even a club for that at the University of Toronto). Things may have changed since then, but back then getting even 10% of people to accept an advertising claim would have to be considered a success.

But here the uncertainty in the survey results is assessed primarily using a null hypothesis of 50% agreement. What? As if we are really worried that 52 people flipped a random coin to answer the survey. Really? However, with that convenient assumption it is all about whether the confidence interval includes 50% or not. At around 36 minutes, if the confidence interval does not cross 50%, “I say it’s a statistically significant result”. QED.
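For what it is worth, the arithmetic behind that interval and that comparison is easy to reproduce (assuming the 52% corresponds to 27 of the 52 respondents):

```r
# Normal-approximation 95% interval for 27 of 52 agreeing (about 52%)
p_hat <- 27 / 52
se    <- sqrt(p_hat * (1 - p_hat) / 52)
round(p_hat + c(-1, 1) * 1.96 * se, 2)   # roughly 0.38 to 0.66
prop.test(27, 52, p = 0.5)               # the test against the 50% "coin flip" null
```

The interval does cross 50%, so by that rule it is “not significant” – but whether 50% is a meaningful reference point for an advertising claim is, as argued above, another matter entirely.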

Perhaps the bottom line here is that, just as journalists would benefit from statisticians giving advice on how to avoid being misled by statistics, all statisticians need other statisticians to help them avoid explanations of statistics that may instil misleading notions of what statistics are, what they can do and especially what one should make of them. So we all need to speak up about them and retract ones we make.

P.S. from Andrew based on discussion comments: Let me just emphasize a couple of things that Keith wrote above:

Getting across (scientifically) profitable notions of statistics to non-statisticians (as well as fellow statisticians) ain’t easy.

We all are at risk [emphasis added] of accidentally giving incorrect definitions and explanations.

All statisticians need other statisticians to help them avoid explanations of statistics that may instill misleading notions of what statistics are, can do and especially what one should make of them. So we all need to speak up about them and retract ones we make.

As Keith notes, we all are at risk. That includes you and me. The point of the above post is not that the particular speaker made uniquely bad errors. The point is that we all make these sorts of errors—even when we are being careful, even when we are trying to explain to others how to avoid errors. Even statistics experts make errors. I make errors all the time. It’s important for us to recognize our errors and correct them when we see them.

P.S. from Keith about the intended tone of the post.

As I wrote privately to a colleague involved with RSS: “Tried not to be too negative while being firm on concerns.”

Also, my comment “thereby likely to make you more popular” was meant to be descriptive of the effect not the motivation. Though I can see it being interpreted otherwise.

P.S2. from Keith: A way forward?

From Andrew’s comment below: “Given Rogers’s expertise in statistics, I’m sure that she doesn’t really think that statistical significance can tell us whether or not something definitely does or definitely doesn’t cause cancer. But that’s Keith’s point: even experts can make mistakes when writing or speaking, and these mistakes can mislead non-experts, hence the value of corrections.” I should have written something like what he put in the first sentence, argued that these mishaps can cause damage in even the best of presentations, and that regardless they need to be pointed out to the author, who then hopefully will try to correct possible misinterpretations for much of the same audience. Something like: “some of the wording was unfortunate and was not meant to give the impression statistical significance made anything definite. Additionally, showing how a confidence interval could be used to assess statistical significance was not meant to suggest that is how they should be interpreted.”


A Bayesian view of data augmentation.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

After my lecture on Principled Bayesian Workflow for a group of machine learners back in August, a discussion arose about data augmentation. The comments were about how it made the data more informative. I questioned that, as there is only so much information in the data – in view of the model assumptions, just the likelihood. So by simply modifying the data, information should not increase but only possibly decrease (if the modification is non-invertible).

Later, when I actually saw an example of data augmentation and thought about this more carefully, I changed my mind. I now realise background knowledge is being brought to bear on how the data is being modified. So data augmentation is just a way of being Bayesian by incorporating prior probabilities. Right?

Then, thinking some more, it all became trivial, as the equations below show.

P(u|x) ~ P(u) * P(x|u)   [Bayes with just the data.]
~  P(u) * P(x|u) * P(ax|u)   [Add the augmented data.]
P(u|x,ax) ~ P(u) * P(x|u) * P(ax|u) [That’s just the posterior given ax.]
P(u|x,ax) ~ P(u) * P(ax|u) * P(x|u) [Change the order of x and ax.]

Now, augmented data is not real data and should not be conditioned on as real. Arguably it is just part of (re)making the prior specification from P(u) into P.au(u) = P(u) * P(ax|u).

So change the notation to P(u|x) ~ P.au(u) * P(x|u).

If you data augment (and you are using likelihood based ML, implicitly starting with P(u) = 1), you are being a Bayesian whether you like it or not.
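A conjugate toy example (my own construction, not from the discussion above) makes the equivalence concrete: augmenting binomial data with fake successes and failures gives exactly the same posterior as folding those fake counts into the prior and conditioning on the real data only.

```r
# u = success probability. Real data: 12 successes in 20 trials.
# Augmented (fake) data: 3 successes in 10 trials.
grid <- seq(0.001, 0.999, length.out = 999)
lik  <- function(s, n) grid^s * (1 - grid)^(n - s)

# Route 1: flat P(u), treating real and augmented data alike in the likelihood.
post1 <- lik(12, 20) * lik(3, 10); post1 <- post1 / sum(post1)

# Route 2: fold the augmentation into the prior, P.au(u) = P(u) * P(ax|u),
# then condition on the real data only.
post2 <- lik(3, 10) * lik(12, 20); post2 <- post2 / sum(post2)

max(abs(post1 - post2))   # 0 (up to floating point): the same posterior
```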

So I googled a bit and asked a colleague in ML about the above. They said it made sense to them when they thought about it, but that it was not immediately obvious. They also said it was not common knowledge – so here it is.

Now, better googling gets more stuff, such as “Augmentation is also a form of adding prior knowledge to a model; e.g. images are rotated, which you know does not change the class label.” and the paper A Kernel Theory of Modern Data Augmentation by Dao et al., where in the introduction they state “Data augmentation can encode prior knowledge about data or task-specific invariances, act as regularizer to make the resulting model more robust, and provide resources to data-hungry deep learning models.” Although the connection to Bayes does not seem to be discussed in either.

Further scholarship likely would lead me to consider deleting this post, but what’s the fun in that?

P.S. In the comments, Anonymous argued “we should have that I(a,u) >= I(ax, u)” which I am now guessing was about putting the augmentation into the model instead of introducing it through fake data examples. So instead of modifying the data in ways that are irrelevant to the prediction (e.g. small translations, rotations, or deformations for handwritten digits), put it into the prior. So instead of obtaining P.axu(u) = P(u) * P(ax|u) based on n augmentations of the data make P.au(u) mathematically (sort of an infinite number of augmentations of the data).

Then Mark van der Wilk adds a comment about actually doing that for multiple possible P.au(u)’s and then comparing these using the marginal likelihood, in a paper with colleagues.

Now, there could not be a better motivation for my post than this from their introduction: “This human input makes data augmentation undesirable from a machine learning perspective, akin to hand-crafting features. It is also unsatisfactory from a Bayesian perspective, according to which assumptions and expert knowledge should be explicitly encoded in the prior distribution only. By adding data that are not true observations, the posterior may become overconfident, and the marginal likelihood can no longer be used to compare to other models.”

Thanks Mark.


Zombie semantics spread in the hope of keeping most on the same low road you are comfortable with now: Delaying the hardship of learning better methodology.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

Now, everything is connected, but this is not primarily about persistent research misconceptions such as statistical significance.

Instead it is about (inherently) interpretable ML versus (misleading with some nonzero frequency) explainable ML, which I previously blogged about just over a year ago.

That was when I first became aware of work by Cynthia Rudin (Duke) that argues upgraded versions of easy-to-interpret machine learning (ML) technologies (e.g. Cart-style constrained optimisation to get sparse rule lists, trees, linear integer models, etc.) can offer similar predictive performance to new(er) ML (e.g. deep neural nets) with the added benefit of inherent interpretability. In that initial post, I overlooked the need to define (inherently) interpretable ML as ML where the connection between the inputs given and the prediction made is direct. That is, it is simply clear how the ML predicts, but not necessarily why such predictions would make sense – understanding how the model works, but not an explanation of how the world works.

What’s new? Not much and that’s troubling.

For instance, policy makers are still widely accepting black box models without significant attempts at getting interpretable (rather than explainable) models that would be even better. Apparently, the current lack of interpretable models with comparable performance to black box models in some high profile applications is being taken, without question, as the usual situation. To dismiss consideration of interpretable models? Or maybe it is just wishful thinking?

Now there have been improvements in both interpretable methods and their exposition.

For instance, an interpretable ML model achieved comparable accuracy to black box ML and received the FICO Recognition Award, which acknowledged the interpretable submission for going above and beyond expectations with a fully transparent global model that did not need explanation. Additionally, there was a user-friendly dashboard to allow users to explore the global model and its interpretations. So a nice, very visible success.

Additionally, theoretical work has proceeded to discern whether accurate interpretable models could possibly exist in many if not most applications. It avoids Occam’s-Razor-style arguments about the world being truly simple by using a technical argument about function classes and, in particular, Rashomon Sets.

As for their exposition, there is now a succinct 10 minute youtube video, Please Stop Doing “Explainable” ML, that hits many of the key points, along with a highly readable technical exposition that further fleshes out these points: Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.

However, as pointed out in the paper, the problem persists that “Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society”.


Filling/emptying the half empty/full glass of profitable science: Different views on retiring versus retaining thresholds for statistical significance.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

Unless you are new to this blog, you likely will know what this is about.

Now, by profitable science in the title is meant repeatedly producing logically good explanations which “through subjection to the test of experiment, lead to the avoidance of all surprise and to the establishment of a habit of positive expectation that shall not be disappointed.” – CS Peirce

It all started with a Nature commentary by Valentin Amrhein, Sander Greenland, and Blake McShane. Then the discussion, then thinking about it, then an argument that it is sensible and practical, then an example of statistical significance not working, and then a dissenting opinion by Deborah Mayo.

Notice the lack of finally!

However, Valentin Amrhein, Sander Greenland, and Blake McShane have responded with a focused and concise discernment of why they think retiring statistical significance will fill up the glass of profitable science, while maintaining hard default thresholds for declaring statistical significance will continue to empty it: Statistical significance gives bias a free pass. This is their just-published letter to the editor (JPA Ioannidis) on TA Hardwicke and JPA Ioannidis’ Petitions in scientific argumentation: Dissecting the request to retire statistical significance, where Hardwicke and Ioannidis argued (almost) the exact opposite.

“In contrast to Ioannidis, we and others hold that it is using – not retiring – statistical significance as a “filtering process” or “gatekeeper” that “gives bias a free pass”. “

A two sentence excerpt that I liked the most was “Instead, it [retiring statistical significance] encourages honest description of all results and humility about conclusions, thereby reducing selection and publication biases. The aim of single studies should be to report uncensored information that can later be used to make more general conclusions based on cumulative evidence from multiple studies.”

However, the full letter to the editor is only slightly longer than two pages – so should be read in full – Statistical significance gives bias a free pass.

I also can’t help but wonder how much of the discussion that ensued from the initial  Nature commentary could have been avoided if less strict page limitations had been allowed.

Now, it may seem strange that an editor who is also an author on the paper drawing a critical letter to the editor accepts that letter. It happens, but not always. I also submitted a letter to the editor on this same paper and the same editor rejected it without giving a specific reason. That full letter of mine is below for those who might be interested.

My letter was less focused but had three main points: someone with a strong position on a topic who undertakes to do a survey themselves displaces the opportunity for others without such strong positions to learn more; univariate summaries of responses can be misleading; and pre-registration (minor) violations and comments (only given in the appendix) can provide insight into the quality of the design and execution of the survey. For instance, the authors had anticipated analyzing nominal responses with correlation analysis.



The virtue of fake universes: A purposeful and safe way to explain empirical inference.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

I keep being drawn to thinking there is a way to explain statistical reasoning to others that will actually do more good than harm. Now, I also keep thinking I should know better – but can’t stop.

My recent attempt starts with a shadow metaphor, then a review of analytical chemistry, and moves to the concept of abstract fake universes (AFUs). AFUs allow you to be the god of a universe, though not a real one ;-). However, they are universes you can conveniently define using probability models, where it is easy to discern what would repeatedly happen – given an exactly set truth.

The shadow metaphor emphasizes that though you see shadows, you are really interested in what is casting the shadows. The analytical chemistry metaphor emphasizes the advantage of making exactly known truths by spiking a set amount of a chemical into test tubes and repeatedly measuring the test tube contents with inherently noisy assays. For many empirical questions such spiking is not possible (e.g. underlying cancer incidence), so we have no choice but to think abstractly. Now, abstractly, a probability model is an abstract shadow generating machine: with set parameter values it can generate shadows – well, actually samples. Then it seems advantageous to think of probability models as an ideal means to make AFUs with exactly set truths, where it is easy to discern what would repeatedly happen.

Now, my enthusiasm is buoyed by the realization that one of the best routes for doing that is the prior predictive. The prior predictive generates a large set of hopefully appropriate fake universes, where you know the truth (the prior parameters drawn) and can discern what the posterior would be (given the joint model proposed for the analysis and the fake data generated). That is, in each fake universe from the prior predictive, you have the true parameter value(s) and the fake sample that the data generating model generated from these, and can discern what the posterior would have been calculated to be. Immediately (given computation time) one obtains a large sample of what would repeatedly happen using the proposed joint model in varied fake universes. Various measures of goodness can then be assessed and various averages then calculated.

Good for what? And in which appropriate collective of AFUs (aka Bayesian reference set)?
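As one toy answer to the first question (my own construction, with an arbitrary Beta-Binomial model chosen so each posterior is available in closed form), here is a sketch of the prior predictive as a factory of AFUs, checking how often the 95% posterior interval contains the exactly set truth:

```r
# Prior predictive as a factory of abstract fake universes (AFUs).
set.seed(6)
a <- 2; b <- 2; n <- 50                    # Beta(2, 2) prior, 50 binomial trials
covered <- replicate(5000, {
  theta <- rbeta(1, a, b)                  # the exactly set truth in this universe
  y     <- rbinom(1, n, theta)             # the fake sample (the "shadow")
  lo    <- qbeta(0.025, a + y, b + n - y)  # conjugate posterior interval
  hi    <- qbeta(0.975, a + y, b + n - y)
  lo <= theta && theta <= hi
})
mean(covered)  # close to 0.95 when the analysis model matches the fake universes
```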

An earlier attempt of mine to do this in a lecture was recorded, and Bob Carpenter has kindly uploaded it as Lectures: Principled Bayesian Workflow—Practicing Safe Bayes (YouTube), Keith O’Rourke (2019). If you decide to watch it, I would suggest setting the playback speed to 1.25. For those who don’t like videos, slides and code are here.

The rest of the post below provides some background material for those who may lack background in prior predictive simulation and two stage sampling to obtain a sample from the posterior.


Brief summary notes on Statistical Thinking for enabling better review of clinical trials.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

Now, this post was spurred by Andrew’s recent post on Statistical Thinking enabling good science.

The day of that post, I happened to look in my email’s trash and noticed that it went back to 2011. One email way back then had an attachment entitled Learning Priorities of RCT versus Non-RCTs. I had forgotten about it. It was one of the last things I had worked on when I last worked in drug regulation.

It was a draft of summary points I was putting together for clinical reviewers (clinicians and biologists working in a regulatory agency) to give them a sense of (hopefully good) statistical thinking in reviewing clinical trials for drug approval. I thought it brought out many of the key points that were in Andrew’s post and in the paper by Tong that Andrew was discussing.

Now, my summary points are in terms of statistical significance, type one error and power, but it was 2011. Additionally, I do believe (along with David Spiegelhalter) that regulatory agencies do need to have lines drawn in the sand, or set cut points. They have to approve or not approve. As the seriousness of the approval increases, arguably these set cut points should move from being almost automatic defaults to inputs into a weight-of-evidence evaluation that may overturn them. Now I am working on a post to give an outline of what usually happens in drug regulation. I have received some links to material from a former colleague to help update my 2011 experience base.

In this post I have made some minor edits; it is not meant to be polished prose but simply summary notes. I thought it might be of interest to some and, hey, I have not posted in over a year and this one was quick and easy.

What can you learn from randomized versus non-randomized comparisons?
What You Can’t Learn (WYCL);
How/Why That’s Critical (HWTC);
Anticipate How To Lessen these limitations (AHTL)


Explainable ML versus Interpretable ML

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking and revision at any time.

First, I want to share something I was taught in MBA school – all new (and old but still promoted) technologies exaggerate their benefits, are overly dismissive of difficulties, underestimate the true costs and fail to anticipate how older (less promoted) technologies can adapt and offer similar and/or even better benefits, with fewer difficulties and/or lower costs.

Now, I have recently become aware of work by Cynthia Rudin (Duke) that argues upgraded versions of easy-to-interpret machine learning (ML) technologies (e.g. Cart) can offer similar predictive performance to new(er) ML (e.g. deep neural nets) with the added benefit of interpretability. But I am also trying to keep in mind, or even anticipate, how newer ML (e.g. deep neural nets) can adapt to (re-)match this.

Never say never.

The abstract from Learning customized and optimized lists of rules with mathematical programming by Cynthia Rudin and Seyda Ertekin may suffice to provide a good enough sense for this post.

We introduce a mathematical programming approach to building rule lists, which are a type of interpretable, nonlinear, and logical machine learning classifier involving IF-THEN rules. Unlike traditional decision tree algorithms like CART and C5.0, this method does not use greedy splitting and pruning. Instead, it aims to fully optimize a combination of accuracy and sparsity, obeying user-defined constraints. This method is useful for producing non-black-box predictive models, and has the benefit of a clear user-defined tradeoff between training accuracy and sparsity. The flexible framework of mathematical programming allows users to create customized models with a provable guarantee of optimality. 

For those with less background in ML, think of regression trees or decision trees (Cart) on numerical steroids.
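To give a flavour of what a rule list is – a hand-written toy of mine, not one of the optimized lists the paper produces – the whole classifier is just an ordered set of IF-THEN rules, read top to bottom:

```r
# A toy IF-THEN rule list for a hypothetical loan decision:
# rules are checked in order and the first rule that fires gives the prediction.
predict_rule_list <- function(prior_default, income, age) {
  if (prior_default == 1)           return("deny")
  if (income >= 60000)              return("approve")
  if (age >= 30 && income >= 40000) return("approve")
  "deny"                            # default rule if nothing above fires
}
predict_rule_list(prior_default = 0, income = 45000, age = 25)  # "deny"
```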

For those with more background in predictive modelling, this may be the quickest way to get a sense of what is at stake (and the challenges). Start at 17:00 and it is done by 28:00 – so about 10 minutes.

My 9 line summary notes of Rudin’s talk (link above): Please stop doing “Explainable” ML [for high-stakes decisions].

Explainable ML – using a black box and explaining it afterwards.
Interpretable ML – using a model that is not a black box.

Advantages of interpretable ML are mainly for high-stakes decisions.

The accuracy/interpretability tradeoff is a myth – in particular, for problems with good data representations, all ML methods perform about the same.

[This does leave many application areas where it is not a myth and Explainable or even un-explainable ML will have accuracy advantages.]

Explainable ML is flawed: there are two models, the black box model and an understudy model that is explainable and predicts similarly but not identically (exactly the same only x% of the time). And sometimes the explanations do not make sense.

p.s. Added a bit more about the other side: the problematic obsession with transparency, and arguments for why “arguments by authority” [black boxes], although the worst kind of arguments, are all that most people will accept and make use of – here.

p.s2. Just picked up some nice explanation about explanation from Stephen Wolfram’s post. At the very end there is an insightful paragraph which I’ll quote a couple of sentences from: “If we choose to interact only with systems that are computationally much simpler than our brains, then, yes, we can expect to use our brains to systematically understand what the systems are doing … But if we actually want to make full use of the computational capabilities that our universe makes possible, … —we’ll never be able to systematically “outthink” or “understand” those systems … But at some level of abstraction we know enough to be able to see how to get purposes we care about achieved with them.”
