I think that “standard inference,” which presupposes a finite population, is a conceptual mistake.

As Rob Kass put it: “Fisher introduced the idea of a random sample drawn from a hypothetical infinite population, and Neyman and Pearson’s work encouraged subsequent mathematical statisticians to drop the word ‘hypothetical’ and instead describe statistical inference as analogous to simple random sampling from a finite population…. My complaint is that it is not a good general description of statistical inference.” http://www.stat.cmu.edu/~kass/papers/bigpic.pdf

Consider a model of the effect of income, x, on height, y. It’s a linear model and residuals follow some likelihood function. I apply what I think is my prior knowledge on the slope, b, as a normal distribution with some positive mean. I find myself predicting that some individuals in my dataset should be -0.5 feet tall.
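A quick prior predictive simulation makes the problem concrete. This is only a sketch; all the numbers (priors, income, noise scale) are invented for illustration, not taken from any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 10_000

# Hypothetical model: height (feet) = a + b * income (in $10k) + noise,
# with a normal prior on the slope b that has a positive mean.
a = rng.normal(5.5, 1.0, n_sims)    # intercept prior, feet
b = rng.normal(0.1, 0.5, n_sims)    # slope prior, positive mean
height = a + b * 12.0 + rng.normal(0, 1.0, n_sims)  # one person, income $120k

# A nontrivial share of the prior predictive draws are negative heights:
# the prior and likelihood jointly assign real probability to impossible data.
print((height < 0).mean())
```

Checking the prior predictive distribution like this, before seeing any data, is exactly the kind of model check that flags the problem described above.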

I obviously have to rework my model, prior distribution included. Is this inconsistent with choosing a prior to represent my prior knowledge? But obviously, I knew beforehand that people cannot have negative height.

The prior distribution is part of the model which captures some regularizing behaviors on the parameters that I would expect based on my prior knowledge, but neither it, nor the model itself, represents the total set of my knowledge.

The problem here is the use of a data-based prior like “prior probability that a person has a disease = probability that anyone in the population has a disease,” without updating that prior conditionally before the test based on info like “person walked into the clinic with symptoms.”

In practice, physicians are ALL using an informal Bayesian style of reasoning (important not to conflate that with formal Bayesian inference) that takes into account the information gleaned with their human eyes from the patient in concert with test results.

> a suspected disease is formulated as a diagnostic hypothesis without any prior evidence for or against it, regardless of its prior probability. The disease is then inferred if a test leads to a result known to have a high frequency of leading to an accurate inference.

This is just wrong. The test, under a frequentist framework, cannot provide a “frequency of leading to an accurate inference.” It has frequency of a true positive and a false negative, assuming the disease is there, and the frequency of a true negative and a false positive, assuming that it is not. The frequency of an “accurate inference” depends on the prior probabilities. If you maintained the exact same thresholds on the tests, but sampled people truly randomly out in the world instead of people who walked into a hospital with symptoms, you would see the “frequency of accurate inference” change. The general “accuracy” is not stable under a change in priors; rather, hospital procedure has been designed with the priors of a hospital setting in mind, and those priors are relatively stable.
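The dependence is easy to exhibit numerically. Here is a minimal sketch with invented sensitivity and specificity values, showing that the positive predictive value (and overall accuracy) moves with the base rate even though the test itself is unchanged:

```python
# Hypothetical test characteristics -- fixed, prior-independent.
sens, spec = 0.90, 0.95

def accuracy(prev):
    # P(correct call) = P(true positive) + P(true negative)
    return sens * prev + spec * (1 - prev)

def ppv(prev):
    # P(disease | positive test), which DOES depend on the prior
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

print(round(ppv(0.30), 3))  # symptomatic walk-ins -> 0.885
print(round(ppv(0.01), 3))  # random sample of the world -> 0.154
```

Same thresholds, same test, and a positive result means something entirely different in the two sampling regimes.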

> And even if a test is performed, the high likelihood ratio of a highly ‘diagnostic’ test is combined with this very low prior probability to generate a posterior probability close to 50 percent which leads to an erroneous inference of the disease being indeterminate.

As Professor Gelman would say, an inference that fails obvious posterior predictive checks means that your prior is not incorporating some prior information that you do have.

I have looked at a number of your papers and find your description of Bayesianism, which consists of model creation with emphasis on model checking by severe testing, to be very different from what I thought it was, which is a subjective inference from an updated posterior probability. It may well be that Bayesian reasoning as you describe is employed by physicians for inference during diagnosis in practice.

And of course if “understanding the prior” means understanding how the posterior depends on the prior that will always require looking at the interaction of prior, model and data.

(One could also adopt the “nothing exists outside the model” view, where any knowledge can only be expressed and understood within a model, making the statement even less interesting.)

Perhaps in Frequentist inference.

I drop ball bearings from selected and precisely measured heights h in an evacuated tube, and I measure the time it takes to hit the bottom of the tube. The predicted fall time is sqrt(2*h/g)+epsilon with epsilon determined by measurement error, and inference on g is my goal.
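Under constant gravity the fall time is t = sqrt(2h/g), so inference on g here is a few lines of grid posterior. This is only a sketch: the heights, timing-noise level, and grid bounds are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)
g_true, sigma = 9.81, 0.005              # m/s^2; timing noise sd in seconds
h = np.array([0.5, 1.0, 1.5, 2.0])       # measured drop heights, m
t_obs = np.sqrt(2 * h / g_true) + rng.normal(0, sigma, h.size)

# Grid posterior for g, flat prior over the grid
g_grid = np.linspace(9.0, 10.6, 2001)
t_pred = np.sqrt(2 * h[:, None] / g_grid[None, :])
log_lik = -0.5 * (((t_obs[:, None] - t_pred) / sigma) ** 2).sum(axis=0)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

g_hat = (g_grid * post).sum()            # posterior mean, close to 9.81
print(round(g_hat, 2))
```

The posterior concentrates near the true g without any reference to repeated sampling from a population of anything.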

What population is g associated with?

TNS replied: “Sorry but not in agreement here.

In standard inference, the purpose of a parameter is to describe a population.

What you currently know about the population (as encapsulated by the prior over the parameters) should not be affected by how you may (or may not) intend to go about collecting more information about the population, i.e. affected by the choice of the sampling model.”

Methinks there is a need for some clarification, so I’ll try — and whoever wants to agree, disagree, elaborate, or improve can chime in:

As I understand Daniel’s statement (based on previous things he has said), he is describing a random variable as a random generating process. Since a random variable refers to a population (not to a sample), a parameter for the random variable does refer to a population, not to a sample. (However, I can see how Daniel’s phrase “data generating process” might lead one to believe that he is talking about a sample, so that might be the source of the confusion.)

In standard inference, the purpose of a parameter is to describe a population.

What you currently know about the population (as encapsulated by the prior over the parameters) should not be affected by how you may (or may not) intend to go about collecting more information about the population, i.e. affected by the choice of the sampling model.

This is inconvenient, I know.

The reasoning I described in my 5.05am comment can also be applied to working through a checklist of possible sources of bias and other methodological errors in a publication before concluding that the work was probably sound and worthy of statistical analysis. I think that this is one aspect of Mayo’s severe testing.

It also can be used as a guide for hypothetico-deductive scientific reasoning with a number of possible scientific explanations.

You mention CPCs published in the NEJM. In the references of Chapter 1 that follows from the link in my response below at 5.05am, I cite a paper analysing in detail the reasoning used in the CPCs: Eddy DM, Clanton CH (1982). The art of diagnosis: solving the clinico-pathological conference. N Engl J Med 306, 1263-8.

They use the term ‘pivot’ whereas I have always used the term ‘diagnostic lead’, and they use ‘pruner’ whereas I have used ‘eliminator’ (or ‘probabilistic eliminator’).

This reference and the link to Chapter 2 below at 5.05am may also interest you, Anoneuoid.

Yes! Much agree!

“But I’ve noticed similar behavior in human docs. Modern health care professionals don’t want to do gross things or use older methods. They want to rely on simple easy things.”

I’d say they may also be biased in favor of things that they are familiar with, and especially toward things that are easily “fixed”. Case in point: When I had severe abdominal pain after something seemed to snap in an exercise class, my GP gave me ulcer medication, because the pain sounded like ulcer pain to him. That didn’t help and made matters worse, so he sent me for an upper GI to check for a hernia, saying “because we know how to fix that”. It took a long time for me to drink enough of the “milk shake” for the upper GI (the main symptom of the injury was that I wasn’t able to get much down to my stomach), which irritated the radiologist, and the upper GI showed nothing wrong. So I suggested to the GP that a physical therapist might be able to help. He said, “That’s a good idea!” and gave me a referral. The physical therapist did a thorough exam, having me get in different positions and move different ways. She then said, “It appears that you’ve pulled your rectus abdominis”, explained that a pulled muscle contracts severely, and that a pulled rectus presses on the stomach and would cause my symptoms. She then put me on a machine that allowed me to give gentle exercise to loosen up my rectus a little. That night, I was able to eat about half a normal dinner, which was more than I had been able to get down in one sitting since the injury.

I assume that an undiagnosed nonunion fracture (one that doesn’t heal naturally and remains fractured for long periods of time) in a cat is probably rare.

“It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.” — Sherlock Bayes

+2 for commenting on two of our favorite topics here: cats and poop.

The pragmatic meaning would simply be the data that would be repeatedly generated given the parameter values at that point.

To answer this requires that a particular data generating model be chosen.

Choose a different data generating model and the particular point in the parameter space means something different.

For the coefficients of your model to have meaning, it needs to be properly specified, or at least be thought to be some approximation to the correct model.

Maybe dining hall A gives you the option to walk up a flight of stairs, but B does not. So you could include a variable reflecting that which changes your conclusion about diet. Maybe there was a flu/cold going around hall A during one of the weighing sessions so students tended to be more dehydrated, so you should control for that. Etc, etc…

My point is that trying to interpret the coefficients of these arbitrary statistical models is a fool’s errand.

There was just a post on here about a paper where they looked at over 600 million plausible linear model specifications for the same data and the coefficient of interest varied from positive to negative. Then they still had to admit the correct specification is probably nonlinear and there could easily be some other variable not collected that would change the estimate.

Here it is: https://statmodeling.stat.columbia.edu/2019/08/01/the-garden-of-forking-paths/

Andrew, another question. Lately I came across a thing called “Lord’s Paradox”. It is essentially this: suppose you are doing an experiment with 2 groups. You could arrive at different conclusions if you do a repeated measures ANOVA vs. an ANCOVA controlling for initial values of the outcome. A professor of mine recently had to re-do an analysis of another research group because they did the ANCOVA and arrived at wrong conclusions, and I personally often read papers where they analyse those research designs controlling for the baseline value.

So apparently this is still a thing, Pearl published a paper about it in 2016:

Pearl, J. (2016). Lord’s paradox revisited–(oh lord! kumbaya!). Journal of Causal Inference, 4(2).

However, it’s really confusing, but I often learn much about statistics by revisiting the basics, so it would be interesting to hear your opinion about this.
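The disagreement is easy to reproduce in simulation. A hedged sketch (all numbers invented): the two groups differ at baseline, baseline and follow-up are noisy measurements of the same underlying quantity, and there is no true effect on change. The change-score comparison says “no difference,” while the ANCOVA group coefficient comes out clearly nonzero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Latent "true" scores: the groups differ at baseline, nothing else.
true = np.concatenate([rng.normal(60, 10, n), rng.normal(70, 10, n)])
group = np.repeat([0.0, 1.0], n)
base = true + rng.normal(0, 5, 2 * n)     # noisy baseline measurement
follow = true + rng.normal(0, 5, 2 * n)   # noisy follow-up, no true change

# Analysis 1: difference in mean change between groups (~ 0)
change = follow - base
change_diff = change[group == 1].mean() - change[group == 0].mean()

# Analysis 2: ANCOVA -- regress follow-up on baseline and group.
# Regression to the mean makes the within-group slope < 1, so the
# group coefficient comes out clearly nonzero (~ 2 with these numbers).
X = np.column_stack([np.ones(2 * n), base, group])
beta = np.linalg.lstsq(X, follow, rcond=None)[0]

print(round(change_diff, 2), round(beta[2], 2))
```

Which analysis is “right” depends on the causal question being asked, which is the point Pearl presses in the paper cited above.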

Bimal wrote: “It would be immensely helpful if you, Andrew or someone else were to look at some of these diagnostic exercises in real patients, such as clinical problem solving exercises or clinicopathologic conferences ( CPCs ) published regularly in the New England Journal of Medicine and tell us what the method employed in them is.”

I have been doing just that for many years. The outcome is summarised in the final chapter of the Oxford Handbook of Clinical Diagnosis. It can be accessed via the following link: http://oxfordmedicine.com/view/10.1093/med/9780199679867.001.0001/med-9780199679867-chapter-13

In summary, diagnostic reasoning can be modelled with a theorem derived from Bayes’s EXTENDED rule. It is applicable with dependence assumptions (as well as limited independence assumptions). The theorem, proof and corollaries are provided at the end of the chapter.

The ‘priors’ used are conditional priors that incorporate an unconditional diagnostic prior and the likelihood of the initial finding conditional on that diagnosis. These provide the conditional priors of the differential diagnosis conditional on that initial finding (eg chest pain).

This differential diagnosis is investigated by seeking other findings (eg an EKG result) that are likely to occur in one diagnostic possibility but unlikely in another, making the latter less probable and the former more probable.
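The basic update being described here (not the book’s full theorem, just the elementary mechanics, with every number invented for illustration) looks like this:

```python
# Toy differential diagnosis update; all probabilities are invented.
# Conditional priors: P(diagnosis | initial finding = chest pain).
priors = {"MI": 0.30, "reflux": 0.50, "pulm_embolism": 0.20}

# Likelihood of a further finding (say, an abnormal EKG) under each diagnosis.
lik = {"MI": 0.80, "reflux": 0.05, "pulm_embolism": 0.20}

# Bayes: posterior proportional to prior * likelihood, renormalised.
post = {d: priors[d] * lik[d] for d in priors}
z = sum(post.values())
post = {d: round(p / z, 3) for d, p in post.items()}

print(post["MI"])  # -> 0.787: the finding promotes MI and demotes reflux
```

A finding that is likely under one diagnosis and unlikely under another shifts the probability mass accordingly, exactly as the sentence above describes.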

The tactics of this reasoning process are explained in Chapter 1 http://oxfordmedicine.com/view/10.1093/med/9780199679867.001.0001/med-9780199679867-chapter-1 and in more detail with an example in Chapter 2 http://oxfordmedicine.com/view/10.1093/med/9780199679867.001.0001/med-9780199679867-chapter-2

So Andrew, here was another hole in Bayesian Philosophy, one which I have been filling in!

Ok, this is a cat. But I’ve noticed similar behavior in human docs. Modern health care professionals don’t want to do gross things or use older methods. They want to rely on simple easy things.

You missed the point of my comment. I was not addressing, either positively or negatively, your description of what doctors actually do. My point was that the method you describe as “Bayesian” is, from the evidence provided in your description, not actually Bayesian. This happens all the time, that people have a crappy statistical procedure that they label as being from some popular method. The general problem of people taking crappy methods and giving them inaccurate labels is important; it’s just not the subject of the talk I’ll be giving tomorrow.

Also, it’s a bit humorous to suggest physicians conjure up hypotheses about what diseases their patients might have without any prior knowledge. If they have no prior knowledge, where do the hypotheses come from, I wonder?

It is a fact that physicians are diagnosing and treating diseases fairly well on a daily basis. My interest is in knowing the method of statistical inference they employ during diagnosis in practice. I have looked at published diagnostic exercises in real patients and my assessment is that the method employed in them is not Bayesian.

It would be immensely helpful if you, Andrew or someone else were to look at some of these diagnostic exercises in real patients, such as clinical problem solving exercises or clinicopathologic conferences ( CPCs ) published regularly in the New England Journal of Medicine and tell us what the method employed in them is.

My perspective about this issue is different from that of practically everyone else on this blog, which I find is mainly theoretical.

My perspective is that of a user of a statistical method on the ground. If someone can show me that the method in published exercises is Bayesian, I shall agree. And if he or she finds the method is not Bayesian, but a Bayesian method would be better, I would like to see how.

In any case, the point about doctors not being stupid is another way of saying they will use prior information, like the arrow through the chest or the signs of pneumonia.

To claim that what doctors do is to always use frequentist methods with no prior info is to completely misunderstand what priors are.

From a practicing physician’s viewpoint, the Bayesian method does not work well for statistical inference during diagnosis in practice.

[…]

In practice, a suspected disease is formulated as a diagnostic hypothesis without any prior evidence for or against it, regardless of its prior probability. The disease is then inferred if a test leads to a result known to have a high frequency of leading to an accurate inference.

The method employed is closely similar or identical to the frequentist method of statistical inference.

As I said, the frequentist method often gives the same result as the Bayesian method with a uniform prior. I.e., it is an approximation of a special case of the Bayesian method. In cases where this is not true, it seems it is always the frequentist method that returns nonsense.
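A standard example of the agreement (my illustration, not anyone’s clinical data): for a binomial proportion, the frequentist MLE coincides exactly with the posterior mode under a uniform Beta(1, 1) prior:

```python
# Binomial data: k successes in n trials.
k, n = 30, 100

mle = k / n  # frequentist maximum-likelihood estimate

# Uniform prior Beta(1, 1) -> posterior Beta(k + 1, n - k + 1);
# its mode is (a - 1) / (a + b - 2) = k / n, the same number.
a, b = k + 1, n - k + 1
post_mode = (a - 1) / (a + b - 2)

print(mle == post_mode)  # -> True
```

The posterior mode under a flat prior is the MLE by construction, which is why the two approaches so often report the same point estimate.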

Also, I highly doubt you devise a diagnostic hypothesis without coming up with some type of idea like “the probability the person has disease x”. If that is what you do, you are using Bayesian reasoning.

In the Bayesian method, a very low prior probability represents very strong prior evidence for, or a very strong belief in, the absence of disease, which is likely to lead to this disease being ruled out without testing. And even if a test is performed, the high likelihood ratio of a highly ‘diagnostic’ test is combined with this very low prior probability to generate a posterior probability close to 50 percent, which leads to an erroneous inference of the disease being indeterminate.

I have no idea what you are describing but it sounds like a nonsense method. Also, I have had many first/second-hand experiences where second and third independent medical opinions differ greatly from the first opinion. I would even say that is expected.

So it wouldn’t surprise me if indeterminate diagnoses should be far, far more common than we currently see. Likely, if you don’t know what is going on, the best course of action is nothing or placebo; harder to make money that way, though…

That 7% that apparently you cannot come to terms with is for young women not at risk and not presenting typical angina pain [*]. So there would be no particular reason to suspect a heart attack more than anything else, and it is to let the doctor know that fact that those statistics are compiled: to be of use as a baseline in those cases where the origin of the pain is unknown. Doctors can then build on that according to additional information, and will just ignore it when patients come into the examination room with an arrow traversing their chest, or suffering an asthma attack, or with any other obvious indication of what’s wrong. Because the doctors writing and applying the guidelines are not complete morons.

[*] And even the typical symptoms are not a very good indicator: “Investigators at a single center in New York City conducted a retrospective study involving 2525 patients with no previous history of myocardial infarction or coronary revascularization who were evaluated for ACS in an emergency department–based chest pain unit. Typical angina was defined as “the presence of substernal chest pain or discomfort that was provoked by exertion or emotional stress and was relieved by rest and/or nitroglycerin.” All patients underwent provocative stress testing after serial biomarkers were obtained.

Presenting symptoms did not vary significantly by sex, age, or history of diabetes. Ischemia was induced by stress testing in 14% of 231 patients with typical angina, 11% of 2140 patients with atypical chest pain, and 16% of 153 patients with no chest pain at presentation. Thus, patients with typical angina were not significantly more likely than those with no or atypical chest pain to have inducible myocardial ischemia.”

https://www.jwatch.org/jc201007070000002/2010/07/07/typical-angina-vs-atypical-chest-pain

Like any worker anywhere in any job, they *typically* do what is the least work for them with the lowest potential for adverse consequences: lowest chance of incorrect diagnosis, least effort, most defensible to insurance.

Indiscriminate multiple testing is a particular problem when the tests have low specificity. Abnormal liver function tests for instance are very common and have a list of causes as long as your arm (“the liver is a stupid organ – it can only grunt”). It’s not such a problem with highly specific tests – a pregnancy test is hard to misinterpret.

For example, take Bimal’s original example from a previous post of an ER patient with heart attack. Suppose that, among all patients on whom doctors actually order EKGs, if the finding is positive for an EKG abnormality then 95% of these people have actually had a heart attack.

Now suppose Bimal plugs in his 7% of people coming into the ER with chest pain having a heart attack, and together with the data on the calibration of the EKG results, he calculates a 50% posterior probability of heart attack. Then in fact he has good reason to dislike this method, because 95% of the people who had the EKG ordered and it came back positive actually had a heart attack, but the calculation suggests 50%… what went wrong?

What went wrong was that doctors have additional information and they don’t order EKG tests on people who obviously have broken ribs, were punched in the chest, have obvious gallbladder problems, have obvious pneumonia or severe asthma, a pleural infection, etc etc. So the information that should be used in terms of prior is the *prior probability that the doctor assigns to the patient given everything the doctor knows*. Just knowing that the doctor thinks an EKG is a good idea after exam should increase our notion of prior probability of heart attack above the basal rate for all people showing up in the ER with just “chest pain”. If that basal rate is 7%, then the rate for “people who we order an EKG on” is likely 15% or 30% or 40%, something similar.

I would find it hard to believe that 93% of the time a doctor suspects heart attack and orders an EKG for a patient in the ER with chest pain the patient will be having something other than a heart attack.

The relevant question to ask is “after taking a history and a brief physical exam, given that a trained ER doctor thinks an EKG is needed, and given the type and severity and timing of reported symptoms, what is the probability the patient has a heart attack?” not “what fraction of all people showing up in the ER who have chest pain included on their chart have heart attack”.
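In odds form, the sensitivity to the prior is transparent. A hedged sketch: the likelihood ratio below is invented, chosen to be roughly the value implied by “a 7% prior gives a posterior near 50%”:

```python
# Posterior probability from a prior and a fixed likelihood ratio (odds form).
def posterior(prior, lr):
    odds = (prior / (1 - prior)) * lr
    return odds / (1 + odds)

lr = 13.0  # hypothetical LR+ of a strongly "diagnostic" positive EKG

print(round(posterior(0.07, lr), 2))  # all-comers-with-chest-pain prior -> 0.49
print(round(posterior(0.30, lr), 2))  # "doctor ordered an EKG" prior    -> 0.85
print(round(posterior(0.50, lr), 2))  # high clinical suspicion          -> 0.93
```

The test is the same in all three lines; only the prior, i.e. the state of information before the test, changes, and the posterior moves from “indeterminate” to “near certain.”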

It sounds to me that “the Bayesian method” you describe is not actual Bayesian inference, but rather some wrong calculations that someone has mistakenly labeled as Bayesian inference.

I’m sure there are a lot of bad analyses being done that are labeled as Bayes but aren’t. This is a completely different problem than what I discuss in my talk, which is analyses that are actually Bayesian, but which use bad models and as a result give bad results.

The prior can often only be understood in the context of the likelihood.

A) Order every test available that might be relevant to people with any sort of abdominal pain, and diagnose whatever disease comes back with a positive test, provided that it offers a high frequency of correct diagnosis, regardless of and independent of the other results obtained from other tests (so that most patients will be diagnosed with anywhere from 0 to, say, 10 or 20 different simultaneous ailments after taking 1575 tests).

B) Order a liver test and 2 other tests for less likely issues but still within the realm of plausible because the prior information the doctor has on the basis of history and palpation alone suggests that one of these 3 conditions is almost certainly the one of interest.

If Bimal’s assertion were correct, then A would be what doctors do, and should do. If, on the other hand, doctors go based on their state of information about the patient and order only things that test for conditions having some nontrivial prior probability of occurrence, then they should be found doing B.

I wonder which one doctors actually do?

In the Bayesian method, a very low prior probability represents very strong prior evidence for, or a very strong belief in, the absence of disease, which is likely to lead to this disease being ruled out without testing. And even if a test is performed, the high likelihood ratio of a highly ‘diagnostic’ test is combined with this very low prior probability to generate a posterior probability close to 50 percent, which leads to an erroneous inference of the disease being indeterminate.

In practice, a suspected disease is formulated as a diagnostic hypothesis without any prior evidence for or against it, regardless of its prior probability. The disease is then inferred if a test leads to a result known to have a high frequency of leading to an accurate inference.

The method employed is closely similar or identical to the frequentist method of statistical inference.

The surprising thing is the Bayesian method has been prescribed as the normatively correct method in diagnosis, but when we look at published diagnostic exercises in real patients, we find it has not been employed in any one of them.

I think there is probably some difference in focus between mathematics vs. science here. In science you always have other approximations, etc. anyway, so you don’t expect anything to be perfect. In mathematics you expect no flaws at all.

A prior that is very spiky for one parameterisation of the model can be a flat prior by parameterising the model in a different way.

Therefore, we naturally will try to take model parameterisation out of the picture.

However, the question of what is the right parameterisation of the model cannot be answered sensibly.

Hence, some may begin to think about model dependent priors (e.g. Jeffreys prior) and then find that they are probably opening up an even bigger can of worms (e.g. they realise that they are doing something that is not truly Bayesian).
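The parameterisation-dependence of “flat” is a one-liner to demonstrate. A sketch (my example, not from the comment above): a uniform prior on a probability p is strongly peaked on the log-odds scale, where the implied density is exp(x) / (1 + exp(x))^2:

```python
import numpy as np

rng = np.random.default_rng(3)

# Flat prior on p in (0, 1); tiny offset avoids log(0) at the endpoints.
p = rng.uniform(1e-9, 1 - 1e-9, 100_000)

# ...but anything-but-flat on the log-odds scale: the implied density
# is exp(x) / (1 + exp(x))^2, peaked at 0 and decaying in the tails.
logit = np.log(p / (1 - p))

mass_center = ((logit > -0.5) & (logit < 0.5)).mean()  # ~ 0.245
mass_tail = ((logit > 3.0) & (logit < 4.0)).mean()     # ~ 0.029

print(mass_center > 5 * mass_tail)  # same-width windows, very different mass
```

Which scale’s flatness counts as “uninformative” is exactly the question that pushes people toward Jeffreys-style, model-dependent priors.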

Also, I’d argue that:

1) There has been near universal misinterpretation of a confidence interval as an HDI (credible interval)

2) There is often numerical similarity of a confidence interval with x% coverage to an HDI containing x% of the posterior when using a flat prior

Taken together I conclude that almost everyone has been effectively drawing conclusions from models using flat priors for the last ~80 years.
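Point 2 is easy to check numerically in the textbook case of a normal mean with known sigma, where the match is exact (this is my illustration of the claim, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n = 2.0, 50
x = rng.normal(10.0, sigma, n)            # made-up data
xbar, se = x.mean(), sigma / np.sqrt(n)

# Frequentist 95% confidence interval for the mean (known sigma)
ci = np.array([xbar - 1.96 * se, xbar + 1.96 * se])

# Bayesian: flat prior on mu gives posterior Normal(xbar, se^2);
# central 95% credible interval, approximated here by posterior draws
draws = rng.normal(xbar, se, 500_000)
cred = np.quantile(draws, [0.025, 0.975])

print(np.abs(cred - ci).max() < 0.01)     # the two intervals coincide
```

For symmetric posteriors like this one the central interval is also the HDI, so the numerical coincidence described in point 2 holds exactly here.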
