Comments on: No, it's not correct to say that you can be 95% sure that the true value will be in the confidence interval

By: Mark Schaffer

Mark Schaffer — Sun, 28 Apr 2019 20:43:34 +0000

In reply to george.

I ordered the book, and it’s great, but it turns out the book uses archery, not ring toss, to explain CIs.

I think they missed a trick here….

By: Elin J Waring

Elin J Waring — Sun, 28 Apr 2019 19:32:55 +0000

In reply to Martha (Smith). +1 thanks!

By: Martha (Smith)

Martha (Smith) — Sat, 27 Apr 2019 02:59:36 +0000

In reply to Anoneuoid. Yes, the actual biting is not a big deal; it's the itching that drives you crazy -- although the sting of the bite is usually worse the hotter the weather.

By: Anoneuoid

Anoneuoid — Fri, 26 Apr 2019 22:13:47 +0000

In reply to Anoneuoid.

Thanks for the advice. I think I must be near the end for this instance though.

PS My guess is that the severity of symptoms depends on many factors, e.g., strain of ant, weather, sensitivity of individual, number of bites, previous exposure to the ant “venom”.

Yes, I think I also lucked out pain wise because it was early in the year. Supposedly the bites get worse in mid-summer because the concentrations of various venom components are seasonal:
https://www.sciencedirect.com/science/article/pii/S0091674986800859

Basically, I was surprised at how long the itchiness is lasting given how little pain I experienced compared to what others reported. I’d even describe my initial sensation as more a slight pinch followed by tingling than pain. Perhaps it was even a different type of ant though.

By: Martha (Smith)

Martha (Smith) — Fri, 26 Apr 2019 19:13:07 +0000

In reply to Anoneuoid.

Anon,
Here’s the method that works better than anything else I’ve tried for dealing with ant bites:
As soon as possible, use a cotton swab to put a *very small* dab of hydrocortisone cream on the pustule. Then cover it with a bandaid (whatever size and shape works best — I sometimes get multiple bites near each other so need to choose and position the bandaids to cover all, but not have adhesive directly on any pustule). Repeat after you bathe or sweat excessively. The bandaid serves three purposes: It keeps you from automatically scratching the bite, it prevents clothing etc. that brushes against the bite from initiating itching, and it keeps the hydrocortisone in place.

PS My guess is that the severity of symptoms depends on many factors, e.g., strain of ant, weather, sensitivity of individual, number of bites, previous exposure to the ant “venom”.

By: Anoneuoid

Anoneuoid — Fri, 26 Apr 2019 17:42:21 +0000

In reply to Daniel Lakeland.

In medicine for example, you recruit some people for a blood pressure trial, you randomize and you find the drug reduces blood pressure and has few side effects. Fine, but you did the trial in southern Germany, where genetics, diet, exercise level, climate, jobs, social activities, and so forth all vary *considerably* from say Southeast Asia. Now if you give the blood pressure drug to people in Thailand or Cambodia or something, what will be the outcome *and why*? Which of the many variable factors are critical to the outcome of the drug treatment? For example activity of certain liver enzymes, or fish vs sausage in diets, or wealthy access to Mercedes-Benz automobiles for travel vs biking everywhere with a heavy bike trailer and hence having different exercise patterns? What?

Trying to extract useful information from the medical literature can be infuriating. For example about a week ago I got bit by some ants (I assume fireants); the pustules are still somewhat itchy a week later. I wanted to know how unusual this is, so I looked for a timecourse that shows the percent of people whose symptoms resolved after x number of days. This data doesn’t seem to exist. All I found is “folk” knowledge that differs across various sources. Some say it should resolve in a few days, others say several weeks.

Eventually, I found some useful info from a study where volunteers were bitten by the ants and observed. This was from all the way back in 1957:
https://jamanetwork.com/journals/jamadermatology/article-abstract/524964

Unfortunately they only describe typical results and a few case studies, so I didn’t get the timecourse I wanted. But it did answer my question.

By: Daniel Lakeland

Daniel Lakeland — Fri, 26 Apr 2019 15:56:34 +0000

In reply to Anoneuoid.

>That could be just because they were mistrained to do that already though.

Yes I think so. There’s nothing particularly Bayesian about Bayes Factors. They are maybe a useful tool to figure out which models you can drop from consideration as a computational simplification, or if you have to decide between a small discrete set of models, like whether you’ve detected a whale or a submarine via sonar or whatever.

One way that I like to think about this debate is how the central limit theorem functions. It’s a mathematical fact that if you add up almost any of the possible subsets of a bunch of numbers the sum will be close to some value so long as the population of numbers isn’t too weird (i.e. doesn’t have extreme outliers). This mathematical fact *does not rely on any fact about physics, biology, chemistry, psychology, social interaction, ecology, law, available medical treatments, etc.* It is entirely derivable from a counting argument about how many subsets it’s even possible to form that have averages far from the overall average, and so focusing on the reliability of this fact and attributing the fact incorrectly to some fundamental physical property of the world is wrongheaded.
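The counting argument can be checked by brute force on a small case. The sketch below is only an illustration: the population (the integers 1 to 20), the subset size and the tolerance are all made-up choices, not anything taken from the comment. It enumerates every subset of size 10 and counts how many have an average close to the overall average.

```python
from itertools import combinations

# Hypothetical population, chosen only for illustration: 1, 2, ..., 20.
population = list(range(1, 21))
overall_mean = sum(population) / len(population)  # 10.5

k = 10           # subset size (assumption)
tolerance = 1.5  # how far a subset mean may be from the overall mean (assumption)

total = 0
close = 0
for subset in combinations(population, k):
    total += 1
    # Compare sums rather than means to keep the arithmetic exact.
    if abs(sum(subset) - k * overall_mean) <= k * tolerance:
        close += 1

fraction_close = close / total
print(f"{total} subsets, fraction with mean within {tolerance} of {overall_mean}: "
      f"{fraction_close:.3f}")
```

Nothing here models a physical process; the majority of subsets land near the overall mean purely because of how many such subsets exist.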

At best, it is a way to improve measurements by cancelling out measurement error, at worst it’s like asking a ouija board which scientific laws you should believe in… The complaints you often give about testing null hypotheses are a kind of subset of the problem. A big problem is that in many fields we are replacing *theory* with instead *measuring things and then post-hoc theorizing about why we got those measurements as if the measurements are inevitable facts about the world* This is sure to fool you almost every time, as we aren’t even asking what quantities are the important determinants of the measurement. We can get RCT results and it means that we can be reasonably sure that the thing we did caused the change in measurement… but generalization error can easily be huge when you move from the RCT to actual usage, because we have ignored all the important determinants of the outcome.

In medicine for example, you recruit some people for a blood pressure trial, you randomize and you find the drug reduces blood pressure and has few side effects. Fine, but you did the trial in southern Germany, where genetics, diet, exercise level, climate, jobs, social activities, and so forth all vary *considerably* from say Southeast Asia. Now if you give the blood pressure drug to people in Thailand or Cambodia or something, what will be the outcome *and why*? Which of the many variable factors are critical to the outcome of the drug treatment? For example activity of certain liver enzymes, or fish vs sausage in diets, or wealthy access to Mercedes-Benz automobiles for travel vs biking everywhere with a heavy bike trailer and hence having different exercise patterns? What?

This failure to even try to investigate a model of what happens is sometimes even taken as an *advantage*. A “model free inference”. I say bullshit.

By: Martha (Smith)

Martha (Smith) — Fri, 26 Apr 2019 15:41:01 +0000

In reply to Charlotte. Thanks for pointing out the distinction that is addressed in French but not in English terminology.

By: Charlotte

Charlotte — Fri, 26 Apr 2019 15:22:36 +0000

In reply to Nat.

Confidence intervals are not intervals that contain a specified percentage of all values.

First, confidence intervals relate to parameters calculated from multiple observations (such as a mean or proportion), not to the value of an individual observation. For individual observations, an interval in which future observations will fall with a specified probability, given previous observations (your sample), is a prediction interval (https://en.wikipedia.org/wiki/Prediction_interval).

Second, in your post, there is a confusion between an interval calculated from the actual distribution (i.e. we know the exact distribution in the population) vs. an interval estimated from a sample taken in the population (i.e. we can only estimate the exact distribution in the population). In English, there is no clear distinction, as prediction intervals are used in both cases: an interval in which the future value of an individual observation or of a parameter measured on a sample should fall with a given probability given the actual distribution in the population. French names this a “fluctuation interval” and keeps “prediction interval” for the case described in the previous paragraph (predicting future observations based on previous ones).
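To make the first distinction concrete, here is a minimal sketch that computes both a confidence interval for the mean and a prediction interval for one future observation from the same sample. The data are invented for illustration, and the formulas are the textbook ones that assume roughly normal observations; the critical value 2.262 is the two-sided 95% Student's t value for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (made up for illustration).
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.3]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of freedom.
T_CRIT = 2.262

# Confidence interval: a statement about the population *mean*.
ci_half = T_CRIT * sd / math.sqrt(n)
ci = (mean - ci_half, mean + ci_half)

# Prediction interval: a statement about one *future observation*.
pi_half = T_CRIT * sd * math.sqrt(1 + 1 / n)
pi = (mean - pi_half, mean + pi_half)

print(f"95% CI for the mean:        ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for one observation: ({pi[0]:.2f}, {pi[1]:.2f})")
```

Under these assumptions the prediction interval is wider than the confidence interval by a factor of sqrt(n + 1), which is why neither should be read as "the interval containing 95% of all values".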

By: Keith O'Rourke

Keith O'Rourke — Fri, 26 Apr 2019 14:34:31 +0000

In reply to Sameera Daniels.

Agree – as in “I assure you that I am most certainly correct and those who disagree with me are most certainly wrong or at least offensive to me”

By: Anoneuoid

Anoneuoid — Fri, 26 Apr 2019 14:34:12 +0000

In reply to Daniel Lakeland. I almost put something in that post about "if people start testing their own hypotheses they will tend to become bayesian anyway, maybe that is the real reason there is such resistance to dropping the default strawman null", but decided it would distract from the point. And we do see people wanting to test strawman null models with Bayes factors and the like too. That could be just because they were mistrained to do that already though.

By: Sameera Daniels

Sameera Daniels — Fri, 26 Apr 2019 14:20:14 +0000

In reply to Sameera Daniels. My apologies. I did not mean ‘convey’ either. I meant ‘reify’. I’m on a bumpy bus.

By: Daniel Lakeland

Daniel Lakeland — Fri, 26 Apr 2019 13:51:05 +0000

In reply to Anoneuoid. To me the Bayes vs Frequentist debate is really about modeling processes vs replacing reality with a random number generator. Do you focus on the unknowns and trying to discover how they work, or only on data and how functions of data behave mathematically? The Frequentist approach actively discourages mechanistic thinking, and I think this is why it is so offensive to me.

By: Sameera Daniels

Sameera Daniels — Fri, 26 Apr 2019 13:00:54 +0000

In reply to Anoneuoid. I think the debate got off too cranky for some reason which, while entertaining, can set the stage for theatrical extremism.

By: Sameera Daniels

Sameera Daniels — Fri, 26 Apr 2019 12:42:30 +0000

In reply to Sameera Daniels. Sorry I meant ‘convey’ on Metro which is shaky

By: Sameera Daniels

Sameera Daniels — Fri, 26 Apr 2019 12:40:06 +0000

In reply to Keith O'Rourke. The argumentative and ideological nature of current debates Reid’s the very statistics related cautions that thought leaders say to guard against. The JAMA Current has allowed for comments at the end of John Ioannidis’ own response to ‘Retire Stat Sig. But there are some gaps in reasoning to be filled.

By: Anoneuoid

Anoneuoid — Fri, 26 Apr 2019 12:38:23 +0000

In reply to Keith O'Rourke.

Right now, the X is Bayes versus Frequentist

Which is ridiculous since for what most people are doing they give approximately the same numerical answer. So the end user is (usually) free to interpret the result of their frequentist calculation in a bayesian way. Vice versa is fine too (although I have never heard of anyone wanting to do that...). So there is this entire heated debate about something of no consequence to 99.9% of people who will use stats. Meanwhile the real problem of testing your hypothesis vs a default strawman hypothesis continues to go largely ignored and the BS conclusions are accumulating at an ever increasing pace.

By: Keith O'Rourke

Keith O'Rourke — Fri, 26 Apr 2019 12:19:49 +0000

In reply to Keith O’Rourke.

From Andrew’s Apr 25 comment “I think a big big problem is that statistical methods have been sold as automatically generating trustworthy results”

I think a big big problem is that the *correct* choice between Bayes versus Frequentist is trying to be sold by some as the only route for generating trustworthy results (and even usually just automatically).

By: Keith O'Rourke

Keith O'Rourke — Fri, 26 Apr 2019 12:05:32 +0000

In reply to Keith O'Rourke. Arghh but with NO rules, judges or methods of enforcement…

By: Keith O'Rourke

Keith O'Rourke — Fri, 26 Apr 2019 12:03:54 +0000

In reply to Thanatos Savehn.

> There’s a compromise to be had here
Whether it will be had, keeps being called into question by a few (many?) who at least verbalize a position “those who do not agree that X is the only way forward are either stupid or evil or both”.

Right now, the X is Bayes versus Frequentist, but if one side succeeded in completely annihilating the other, the X would just change to something else.

I think the bottom line is that academia is not (now) a community trying to be scientifically profitable (bending over backwards to help each other become less wrong) but rather a debating community where the winners hope to take all (but of course no prisoners).

Much more like adversarial as in civil dispute processing but with rules, judges or methods of enforcement…

By: Thanatos Savehn

Thanatos Savehn — Fri, 26 Apr 2019 07:10:15 +0000

In reply to Deborah G. Mayo. Psssst! There's a compromise to be had here, or I'm a monkey's uncle. Go with it.

By: Carlos Ungil

Carlos Ungil — Fri, 26 Apr 2019 05:54:39 +0000

> If I ask my frequentist statistician for a 95%-confidence interval, I can be 95% sure that the true value will be in the interval she just gave me.

If I throw a die I can be 50% sure that I will get an even number. This doesn’t mean that I can be 50% sure that the number that I just got is an even number. If I’ve seen the number and I know what it means for a number to be even then I can be 100% sure that the number is (or is not) even.
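The same before-versus-after distinction shows up in a simulation of confidence intervals. The sketch below is an illustration under stated assumptions, not anything from the thread: normally distributed data, a known standard deviation (so a simple z-interval applies), and made-up values for the true mean, sample size and number of trials. Across repetitions about 95% of the intervals cover the true mean, while each realized interval individually either covers it or does not.

```python
import random
from statistics import NormalDist

TRUE_MEAN = 10.0   # known only because this is a simulation (assumption)
SIGMA = 2.0        # treated as known, so a z-interval applies (assumption)
N = 25             # observations per simulated study (assumption)
TRIALS = 10_000

z = NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * SIGMA / N ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
covered = []
for _ in range(TRIALS):
    xbar = sum(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) / N
    # For one realized interval this is simply True or False.
    covered.append(xbar - half_width <= TRUE_MEAN <= xbar + half_width)

coverage = sum(covered) / TRIALS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
print(f"did the first five intervals cover it? {covered[:5]}")
```

The 95% figure describes the procedure over many throws, so to speak; once a particular interval is in hand, the per-interval answers are all zeros and ones.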

By: Corey

Corey — Fri, 26 Apr 2019 05:16:55 +0000

In reply to Justin Smith.

With that interval example, I do wonder why a) frequentists are denied from knowing/using the scientific knowledge about the minimum of the process (background knowledge can enter without using Bayesian priors) and b) frequentists are apparently banned from doing confidence intervals any other way (such as bootstrap).

This complaint simply isn't true. In my original description of the problem I thought I was pretty explicit that frequentists are not being banned from doing anything -- if they were, it would hardly make sense to ask "What went wrong, and how can it be fixed -- how can a frequentist improve upon this confidence procedure?". I suppose I have to infer that you've not yet noticed my reply to your post below (but earlier!) in which you raise this point (sample quote: "in no way am I claiming this is the only acceptable frequentist approach to the problem").

By: Anoneuoid

Anoneuoid — Fri, 26 Apr 2019 04:03:37 +0000

In reply to Justin Smith.

Are Bayes factors remedying this though?

No, doing the same thing with Bayes factors makes no difference. That is why I said "the issue is orthogonal to the Bayesian vs Frequentist issue". The point was to get you to say what your measure of "success" was.

By: Justin Smith

Justin Smith — Fri, 26 Apr 2019 03:53:49 +0000

In reply to Justin Smith.

“I think your mistake is to think that statistics is about producing intervals that you can trust.”

I mentioned intervals because the original poster mentioned intervals (from the truncated exponential example). I think intervals are just one aspect of stats, not the whole thing.

With that interval example, I do wonder why a) frequentists are denied from knowing/using the scientific knowledge about the minimum of the process (background knowledge can enter without using Bayesian priors) and b) frequentists are apparently banned from doing confidence intervals any other way (such as bootstrap).

So again I agree with the Bayesian math here, but I don’t find it a convincing (to me) example.

“Pizzagate, himmicanes, ages ending in 9, beauty and sex ratios, power pose, ovulation and voting, ESP, etc etc etc . . . just a long stream of claims which the scientific and journalistic establishment are pushing at us without good evidence.”

Speaking of ESP, I read a paper that said “Bayesian results range from confirmation of the classical analysis to complete refutation, depending on the choice of prior.”, so it doesn’t seem like Bayesian is the answer either, or for preventing false positives. Frequentism or Bayesian, I’d be more concerned about experimental design than anything else.

Justin

By: Justin Smith

Justin Smith — Fri, 26 Apr 2019 03:42:17 +0000

In reply to Justin Smith.

“It seems to be that only ~10-50% replications get significance in the same direction, when it should be 50% for sufficiently powered studies. And then there is the issue of properly interpreting these results, which I suspect is even bigger.”

Are Bayes factors remedying this though? That is, are Bayesian approaches a) solving replication issues, and b) solving interpretation issues? Are the same people that are botching understanding p-values somehow understanding the intricacies of priors and MCMC?

Justin

By: Daniel Lakeland

Daniel Lakeland — Thu, 25 Apr 2019 21:38:08 +0000

In reply to Justin Smith.

Also it’s fine to say that in reality maybe our models are wrong, and we should check them… like maybe the truncation point changes in time in an oscillating way or whatever…

but in this case the frequency based analysis predicts the wrong thing *even if all its assumptions are exactly mathematically true* doesn’t that bother you?

Like what if you do a RCT of a medicine and there’s some kind of censoring of toxicity results because toxicity below some threshold doesn’t get detected in the 6 months of the study but over the long term you get liver cancer from the drug… and you detect one person in your RCT with liver injury, so your frequency based analysis says that you can be “95% confident that less than 1 in 1000 people will have liver injury” but a Bayesian analysis using real information that we already know about the mechanism of the drug shows that it has to be at least 35 out of 1000

is that so different a situation? no. In fact I’d be shocked if we couldn’t go and find some such similar type of result with a dramatically incorrect estimate of adverse event risk due to frequentist confidence intervals in the last 10 years of drug trials which would have resulted in a very different estimate with the use of a Bayesian model and info from say a pharmacokinetic model and some data collected in that experiment that would have ruled out low liver risk or similar.

I even remember this example from over a decade ago on this blog: https://statmodeling.stat.columbia.edu/2007/08/20/jeremy_miles_wr/

By: Daniel Lakeland

Daniel Lakeland — Thu, 25 Apr 2019 21:11:48 +0000

In reply to Justin Smith.

My point is that calling something successful because your model predicts it and it’s indeed seen is exactly what even Mayo, for example, is arguing against. You also need to see whether other alternative models would predict the same thing or alternative things… the genie-fairy model is just a stand-in for whatever else might be going on. Declaring success because your frequency model predicts whatever is seen is a non sequitur. Given the mathematics, the only way to get anything other than about 50/50 is for the process to be restricted to sequences that deviate extremely from 50/50, since almost all of the possible sequences are near 50/50…

declaring success of the frequency model is like declaring success of a medical model because your model of toenail fungus predicts that almost all patients coming to a doctor with toenail fungus will live more than 5 years after their first visit… it’s almost gotta be the case given what we know about mechanisms by which toenail fungus could cause mortality, frequency properties of RCTs etc have nothing to do with it. What would be amazing is if you showed that you had a mechanistic model that no-one had ever thought of by which toenail fungus causes extreme increases in heart attack risk and you did in fact see such extreme increases…. This would be the equivalent of finding that in all sequences of 100 flips with a given coin, all of the last 35 of them had to be heads due to mystery mechanism you’ve discovered…

everything else is just mis-attributing what is essentially a combinatorial counting argument as if it had deep meaning for the physics of flat metal discs.

By: Anoneuoid

Anoneuoid — Thu, 25 Apr 2019 20:53:34 +0000

In reply to Justin Smith. typo:

it should be 50% for sufficiently powered studies if nothing interesting is going on

By: Anoneuoid

Anoneuoid — Thu, 25 Apr 2019 20:43:09 +0000

In reply to Justin Smith.

Success from experimental design, quality control, survey sampling, and yes even coin flipping, hypothesis testing used in sciences and other areas all over the world kind of disproves the genie/fairy mocking I’d opine.

A big problem is how you measure success. There is currently a problem that statistical significance (which is orthogonal to the frequentist vs bayesian issue) -> publication -> "success". So the very measure people have been using for "success" is flawed. Using other metrics like percent of papers that replicate tells a very different story. It seems to be that only ~10-50% replications get significance in the same direction, when it should be 50% for sufficiently powered studies. And then there is the issue of properly interpreting these results, which I suspect is even bigger.

By: Sameera Daniels

Sameera Daniels — Thu, 25 Apr 2019 20:30:46 +0000

Andrew

Re; ‘I agree with what you wrote there. I think your mistake is to think that statistics is about producing intervals that you can trust. My take on the replication crisis is that it’s all about scientists and publicists trying to use p-values and other statistical summaries to intimidate people into trusting—accepting as scientifically-proven truth—various claims for which there is no good evidence.’

Very well said. This was my guess when it came around to asking them for theories for their ‘science’ and ‘statistics’ claims. Just could not explain it all that well. Compelling evidence that something is amiss in medical and statistics education and, more fundamentally, conceptually and methodologically. Science itself is presumed to be self-correcting, but it is not a timely process by any means. I think Raymond Hubbard’s frame he refers to as ‘hypothetico-deductivism’, accepted as a model for much science and justified by the use of NHST, can be expanded. Would make for an intriguing query. I haven’t read Corrupt Research yet. But am fascinated by its reviews.

By: Justin

Justin — Thu, 25 Apr 2019 20:28:41 +0000

In reply to Justin Smith.

“It is absolutely 100% true that if you see the numbers given you can conclude “with 100% certainty, the truncation point is below min(data) = 12””

Well you can conclude, I wouldn’t conclude because I am not convinced by a maybe wrong model and an analysis based on n=3 and nonreplicated non-experiment. It is just mathematically true sure. I’d tackle this problem with bootstrapping.

“It’d also be just as much evidence that only sequences where somewhere between 45% and 55% of any sequence can be heads and if you get more than 13 heads in a row the genie will come out of the lamp and smite you and you’ll die before completing the experiment…. etc.”

We’ll just have to agree to disagree I guess. Success from experimental design, quality control, survey sampling, and yes even coin flipping, hypothesis testing used in sciences and other areas all over the world kind of disproves the genie/fairy mocking I’d opine.

Justin

By: Daniel Lakeland

Daniel Lakeland — Thu, 25 Apr 2019 19:12:23 +0000

In reply to Justin Smith.

> a-c still wouldn’t “prove” equally likely. It would, however, be evidence for equally likely.

It’d be just as much evidence that all coin flips were equally likely except one of them, we don’t know which, which *has* to be equal to the observed value… It’d also be just as much evidence that only sequences where somewhere between 45% and 55% of any sequence can be heads and if you get more than 13 heads in a row the genie will come out of the lamp and smite you and you’ll die before completing the experiment…. etc.

There are 2^100 = 1.27e30 possible sequences of 100 coinflips, you could invent any number of stories about magic fairies that intentionally exclude all but 1% of the possibilities and you’d still probably get close to a 50/50 split. The fairies would still leave you with 1.27e28 possible sequences, almost all of which necessarily have near 50/50 split. Is the fact that you got 50/50 evidence of magic fairies?
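The counts in this argument are easy to verify exactly with binomial coefficients. The sketch below is only a check of the arithmetic; the two ranges of head counts (45 to 55, and 40 to 60) are illustrative choices rather than figures taken from the comment.

```python
from math import comb

N_FLIPS = 100
total_sequences = 2 ** N_FLIPS  # about 1.27e30, matching the figure above

def share_with_heads_between(low, high):
    """Fraction of all 2**100 sequences whose number of heads is in [low, high]."""
    count = sum(comb(N_FLIPS, k) for k in range(low, high + 1))
    return count / total_sequences

share_45_55 = share_with_heads_between(45, 55)
share_40_60 = share_with_heads_between(40, 60)

print(f"total sequences: {total_sequences:.3e}")
print(f"share with 45-55 heads: {share_45_55:.4f}")
print(f"share with 40-60 heads: {share_40_60:.4f}")
```

Because the large majority of sequences sit near an even split, many different rules for discarding sequences would still leave a collection dominated by near-50/50 outcomes, which is the point being made about the fairies.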

By: Daniel Lakeland

Daniel Lakeland — Thu, 25 Apr 2019 19:01:41 +0000

In reply to Justin Smith.

>1) it is a small sample setting, I don’t believe any approach is good

It is absolutely 100% true that if you see the numbers given you can conclude “with 100% certainty, the truncation point is below min(data) = 12”

So any result that gives you an interval where the truncation point lies that includes points above 12 is insane. It’s like arguing that 1+1 is somewhere between 3 and 6. We have logical certainty that it isn’t.

Doesn’t matter how small your sample is, could be just 1 point, you can always conclude with certainty at least that the truncation point is below that one data point.

> 2) how does the Bayesian interval vary with different Bayesian priors?

It will weight different parts of the interval [0,12] differently, it doesn’t matter how wacky your prior it can NEVER give you an interval that includes any points above 12… whereas the example Frequentist method is *entirely* above 12.
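A small numerical sketch can illustrate the claim about priors. It assumes the model is a unit-rate exponential shifted to start at the truncation point, p(x | theta) = exp(theta - x) for x > theta, which the thread does not spell out; only n = 3 and min(data) = 12 are taken from the discussion. Under that assumption the likelihood is proportional to exp(n * theta) for theta <= min(data) and zero above it. The three priors on [0, 12] are arbitrary illustrations.

```python
import math

X_MIN = 12.0  # smallest observation, as stated in the thread
N = 3         # sample size, as stated in the thread

def equal_tailed_interval(prior, mass=0.9, steps=12_000):
    """Equal-tailed credible interval for theta on a grid over [0, X_MIN].

    Assumed likelihood: proportional to exp(N * theta) for theta <= X_MIN,
    zero above X_MIN, so the grid never needs to extend past X_MIN.
    """
    thetas = [X_MIN * (i + 0.5) / steps for i in range(steps)]  # cell midpoints
    weights = [prior(t) * math.exp(N * (t - X_MIN)) for t in thetas]
    total = sum(weights)
    tail = (1 - mass) / 2
    cum = 0.0
    lower = upper = None
    for t, w in zip(thetas, weights):
        cum += w / total
        if lower is None and cum >= tail:
            lower = t
        if upper is None and cum >= 1 - tail:
            upper = t
    return lower, upper

# Hypothetical priors, chosen only to show very different weightings of [0, 12].
priors = {
    "flat": lambda t: 1.0,
    "decreasing": lambda t: math.exp(-t),
    "peaked near 3": lambda t: math.exp(-(t - 3.0) ** 2),
}

intervals = {name: equal_tailed_interval(p) for name, p in priors.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:>14}: ({lo:.2f}, {hi:.2f})")
```

The priors move the interval around inside [0, 12], sometimes a long way, but no choice of prior can push any of it above 12, because the likelihood is zero there.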

> 3) frequentism is not tied to always using one rule like Bayesian is tied to Bayes rule, so why is the frequentist method only allowed one method?

it isn’t, the point is that whatever the method, the thing that makes it good isn’t because it has guaranteed confidence coverage. Basically in this case the thing that makes a frequentist interval estimation method good is that it had better produce a subset of [0,12] since logic tells us that the truncation point *has to be there*, which the Bayes method does automatically. There’s also a proof (Cox’s theorem) which tells you that if the method agrees with binary logic in the limit where binary logic gives certainty, then it pretty much has to be Bayes… So there’s that.

By: Andrew

Andrew — Thu, 25 Apr 2019 18:33:11 +0000

In reply to Justin Smith.

Justin:

I think your mistake is buried within the following sentence of yours:

For n = 3, I wouldn’t personally trust any interval from any method because it just isn’t enough data.

I agree with what you wrote there. I think your mistake is to think that statistics is about producing intervals that you can trust. My take on the replication crisis is that it’s all about scientists and publicists trying to use p-values and other statistical summaries to intimidate people into trusting—accepting as scientifically-proven truth—various claims for which there is no good evidence.

Pizzagate, himmicanes, ages ending in 9, beauty and sex ratios, power pose, ovulation and voting, ESP, etc etc etc . . . just a long stream of claims which the scientific and journalistic establishment are pushing at us without good evidence. I have no problem with some of these as conjectures, but they should be presented as such.

So, sure, I agree with you about not trusting anything based on pure statistical analysis with n=3. But I’d extend that distrust more generally, and I think a big big problem is that statistical methods have been sold as automatically generating trustworthy results.

By: Justin

Justin — Thu, 25 Apr 2019 17:20:01 +0000

In reply to Justin Smith.

“You’re a frequentist fanatic,”

No, I am not, but thanks for the name-calling and poisoning of the well. I will point out when people say that frequentism supposedly doesn’t work, however.

“The Bayesian approach as given in detail in the paper works perfect.”

The math works, sure, but for n = 3, I wouldn’t personally trust any interval from any method because it just isn’t enough data.

“That’s easy enough for you to calculate. Are you insinuating different answers for different assumptions is a fatal flaw of Bayes?”

The Drake-equation-ness depends on how wacky the priors are.

“In Frequentism, however, the following scenario gets repeated ad nauseam: a flaw is found (every time without exception) and the frequentist uses their intuition to just barely ad-hoc change things to avoid the flaw.”

What you say as ad hoc and breaking the likelihood principle, others can say as flexibility, being practical, and solving problems.

“Frequentist methods don’t “rarely” lead to silly answer in practice. They usually do.”

Not in my experiences. Your experiences may vary, of course.

“Because frequencies are just functions of data. Bayesians are allowed to use frequencies or any other function of data they want. The difference between Bayes and Frequentist isn’t that one uses frequencies and the other doesn’t, rather, the difference is one claims probabilities are fundamentally frequencies while the other does not.”

It seems frequencies are pretty fundamental to any type of probability then, since Bayesians rely on histograms and probability distributions.

“But suppose you assume all outcomes of 100 coin flips are equally likely, then you’ll predict roughly 50 heads and 50 tails in 100 flips. From this Frequentists proclaim this “proves” or “justifies” the equally likely assumption. NOT TRUE!!!”

And not at all an accurate depiction. The first sentence, OK. The second sentence doesn’t follow. If there were:
a) good experiment giving about 50 heads on average
b) a few to many repetitions of a)
c) meta analysis analyzing the results

a-c still wouldn’t “prove” equally likely. It would, however, be evidence for equally likely.

Justin

By: Anonymous

Anonymous — Thu, 25 Apr 2019 09:41:10 +0000

In reply to Justin Smith.

Justin,

You’re a frequentist fanatic, so this will likely make no impact, but here goes:

1) it is a small sample setting, I don’t believe any approach is good

The Bayesian approach as given in detail in the paper works perfect.

2) how does the Bayesian interval vary with different Bayesian priors?

That’s easy enough for you to calculate. Are you insinuating different answers for different assumptions is a fatal flaw of Bayes?

3) frequentism is not tied to always using one rule like Bayesian is tied to Bayes rule, so why is the frequentist method only allowed one method? I believe order statistics (the minimum) and/or bootstrapping could give a confidence interval that makes more sense.

Bayes isn’t tied to one rule. Bayes theorem is one theorem derivable from the basic sum/product rules of probability when applied to any statement (not just repeatable “frequentist” ones). There are infinitely many other theorems implied by the sum/product rules.

In Frequentism, however, the following scenario gets repeated ad nauseam: a flaw is found (every time without exception) and the frequentist uses their intuition to just barely ad-hoc change things to avoid the flaw. Each such change brings their methods closer to Bayesian ones. They stoutly refuse to recognize this and/or to do a complete analysis showing that when all flaws are removed you get Bayes. It’s a cheap way of insulating Frequentism from ever having to admit they were wrong.

4) you have to realize that allowing ‘the data to talk’ sometimes but rarely yields incongruent silly answers in frequentism, much like ‘allowing subjective beliefs to enter’ gives silly answers dictating parameter values (proof of god existing, search for MH370 being so far off track) and Drake-like equation-ness in posterior distributions

Frequentist methods don’t “rarely” lead to silly answers in practice. They usually do. I never said anything about subjective beliefs. Laplace’s definition of probability was “cases favorable divided by all cases”. Note, this is not the “frequency of occurrence” but rather a simple counting of possibilities. This definition is not a frequentist definition, but is actually far more general and makes sense in singular cases where no repetition is possible. It has nothing to do with “subjective beliefs”.

If frequentism is flawed, why do Bayesians (and everyone) use histograms?

Because frequencies are just functions of data. Bayesians are allowed to use frequencies or any other function of data they want. The difference between Bayes and Frequentist isn’t that one uses frequencies and the other doesn’t; rather, the difference is that one claims probabilities are fundamentally frequencies while the other does not.

For Bayesians, frequencies are physical facts, like “temperature”, which are measured or estimated. Probabilities are used to describe our uncertainty about physical facts. One practical difference is that frequencies don’t change when our state of knowledge changes, but probabilities do.

Why does the Strong Law of Large Numbers simply work?

It fails far more often than people think. But when it does work there’s a simple Bayesian explanation (it was, after all, originally proved by one of the Bernoullis thinking along Bayesian lines) that can be most succinctly stated as: “whenever almost every possibility leads to A, then that’s grounds for thinking A will be seen in practice and it happens to be true a fair amount”.

Why do likelihoods tend to swamp priors?

Because if A and B are inputs to a problem but A carries more weight for the question being answered (for whatever reason), it tends to “swamp” B.
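A minimal sketch of that swamping effect, assuming a coin-flip model with conjugate Beta priors. The model, the two priors, and the head counts are illustrative assumptions and do not come from the thread.

```python
def posterior_mean(prior_a, prior_b, heads, flips):
    """Posterior mean of a coin's heads probability under a Beta(prior_a, prior_b) prior."""
    return (prior_a + heads) / (prior_a + prior_b + flips)

# Two deliberately different priors (illustrative assumptions).
flat_prior = (1.0, 1.0)      # uniform on [0, 1]
skewed_prior = (20.0, 5.0)   # prior mean 0.8

def prior_gap(heads, flips):
    """Absolute difference between the two posterior means for the same data."""
    return abs(posterior_mean(*flat_prior, heads, flips)
               - posterior_mean(*skewed_prior, heads, flips))

small_gap = prior_gap(heads=6, flips=10)        # little data: the prior still matters
large_gap = prior_gap(heads=6000, flips=10000)  # lots of data: the likelihood dominates

print(f"gap between posterior means with 10 flips:     {small_gap:.4f}")
print(f"gap between posterior means with 10,000 flips: {large_gap:.4f}")
```

With the same observed proportion of heads, the disagreement between the two posteriors shrinks as the data input grows, which is the sense in which one input carries more weight than the other.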

Why does MCMC rely on frequentist notions of sampling and convergence?

It doesn’t, although loose language often suggests that.

Why does the CLT exist?

It’s one example of a vastly general phenomenon usually associated with Bayesians. Any time a “process” moves a distribution to a higher entropy distribution while maintaining a given set of constraints, then in the limit you reach the maximum entropy distribution subject to those constraints. “Distribution” here could legitimately refer to probability or frequency distributions even though they’re two very different kinds of things.

Why is there success of survey sampling, experimental design, and quality control?

Part of the answer is that for simple cases (the ones Frequentists first tested their methods on) Bayesians and Frequentists largely agree. But there’s also a deeper issue going on.

This is a hard one to convey in a short space. But suppose you assume all outcomes of 100 coin flips are equally likely; then you’ll predict roughly 50 heads and 50 tails in 100 flips. From this, Frequentists proclaim that this “proves” or “justifies” the equally likely assumption. NOT TRUE!!! Far different assumptions, many of which are violently different, also imply you’ll get roughly 50 heads and 50 tails.

Why? Because almost any possible outcome, no matter what the physical cause or propensity, will lead to a roughly 50/50 split.
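The counting claim can be checked directly. The sketch below counts, out of all 2^100 possible head/tail sequences, the share whose number of heads falls within 10 of 50. Reading “roughly 50/50” as “40 to 60 heads” is an assumption made here for illustration.

```python
from math import comb

def fraction_near_half(n_flips=100, tolerance=10):
    """Fraction of all 2**n_flips head/tail sequences whose head count
    is within `tolerance` of n_flips / 2. Pure counting, no sampling."""
    half = n_flips // 2
    favorable = sum(comb(n_flips, k)
                    for k in range(half - tolerance, half + tolerance + 1))
    return favorable / 2 ** n_flips

frac = fraction_near_half()
print(f"share of sequences with 40 to 60 heads: {frac:.4f}")
```

No probability model is assumed in this count. It only enumerates possibilities, which is why the result says nothing about which physical mechanism produced a given sequence.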

So there are two ways to interpret this. The Frequentist one is that there’s something in the universe called “randomness” and coin flips have it. The Bayesian one is that the conclusion is incredibly insensitive to the details of what’s actually happening physically, and that’s why you tend to see 50/50 splits in practice.

In other words, the Frequentist thinks they’re assuming frequencies, and when they make a good prediction they think this frequentist view is bolstered, but what they’re actually doing is showing the vast majority of possibilities lead to the same outcome, and a good prediction merely proves the observed outcome was one of those “vast majority” of cases.

In other words, the great mistake of Frequentists is that they think they’re making *necessary* assumptions, but they’re actually making *sufficient* assumptions that are incredibly far from being necessary.

By: Corey

Corey — Thu, 25 Apr 2019 03:30:19 +0000

In reply to Justin Smith. You have to keep in mind what I'm attempting to illustrate here. Let me quote from the OP:

“If I ask my frequentist statistician for a 95%-confidence interval, I can be 95% sure that the true value will be in the interval she just gave me.” Not quite true. Yes, true on average, but not necessarily true in any individual case. Some intervals are clearly wrong.

I believe that when AG wrote that he was thinking about prior knowledge being the source of the information that lets us say that a realized interval was clearly wrong. I wanted to go further and show that the quoted text is true even without bringing prior information into the picture. Let's take your points in turn. To your first point: usually when people talk about intervals not being great in small sample size situations they're talking about being unhappy with how wide the intervals are; this is a very different concern from the one I'm presenting. To your second and third points: in no way am I claiming this is the only acceptable frequentist approach to the problem. I'm simply saying that *confidence coverage* alone isn't enough, and there must be something more that makes our usual confidence procedures work. (And it turns out that in this case we can draw on this "something more" to improve on our inference -- and that's all to the good.) If you don't believe me, ask Mayo; she'll tell you that in problems of this sort, good long-run properties of a method of inference are merely necessary and not sufficient for severe testing in the case at hand. So don't mistake this for an argument for Bayes -- it's about reference sets and recognizable subsets, issues that were of deep concern to Fisher, as described in this paper on the topic of reconciling pre-data and post-data properties of interval procedures. To your fourth point: there is one thing we can say about any interval resulting from any other Bayesian prior -- it won't include values above the sample minimum, since the likelihood is zero there.
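A rough simulation of the point about coverage not being enough. It assumes the truncated exponential takes the common textbook form p(x | theta) = exp(theta - x) for x > theta, and it uses an equal-tailed 90% interval built from the sample mean. Both the density and the interval procedure are assumptions made for this sketch and are not necessarily the ones used in the paper under discussion.

```python
import math
import random

def erlang_cdf(s, n):
    """CDF at s of a sum of n unit-rate exponentials, i.e. Gamma(n, 1)."""
    if s <= 0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k
        total += term
    return 1.0 - math.exp(-s) * total

def erlang_quantile(p, n):
    """Invert erlang_cdf by bisection (upper bracket 100 is ample for small n)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pivot_quantiles(n, level=0.90):
    """Equal-tailed quantiles of the pivot sum(x_i - theta) ~ Gamma(n, 1)."""
    tail = (1 - level) / 2
    return erlang_quantile(tail, n), erlang_quantile(1 - tail, n)

def mean_based_interval(xs, s_lo, s_hi):
    """Confidence interval for theta obtained by inverting the pivot."""
    n, total = len(xs), sum(xs)
    return (total - s_hi) / n, (total - s_lo) / n

def simulate(theta=10.0, n=3, trials=20000, seed=0):
    rng = random.Random(seed)
    s_lo, s_hi = pivot_quantiles(n)
    covered = impossible = 0
    for _ in range(trials):
        xs = [theta + rng.expovariate(1.0) for _ in range(n)]
        lower, upper = mean_based_interval(xs, s_lo, s_hi)
        covered += lower <= theta <= upper
        # theta lies below every observation, so an interval that sits
        # entirely above min(xs) cannot contain it.
        impossible += lower > min(xs)
    return covered / trials, impossible / trials

coverage, impossible_rate = simulate()
print(f"long-run coverage: {coverage:.3f}")
print(f"share of intervals lying entirely above the sample minimum: {impossible_rate:.4f}")

# One hypothetical data set where the realized interval is recognizably wrong.
example = [12.0, 15.0, 18.0]
example_lower, example_upper = mean_based_interval(example, *pivot_quantiles(len(example)))
print(f"interval for {example}: ({example_lower:.2f}, {example_upper:.2f})")
```

The procedure has the advertised long-run coverage, yet some realized intervals can be identified from the data alone as not containing theta. Those intervals form a recognizable subset in the sense discussed above.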

By: Justin Smith

Justin Smith — Thu, 25 Apr 2019 02:46:41 +0000

In reply to Dale Lehman.

Also good to ask why didn’t ACS use a subjective, or any, Bayesian interval here.
Could it be the likelihood swamped the prior?
Why not allow the person in the zip code to use their own prior? Does it matter what a person believes the average household income should be?

Justin

By: Justin Smith

Justin Smith — Thu, 25 Apr 2019 02:42:08 +0000

In reply to Corey Yanofsky.

The truncated exponential is a good example. But:

First, it is a small sample size situation. I do not believe any interval from any school will be great.

Second, the example does not allow the frequentist to use ANY other frequentist method, such as bootstrapping.

Third, it does not allow the frequentist to use the scientific knowledge that theta < min(X_i).

Fourth, it doesn't say what the interval will be like when any other Bayesian prior is used.

So sure, anything can be viewed as a serious problem when you do 1-4,

Justin

By: Justin Smith

Justin Smith — Thu, 25 Apr 2019 02:29:40 +0000

In reply to Anonymous.

Let’s look at the truncated failure times example.

1) it is a small sample setting, I don’t believe any approach is good
2) how does the Bayesian interval vary with different Bayesian priors?
3) frequentism is not tied to always using one rule like Bayesian is tied to Bayes rule, so why is the frequentist method only allowed one method? I believe order statistics (the minimum) and/or bootstrapping could give a confidence interval that makes more sense.
4) you have to realize that allowing ‘the data to talk’ sometimes but rarely yields incongruent silly answers in frequentism, much like ‘allowing subjective beliefs to enter’ gives silly answers dictating parameter values (proof of god existing, search for MH370 being so far off track) and Drake-like equation-ness in posterior distributions

If frequentism is flawed, why do Bayesians (and everyone) use histograms? Why does the Strong Law of Large Numbers simply work? Why do likelihoods tend to swamp priors? Why does MCMC rely on frequentist notions of sampling and convergence? Why does the CLT exist? Why is there success of survey sampling, experimental design, and quality control? Just a few questions for now,

Justin

By: Justin Smith

Justin Smith — Thu, 25 Apr 2019 02:01:19 +0000

In reply to Anonymous.

“It’s been known for 60-70 years now that it’s possible to get confidence intervals in real problems guaranteed to not contain the true parameter, and moreover, this guarantee is provable from the same assumptions used to create the CI.”

CIs can go crazy with Bayesian approaches too, just create a prior that is funky (technical term).

Justin

By: Justin Smith

Justin Smith — Thu, 25 Apr 2019 01:59:46 +0000

In reply to Ben Prytherch.

“If you show “we are 95% sure that…” to someone who doesn’t already understand this, they will naturally take it to mean “there is a 95% chance that the value of interest is in the interval”.”

True, but most people can follow:
1) we have data from one experiment
2) we can repeat the experiment to get more data; after all, we already did it one time

so they understand quite naturally the fact that the intervals are expected to change.
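The repeated-experiment picture can be sketched as a simulation. It assumes normally distributed data with a known standard deviation so that a simple z-interval applies. The true mean, the standard deviation, and the sample size are made-up values for illustration.

```python
import random
import statistics

def z_interval(sample, sigma, z=1.96):
    """95% interval for the mean when the population sd `sigma` is known."""
    center = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return center - half_width, center + half_width

def repeat_experiment(true_mean=5.0, sigma=2.0, n=25, repeats=10000, seed=1):
    """Run the same experiment many times and record each realized interval."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        intervals.append(z_interval(sample, sigma))
    hits = sum(lo <= true_mean <= hi for lo, hi in intervals)
    return intervals, hits / repeats

intervals, coverage = repeat_experiment()
print("first three intervals:",
      [(round(lo, 2), round(hi, 2)) for lo, hi in intervals[:3]])
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

Each repeat yields a different interval. The 95% figure describes the share of those intervals that cover the true mean and is not a statement about any single one of them.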

Justin

By: Anoneuoid

Anoneuoid — Wed, 24 Apr 2019 22:31:07 +0000

In reply to Anoneuoid.

RE: advanced applications. I meant building finite mixture models in which there are multiple models competing for posterior mass through a finite simplex of mixing parameters. I think there are many more people doing single model estimation than there are doing model comparison across multiple models.

True. I wonder if that will change as cpu cycles get cheaper.

By: Daniel Lakeland

Daniel Lakeland — Wed, 24 Apr 2019 22:24:02 +0000

In reply to Anoneuoid. RE: advanced applications. I meant building finite mixture models in which there are multiple models competing for posterior mass through a finite simplex of mixing parameters. I think there are many more people doing single model estimation than there are doing model comparison across multiple models.

By: Mark Schaffer

Mark Schaffer — Wed, 24 Apr 2019 22:23:46 +0000

In reply to george. This looks great. Thanks for the tip!

By: Anoneuoid

Anoneuoid — Wed, 24 Apr 2019 22:10:28 +0000

In reply to Anoneuoid.

I don’t think this is entirely forgotten, but it’s definitely a kind of advanced area of application and can be overlooked rather easily even by experts, it’s a good area to look for bugs.

I think calling concern about p(data) "advanced" is somewhat misleading. I wouldn't consider myself to have an "advanced" understanding of MCMC (although I have written my own Gibbs samplers from scratch, etc). I'd say it is "atypical", because the main way to apply Bayes' rule has been parameter estimation via MCMC. In that case p(data) usually is not a primary concern (if at all).

By: Daniel Lakeland

Daniel Lakeland — Wed, 24 Apr 2019 19:01:07 +0000

In reply to Anoneuoid. Anoneuoid: Yes, if you are doing a mixture model, something like [p(Data|Params1,model1) p(Params1|model1) p(model1) + p(Data|Params2,model2) p(Params2|model2) p(model2)]/Z, where Z is the normalization constant, then typically you define p(model1) and p(model2) as parameters, say p[1] and p[2], put a prior on them, and enforce that they add to 1 (say, by using a simplex in Stan with a Dirichlet prior on it). In this situation you absolutely need each of the sub-models to use a *normalized* representation of the density. This is a subtle point that has bitten me a few times. In Stan it means you can't use ~ statements, and need to use things like normal_lpdf to compute the normalized version of the conditional (log) density. I don't think this is entirely forgotten, but it's definitely a kind of advanced area of application and can be overlooked rather easily even by experts, it's a good area to look for bugs.
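A small Python sketch of why normalization matters when sub-models compete for mixture weight. It is not Stan code. The two sub-models (normal densities with fixed, different scales), the single observation, and the equal prior weights are all hypothetical choices made for illustration.

```python
import math

def normal_kernel(x, mu, sigma):
    """Normal density with the normalizing constant dropped."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def normal_density(x, mu, sigma):
    """Fully normalized normal density."""
    return normal_kernel(x, mu, sigma) / (sigma * math.sqrt(2 * math.pi))

def weight_of_model1(x, density, prior1=0.5):
    """Posterior probability of model 1 given one observation x, where
    model 1 is N(0, 1) and model 2 is N(0, 10) (hypothetical sub-models)."""
    m1 = prior1 * density(x, 0.0, 1.0)
    m2 = (1 - prior1) * density(x, 0.0, 10.0)
    return m1 / (m1 + m2)

x = 0.5
correct = weight_of_model1(x, normal_density)  # uses normalized densities
wrong = weight_of_model1(x, normal_kernel)     # constants dropped

print(f"model 1 weight with normalized densities: {correct:.3f}")
print(f"model 1 weight with dropped constants:    {wrong:.3f}")
```

The dropped constant depends on sigma, which differs between the sub-models, so it does not cancel in the ratio. In single-model estimation the same constant cancels, which is why the issue is easy to overlook.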

By: Anoneuoid

Anoneuoid — Wed, 24 Apr 2019 18:47:13 +0000

In reply to Anoneuoid.

I don’t think the marginal distribution, p(data) = int p(data|parameters) d(parameters), is always so relevant, as it can involve integrating over all sorts of things that we don’t care much about. To put it another way: from my perspective, p(data|parameters) is fundamental, it’s a key part of statistical modeling, whereas p(data) is less clearly defined.

The likelihoods p(data|params) are used to calculate p(data) along with the priors p(params), so of course they are more fundamental. But MCMC procedures can be understood to work by approximating p(data). Each new step gives us another "unnormalized posterior"[1] term; collectively these sum/integrate to approximately p(data), and if the step is accepted we store the associated parameter values. By using a Markov chain, hopefully we are not wasting effort calculating many negligible terms of p(data). If you only care about estimating parameters of the model you won't care directly about p(data) and you will probably throw away all the unnormalized posteriors used to get you there. But if you care about the probability that a model (+ associated parameter set) is correct, then you do want to normalize all these unnormalized posteriors to their sum to get the actual posterior probabilities.[2]

[1] I guess people refer to this term as the "unnormalized posterior": p(data|params)*p(params)

[2] I think of "real" distributions as always discrete due to limits on measurement precision, so if you are sampling from a continuous approximation then I guess similar models (+ parameters) should be aggregated for this. "Similar" means there is no practical difference between the parameter values, etc.

Actually, when p(data) is estimated this way it could be a good measure of convergence to check that its rate of growth approaches zero. Is that a thing?
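The relationship between the unnormalized posterior terms and p(data) can be sketched on a grid. This is plain numerical integration, not MCMC, and it assumes a coin-flip model with a uniform prior on the heads probability. The model and the counts are illustrative assumptions.

```python
from math import comb

def unnormalized_posterior(theta, heads, flips):
    """p(data | theta) * p(theta) for a coin with a uniform prior on theta."""
    likelihood = comb(flips, heads) * theta ** heads * (1 - theta) ** (flips - heads)
    prior = 1.0  # uniform density on [0, 1]
    return likelihood * prior

def grid_evidence(heads, flips, grid_size=10000):
    """Midpoint-rule estimate of p(data), plus the normalized posterior mass per cell."""
    width = 1.0 / grid_size
    thetas = [(i + 0.5) * width for i in range(grid_size)]
    terms = [unnormalized_posterior(t, heads, flips) * width for t in thetas]
    evidence = sum(terms)                            # approximates p(data)
    posterior = [term / evidence for term in terms]  # normalize to the sum
    return evidence, thetas, posterior

heads, flips = 7, 10
evidence, thetas, posterior = grid_evidence(heads, flips)

# For a uniform prior the binomial marginal likelihood is exactly 1 / (flips + 1).
exact = 1.0 / (flips + 1)
print(f"grid estimate of p(data): {evidence:.6f}")
print(f"exact p(data):            {exact:.6f}")
```

Summing the unnormalized terms gives p(data), and dividing each term by that sum gives posterior probabilities. A sampler that visits only the non-negligible region would cover most of the same sum with far fewer evaluations.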

By: george

george — Wed, 24 Apr 2019 18:42:51 +0000

In reply to Mark Schaffer. Mark, you might be interested in pg 116 of this very elementary book. It lays out a version of the ring toss where the stake (target) is not known to the person doing inference.