Which brings up the point that the primary purpose of statistics is to explore data. You give a good example here of how and why that would be done. The problems with p-values arise when people take it to mean that the job of statistics is to prove (or disprove).

]]>Thanks for the link.

]]>Clarification of possible nesting confusion: The +1 was to Daniel Lakeland’s comment.

]]>+1

]]>+1 to Corey’s September 8, 2015 at 9:43 pm comment.

]]>Anoneuoid, time-ordering and correlation aren’t enough: you can easily have a cause with two effects, one of which occurs before the other, but with no direct causal relation between them. The two effects will be highly correlated, and yet intervening on the earlier effect will fail to, um, affect the later effect.

For example, a person’s general conscientiousness about health may prompt them to seek out a flu vaccine during flu season and also prompt them to wash their hands often. If so, a prospective study with a naïve estimate for the effectiveness of the vaccine would tend to over-estimate its effectiveness relative to a randomized study.
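A minimal simulation sketch of the flu-vaccine example above. All numbers (base risk, risk ratios, vaccination probabilities) and the function name `estimated_effectiveness` are made up for illustration; none of them come from a real study.

```python
import random

# Hypothetical parameters, chosen only to illustrate the argument.
BASE_RISK = 0.20            # flu risk with no vaccine and no hand-washing
VACCINE_RISK_RATIO = 0.6    # true effect: vaccine cuts risk by 40%
HANDWASH_RISK_RATIO = 0.5   # conscientious people also wash their hands


def estimated_effectiveness(randomized, n=200_000, seed=1):
    """Return the naive estimate 1 - risk(vaccinated) / risk(unvaccinated)."""
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        conscientious = rng.random() < 0.5
        if randomized:
            # Coin flip: vaccination is independent of conscientiousness.
            vaccinated = rng.random() < 0.5
        else:
            # Prospective: conscientious people seek out the vaccine more often.
            vaccinated = rng.random() < (0.8 if conscientious else 0.2)
        risk = BASE_RISK
        if vaccinated:
            risk *= VACCINE_RISK_RATIO
        if conscientious:
            risk *= HANDWASH_RISK_RATIO
        totals[vaccinated] += 1
        cases[vaccinated] += rng.random() < risk
    risk_vaccinated = cases[True] / totals[True]
    risk_unvaccinated = cases[False] / totals[False]
    return 1 - risk_vaccinated / risk_unvaccinated


if __name__ == "__main__":
    print("randomized :", round(estimated_effectiveness(True), 3))
    print("prospective:", round(estimated_effectiveness(False), 3))
```

Under these assumed numbers the true effectiveness is 0.4. The randomized design should land near that value, while the prospective design should land near 0.6, because hand-washing is credited to the vaccine.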

]]>Anon:

I don’t have the energy to explain this now, but suffice it to say you are reconstructing a couple of centuries of thought on this topic. To help you along, I recommend you take a look at the Imbens and Rubin book or at this short article I wrote with Imbens.

]]>Flipping the switch is not sufficient either (eg power is out), that is a distal cause, just less distal than walking into the room. Why not say electricity flowing through the bulb causes the light to turn on? It quickly becomes a game of every preceding event is a cause of every later event. What we can observe is the correlation and time elapsed between various events. These observables are deduced from an a priori idea of “cause” that we feel must exist.

Perhaps this is a variation on the old modus tollens vs affirming the consequent issue. You cannot empirically establish that something is a cause, but it can be ruled out (or at least say “Either it is not a cause or some assumption is wrong”).

]]>Yes, if A causes B then A precedes B and is correlated with B (at least, in “distance correlation” thanks to Corey for that reference a while back).

But it is definitely possible to observe that although in many cases A precedes B and is correlated with it, it is not causal.

For example: Anne walks into a room, reaches out her hand, and the lights come on. Walking into the room precedes the lights coming on, and is highly correlated with it, but if the lights are controlled by a switch and not a motion detector, then walking into the room does not cause the lights to turn on. Another person who walks into the room and waves hands (assuming a motion sensor) will find that sure enough, it’s not sufficient to cause the lights to come on, you have to reach out and flip the switch.

The vast majority of good experimental science is about ruling out possible causes until the one that can’t be ruled out anymore is assumed true.

What you CAN do is create situations which can be explained by other methods only in the most baroque of ways (the motion sensor that detects the motion that causes the light to turn on is only sensitive to motion in the vicinity of the switch itself, so it’s really the hypothesized motion sensor that causes the lights to come on, but you need to move the switch or something nearby the switch for the sensor to detect it…)

In the context of say medicine, causality is observable and critical, consider the difference between “cholera is spread through the air” vs “cholera is spread through water”:

https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak

]]>To all,

A better way of putting my thoughts is that the term “cause” is a convenient shorthand for a certain type of relationship. It is used when A precedes B, and A is also highly correlated with B. Correlations are measurable, but I don’t see how you can measure, or even observe, cause.

]]>Philip:

That error was introduced by the copy editors. But, sure, maybe it was on purpose, I have no idea.

]]>@Anoneuoid

“What difference is there between knowing that flipping the switch is highly correlated with lights turning on/off under a wide range of conditions vs. causing it?”

Because if you knew that flipping the switch doesn’t actually *cause* the light to go on, then you would know not to bother pressing it if you want light?

]]>@Anoneuoid

The lightbulb (non LED) turning hot is also strongly correlated with the light turning on under a wide variety of conditions.

The causality problem arises if someone tries turning on the bulb by heating it with an air dryer.

]]>Depends whose practice you’re talking about.

Suppose we’re investigating the effectiveness of a flu vaccine. Consider a group of study subjects and two study designs, a prospective study (i.e., study subjects decide for themselves whether to get the vaccine) and a randomized study (for simplicity we’ll imagine that compliance with the assigned treatment status is 100%). Imagine the two counterfactual histories that would result for the two different studies; would you expect the randomized trial would show more, less, or the same effectiveness in preventing flu as the prospective study?

]]>Corey wrote: “Wow. Um, causality is important if you want to take effective action… right?”

I see what you mean, but on reflection I don’t think so. What difference is there between knowing that flipping the switch is highly correlated with lights turning on/off under a wide range of conditions vs. causing it? In practice I don’t think the distinction ever matters.

]]>What was the purported non-informative prior on the log-odds scale? The Haldane prior is flat on the log-odds and shouldn’t give a too-crazy posterior for the proportion, if the posterior manages to be proper, i.e., if there’s at least one observation of each possible outcome.

]]>Anoneuoid,

Wow. Um, causality is important if you want to take effective action… right?

]]>Keith wrote: “when addressing causal questions”

I’ve begun thinking that causality is a problematic concept, and I’m not even sure there is a need for it. If B very often follows A and rarely occurs in the absence of a preceding A, we can use detection of A to accurately predict B. Is it actually of any relevance whether A *causes* B?

]]>I’d say that’s a different kind of surprise. You may be surprised to learn that decaffeinated coffee does contain caffeine (maybe 5-10% relative to regular coffee). You should not be surprised if decaffeinated coffee fails to keep you awake.

I agree there are hidden dangers in “non-informative” priors. Some thought about the model is required to choose a non-informative prior (for example, we could be using lengths or areas, differences or ratios). In some cases, invariance arguments will help you to choose the right one. In other cases, you will be happy to get one which is reasonably indifferent within some reasonable region. And if you’re doing reparametrizations I agree that the transformations of the probability densities are not obvious and it’s worth checking what the prior actually looks like.

However, being surprised because a “non-informative” prior is indeed non-informative (it won’t favour the regions that we find more plausible a priori) seems unwarranted.

]]>The beginning was meant to say: “It’s obviously not the issue in this posting, but I’m curious about how, where, to what extent and by whom people are “trained” to think this way.”

]]>It’s obviously not the issue in this posting, but I’m curious about how, where, to what extent people and by whom are “trained” to think this way. Where I am, subjective Bayes certainly isn’t extinct and Bayesians (and others anyway) tend to be sceptical about noninformative priors. At the same time I do realise (as you know) that many Bayesians when publishing are at pains to avoid any impression of subjectivity. If anybody has something to say about a “cultural mapping” of Bayesians, please do!

]]>Tom M

Yes, and that is why it is so obvious that if it’s a non-randomized study there will be confounding (when addressing causal questions), and the distribution of the p-values will be very non-uniform even when the (unconfounded) effect is null.
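A rough simulation sketch of that claim, under assumptions that are entirely hypothetical: a hidden `health` variable drives both the outcome and (in the self-selected design) the choice of treatment, the treatment itself does nothing, and a large-sample z-test stands in for whatever test would really be used. If null p-values were uniform, about 5% of them would fall below 0.05.

```python
import random
from statistics import NormalDist, mean, variance


def null_p_value(self_selected, rng, n=200):
    """Two-sided p-value for a treatment that truly has no effect."""
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden confounder (hypothetical)
        if self_selected:
            # Healthier subjects are more likely to choose the treatment.
            takes_treatment = health + rng.gauss(0, 1) > 0
        else:
            takes_treatment = rng.random() < 0.5
        outcome = health + rng.gauss(0, 1)  # no treatment term at all
        (treated if takes_treatment else control).append(outcome)
    # Large-sample z-test on the difference in means (normal approximation).
    se = (variance(treated) / len(treated)
          + variance(control) / len(control)) ** 0.5
    z = (mean(treated) - mean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(self_selected, studies=1000, seed=0):
    """Share of simulated null studies with p < 0.05."""
    rng = random.Random(seed)
    p_values = [null_p_value(self_selected, rng) for _ in range(studies)]
    return sum(p < 0.05 for p in p_values) / studies


if __name__ == "__main__":
    print("randomized   :", rejection_rate(self_selected=False))
    print("self-selected:", rejection_rate(self_selected=True))
```

With randomization the rejection rate should sit close to the nominal 5%; with self-selection nearly every study rejects, even though the treatment effect is exactly zero.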

> not that you’re talking about something surprising and unexpected

It is (was) surprising even to many practitioners of Bayesian analysis – Peter Thall once in public admitted that he hit his head against the wall for a month trying to figure what was going wrong in one of his sequential trials (he had assumed a non-informative prior on the log-odds scale that was terribly informative on the proportions scale that mattered and had very small data).

Just because something is implied by a model does not mean people are aware of it or find it intuitive.

So something implied cannot be paradoxical but it may not be obvious.

(I always thought people should be forced to plot the implied marginal priors but I now realize most would rather be wrong than stoop to such a simple way of detecting problems)
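A small sketch of what checking the implied marginal prior could look like. It assumes, purely for illustration, a normal prior centred at zero on the log-odds scale; the standard deviations used (10 and 1.5) and the function names are hypothetical and are not the prior from the trial mentioned above.

```python
import math
import random


def inv_logit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


def implied_proportion_prior(sd_log_odds, draws=100_000, seed=0):
    """Share of prior mass in each tenth of the proportion scale [0, 1]."""
    rng = random.Random(seed)
    bins = [0] * 10
    for _ in range(draws):
        p = inv_logit(rng.gauss(0.0, sd_log_odds))
        bins[min(int(p * 10), 9)] += 1
    return [count / draws for count in bins]


def text_histogram(shares):
    """Crude text plot: one row per tenth of the proportion scale."""
    for i, share in enumerate(shares):
        print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * round(share * 100)}")


if __name__ == "__main__":
    print("sd = 10 on the log-odds scale (looks 'vague'):")
    text_histogram(implied_proportion_prior(10.0))
    print("sd = 1.5 on the log-odds scale:")
    text_histogram(implied_proportion_prior(1.5))
```

A prior that looks vague on the log-odds scale piles most of its mass below 0.1 and above 0.9 on the proportion scale, which is the kind of surprise a quick plot would catch.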

]]>So: if you don’t like to do tests, you need a good class (use a prior); if you don’t like your class, don’t hold them to a high standard (relax your passing mark).

Train of thought fin.

]]>The frequentist fix is to think more carefully about the discrepancies of interest as null hypotheses, which effectively amounts to the same thing. Both regularize by throwing away solutions from unrealistic models that arise from choosing too big a class to begin with.

]]>It’s too easy to overfit with complex models but it’s also too easy to underfit with simple models. NHST tends to fall under the latter category, as it’s often used for comparing strawman (simple) models to noisy (complex) data. Priors are one way to attempt to redress this imbalance.

]]>You are close to the ultimate realization. Just make sure the “null” hypothesis corresponds to the actual scientific hypothesis (ie what is predicted by the theory). Even then, you can always blame bad data or auxiliary assumptions.

It is impossible for science to prove something true, it is impossible for science to prove something false. There will always be other explanations, you can only rule out the conjunction of your theory plus a number of assumptions, the most mundane being “measuring device not malfunctioning”.

]]>Of course. No need to be sorry :). My point was just that the “null hypothesis” is always embedded in a larger model, and it’s the entire model that determines the test statistic and p-value and effect size and whatever; rejecting the model does not necessarily reject the null hypothesis, which is generally just one parameter in the model.

]]>In other words, it is paradoxical if you use the original meaning of the word: “Counter to orthodoxy”.

]]>I’m sorry, but “no omitted variables” is as necessary for effect sizes, or any other quantity of interest I can imagine, as it is for p-values.

]]>Carlos:

It is paradoxical because we have been trained to think of noninformative priors as safe, and we have been trained to be suspicious of informative priors, and we have been trained to think of Bayesian inference as scarily dependent on assumptions.

]]>If “plausible” means “consistent with prior knowledge” it doesn’t sound very paradoxical. If not, what does “plausible” mean?

]]>“the p value is a function both of the data, and of the choice of test and null hypothesis. You have specified both a test (t-test) and null hypothesis (no difference in means).”

Yes, the same specification is required to compute (or understand) the other statistics. It is still the case that if the p-value contains none of the information in the data, then neither do the other statistics.

Your grep search is about the decision (statistical significance) based on a p-value. I agree that such a decision is not as informative as the p-value itself or the other statistics (but decisions based on other statistics might not be as informative either).

I am not arguing that decisions based on p-values are generally useful (although as Andrew notes, they can be in some situations). I think the conditions necessary for a p-value to be useful are difficult to satisfy; but in terms of information in the data set (at least for a two-sample t-test) it’s not true that a p-value contains no information.

]]>“the p value is a function both of the data, and of the choice of test and null hypothesis.”

Yes, and the “null hypothesis” is the entire null model, not just the null parameter that you’re interested in. I.e., in the two sample t-test, the null model specifies not only d=0, but normal errors, no omitted variables, etc. A small p-value indicates poor fit of data to the model, but not necessarily to the part of the model that encodes the scientific null hypothesis (such as “no mean difference”). I guess this is obvious, and just means that you must assume your model is “correct” in order to make inferences, but in typical practice p-values are related to the scientific null hypothesis without much regard for model checking.

You might argue “well of course” but if they just reported average effect sizes without any pretense of statistics you’d find out a lot more.

]]>ack. blog hates me because I do math not html coding

/p ?[<>] ?[0-9.]+[0-9]+/
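For anyone who wants to try that pattern without a shell, here is a short sketch using Python’s `re` module. The sample sentence is invented for illustration; only the pattern itself comes from the comment.

```python
import re

# The pattern from the comment, unchanged. Note that it is case-sensitive,
# so "P < .05" with a capital P would not match unless re.IGNORECASE is added.
P_VALUE_PATTERN = re.compile(r"p ?[<>] ?[0-9.]+[0-9]+")

# Invented sample text, standing in for a journal article.
sample = ("The effect was significant (p < .05), unlike the control "
          "(p > 0.20); see also p<0.001.")

matches = P_VALUE_PATTERN.findall(sample)
print(matches)  # ['p < .05', 'p > 0.20', 'p<0.001']
```

The pattern only catches inequalities, so an exactly reported value such as "p = .03" is skipped.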

]]>the p value is a function both of the data, and of the choice of test and null hypothesis. You have specified both a test (t-test) and null hypothesis (no difference in means).

If you grep through the text of a random journal article for the pattern /p ?[] ?[0-9.]+[0-9]+/

you find out practically nothing.

]]>I don’t see how you can say that the nominal p value contains “practically none of the information in the data”. Take a two-sample t-test. If you know the sample sizes and the p-value, then you can compute: t, d, confidence interval of d, log likelihood ratio, difference in AIC, difference in BIC, and the Bayes Factor (for a prior based on standardized effect sizes), among others. I set up a web site to do the transformations:

http://psych.purdue.edu/~gfrancis/EquivalentStatistics/

In some situations that are common to social science, the p-value contains just as much information about the data set as many alternative statistics. The difference between these statistical inferences is not about what is in the data set, but how it is interpreted. Thus, scientists have to figure out what they are interested in and then use the statistic appropriate for that interest. I do agree, though, that there are situations where none of the above statistics reflect what scientists are interested in.
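A rough sketch of the first steps of that conversion, not the web site’s actual code. To stay within the standard library it swaps the t distribution for the standard normal, so it is only a large-sample approximation; the function name `approx_stats_from_p` and the example inputs are hypothetical. Note that a two-sided p-value gives only the magnitude of the statistic, not its sign.

```python
import math
from statistics import NormalDist


def approx_stats_from_p(p_two_sided, n1, n2):
    """Recover |z|, |d| and an approximate 95% CI for d from p and sample sizes.

    Assumption: large samples, so the t statistic is treated as a z statistic.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p_two_sided / 2)  # |test statistic|
    scale = math.sqrt(1 / n1 + 1 / n2)
    d = z * scale                                # standardized effect size
    # Common large-sample approximation to the standard error of d.
    se_d = math.sqrt(scale ** 2 + d ** 2 / (2 * (n1 + n2)))
    return {"z": z, "d": d, "ci_d": (d - 1.96 * se_d, d + 1.96 * se_d)}


if __name__ == "__main__":
    result = approx_stats_from_p(0.05, 50, 50)
    print({k: v for k, v in result.items()})
```

The mapping is one-to-one in both directions: given the sample sizes, the p-value can be recomputed from z, which is the sense in which these statistics carry the same information.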

]]>P.s. to the first knucklehead who says those aren’t “real” p-values, they’re “nominal” or something equivalent: I, like everyone else, have only ever seen “nominal” p-values since they are the only ones calculable.

You have to decide what you believe. You can believe either (A) p-values are a function of the data with an extremely weak correlation to the truth (as the evidence of all kinds overwhelmingly suggests), or you can believe (B) for every calculated p-value there is a mystical and magical “real” p-value that’s nearly perfectly correlated with the truth.

]]>A function of the data which necessarily reflects some of the information in the data. How much information?

From that replication study: overall *“Ninety-seven percent of original studies had significant results (P < .05). Thirty-six percent of replications had significant results;”* yet for very low p-values they got *“Twenty of the 32 [i.e. sixty-three percent] original studies with a P value of less than 0.001 could be replicated”*.

In other words, p-values reflect practically none of the information in the data. The real question though is “what price do we pay for ignoring p-values completely?” The answer is “none”.

]]>This used to be referred to as “The cult of a single study” e.g. now being coined as “the statistical myth of a single study” by Chatfield (1995) and “the cult of a single study” by Longford and Nelder (1999) from https://books.google.ca/books?id=x-fgBwAAQBAJ&pg=PA398&lpg=PA398&dq=%22The+cult+of+a+single+study%22&source=bl&ots=JIuIPIV9-4&sig=J5jWQXU0-6j1lyUP9cylDy4geWE&hl=en&sa=X&ved=0CB0Q6AEwAGoVChMIrZTYxsrdxwIVgQOSCh1mEguL

Also my comment a couple years ago http://statmodeling.stat.columbia.edu/2013/12/17/replication-backlash/#comment-152404

]]>I think attitudes have changed a lot since for instance I wrote Two cheers for Bayes 20 years ago http://www.sciencedirect.com/science/article/pii/S0197245696907539 .

But somehow I think Kadane’s response would be the same ;-)

For instance “I [Kadane] agree with the author [me] that “when a posterior is presented, I believe it should be clearly and primarily stressed as being a ‘function’ of the prior probabilities and not the probability ‘of treatment effects’.” So, I think, do most Bayesians [in 1996????].”

But the “something that we probably don’t think about enough in our routine applications of standard statistical methods” did not apply to his strict subjective approach where priors should not be checked nor questioned but simply noted?

]]>In other contexts, a low p-value, especially where there aren’t multiple tests issues but where the result is ex ante unexpected, usually makes me think “you’ve justified another research grant to study this further”. This leaves problems of publication selection against “negative results”.

I’ve said for years that I’d like to start a Journal of Replications and Negative Results, but I’m still not in a professional place to do that. Feel free to steal the idea.

]]>+1 The whole point of a prior is to regularize – ie remove bad/pathological solutions from consideration. This is actually a conservative procedure.

]]>