Which brings up the point that the primary purpose of statistics is to explore data. You give a good example here of how and why that would be done. The problems with p-values arise when people take them to mean that the job of statistics is to prove (or disprove).

]]>For example, a person’s general conscientiousness about health may prompt them to seek out a flu vaccine during flu season and also prompt them to wash their hands often. If so, a prospective study with a naïve estimate for the effectiveness of the vaccine would tend to over-estimate its effectiveness relative to a randomized study.
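
A rough simulation sketch of the argument above. Every number in it (the base flu risk, the vaccine and hand-washing effects, and how strongly conscientiousness drives the choice to vaccinate) is a made-up assumption for illustration, not an estimate from any study.

```python
import random

# All numbers below are hypothetical, chosen only to make the effect visible.
BASE_RISK = 0.20      # flu risk for an unvaccinated, non-conscientious subject
VACCINE_RR = 0.60     # true effect: vaccination multiplies flu risk by 0.6
HANDWASH_RR = 0.50    # conscientious subjects wash hands, halving their risk


def naive_risk_ratio(get_vaccine, n, rng):
    """Flu risk among the vaccinated divided by flu risk among the unvaccinated."""
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        conscientious = rng.random() < 0.5
        vaccinated = get_vaccine(conscientious)
        risk = BASE_RISK
        if vaccinated:
            risk *= VACCINE_RR
        if conscientious:
            risk *= HANDWASH_RR
        totals[vaccinated] += 1
        cases[vaccinated] += rng.random() < risk
    return (cases[True] / totals[True]) / (cases[False] / totals[False])


rng = random.Random(0)
n = 200_000
# Prospective: conscientious subjects are more likely to choose the vaccine.
prospective_rr = naive_risk_ratio(lambda c: rng.random() < (0.8 if c else 0.2), n, rng)
# Randomized: a coin flip decides, regardless of conscientiousness.
randomized_rr = naive_risk_ratio(lambda c: rng.random() < 0.5, n, rng)

print(f"prospective naive risk ratio: {prospective_rr:.2f}")
print(f"randomized naive risk ratio:  {randomized_rr:.2f}")
```

Under these assumed numbers the randomized ratio should land near the true 0.6, while the prospective ratio should come out noticeably lower (around 0.4), i.e. the vaccine looks more effective than it is because the vaccinated group also washes its hands more.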

]]>I don’t have the energy to explain this now, but suffice it to say you are reconstructing a couple of centuries of thought on this topic. To help you along, I recommend you take a look at the Imbens and Rubin book or at this short article I wrote with Imbens.

]]>Perhaps this is a variation on the old modus tollens vs affirming the consequent issue. You cannot empirically establish that something is a cause, but you can rule it out (or at least say “Either it is not a cause or some assumption is wrong”).

]]>Yes, if A causes B then A precedes B and is correlated with B (at least, in “distance correlation” thanks to Corey for that reference a while back).

But it is definitely possible to observe that although in many cases A precedes B and is correlated with it, it is not causal.

For example: Anne walks into a room, reaches out her hand, and the lights come on. Walking into the room precedes the lights coming on, and is highly correlated with it, but if the lights are controlled by a switch and not a motion detector, then walking into the room does not cause the lights to turn on. Another person who walks into the room and waves hands (assuming a motion sensor) will find that sure enough, it’s not sufficient to cause the lights to come on, you have to reach out and flip the switch.

The vast majority of good experimental science is about ruling out possible causes until the one that can’t be ruled out anymore is assumed true.

What you CAN do is create situations which can be explained by other methods only in the most baroque of ways (the motion sensor that detects the motion that causes the light to turn on is only sensitive to motion in the vicinity of the switch itself, so it’s really the hypothesized motion sensor that causes the lights to come on, but you need to move the switch or something nearby the switch for the sensor to detect it…)

In the context of, say, medicine, causality is observable and critical. Consider the difference between “cholera is spread through the air” vs “cholera is spread through water”:

https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak

]]>A better way of putting my thoughts is that the term “cause” is a convenient shorthand for a certain type of relationship. It is used when A precedes B, and A is also highly correlated with B. Correlations are measurable, but I don’t see how you can measure, or even observe, cause.

]]>That error was introduced by the copy editors. But, sure, maybe it was on purpose, I have no idea.

]]>“What difference is there between knowing that flipping the switch is highly correlated with lights turning on/off under a wide range of conditions vs. causing it?”

Because if you knew that flipping the switch doesn’t actually *cause* the light to go on, then you would know not to bother pressing it if you want light?

]]>The lightbulb (non LED) turning hot is also strongly correlated with the light turning on under a wide variety of conditions.

The causality problem arises if someone tries turning on the bulb by heating it with a hair dryer.
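
A toy sketch of that distinction, assuming the simplest possible structure (switch causes light, light causes heat). The function name and its `heated_externally` parameter are invented here purely for illustration.

```python
import random


def bulb(switch_on, heated_externally=False):
    """Assumed structure: switch -> light -> heat; a dryer only adds heat."""
    light = switch_on
    hot = light or heated_externally
    return light, hot


rng = random.Random(3)
# Observation: watch the bulb while someone flips the switch at random.
observed = [bulb(rng.random() < 0.5) for _ in range(1000)]
agreement = sum(light == hot for light, hot in observed) / len(observed)

# Intervention: leave the switch off and heat the bulb with a dryer.
light_after_heating, hot_after_heating = bulb(False, heated_externally=True)

print("share of observations where light and heat agree:", agreement)
print("light on after heating the bulb:", light_after_heating)
```

Passively observed, heat predicts light perfectly in this toy model, so for prediction the two are interchangeable. Only the intervention shows that making the bulb hot does nothing for the light.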

]]>Suppose we’re investigating the effectiveness of a flu vaccine. Consider a group of study subjects and two study designs, a prospective study (i.e., study subjects decide for themselves whether to get the vaccine) and a randomized study (for simplicity we’ll imagine that compliance with the assigned treatment status is 100%). Imagine the two counterfactual histories that would result for the two different studies; would you expect the randomized trial to show more, less, or the same effectiveness in preventing flu as the prospective study?

]]>I see what you mean, but on reflection I don’t think so. What difference is there between knowing that flipping the switch is highly correlated with lights turning on/off under a wide range of conditions vs. causing it? In practice I don’t think the distinction ever matters.

]]>Wow. Um, causality is important if you want to take effective action… right?

]]>I’ve begun thinking that causality is a problematic concept, and I’m not even sure there is a need for it. If B very often follows A and rarely occurs in the absence of a preceding A, we can use detection of A to accurately predict B. Is it actually of any relevance whether A *causes* B?

]]>I agree there are hidden dangers in “non-informative” priors. Some thought about the model is required to choose a non-informative prior (for example, we could be using lengths or areas, differences or ratios). In some cases, invariance arguments will help you to choose the right one. In other cases, you will be happy to get one which is reasonably indifferent within some reasonable region. And if you’re doing reparametrizations I agree that the transformations of the probability densities are not obvious and it’s worth checking what the prior actually looks like.

However, being surprised because a “non-informative” prior is indeed non-informative (it won’t favour the regions that we find more plausible a priori) seems unwarranted.

]]>Yes and that is why it is so obvious if it’s a non-randomized study there will be confounding (when addressing causal questions) and the distribution of the p-values will be very non-uniform given the (unconfounded) effect is null. ]]>

It is (was) surprising even to many practitioners of Bayesian analysis – Peter Thall once in public admitted that he hit his head against the wall for a month trying to figure out what was going wrong in one of his sequential trials (he had assumed a non-informative prior on the log-odds scale that was terribly informative on the proportions scale that mattered, and he had very little data).

Just because something is implied by a model does not mean people are aware of it or find it intuitive.

So something implied cannot be paradoxical but it may not be obvious.

(I always thought people should be forced to plot the implied marginal priors but I now realize most would rather be wrong than stoop to such a simple way of detecting problems)
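
A minimal sketch of what plotting the implied marginal prior could look like for the log-odds case. The Normal(0, 10) prior is only an assumed stand-in for a "flat-looking" prior on the log-odds scale; it is not claimed to be the prior used in the trial mentioned above.

```python
import math
import random


def implied_proportion_prior(sd, n=100_000, bins=10, seed=1):
    """Share of prior draws falling in each equal-width bin on the proportion scale."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n):
        log_odds = rng.gauss(0.0, sd)             # draw on the log-odds scale
        p = 1.0 / (1.0 + math.exp(-log_odds))     # map to a proportion
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / n for c in counts]


wide = implied_proportion_prior(sd=10.0)    # "non-informative" on log-odds
narrow = implied_proportion_prior(sd=1.5)   # a tighter prior, for comparison

# Crude text histogram of the wide prior on the proportion scale.
for i, share in enumerate(wide):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * round(share * 100)}")
```

With the wide prior, most of the mass should pile up in the two end bins (proportions below 0.1 or above 0.9), which is exactly the kind of strong statement about proportions that nobody intended to make.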

]]>Train of thought fin.

]]>It is impossible for science to prove something true, it is impossible for science to prove something false. There will always be other explanations, you can only rule out the conjunction of your theory plus a number of assumptions, the most mundane being “measuring device not malfunctioning”.

]]>It is paradoxical because we have been trained to think of noninformative priors as safe, and we have been trained to be suspicious of informative priors, and we have been trained to think of Bayesian inference as scarily dependent on assumptions.

]]>If “plausible” means “consistent with prior knowledge” it doesn’t sound very paradoxical. If not, what does “plausible” mean?

]]>Yes, the same specification is required to compute (or understand) the other statistics. It is still the case that if the p-value contains none of the information in the data, then neither do the other statistics.

Your grep search is about the decision (statistical significance) based on a p-value. I agree that such a decision is not as informative as the p-value itself or the other statistics (but decisions based on other statistics might not be as informative either).

I am not arguing that decisions based on p-values are generally useful (although as Andrew notes, they can be in some situations). I think the conditions necessary for a p-value to be useful are difficult to satisfy; but in terms of information in the data set (at least for a two-sample t-test) it’s not true that a p-value contains no information.

]]>Yes, and the “null hypothesis” is the entire null model, not just the null parameter that you’re interested in. I.e., in the two sample t-test, the null model specifies not only d=0, but normal errors, no omitted variables, etc. A small p-value indicates poor fit of data to the model, but not necessarily to the part of the model that encodes the scientific null hypothesis (such as “no mean difference”). I guess this is obvious, and just means that you must assume your model is “correct” in order to make inferences, but in typical practice p-values are related to the scientific null hypothesis without much regard for model checking. ]]>
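
A simulation sketch of this point, under assumptions of my own choosing: the mean difference really is zero, but observations come in clusters, so the independence part of the null model fails. A large-sample z-test stands in for the two-sample t-test to keep the example dependency-free, and the cluster sizes and standard deviations are arbitrary.

```python
import math
import random
from statistics import NormalDist, mean, variance


def sample_group(rng, clusters, per_cluster, cluster_sd):
    """One group with true mean 0; cluster_sd > 0 makes observations dependent."""
    values = []
    for _ in range(clusters):
        shift = rng.gauss(0.0, cluster_sd)
        values.extend(shift + rng.gauss(0.0, 1.0) for _ in range(per_cluster))
    return values


def naive_p_value(a, b):
    """Two-sided large-sample z-test p-value that assumes independent observations."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(cluster_sd, sims=500, seed=2):
    """Share of simulated null data sets with p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = sample_group(rng, 10, 20, cluster_sd)
        b = sample_group(rng, 10, 20, cluster_sd)
        hits += naive_p_value(a, b) < 0.05
    return hits / sims


rate_independent = rejection_rate(cluster_sd=0.0)
rate_clustered = rejection_rate(cluster_sd=0.5)
print(f"p < .05 rate, independent data: {rate_independent:.2f}")
print(f"p < .05 rate, clustered data:   {rate_clustered:.2f}")
```

In both runs the scientific null ("no mean difference") is true. The independent case should reject at roughly the nominal 5% rate, while the clustered case should reject far more often, because the small p-values are picking up the violated independence assumption rather than any mean difference.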

/p ?[<>] ?[0-9.]+[0-9]+/

]]>If you grep through the text of a random journal article for the pattern /p ?[<>] ?[0-9.]+[0-9]+/

you find out practically nothing.
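
For anyone who wants to try it, a small sketch using Python’s `re` module with the pattern as given (including the `<>` character class). The sample sentence is invented for illustration.

```python
import re

# The pattern from the comment: "p", an optional space, "<" or ">",
# an optional space, then a number.
P_VALUE = re.compile(r"p ?[<>] ?[0-9.]+[0-9]+")

text = (
    "The main effect was significant (p < .05), the interaction was not "
    "(p > 0.20), and one exact value was reported as p = .03."
)

matches = P_VALUE.findall(text)
print(matches)
```

Note that the pattern only catches inequalities, so an exact report like "p = .03" slips through. That fits the point of the comment: what the search finds are significance declarations rather than the p-values themselves.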

]]>http://psych.purdue.edu/~gfrancis/EquivalentStatistics/

In some situations that are common to social science, the p-value contains just as much information about the data set as many alternative statistics. The difference between these statistical inferences is not in what is in the data set, but in how it is interpreted. Thus, scientists have to figure out what they are interested in and then use the statistic appropriate for that interest. I do agree, though, that there are situations where none of the above statistics reflect what scientists are interested in.

]]>You have to decide what you believe. You can believe either (A) p-values are a function of the data with an extremely weak correlation to the truth (as the evidence of all kinds overwhelmingly suggests), or you can believe (B) for every calculated p-value there is a mystical and magical “real” p-value that’s nearly perfectly correlated with the truth.

]]>A function of the data which necessarily reflects some of the information in the data. How much information?

From that replication study: overall *“Ninety-seven percent of original studies had significant results (P < .05). Thirty-six percent of replications had significant results;”,* yet for very low p-values they got *“Twenty of the 32 [i.e. sixty-three percent] original studies with a P value of less than 0.001 could be replicated”*.

In other words, p-values reflect practically none of the information in the data. The real question though is “what price do we pay for ignoring p-values completely?” The answer is “none”.

]]>Also my comment a couple years ago http://statmodeling.stat.columbia.edu/2013/12/17/replication-backlash/#comment-152404

]]>I think attitudes have changed a lot since for instance I wrote Two cheers for Bayes 20 years ago http://www.sciencedirect.com/science/article/pii/S0197245696907539 .

But somehow I think Kadane’s response would be the same ;-)

For instance “I [Kadane] agree with the author [me] that “when a posterior is presented, I believe it should be clearly and primarily stressed as being a ‘function’ of the prior probabilities and not the probability ‘of treatment effects’.” So, I think, do most Bayesians [in 1996????].”

But the “something that we probably don’t think about enough in our routine applications of standard statistical methods” did not apply to his strict subjective approach where priors should not be checked nor questioned but simply noted?

]]>In other contexts, a low p-value, especially where there aren’t multiple-testing issues but where the result is ex ante unexpected, usually makes me think “you’ve justified another research grant to study this further”. This leaves problems of publication selection against “negative results”.

I’ve said for years that I’d like to start a Journal of Replications and Negative Results, but I’m still not in a professional place to do that. Feel free to steal the idea.

]]>+1 The whole point of a prior is to regularize – i.e., remove bad/pathological solutions from consideration. This is actually a conservative procedure.

]]>