# The estimated effect size is implausibly large. Under what models is this a piece of evidence that the true effect is small?

Paul Pudaite writes in response to my discussion with Bartels regarding effect sizes and measurement error models:

You [Gelman] wrote: “I actually think there will be some (non-Gaussian) models for which, as y gets larger, E(x|y) can actually go back toward zero.”

I [Pudaite] encountered this phenomenon some time in the ’90s. See this graph, which shows the conditional expectation of X given Z, when Z = X + Y and the probability density functions of X and Y are, respectively, exp(-x^2) and 1/(y^2+1) (times appropriate constants). As the magnitude of Z increases, E[X|Z] shrinks to zero.
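Pudaite's Gaussian-plus-Cauchy example is easy to check numerically. Here is a minimal sketch (the grid width and resolution are arbitrary choices): since the normalizing constants cancel in the ratio, E[X|Z=z] can be approximated by Riemann sums on a uniform grid, where the grid spacing cancels as well.

```python
import numpy as np

def cond_mean(z, x=np.linspace(-10, 10, 20001)):
    """E[X | Z=z] for Z = X + Y with p_X(x) ∝ exp(-x^2), p_Y(y) ∝ 1/(1+y^2).
    The unnormalized posterior is p_X(x) * p_Y(z - x); constants and the
    grid spacing cancel in the ratio of sums."""
    w = np.exp(-x**2) / (1.0 + (z - x)**2)
    return (x * w).sum() / w.sum()

for z in [0.5, 1, 2, 5, 10, 20, 50]:
    print(f"z = {z:5}  E[X|Z=z] = {cond_mean(z):.4f}")
```

Running this shows the pattern Pudaite describes: the conditional mean first rises with z, then turns around and shrinks back toward zero as z moves into the Cauchy tail.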

I wasn’t sure it was worth the effort to try to publish a two paragraph paper.

I suspect that this is true whenever the tail of one distribution is ‘sufficiently heavy’ with respect to the tail of the other. Hmm, I suppose there might be enough substance in a paper that attempted to characterize this outcome for, say, unimodal symmetric distributions.

Maybe someone can do this? I think it’s an important problem. Perhaps some relevance to the Jeffreys-Lindley paradox also.

## 8 thoughts on “The estimated effect size is implausibly large. Under what models is this a piece of evidence that the true effect is small?”

1. It seems like it suffices for (1) the heavy-tailed distribution to be
unimodal and asymptotically decay slower than exponentially and (2) the
other distribution to be symmetric and decay at least exponentially.

Suppose Z = X + Y, where Y is the heavy tailed distribution. Then

p(X=t | Z = z) is proportional to p(X = t)p(Y = z – t)

Since Y is much heavier tailed, for large z we can neglect the
possibility that t > z/2. Then, since Y decays slower than
exponentially asymptotically, we have

p(Y = z - t) / p(Y = z) < e^{eps * t} (for t > 0)
p(Y = z - t) / p(Y = z) > e^{eps * t} (for t < 0)

for sufficiently large z and any eps > 0, causing
p(X | Z = z) to tend toward p(X). Let
f(t) = p(Y = z - t) / p(Y = z)
so
p(X = t | Z = z) = C p(X = t) f(t)

for some normalization factor C. We have C <= 2 since f(t) >= 1 for
t > 0 by unimodality and P(X > 0) = 1/2 by symmetry. Hence

E[X | Z = z] = int(t C p(X = t) f(t))
< int(t 2 p(X = t) e^{eps * t})

because f(t) is less skewed to the right than e^{eps * t}. Since
E[X] = 0, we can subtract 2E[X] and get

E[X | Z = z] < 2 int(t p(X = t) (e^{eps * t} – 1))
< 2 int(|t p(X = t) (eps * t * e^{eps * |t|})|)
< 2 eps int( t^2 p(X=t) e^{eps * |t|})
= eps * O(1)
since p(X) decays exponentially and eps is arbitrarily small.

I think it is necessary that Y decay slower than exponentially.
Otherwise, the bias in X will tend to (at least) a constant rather
than zero as Z becomes large. The other conditions seem less
necessary: if Y decays really slowly, X can decay slower than an
exponential but still much faster than Y. The requirements that X be
symmetric and Y be unimodal seem like stupid technicalities, but I'm
not sure how else to deal with the normalization factor.
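The closing claim above — that when Y decays exactly exponentially the bias tends to a constant rather than to zero — can also be checked numerically. A rough sketch, reusing p_X ∝ exp(-x^2) and comparing Cauchy (slower than exponential) against Laplace (exactly exponential) tails for Y; the grid choices are arbitrary:

```python
import numpy as np

x = np.linspace(-10, 10, 20001)

def cond_mean(z, log_py):
    # unnormalized posterior p(x|z) ∝ p_X(x) * p_Y(z - x), with p_X ∝ exp(-x^2)
    w = np.exp(-x**2 + log_py(z - x))
    return (x * w).sum() / w.sum()

cauchy  = lambda y: -np.log1p(y**2)   # decays slower than any exponential
laplace = lambda y: -np.abs(y)        # exactly exponential tails

for z in [5, 20, 80]:
    print(z, cond_mean(z, cauchy), cond_mean(z, laplace))
```

With Laplace Y, once z is far into the tail the posterior is proportional to exp(-x^2 + x), whose mean is 1/2, so the bias plateaus at a constant; with Cauchy Y it keeps shrinking toward zero, consistent with the argument above.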

2. Yeah, nice spam you've attracted.

I'm curious: did you take down that post about pop-micro-economists (which showed up in my RSS reader) or did it get lost in some website-redesign?

3. The pop-micro-economist stuff showed up in my RSS reader as well. I'm as curious as Frank…

4. It is very nice. In this paper http://arxiv.org/abs/1107.1811 we find a new family of heavy-tailed priors (even heavier-tailed than the Student-t priors), given as the convolution of a Student-t density for the location parameter and an Inverted Beta prior for the squared scale. I think these priors could be useful for the distribution of Y rather than 1/(y^2+1).

5. I'm not a statistician, but as a psychologist, the idea that the posterior could go to 0 as the observed estimate increases seems perfectly sensible to me. In a sense this seems like a second-order question where one is asking not so much "how big is the 'true' effect" but "what is the true source of the effect". Larry Bartels's point in the earlier post seems to be that if you assume that systematic error and true effect influence the observed effect additively, any increase in the observed effect should imply at least some increase in the posterior estimate of the causal effect. But this assumes that the causal effect and systematic error are independent. I think there are good reasons to posit that they're inversely related. The simplest one is that researchers stop looking for effects once they find them. If the causal effect was a reasonable size (but not implausibly large!), there would be no need for the authors to keep fishing–it would pop out right away. Conversely, if the causal effect is very small or nonexistent, the opportunity for systematic error to creep in increases in proportion to the amount of time spent looking for an effect. And of course, there's no check on the size of a systematic error. So I think if you just posit that you have two negatively correlated distributions, and the observed effect is some combination of the two, demonstrating that systematic error is contributing heavily to the observed effect does indeed seem like evidence that the true effect might be small…

6. Interesting example indeed!!! I am not certain this relates to the Jeffreys-Lindley paradox, though… The fact that E[X|Z] shrinks to zero when Z increases is a reflection on the prior on X with quickly decreasing tails. Jeffreys-Lindley on the opposite sucks prior mass around zero and thus eventually removes any plausible value for X. This rather reminds me of Example 4.1.2 in The Bayesian Choice where a Laplace prior is used instead of the normal prior and where the MAP (maximum a posteriori) of X is always zero.

7. I am not sure if this is exactly what you had in mind, but say you are testing something where you expect a small positive effect, and your prior mixes a model that permits a small effect with a set of models that permit a large effect caused by something spurious or irrelevant.  Then a large observed effect will essentially zero out the probability of the model with the small effect and raise the probabilities of the models that produce large effects for spurious reasons.  Hence:  larger signal, smaller inferred effect.  I am sure a realistic example could be produced in no time.  But maybe this doesn't fall into the category you are imagining.

More concrete:  Someone says he can identify the face value of a face-down playing card by hefting it in his hand.  "Different amounts of ink lead to different weights!" he says.  You are suspicious.  He then correctly identifies 20 face-down cards in a row.  Conclusion:  There is no ink-weight effect; the guy is just a charlatan who has marked the cards or didn't shuffle properly.  If he had correctly identified just a few of the 20, you would be more likely to conclude there is an effect.  Apologies to Jaynes, who (I imagine from reading him) loved examples like this.
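The card example can be made quantitative with a toy three-hypothesis mixture. All the numbers below (per-card success probabilities and prior weights) are invented purely for illustration:

```python
from math import comb

def posterior(k, n=20):
    """Posterior over three hypotheses after k correct calls out of n cards.
    Hypothetical numbers: pure chance p = 1/13 (13 face values), a small
    real ink-weight effect p = 0.30, a charlatan with a marked deck
    p = 0.95; prior weights 0.90 / 0.09 / 0.01."""
    hyps = {"chance": (1 / 13, 0.90),
            "effect": (0.30, 0.09),
            "charlatan": (0.95, 0.01)}
    post = {h: prior * comb(n, k) * p**k * (1 - p)**(n - k)
            for h, (p, prior) in hyps.items()}
    z = sum(post.values())
    return {h: v / z for h, v in post.items()}

print(posterior(6))   # a few correct: the small-effect model dominates
print(posterior(20))  # a perfect run: the charlatan model dominates
```

Exactly as the comment describes: a perfect run makes the spurious-mechanism model win and drives the posterior probability of the small real effect essentially to zero, while a modest success count is best explained by the small effect.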