On the other hand, ever since studying German in high school I’m prone to Capitalizing All The Things

How important are 4 and 5 outside of a specific analytical framework, whether Bayesian or NHST?

Reminds me of the guy in Airport who filed away all the Capital letters on his typewriter because he thought they represented authority or some such. :)

Apostrophe’s are for sissie’s and blaggard’s.

> Andrew needs a constancy condition

That is what additivity means: treatment effects are constant across different groups.

It is interesting how muddled thinking seems to be on this issue.

Replacing “additivity” with “commonness” and “treatment effects” with “parameters” may make it clearer.

The reality the wiki graph depicts is two groups with different intercepts but identical slopes.

To represent that reality adequately, the statistical model needs common parameters for what is common in reality and different parameters for what is different; get any of those wrong and you have an additivity failure.

The wiki graph is a nice illustration of this.

This is how I would have extracted and coded the data:

y = c(6, 7, 8, 9, 1, 2, 3, 4)     # outcome
x = c(2, 3, 4, 5, 8, 9, 10, 11)   # predictor
g = c(0, 0, 0, 0, 1, 1, 1, 1)     # group indicator

1. A model with a common intercept and a common slope is wrong.

Coefficients:
(Intercept)            x
     8.9634      -0.6098

2. A model with different intercepts for groups and a common slope is adequate.

Coefficients:
factor(g)0   factor(g)1            x
         4           -7            1

3. A model with different intercepts and slopes for groups is wrong.

Coefficients:
factor(g)0    factor(g)1            x   factor(g)1:x
 4.000e+00    -7.000e+00    1.000e+00      1.652e-16

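Not much of a penalty here, but it’s not a real example (factor(g)1:x 1.652e-16, se = 5.509e-17, t = 2.999e+00, p = 0.04 *): these data fit exactly, so that tiny interaction estimate and its “significant” p-value are just floating-point rounding noise.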
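The three fits can be reproduced without floating-point noise. The output above looks like R's lm(); the coefficient labels suggest calls along the lines of lm(y ~ x), lm(y ~ 0 + factor(g) + x) and lm(y ~ 0 + factor(g) * x), though the comment does not show them. The sketch below is in Python rather than R and solves the normal equations exactly with Fraction, so model 3's interaction coefficient comes out as exactly 0. The helper name ols is hypothetical, and it assumes a full-rank design.

```python
from fractions import Fraction

# Data as coded in the comment: outcome y, predictor x, group indicator g.
y = [6, 7, 8, 9, 1, 2, 3, 4]
x = [2, 3, 4, 5, 8, 9, 10, 11]
g = [0, 0, 0, 0, 1, 1, 1, 1]


def ols(columns, response):
    """Hypothetical helper: least-squares coefficients from the normal
    equations (X'X) b = X'y, solved exactly by Gauss-Jordan elimination
    on Fractions. Assumes the design matrix has full column rank."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [Fraction(sum(u * v for u, v in zip(columns[i], columns[j]))) for j in range(k)]
        + [Fraction(sum(u * r for u, r in zip(columns[i], response)))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap a row with a nonzero entry into the pivot position.
        pivot = next(r for r in range(col, k) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        # Normalize the pivot row, then clear this column from all other rows.
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    # The last column now holds the solution b.
    return [row[k] for row in a]


ones = [1] * len(y)
g0 = [1 - gi for gi in g]                  # indicator for group 0
g1 = list(g)                               # indicator for group 1
g1x = [gi * xi for gi, xi in zip(g, x)]    # extra slope for group 1 (interaction)

m1 = ols([ones, x], y)            # 1. common intercept, common slope
m2 = ols([g0, g1, x], y)          # 2. group intercepts, common slope
m3 = ols([g0, g1, x, g1x], y)     # 3. group intercepts, group slopes

print([float(b) for b in m1])     # roughly 8.9634 and -0.6098
print([float(b) for b in m2])     # [4.0, -7.0, 1.0]
print([float(b) for b in m3])     # [4.0, -7.0, 1.0, 0.0]
```

Exact arithmetic is the design choice here: with a perfect fit, any nonzero interaction estimate from a floating-point solver is rounding error, and the exact solution makes that visible.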

In philosophy-speak: “Awareness of commonness can lead to an increase in evidence regarding the target (2); disregarding commonness wastes evidence (3); and mistaken acceptance of commonness destroys otherwise available evidence (1).”

These considerations of what to take as common and what as different are everywhere in applied statistics. Years ago I tried to get this across here (http://statmodeling.stat.columbia.edu/2011/05/14/missed_friday_t/), but my mistake was likely getting bogged down in explaining likelihood mechanics and losing most readers.

Simple examples like this wiki one are probably a much better way to present the challenges and opportunities of commonness.

Tractable Bayesian variable selection: beyond normality

http://arxiv.org/abs/1609.01708

@Daniel

Hey, you’ve really got an apostrophe problem lately — first, you left out the one in “urn’t”, now you’ve got “you’re” instead of “your”. ;~)

The example in Wikipedia (for those too lazy to click on the link above) can be “correctly” modeled as

y = a + x1b1 + x2b2 + u

where the data is of the form

y   x1   x2    a
6    2    0    4
7    3    0    4
8    4    0    4
9    5    0    4
1    0    8   -7
2    0    9   -7
3    0   10   -7
4    0   11   -7

where b1 = b2 = 1 for all cases (the error term u is identically zero). Modeling

y = a + xb + u

gives the following:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.9634     1.7746   5.051  0.00233 **
x            -0.6098     0.2449  -2.490  0.04718 *
---
Multiple R-squared: 0.5081, Adjusted R-squared: 0.4262
F-statistic: 6.198 on 1 and 6 DF, p-value: 0.04718

That is, the slope of the regression is negative when the two groups are collapsed together, but when they are modeled separately each has the same positive coefficient, 1. One can eyeball the graph in Wikipedia (I don’t know how to attach it to this comment; is there any way to do that, or do you want to suppress that ability because it could be used for massive spamming?) and the intuitive answer is the model y = a + x1b1 + x2b2, but given the low n it’s unlikely any test of misspecification will pick it up. Note that number 2 in the list applies, since

xb = x1b1 + x2b2

and so the model y = a + xb + u is incorrectly specified as b = (x1b1 + x2b2)/x, which means b is non-constant. So somewhere Andrew needs a constancy condition on b, which I don’t see in the list (maybe it is implied in 2). But Bayesians can let b vary: are there any examples of Bayesian regression where b is non-constant (and correlated with x, for that matter)? If so, what are the convergence properties of such a model? (Poor, I would guess.) And if there is such a model, it should work on the Wikipedia case, as that is the simplest case.

Also, Simpson’s Paradox can occur when Linearity is in fact a great model; it’s just that you don’t have the right variables.

This is my read on it too. When you’re model is inappropriate and you move to an appropriate model, it can massively change your interpretation of the situation, and that’s more or less the essence of Simpson’s Paradox.

To me, #1 is more about “measurement” in the broadest sense: does the thing I’m measuring actually map to the relationship in the world I claim to be investigating?

#2, though, is about the statistical model: does my regression model (not my substantive theoretical model) reflect the world well enough that I can properly interpret the results, or does it accidentally average over (or away) something important because it is misspecified?

So I think of Simpson’s Paradox in relation to #2. Whether the model is implicit or explicit, it is misspecified such that the effect in Group A is assumed to equal the effect in Group B (a kind of linear additivity). That isn’t about measurement; it is about modeling.

Actually, I see Simpson’s Paradox as fitting under #1. Simpson’s Paradox is about missing (or, less commonly, inappropriately included) variables in the model. The data don’t map to the research question. You can still have a Simpson’s paradox with analyses that don’t rely on (or even use) additivity or linearity.
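A Simpson’s paradox needs no regression at all; plain proportions are enough. A minimal sketch in Python; the counts are hypothetical, made up purely for illustration, and the names (counts, rate, pooled_rate) are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration: (successes, trials)
# for two treatments within two strata.
counts = {
    "A": {"stratum 1": (8, 10), "stratum 2": (30, 100)},
    "B": {"stratum 1": (70, 100), "stratum 2": (2, 10)},
}


def rate(successes, trials):
    """Exact success proportion."""
    return Fraction(successes, trials)


def pooled_rate(treatment):
    """Success proportion after collapsing over strata."""
    cells = counts[treatment].values()
    successes = sum(s for s, _ in cells)
    trials = sum(t for _, t in cells)
    return Fraction(successes, trials)


# Within each stratum, A has the higher success rate.
for stratum in ("stratum 1", "stratum 2"):
    a = rate(*counts["A"][stratum])
    b = rate(*counts["B"][stratum])
    print(f"{stratum}: A={float(a):.2f} B={float(b):.2f} A better: {a > b}")

# Collapsed over strata, the comparison reverses.
pa, pb = pooled_rate("A"), pooled_rate("B")
print(f"pooled: A={float(pa):.2f} B={float(pb):.2f} A better: {pa > pb}")
```

The reversal comes entirely from how the trials are allocated: most of A's trials fall in the low-success stratum 2 and most of B's in stratum 1, so the stratum variable is exactly the kind of missing variable the comment describes.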

#2, right?

Both… it’s inconveniently hard.

This symposium appears to be in the wheelhouse of this blog.
