How important are 4 and 5 outside of a specific analytical framework, whether Bayesian or NHST?

That is what additivity means: treatment effects are constant across different groups.

It is interesting how muddled thinking seems to be on this issue.

Replacing additivity with commonness, and treatment effects with parameters, may make this clearer.

The reality the wiki graph depicts is two groups with different intercepts but identical slopes.

To represent that reality adequately, the statistical model needs common parameters for what is common in reality and different parameters for what is different; get any of those wrong and you have an additivity failure.

The wiki graph is a nice illustration of this.

This is how I would extract and code the data:

y=c(6,7,8,9,1,2,3,4)
x=c(2,3,4,5,8,9,10,11)
g=c(0,0,0,0,1,1,1,1)
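As a quick check on that coding (a sketch; the name `offsets` is mine, not the commenter's): if the two groups differ only in intercept and share a slope of 1, then y - x should be a single constant within each group.

```r
# Data as coded above: two groups, same slope, different intercepts
y <- c(6, 7, 8, 9, 1, 2, 3, 4)
x <- c(2, 3, 4, 5, 8, 9, 10, 11)
g <- c(0, 0, 0, 0, 1, 1, 1, 1)

# With a common slope of 1, y - x is a constant offset within each group,
# and that offset is the group's intercept
offsets <- tapply(y - x, g, unique)
print(offsets)
```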

1. A model with common intercept and common slope is wrong.

Coefficients:
(Intercept)        x
     8.9634  -0.6098

2. A model with different intercepts for groups and a common slope is adequate.

Coefficients:
factor(g)0  factor(g)1   x
         4          -7   1

3. A model with different intercepts and slopes for groups is wrong.

Coefficients:
factor(g)0  factor(g)1          x  factor(g)1:x
 4.000e+00  -7.000e+00  1.000e+00     1.652e-16

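Not much of a penalty here, but it's not a real example: the data lie exactly on two parallel lines, so the interaction estimate (factor(g)1:x = 1.652e-16, se = 5.509e-17, t = 2.999, p = 0.04 *) is floating-point noise and its "significance" means nothing.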
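For anyone wanting to reproduce the three fits, here is a minimal R sketch. The original comment does not show the lm() calls, so the formulas below are my guess at what produces coefficient names like factor(g)0 and factor(g)1:x; the `- 1` drops the global intercept so that each group gets its own intercept coefficient. The model names m1, m2, m3 are illustrative.

```r
y <- c(6, 7, 8, 9, 1, 2, 3, 4)
x <- c(2, 3, 4, 5, 8, 9, 10, 11)
g <- c(0, 0, 0, 0, 1, 1, 1, 1)

# 1. Common intercept and common slope: groups ignored
m1 <- lm(y ~ x)
# 2. Group-specific intercepts, common slope
m2 <- lm(y ~ factor(g) + x - 1)
# 3. Group-specific intercepts and slopes (interaction allows slopes to differ)
m3 <- lm(y ~ factor(g) + x + factor(g):x - 1)

print(round(coef(m1), 4))
print(round(coef(m2), 4))
print(round(coef(m3), 4))
```

Because the points lie exactly on two lines, models 2 and 3 fit perfectly, and the interaction coefficient in model 3 comes out at rounding-error size.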

In philosophy-speak: “Awareness of commonness can lead to an increase in evidence regarding the target (2); disregarding commonness wastes evidence (3); and mistaken acceptance of commonness destroys otherwise available evidence (1).”

These considerations of what to take as common and what as different are everywhere in applied statistics. Years ago I tried to get this across here (http://statmodeling.stat.columbia.edu/2011/05/14/missed_friday_t/), but my mistake was likely getting bogged down in explaining likelihood mechanics and losing most readers.

Simple examples like this wiki one are probably a much better way to present the challenges and opportunities of commonness.

y = a + x1b1 + x2b2 + u

where the data is of the form

y   x1   x2    a
6    2    0    4
7    3    0    4
8    4    0    4
9    5    0    4
1    0    8   -7
2    0    9   -7
3    0   10   -7
4    0   11   -7

where b1 = b2 = 1 for all cases (the error term u is identically zero). Modeling

y = a + xb + u

gives the following:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.9634     1.7746   5.051  0.00233 **
x            -0.6098     0.2449  -2.490  0.04718 *

Multiple R-squared: 0.5081, Adjusted R-squared: 0.4262
F-statistic: 6.198 on 1 and 6 DF, p-value: 0.04718
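The pooled estimates can be checked by hand with the usual least-squares formulas, slope = Sxy/Sxx and intercept = mean(y) - slope * mean(x), with R-squared = Sxy^2/(Sxx * Syy). A small R sketch; the variable names are illustrative.

```r
y <- c(6, 7, 8, 9, 1, 2, 3, 4)
x <- c(2, 3, 4, 5, 8, 9, 10, 11)

# Centered sums of squares and cross-products (group labels ignored)
sxx <- sum((x - mean(x))^2)                 # 82
sxy <- sum((x - mean(x)) * (y - mean(y)))   # -50
syy <- sum((y - mean(y))^2)                 # 60

slope     <- sxy / sxx                      # -50/82, about -0.6098
intercept <- mean(y) - slope * mean(x)      # 5 + 0.6098 * 6.5, about 8.9634
r2        <- sxy^2 / (sxx * syy)            # 2500/4920, about 0.5081

print(c(slope = slope, intercept = intercept, r2 = r2))
```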

That is, the slope of the regression is negative when the two groups are collapsed together, but when they are modeled separately each has the same positive coefficient of 1. One can eyeball the graph on Wikipedia (I don't know how to attach it to this comment; is there any way to do that, or is it disabled to prevent mass spamming?) and the intuitive answer is the model y = a + x1b1 + x2b2, but given the low n it's unlikely that any test of misspecification will pick it up. Note that number 2 in the list applies, since

xb = x1b1 + x2b2

and so the model y = a + xb + u is incorrectly specified as b = (x1b1 + x2b2)/x, which means b is non-constant. So somewhere Andrew needs a constancy condition on b, which I don't see in the list (maybe it is implied in 2). But Bayesians can let b vary: are there any examples of Bayesian regression where b is non-constant (and correlated with x, for that matter)? If so, what are the convergence properties of such a model (poor, I would guess)? And if such a model exists, it should work on the Wikipedia case, since that is the simplest one.
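The "intuitive" model y = a + x1b1 + x2b2 can be fitted directly to the table above. A minimal R sketch, assuming a is treated as a group label (one intercept per group) rather than a known offset; `fit_sep` is a hypothetical name. It first confirms that the table satisfies y = a + x1 + x2 exactly, then estimates b1 and b2 freely.

```r
# Data from the table above
y  <- c(6, 7, 8, 9, 1, 2, 3, 4)
x1 <- c(2, 3, 4, 5, 0, 0, 0, 0)
x2 <- c(0, 0, 0, 0, 8, 9, 10, 11)
a  <- c(4, 4, 4, 4, -7, -7, -7, -7)

# The table is an exact identity with b1 = b2 = 1 and u = 0
stopifnot(all(y == a + x1 + x2))

# Group-specific intercepts via factor(a); "- 1" gives one intercept per group
fit_sep <- lm(y ~ factor(a) + x1 + x2 - 1)
print(round(coef(fit_sep), 4))
```

Both slope estimates come out at 1 (factor levels are ordered -7, then 4), so once the intercept is allowed to differ by group, a single common slope describes both groups.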

#2, though, is about the statistical model: does my regression model (as opposed to the substantive theoretical model) reflect the world well enough that I can properly interpret the results, or does it accidentally average over (or away) something important because it is misspecified?

So I think of Simpson’s Paradox in relation to #2. Whether the model is implicit or explicit, it is misspecified in that the effect in Group A is assumed to be equal to the effect in Group B (a kind of linear additivity). That isn’t about measurement; it is about modeling.
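The reversal behind Simpson's Paradox can be seen directly on the wiki data coded earlier in the thread: fit the slope within each group, then fit it with the groups pooled. A short R sketch; the names `within_slopes` and `pooled_slope` are illustrative.

```r
y <- c(6, 7, 8, 9, 1, 2, 3, 4)
x <- c(2, 3, 4, 5, 8, 9, 10, 11)
g <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Slope fitted separately within each group
within_slopes <- sapply(split(data.frame(x, y), g),
                        function(d) unname(coef(lm(y ~ x, data = d))[2]))

# Slope when group membership is ignored (groups averaged together)
pooled_slope <- unname(coef(lm(y ~ x))[2])

print(within_slopes)  # expected: 1 in each group, up to rounding
print(pooled_slope)   # negative: the sign flips once groups are pooled
```

The pooled model implicitly assumes both groups share one intercept, and that misspecification is what reverses the sign of the slope.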

