Can’t remember anything at all — Nick Cave
Girls, it’s been a while. But it’s midnight and I am AMPED after a Springtime concert (when Lauren asked me what genre of music it was, I said “loud”. They played in a very weird room that usually hosts orchestras and clearly had a rusted on group of silver-haired subscribers who didn’t want the intersection of swampy rock, free jazz, and noise. And I now need to see every single band possible in a room that’s half full of people who are SIMPLY NOT INTO IT. Glorious. It’s the vibe of the thing.)
Of course I’ve been gone, but fish gotta swim, birds gotta swim if sufficient external motivation is applied, and similarly I’ve just gotta blog. So I’ve been doing it in secret (not actually secret, I’ve been telling people). Not because I don’t love you but because sometimes a man needs to write eight thousand words on technical definitions of Gaussian processes, describe conjugate priors as “the crystal deodorant of Bayesian statistics”, and just generally make people say “look, I held on for as long as I could but that was too much”. I’ve also been re-posting a couple of my old posts from here that I like because the WordPress ate my equations (they’re all lightly edited because I’m anal, and this one probably got much better). Also, using distill to make blogs means I can go footnote mad and you know how much I love footnotes.
Anyway. What am I gonna talk about? I feel there’s some pressure here because one of my favourite of my posts (and, also, one of my most “is he ok? no” posts) was written in a similar post-concert coming down state.
But I’m not gonna be that fancy or that long winded tonight.
Instead, I just want to share some stolen thoughts on a single phenomenon that is interesting.
When you add a “random effect” to your regression-type model, the regression coefficients will change. Sometimes a lot.
Now I know that I risk the full wrath of the cancel culture (to use the youth’s term) by using the term random effect on this blog. Andrew has written at length about why he doesn’t like it (you can google). But it’s a useful umbrella term in this context: I mean iid effects like random slopes and intercepts, smoothing splines in (Bayesian) GAM models, Gaussian processes, and the whole undead army of other “extra randomness” bits that you can put in a regression equation. (The criticism that a term doesn’t mean anything specific or is ambiguous can be effectively blunted by using it to mean everything.)
As with many “change the cheerleader, change the world” statistical issues, this really always ends up being an issue of confounding. But oh what subtle confounding it can be! Jim Hodges has a legitimately wonderful book called Richly Parameterized Linear Models that I heartily recommend that goes into all of these things. Even just by looking at the sample chapters, you will find
- models where the multilevel structure is confounded with a covariate
- models where a more complex dependence structure (like an ICAR for spatial data or a spline for … non-spatial data) is confounded with a covariate.
(The book also covers some truly wild things that happen to the hyperparameters [aka the parameters that control the correlation structure] in these situations!)
The long and the short of it, though, is that the interpretation of regression coefficients will change when you add more complex (multilevel or spatial or temporal or non-linear) structure to your model.
As with all things, this becomes violently true when you start thinking of interpreting your regression coefficients causally.
It’s worth saying that this is potentially a feature and not a bug! Especially when you’ve added the extra structure to your regression model in order to try and separate structural effects (like repeated measurements in individuals or temporal or spatial autocorrelation) from the effect of your covariates!
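To make the coefficient shift concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration rather than anything from Hodges' book: the toy data, the function names, and the shortcut of fitting the random intercept by penalized least squares with a fixed penalty `lam` (which plays the role of the ratio of residual variance to random-intercept variance, instead of estimating that ratio). The covariate is deliberately built to be correlated with an unmeasured group-level effect, which is the multilevel confounding described above.

```python
import numpy as np


def simulate_grouped_data(n_groups=30, n_per_group=20, beta=1.0, seed=0):
    """Toy data where an unmeasured group effect is correlated with x."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per_group)
    a = rng.normal(0.0, 1.0, n_groups)                     # group-level effect
    x = 0.8 * a[group] + rng.normal(0.0, 1.0, group.size)  # covariate tracks the groups
    y = beta * x + a[group] + rng.normal(0.0, 1.0, group.size)
    return x, y, group


def pooled_slope(x, y):
    """Vanilla regression: intercept and slope, no group structure."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def random_intercept_slope(x, y, group, lam=1.0):
    """Slope when group intercepts are added with a ridge penalty.

    Assumption: lam is held fixed, so this is the point estimate of a
    random-intercept model with a known variance ratio, not a full fit.
    """
    n_groups = int(group.max()) + 1
    Z = np.zeros((x.size, n_groups))
    Z[np.arange(x.size), group] = 1.0                      # group indicator columns
    X = np.column_stack([np.ones_like(x), x])
    C = np.hstack([X, Z])
    # Penalize only the group intercepts, not the intercept or the slope.
    penalty = np.diag(np.r_[np.zeros(2), np.full(n_groups, lam)])
    coef = np.linalg.solve(C.T @ C + penalty, C.T @ y)
    return coef[1]


if __name__ == "__main__":
    x, y, group = simulate_grouped_data()
    print("pooled slope:          ", round(pooled_slope(x, y), 3))
    print("random-intercept slope:", round(random_intercept_slope(x, y, group), 3))
```

In this setup the pooled slope absorbs part of the group effect, while the fit with group intercepts lands much closer to the value of `beta` used to simulate the data. Whether that shift counts as a correction or a distortion depends entirely on what you believe the group structure stands for, which is the whole point.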
So how do you deal with it? No earthly idea.
- Hodges and Reich suggest (in some contexts) organizing your extra randomness so it’s orthogonal to your variables of interest (this is easy to do for Gaussians). This assumes that there are no unmeasured covariates moving in the same direction as your covariate of interest, and it keeps the regression coefficient point estimates the same between the vanilla regression and the regression with the extra stuff. (Hanks et al. suggest this can lead to unreasonably narrow posterior uncertainty intervals.)
- Sigrunn, Janine, David, Håvard, and I think you should use priors to explicitly limit complexity of your extra structure, which puts bounds on how much of the change can be coming from unmeasured things correlated with your object of interest. (Jon Wakefield also does this when choosing priors for smoothing splines in his excellent book)
- A whole cornucopia of literature that I don’t have the time or space to link to covers an infinity of aspects of formal causal identification and estimation in a whole variety of situations where we would add these types of structured uncertainties (longitudinal models, time series, spatial models, and variants thereof).
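The orthogonalization idea in the first bullet can be sketched in a few lines. This is not Hodges and Reich's code, just a hedged illustration under stated assumptions: the toy data, the function names, and the fixed ridge penalty `lam` standing in for the variance ratio are all made up for the example. The random-effect design `Z` is projected onto the orthogonal complement of the covariate design `X`, after which the penalized fit returns the same point estimates for the covariates as plain least squares.

```python
import numpy as np


def make_data(n_groups=30, n_per_group=20, seed=1):
    """Toy grouped data with a covariate correlated with the group effect."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per_group)
    a = rng.normal(0.0, 1.0, n_groups)
    x = 0.8 * a[group] + rng.normal(0.0, 1.0, group.size)
    y = 1.0 * x + a[group] + rng.normal(0.0, 1.0, group.size)
    X = np.column_stack([np.ones_like(x), x])  # covariates of interest
    Z = np.zeros((x.size, n_groups))           # design for the "extra randomness"
    Z[np.arange(x.size), group] = 1.0
    return X, Z, y


def ols(X, y):
    """Vanilla least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def penalized_fit(X, Z, y, lam=1.0):
    """Coefficients on X when Z enters with a ridge penalty (lam assumed fixed)."""
    p, q = X.shape[1], Z.shape[1]
    C = np.hstack([X, Z])
    penalty = np.diag(np.r_[np.zeros(p), np.full(q, lam)])
    coef = np.linalg.solve(C.T @ C + penalty, C.T @ y)
    return coef[:p]


def orthogonalize(X, Z):
    """Remove from each column of Z its projection onto the column space of X."""
    Q, _ = np.linalg.qr(X)  # orthonormal basis for the columns of X
    return Z - Q @ (Q.T @ Z)


if __name__ == "__main__":
    X, Z, y = make_data()
    print("OLS:                   ", ols(X, y))
    print("penalized, raw Z:      ", penalized_fit(X, Z, y))
    print("penalized, Z orthog. X:", penalized_fit(X, orthogonalize(X, Z), y))
```

Because `X.T @ orthogonalize(X, Z)` is zero, the normal equations for the covariates decouple from the random effect, which is why the point estimates match the vanilla regression. The sketch says nothing about interval widths, so it does not address the concern attributed to Hanks et al.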
All of which is to say: don’t take your regression coefficients for granted when there are fancy things in your model. But also don’t avoid the fancy things because if you do that your regression coefficients won’t make any sense.
Did you think I’d leave you on a happy note?
If you want to interpret the model parameters you need to derive the model from some set of assumptions about the process that generated the data.
The standard statistical model that uses whatever data is available does not yield interpretable parameters. It can, however, be used to make predictions.
Yes, this means there is a vast literature of misinterpreted numbers out there.
People violently misinterpreting regression models sums up a lot of what I have been thinking about lately. Thanks for the post!
Also noticed a lot of people passively misinterpreting regression models
Hi, Dan. Always good to see you posting here! Perhaps relevant to the above discussion is an unpublished paper from 2006 with Joe Bafumi, Fitting multilevel models when predictors and group effects correlate.
For what it’s worth, the style of your post is as if you were trying to literarily embody the idea of adding randomness to a structure, and how challenging the randomness can make getting to the meaningful part.
Yeah, that felt like an unnecessary barrier. My lunch break timed out before I was really able to get my head onto whatever he was trying to say.
I said, “Well, Daddy, don’t you know that things go in cycles?
Way that Bobby Brown is just amping like Michael”
https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/#comment-601081
Point taken. But my dissatisfaction wasn’t with Dan’s writing style, it was with the abrupt shift in tone and content. It’s a modeling issue (kinda): I’d inferred after the first paragraph that the post would go on like that and stopped reading (as Dan said at the link, it’s so, so easy to not read something). Fortunately, I then looked at the comments, reversed my inference, and finished the post. Hence the irony that the post ended up being about randomness diverting one from a correct inference, and the post modeling the very phenomenon!
There’s no irony here. It’s just not for you. (Politely signalled in the first sentence.) Not every piece of free content on the internet needs to be tailor made to fit your peculiarities. There is plenty more internet. Go enjoy it!
> we write in order to figure things out for ourselves.
I certainly overdid this in my initial posts way back e.g. https://statmodeling.stat.columbia.edu/2010/12/17/thetyranyof13pd/ (I was struggling to see why others seemed to avoid plotting in the model parameter space rather than just the data) and my OK I’ll start redoing blogs post differently – https://statmodeling.stat.columbia.edu/2016/12/12/avoiding-shadow-knowing-motivating-problem-post/
For every personality/persona the least wrong choice is likely different.
Thanks for reminding me of Hodges’ work – the 2014 work seemed old but when I checked my email, I last looked at his work in 2013.
Dan:
I think one thing that readers don’t always realize is that the purpose of our writing is not just to explain known things to others; rather, often we write in order to figure things out for ourselves. Indeed, that could describe most academic writing (including blogging). The difficulty is that academic writing is often presented in an Olympian way, with the implication being that the author is The Expert and the purpose of writing is to convey The Truth. Some writers, like you and me (sometimes) have a more questing style in which we’re more open about our own process of exploration. I don’t always do such a good job of this myself, but I think you do, and that makes your writing especially interesting to me (also I enjoy the cultural references!). But to people used to the usual style of academic writing, your questing style can be disconcerting.
Just to be clear: I’m not saying there’s anything wrong with a bare-bones, just-the-facts, here’s-the-outline style of writing. A clear and direct style without detours can be really helpful in many, perhaps most, situations. I also think there’s a place for a meandering style, which I think we should value in part because it reflects the process of the author working things out.
To be honest, Andrew, I just don’t think people understand that it’s amusing to get complained at once but annoying to get complained at by the same person a second time. The third time I send over my consulting rates :p
Dan:
Yeah, that’s another issue. In theory, I appreciate even the most negative comments as (a) sometimes I’m missing something or I’m making a big mistake, (b) even if I’m not making a mistake, I could be failing to communicate, and (c) I’m guessing that people who like our stuff are more likely to comment here than people who don’t, so it’s nice that the people who see problems with our writing are going to the trouble of telling us!
That said, sometimes I get annoyed by commenters, especially when they tell me to “stick to statistics” or whatever, and I’ll reply that the blog is free, they should just skip the stuff they don’t like. The good news is that I think it’s rare for us to get flat-out trolls, and when that does happen the trolls will typically get bored once they see people responding to them with sincere arguments. I think a big issue is that it’s hard to convey intonation in typed speech, so it can be hard sometimes for us to distinguish between comments that are helpful and comments that are provocations. Sometimes what seems to us like complaining can just be the commenters working things out; other times maybe not.
In any case, I appreciate all your posts, both for their technical content and for their entertainment value. Actually, these are related!
Hi Dan. Great post, and glad to see Hodges’ work getting more love. I really like his and Brian Reich’s paper “Adding spatially-correlated errors can mess up the fixed effect you love” as a concise summary of the issue of confounding of unpenalized covariates with spatial random effects; they clearly demonstrate there that the issue is particularly severe when covariates vary at low frequencies, as these are the parts of the spatial random effect that are most weakly penalized. It also clearly generalizes to the broader world of random effects.
I’m also curious if you have any thoughts on Emiko Dupont, Simon Wood, and Nicole Augustin’s paper “Spatial+: a novel approach to spatial confounding”. The basic procedure there is to first regress your covariates on space with an appropriate spatial random effect, then include the deviance residuals from those regressions in the model for your outcome. I think it’s pretty closely related to the idea of phylogenetically independent contrasts from phylogenetic regression.
Yes!! I _really_ like that paper and if I hadn’t been writing this at midnight I would’ve remembered to include it in the list!
Isn’t this problem in even the simplest of regression structures with multicollinearity? Like, in the classic example of OLS with two highly collinear covariates, the sampling variance of the individual coefficients is really high, and excluding either is not necessarily valid.
Though I guess in that case it’s not as much of a problem with full bayes (ignoring computation) since the identification problem is plainly visible in the posterior, while you can’t easily get an interpretable sample across categorically different model configurations. Maybe you can infer causality in these settings using hierarchical stacking across model configurations to get predictive distributions under counterfactual covariates?
Neat stuff to think about!
It makes me wonder whether this is an instance of a general problem with descriptive modeling, given that the solutions you point to also rely on a general principle. That principle is “align your model parameters with their intended meaning”. By “descriptive modeling”, I refer to models like regression models which are ultimately all about carving up variance. If we want to turn them from descriptive into causal models, the joints along which the model cuts have to be aligned with (hypothesized) causal mechanisms. The joints of a model are its parameters.
I say this is a general problem with descriptive models because it comes up even in very simple cases. You do a regression that has a multiplicative interaction term—what does that mean in context? Is it sensible? One of your predictors is categorical, so you set up a contrast matrix—that determines how the resulting parameters need to be interpreted, but do they mean what you want them to mean?
My impression is that problems like these come up because people tend to throw a regression at the data using the defaults of whatever software they are using (or using some generic textbook approach). They hope that the causal structure of the data—the joints along which to carve—will be obvious from the resulting parameter estimates, but this is rarely the case. You can alleviate this problem post mortem by trying to reverse-engineer the model you applied. Or you can flip it around, which is what your suggestions amount to, by building in the types of structure you expect into the model itself. This turns the model into a *hypothesis* about causal structure rather than an often opaque descriptive tool.
Dan, I for one like your posting style.
Off-topic, check out Camp Cope’s recent cover of Sam Fender’s “17 going under”.