I’ve also seen the term “hierarchical regression” used to refer to a sort of stepwise model where coefficients are grouped into “blocks” and added to the model one at a time in the assumed order of causal priority. This of course has nothing to do with nested data or random effects or anything, but it’s just another way these terms get thrown around.

]]>Yeah the default is inverse wishart. Can certainly do scaled f but takes some work. I’m just too lazy I want something EZ

]]>You’re probably right. I do the former which would estimate one ranef per surgeon even if they operate at multiple hospitals.

I use these models for profiling. If surgeons were to operate on avg patient what would be mortality prob across all surgeons

]]>And … MCMCglmm can do “parameter-expanded” priors, which corresponds to half-t priors for standard deviation parameters (i.e. for scalar random effects terms) and something analogous (???) for vector-valued/multivariate random effects terms … see section 8.0.2 of the “CourseNotes” vignette …

]]>Making no claims for superiority of Stan vs MCMCglmm, just here to point out that MCMCglmm can also do whichever of those things you want. Don’t quote me on it, but I think random=~surgeon+hospital should give you a single value per surgeon, whereas random=~hospital+hospital:surgeon would give you “surgeons nested within hospitals”, i.e. a different value for the surgeon in each hospital.

]]>OK. Sounds like yet another case of vague and ambiguous naming in the stats literature. The Wikipedia entry for the Empirical Bayes Method cites Chris Bishop’s book for the definition (their [1], my emphasis):

Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out.

Empirical Bayes, also known as maximum marginal likelihood,[1] represents one approach for setting hyperparameters.

I couldn’t follow the Casella (1985) paper or even the Efron and Morris (1975) paper as they never isolate the technique per se, but instead introduce it implicitly along with their particular application. I will never understand why statisticians can’t just define something clearly. They keep saying “empirical Bayes”, but never stop to define it other than by example.

]]>Thanks—I can’t even back out where I saw that term when I was looking for synonyms.

]]>I took the easy way out and defined it circularly.

Technically, I think of a hierarchical model as one involving a hierarchical prior. A hierarchical prior is one we fit jointly with the item-level parameters. For example, in the simplest regression case,

alpha[1], …, alpha[N] ~ normal(mu, sigma)

mu ~ normal(0, 2)

sigma ~ normal(0, 2)

defines a joint density p(alpha, mu, sigma) for “item”-level parameters (alpha[1], …, alpha[N]) and “population”-level parameters (mu, sigma). I use scare quotes here, because these levels can keep nesting (students in classrooms in schools in districts in states in countries in …) so there’s no natural item-level or population level.

]]>Bob:

You write: “Empirical Bayes is just penalized max marginal likelihood.” Sometimes, but not always; see discussion here.

]]>“Multilevel Models are like 2 or 3 hard concepts and lots of different names and notations for everything.”

We had a lot of fun with writing things several different ways as an exercise in translating between disciplines.

]]>Empirical Bayes is just penalized max marginal likelihood. So you have to specify which variables to marginalize, but otherwise it’s just (the first step of) an estimation technique.

]]>Adan: Also this may be of interest https://statmodeling.stat.columbia.edu/2019/09/18/all-the-names-for-hierarchical-and-multilevel-modeling/#comment-1122518

]]>Terry: There are also philosophical views (the meta-theory of “hierarchical models”) – http://www.stat.columbia.edu/~gelman/research/unpublished/Amalgamating6.pdf

]]>Terry:

Lots of treatments of hierarchical models, including my book with Jennifer, have oversold the idea by treating the methods as automatic and emphasizing that the group-level variance parameters can be estimated from data alone.

Our focus on this point was understandable as a reaction to earlier Bayesian work that considered the group-level variance parameters as part of the “prior” and thus required to be pre-specified in the analysis. I think we were making a valuable point to emphasize that these hyperparameters can, indeed, be estimated from the data.

But when the number of groups is not large, you can’t estimate the group-level variance parameter accurately from data alone, and it can help, for both computational and inferential reasons, to guide this inference using prior information. So now we’re moving toward a general recommendation to use informative priors when fitting multilevel models. That’s what we do in rstanarm, for example.

How did I realize the problem with purely data-based estimates of group-level variance parameters? I realized after seeing Jeff and Justin fit the same hierarchical regression model to a series of datasets (as part of an MRP project). We noticed the estimated group-level variance parameters jumping around from dataset to dataset, resulting in some substantively incoherent inferences.

Consider a simple case of a state-level variance parameter. When it’s estimated to be near 0, the state intercepts are partially pooled toward the national model, which might predict the state-level intercept based on Republican vote share in the previous election. When the state-level variance is estimated to be a high value, the state intercepts are not pooled, so these estimates will vary much more by state. Having vastly different amounts of pooling from dataset to dataset can wreak havoc on inferences over time. Ideally one would embed all this in a time series model, but in practice we’re often secret-weaponing it, fitting the model separately to each dataset, and we’ll want to regularize this estimate with an informative prior on the group-level variance.

The point is that if you fit a model, or a series of models, on just one dataset, everything can seem to make sense. But when you fit the model repeatedly to different data, the noisiness of the estimate can be more apparent.

Our perspective, and our practice, has changed. We oversimplified and oversold. And now I think we’re doing better.

P.S. OK, I guess this should be its blog post. It should appear in March or so.

]]>Well, I kinda understand what it mean (the term “Hierarchical model” is the most self-explanatory term in the list). But I was searching for some more mathematical definition. You know, the model can be called hierarchical if parameter vector X can be split into groups X_1, X_2 and so one, data Y can be split into X_1, X_2 and so one, such that P(Y_1|X_1), P(Y_2|X_2) … are separate models. Something like this.

I was searching for a definite definition of a “Hierarchical model” to refer when I was writing my thesis intro. But I found none. I guess if I would be writing another theoretical text, I would have to quote this comment.

]]>For quite a while, I thought “hierarchical models” were some exotic new technique, and that I had no hope of understanding them. Turns out that I already knew about them, just under a different name.

]]>Keith:

Wow, thank you so much. That’s vbery nice of you, I appreciate it. I will certainly go through the details and willing to learn the workflow.

]]>Adan:

Perhaps even especially for clinical research, you might wish to go heavy on the principled Bayesian workflow.

My into for those who tend to be “lazy” – my talk at https://www.youtube.com/watch?v=I7AVP9BCm1g&feature=youtu.be (also access to code and slides) – or more definitive stuff by Michael Betancourt at https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

And you are only expecting one rejection ;-)

]]>You are correct. I was confused and I was wrong.

Luckily peer review worked by identifying the mistake and allowed us to make the easy word change from patients nested within surgeons nested within hospitals to patients clustered within surgeons clustered within hospitals. And now its published.

https://www.ncbi.nlm.nih.gov/pubmed/31082909

Thank you for pointing this out. Science progresses

]]>Done. That was relatively easy. I still need to do a lot of model diagnostic checking and will probably add more variables but I got the model fitting down. Relatively easy. One all nighter did it.

-Adan

]]>The fact that you used the word nested incorrectly might indicate that you are confused. But it doesn’t permit you to assert that “ there is also confusion over the word “nested””. I am not confused.

]]>Side note:

BYE BYE inverse wishart priors :)

]]>“How did I miss this” I hear people crying… including me. Andrew does a good job of clearing up the issue below: effects vary by group in both settings but no pooling versus pooling, or another way to think about it is no probability model vs. yes probability model

Again “how did I miss this”- I ask myself

https://statmodeling.stat.columbia.edu/2015/03/22/no-fixed-random/

]]>BAHAHAHA it will surely be rejected at least once before it gets accepted. Journals such as JAMA often don’t appreciate rigorous methods or wouldn’t appreciate why we have these conversations. Or maybe my paper really is not all that important… Regardless Ill be happy to send it to you!

]]>Or when it’s rejected, whichever comes first.

]]>Cool. After many years of knowing that Stan is superior to MCMCglmm, I am now going to actually stop being lazy and learn how to do it in Stan.

Will send you paper when it is accepted.

-Adan

]]>Adan:

Stan will do whatever you tell it to. You can fit a model in which parameters vary by surgeon and are preserved across hospitals, or a model where surgeon parameters can vary by hospital. You can also do fit such model using Stan from R using stan_glmer() in rstanarm. Or using brms.

]]>Thanks for the clarification. Ive seen econometricians use fixed effects model terminology but it resembled a multilevel model that allows intercepts to vary. At first I thought econ fixed effects model meant including the clustering variable as a parameter in the model. So for an example where patients are clustered within surgeons, I thought that the econ fixed effect model would include surgeon ID as a fixed parameter (which would estimate k-1 parameters where k is number of surgeons), but now I see what you mean… I think lol

]]>I meant to say other software like SuperMix estimates separate random effects for surgeon A at hospital A and surgeon A at hospital B*

]]>Fair enough. Clustering makes more sense to me anyway.

How does Stan deal with cross classification? I am currently working on getting data ready to use Stan for the first time. Does it estimate one random effect for a surgeon who operates at multiple hospitals? MCMCglmm estimates one random effect for this surgeon, but I believe other software like SuperMix (which uses empirical bayes). I know I know, SuperMix is written in fortran but Im just curious.

]]>You write, “’empirical Bayes’ means using maximum marginal likelihood…” Not quite. Empirical Bayes has different meanings. One meaning of empirical Bayes is hierarchical modeling using point estimates for the hyperparameters (not necessarily marginal maximum likelihood, by the way; traditionally it’s been common to use moment-based estimates). In other settings, empirical Bayes refers to a more general conceptual framework in which intermediate-level parameters are considered as latent or missing data and averaged over, whereas hyperparameters are considered as fixed quantities to be conditioned on. (And then there are the people who say you can’t condition on a fixed quantity; don’t get me started on that.) In any case, you’re right that I don’t like the term “empirical Bayes” because it seems to imply that plain old “Bayes” is not empirical.

]]>Mikhail:

Either: a model for data that are clustered into (possibly non-nested) groups.

Or: a model where the parameters are themselves batched into batches that are modeled.

Sorry about saying “batched into batches”; I can’t say “batched into groups” because I already used “groups” in the first definition. I try to be unambiguous: data are in groups, parameters are in batches.

]]>Adan:

In our book, Jennifer and I would refer to that example as non-nested.

]]>Then a reviewer said that the data aren’t nested because surgeons can operate at multiple hospitals. We responded with saying that we used the MCMCglmm package which accounts for the cross classification when estimating the random effects and changed the word “nesting” to “clustering”.

Thoughts?

]]>Like, can you point to the single simple definition?

]]>for example, “Statistics for Spatio-Temporal Data” (Cressie, Wikle, 2015) uses the term “hierarchical model” for something I would call Hidden Markov model

]]>What’s similarity between all these names? Partial pooling. Maybe we should just call them models for pooled/longitudinal data.

]]>