I use these models for profiling. If surgeons were to operate on the average patient, what would the mortality probability be across all surgeons?

Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out.

Empirical Bayes, also known as maximum marginal likelihood,[1] represents one approach for setting hyperparameters.

I couldn’t follow the Casella (1985) paper or even the Efron and Morris (1975) paper as they never isolate the technique per se, but instead introduce it implicitly along with their particular application. I will never understand why statisticians can’t just define something clearly. They keep saying “empirical Bayes”, but never stop to define it other than by example.
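For what it’s worth, the maximum-marginal-likelihood version can be stated in one line (standard notation, written in LaTeX; this is my gloss, not the paper’s): choose the hyperparameters $\eta$ to maximize the marginal likelihood of the data,

$$\hat{\eta} = \arg\max_{\eta} p(y \mid \eta) = \arg\max_{\eta} \int p(y \mid \theta)\, p(\theta \mid \eta)\, d\theta,$$

then do inference for $\theta$ with the plug-in posterior $p(\theta \mid y, \hat{\eta})$.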

Technically, I think of a hierarchical model as one involving a hierarchical prior. A hierarchical prior is one we fit jointly with the item-level parameters. For example, in the simplest regression case,

alpha[1], …, alpha[N] ~ normal(mu, sigma)

mu ~ normal(0, 2)

sigma ~ normal(0, 2)

defines a joint density p(alpha, mu, sigma) for “item”-level parameters (alpha[1], …, alpha[N]) and “population”-level parameters (mu, sigma). I use scare quotes here, because these levels can keep nesting (students in classrooms in schools in districts in states in countries in …) so there’s no natural item-level or population level.
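For concreteness, here is a minimal Stan program implementing this hierarchical prior; the likelihood y ~ normal(alpha, 1) is my own illustrative assumption, since the example above only specifies the priors:

data {
  int<lower=1> N;             // number of items
  vector[N] y;                // one observation per item (illustrative)
}
parameters {
  vector[N] alpha;            // item-level parameters
  real mu;                    // population-level mean
  real<lower=0> sigma;        // population-level scale
}
model {
  alpha ~ normal(mu, sigma);  // hierarchical prior, fit jointly with alpha
  mu ~ normal(0, 2);
  sigma ~ normal(0, 2);       // half-normal, given the lower bound
  y ~ normal(alpha, 1);       // illustrative likelihood with unit noise
}

Fitting this with full Bayes integrates over (mu, sigma); an empirical Bayes analysis would instead fix them at point estimates.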

You write: “Empirical Bayes is just penalized max marginal likelihood.” Sometimes, but not always; see discussion here.

“Multilevel Models are like 2 or 3 hard concepts and lots of different names and notations for everything.”

We had a lot of fun with writing things several different ways as an exercise in translating between disciplines.

Lots of treatments of hierarchical models, including my book with Jennifer, have oversold the idea by treating the methods as automatic and emphasizing that the group-level variance parameters can be estimated from data alone.

Our focus on this point was understandable as a reaction to earlier Bayesian work that considered the group-level variance parameters as part of the “prior” and thus required to be pre-specified in the analysis. I think we were making a valuable point to emphasize that these hyperparameters can, indeed, be estimated from the data.

But when the number of groups is not large, you can’t estimate the group-level variance parameter accurately from data alone, and it can help, for both computational and inferential reasons, to guide this inference using prior information. So now we’re moving toward a general recommendation to use informative priors when fitting multilevel models. That’s what we do in rstanarm, for example.

How did I realize the problem with purely data-based estimates of group-level variance parameters? By seeing Jeff and Justin fit the same hierarchical regression model to a series of datasets (as part of an MRP project). We noticed the estimated group-level variance parameters jumping around from dataset to dataset, resulting in some substantively incoherent inferences.

Consider a simple case of a state-level variance parameter. When it’s estimated to be near 0, the state intercepts are pooled almost completely toward the national model, which might predict the state-level intercept based on Republican vote share in the previous election. When the state-level variance is estimated to be high, the state intercepts are hardly pooled at all, so these estimates will vary much more by state. Having vastly different amounts of pooling from dataset to dataset can wreak havoc on inferences over time. Ideally one would embed all this in a time series model, but in practice we’re often secret-weaponing it, fitting the model separately to each dataset, and we’ll want to regularize this estimate with an informative prior on the group-level variance.
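The mechanics here are the usual precision weighting; in the simple normal-normal case (a standard result, written in LaTeX, with $\bar{y}_j$ the unpooled estimate for state $j$ and $\mu_j$ the national-model prediction):

$$\hat{\alpha}_j \approx \frac{(n_j/\sigma_y^2)\,\bar{y}_j + (1/\sigma_\alpha^2)\,\mu_j}{n_j/\sigma_y^2 + 1/\sigma_\alpha^2},$$

so as the group-level scale $\sigma_\alpha \to 0$ every state estimate collapses to $\mu_j$, and as $\sigma_\alpha \to \infty$ it approaches the unpooled $\bar{y}_j$.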

The point is that if you fit a model, or a series of models, on just one dataset, everything can seem to make sense. But when you fit the model repeatedly to different data, the noisiness of the estimate can be more apparent.

Our perspective, and our practice, has changed. We oversimplified and oversold. And now I think we’re doing better.

P.S. OK, I guess this should be its own blog post. It should appear in March or so.

I was searching for a definitive definition of a “hierarchical model” to refer to when I was writing my thesis intro. But I found none. I guess if I were writing another theoretical text, I would have to quote this comment.

For quite a while, I thought “hierarchical models” were some exotic new technique, and that I had no hope of understanding them. Turns out that I already knew about them, just under a different name.

Wow, thank you so much. That’s very nice of you, I appreciate it. I will certainly go through the details and am willing to learn the workflow.

Perhaps even especially for clinical research, you might wish to go heavy on the principled Bayesian workflow.

My intro for those who tend to be “lazy”: my talk at https://www.youtube.com/watch?v=I7AVP9BCm1g&feature=youtu.be (with access to code and slides); or, for more definitive material, Michael Betancourt’s case study at https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

And you are only expecting one rejection ;-)

Luckily, peer review worked: it identified the mistake and allowed us to make the easy word change from patients nested within surgeons nested within hospitals to patients clustered within surgeons clustered within hospitals. And now it’s published.

https://www.ncbi.nlm.nih.gov/pubmed/31082909

Thank you for pointing this out. Science progresses.

-Adan

BYE BYE inverse Wishart priors :)
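For context: the usual replacement in Stan is to decompose the covariance matrix into scales and a correlation matrix with an LKJ prior. A minimal prior-only sketch, with the dimension 2 and the particular priors as my own illustrative choices:

parameters {
  corr_matrix[2] Omega;       // correlation matrix
  vector<lower=0>[2] tau;     // scales
}
model {
  Omega ~ lkj_corr(2);        // LKJ prior; shape > 1 concentrates near identity
  tau ~ normal(0, 1);         // half-normal priors on the scales
  // the implied covariance matrix is quad_form_diag(Omega, tau)
}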

Again, “how did I miss this?” I ask myself.

https://statmodeling.stat.columbia.edu/2015/03/22/no-fixed-random/

Will send you the paper when it is accepted.

-Adan

Stan will do whatever you tell it to. You can fit a model in which parameters vary by surgeon and are preserved across hospitals, or a model where surgeon parameters can vary by hospital. You can also fit such a model from R using stan_glmer() in rstanarm, or using brms.
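A sketch of the first case (one parameter per surgeon, shared across hospitals) as crossed varying intercepts in Stan; the logistic likelihood for mortality and the variable names are my own illustrative assumptions:

data {
  int<lower=1> N;                           // operations
  int<lower=1> J;                           // surgeons
  int<lower=1> K;                           // hospitals
  array[N] int<lower=1, upper=J> surgeon;   // surgeon of each operation
  array[N] int<lower=1, upper=K> hospital;  // hospital of each operation
  array[N] int<lower=0, upper=1> died;      // outcome
}
parameters {
  real mu;
  vector[J] a_surgeon;                      // one effect per surgeon, across hospitals
  vector[K] a_hospital;
  real<lower=0> sigma_surgeon;
  real<lower=0> sigma_hospital;
}
model {
  a_surgeon ~ normal(0, sigma_surgeon);
  a_hospital ~ normal(0, sigma_hospital);
  sigma_surgeon ~ normal(0, 1);             // illustrative weakly informative priors
  sigma_hospital ~ normal(0, 1);
  mu ~ normal(0, 2);
  died ~ bernoulli_logit(mu + a_surgeon[surgeon] + a_hospital[hospital]);
}

The lme4-style formula died ~ (1 | surgeon) + (1 | hospital) in stan_glmer() or brms expresses the same crossed (non-nested) structure.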

How does Stan deal with cross-classification? I am currently working on getting data ready to use Stan for the first time. Does it estimate one random effect for a surgeon who operates at multiple hospitals? MCMCglmm estimates one random effect for this surgeon, but I believe other software, like SuperMix (which uses empirical Bayes), does not. I know, I know, SuperMix is written in Fortran, but I’m just curious.

You write, “‘empirical Bayes’ means using maximum marginal likelihood…” Not quite. Empirical Bayes has different meanings. One meaning of empirical Bayes is hierarchical modeling using point estimates for the hyperparameters (not necessarily marginal maximum likelihood, by the way; traditionally it’s been common to use moment-based estimates). In other settings, empirical Bayes refers to a more general conceptual framework in which intermediate-level parameters are considered as latent or missing data and averaged over, whereas hyperparameters are considered as fixed quantities to be conditioned on. (And then there are the people who say you can’t condition on a fixed quantity; don’t get me started on that.) In any case, you’re right that I don’t like the term “empirical Bayes” because it seems to imply that plain old “Bayes” is not empirical.
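In symbols (standard notation, my gloss rather than the comment’s): the point-estimate version of empirical Bayes reports the plug-in posterior

$$p(\theta \mid y, \hat{\eta}),$$

with $\hat{\eta}$ some point estimate of the hyperparameters (marginal maximum likelihood, moments, or otherwise), whereas a fully Bayesian analysis averages over them:

$$p(\theta \mid y) = \int p(\theta \mid y, \eta)\, p(\eta \mid y)\, d\eta.$$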

Either: a model for data that are clustered into (possibly non-nested) groups.

Or: a model where the parameters are themselves batched into batches that are modeled.

Sorry about saying “batched into batches”; I can’t say “batched into groups” because I already used “groups” in the first definition. I try to be unambiguous: data are in groups, parameters are in batches.

In our book, Jennifer and I would refer to that example as non-nested.

Then a reviewer said that the data aren’t nested, because surgeons can operate at multiple hospitals. We responded by saying that we used the MCMCglmm package, which accounts for the cross-classification when estimating the random effects, and changed the word “nesting” to “clustering.”

Thoughts?

Like, can you point to the single simple definition?

For example, “Statistics for Spatio-Temporal Data” (Cressie and Wikle, 2015) uses the term “hierarchical model” for something I would call a hidden Markov model.

What’s the similarity between all these names? Partial pooling. Maybe we should just call them models for pooled/longitudinal data.
