Readings for a two-week segment on Bayesian modeling?

Michael Landy writes:

I’m in Psych and Center for Neural Science and I’m teaching a doctoral course this term in methods in psychophysics (never mind the details) at the tail end of which I’m planning on at least 2 lectures on Bayesian parameter estimation and Bayesian model comparison. So far, all the readings I have are a bit too obscure and either glancing (bits of machine-learning books: Bishop, MacKay) or too low-level. The only useful reference I’ve got is an application of these methods (a methods article of mine in a Neuroscience Methods journal). The idea is to give them a decent idea of both estimation (Jeffreys priors, marginals of the posterior over the parameters) and model comparison (cross-validation, AIC, BIC, full-blown Bayesian model posterior comparisons, Bayes factor, Occam factor, blah blah blah).

So: have you any suggestions for articles or chapters that might be suitable (yes, I’m aware you have an entire book that’s obviously relevant)? In the class topic (psychophysics), the data being modeled are typically choice data (binomial data), but my methods paper happens to be on data from measuring movement errors (continuous data), not that any of that matters.

My reply:

This is a personal view but I think BIC, Bayes factor, Occam factor, etc are bogus. I recommend you (and your students) take a look at my 1995 article with Rubin in Sociological Methodology (if you go to my home page, go to published papers, and search, you’ll find that paper) for a thorough discussion of what we hate about this. I also think Jeffreys priors are a waste of time. I wouldn’t spend one moment on that in your course if I were you. Regarding choice models, you could take a look at the section on choice models in chapter 6 of my book with Jennifer Hill. I also have a paper in Technometrics, Multilevel modeling: What it can and cannot do. Regarding the topic of model checking, you could take a look at chapter 6 of Bayesian Data Analysis.

15 thoughts on “Readings for a two-week segment on Bayesian modeling?”

  1. There is a well known failing among purveyors of prob and stat: they generalize the peculiarities of the problems they’re working on into philosophical principles which they claim apply to everyone. I’m sure there are 20 examples on this blog warning people against making that kind of leap. So it’s always disturbing to see Gelman do it.

    If your problems involve weak models where there is lots of uncertainty in the model assumptions (lots of “unknown unknowns” if you like), then Bayes factors and so on probably won’t do anything for you. Social scientists typically work on these kinds of problems. But shock-of-all-shocks, these aren’t the only type of problems out there.

    For example, if you’re programming military radar software, then the “models” are individual types of aircraft and ships. We know all of them ahead of time and know quite a bit about them (at least until we run into alien UFOs). Moreover, the physics of the E&M fields which connect the models to the data are extremely well understood. In this instance we are dealing only with “known unknowns”: we understand everything going on, but haven’t measured everything. In this case Bayes factors and so on will work very, very well.

    Your experience in statistics isn’t the be-all and end-all of the subject!

    • I don’t think my experience is “the be-all and end-all.” Landy asked me, so I gave my impressions.

      The real point, though, is that 2 weeks is not a lot of time. Bayes factors etc. have some use (we do mention them briefly in Bayesian Data Analysis) but I don’t think they’re important enough to be covered in such a short segment.

  2. But clearly you have REASONS for saying “BIC, Bayes factor, Occam factor, etc are bogus,” so in this sense it is not “a personal view” if that is understood as merely an idiosyncratic matter of taste. Thus, you should give a hint of your reasons, not to mention I’m very interested to know.

    • Mayo: You’re exactly right. Either they are bogus or they aren’t. If they’re bogus then they shouldn’t be used at all. If there is something to them, then that should be clarified to the point that they can be used reliably and accurately.

      People try to use them to “discover” the structure of a model using internal evidence in the data. Gelman (and I) would rather use a fundamental understanding of the problem to select the structure of the model in such problems. Attempts to do the former, whether by Bayesians or Frequentists, almost always fall apart. It works if you’re modeling something like “sun spot counts over time”, but it doesn’t work in most real problems from the life and social sciences (it’s a well known path to ruin in Finance).

      But even in the social sciences, ideas like Bayes factors/Ockham factors could be exploited and put to definite use if understood better. At its heart, Ockham’s Razor is an attempt to judge a model based on its structure. Before dismissing the idea, note that it is, after all, possible to judge models without any data at all. For example:

      Model 1 predicts every coin flip will be either heads or tails.

      Model 2 predicts only half the coin flips will be heads.

      Model 1’s prediction will always be at least as correct as Model 2’s! Some models are bound to have more predictive ability simply because they are consistent with more possible outcomes. “Consistent” here could be taken to mean “if the outcome is used to conduct a posterior check on the model, the model would pass”. I believe these ideas could be exploited better, even by social scientists.
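      The asymmetry between the two coin-flip models can be made concrete with a small sketch (the choice of 10 flips is mine, for illustration): enumerate every possible sequence of flips and count how many each model is consistent with.

```python
from itertools import product

n_flips = 10
outcomes = list(product("HT", repeat=n_flips))  # all 2**10 possible sequences

# Model 1: every flip is either heads or tails -- vacuously true of any sequence
model1_consistent = [seq for seq in outcomes if all(f in "HT" for f in seq)]

# Model 2: exactly half the flips are heads -- a much sharper claim
model2_consistent = [seq for seq in outcomes if seq.count("H") == n_flips // 2]

print(len(outcomes))           # 1024
print(len(model1_consistent))  # 1024 -- Model 1 "passes" on every outcome
print(len(model2_consistent))  # 252  -- Model 2 passes on only C(10,5) outcomes
```

      Model 1 can never fail a posterior check, precisely because it rules nothing out; Model 2 is consistent with only about a quarter of the outcomes, so passing a check with it carries real information.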

  3. I recommend the excellent lecture series by David MacKay (based on his book, but the lectures are less technical), now available here: http://videolectures.net/course_information_theory_pattern_recognition/

    Unfortunately I have only watched as far as lecture 9, and the lecture covering model comparison (presumably not AIC and BIC – those are not really Bayesian, despite the existence of Bayesian motivations for them) is number 10. So I don’t know how relevant the content will be, but what I’ve seen so far is making me recommend these lectures to everyone. Starting off with compression and coding may not be the angle most people want to take, but it gives a fresh perspective that most statisticians will not be familiar with.

  4. Josh Tenenbaum’s homepage
    http://web.mit.edu/cocosci/josh.html
    has a few articles that might be at the right level for Landy’s students: “Bayesian models of cognition”, “A tutorial introduction to Bayesian models of cognitive development”, etc.
    I’m not sure but perhaps the cognitive science context & examples will be helpful for psychophysics students.

    Likewise, there’s also some Bayesian material in Kass, Brown, and Eden’s draft book “Analysis of Neural Data”:
    http://www.stat.cmu.edu/~kass/smbrain/wp-content/uploads/2011/01/syl.pdf

  5. I have the same strong opinion about using the marginalized likelihood to do model selection or evaluation. The practical problem is that it favors simple models far too strongly, simply because our prior may always be vague. However, related methods, such as the use of the spike-and-slab prior in variable selection, are dominant to date in the Bayesian community. Do you know of some nice papers demonstrating this problem using real examples? I think real examples are more convincing than lots of arguments in words.

    • Longhai:

      In chapter 6 of Bayesian Data Analysis (first and second editions) we illustrate how Bayes factor, or spike and slab, fails when applied to the 8-schools example. That’s a real example, although a small one. My impression is that the Bayesians who like these methods accept that they are not perfect and thus are not too fazed by this sort of counterexample. Instead, they try to apply the methods in settings where they are useful.

      The big problem, I think, is that so many Bayesians think of the marginal probability of a model as a very fundamental thing. To turn the analogy upside down, they see model comparison or model averaging using Bayes factors as the apex of Bayesian analysis. Hence the attempt to fix the Bayes factor in various ways (e.g., BIC, partial Bayes factor, etc.) so as to stay inside this framework. Here I think the ultimate response is not to show that the traditional framework fails but instead to solve problems using more direct approaches, for example shrinkage of coefficients instead of variable selection. Much of modern nonparametric Bayes goes in this direction: lots of parameters and lots of shrinkage, no need for variable selection or model selection.

  6. Hi Andrew,

    You may hate Bayes factors, but I love them, and with a passion. I think they answer exactly the kinds of questions that researchers in my field care about (“did my experimental manipulation show an effect or did it not?”). And I am not alone in my love for Bayes factors. Prominent Bayesian statisticians like Berger, Raftery, O’Hagan, and Robert and many others all advocate the use of the Bayes factor in their papers and books. So to argue that Bayes factors are “bogus”, and that the concept of Ockham’s razor is “bogus” is your Personal View. That’s fine of course, and you clearly indicate that you are expressing your Personal View, but I did not want your blog readers to walk away with the idea that your Personal View is the only one that can be entertained. Perhaps we can agree that Bayesian statisticians will never agree and there will always be people who hate the Bayes factor, and people who love the Bayes factor.

    Cheers,
    E.J.

  7. EJ:

    Agreed. But then we should also add to the list some other methods that have their strong supporters and that have solved many applied problems, methods such as classical hypothesis testing, stepwise regression, path analysis, etc etc. For a 2-week course, I’d prefer to focus on the methods I really like. In our book, though, we consider all sorts of methods we do not ourselves use and discuss where they can be useful.

  8. Sorry, but what exactly is your problem with using Bayes Factors? The Gelman & Rubin paper does not really make this clear (or am I missing something?), and I simply can’t imagine what would be the problem with BFs. I am actually quite impressed by E. Jaynes’ argument in favor of them. Could you elaborate?

    • JP:

      See Bayesian Data Analysis (pages 185-186 of the second edition). We have an example in which Bayes factors are helpful and an example in which Bayes factors are a distraction. It happens that the problems I’ve seen are mostly of the second sort.

  9. I think that Jeffreys prior is neat-o as part of the development of probability matching priors (Bayesian pie and Frequentist ice cream); I’d mention the existence of O-Bayes with good frequentist properties as counterpoint to the common perception of essential tension. There are lots of problems in which model selection of any flavor is not a good idea, but plenty in which it is. I’d mention the existence of Bayesian model selection in a 2 week course to avoid giving the impression that Bayesians have no reasonable methods for that question. If I recall correctly, you liked WAIC and DIC. WAIC and DIC estimate the cross-validation of the posterior likelihood, and fractional BFs directly implement that calculation where analytically possible. I agree that vanilla BFs and their descendants take the prior on model parameters a bit too seriously to represent an interesting calculation. That is a useful lesson, though maybe too involved for a short course.
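    As a concrete point of reference for the penalized criteria mentioned throughout the thread, here is a minimal sketch (the simulated data and the weak-trend setup are my own, for illustration) computing AIC and BIC for two nested Gaussian models fit by maximum likelihood: a constant-mean model and a simple linear regression.

```python
import math
import random

random.seed(1)

# Simulated data: a weak linear trend plus unit-variance noise
n = 40
x = [i / n for i in range(n)]
y = [0.3 * xi + random.gauss(0.0, 1.0) for xi in x]

def gauss_loglik(rss, n):
    """Maximized Gaussian log-likelihood, plugging in sigma^2 = rss / n."""
    s2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)

# Model A: constant mean (k = 2 free parameters: mean, sigma)
mean_y = sum(y) / n
rss_a = sum((yi - mean_y) ** 2 for yi in y)

# Model B: simple linear regression (k = 3: intercept, slope, sigma)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
rss_b = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

for name, rss, k in [("constant", rss_a, 2), ("linear", rss_b, 3)]:
    ll = gauss_loglik(rss, n)
    aic = 2 * k - 2 * ll           # AIC penalizes each parameter by 2
    bic = k * math.log(n) - 2 * ll  # BIC penalizes by log(n), harsher for n > 7
    print(f"{name:9s}  AIC = {aic:6.1f}  BIC = {bic:6.1f}")
```

    The bigger model always attains a lower residual sum of squares, so the criteria differ only in how heavily they tax the extra slope parameter; with a weak trend and moderate n, BIC leans toward the simpler model more often than AIC does.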
