33 thoughts on “Bayesian Linear Mixed Models using Stan: A tutorial for psychologists, linguists, and cognitive scientists”

  1. This paper has an interesting history, for those suffering repeated rejections.

    1. We first submitted it to PLoS ONE; it was rejected without review. Reason: they don’t do tutorials.
    2. Then we submitted it to the Journal of Math Psych. Rejected after review. The major comment from one reviewer: why do we need a tutorial? Just read the (800-page) Stan manual. Done.
    3. Then we rewrote it and submitted it to a special issue in Psych Methods. Rejected without review.
    4. Then we submitted it to a special issue of Zeitschrift fuer Psychologie, or some such journal I had never heard of (apparently the oldest journal in psych). After a long silence, we were informed that the special issue had been cancelled because nobody had submitted except us (maybe one or two others? I don’t remember). We withdrew the paper, and I don’t think we got any reviews.
    5. Then we submitted it to the open access journal Quantitative Methods for Psychology. This time it was reviewed (and the reviews were very helpful and improved the paper), and it was eventually accepted.

    I may have missed a journal or two in this list that we submitted to. All this took some 1.5 years and three or so complete rewrites. The rewrites may have improved it a bit, but I don’t feel that what happened to us was fair or justified. I had heard that at a Bayesian summer school people were using this tutorial and recommending it to each other. I still don’t understand why this paper was so hard to publish, especially since it actually teaches a very useful skill. But it’s in press now. So it’s official, folks. It’s now worth reading. It has passed peer review.

      • I just entered a two-year sabbatical from teaching, so I may well do something like that. But it’s going to be hard to top McElreath’s or Kruschke’s books if the audience is non-statisticians. There are a lot of people who only need to know one thing, how to fit linear mixed models, and they don’t want to wade through a book. For them this short tutorial article might be useful.

        PS Somehow I have a hard time getting the students in my statistics classes even to read my lecture notes, let alone books. I have to constantly say things like, “Please remember that you are allowed to read up on this, you don’t just need to rely on my lectures and your notes taken in class.”

        • A short book maybe? :)

          Most books could make their real point in 20% of the pages they use. The rest is redundant information that is explained better elsewhere. Maybe there’s pressure from publishers to add pages so that buyers won’t feel ripped off?

          I think there’s an untapped market for short books of 50-100 pages, sold at a price point of approx. $15-$20.

        • Yeah, I know what you mean. The BDA3 book is impossible to carry with you on vacation, and the Kindle edition is unreadable because each page renders ultra-slowly on a computer screen; it can’t even be read on a Kindle (surprisingly, seeing as it’s a Kindle book!). So I ended up not reading the whole of BDA3. The Kruschke book comes as a pdf, but it crashed my pdf viewer; it was 700+ pages and there was some bug in either the book or the viewer. As a result I didn’t read the whole book either.

          My ideal is to write using something like R’s bookdown, and to make the book free while also offering print editions for those who want them. No publisher seems to want to do that (except O’Reilly with Hadley Wickham’s books). I have a book contract with Cambridge University Press (on a different topic), and they refused to let me put the book up for free the way they allowed it for MacKay’s Information Theory book. Maybe you have to be famous to be allowed to put up your book for free while also having print editions. That means I am doomed either to put up free pdfs and get no publication credit for my books, or to write books that cost actual money.

          I may do something like that. What’s a good title? Some relevant clickbait titles could be:

          – The Signal and the Noise Revisited: How to publish noise as signal in top journals, Tips and tricks from the professional statistician’s perspective
          – P-hacking for Professionals: How to analyze your data so that nobody knows what actually happened
          – How to Fix Null Results: What to do when things go south and you want to go north
          – How to Lie with Statistics Without Actually Lying: Tips and tricks on using conversational implicature and clever wording to hide the dirty reality of your data

          by Shravan Vasishth, PhD

        • We are releasing free pdfs on the up and up with the Stan books we are working on with Chapman and Hall. Cambridge University Press also allows free pdfs in some cases.

        • Bob, I saw on your blog that you release reviewer comments publicly. Is this a violation of confidentiality? I’m asking because I have also been thinking of discussing the things reviewers say publicly, but I have always been unsure whether I am violating an implicit agreement never to talk publicly about the paper and the review. Of course, I don’t need to care, I am a full professor with tenure; what can anyone do to me? But it’s more a question of ethics; if I have agreed to something implicitly, can I violate that agreement? I was curious about what others think about releasing reviews publicly. One useful purpose they would serve is to show the younger generation some examples of what happens in others’ papers, to create some more general awareness of the norms. Right now students learn this from their advisors’ hard-earned experience.

    • Unfortunately, the system is indeed neither fair or justified, in lost of ways. The “rules” have become codified as “That’s The Way We’ve Always Done It.” That’s not a good justification.

      • Oops. Should be “… neither fair nor justified” and “in lots of ways” (although “lost” does indeed describe some of the ways things are currently done.)

      • Peer review does usually improve the paper, I have to admit that. It’s just too costly for students as things stand. At my university (Potsdam) you can do a paper-based dissertation (called a cumulative diss). This means having three papers published or accepted (with a subset in review) by the time you defend. Going from nothing to doing the research and publishing it within 3 years is currently very difficult if waiting times are 1.5 years (which is starting to feel standard to me now; I think I have six papers stuck like this right now).

    • Shravan:

      Some editors and reviewers take into consideration how an accepted publication in their journal might credential the authors more than they might _deserve_.

      This seemed to be the case in one of my past reviews: “Although your paper has some fascinating points concerning, in particular, likelihood visualizations as part of an appropriate data analysis and modeling/inferential strategy, it does not provide a sufficient computational nor graphical contribution/advance to justify publication in ****.” The next journal I submitted to had a similar concern about the lack of technical innovation. Both were correct in their assessment: what I was suggesting did not require new technical developments.

      “So [especially] hard to publish, especially since it [only] actually teaches a very useful skill” described what I was trying to publish to a tee. Maybe sometime, when I have [a lot] more time, I might revisit the material, likely following Bob’s suggestion and possibly using bookdown (from RStudio).

        • Allowing training on the order and titles of section headings and on reference-list formats should already help a lot. Add in paper length and keyword list and you ought to make a lot of progress.

          I wonder what other features one might extract to train on.

    • I was really interested in your post. I am busy developing my understanding of Bayesian statistics, so I will definitely read your paper in the morning.

      I had virtually the same experience as you did with a paper on EFA. I also went through a long process of bouncing from journal to journal (I also submitted to PM, I think JMP, Psychometrika, and one or another additional journal) and received good feedback from some of them, but no one was interested in publishing it. I really felt that the paper would be of value to someone, so I didn’t give up, and then got a very positive response from Quantitative Methods for Psychology. The cherry on top was when, shortly after publication, I received an email from a reader who had benefited from the paper, and another from someone who wanted help with their analysis. As one of the many academics who are largely self-taught when it comes to advanced statistics, I really see the value in these kinds of tutorials and wish more academic journals were willing to publish them.

      Thanks for the great work, and thanks Andrew for always posting such interesting content!

  2. Can’t say I’m a fan of all the implicit uniform priors on the variance parameters. Seems dangerous to teach folks these models without talking about the importance of using at least somewhat informative priors.

    • Mike:

      I agree, and I take much of the blame for this, as we mostly use uniform priors on hyperparameters in our books. Since writing those books, I’ve changed my views and have become much more convinced of the value of informative priors.

    • Good point. We discuss this in footnote 3:

      “This is an example of an improper prior, which is not a probability distribution. Although all the improper priors used in this tutorial produce posteriors which are probability distributions, this is not true in general, and care should be taken in using improper priors (Gelman, 2006). In the present case, a Cauchy prior truncated to have a lower bound of 0 could alternatively be defined for the standard deviation. For example code using such a prior, see the KBStan vignette in the RePsychLing package (Baayen, Bates, Kliegl, & Vasishth, 2015).”

      In practice, we always do a sensitivity analysis with different priors. For linguists and the like, we have a more entry-level and more detailed discussion in this review:
      http://www.ling.uni-potsdam.de/~vasishth/pdfs/StatMethLingPart2ArXiv.pdf

      In the kind of data we deal with (eyetracking, reading data), in practice the choice of the prior on the variance parameters doesn’t make any difference. But this is of course not going to be true in general.
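To make the prior discussion above concrete, here is a minimal Python sketch. It is not the paper’s Stan code, and several choices are illustrative assumptions of mine rather than anything from the tutorial: a single normal likelihood y ~ Normal(0, σ) standing in for a full mixed model, a half-Cauchy scale of 2.5, simulated data with true σ = 3, and grid approximation instead of MCMC. It shows two things: the half-Cauchy (a Cauchy truncated at 0) is proper, since its mass on [0, upper] is (2/π)·atan(upper/scale), which tends to 1, whereas a flat prior’s “mass” is just the interval length; and a crude sensitivity check comparing the posterior mean of σ under the two priors as n grows.

```python
import math
import random


def half_cauchy_logpdf(sigma, scale):
    """Log density of a Cauchy(0, scale) truncated to sigma >= 0 (the half-Cauchy)."""
    if sigma < 0:
        return float("-inf")
    return math.log(2.0 / (math.pi * scale)) - math.log1p((sigma / scale) ** 2)


def half_cauchy_mass(upper, scale):
    """Prior probability that sigma lies in [0, upper]: (2/pi) * atan(upper/scale)."""
    return (2.0 / math.pi) * math.atan(upper / scale)


def normal_loglik(y, sigma):
    """Log likelihood of the data y under Normal(0, sigma)."""
    n = len(y)
    ss = sum(v * v for v in y)
    return -n * math.log(sigma) - 0.5 * n * math.log(2.0 * math.pi) - ss / (2.0 * sigma ** 2)


def posterior_mean_sigma(y, log_prior, grid):
    """Grid approximation of E[sigma | y] under the given log prior on sigma."""
    log_post = [normal_loglik(y, s) + log_prior(s) for s in grid]
    top = max(log_post)  # subtract the maximum before exponentiating, to avoid underflow
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(s * w for s, w in zip(grid, weights)) / sum(weights)


# A flat prior p(sigma) = 1 assigns "mass" equal to the interval length, which grows
# without bound; the half-Cauchy's mass on [0, upper] approaches 1.
for upper in (10.0, 1000.0, 1e6):
    print(f"upper={upper:g}: flat={upper:g}, half-Cauchy={half_cauchy_mass(upper, 2.5):.6f}")

# Crude sensitivity analysis: posterior mean of sigma under each prior as n grows.
# Note that on a bounded grid the "flat" prior is really uniform on (0, 20], hence proper.
random.seed(1)
grid = [0.01 * k for k in range(1, 2001)]  # evenly spaced sigma values in (0, 20]
priors = {
    "flat on the grid": lambda s: 0.0,
    "half-Cauchy(0, 2.5)": lambda s: half_cauchy_logpdf(s, 2.5),
}
for n in (5, 50, 500):
    y = [random.gauss(0.0, 3.0) for _ in range(n)]  # simulated data, true sigma = 3
    for name, log_prior in priors.items():
        print(f"n={n:3d}  {name:20s}  E[sigma|y] = {posterior_mean_sigma(y, log_prior, grid):.3f}")
```

As n grows, the two posterior means should move closer together, which is in line with the remark above that the prior on the variance parameters made no practical difference for eyetracking and reading data. With small n, or with variance components informed by only a few subjects or items, the gap can be larger, and that is the kind of case a sensitivity analysis is meant to catch.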
