
The Methods Playroom: Mondays 11-12:30

Each Monday 11-12:30 in the Lindsay Rogers room (707 International Affairs Bldg, Columbia University):

The Methods Playroom is a place for us to work and discuss research problems in social science methods and statistics. Students and others are welcome to come to the playroom and work on their own projects, with the understanding that, with many people of diverse interests in the room, progress can come from different directions. The Playroom is not a homework help spot. It’s a place for us to have overlapping conversations about research, including work at early, middle, and late stages of projects (from design and data collection through analysis, interpretation, and presentation of results). It is a place to share different perspectives on quantitative work and connections between quantitative and qualitative work.


  1. Yay! The playroom’s awesome enough for me to get out of bed early.

    If it weren’t for the playroom, I’d never have gotten R2WinBUGS installed (thanks Masanao and Yu-Sung) and would never have had the time to get so much hands-on advice from Andrew and Jennifer Hill (she and Andrew were working on multiple imputation at the time). They helped me formulate a model for my goal of incorporating annotator accuracy and bias into gold-standard creation for machine learning (what Breck and I were working on a lot at our natural language processing company). It turns out we had rediscovered the same model likelihood that Dawid and Skene developed in 1979 (and fit with an early application of EM, deriving exactly the marginalization we need to fit it in Stan). Andrew once told me that every model you come up with in social science was developed by psychometricians 50 years ago. This was only about 30 years, but close enough for statistics :-) I only knew about the playroom because a friend introduced me when we both moved to NYC in the mid-90s. I started asking him for stats help and he suggested I come to the playroom. Great idea, as there were grad students (Masanao Yajima and Yu-Sung Su) and also Jennifer Hill to help translate Andrew’s advice into terms a beginner could understand. Now everyone knows about it!

    I was so hooked on the problems and the methodology, and had gotten a bit bored building logistic regression classifiers, named entity extractors, clusterers, and spell checkers for relatively simple applications (it was a two-person company), that I jumped at the chance to move back into academia (I was a professor before the first dot-com boom sucked me into industry). I took what was basically a postdoc position working with Andrew at a huge pay cut. I really wanted to understand MCMC more than anything else, as I had a strong feeling I could build a better piece of software than BUGS if I just understood the problem. (Don’t get me wrong, BUGS, like R, is an amazing piece of work, even more so when you realize nobody building these tools had a computer science background; see, e.g., aRrgh: a newcomer’s angry guide to R, for a computer scientist’s perspective on R.) All my former academic colleagues were jealous, because it’s actually a lot more fun being a postdoc than a dean or department head or even a senior professor, if you like doing research yourself.

    Stan, in many ways, is a direct result of my involvement in the playroom. When Matt Hoffman and I joined Andrew’s group and set to work on trying to build better samplers for hierarchical models, we found HMC and autodiff pretty early on (both through suggestions in response to posts I made on this blog asking for help, I might add, despite Andrew having had a previous postdoc, Matt Schofield, who used HMC to fit tree ring climate data). I thought, hey, we can turn this into a BUGS-like programming language. So I went off and built the Stan C++ prototype and then started on the language (only after I built the language prototype did I realize the language was a lot more general than BUGS!). Matt, meanwhile, cracked open the sampling problem by developing NUTS (and also developed stochastic variational inference in his spare time!).

    P.S. Stan fits the Dawid and Skene data coding models very robustly in something like 15 minutes; BUGS would take more than 24 hours and often crash. So I succeeded in my initial goal of building a better way to fit these models. It took a few years. And as in other cases, the discrete parameter marginalizations were all worked out by those fitting maximum marginal likelihood models in the 1970s and 80s.
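    The discrete-parameter marginalization mentioned above can be sketched in a few lines. Here’s a minimal NumPy version of the per-item Dawid and Skene marginal log likelihood, where the unobserved true category is summed out on the log scale (function and variable names are my own illustration, not Stan code):

    ```python
    import numpy as np

    def marginal_log_lik(y, log_pi, log_theta):
        """Log likelihood of one item's annotations, with the discrete
        true category z marginalized out (as Stan requires).

        y         : length-J sequence; y[j] is annotator j's label
        log_pi    : length-K array; log prevalence of each true category
        log_theta : J x K x K array; log_theta[j, k, c] is the log
                    probability annotator j labels a true-category-k item as c
        """
        J = len(y)
        # joint log p(z = k, y) = log pi_k + sum_j log theta[j, k, y_j]
        lp = log_pi + sum(log_theta[j, :, y[j]] for j in range(J))
        # sum out z with a numerically stable log-sum-exp
        m = lp.max()
        return m + np.log(np.exp(lp - m).sum())
    ```

    With two categories, two annotators sharing a response matrix, and both voting for category 0, the marginal probability is just pi_0 * theta[0,0]^2 + pi_1 * theta[1,0]^2, which is what the log-sum-exp computes without underflow even for many annotators.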
