Three hours in the life of a statistician

Kaiser Fung tells what it’s really like.

Here’s a sample:

As soon as I [Kaiser] put the substring-concatenate expression together with two lines of code that generate data tables, it choked.

Sorta like Dashiell Hammett without the broads and the heaters.

And here’s another take, from a slightly different perspective.

4 thoughts on “Three hours in the life of a statistician”

  1. I dunno, I imagine a guy like Kaiser probably has a broad peering over his shoulder saying “whatcha doin honey? Why won’t you come on down to dinner?” and Kaiser would say “Hey baby, gimme a break here, can’t ya see I’m busy with these dates here”. Then a guy would bust in to the room and demand that Kaiser tell him the upcoming lottery ticket numbers, and Kaiser would plug him full of holes with his gat.

  2. I’ve been at the project I’m working on now for three hours. I couldn’t remember how to read data frames in R or then how to get all the data as a set of levels, and their doc is like a maze of twisty little passages and the nice book on simulation in R I have at home is nowhere at hand.

    I then had no problem munging my data. That took about ten minutes.

    I then wrote the model in JAGS and the wrapper in R. Easy.

    I then spent almost two hours banging my head against R2jags and rjags, failing to get them installed. I then spent ten minutes writing a detailed message to Yu-Sung begging for help. Which leaves me completely blocked at work.

    I’m now going home incredibly frustrated to use my notebook, where either I got lucky when I spun the big install wheel the first time (it was an earlier version of JAGS) or someone helped me and I forgot to write down what I did.

    Earlier this week, I spent an hour composing a request for help on the Spirit:Qi mailing list (a parsing framework in the Boost C++ libs). No response, despite a very active mailing list. I then followed their rules on submitting requests to the letter, which involved writing a standalone program extracted from the Stan graphical model parser. That took almost three hours. Still no response. This is following on hours of trying to figure out how to get error messages to report where the error occurred. If you wonder why software isn’t more robust, it’s to some extent because we just can’t figure out how to do things we’d like to do, despite the best of intentions.

    Software is just incredibly frustrating.

    • This is probably too obvious, but there seem to be quite a few Boost experts/advocates on stackoverflow. Maybe you’d actually have more luck there than on the spirit mailing list…

  3. I liked the “that modeling feeling” post.

    One thing I personally have a hard time with is that there seems to be only a narrow window between inferences where a relationship is “obvious” from looking at the data and problems where the data is too noisy and anything pulled out of a model is likely hallucinated garbage.

    Sometimes I wish I could be better/more efficient at carving out interesting insights that fall into this narrow space, but in practice, it often feels like a combination of getting lucky with a dataset and getting lucky with the question. Often I hit a wall where there doesn’t seem to be an intersection between interesting questions / what the data can answer / what requires an interesting model.

    I’d like to understand how to get better at this, but maybe this is too general a problem and there are no “tricks” other than accumulating experience and knowledge.
