I received the following email from a Ph.D. student who wishes to remain anonymous:

I came into operations research with a master's in mechanical engineering, so my statistical background is very unbalanced. I took Stochastic Processes I and II (both Ph.D.-level courses), but I have no applied background in statistics. Now I am doing a lot of number crunching in R, but I believe I still lack a broad understanding of statistical tools and I don't know enough about analysis (although I believe I have a solid understanding of the basics like mean, variance, confidence intervals, etc.).

Given my embarrassing situation, I would appreciate it if you would please, in a blog post, lay out a learning plan for people like me who would like to dive deeper into statistical analysis and do it on a daily basis but come from weird backgrounds like mechanical engineering.

My reply: Since you're not at Columbia, I can't simply recommend that you take my course. You could read my books; that might help. More seriously, you gotta think about all the great skills you have that many statistics students don't have: you can make yourself useful in a lot of different sorts of projects . . .

As someone formally trained with a lot of probability but zero statistics (I was an EE undergrad and a bio grad), I can relate to the Ph.D. student who asked the question. I very strongly recommend reading through Jaynes's "Probability Theory: The Logic of Science" to understand how a generative, Bayesian approach to data analysis can be both intuitive and useful.

Just as in any engineering field there's no substitute for -doing the problem sets-, I've found that in statistics there's no substitute for -reproducing the results- from papers that you read. Implement an algorithm from scratch, do the analysis yourself in R, etc. Often you'll find subtle differences, and in my experience authors are more than willing to discuss them with you.

Finally, I think Andrew is also correct about your background: a little domain-specific knowledge can be a very powerful thing, both in terms of knowing what sorts of problems a field finds worthwhile and knowing enough about the system to recognize a funny, "interesting" result.

Andrew's books are a good place to start. I also find George Box's books really useful to get a practical perspective. Statistics for Experimenters is amazing. If the student understands baseball, Curve Ball is well done.

Definitely check out the first few chapters of 'Data Analysis Using Regression and Multilevel/Hierarchical Models'. For me it is, and has been, one of the best intuitive presentations of model specification and interpretation. (Note: I have no affiliation with Gelman; I just started following the blog because I like the book so much.) 'Using Econometrics: A Practical Guide' by Studenmund is also good. John Fox's books are also supposed to be quite helpful.

One important thing to keep in mind is that many statistical methods Just Don't Make Sense. Really. You might expect otherwise, if you've previously been exposed to fields like physics where anything controversial is at the frontiers of knowledge, far from what you'd need to learn to get started. But statistics isn't like that.

This is true of many frequentist methods, of course, but also of many supposedly Bayesian methods. In particular, despite the various interesting things Jaynes has to say, his big idea – "maximum entropy" – is in the Doesn't Make Sense category. Supposedly, it lets us infer an unknown probability distribution by choosing the distribution that maximizes entropy subject to the constraints that the expectations of various functions with respect to this distribution have various values. But if you don't know the distribution, how is it that you know the expectations of various functions with respect to this unknown distribution? As a new PhD student, coming to this from a computer science background, I read lots of Jaynes' papers, trying to figure out how this is supposed to work, before realizing that it Just Doesn't Make Sense. The same is true of plenty of other things you'll encounter when coming at statistics from some other background. (On the other hand, if you learned statistics from an early age, you either already know this, or you probably never will.)
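To make the objection concrete, here is a minimal sketch (my own toy code, not from the post) of the optimization the maximum entropy principle prescribes, using Jaynes's "Brandeis dice" example: among all distributions on the faces of a die, choose the one maximizing entropy subject to a *given* expected value of 4.5. The Lagrangian solution has the exponential-family form p_i proportional to exp(lam * i), so only a one-dimensional equation for lam needs solving. The machinery works fine; the question raised above is where that "known" expectation came from in the first place.

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5  # the "known" expectation -- exactly the quantity whose
                   # provenance the text above is questioning

def mean_given_lam(lam):
    # Mean of the exponential-family distribution p_i ∝ exp(lam * i)
    w = np.exp(lam * faces)
    p = w / w.sum()
    return p @ faces

# Solve the 1-D constraint equation for the Lagrange multiplier.
lam = brentq(lambda l: mean_given_lam(l) - target_mean, -5.0, 5.0)
w = np.exp(lam * faces)
p = w / w.sum()

print(np.round(p, 4))  # maxent probabilities, increasing in the face value
print(p @ faces)       # recovers the imposed mean of 4.5
```

Because the target mean (4.5) exceeds the uniform mean (3.5), the multiplier comes out positive and the probabilities tilt toward the high faces.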

Of course, the challenge is separating the things that Just Don't Make Sense from the ones that you just haven't understood yourself yet.

Although the maximum entropy principle Just Doesn't Make Sense, it is at least the solution to a well-defined combinatorial problem. The problem is figuring out what relevance that combinatorial problem has to statistical inference.
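The combinatorial problem in question can be shown by brute force (toy numbers mine, not from the thread): among all ways of dropping N balls into the six faces of a die so that the sample mean is 4.5, the count vector with the largest multinomial multiplicity has frequencies that, as N grows, approach the maxent distribution. This is the Wallis/"concentration" argument; whether it bears on inference is the open question.

```python
from itertools import product
from math import lgamma

faces = range(1, 7)
N = 12            # small number of balls so brute force stays cheap
target_sum = 54   # enforces a sample mean of 54/12 = 4.5

def log_multiplicity(counts):
    # log of the multinomial coefficient N! / prod(n_i!)
    return lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)

# Enumerate all count vectors (n_1..n_6) with sum N and weighted sum 54,
# keeping the one realizable in the most ways.
best, best_lm = None, float("-inf")
for c in product(range(N + 1), repeat=5):
    n6 = N - sum(c)
    if n6 < 0:
        continue
    counts = (*c, n6)
    if sum(f * n for f, n in zip(faces, counts)) != target_sum:
        continue
    lm = log_multiplicity(counts)
    if lm > best_lm:
        best, best_lm = counts, lm

freqs = [n / N for n in best]
print(best, freqs)  # for larger N these frequencies approach the maxent solution
```

Even at N = 12 the winning counts tilt toward the high faces, mirroring the exponential-family shape of the maxent answer.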

Radford,

I agree. But another way of saying this, equally true in my opinion, is that there are lots of different incompatible principles (including unbiased estimation, Bayesian inference, maximum entropy, bootstrapping, and lots else), all of which Do Make Sense.

This is a point I discuss in my recent article in Bayesian Analysis; see the third full paragraph of page 8 there.

As I wrote here:

Scientists like a good statistician. I have no doubt that Larry [currently a non-Bayesian] is useful to his scientific colleagues and that, in working with him, they will have justified faith in his methods and principles. I also have no doubt that the scientists with whom Don Rubin [a Bayesian] works have faith in his methods and would prefer the Bayesian interpretation. There are many roads to Rome, and I don't think that the statement "Larry's colleagues prefer confidence coverage" should warn me off of Bayes, any more than the statement "Don's colleagues prefer probability inference" should warn me off of frequentist ideas.