How can busy economists and political scientists learn R quickly?

It’s all R all the time around here. Chris Blattman asks:

How can busy economists and political scientists learn R quickly? Is there a good guide? A set of handy ready-made programs? I learned MATLAB and STATA inside out in econ grad school, but now that I’m in a poli sci dept, and because the technology has progressed, I feel like I should learn R. I tried to teach myself a year ago (with no aids) and it was not pretty, even though I am usually pretty good at these things. So I abandoned it. But the article suggests R is easy to use. Did I miss the magic instruction book? If you have time to post on this, it would be a real boon to me and others I am sure.

My suggestion is to start with John Fox’s book, An R and S-Plus Companion to Applied Regression and follow up with my book with Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, which is full of R code. Sure, lots of the code is messed up, but we’re planning to put together a clean version soon. . . .

If anyone has other suggestions for Chris, feel free to say something in the comments.

22 thoughts on “How can busy economists and political scientists learn R quickly?”

  1. I would recommend downloading the PDF "Econometrics in R" by Grant V. Farnsworth. It's well written, and I learned a lot from it even though I don't do econometrics.

  2. A related question, especially useful for grad students and junior faculty, I think, is: how can we keep learning new statistical programming technologies as they evolve? (R will not last forever, right?) Perhaps by spending some time learning a "real" programming language, such as C or C++, so that we can really understand how these high-level languages are built and how they work?

  3. For a slightly messy but amazingly comprehensive intro to R, from data structures to regression models with a heavy emphasis on graphs, check out this website: http://zoonek2.free.fr/UNIX/48_R/all.html.

    I often find it easier to copy and adapt code from these pages than to dive into the R help files when I don't remember the syntax.

  4. I'm a grad student, and I swear by Quick-R, at least to get started. Once you learn the basics, you should dive into more advanced documentation to do exactly what you want to do.

  5. Frank: I took a look at this link, and it looks to me like Hiebeler is more of an expert in Matlab than in R. For example, item 75 is "Compute AB^-1" and the suggested R command is A%*%solve(B). This is a literal translation, but I don't think it's the most efficient. I seem to recall reading an R or S manual from way back that emphasized avoiding explicit matrix inversion. I think you're supposed to do solve(a,b) or solve(b,a) or something like that.

  6. "lots of the code is messed up"

    I've bought Gelman/Hill, but haven't started it yet. Do you mean the code is buggy, or just scattered through the book?

    Another way of phrasing this question: given that I have a huge stack of stuff I need or want to read, should I push this toward the back of the pile until you do the work you refer to above?

  7. There are probably better things out there now but years ago I learnt much from Phil Spector's short tutorial, which I think has been published as a book.

    I'd strongly suggest that any beginner spend a lot of time learning the data structures (lists, matrices, data frames, single and double square brackets, etc.).

  8. Chris, I honestly think you should stick with Stata, provided it does what you want and you are prepared to pay for it. Your reasons for exploring R make it sound like the latest "fashion".

    Don't get me wrong, I am equally a fan of both Stata and R, but R can be painful and unreliable at times. For example, updating R requires downloading the latest complete version and then downloading the latest version of every extra package you need. This can happen several times a year, and there is no guarantee your update won't be buggy. Updating Stata simply requires typing the command "update" and, if required, buying the latest version every few years.

    In my view, Stata is the easiest of the comprehensive stats packages to use: similar to SPSS, but much easier than SAS and R. And StataCorp works hard to keep it up to date, which has been a problem with SPSS in the past.

    And support for Stata is almost as good as that for SAS, although it is not needed as often, since the manuals and books are excellent. Support for R often depends on the kindness of strangers who are paid to do other things.

  9. Some simple differences seem to have been at least partly overlooked in the generally wide-ranging discussion in this and other related threads centred on R.

    A common criticism is that statistical payware (SAS, Stata, SPSS, etc.) is often not up-to-date in terms of cutting-edge procedures. This comment is frequently correct, but the situation may well be a deliberate consequence of companies thinking on a longer time-scale and from a different perspective than researchers:

    1. What's "hot" now (often meaning deemed "hot" by its author and some like-minded people) frequently fades quickly into obscurity or is soon superseded by something else. If companies always tried to add every bit of new stuff, they would be adding many procedures of transient worth. (And they would also get even more criticism for bloated products.) It's often prudent for companies to wait a while and see what emerges from the fray, especially if people in a given area cannot agree on what is "best" (usually the case).

    2. Any individual knowledgeable about some part of modern statistical science can usually look at a commercial package and list half-a-dozen omissions from their viewpoint. Put that all together across the field, then you probably have a collective wishlist of easily a thousand procedures. Adding all that to even a big commercial package is likely to be a bad idea for several simple but good reasons. And a company typically has no inclination to regard _your_ wish-list as especially compelling. (If you are the kind of person who could list half-a-thousand omissions, you strengthen my point.)

    3. Kind and degree of support are clearly different. How many people who write an R package now regard themselves as committed to supporting that package indefinitely? That's got to be more nearly the attitude of companies (or else they get flak for dropping a procedure).

    4. Researchers who can write good code quickly often underestimate the effort needed to write excellent code matched by certification scripts, plus a GUI front-end when one is required, plus good documentation, plus support within the company for the support people! Anyone who has read a journal paper on an R-based project implying that the crucial details are in the online help, alongside online help implying that the crucial details are in the journal paper, neither being true, will have felt a difference in standards biting rather hard! (Of course, you _can_ look at the code, and a jolly good thing that is.)

    5. Companies and packages vary, but at least SAS historically and Stata increasingly see much of their mission as providing tools for competent users to write extras themselves. (This has also applied to varying degrees to Gauss, Matlab, etc.) Thus the payware/freeware distinction is not sharp but fuzzy: if you buy, e.g., Stata, you simultaneously get access to one or two thousand user-written Stata packages in the public domain. They are of very little use without Stata, except insofar as they could be translated into something else, but at least a fraction of each community operates largely under open-source principles, and people who buy a product are buying into the expertise of thousands of user-programmers. (Of course, R is all that and free.)

  10. Personally, I recommend using the above-mentioned books or references in an applied setting. Chances are your university or prior research has provided a number of homework problems or analyses that you require your grad students to complete (or that you have completed before, using a software package you "trust").

    My tactic for learning R several years ago was simply to replicate the same homework results in R, referring back to my (graded) homework that I had completed using SPSS. Alternatively, take some of your own statistical analyses and try to replicate them in R. This will likely bring to light some of the issues inherent in the switch, while giving you an achievable task whose known answers let you verify that you've got it.
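
The half-remembered advice in comment 5 can be pinned down with a little algebra. In R, solve(a, b) returns the x satisfying a %*% x = b, that is a^{-1}b, so solve(B, A) gives B^{-1}A rather than AB^{-1}. Transposing turns the right-hand factor into a left-hand one:

```latex
X = A B^{-1}
\iff X B = A
\iff B^{\top} X^{\top} = A^{\top}
```

So, assuming B is square and invertible, AB^{-1} can be computed as t(solve(t(B), t(A))), which solves the linear system directly instead of forming B^{-1} and then multiplying. This is a sketch of the standard identity rather than something stated in the thread; for small matrices any speed difference may well be negligible.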
