This graduate student wants to learn statistics to be a better policy analyst

Someone writes:

I’m getting a doctoral degree in social science. I previously worked for a data analytics and research organization where I supported policy analysis and strategic planning. I have skills in post-data visualization analysis but am not able to go into an organization, take raw data, and turn it into something usable. I’m planning to use my elective credits to focus on statistical analysis so that I can do just that.

I heard about the work you’re doing after listening to your EconTalk episode and want to learn more about issues in using quantitative research in the social sciences (and try to connect it as much as possible to the field of education). I have the option to create an independent study, but there isn’t anyone at my institution who is familiar with this work and able to construct a plan for it. I would love some advice on how you think I might construct an independent study focused on these concepts (as well as thoughts on the background knowledge and stats skills I would need to understand the material). Any suggestions you can send my way would be much appreciated.

My reply: I’m not sure, but as a start you might try working with my forthcoming book, Regression and Other Stories (coauthors Jennifer Hill and Aki Vehtari) and my edited book from a few years ago, A Quantitative Tour of the Social Sciences (with Jeronimo Cortina).

Maybe the commenters have additional suggestions?

15 thoughts on “This graduate student wants to learn statistics to be a better policy analyst”

  1. Funny you should ask. We here at Florida State University offer a degree that would exactly fit your needs: https://education.fsu.edu/measurement-and-stats.

    Seriously, there are a number of educational measurement programs that have quality offerings, maybe one at your university.

    If you are serious about education, you absolutely need to take a course or two in psychometrics and test design so you understand the limitations of educational measurement. Our program, at least, also offers data analysis courses which are (a) focused on education, and (b) more applied (focused on interpretation rather than theory) than those offered by the Stat department.

    BTW, I do use Andy & Jennifer’s book for two of my courses (I’m waiting eagerly for the new edition).

  2. I would suggest looking into the quantitative track in the psychology department. You want to get a solid foundation comparing groups with respect to outcome variables of interest, while adjusting for confounding variables, from ANOVA and Regression perspectives (General Linear Model). Once you have developed this skill set, take a calculus course concurrently with a multivariate stats course. (This is also a good point to start learning a couple of statistical software tools.) Next, take a course on measurement and scaling, which should cover both classical and latent trait perspectives, and alongside it take applied education courses (e.g., motivation to learn, standards-based education, curriculum-based assessment). Next, take a linear algebra course alongside a stochastic methods and data analysis course (possibly offered in the MBA program), which will get you comfortable with probability distributions and simulations. Read the book “Visualizing Data” by William Cleveland. This foundation should enable you to learn independently from books.

    • > get you comfortable with probability distributions and simulations
      Agree with that.

      > adjusting for confounding variables, from ANOVA and Regression perspectives
      Can’t agree with that without the “Other Stories,” at least as it’s usually done. Here is something I saw from a recent graduate of a master’s program in statistics: “variables selected if p-value < 0.3 in univariate analysis with a p-value < 0.05 in final analysis.”

      Perhaps interestingly, in James Robins’s talk “The ‘Causal Revolution’ in Epidemiology and Medicine: History, Controversies and Future,” he referred to the (distant?) past when Epi was just regression: put the treatment and confounders in a regression model.
      https://harvard.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=8f66a523-192d-4bce-950f-aaff0175d748

      Most places seem to be so far behind what needs to be taught to avoid many of the mistakes of the past…

      • > Most places seem to be so far behind what needs to be taught to avoid many of the mistakes of the past…

        While obviously different people have different experiences, my own (as both instructor and student) is that most quantitative psych/applied stats *courses* and their instructors actually do advocate good practices and expose students to important problems with how methods are misused in scientific discourse.

        I find that mistakes arise when those students proceed to use those techniques on their own outside the structure of the course and either discover for themselves—or worse, are directed by their advisors—that shortcuts and lazy thinking that wouldn’t have passed muster in the course are actually accepted and even rewarded in the literature.

        From my perspective, I don’t think it is “places” that are far behind per se, so much as it is that students are *dragged* behind by the weight of decades’ worth of mistakes that live on in the published literature but that they do not feel comfortable cutting loose.

        Of course, one of the big values of this blog is to act as a counterweight to those mistakes—not least through your own commentary!

  3. I’d say 75% of the work in “go[ing] into an organization, tak[ing] raw data, and turn[ing] it into something usable” is data extraction and cleaning. This requires solid knowledge of the programming mechanics of something like R (my go-to is the excellent data.table package) or Python, and then some familiarity with databases, SQL, and cloud products. The stats knowledge comes in after you have a proper dataset, but it is also critical.

  4. Here’s a wrinkle: Suppose the grad student follows some or all of these very solid suggestions, graduates, and coauthors a paper showing that a one-hour intervention on 15 students improved standardized test scores a year later? After we hear the uncritical coverage on NPR, we will all constructively criticize his paper for using the methods and interpretations he was taught–after following our guidance.

    We all know that typical education and psych programs are rife with professors, courses and labs that teach null hypothesis testing, forking paths, using small samples and noisy measures, etc. Textbooks and software packages tend to reinforce, not correct, the most common misconceptions. Maybe the question isn’t “How does an education grad student get statistical training for future research?” but “How does an education grad student evaluate the training available, choose a program that isn’t repeating the sins of the past, and/or supplement with other resources?”

    I don’t know the answer to the second question–I’d say “Read the work of profs in the program or graduates or authors to make sure it follows high standards,” but evaluating methods is a skill gained by studying methods from someone who teaches them correctly. There are a number of people like Andrew whose blogs make clear what their priorities are, and programs like (presumably) the FSU program linked above, but what if you can’t transfer?

    Someone should come up with a review process, an approved list, or a set of online resources so that we don’t always have to correct bad learning after the fact–he said, knowing full well he would not be that person.

  5. If you’re going to do any education work, be ready to have almost everything you work for destroyed by the simple recognition of the gobs of assumptions you had to make for your study: assumptions you didn’t even realize you were making, and couldn’t possibly account for even if you knew you were making them.

    For anyone hoping to apply statistical analysis to social phenomena and ultimately policy prescriptions, keep in mind that you have about as much knowledge of what’s really going on as a Neandertal physicist had about the laws of motion. If you remember you know almost nothing, you might be able to learn a little teeny tiny something. But if you forget that, you’ll just be wasting everyone’s time and money, and possibly screwing everyone up even worse with stupid policies from bad research.

    • Speaking in defense of the Neanderthal physicist… you ever bring down a woolly mammoth with a knapped stone arrowhead? Just because you don’t know F=ma doesn’t mean it doesn’t work.

  6. Don’t overthink it, since the training pipeline is already reasonably well established.

    Use your elective credits to start working your way through the standard degree courses for computer science (with a focus on AI/ML) at your university, rather than trying to create a new independent study.

    (Incidentally, as we move into the 2020s, the above scales well as general background advice for any field, at the PhD, master’s, bachelor’s, and associate’s levels.)

    -Allen Schmaltz

    • The training pipeline is well established… it produces a steady stream of horrific trainwrecks…

      I’m with Michael Nelson above… the problem with doing this today is that the educational system in stats and social sci is at the heart of causing the problems we see.

      • Interesting. Well, I’m not up to date on the current state of the applied stats courses in economics, psychology, etc., but in my opinion the CS curriculum and training pipeline (which is the pipeline I was referring to) is fairly stable and good these days, at least in relative terms; note that the path to the ML/AI courses includes the probability and other prerequisites. Students will need that background in their careers anyway, so with their elective coursework they should optimize toward that objective while they have the opportunity to do so readily, particularly if, as you suggest, the applied stats and methodology coursework available at their university is shaky. Once they have that background, they can (probably) pick up the rest on their own.

        This is also orthogonal to whatever core courses they are taking in their own field (e.g., fields that are qualitative, or non-technical in this sense, such as economics or linguistics), which presumably they would still want to take in order to study whatever they’re studying. Actually, if that last part (taking those courses) isn’t true, I would recommend flipping it: mainly just do the CS degree, and make the other field the elective (or an independent study).

        (Of course, this is just my unsolicited opinion, offered as concrete advice for a student signing up for a class tomorrow, prompted by seeing this come up on this blog’s Twitter feed. Students should do whatever they want in the end. More speculatively, I’m also inclined to think that giving this advice to multiple generations of students might eventually resolve the broader issues you mentioned, to the extent they can be resolved, naturally and from the bottom up over a couple of decades or so.)

        -Allen Schmaltz

  7. I earned a doctorate in social psychology many years ago and work in data analytics. Like most people who went to grad school when I did, my stats training was poor and rife with all the problems detailed in this blog. I really wish I could go back to school, but it is a little late for that! So I’m trying to “re-train” and educate myself on my own time…one way is by reading this blog. But there are also some psych professors who are working to train the new generation in better ways. Here are a couple of good resources I found:

    Russ Poldrack’s textbook for psychology undergraduates (provides a good overview), available online: https://statsthinking21.org/

    Graduate statistics course for psych PhD students at Stanford: https://psych252.github.io/

  8. My advice would be to focus on:

    -any peaceful hobby
    -your studies and career and not advice from blogs
    -endless Bayesian vs frequentism battles
    -experimental design and survey design concepts. IMO, too often people think ‘data and then stuff happens’ but don’t think about how the data arose.

    Justin
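Comment 4’s scenario (a tiny intervention study whose striking result gets uncritical coverage) and comment 2’s advice to get comfortable with simulation fit together naturally. Here is a minimal Python sketch under made-up assumptions: the names `simulate_study` and `simulate_design` and the default numbers (two groups of 15 students, scores in standard-deviation units, a true effect of 0.1 SD) are hypothetical and chosen only for illustration. It simulates many such studies, keeps the ones that clear a conventional significance filter, and reports the power, the share of significant estimates with the wrong sign (type S error), and how much the significant estimates overstate the true effect on average (the exaggeration ratio, or type M error).

```python
import random
import statistics

# Hypothetical design assumptions (illustrative only, not from any real study):
# two groups of students, scores in standard-deviation units, a small true effect.
TRUE_EFFECT = 0.1
N_PER_GROUP = 15
NOISE_SD = 1.0


def simulate_study(true_effect, n_per_group, noise_sd, rng):
    """Simulate one two-group study and return (estimate, standard_error)."""
    control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_group)]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of a difference in means, using each group's sample variance.
    se = (statistics.variance(treated) / n_per_group
          + statistics.variance(control) / n_per_group) ** 0.5
    return estimate, se


def simulate_design(true_effect=TRUE_EFFECT, n_per_group=N_PER_GROUP,
                    noise_sd=NOISE_SD, n_sims=10_000, seed=1):
    """Return (power, type_s_rate, exaggeration_ratio) estimated by simulation.

    Assumes true_effect is nonzero (the exaggeration ratio divides by it).
    """
    rng = random.Random(seed)
    significant = []  # estimates that pass the |estimate / se| > 1.96 filter
    for _ in range(n_sims):
        est, se = simulate_study(true_effect, n_per_group, noise_sd, rng)
        if abs(est / se) > 1.96:  # rough normal-approximation "p < 0.05"
            significant.append(est)
    power = len(significant) / n_sims
    if not significant:
        return power, float("nan"), float("nan")
    # Share of significant estimates whose sign is opposite to the true effect.
    type_s = sum(e * true_effect < 0 for e in significant) / len(significant)
    # Average size of significant estimates relative to the true effect.
    exaggeration = statistics.fmean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, exaggeration


if __name__ == "__main__":
    power, type_s, exaggeration = simulate_design()
    print(f"power: {power:.3f}")
    print(f"wrong-sign share among significant results: {type_s:.3f}")
    print(f"exaggeration ratio: {exaggeration:.1f}")
```

Under these assumptions, power should come out low and the significant estimates should overstate the 0.1 SD effect several-fold, which is the trap comment 4 describes. Rerunning with, say, `simulate_design(true_effect=1.0, n_per_group=100)` should give power near 1 and an exaggeration ratio near 1. The fixed 1.96 cutoff with an estimated standard error is a rough normal approximation, not an exact t-test.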
