He wants to know what book to read to learn statistics

Tim Gilmour writes:

I’m an early 40s guy in Los Angeles, and I’m sort of sending myself back to school, specifically in statistics — not taking classes, just working through things on my own. Though I haven’t really used math much since undergrad, a number of my personal interests (primarily epistemology) would be much better served by a good knowledge of statistics.

I was wondering if you could recommend a solid, undergrad level intro to statistics book? While I’ve seen tons of options on the net, I don’t really have the experiential basis to choose among them effectively.

My reply: Rather than reading an intro stat book, I suggest you read a book in some area of interest to you that uses statistics. For example, Bob Carpenter is always recommending Jim Albert’s book on baseball. But if you’re interested in epidemiology, then maybe best to read a book on that subject. Sander Greenland wrote an epidemiology textbook; I haven’t read it all the way through, but Sander knows what he’s talking about, so it could be a good place to start.

If you had to read one statistics book right now, I’d suggest my book with Jennifer Hill. It’s not quite an intro book but we pretty much start from scratch.

Readers might have other suggestions.

74 thoughts on “He wants to know what book to read to learn statistics”

  1. I think Albert’s book, and really any sabermetric materials, including some websites that discuss defensive ratings, win shares, etc., would serve a philosophical interest in knowledge. Example: look at how win shares developed as a method for estimating a player’s impact on a team, or how defensive zone ratings have improved as they’ve integrated better data about exactly where a ball is hit, how long it is in the air, etc. That way you connect the concept of what you’re trying to know with better measurements and the associated issues of data-integrity limits.

  2. I think he said “epistemology” as opposed to “epidemiology”. He’s a philosopher, so probably philosophy of science/mathematics type books are what he’s seeking. I hear E. T. Jaynes’ Probability Theory: The Logic of Science is a great start for this sort of reading…

  3. I’d suggest Bill Cleveland’s books, The Elements of Graphing Data and Visualizing Data, but that’s because I gave up on formal statistics long ago: I have yet to deal with any data set that approaches the normal distribution.

  4. In this vein, would Andrew or any of the commentariat have book recommendations for someone interested in getting a strong grounding in statistics with a particular interest in economics and consumer finance?

  5. It’s very difficult to educate yourself just by reading a book or two or three. You need objective feedback on your mistakes, and there will be many. I would suggest doing the online Graduate Certificate in Statistics from Sheffield (one year). It’s worth it. If you survive that, I suggest doing the MSc (two or three years), also delivered online. They have three specializations: Medical Statistics, Statistics, and Financial Statistics. The last one has special math requirements; you can’t be admitted unless you prove you have them.

    I reviewed both courses here:

    http://vasishth-statistics.blogspot.de/2011/12/part-1-of-2-review-of-graduate.html

    http://vasishth-statistics.blogspot.de/2015/02/getting-statistics-education-review-of.html

    • I read through your review of the Sheffield course as I have a colleague who is looking for something like this. I found:

      “Another amazing fact is that frequentist statistics is standard practice in medicine. I would have expected that Bayesian stats would dominate in such a vitally important application of statistics. I am willing to use p-values to make a binary decision to help a journal editor feel good about the paper, but not if I am deciding whether drug X will help stave off death for a patient. I am really glad that I do not need to enter the job market as a statistician. If I were starting out my career after finishing this degree, I would probably have gone into a pharma company, and it is horrifying to think that I would be forced to deliver p-values as a decision-making tool.”

      Well said sir!

      • Yeah, what is going on in medical statistics? It was absolutely insane to see the med stats books packed with p-value based decisions.

        I hope that Frank Harrell can force some change in at least the medical regulatory bodies.

        • Well, some of us are trying to do something about it! It’s an uphill struggle though. I think a large part of the issue is that frequentism in general, and NHST in particular, was the (very) dominant ideology at the time that evidence-based medicine developed (at least for the people involved at that time), so that became the standard methodology for medical research. There has been a tendency for methods to solidify around certain procedures and ways of doing things, possibly because the majority of (research) practitioners in the field are not particularly statistically literate, so they prefer to adopt standard methodologies. One consequence of this is that change is slow. There are lots of reasons why it’s hard to get people to change practice – one research idea I’ve thought about is exploring the many different barriers to change and how they could be overcome, but I wouldn’t really know where to start with that (need a proper social scientist!).

        • > medical regulatory bodies
          Don Berry did a lot on that, but learned that _standard_ Bayes was not appropriate for regulatory bodies, as they need to assess error rates – so some sort of calibrated Bayes was necessary.

          He wrote this up in a couple of papers I can’t locate quickly ;-)

        • A while ago I heard a talk by Scott Berry (Don’s son and colleague) with a title something like “Being a Bayesian in a Frequentist World,” which I think pretty well expresses the problem.

          (Scott, together with Brad Carlin, Jack Lee, and Peter Mueller, has a 2010 CRC Press book, Bayesian Adaptive Methods for Clinical Trials.)

        • Keith, can you expand on this, why do they *need* to assess error rates, and why is standard bayes not appropriate for assessing error rates?

          In my naive view here, an error rate is meaningless by itself. Let me give an example: suppose you have a drug whose effect on cholesterol is X on average. You might, for example, have 50% of people seeing less than X and 50% seeing more than X. So if you want to know the “negative” error rate, it’s 50%! But if the 50% who see less than X see only negligibly less, then it doesn’t matter!

          So by itself an error rate seems meaningless; the only meaningful thing is the rate of errors of a certain size times the size of those errors… which immediately leads to Bayesian decision theory.
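
          A minimal numeric sketch of that point (all numbers hypothetical, plain Python): the raw error rate ignores the sizes of the errors, while the expected loss weights each error by its size.

            # Hypothetical illustration: raw error rate vs. expected loss.
            # Each pair is (probability, shortfall below X) -- made-up numbers.
            outcomes = [
                (0.50, 0.1),  # 50% of patients fall negligibly (0.1 units) short of X
                (0.50, 0.0),  # 50% meet or exceed X (no shortfall)
            ]

            error_rate = sum(p for p, shortfall in outcomes if shortfall > 0)
            expected_loss = sum(p * shortfall for p, shortfall in outcomes)

            print(error_rate)     # 0.5  -- looks alarming on its own
            print(expected_loss)  # 0.05 -- negligible once size is accounted for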

          Now, if you’re saying “the law requires that the regulatory body assess error rates,” then to me this is just like saying “lawyers made a mistake, and the law needs to get changed.”

        • The argument (from Scott Berry et al.’s book, and from memory) is that the aim of regulatory bodies is to ensure that only a small percentage of the drugs that get approved don’t actually work, so a low Type I error rate is what they are trying to ensure.

          I don’t really like that argument, for a couple of reasons: it relies on the dichotomy that drugs “work” or “don’t work,” which doesn’t reflect messy reality (and has been much criticised by Prof Gelman among others), and it doesn’t take account of severity of conditions or toxicity of drugs and all of that, which surely should play a role in regulators’ overall decisions and strategy. I’m sure regulators would be more bothered about Type I errors in really toxic cancer treatments than in non-toxic treatments for warts, for example. So I don’t really buy the argument that they are (or should be) pursuing a single overall Type I error rate.

        • Yep. I agree with all of this. Further, Wald’s theorem ultimately says that if you want to minimize “Type I” error, the class of techniques you should use is basically Bayesian stats with a zero-one cost function. So yeah, if you like zero-one cost functions, then by all means go ahead, but you still need to use Bayes ;-)
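
          A tiny sketch of the zero-one-loss point (posterior numbers made up): under a zero-one cost function, the posterior expected loss of deciding on h is 1 − P(h), so the Bayes-optimal decision is just the posterior mode.

            # Hypothetical posterior probabilities over three conclusions.
            posterior = {"drug works": 0.7, "no effect": 0.2, "drug harms": 0.1}

            # Zero-one loss: cost 1 for a wrong call, 0 for a right one, so the
            # expected loss of deciding on h is 1 - P(h).
            expected_loss = {h: 1.0 - p for h, p in posterior.items()}

            decision = min(expected_loss, key=expected_loss.get)
            print(decision)  # "drug works" -- the posterior mode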

        • Thanks Simon (though I should try to find one of his papers that discusses this).

          It was the error rates of the regulatory agency’s decisions – approving things they shouldn’t and not approving things they should (related to “Type 1” errors but more general).

          They are also not in a position to claim results are “consistent with the assumptions we accepted,” because they should know those assumptions are not just wrong but probably not what they would accept if they really understood what those who did the research did.

          So one needs to consider what “usually” happens when a given Bayesian method is used by an applicant seeking approval over repeated applications by various applicants.

          What the FDA Guidance went for was to require a simulation drawing from a fixed parameter under the null and a fixed parameter under the alternative, with, if I understand correctly, a rationale required if the approval rate exceeds the usual 5% under the first or falls below 80% under the second.
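
          For concreteness, here is a rough sketch of that kind of check (the decision rule, prior, and numbers are my own toy choices, not the actual guidance): fix one parameter under the null and one under the alternative, run a Bayesian approval rule repeatedly, and record its frequentist operating characteristics.

            import numpy as np
            from scipy.stats import beta

            rng = np.random.default_rng(1)

            def approve(successes, n, threshold=0.975):
                # Toy Bayesian rule: approve if the posterior probability that
                # the response rate exceeds 0.5 is high (Beta(1,1) prior).
                return beta(1 + successes, 1 + n - successes).sf(0.5) > threshold

            n, sims = 100, 2000
            null_rate, alt_rate = 0.5, 0.65  # fixed parameters: null and alternative

            type_i = np.mean([approve(rng.binomial(n, null_rate), n) for _ in range(sims)])
            power = np.mean([approve(rng.binomial(n, alt_rate), n) for _ in range(sims)])

            print(f"approval rate under the null: {type_i:.3f} (want <= 0.05)")
            print(f"approval rate under the alt:  {power:.3f} (want >= 0.80)")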

          I think something like this would be better: http://statmodeling.stat.columbia.edu/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/ But something is definitely needed beyond taking the posterior probabilities as relevant and using them literally in decisions.

        • Keith: what I hear from you is in essence “don’t let the fox choose the lock on the henhouse” and I can see how that is a clear issue. You can’t just accept that the model the pharma company gives you is the right one to use… But saying that you can’t accept the fox’s model and saying that you can’t accept any method in which you need to choose a model (likelihood and prior) are two different things.

        • > “saying that you can’t accept any method in which you need to choose a model (likelihood and prior)”
          Certainly did not mean to suggest that – rather that one needs to do more than just deduce the consequences of the likelihood(s) and prior(s) chosen (aka getting and interpreting the posterior(s) as “prima facie” relevant).

  6. If you’re into epistemology, “The Nature of Scientific Evidence,” edited by Taper and Lele, might be of interest. It’s a philosophical back-and-forth discussion between frequentists, likelihood-ratio(ists/ans?) and Bayesians. Also, “The Ecological Detective” by Hilborn and Mangel is a rather entertaining intro to flexible modelling, with an ecological slant of course. Neither is a typical stats textbook though.

    Otherwise, I second the recommendations of “Statistical Rethinking” and Gelman and Hill’s “Data Analysis…”, both of which have really good intro chapters on some basic concepts.

    I haven’t come across a good resource that mixes nuanced modern philosophy of science with nuanced modern statistics, and I am curious to know if anyone else has. When I took a philosophy of science course as a graduate student a couple of years ago, I found there was a large disconnect between the philosophy and the practices of science, not least in the data analysis.

    So good luck bridging the gap!

    • >>> large disconnect between the philosophy and practices of science<<<

      If only philosophers had to practice science & get their hands dirty before they dabbled in philosophy, we might make some progress on bridging the gap.

    • Ian Hacking’s ‘An Introduction to Probability and Inductive Logic’ might be a good first foray into probability, statistics, and the philosophy of science for those who are philosophically inclined. It covers a great deal of material, is accessible, and wouldn’t require brushing up on mathematics. It could easily be studied in parallel with a more mathematically focused text, or perhaps as a prequel to one.

  7. > read a book in some area of interest to you that uses statistics. […] if you’re interested in epidemiology, then maybe best to read a book on that subject.

    Please recommend books for a statistics-curious, data-minded linguist! (There are a bunch of stats+ling books and I’ve read—or at least gone through—a few, but I’m interested in recommendations, incl. on the basics. I can program in Python, I can use R if I have to, and I like Tufte.)

  8. My path into (Bayesian) statistics has been helped by having had a rigorous course in probability. Calculus and matrix algebra are also highly useful, but once you appreciate that Bayesian inference is extending probability theory to parameter and model uncertainty, things get easier. I also find I often understand non-Bayesian techniques better by reformulating them in a Bayesian framework, e.g. lasso → Laplace priors, etc.
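
    For the lasso example, the correspondence is the standard MAP argument (a sketch, in my own notation): with a normal likelihood and independent Laplace priors on the coefficients, the negative log-posterior is, up to constants, exactly the lasso objective.

      % Sketch: lasso as MAP estimation under a Laplace prior.
      % Model: y | beta ~ N(X beta, sigma^2 I); prior: beta_j ~ Laplace(0, b).
      -\log p(\beta \mid y)
        = \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
        + \frac{1}{b}\sum_j \lvert \beta_j \rvert
        + \text{const}
      % Minimizing this is the lasso problem with lambda = sigma^2 / b,
      % so the lasso estimate is the posterior mode under that prior.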

    • I agree. I made basically no headway in Bayesian statistics (I stopped reading Gelman et al’s first edition BDA book almost immediately) until I did a course in calculus, probability, and matrix algebra, all geared towards applications in statistics. Gelman and Hill was very helpful but I still only had a vague idea about what was going on and only understood the details after reading textbooks like Lynch (after the math etc courses).

      • Hi Shravan and Chris,

        can you recommend a good course in matrix algebra? I already have a good foundation in probability theory, and a good background in calculus from school, but not enough knowledge of matrix algebra, and I’ve noticed it in many books for applied statistics (I want to apply statistics in the social sciences but also to analyze data about e.g. microbiota). Is matrix algebra really necessary to understand e.g. Andrew’s book with Jennifer Hill? If so, it would be cool if someone knew a good course that covers all the basics needed without going into too much detail about what I don’t need….

        Thanks

  9. I think if you posed the same question to the “machine learning” community, they would be more likely to recommend “An Introduction to Statistical Learning” by James, Witten, Hastie, and Tibshirani, or the very similar (longer) “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman. Always interesting (as an outsider) to see how these similar fields have different philosophies.

  10. Wow, what a response! Thanks, first off to Andrew, but also to the rest of you who posted in this thread — some great info!

    A bit about me: As some of you have noticed, I do have a philosophy background; no, I’m not a philosopher, but it is a core interest of mine. I’m generally interested in rationality and decision making, and statistics is an important part of that, hence my interest. I asked for an intro book as my math is quite rusty, though I assume I will have to refresh calc and the rest of it to make any serious headway. Luckily for me, my wife is a mathematician, and one of my best friends is an econometrics PhD, so I can get help when needed.

    I will definitely check out all the books in this thread though, both the more applied and more philosophical ones. Thanks again everybody!

    • Tim: it could be helpful for someone like you to know that there is a major tension in statistics between two schools of thought.

      The first, frequentist statistics, is at heart about determining what random number generators might do, and trying to see whether some observed data values could be like the output of these random number generators. This leads to p-values and to hypothesis testing.

      The second is about quantifying what is more or less likely to be true based on some initially assigned knowledge, and leads to Bayesian inference and its more restricted form, likelihood-based inference.

      These two things are fundamentally different *in terms of their meaning*. If a Bayesian says “there is a 75% probability that it will rain today” it ultimately means something like “under some assumptions about the physics of the weather, and some data about what it was like yesterday, and some range of plausible measurement errors and values for key atmospheric variables…. 75% of the weight of evidence suggests at least some rain”.

      Whereas a frequentist who tells you 75% chance of rain means “under some assumptions about which random number generators generate outcomes that are similar to the actual historical weather, 75% of the random numbers we generate for tomorrow’s weather show rain.”

      So when you go looking for info on statistics, keep in mind that some people put probability weight on different logical possibilities, while others put probability only on how often events would happen if they came out of a random number generator. This difference fundamentally makes for two entirely separate fields, both of which call themselves statistics.
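
      To make the contrast concrete, here’s a toy sketch (my own, with made-up data: 7 heads in 10 coin flips). The frequentist number asks how often a fair-coin random number generator would produce data at least this extreme; the Bayesian number puts a distribution on the coin’s bias itself.

        from scipy.stats import beta, binom

        heads, n = 7, 10

        # Frequentist: one-sided p-value under the fair-coin "random number
        # generator" hypothesis, P(X >= 7 | p = 0.5).
        p_value = binom(n, 0.5).sf(heads - 1)

        # Bayesian: uniform Beta(1,1) prior on the bias; the posterior is
        # Beta(1 + heads, 1 + tails); weight of evidence favoring a heads bias.
        p_biased = beta(1 + heads, 1 + n - heads).sf(0.5)

        print(f"p-value under the fair-coin RNG: {p_value:.3f}")  # ~0.172
        print(f"posterior P(bias > 0.5):         {p_biased:.3f}")  # ~0.887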

      • Thanks Daniel — this much I do understand, primarily thanks to James, my econometrician buddy. I’m really interested in causality, epistemology, and experimental methods, in terms of figuring out what is really going on in the world. Well, as much as we are able to figure out with any degree of confidence :) I tend more towards the Bayesian side of things, though as I am not really qualified to make authoritative statements / decisions on the topic, it’s more of a gut feeling than a practical or educated preference.

        I look at learning (and relearning) stats as equipping my mind with the tools to make better decisions, both in my business and personal lives. And to be able to think about and read scientific papers without pestering my wife about the math all the time.

    • If you decide to read Jaynes, I’ll be curious how that goes for you. I found his book irritating after a short while, and gave up rather quickly. As somebody said above, it comes with a hefty dose of polemics, and what felt to me like a lot of mathematical asides/irrelevancies. Michael Betancourt’s document (I linked above) is a much better bridge from probability theory into statistical inference, IMO (disclaimer: I consider myself a literate user of statistical and quantitative methods, not a mathematician/statistician).

        • Hmm… I’m going to object, but I won’t go into the details of my objection; instead I’ll elaborate on why it might be that you have this impression.

          Both Jaynes and Rand were writing *against* a particular popular established “truth.” For Jaynes, it was against the interpretation of statistics as being about the properties of random number generators; for Rand, it was against the rising popularity of communism among a group of social revolutionaries during the period starting with the crash of 1929 and continuing through the end of the Second World War, influenced greatly by the Depression and so forth.

          Jaynes had seen how the view of statistics as fundamentally about the frequency properties of random number generators removed the possibility of incorporating physics into mathematical models of the world, and he was a physicist and took great umbrage at this….

          Rand had seen how the reality of communism gutted the country where she was born and destroyed the real wealth of millions of people, and took great umbrage at this….

          So, they have similar kinds of motivations and similar vehemence… beyond that I don’t think comparisons are really apt.

        • In particular, Rand made unverifiable claims about morality (really it came down to “I believe X”), whereas Jaynes made externally verifiable claims about mathematical and physical calculations. So Rand’s group is following a “cult of personality” making claims about “what is the good and the true,” while Jaynes’s group is following along with pencil and paper, verifying the mathematical properties of a system of calculation…

          At some point you could argue that “Calculating with Bayesian probabilities” is itself a cult, but then by the same methods you could also argue that “algebra” is itself a cult, so I don’t think it gets you anywhere useful.

        • Even Andrew could be thought of as writing against the same “particular popular established “truth””, right?

          But it isn’t annoying to read Andrew. I think it’s in the writing style. Andrew comes across as factual, practical and persuasive. Jaynes comes across as haughty and polemical.

          Perhaps it’s a subjective thing: One reader’s irritation can be another’s ecstasy.

          PS. Another author I put in the Jaynes category is Judea Pearl.

      • I’m making a list of everything posted in this thread, and will be checking them all out. The Pearl work on causality I already have on my reading list, and I think I’ve had the Jaynes book recommended to me as well, but haven’t checked it out at all yet. The Betancourt looks accessible on a first scan too, and it seems I still remember some symbolic logic; whatever I’ve forgotten I can relearn quite quickly.

        • You can get the first chapters of the Jaynes book through Larry Bretthorst’s website (http://bayes.wustl.edu/): http://bayes.wustl.edu/etj/prob/book.pdf . It may be all you need to get a feel for the epistemological thrust. Similarly for McElreath, whose first chapter (freely available here: http://xcelab.net/rm/statistical-rethinking/) gives a useful overview of the link between “truth,” philosophy of science, and where statistics fits in.

          I’d say you might as well avoid most standard stats books. As an epistemologist, you are likely to be disappointed by the lack of any meaningful justification for the methods, which seems like a key to what you are after.

  11. I’d recommend the self-study exam series of the Royal Statistical Society.

    They provide a syllabus, study guide, literature recommendations, and old exams, and then allow you to sit the exam (without any other formal qualification – just need to pay, obviously).

    Depending on the desired level, you can take the Ordinary Certificate, Higher Certificate, or Graduate Diploma (where the latter is “equivalent to that of a good UK honours degree in statistics”).

    The reader writing in does not want to take classes, “just working through things on my own.” In my experience (I took the Graduate Diploma), this is sort of the best of both worlds: self-study, but structured, and with a certification.

    http://www.rss.org.uk/RSS/pro_dev/Examinations/RSS/pro_dev/Examinations_sub/Examinations.aspx

  12. Tim, actually, having read all the comments, including yours, I’m not so sure anymore – the RSS exams don’t cover the philosophical side so much, they’re more a (good) introduction to the canonical stuff (foundations, several applications). Well, at any rate, check it out – the syllabi and literature recommendations alone might be helpful.

  13. I think you might like Stigler’s The History of Statistics. For me it helped clarify a great deal, both of things that never made that much sense to me and of certain kinds of debates. It helped me see the kinds of problems different people were trying to solve and how that shaped things.

    If you really want an intro-level statistics book, either to plow through on your own or as a reference book, most of the suggestions here are too advanced: they assume a level of baseline knowledge. So the question is whether you feel you have fluency in those concepts. Unfortunately, friends and family won’t get you that, even if they can help with getting through a chapter. You can roughly sort books by whether they assume you are fluent in linear algebra, fluent in calculus, both, or neither.

    When it comes to intro books, the basic problem is that most of them are written for college students doing the one-semester required course who will quickly flee. The books “cover the content” but don’t necessarily give you a framework for moving forward. They are often awful and basically race you through a bunch of formulas and tests. Having done some looking last year at books for undergraduate beginners: if you feel like you need the real, I-know-nothing basics, in some ways the GAISE-inspired texts such as Agresti and Franklin are the best. They won’t hold you back once you head into more challenging material.

  14. I’d put in a suggestion for two more free resources from Cosma Shalizi. The more introductory one is a short set of notes on probability theory and statistics; it basically doesn’t tell you any of the how or why, but it does cover, in a very easy-to-follow way, all the basic concepts that you can then go and learn about in depth from other sources. Its greatest virtue is that it’s free and (fairly) non-indoctrinating.

    It’s here: http://bactra.org/prob-notes/srl.pdf

    The next is the truly fantastic book he has written, ‘Advanced Data Analysis from an Elementary Point of View’. He says it assumes previous courses in calculus, linear algebra and intro stats, but to be honest I think you can get a lot of it without any of those. You maybe won’t be able to follow the derivations without those prerequisites (there is a lot of calculus and matrix algebra), but the actual virtue of the book is teaching how all these concepts fit together. It’s like a map of all of statistics: it doesn’t take you down deep into any valley, but it tells you what the valley is (e.g. linear regression), why someone thought of it in the first place (the concept of a general regression function + linearity is the simplest choice of dependence), his take on what you should and shouldn’t think about it now (it is finding the (predictively) optimal linear approximation to the regression function; it is not how the variables are actually related, and it says nothing about whether the relation is causal, nor whether it is a *good* approximation), and how it is related to other ideas (linear smoothing, nonlinear dependence, local linear regression, structural models and many others). I would probably say it is mostly not philosophical, but it is *the* answer to ‘I have some data in front of me, what now?’ (a toy sketch of the regression point follows the link below).

    It’s here: http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
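
    As a taste of that framing, a toy sketch of the linear-regression point (my own example, not from the book): fit a line to data whose true dependence is nonlinear. OLS still returns the predictively optimal linear approximation; it just isn’t the “true” relationship.

      import numpy as np

      rng = np.random.default_rng(0)

      # Nonlinear truth: E[y | x] = sin(x), observed with noise.
      x = rng.uniform(0, 3, size=500)
      y = np.sin(x) + rng.normal(scale=0.1, size=500)

      # OLS: the optimal *linear* approximation to the regression function.
      slope, intercept = np.polyfit(x, y, deg=1)
      print(f"best linear approximation: y ~ {intercept:.2f} + {slope:.2f} x")

      # The line minimizes squared prediction error among all lines, but that
      # says nothing about whether the dependence is truly linear (here it isn't).
      mse = np.mean((y - (intercept + slope * x)) ** 2)
      print(f"mean squared error of the line: {mse:.3f}")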

    Once you’re ready to start worrying about philosophy and particularly the frequentist/bayesian question here’s a quick list of things that might be interesting (Disclaimer: I have not read all of them, certainly not learnt everything in all of them):

    – Some of the writings of Fisher, Neyman, CS Peirce and others
    – Jaynes’ Logic of Science (discussed above)
    – Judea Pearl or the Spirtes/Glymour/Scheines book on causation
    – Deborah Mayo ‘Error and the Growth of Experimental Knowledge’
    – Leonard Savage ‘Foundations of Statistics’
    – Lots of stuff on Bayesian consistency (http://bactra.org/notebooks/bayesian-consistency.html)
    – Anything Cosma Shalizi writes about Bayesianism (http://bactra.org/weblog/cat_bayes.html)
    – some important results such as (in no particular order and obviously incomplete) Cox’s Theorem, Von Neumann–Morgenstern utility theorem, Bernstein–von Mises theorem, Wald’s theorem, Birnbaum’s result on the likelihood principle, de Finetti’s theorem (hmm coming in a little heavy on the Bayesian side here…)
    – Some other approaches such as Geisser’s Predictive Inference and Laurie Davies’ approach which is based on weak topologies on spaces of probability measures induced by metrics like total variation distance (afaik. His stuff is sometimes hard to follow…)

    I hope that’s a fairly balanced representation of the area with good arguments and ideas on both sides.
