Tim Gilmour writes:

I’m an early 40s guy in Los Angeles, and I’m sort of sending myself back to school, specifically in statistics — not taking classes, just working through things on my own. Though I haven’t really used math much since undergrad, a number of my personal interests (primarily epistemology) would be much better served by a good knowledge of statistics.

I was wondering if you could recommend a solid, undergrad level intro to statistics book? While I’ve seen tons of options on the net, I don’t really have the experiential basis to choose among them effectively.

My reply: Rather than reading an intro stat book, I suggest you read a book in some area of interest to you that uses statistics. For example, Bob Carpenter is always recommending Jim Albert’s book on baseball. But if you’re interested in epidemiology, then maybe best to read a book on that subject. Sander Greenland wrote an epidemiology textbook; I haven’t read it all the way through, but Sander knows what he’s talking about, so it could be a good place to start.

If you had to read one statistics book right now, I’d suggest my book with Jennifer Hill. It’s not quite an intro book but we pretty much start from scratch.

Readers might have other suggestions.

I think Albert’s book and really any sabermetric materials, including some websites that discuss defensive ratings, win shares, etc., would serve a philosophical interest in knowledge. Example: look at how win shares developed as a method for estimating impact on a team, or at defensive zone player ratings as they’ve integrated better data about exactly where a ball is hit, how long it is in the air, etc., so you connect the concept of what you’re trying to know with better measurements and associated issues of data integrity limits.

I think he said “epistemology” as opposed to “epidemiology”. He’s a philosopher, so probably philosophy of science/mathematics type books are what he’s seeking. I hear E. T. Jaynes’ Probability Theory: The Logic of Science is a great start for this sort of reading…

Epistemology, epidemiology . . . they’re the same thing, right?

As Yakov Smirnoff never said: in Epistemology, the people study what is upon.

Jrc:

I was inspired by your comment to look up Yakov Smirnoff on Wikipedia. It’s a fascinating read. Maybe he can come out of retirement and do Trump/Putin jokes.

He doesn’t need to. He performs regularly in Branson, Missouri. He has his own theater there!

Yakov was funny, with a great routine based on inversions of American expectations. Back in the late 80s, at the height of his career, I saw him live in the Royal Oak (MI) Music Theater. Some local guy opened for him. Told some jokes about men and their tools. The opener was a riot; just about brought the house down, and I thought Yakov’s routine fell flat in comparison. I could never remember his name, or find it, but years later I came across this TV show, and there was the routine, replayed weekly. Tim Allen, very early in his career, had opened for Yakov.

Sander Greenland would agree to a degree. See paper reference in post scriptum here: http://statmodeling.stat.columbia.edu/2004/10/14/bayes_and_poppe/

Largely (especially the meta-analysis chapter), but epidemiologists have to learn many conventions and skills that are unlikely to help much outside of epidemiology.

Similarly, statisticians have to learn many conventions and skills that are unlikely to help much outside of statistics, especially outside of having to do their own analyses when they get messy.

The problem, as I think I have learned from trying to cut enough of that stuff out but not too much (e.g. with epidemiologists, toxicologists, etc.) is not knowing enough of the right stuff that has to be added so they get a not too wrong understanding of statistics. (Some material I used here https://phaneron0.wordpress.com/2012/11/23/two-stage-quincunx-2/ )

Having said that, Richard McElreath’s _Statistical Rethinking_ would be a good choice (though in it, it seems he had to enable his students to actually do their own analyses, and so included distracting material on techniques that are not ideal but rather doable by them).

(As an aside, Richard did mention to me that he tried out using Galton’s two-stage quincunx, similar to what I had used in the above link. He said he thought it would be ideal but the students did not take to it, though he thought he could make it work. To me, that means there is something conceptually important currently missing from his book.)

Also, I would suggest interleaving this stats learning with epistemology reading of authors like Susan Haack e.g. https://www.academia.edu/20018761/Legal_Probabilism_An_Epistemological_Dissent_2014_

(Andrew, feel free to give Tim my email if he wishes to discuss further.)

Richard McElreath’s _Statistical Rethinking: A Bayesian Course with Examples in R and Stan_ is a more modern, pedagogical take on Jaynesian thinking. It has a really unique mix of philosophy (“What do these calculations *really mean*?”) and practice (“How do I *really do* these calculations?”). I feel like Jaynes’ _Probability Theory_ is more to convince statisticians to be (Jaynesian) Bayesians than to teach non-statisticians (even the philosophically-inclined ones) to be statisticians.

+1 Jaynes is more polemic / philosophy than text.

+1.

McElreath’s is, imho, probably one of the 2 or 3 best stats books for applied scientists. There is barely any math.

In a philosophical tone William Briggs’ ‘Uncertainty’ is a fun read.

Definitely recommend this one. Richard also has YouTube videos to follow along, R code to download, and, if you ask nicely, solutions to many of the problems. https://www.youtube.com/playlist?list=PL7pGJQV-jlzDHpASeEC-L2SiX6bl2TE8k

That link brought me to a gaming website. I think this is the one to Richard McElreath’s _Statistical Rethinking_ lectures on YouTube:

https://www.youtube.com/watch?v=oy7Ks3YfbDg&list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc

Thanks, will check it out. I have a very surface knowledge of R, but have been doing software development forever, so using R to implement and illustrate the subject matter sounds like something I could get into.

I’m going to go ahead and recommend Jaynes’ book as well, definitely good for understanding the foundational concepts of Bayesian statistics.

Isn’t that like recommending a PhD level book to a kindergarten kid?

I agree. I tried to read Jaynes before I started to study statistics formally, and it felt quite remote from what I wanted to understand. There seemed to be a sub-text. I didn’t know what he was critiquing.

This guy did say he was interested in *Epistemology* which suggests he’s basically a philosophy student. I think Jaynes is more about the philosophy than about “how to do stats” so that’s the thinking behind my recommendation.

For a foundational perspective, as an alternative to Jaynes, I would also plug Michael Betancourt’s in-progress tutorial here: https://github.com/betanalpha/stan_intro/blob/master/stan_intro.pdf

@Daniel

I agree. In that sense you are right.

You summed it up precisely: Jaynes is more about the philosophy than about “how to do stats”.

Looking for both philosophical and mathematical books — gotta have the theory, but also know how to put it into practice. I’m not aiming to become a mathematician / statistician, simply to have an effective, practical grasp of the subject. Jaynes does look good though.

I’d suggest Bill Cleveland’s books, Elements of Graphing Data and Visualizing Data, but that’s because I gave up on formal statistics long ago because I have yet to deal with any data set that approaches the normal distribution.

In this vein, would Andrew or any of the commentariat have book recommendations for someone interested in getting a strong grounding in statistics with a particular interest in economics and consumer finance?

It’s very difficult to just educate yourself by reading a book or two or three. You need objective feedback on your mistakes, there will be many. I would suggest doing the online Graduate Certificate in Statistics from Sheffield (one year). It’s worth it. If you survive that, I suggest doing the MSc (two or three years), also delivered online. They have three specializations: Medical Statistics, Statistics, and Financial Statistics. The last one has special math requirements, you can’t be admitted unless you prove you have them.

I reviewed both courses here:

http://vasishth-statistics.blogspot.de/2011/12/part-1-of-2-review-of-graduate.html

http://vasishth-statistics.blogspot.de/2015/02/getting-statistics-education-review-of.html

I think the US has comparable programs taught online, although I don’t know how good they are. Maybe Andrew should start one and do it right.

I have some colleagues doing the online program at Oregon State and have heard positive reviews

I read through your review of the Sheffield course as I have a colleague who is looking for something like this. I found:

“Another amazing fact is that frequentist statistics is standard practice in medicine. I would have expected that Bayesian stats would dominate in such a vitally important application of statistics. I am willing to use p-values to make a binary decision to help a journal editor feel good about the paper, but not if I am deciding whether drug X will help stave off death for a patient. I am really glad that I do not need to enter the job market as a statistician. If I were starting out my career after finishing this degree, I would probably have done into a pharma company, and it is horrifying to think that I would be forced to deliver p-values as decision-making tool.”

Well said sir!

Yeah, what is going on in medical statistics? It was absolutely insane to see the med stats books packed with p-value based decisions.

I hope that Frank Harrell can force some change in at least the medical regulatory bodies.

Well, some of us are trying to do something about it! It’s an uphill struggle though. I think a large part of the issue is that frequentism in general and NHST in particular was the (very) dominant ideology at the time that evidence-based medicine developed (at least for the people involved at that time), so that became the standard methodology for medical research. There has been a tendency for method to solidify around certain procedures and ways of doing things, possibly because the majority of (research) practitioners in the field are not particularly statistically literate, so prefer to adopt standard methodologies. One consequence of this is that change is slow. There are lots of reasons why it’s hard to get people to change practice – one research idea I’ve thought about is exploring the many different barriers to change and how they could be overcome, but I wouldn’t really know where to start with that (need a proper social scientist!).

> medical regulatory bodies

Don Berry did a lot on that, but learned _standard_ Bayes was not appropriate for regulatory bodies as they need to assess error rates – so some sort of calibrated Bayes was necessary.

He wrote this up in a couple of papers I can’t locate quickly ;-)

A while ago I heard a talk by Scott Berry (Don’s son and colleague) with title something like, Being a Bayesian in a Frequentist World, which I think pretty well expresses the problem.

(Scott together with Brad Carlin, Jack Lee, and Peter Mueller have a 2010 CRC Press book, Bayesian Adaptive Methods for Clinical Trials.)

Keith, can you expand on this, why do they *need* to assess error rates, and why is standard bayes not appropriate for assessing error rates?

In my naive view here, an error rate is meaningless by itself. Let me give an example, suppose you have a drug and it has some effect on cholesterol of X. Now, you might for example have 50% of people less than X and 50% of people more than X. So if you want to know the “negative” error rate, it’s 50%!!!! But if the 50% who have less than X have negligibly less than X, then it doesn’t matter!

So, by itself it seems an error rate is meaningless, the only meaningful thing is rate of error of a certain size, times the size of that error… which immediately leads to Bayesian Decision Theory.
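A minimal sketch of the point above, in Python: two hypothetical populations of drug effects have about the same 50% rate of falling below X, but very different expected shortfalls. The threshold X = 10, the normal spreads, and the linear shortfall loss are all illustrative assumptions, not anything taken from the comment.

```python
import random

def error_rate_and_expected_loss(effects, threshold):
    """Return (share of effects below threshold, mean shortfall below threshold).

    The shortfall loss max(threshold - effect, 0) is an assumed, illustrative
    cost function; any other loss could be substituted.
    """
    shortfalls = [max(threshold - e, 0.0) for e in effects]
    error_rate = sum(s > 0 for s in shortfalls) / len(effects)
    expected_loss = sum(shortfalls) / len(effects)
    return error_rate, expected_loss

rng = random.Random(0)
X = 10.0  # hypothetical target effect

# Effects tightly clustered around X: the misses are negligible.
tight = [rng.gauss(X, 0.1) for _ in range(10_000)]
# Effects widely spread around X: the misses can be large.
wide = [rng.gauss(X, 5.0) for _ in range(10_000)]

rate_tight, loss_tight = error_rate_and_expected_loss(tight, X)
rate_wide, loss_wide = error_rate_and_expected_loss(wide, X)

print(f"tight: error rate {rate_tight:.2f}, expected shortfall {loss_tight:.3f}")
print(f"wide:  error rate {rate_wide:.2f}, expected shortfall {loss_wide:.3f}")
```

Both populations have about half their members below X, yet the expected shortfall grows roughly in proportion to the spread, which is why the bare rate says little on its own.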

Now, if you’re saying that “the law requires that the regulatory body assesses error rate” then to me this is just like saying “Lawyers made a mistake, and the law needs to get changed”

The argument (from Scott Berry et al’s book) (and from memory) is that the aim of regulatory bodies is to ensure that only a small percentage of the drugs that get approved don’t actually work, so a low Type I error rate is what they are trying to ensure.

I don’t really like that argument for a couple of reasons; it relies on the dichotomy that drugs “work” or “don’t work” which doesn’t reflect messy reality (and has been much criticised by Prof Gelman among others), and it doesn’t take account of severity of conditions or toxicity of drugs and all of that, which surely should play a role in regulators’ overall decisions and strategy. I’m sure regulators would be more bothered about type I errors in really toxic cancer treatments than non-toxic treatments for warts, for example. So I don’t really buy the argument that they are (or should be) pursuing a single overall Type I error rate.

Yep. I agree with all of this. Further, Wald’s theorem says ultimately that if you want to minimize “Type I” error that the class of techniques you should use is basically Bayesian stats with a zero-one cost function. So, yeah, if you like zero-one cost functions, then by all means go ahead, but you still need to use Bayes ;-)

Thanks Simon (though I should try to find one of his papers that discusses this).

It was the error rates of the regulatory agency’s decisions – approving things they shouldn’t and not approving things they should (related to “Type 1” but more generally).

They also are not in a position to claim “consistent with the assumptions” we accepted because they should know those assumptions are not just wrong but probably not what they would accept if they really understood what those who did the research do.

So one needs to consider what “usually” happens when a given Bayesian method is used by an applicant seeking approval over repeated applications by various applicants.

What the FDA Guidance went for was to require a simulation drawing from a fixed parameter in the null and a fixed parameter in the alternative and, if I understand correctly, a rationale being required for exceeding the usual 5% for the first and for falling below 80% for the second.
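To make the idea of simulating a Bayesian rule at fixed parameter values concrete, here is a hedged sketch. It is not the FDA's procedure: the normal model with known sigma, the N(0, 1) prior, the 0.975 posterior cutoff, the sample size, and the alternative effect of 0.5 are all assumptions chosen only for illustration.

```python
import math
import random

def posterior_prob_positive(ybar, n, sigma=1.0, prior_sd=1.0):
    """P(theta > 0 | data) for a normal mean with known sigma
    and a conjugate N(0, prior_sd^2) prior (assumed model)."""
    precision = 1.0 / prior_sd**2 + n / sigma**2
    post_mean = (n * ybar / sigma**2) / precision
    post_sd = math.sqrt(1.0 / precision)
    # Standard normal CDF evaluated at post_mean / post_sd.
    return 0.5 * (1.0 + math.erf(post_mean / (post_sd * math.sqrt(2.0))))

def approval_rate(true_theta, n=50, sigma=1.0, cutoff=0.975,
                  n_sims=20_000, seed=1):
    """Share of simulated trials in which the posterior probability of a
    positive effect exceeds the cutoff, with theta held fixed."""
    rng = random.Random(seed)
    approvals = 0
    for _ in range(n_sims):
        # The sample mean of n observations is N(theta, sigma^2 / n).
        ybar = rng.gauss(true_theta, sigma / math.sqrt(n))
        if posterior_prob_positive(ybar, n, sigma) > cutoff:
            approvals += 1
    return approvals / n_sims

type_one_error = approval_rate(true_theta=0.0)  # fixed null value
power = approval_rate(true_theta=0.5)           # fixed alternative value

print(f"approval rate under the null:        {type_one_error:.3f}")
print(f"approval rate under the alternative: {power:.3f}")
```

The two printed rates are what would be compared against the 5% and 80% benchmarks mentioned above; changing the prior or the cutoff changes both.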

I think something like this would be better http://statmodeling.stat.columbia.edu/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/ but something is definitely needed beyond taking the posterior probabilities as being relevant and used literally in their decisions.

Keith: what I hear from you is in essence “don’t let the fox choose the lock on the henhouse” and I can see how that is a clear issue. You can’t just accept that the model the pharma company gives you is the right one to use… But saying that you can’t accept the fox’s model and saying that you can’t accept any method in which you need to choose a model (likelihood and prior) are two different things.

> ” saying that you can’t accept any method in which you need to choose a model (likelihood and prior)”

Certainly did not mean to suggest that – rather that one needs to do more than just deduce the consequences of the likelihood(s) and prior(s) chosen (aka getting and interpreting the posterior(s) as “prima facie” relevant).

If you’re into epistemology, “The nature of scientific evidence” edited by Taper and Lele might be of interest. It’s a philosophical back and forth discussion between frequentists, likelihood-ratio(ists/ans?) and bayesians. Also, “The ecological detective” by Hilborn and Mangel is a rather entertaining intro to flexible modelling, with an ecological slant of course. Neither is a typical stats textbook though.

Otherwise, I second the recommendations of “Statistical rethinking” and Gelman and Hill’s “Data analysis…”, which both have really good intro chapters on some basic concepts.

I haven’t come across a good resource that mixes nuanced modern philosophy of science with nuanced modern statistics, and I am curious to know if anyone else has. When I took a philosophy of science course as a graduate student a couple of years ago I found there was a large disconnect between the philosophy and practices of science, not least the data analysis.

So good luck bridging the gap!

>>> large disconnect between the philosophy and practices of science<<<

If only philosophers had to practice science & get their hands dirty before they dabbled in philosophy, we may make some progress on bridging the gap.

+1. Same goes for statistics, I’d say. And then, of course, vice versa.

That is primarily what makes CS Peirce so profitable to read – he did practice science and got his hands very dirty especially in the pendulum swinging experiments.

p.s. he also worked in Statistics.

Thanks for the appropriate recommendations!

Ian Hacking’s ‘Introduction to Probability and Inductive Logic’ might be a good first foray into probability, statistics, and the philosophy of science for those who are philosophically inclined. It covers a great deal of material, is accessible, and wouldn’t require brushing up on mathematics. It could easily be studied in parallel with a more mathematically-focused text, or perhaps as a prequel to one.

> read a book in some area of interest to you that uses statistics. […] if you’re interested in epidemiology, then maybe best to read a book on that subject.

Please recommend books for a statistics-curious, data-minded linguist!! (There’s a bunch of stats+ling books and I’ve read, or at least gone through, a few, but I’m interested in recommendations, incl. on the basics. I can program in Python, I can use R if I have to, and I like Tufte.)

I suggest Dan Navarro’s book:

https://health.adelaide.edu.au/psychology/ccs/teaching/lsr/

Your book with Jennifer Hill is fantastic. I recommend it as often as I can.

My path into (Bayesian) statistics has been helped by having had a rigorous course in probability. Calculus and matrix algebra are also highly useful, but once you appreciate that Bayesian inference is extending probability theory to parameter and model uncertainty, things get easier. I also find I often understand non-Bayesian techniques better by reformulating them in a Bayesian framework, e.g. Lasso → Laplace priors, etc.
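For readers who have not seen the Lasso/Laplace correspondence spelled out, here is a short sketch of the standard argument. The notation (design matrix X, noise variance σ², Laplace scale b) is introduced here for illustration and is not from the comment. Assume a normal likelihood and independent Laplace priors on the coefficients; then the posterior mode coincides with the Lasso estimate.

```latex
\begin{align*}
y \mid \beta &\sim \mathcal{N}(X\beta,\ \sigma^2 I),
\qquad
p(\beta_j) = \frac{1}{2b}\exp\!\left(-\frac{|\beta_j|}{b}\right)
\quad \text{independently for each } j, \\[4pt]
-\log p(\beta \mid y)
&= \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
 + \frac{1}{b}\sum_j |\beta_j| + \text{const}, \\[4pt]
\hat\beta_{\text{MAP}}
&= \arg\min_\beta \ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad \lambda = \frac{2\sigma^2}{b}.
\end{align*}
```

The last line follows by multiplying the negative log posterior by 2σ², which does not change the minimizer. Note that this identifies only the posterior mode with the Lasso; the full posterior carries more information than that single point.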

I agree. I made basically no headway in Bayesian statistics (I stopped reading Gelman et al’s first edition BDA book almost immediately) until I did a course in calculus, probability, and matrix algebra, all geared towards applications in statistics. Gelman and Hill was very helpful but I still only had a vague idea about what was going on and only understood the details after reading textbooks like Lynch (after the math etc courses).

Hi Shravan and Chris,

can you recommend a good course in matrix algebra? I already have a good foundation in probability theory, and a good background in calculus from school, but not enough knowledge of matrix algebra. I noticed that it comes up in many books for applied statistics (I want to apply it in the social sciences but also to analyze data about e.g. microbiota). Is matrix algebra really necessary to understand e.g. Andrew’s book with Jennifer Hill? If so, it would be cool if someone knew a good course that covers all the basics needed without going too much into detail about what I don’t need….

Thanks

I think if you posed the same question to the “machine learning” community they would be more likely to recommend “An Introduction to Statistical Learning” by James, Witten, Hastie, and Tibshirani or the very similar (longer) The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. Always interesting (as an outsider) to see how these similar fields have different philosophies

I really love Andy Field’s Discovering Statistics Using R. Written in an accessible, humorous way, with tons of real-world examples.

How about Richard Scheines’s “Causation, Prediction, and Search” and Judea Pearl’s “Causality: Models, Reasoning and Inference”?

Hong:

I think those books are of historical interest and might be worth reading for their unusual perspectives but they won’t tell you much about statistics.

Wow, what a response! Thanks, first off to Andrew, but also to the rest of you who posted in this thread — some great info!

A bit about me: As some of you have noticed, I do have a philosophy background, but no, I’m not a philosopher, but it is a core interest of mine. I’m generally interested in rationality and decision making, and statistics is an important part of that, hence my interest. I asked for an intro book as my math is quite a bit rusty, though I assume I will have to refresh calc and the rest of it to make any serious headway. Luckily for me, my wife is a mathematician, and one of my best friends is an econometrics PhD, so I can get help when needed.

I will definitely check out all the books in this thread though, both the more applied and more philosophical ones. Thanks again everybody!

Tim: it could be helpful for someone like you to know that there is a major tension in statistics between two major schools of thought.

The first school, frequentist statistics, is at heart about determining what random number generators might do and trying to see whether some observed data values could be like the output of these random number generators. This leads to p values and to hypothesis testing.

The second concept is about quantifying what is more or less likely to be true based on some initially assigned knowledge, and leads to Bayesian inference, and its more restricted form, Likelihood based inference.

These two things are fundamentally different *in terms of their meaning*. If a Bayesian says “there is a 75% probability that it will rain today” it ultimately means something like “under some assumptions about the physics of the weather, and some data about what it was like yesterday, and some range of plausible measurement errors and values for key atmospheric variables…. 75% of the weight of evidence suggests at least some rain”.

whereas a frequentist who tells you 75% chance of rain means “under some assumptions about which random number generators generate outcomes that are similar to the actual historical weather, 75% of the random numbers we generate for tomorrow’s weather show rain”.

So, when you go looking for info on statistics, keep in mind that some people put probability weight on different logical possibilities, and some put probability only on how often events would happen if they came out of a random number generator. This difference is fundamental: these are really two entirely separate fields, both of which call themselves statistics.
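The contrast can be sketched with a toy coin example. The data (14 heads in 20 flips), the flat prior, and the grid approximation are all assumptions made up for illustration. The first function asks how often a fair-coin random number generator produces data at least as extreme as what was observed; the second puts probability weight on the possible values of the coin's bias.

```python
import random

def simulated_p_value(heads, flips, n_sims=20_000, seed=0):
    """Frequentist-style question: how often does a fair-coin random
    number generator produce at least this many heads?"""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        sim_heads = sum(rng.random() < 0.5 for _ in range(flips))
        if sim_heads >= heads:
            extreme += 1
    return extreme / n_sims

def posterior_prob_biased(heads, flips, grid_size=1001):
    """Bayesian-style question: given the data and a flat prior on the
    heads probability p, how much posterior weight lies on p > 0.5?
    Uses a simple grid approximation."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Flat prior times binomial likelihood (constant factors cancel).
    weights = [p**heads * (1 - p)**(flips - heads) for p in grid]
    total = sum(weights)
    return sum(w for p, w in zip(grid, weights) if p > 0.5) / total

heads, flips = 14, 20  # hypothetical data
p_value = simulated_p_value(heads, flips)
post_prob = posterior_prob_biased(heads, flips)

print(f"share of fair-coin simulations with >= {heads} heads: {p_value:.3f}")
print(f"posterior probability that p > 0.5:                 {post_prob:.3f}")
```

The two numbers answer different questions and are not expected to agree: one describes the behavior of a hypothetical generator, the other describes what the data and prior together say about the parameter.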

Thanks Daniel, this much I do understand, primarily thanks to James, my econometrician buddy. I’m really interested in causality, epistemology, and experiential methods, in terms of figuring out what is really going on in the world. Well, as much as we are able to figure out with any degree of confidence :) I tend more towards the Bayesian side of things, though as I am not really qualified to make authoritative statements / decisions on the topic, it’s more of a gut feeling than a practical or educated preference.

I look at learning (and relearning) stats as equipping my mind with the tools to make better decisions, both in my business and personal lives. And to be able to think about and read scientific papers without pestering my wife about the math all the time.

If you’re interested in decision making, read this foundational paper.

https://projecteuclid.org/euclid.aoms/1177730345

Sure, Wald isn’t the be-all end-all but you need to know this result.

Thanks, that looks a bit intimidating, but that means a challenge, so great. I’ve heard of Wald in passing, but am not really familiar with his work.

I’ll be curious if you decide to read Jaynes, how that goes for you. I found his book irritating after a short while, and gave up rather quickly. As somebody said above, it comes with a hefty dose of polemics, and what felt to me like a lot of mathematical asides/irrelevancies. Michael Betancourt’s document (I linked above) is a much better bridge from probability theory into statistical inference, IMO (disclaimer, I consider myself a literate user of statistical and quantitative methods, not a mathematician/statistician).

I think Jaynes is like Ayn Rand. It’s a cult following. Either you just love him to the point of idolizing him. Or you don’t.

Hmm… I’m going to object, but I won’t go into the details of my objection, instead I’ll elaborate on maybe why it is that you have this impression.

Both Jaynes and Rand were writing *against* a particular popular established “truth”. For Jaynes, it was against the interpretation of statistics as being about the properties of random number generators, and for Rand it was against the rising popularity of communism among a group of social revolutionaries during the period starting in the crash of 1929 and continuing through the end of the second world war, influenced greatly by the depression and so forth.

Jaynes had seen how the view of statistics as fundamentally about the frequency properties of random number generators removed the possibility of incorporating physics into mathematical models of the world, and he was a physicist and took great umbrage at this….

Rand had seen how the reality of communism gutted the country where she was born and destroyed the real wealth of millions of people, and took great umbrage at this….

So, they have similar kinds of motivations and similar vehemence… beyond that I don’t think comparisons are really apt.

In particular, Rand made unverifiable claims about morality (really it came down to “I believe X”), whereas Jaynes made externally verifiable claims about mathematical and physical calculations. So Rand’s group is following a “cult of personality” making claims about “what is the good and the true”, and Jaynes’ group is following along with pencil and paper verifying the mathematical properties of a system of calculation…

At some point you could argue that “Calculating with Bayesian probabilities” is itself a cult, but then by the same methods you could also argue that “algebra” is itself a cult, so I don’t think it gets you anywhere useful.

Even Andrew could be thought of as writing against the same “particular popular established “truth””, right?

But it isn’t annoying to read Andrew. I think it’s in the writing style. Andrew comes across as factual, practical and persuasive. Jaynes comes across as haughty and polemic.

Perhaps it’s a subjective thing: One reader’s irritation can be another’s ecstasy.

PS. Another author I put in the Jaynes category is Judea Pearl.

If your point is about Jaynes’ style and vehemence, then sure, I can see what you mean.

I’m making a list of everything posted in this thread, and will be checking them all out. The Pearl work on causality I already have on my reading list, and I think I’ve had the Jaynes book recommended to me also, but haven’t checked it out at all yet. The Betancourt looks accessible upon a first scan also — and it seems I still remember some symbolic logic also, but I can relearn that quite quickly.

You can get the first chapters of the Jaynes book through Larry Bretthorst’s website (http://bayes.wustl.edu/): http://bayes.wustl.edu/etj/prob/book.pdf . It may be all you need to get a feel for the epistemological thrust. Similarly for McElreath, whose first chapter (freely available here: http://xcelab.net/rm/statistical-rethinking/) gives a useful overview of the link between “truth”, philosophy of science, and where statistics fits in.

I’d say you might as well avoid most standard stats books. As an epistemologist, you are likely to be disappointed by the lack of any meaningful justification for the methods, which seems like a key to what you are after.

I like _Comparative Statistical Inference_ by Vic Barnett, though it might be dated in parts now (3rd edition was 1999).

I’d recommend the self-study exam series of the Royal Statistical Society.

They provide a syllabus, study guide, literature recommendations, and old exams, and then allow you to sit the exam (without any other formal qualification – just need to pay, obviously).

Depending on the desired level, you can take the Ordinary Certificate, Higher Certificate, or Graduate Diploma (where the latter is “equivalent to that of a good UK honours degree in statistics”).

The reader writing in does not want to take ” classes, just working through things on my own”. In my experience (I took the graduate diploma), this is sort of the best of both worlds – self study, but structured, and with a certification.

http://www.rss.org.uk/RSS/pro_dev/Examinations/RSS/pro_dev/Examinations_sub/Examinations.aspx

Tim, actually, having read all the comments, including yours, I’m not so sure anymore – the RSS exams don’t cover the philosophical side so much, they’re more a (good) introduction to the canonical stuff (foundations, several applications). Well, at any rate, check it out – the syllabi and literature recommendations alone might be helpful.

I think you might like Stigler’s The History of Statistics. For me it helped clarify a great deal, both of things that never made that much sense to me and of certain kinds of debates. It helped me see the kinds of problems different people were trying to solve and how that shaped things.

If you really want an intro level statistics book either to plow through on your own or as a reference book most of the suggestions here are too advanced. They are assuming a level of baseline knowledge. So the question is whether you feel you have fluency in those concepts. Unfortunately friends and family won’t get you that even if they can help with getting through a chapter. You can somewhat separate out books by whether they assume you are fluent in linear algebra, fluent in calculus, both or neither.

When it comes to intro books the basic problem is that most of them are written for college students doing the one semester required course who will quickly flee. The books “cover the content” but don’t necessarily give you a framework for moving forward. They are often awful and basically race you through a bunch of formulas and tests. Having done some looking last year at books for undergraduate beginners: if you feel like you need the real, I-know-nothing basics, in some ways the GAISE-inspired texts such as Agresti and Franklin are the best. They won’t hold you back once you head into more challenging material.

Just putting this out there:

Jay Kadane of CMU has a (hard-core subjective Bayesian?) book free online called “Principles of Uncertainty”

http://uncertainty.stat.cmu.edu/wp-content/uploads/2011/05/principles-of-uncertainty.pdf

It has a bunch of technical mathy stuff but also a lot of philosophically relevant material, too.

I’d put in a suggestion for two more free resources from Cosma Shalizi. The more introductory one is a short set of notes on probability theory and statistics. It basically doesn’t tell you any of the how or why, but it does cover in a very easy-to-follow way all the basic concepts that you can then go and learn about in depth from other sources. Its greatest virtue is that it’s free and (fairly) non-indoctrinating.

It’s here: http://bactra.org/prob-notes/srl.pdf

The next is the truly fantastic book he has written called ‘Advanced Data Analysis from an Elementary point-of-view’. He says it assumes previous courses in calculus, linear algebra and intro stats, but to be honest I think you can get a lot of it without any of those. You maybe won’t be able to follow the derivations without those prerequisites (there is a lot of calculus and matrix algebra) but the actual virtue of the book is teaching how all these concepts fit together. It’s like a map of all of statistics, it doesn’t take you down deep into any valley, but it tells you what the valley is (e.g. linear regression), why someone thought of it in the first place (the concept of a general regression function + linearity is the simplest choice of dependence), his take on what you should think about it now and what you shouldn’t think about it (it is finding the (predictively) optimal linear approximation to the regression function, it is not how the variables are actually related and it says nothing about whether the relation is causal or not, nor whether it is a *good* approximation), and how it is related to other ideas (linear smoothing, nonlinear dependence, local linear regression, structural models and many others). I would probably say it is mostly not philosophical but it is *the* answer to ‘I have some data in front of me, what now?’.

It’s here: http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/

Once you’re ready to start worrying about philosophy and particularly the frequentist/bayesian question here’s a quick list of things that might be interesting (Disclaimer: I have not read all of them, certainly not learnt everything in all of them):

– Some of the writings of Fisher, Neyman, CS Peirce and others

– Jaynes’ Logic of Science (discussed above)

– Judea Pearl or the Spirtes/Glymour/Scheines book on causation

– Deborah Mayo ‘Error and the Growth of Experimental Knowledge’

– Leonard Savage ‘Foundations of Statistics’

– Lots of stuff on Bayesian consistency (http://bactra.org/notebooks/bayesian-consistency.html)

– Anything Cosma Shalizi writes about Bayesianism (http://bactra.org/weblog/cat_bayes.html)

– some important results such as (in no particular order and obviously incomplete) Cox’s Theorem, Von Neumann–Morgenstern utility theorem, Bernstein–von Mises theorem, Wald’s theorem, Birnbaum’s result on the likelihood principle, de Finetti’s theorem (hmm coming in a little heavy on the Bayesian side here…)

– Some other approaches such as Geisser’s Predictive Inference and Laurie Davies’ approach which is based on weak topologies on spaces of probability measures induced by metrics like total variation distance (afaik. His stuff is sometimes hard to follow…)

I hope that’s a fairly balanced representation of the area with good arguments and ideas on both sides.

Recommend using this online statistics book. It’s more basic than Gelman/Hill and lets you check your answers. In my experience teaching at non-selective grad school, Gelman/Hill is challenging even to students who have had a couple of prior statistics courses. http://onlinestatbook.com/