Teaching Statistics: A Bag of Tricks (second edition)

Hey! Deb Nolan and I finished the second edition of our book, Teaching Statistics: A Bag of Tricks. You can pre-order it here.

I love love love this book. As William Goldman would say, it’s the “good parts version”: all the fun stuff without the standard boring examples (counting colors of M&M’s, etc.). Great stuff for teaching; I’ve also been told it’s a fun read for students of statistics.

Here’s the table of contents. If this doesn’t look like fun to you, don’t buy the book.

53 thoughts on “Teaching Statistics: A Bag of Tricks (second edition)”

  1. Great! Looking forward to reading it. What are your thoughts on introducing inference by simulation a la Chance/Tintle (https://www.causeweb.org/sbi/) or the Locks (http://www.lock5stat.com/)? I have had some success over several years with health professionals, generally quite scared of their statistics module. I show them bootstrapping before standard errors and get them to do an approximate randomisation test by flipping coins and counting how many people in the room had results as or more extreme than the data (8 out of 10 cats prefer Whiskas) before they encounter p-values. Looking ahead, I’m interested in introducing k-fold cross-validation before any goodness-of-fit tests and stats, and maybe ABC at the outset (Rasmus Bååth stylee), both of which build on the general principle of simulation before asymptotics.
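
    A minimal R sketch of the coin-flipping exercise, taking the “8 out of 10 cats” figure as the observed result (the seed and the number of simulations are arbitrary):

    # Under the null, each of the 10 "cats" is a fair coin flip. Simulate many
    # classrooms of 10 flips and see how often the result is as or more extreme
    # than the observed 8.
    set.seed(123)
    sims <- replicate(10000, sum(rbinom(10, size = 1, prob = 0.5)))
    mean(sims >= 8)   # simulation-based tail probability, roughly 0.055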

    • Robert:

      I love fake-data simulation. There’s not much fake-data simulation in this book but Jennifer and I are putting more of it into Regression and Other Stories.

      Regarding your other points: I recommend not teaching p-values at all and not teaching goodness-of-fit tests either. Bootstrap’s fine, I guess, but to my mind it’s more trouble than it’s worth; I’d rather just simulate fake data and check things that way.
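
      A minimal sketch of the kind of fake-data check I mean, with made-up parameters: pick values you know, simulate data from them, fit the model, and see whether the fit recovers the truth.

      # Fake-data simulation: known parameters -> simulated data -> fit -> check
      set.seed(1)
      a <- 2; b <- 0.5; sigma <- 1        # "true" values, chosen for the demo
      x <- runif(100, 0, 10)
      y <- rnorm(100, a + b * x, sigma)   # fake data from the assumed model
      fit <- lm(y ~ x)
      confint(fit)                        # do the intervals cover a and b?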

      • I love fake-data simulation
        […]
        to my mind it’s more trouble than it’s worth; I’d rather just simulate fake data and check things that way.

        I wouldn’t call it that; people will think you are up to no good: “Eh, it’s too much trouble to collect real data, let’s just simulate fake data instead, like the textbook taught us to do”.

      • > it’s more trouble than it’s worth
        Agree, especially when you see what people who have _learned it_ apply in actual research (Efron even wrote a paper circa 2000 complaining that most statisticians do it incorrectly).

        Seeming simple (to define) and being simple (to understand or apply) are very different.

        I think it also does a disservice by suggesting that avoiding (really more so hiding) assumptions is somehow inherently good.

        Doing fake-data simulations with clear and purposeful assumptions (that are varied) and not calling it a parametric bootstrap arguably is better.

        ABC at the outset (Rasmus Bååth stylee), aka two-stage fake-data simulation with conditioning, is surprisingly difficult to get across to others as being real statistical analysis. McElreath is apparently trying to work out how to do it with a physical two-stage Galton quincunx (e.g. https://phaneron0.wordpress.com/2012/11/23/two-stage-quincunx-2/ ). And there is this new entry by Jim Albert at https://www.datacamp.com/courses/beginning-bayes-in-r
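
        A toy sketch of the two-stage idea (a made-up binomial example, not Bååth’s or McElreath’s materials): draw the parameter from the prior, simulate fake data from it, and keep only the draws whose fake data match what was observed.

        set.seed(42)
        observed <- 6                              # say, 6 successes out of 10 trials
        theta <- runif(100000)                     # stage 1: draw from a flat prior
        fake  <- rbinom(length(theta), 10, theta)  # stage 2: simulate fake data from each draw
        posterior <- theta[fake == observed]       # condition on matching the observed data
        hist(posterior); mean(posterior)           # approximate posterior for theta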

        • “Agree, especially when you see what people who have _learned it_ apply in actual research (Efron even wrote a paper circa 2000 complaining that most statisticians do it incorrectly).”

          Do you happen to remember the title of this paper (or something close to it)? I’d be interested in checking it out.

        • I don’t think it was a peer-reviewed paper (i.e. one easy to find in Google Scholar) but a commentary somewhere.

          I used it in the first introductory course I gave at Duke (2007) and I might have a copy somewhere???

          I did not cover the bootstrap when I taught the course the next term. Strangely, the bootstrap material had seemed to go over very well with the students. We had just covered survey sampling (with and without replacement), and the text was Freedman, Pisani and Purves, which used box models, so it was easy to start with a sample of three, enumerate all possible sample paths of size 3 (i.e. a box model), and then do a bootstrap sample and calculate the frequencies of sample paths from that, which obviously was sampling sample paths with replacement. Then increase the sample size to show why enumeration becomes impossible but sampling with replacement (using the bootstrap) remains easy. So now they really know what it actually is and does – now what to make of it when used on real data?
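
          In R, the enumeration-versus-bootstrap step looks roughly like this (the sample values are made up; in class it was done with box models rather than code):

          x <- c(1, 4, 7)                        # tiny made-up sample of three
          paths <- expand.grid(x, x, x)          # all 3^3 = 27 equally likely sample paths
          table(rowMeans(paths)) / nrow(paths)   # exact resampling distribution of the mean
          boot <- replicate(10000, mean(sample(x, 3, replace = TRUE)))
          table(boot) / length(boot)             # bootstrap frequencies, close to the exact ones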

          The Efron paper was convincing that what to make of it when used on real data was really hard, as most statisticians (according to Efron) were not doing it right (mostly because one should not use the vanilla version but some corrected version of it). The take-home message was that it (the vanilla bootstrap) was mostly a distraction, except perhaps for getting an assessment of whether the sampling distribution of the statistic was approximately normal. So I removed it from the course.

        • I’ve had mixed luck with ABC at the outset. I’ve found it worked pretty well with computer science people, because they really like random number generation to begin with and then conditioning is just a logical second step. It has not worked well at all with people that have no programming experience. Tried it once with a group of psychology undergrads, and that did more harm than good. They just thought it sounded like a hack (which it is, but a useful hack).

        • My one experience was that the students sensed it was a pedagogical trick to illustrate a principle, rather than a useful tool, and powered down their brains, checked facebook etc etc. But I am optimistic…

          “Doing fake-data simulations with clear and purposeful assumptions (that are varied) and not calling it a parametric bootstrap arguably is better.” – I’d definitely agree with that because it keeps the model in the foreground

        • > not worked well at all with people that have no programming experience
          Interesting, my sense was that people needed some experience thinking abstractly through some purposeful use of representations to address something concrete. Programming experience is probably one of the best ways to get that.

          (I always remember this guy ( https://en.wikipedia.org/wiki/Peter_Rosenthal ) telling us in class that “most mathematicians have real difficulty making things concrete, as they spend most of their time being abstract without any concern for making things concrete.”)

          So maybe, some learning and practice of R/Stan programming should come before any attempt to learn statistics?

          > a useful hack
          It is an abstract representation that is easy to make concrete and which supports most interpretants (concept extractions) needed to thoughtfully apply statistics. A beautiful theory that keeps getting killed by nasty ugly facts ;-)

        • This I do not agree with. I personally don’t think programming is necessary or sufficient for learning or conducting good statistical analysis. I am not saying it is without value – I also am not saying that students today should not learn programming (is that a double or triple negative?). Programming does help develop many critical thinking skills. But I don’t think it needs to, or should, come before learning statistics. I think understanding data and how to think about data is often more easily done before learning programming.

          This probably reflects my own abilities – programming was always a void in my toolkit, and it is by using smart software that I have been able to analyze data with minimal programming skills. I am not advocating that as a model for others to follow. But it has taught me that pedagogically it may be better to start with learning about data – how it is measured, what does it mean, what does it look like, etc. – before learning to program. I think the appeal to programming as a prerequisite to learning statistics is just a form of setting a hurdle for entry into the field. In other words, nobody should do statistics if they can’t program – thereby limiting the field to the “chosen few.”

        • I agree with much of what you say, but cannot entirely agree with the last two sentences. I do believe that requiring programming before statistics can unnecessarily limit the field. But I think that people can genuinely believe that programming should be a prerequisite to statistics simply because that worked well for them, not realizing that different paths may work well for different people.

        • Dale:

          My point was simply that it might be helpful to learn some programming before attempting to learn statistics implicitly from using two-stage simulation (aka ABC or the Galton machine) – that is, statistics based on data-generating models and priors.

          If people are going to learn this way (two-stage simulation) they will need to get some experience making representations and working with them. Otherwise they are just inducing stories to explain to themselves what is going on. If one has the math to do that comfortably – as almost play rather than hard work – then they can do it that way (though that will likely take more than one or two calculus courses).

          I agree with Martha that different paths may work well for different people – so it’s not going to be best for everyone.

          I am perplexed, though, why one would want to do statistics these days without some decent programming skills. A number of times I have taken over work from statisticians who had left, and with an afternoon of programming I was able to do in about 15 to 30 minutes what they had spent days doing. To me (and their former supervisors?) that is ridiculous.

        • I am responding to Keith here since I can’t seem to respond to a response to my response…
          I was not trained as a statistician but I do plenty of statistical and data analysis work. I am not pretending to be a statistician, but I do not feel any need for programming (I use JMP, if that helps; it has its own scripting language, but I rarely need to use it). I am not advocating that students learn to do statistical work without programming – but I am advocating that they first learn to appreciate data, and that for many, that is better accomplished without programming. I will venture as far as to say that most of the serious problems I have seen with statistical work result from failure to understand your data, conceptual problems with what is being measured and how, or lack of recognition of the limits of an analysis. Failures to use the correct technique are rarely as important (in my experience). How programming fits into any of this, I don’t understand. I tried taking a Stanford machine learning course on Coursera – it was good and I did the conceptual parts of the course fine, but I gave up on the programming assignments after the first one: far too much time wasted making errors of syntax for which I saw too little benefit. Again, I am not advocating this for most students (I am at a different point in my career than they are), but I am reacting to your suggestion that you can’t see why anyone would want to do statistics without programming. I love statistics and I hate programming.

        • Dale: Suppose you’re a teacher in an intro stats course. As Rahul is fond of saying on this blog, there really *is* a binary decision you need to make: should you 1) teach a moderate amount of programming first, or 2) avoid teaching programming until maybe some later class?

          Now, the Bayesian way to make this decision is to choose the one that leads to the maximum *expected* learning. Suppose you have a really good final exam that tests people’s conceptual understanding well. You give it to incoming students, and then you teach two sections of this course: in one you do some programming stuff early, in the other you avoid programming entirely. You look at the distribution of changes in the final-exam-type scores… you try to estimate how much having the programming improved outcomes.

          Of course most people advocating programming don’t do the RCT etc, but they imagine in their head what might occur, and put some prior distribution, and then say to themselves… I think the average will be higher if we get programming involved early.
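
          Something like this crude sketch, with an entirely made-up prior and exam-score scale:

          set.seed(7)
          effect <- rnorm(10000, mean = 3, sd = 5)   # prior belief: early programming adds ~3 exam points, very uncertain
          mean(effect)                               # expected gain; if positive, teach programming early
          mean(effect > 0)                           # how sure we are it helps at all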

          This is perfectly compatible with “for some people, they will be harmed by the programming requirement”. It’s also compatible with your idea of a “hurdle”, that is, you might over-estimate the goodness of programming by the fact that people who hate programming drop out or don’t take the class.

          It seems to me the ideal case, if you can manage it, is to teach two classes, and let people self-assort into the one they think is best. The problem is… most people don’t have any programming knowledge at this stage, and so their assortment is often uninformed. They might frequently make the wrong decision based on fear or what they heard from their friends or whatever.

        • Dale wrote: “but I am advocating that they first learn to appreciate data, and that for many, that is better accomplished without programming. I will venture as far as to say that most of the serious problems I have seen with statistical work result from failure to understand your data, conceptual problems with what is being measured and how, or lack of recognition of the limits of an analysis.”

          This is a good point. It brings to mind someone who was very adept at programming who volunteered to teach introductory statistics. He had the students learn R — but he did not understand statistics well enough to use R well himself. He got “results”, but they were based on a lack of understanding of statistics. So I think his students got a lot of mis-learning.

          In other words, good programming skills without good understanding of statistics is a recipe for poor use of statistics.

        • Dale,

          The value I see in GUI-based statistical programs is the following:

          1. great for simple questions,
          2. great for complex statistics with small numbers of variables,

          The problems I see with them are:

          3. tendency toward the use of defaults,
          4. terrible for cleaning and prepping large amounts of complex data,
          5. terrible for complex questions that require strong understanding of the underlying concepts (see #3).

          JMP is a great program if the data have already been cleaned and prepared for analysis. But it is not a great program for automating the production of large numbers of graphs or for creating reproducible analyses.

        • I am not claiming that a well-designed GUI could not be developed that overcomes these limitations. But I am saying that I have not come across that GUI in my exploration of statistical analytic tools.

        • Edit:

          Making an analysis reproducible and auditable is important. I have NOT come across a GUI that comes close to making that as straightforward as it is with code.

        • Curious
          I could not disagree with you more regarding JMP. Its strength is in cleaning data. Reproducing an analysis is simple: save the scripts – they are easy to save with the data sets, which makes it easy to reproduce the analysis and see the code at the same time. A GUI can be bad and promote bad habits, or it can help develop good habits; with JMP I believe it is the latter. Either you don’t know much about JMP or you have some very different experiences than mine. As for the defaults, many of JMP’s defaults are quite smart actually (e.g., in building decision trees, neural networks, etc., the defaults largely protect you against overfitting – making validation less important). The problem with defaults is lazy thinking, and I don’t think you can blame lazy thinking on GUIs. In fact (and perhaps I’ll start another tirade with this one), I’ve seen far too many people use R by taking someone else’s code and just modifying the variables. Talk about lazy thinking – but that is the natural result of having such a poor user interface.

        • Dale,

          I attended a presentation of JMP a couple of years ago at a Boston Area SAS Users Group (BASUG) meeting. Perhaps the answers I received from the JMP representatives were not accurate, but I left there being told that I would not be able to automate the data cleaning, recoding, calculations, etc. in the same way that I was doing with PROC SQL and the DATA step in SAS.

          Perhaps they have improved the interface and scripting tool?

        • Modifying code is a great way to begin the learning process in an analytical program. It is not a stopping point, but an entry into a new method.

        • I don’t know that I would accept the idea that lazy thinking is the only problem. The issue with many GUIs (and perhaps this is no longer the case in JMP, but it was when I saw the presentation) is that the GUI lacks the full flexibility of code. This may have changed, but it was still the case a couple of years ago anyway.

        • It lacked both the full flexibility and the full functionality of code in the same way that SAS Enterprise Guide’s (EG) GUI lacks the full functionality of code that can also be implemented within EG.

        • Indeed, JMP’s scripting language provides more capability and control than just using the GUI. Despite most of my work being with data, I have found few reasons to use the scripting language – that does not mean that others feel the same way. The GUI has many capabilities that are quite sophisticated and in some ways reinforce thinking carefully and meaningfully about the data, in ways that code does not promote – for me. Perhaps, as Martha says, it depends on the world you come from and how your experience has been constructed.

          I would suggest that programming is a red herring. As I said at the outset, it is neither necessary nor sufficient for good work with data. As with NHST, that is a loaded statement. I could just as easily say that a good GUI is neither necessary nor sufficient for good data analysis. I’m willing to live with that symmetry – but I don’t accept the argument that programming somehow produces better data analysis – just as I don’t accept the argument that good mechanics are better drivers than non-mechanics.

          One other point about JMP is worth noting. Despite its superiority (in my view) over other GUI-based statistics programs, it is far less well known and less used. I believe this is due to its being under the SAS umbrella – I believe SAS is worried about cannibalizing their more expensive products (and it is a legitimate worry in my mind). That point is somewhat off topic, but I believe it may be relevant to some of your experiences with exposure to the product.

        • Dale,

          I agree that programming is not required for good and careful data analysis, but it does in fact depend on what you are trying to accomplish. There is a reason Andrew Gelman and the Stan group felt compelled to develop the Stan software. It was because the existing programs did not do something they wanted to do. This has been my experience with all GUI’s I’ve used. They do not provide the same functionality as the coding language that underlies them. This is inevitably true in that they would have to code every bit of existing functionality into the GUI, which is typically not very efficient.

          Also, I certainly accept your argument that JMP has been adequate to your needs. But, if I were using JMP and found that the GUI did not allow me to implement something I wanted to, I would learn the scripting language. If they have made the scripting language such that I could send a file to you and you could reproduce my analyses exactly and could even change the script where you thought useful, then that’s a great advance over what I was under the impression JMP was previously capable of.

          I am not against GUIs; I am against limitations GUIs create for analyses that do not exist within the underlying language. If a GUI does the trick, then have at it. And if it doesn’t, find something that allows you to do the analyses you want. If it is the JMP scripting language, then awesome. If it is R, Python, Stata, Stan, whatever, then use it. The great thing about R and Stan is that they are relatively cheap.

        • I am replying to this out of sequence because this is the last “reply” link I see in my browser – all the answers below do not admit replies for some reason.
          I am with Dale in this debate. For beginners, trying to teach programming takes away class time from teaching data literacy and methods. I also think for beginners, it’s less important to spend time on data cleaning; I’d attach a small section at the end of the course to expose them to this sad reality but not much more. It’s also difficult to appreciate the value and importance of data cleaning when you don’t have a good sense of how to run the analysis in the first place.
          GUI-based programs like JMP are great for beginners. I have learned first hand that the young generations know how to navigate software, and what is particularly gratifying is that they hand in assignments which demonstrate that they have explored the software and used functions that are not covered in my lectures. When I taught programming languages, this did not happen… probably because going beyond the lecture notes means having to read the coding manuals, and apparently no one likes to read manuals.
          Of course, for more advanced students and for research, one has to learn coding.

        • Kaiser,

          While I agree that it is important to teach good statistical analysis, I think it is a mistake to ignore real world data issues. It has been my observation that many who have completed graduate level statistics courses, even with high grades, often have little understanding of the complexities of data and modeling in the real world and even in the real world of research. I know this was the case for me and I haven’t seen anything to persuade me otherwise in the past decade.

          I would not argue that we trade good analytical training for data cleaning skills as both are essential. My primary argument is the importance of having a data analysis process in place that can be easily reviewed and audited from start to finish and the difference in ease with which this is done with code relative to a GUI. Perhaps JMP allows for this currently, but when I was considering using it a couple of years ago, it did not. I spent a bit of time looking at their website last night and it does appear that there have been some substantial improvements made to the scripting functionality. However, it is not clear to me how easy it would be.

          Here is where it becomes an issue. Let’s say you send me a GUI project with 30 nodes that you would like me to review. If it is possible to produce a scripting document that I can review one step at a time, I will be able to make my way through the project easily and provide feedback where I think it might be helpful. If on the other hand the GUI you are using does not allow for this script to be produced, then I will have to go into each of the 30 nodes or notes or whatever form of documentation you might add to the project and figure out how you produced what is coming out the other end.

          I find it is far easier to review code for complex projects than it is to review GUI processes. Now, perhaps my experience is outdated and this is no longer an issue for most GUI based interfaces, but that would really surprise me.

        • Kaiser
          I actually think data cleaning is very important and I introduce it in my earliest courses. It is one of the things I like best about JMP – it facilitates cleaning large data sets very well. It is also one of the things I find most disturbing about most textbooks – they provide clean data sets so students don’t learn how to work on real data. There was a great blog post from a data scientist that said (paraphrasing) that a data scientist saying they don’t like cleaning data is like a farmer saying they only like harvesting the crop, but not planting and cultivating it.

          Curious
          I’m afraid I don’t know what you mean by “nodes.” The only nodes I am familiar with are in neural networks (and 30 sounds a bit large for that use).

        • Dale,

          I am referring to the analysis nodes in Enterprise Guide and I take from your response that JMP is not structured similarly. If I were going to review someone’s project, they could either point me to the stored project file or they could point me to the code stored as a SAS code file or as a text file.

          What would someone using JMP send?

        • Daniel
          I simply do not agree with you. The fact that there is a script does mean there is programming going on, but it does not follow that people must therefore know how to program. My car has an engine and I don’t know how it works, but I am still a good driver. I am (sometimes) a good writer, and I use a word processor that has a “script” that I don’t understand. Etc., etc.

          Now, regarding what should be taught in a program – I completely agree that a statistics or data science program must have programming as part of it – even a large part. I do not think the first course is the appropriate place, except for some fields (where students are well-equipped for it). But many people who are not majoring in these subjects will work with data. I think it is naive to believe that you can keep data analysis to the small group of people who have majored in these fields. Many people will – and should – use data analysis, and one or two courses should be enough to teach them something. Must that something include programming? My answer is no – there are more important things for them to learn first.

        • They could send you the script – or better yet, they could send you the data set with the script saved inside. Then you just have to run the script to reproduce the analysis.

        • See, as soon as there’s a “script” it *IS* programming. And this is where I think I just completely disagree with you.

          It’s essentially impossible to teach statistics to people with *zero* programming. You can teach some of the ideas, but “some of the ideas” is not statistics.

          If you accept that you can’t learn statistics in a one semester course, then yes it’s perfectly reasonable to discuss where in the 8 semesters required to learn how to do reasonably good data analysis you should put programming (in my opinion, either semester 1 or semester 2) but the option of “never” is simply not an option. Also, the option of “pack it all into 2 semesters and call it good” is not an option either. That doesn’t arrive at the destination.

        • Laura:

          Jennifer and I are breaking up our book into two. The first volume, Regression and Other Stories, is nearly done, and it should be available in 2017. The second volume, on multilevel models, coauthored with Ben Goodrich and Jonah Gabry, will probably appear in 2018.

        • D:

          If I wasn’t unrealistically optimistic, I’d never get anything done at all!

          Ummm, I guess my productivity might be improved if I were to stop responding to blog comments . . .

  2. Interestingly, Amazon also sells the first edition of the book, at significantly more than twice the price of the second edition (both paperback and hardcopy). I predict that the price for the first edition will drop significantly (H0: it won’t).

    • Here’s what we wrote in the preface:

      For this second edition we have added chapter 4 on graphics, chapter 14 on teaching statistics to social scientists, chapter 15 on statistics diaries, chapter 16 on a course in statistical communication and graphics, and chapter 21 on teaching data science. All these new chapters reflect our view of the unity of statistics education: we see no sharp distinction between introductory classes and more advanced teaching. The educational principle of active learning, and the statistical principles of variation and uncertainty, apply at all levels.

      We have also added new activities in the data collection chapter: a sampling project that involves digital photos, and an alternative taste-testing experiment. There is a new section in the how-to-do-it chapter on the large lecture class that includes experiences with document cameras, clickers, online forums, near-peers, and reproducible documents. Also in that chapter we revised and expanded the section on monitoring progress, adding guidelines for making posters, rubrics for grading, and forms for creating teams, tracking progress with task logs, and for team members to provide feedback about each other’s efforts. We have added a new first-week-of-class activity in the chapter on the survey sampling class and a new project description for an analysis of data from a complex survey.

  3. Any thought to open sourcing the book? This, as you probably know, is a recent trend. Examples:

    http://r4ds.had.co.nz/
    http://www.deeplearningbook.org/

    You could still sell hard copies while, at the same time, making the entire book available for free online. I would think that this would increase your readership/influence by an order of magnitude, if not more.

    It is tough, after all, for owners of the first edition (like me!) to pony up for a second edition . . .

  4. Andrew, I thought you might have included a section on slicing and dicing data, along the lines of the good work you have done with the Deaton dataset. That seems to be lost in most intro stats classes. Or maybe it’s not obvious from the table of contents?

  5. I look forward to this. It’s especially nice to see a section on issues around lying with statistics.

    It is sad but true that even at some excellent schools, many engineering students don’t get as much statistics as they should have:
    a) They may take a stats course, and relate to none of the examples.
    b) If stats is optional, they look at the syllabus, don’t find anything they relate to, and so skip it.

    One result is that engineering schools sometimes create their own stats courses. That may be good, or bad.

    On programming, there seems to be no consensus on whether it should be a prerequisite, but is there any consensus on *which* language(s)/toolsets would be required, if something were to be required?

    • John:

      I hate to disappoint you, but the chapter on lying with statistics is all old material from the 2002 edition. The material is fine, but it doesn’t have any of the recent stuff we’ve been discussing on this blog for the past few years.

  6. Cool! Very much looking forward to this.

    This may sound silly, but can I put in a vote against setting future books in Computer Modern (the default LaTeX font)? In my opinion it looks spindly and light on the page, and is unpleasant to read. I’d much prefer it if you set your books in a more mainstream print typeface.
