Kill the math in the intro stat course?

David Kane writes:

Our introductory classes in statistics and data science use too much mathematics. The key causal effect which our students want our classes to have is to improve their future performance and opportunities. The more professional their computing skills (in the context of data analysis), the greater their likely success. Introductory courses should feature almost no mathematical/statistical formulas beyond simple algebra.

I pretty much agree. I haven’t written any introductory statistics textbooks yet, but in Regression and Other Stories we pretty much explicitly replace mathematics with computation.

There’s just one way I’d change the message. Kane recommends no math in the intro stat course beyond simple algebra. I’d argue that intro stat courses, even those that are nominally “calculus-based,” already don’t use any math beyond simple algebra. Here, I’m defining the content of a class not by its lectures but by what’s required in homeworks and exams. Lots of intro classes give fake-o proofs and derivations of the standard deviation of the mean or the least squares solution or whatever, and, yeah, these are kinda mathematical—but I don’t consider these to really be part of the course unless the students are required to do these things in their homeworks and exams. Derivations on the blackboard (or on slides) don’t count.
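A minimal sketch of what replacing the blackboard derivation with computation can look like, in Python with only the standard library. It assumes independent normal draws with standard deviation sigma. The function name `simulated_sd_of_mean` and the simulation settings are illustrative choices, not taken from Regression and Other Stories. The simulated spread of the sample means should land close to sigma/sqrt(n).

```python
import random
import statistics


def simulated_sd_of_mean(n, sigma=1.0, n_sims=20000, seed=1):
    """Estimate the standard deviation of the mean of n draws by brute force.

    Assumption (illustrative only): draws are independent N(0, sigma^2).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(n_sims)
    ]
    return statistics.stdev(means)


for n in (4, 16, 64):
    simulated = simulated_sd_of_mean(n)
    formula = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1
    print(f"n={n:3d}  simulated={simulated:.3f}  sigma/sqrt(n)={formula:.3f}")
```

Students never have to manipulate the variance algebra. They see the 1/sqrt(n) pattern emerge as n quadruples.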

So the way I’d put it is that intro stat courses (just to be clear, I’m talking about the one-semester statistics class, not about probability or a post-probability course in mathematical statistics) already use no math beyond basic algebra, and we should just recognize and be comfortable with that.

Also, going beyond that, don’t forget our suggestion of a science core course based entirely on computer simulation.

61 thoughts on “Kill the math in the intro stat course?”

  1. In France, I received an education with a lot of emphasis on rigour, theory, and a heavy course load in math. At the end of the probability course, I had zero intuition and zero programming skills. Two weeks after the exam, I had forgotten all the theorems and their proofs (and I don’t think my experience is unique). But many years later, I feel at home when I see a formula, and that is often a very useful skill.
    Rather than treating statistical models as black boxes with obscure assumptions, I can open the black box, inspect its components, and judge whether it makes sense for the problem. I feel that a solid background in math gives me a better grasp of modeling, especially for Bayesian inference and fitting algorithms. It is true that I rarely need math beyond simple algebra, but if you are not exposed to it at all in intro courses at university, then you might find yourself struggling with the rule of three.

    • Indeed, I will second Alain’s points. To add another dimension to the problem of lack of mathematical understanding: I have seen (and can provide references for) metrics that were introduced, used, cited in scientific work, and even applied in black-box statistical computing, even though they had been developed only from intuition and a few working examples, with no rigorous proof. Later, they were shown to be incorrect in practice. This could have been caught earlier if those who proposed them, and even more those who cited them, had been keen to examine the theoretical/mathematical ground on which the metrics were based.
      Unfortunately, people who understand (stat) concepts only intuitively are likely to get creative, and that is exactly when rigor needs to kick in.

    • Of course mathematical rigor is great, but I think you might be a little unclear on just how utterly devoid of any serious attempt to learn any mathematics the American introductory statistics course already is, and necessarily so. Students studying social sciences, life sciences, business and economics are often required to take precisely one statistics course in their career. Calculus is not a prerequisite (let alone linear algebra), and a significant proportion of these students take no math courses at the university level. This is the vast majority of all students who take any statistics at American universities, and there is no question of teaching them anywhere close to enough mathematics to be able to “open up a black box.” It’s not that students forget the theorems and their proofs, it’s that hardly any theorems are even stated, and none are proved in any serious way; the argument here is against requiring students to mindlessly practice applying a rote formula for a standard deviation.

  2. 100%. Last time I taught an intro course (mind you, it was at the graduate level), the most complicated math I used was algebra. I also started by assuming zero programming knowledge and built up to model-based inference for most of the data scenarios that research psychologists encounter. Recordings are here [https://www.youtube.com/playlist?list=PLu77iLvsj_GNmWqDdX-26kZ56dCZqkzBO] if anyone is interested; I’d like to redo them someday with more prior/posterior predictive checking.

  3. Berkeley developed an intro Data Science course (Data Science 8) very much along these lines in 2015 (full disclosure, I was a TA), and I think a couple other schools had similar courses before we did. The designers got other departments to accept it in lieu of the intro stat requirement, so it was very popular (at least when I stopped following it a couple years ago). There is also a follow-on intro Probability course that mixes calculus and simulation, which sounded pretty exciting; and a couple upper-div courses in a similar mold.

    I’m a CS person, so I also like that the students see more concrete applications for programming at the early part of the learning curve. Intro CS courses tend to lose people because the applications are either abstract or toy-ish. I saw some students who started with little prior interest in quantitative subjects, but came out of their first semester able to work with data on their own (in very basic ways).

    Calculus and linear algebra themselves could really use updating in a similar mold: have students write code for visualization and approximate calculation to develop very concrete understandings of local linearity, optimality conditions, symbolic differentiation, infinite series, 3+-dimensional spaces, and eigenvalues.

    • There are good arguments in favor of having more applied/computational introductory classes in statistics, calculus, and linear algebra (and I admit I really like the computational approach to Bayesian statistics in McElreath’s Statistical Rethinking), but at the same time I find the general drive towards making introductory classes more applied and less theoretical/mathematical kind of sad. My sense is that more theoretical pursuits are becoming increasingly marginalized in the modern university—they are not valued by either the students or the university itself. The general attitude is aptly summarized in this quote: “The key causal effect which our students want our classes to have is to improve their future performance and opportunities.” This kind of thinking is very common among both students and university leaders, and I’m not saying it’s not understandable, but what happened to learning something for the sake of intellectual exploration and growth, regardless of the practical payoff?

      Also, don’t forget that there are still many students who are more theoretically than practically inclined (especially in places like Columbia and Berkeley, I would think). I took a year-long course in programming that was very practically oriented, and it completely turned me off programming for over a decade. Meanwhile, I learned linear algebra through Axler’s theoretically oriented “Linear Algebra Done Right,” which I loved.

      • Your standard introductory statistics course isn’t theoretically/mathematically oriented. If it’s oriented at all, it’s verbalistically oriented, in the sense of teaching students the right words to say to defend their papers, plus some mostly unrelated calculus/algebra problems used as a grading-curve generator. To put it another way, my advice today to the theoretically oriented students you describe is to skip intro stats altogether and jump straight into analysis and probability theory.

        • I agree. Introductory classes in statistics (certainly the one I took back in the day) tend to emphasize rote memorization of various formulas. I certainly think a more computational (and more conceptual) approach is better than that. I object more to Henry Milner’s idea of making calculus and linear algebra computational.

        • I think introductions to single-variable calculus are pretty varied in focus and quality, but I think most introductions to linear algebra are really matrix algebra with a heavy focus on multiplying matrices, so it’s quite possible to make first courses in linear algebra more theoretically interesting and involve more programming at the same time.

        • To be clear, I’m talking specifically about the introductory courses from math and stats departments intended for students of other disciplines. The faculty tends to treat these as an obligation and do just a bit less than the bare minimum. I don’t have an advanced degree, so maybe there’s a wrinkly-big-brain reason for this that I can’t comprehend, but it seems like it should really be an interesting exercise in stripping a topic to its barest essentials.

      • > what happened to learning something for the sake of intellectual exploration and growth, regardless of the practical payoff?

        Many/most students and I are still very much in favor of that! As always, life is lived on the margin. No one argues that every course should be selected to increase one’s future salary. At the same time, I (and almost all students) believe that, among the 32 courses one takes, some thought should be given to their causal effect on future salary. The balance should be left to the student. University leaders recognize that faculty, in their course offerings and pedagogical choices, are perhaps not meeting student demand.

        To students I often say, “You came to Harvard to study philosophy (or poetry or astrophysics or dance or . . .). That is great! We have many wonderful courses! Study what you love! But, at some point, you might ask yourself the following question: ‘Is there one course I can take which provides the greatest marginal increase in my post-graduate opportunities?’ My course is designed to be the answer to that question.” And I think the same is true, or could be true, of other intro data science classes.

        • OK, that is quite reasonable. By the way, I took a look at your course website, and I do think it looks like a very neat class.

        • Preceptor/Dr. “I am the rope”, “Big Daddy” Kane,

          Brah, nice class – https://www.davidkane.info/files/gov_50_fall_2020.html

          I also came across this metaphor as a (sorta) undergrad taking an Econ class. The professor stood before 150 or so STEM students at a top university and asked “What would Ulysses do?”… and there was silence.

          Maybe we should be pushing for more close-reading literature courses. Apparently you can learn everything from reading old stories. (I certainly learned much about how to think about the world from thinking about novels – it’s unclear whether my economic demography work would exist without Blood Meridian and Absalom, Absalom!).

          … Anyway, snark aside, this really does sound like a core-important course to becoming a well-rounded person capable of making sense of the information the world throws at them. I think of Intro to Macroeconomics in the same way – it literally teaches you how to read the economic content of a newspaper (n.b. – I learned almost nothing useful in graduate Macroeconomics). I think many of us hope Stats 1 can be a similar type of “learn to understand basics of the information the world throws at you” course, and are disappointed we fail so badly at doing that (or sometimes that our colleagues in Stats/Econ core classes don’t hand us better-trained students).

          I love that the application doesn’t matter for this course, just that you are getting your hands on some data and doing something with it. The easiest way to connect numbers to the real world (for me and those like me) is to have something in the real world you are already curious about, have thought about, and would like to quantify in some way. So you intuitively already know what you want to look at and how you think the relations go (you’ve thought about the outcomes and metrics and potential mechanisms, even if not in those terms); now you are just learning the core tools to actually investigate them. I mean – I love that the projects go from Disability Litigation to a Minnesota Fishing Guide to Mixed-Martial Arts (though I question the preceptor who did not push them to Boxing – MMA, like American football, is barbaric; Boxing, like soccer, is art).

          My version of this course, framed as a “quantitative research in the social sciences” course for senior undergrads interested in going to grad school, isn’t this good. Partly because it is just a huge ton of work to supervise so many independent projects; partly because we only get a quarter, not a semester; and partly because I’m not as good a teacher as some…. oh, I see, you have a staff of like 20. Never mind – with a staff of 20 I could be the rope too.

          PS – I also think a Philosophy/Writing course on “how to outline, describe and critically write about an argument” should be up there with Stats 1 and Macro 1… at least in a better world where we are actually training our students how to think and then communicate those thoughts clearly. Of my 200 or so undergrads a year, literally maybe 20% had ever made an outline or even knew what an outline for a paper/argument is. That blew my mind. Sure, diagramming sentences might be a little silly, but being able to outline an argument is a fundamental skill for advanced thinking of any sort.

  4. I like the idea of pushing Git hard.

    I don’t see why this is framed as a problem with math. Seems more like the problem is with the formulas in stats books. “Teach randomization-based inference” sounds like traditional stats. So it’s the same material without a dependence on formulas?

    What about “Kill The Formulas and Let the Introductory Course Be Born”?

    The argument sounds like it’s better to learn how to solve these problems with a computer than using tables of formulas. That should be a simple enough argument to make, and then you won’t have to apologize to people who bring up counter-examples of valuable math.
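    Here is one illustration of solving such a problem with a computer rather than a table of formulas: a randomization (permutation) test for a difference in means, in Python with only the standard library. This is a minimal sketch. The function name and the two small groups of numbers are made up for illustration, and it is not presented as the specific procedure from Kane’s paper.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_perm=10000, seed=1):
    """Two-sided randomization test for a difference in means.

    Shuffles the group labels many times and reports the share of shuffles
    whose absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-deal the labels at random
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        # small tolerance so floating-point ties still count as "as extreme"
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_perm


a = [12, 15, 14, 10, 13]  # made-up scores, group A
b = [9, 11, 8, 10, 12]    # made-up scores, group B
observed, p_value = permutation_test(a, b)
print(f"difference in means = {observed:.2f}, randomization p-value = {p_value:.3f}")
```

    The only "formula" students need is the mean. The reference distribution comes from reshuffling the labels rather than from a t table.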

  5. “The key causal effect which our students want our classes to have is to improve their future performance and opportunities.”

    You’re the professor. You tell *them* what they *need*; you don’t give them what they want.

    • 1) Every instructor has a moral obligation to clearly explain what causal effects (he thinks) his course will have.

      2) Every US college gives students a great deal of latitude to choose among courses. With rare exceptions, no US college tells students what they *need*.

      3) There are many causal effects an intro course might seek to achieve. Example, one might want to help students (from a comment above) “feel at home when [they] see a formula.” Nothing wrong with that!

      4) My claim is that many/most students would prefer a course which achieves the causal effects I aim for rather than the causal effects which many/most other intro courses (implicitly) aim for.

      > you don’t give them what they want

      Yes, you do. Or, at least, you are honest about the causal effects and, if they choose your course, you do your best to achieve those effects. (Students, obviously, are in no position to judge whether or not a specific lecture/assignment/reading will work.) It doesn’t really make sense to even discuss what they “want” in that context. But students deserve, or at least think they deserve, the right to decide about their own goals for their educations.

      • > Every instructor has a moral obligation to clearly explain what causal effects (he thinks) his course will have.

        When I think about causal effects of the various strategies in the class, I think compared to a baseline class. But the causal as you’ve used it here seems like in comparison to not taking the class?

        And really the baseline might change over the years if we were thinking about improving a class. Maybe the baseline here is in comparison to last year’s strategy (which may have included randomization), and then next year is really in comparison to this year.

        Why not call them course objectives? The focus here seems to be stacking students with useful programming skills that’ll help em’ land jobs.

        Is there any component of this that is trying to measure a causal impact of the strategies? If not, why use the word?

        • +1 to David for framing it as a causal inference question in the first place, which helps focus the discussion, for example by leading Ben to consider what is the comparison.

        • > I think compared to a baseline class. But the causal as you’ve used it here seems like in comparison to not taking the class?

          I don’t think it matters much. There are, of course, hundreds of possible causal effects. One is a comparison with not taking the class. Others are comparisons with any other class a student might take. However you define it, my claim (which might not be true) is that, for the average student, the causal effect of my class on future success/opportunities is positive, and often large. True for every student? No!

          > Why not call them course objectives?

          Nothing wrong with that terminology. But, in my syllabus, I prefer an explicit promise:

          —-
          No course at Harvard does a better job of increasing your odds of getting the future — the internship, the job, the graduate school, the career — that you want.
          —-

          > The focus here seems to be stacking students with useful programming skills that’ll help em’ land jobs.

          That is part of the focus, but my claim is broader. Consider an outcome like “grade on the senior thesis.” My claim is that my course, because it teaches reproducible research processes over and over again, has a positive causal effect, at least compared to the vast majority of other courses. So, this is not just about jobs.

        • There is no way to know if this class, or any class, can deliver anything at all like the Explicit Promise (we do know of many classes that *cannot*, but that’s a different matter). If you claim that it will be a well-spent semester for most, every self-respecting instructor believes the same, so there’s not much information in saying it, and putting it twice on the same syllabus makes it sound like nervous adjunct faculty organ-grinding to raise the enrollment.

          Other than removing the mathematics (or formulas), this is a package of proposals that are not particular to statistics, that one can pick and choose independent of each other. The programming, reproducible research, Github portfolios, self-learning, problem solving — all of these could be done in virtually any STEM class, or made into the computing equivalent of a writing requirement. In essence that’s what you’re doing, a prototype of an open-source computing requirement with R and statistics as the application area. Not that there’s anything wrong with that! But, especially with Harvard students and the kind of jobs they go into and the roles they often play, giving them some Math Lite exposure and the feeling they understand a bit and have the skills to operate independently is a double edged sword if not followed up by Math For Real. The kids on the lacrosse team who will get an English degree and a trading job at Merrill Lynch — your target audience — are prone to overconfidence and bluffing as it is, and having that turbocharged with R skills but without the traditional amount of math is not necessarily an upgrade. For one thing, the math serves as a filter for IQ and other traits that would be useful as a counterweight in these jobs you are promising the kids.

          > There is no way to know if this class, or any class, can deliver anything at all like the Explicit Promise

          Uh, inference is possible, you know. Weird that you would hang out here if you did not believe that. First, we could actually run the experiment! Take 100 students, randomly assign some to course X and others to course Y. See what happens. Even without random assignment, progress is possible. Check out Regression and Other Stories. It has lots of good advice!
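          The experiment described here can be sketched in a few lines. This minimal Python illustration assumes complete randomization of 100 students into two equal arms, with an outcome measured later for each student. The function names, the seed, and the outcome variable are hypothetical, and no real outcome data are implied.

```python
import random


def randomly_assign(student_ids, seed=2021):
    """Complete randomization: shuffle the roster, then split it in half."""
    rng = random.Random(seed)
    roster = list(student_ids)
    rng.shuffle(roster)
    half = len(roster) // 2
    return {"course_X": sorted(roster[:half]), "course_Y": sorted(roster[half:])}


def difference_in_means(outcomes, arms):
    """Estimated average effect of taking course X instead of course Y.

    `outcomes` maps student id -> a later outcome (hypothetical, e.g. 1 if the
    student landed an internship, else 0).
    """
    x = [outcomes[s] for s in arms["course_X"]]
    y = [outcomes[s] for s in arms["course_Y"]]
    return sum(x) / len(x) - sum(y) / len(y)


arms = randomly_assign(range(100))
print(len(arms["course_X"]), len(arms["course_Y"]))  # 50 50
```

          Under complete randomization, the difference in means is an unbiased estimate of the average effect of X versus Y for the students in the experiment.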

          > If you claim that it will be a well-spent semester for most, every self-respecting instructor believes the same

          I make no claim about “well-spent.” I am making a specific claim: Take this course (or a course like it) and you are more likely to get job X, working for firm Y, in city Z. That is not the same thing as “well-spent.” Although all faculty believe in “well-spent”, very few are interested in concrete promises/plans for increasing these odds. (And, to the extent they are, they rarely explicitly discuss it. And, to the extent they discuss it, it is rarely clear that they have thought hard about it.)

          > nervous adjunct faculty organ-grinding to raise the enrollment

          That is fairly redundant, albeit all too true. All adjunct faculty are nervous. Any adjunct faculty with low enrollment won’t be faculty for long.

          > this is a package of proposals that are not particular to statistics, that one can pick and choose independent of each other.

          True.

          > But, especially with Harvard students and the kind of jobs they go into and the roles they often play, giving them some Math Lite exposure and the feeling they understand a bit and have the skills to operate independently is a double edged sword if not followed up by Math For Real.

          This is wrong in several ways. First, Harvard students are not that different from other students at the top 25 to 50 schools. Yes, the entire distribution is shifted right, but there are still huge overlaps, especially among the sorts of students who take my course. The top quarter of students at, say, Vanderbilt, have higher SAT scores than the bottom 1/4 at places like Harvard/Yale/Princeton.

          Second, “the kind of jobs they go into” are incredibly broad. They aren’t all Goldman Sachs bankers and Facebook engineers! They go into the same broad set of jobs as the rest of the top 50 schools.

          Third, “Math For Real” is useless for 90%+ of Harvard graduates. And, even if it were not, the vast, vast majority of my students will never study it.

          > The kids on the lacrosse team who will get an English degree and a trading job at Merrill Lynch — your target audience — are prone to overconfidence and bluffing as it is, and having that turbocharged with R skills but without the traditional amount of math is not necessarily an upgrade.

          This is confused. I am making a specific claim: The lacrosse/English student who takes my class is more likely to get that Merrill job (assuming he wants it) than he would be if he did not take my class. Do you disagree with this as an empirical claim?

          You seem to be arguing that, from the point of view of society (or Merrill?), this is not an upgrade. That is an interesting question! But do you really think you know better what is good for Merrill than the people who run Merrill? I doubt that!

  6. A substantial chunk of the graduate-level climate statistics course I taught recently was Allen Downey’s Think Stats 2e, https://greenteapress.com/wp/think-stats-2e/, used as a refresher for those who’d had some stats and as a primer for those who hadn’t. It’s Python-based, which makes it good for physical scientists. It starts with exploratory data analysis, has a GitHub repository, emphasizes computation over math, etc. I liked it.

  7. Sorry not to fully agree. Maths can be useful for practicing statistics if they focus on asymptotic theorems (showing the role of iid assumptions in the CLT, for instance), or on set theory and measure theory to provide a good understanding of what conditioning and biases are. Let’s add the definition of orthogonal projection, which is a good way to talk about the difference between linear regression (in the context of extrapolation) and PCA denoising (in the context of interpolation). There are so many examples…
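    To make the orthogonal-projection point concrete: in ordinary least squares, the fitted values are the orthogonal projection of y onto the span of the constant and the predictor, so the residuals are orthogonal to both columns. Below is a minimal Python sketch with one predictor, using made-up numbers; the function name is hypothetical. It covers only the regression half of the comparison, not PCA.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares (closed form, one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # made-up predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up responses
a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Projection check: residuals have (numerically) zero inner product
# with the constant column and with the x column.
print(f"intercept={a:.3f}, slope={b:.3f}")
print(f"<resid, 1> = {sum(residuals):.2e}")
print(f"<resid, x> = {sum(r * x for r, x in zip(residuals, xs)):.2e}")
```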

    And I am not even talking about how a good way of teaching the Curry-Howard correspondence can help in architecting Python packages (in fact, packages in any functional programming language).

    I strongly believe that there is a balance to find between teaching anecdotal recipes and generic principles, especially when programming languages and statistical techniques are evolving so fast. Real maths are an ideal context in which to build strong intuitions that can then be used in a generic way.
    Who wants to teach “best practices” that are deprecated after 18 months?

    (full disclosure: I learned stats in France and I am teaching in a master of probability applied to finance in France)

    • Lehalle:

      Math is a great way to understand many aspects of statistics, if you understand the math. And there are probability courses and probability-based theoretical statistics courses that cover statistics that way. But here we’re talking about the one-semester statistics course, which zillions of students take every year. And many of the students in these classes won’t come close, in the time allocated, to understanding the central limit theorem, let alone the role of specific assumptions in that theorem. Our students have enough trouble understanding the basic algebra in a formula such as sqrt(p*(1-p)/n). See here for a related discussion.
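      For the formula sqrt(p*(1-p)/n) in particular, a computation-first course could check the algebra by simulation rather than derivation. This is a minimal Python sketch. It assumes n independent yes/no responses, each equal to 1 with probability p. The function names and the values p = 0.3 and n = 100 are illustrative choices.

```python
import math
import random
import statistics


def se_formula(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_simulated(p, n, n_sims=10000, seed=3):
    """Spread of the sample proportion across many simulated surveys of size n."""
    rng = random.Random(seed)
    proportions = [
        sum(rng.random() < p for _ in range(n)) / n for _ in range(n_sims)
    ]
    return statistics.stdev(proportions)


print(f"formula:   {se_formula(0.3, 100):.4f}")
print(f"simulated: {se_simulated(0.3, 100):.4f}")
```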

      Also, best practices are not deprecated after 18 months. Our book on Applied Regression and Multilevel Models was published in 2007 and our book Regression and Other Stories was published in 2020. That’s 13 years, not 18 months, and yes some of the practices of that earlier book have changed, but most of it still holds.

      Here’s an analogy. Suppose you’re an American student taking a class in French or Spanish or Italian. If you already know Latin, then it makes sense to understand this new language in relation to its classical predecessor. But if you don’t know Latin, then I wouldn’t recommend that the teacher first teach you Latin, just to help with Spanish or whatever. It would be great to learn Latin too, but time is limited and the effort taken to learn Latin will have to be taken from somewhere else in the course. That’s how I feel about advanced math in an introductory statistics course.

      • I understand your point, and I am not fully aware of the level of knowledge of students in this “introductory statistics” course, but let me (just for the pleasure of it, and to answer Kevin’s comment on measure theory) add that I am not convinced that what I suggest is really difficult.

        To go back to your analogy with Latin, the concept of the plural (or the genitive) can be explained, and it can be very useful for understanding the “Latin languages”.

        My point would be: how do you plan to explain the concept of iid? Either you say “it is a repeated experiment, always under the same conditions,” and in my view you are not delivering much information, or you give examples of memory and dependence to show what it is not. In this second case, the CLT will be somewhere in the back of your mind as you explain it. My only point is: how much do you want to disclose about the CLT and its relationship with iid?
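        One way to show what iid is not, without stating the CLT, is to compare the spread of an average of independent draws with the spread of an average of draws that have memory. This is a minimal Python sketch. It assumes a stationary AR(1) process as the illustrative form of dependence; that choice, the parameter rho, and the function name are assumptions, not from the comment. With rho = 0 the spread is close to 1/sqrt(n). Positive dependence makes it clearly larger, so the iid formula understates the uncertainty.

```python
import random
import statistics


def sd_of_average(n, rho, n_sims=4000, seed=4):
    """SD of the average of n draws from a stationary AR(1) series.

    Each draw has variance 1. rho = 0 gives iid normal draws;
    rho > 0 adds positive serial dependence ("memory").
    """
    rng = random.Random(seed)
    innovation_sd = (1 - rho ** 2) ** 0.5  # keeps each draw's variance at 1
    averages = []
    for _ in range(n_sims):
        x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
        total = x
        for _ in range(n - 1):
            x = rho * x + rng.gauss(0.0, innovation_sd)
            total += x
        averages.append(total / n)
    return statistics.stdev(averages)


for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho}: sd of average of 100 draws = {sd_of_average(100, rho):.3f}")
```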

        Same for measure theory (by the way, measure theory is very simple if you really consider Haar measure for what it is): do you need students to understand what the support of a distribution is? In fact, how do you plan to explain that if you have observed only 0s and 1s so far, you have to make some assumptions to believe that you will never see a -1 or a 3? My opinion is that you will have measure theory, and in fact potential theory, in mind. Do you want to give examples with 0 and 1, or do you want to give intuitions in a more generic way?

        This discussion is in fact about the level of abstraction at which you decide to teach. Of course you have to start where your students are in terms of their capability to manipulate abstract concepts (and probably a small fraction of French students are very familiar with abstraction). I may be old school, but I tend to believe that it is possible to give intuitions about abstract concepts instead of asking students to learn formulas. Moreover, I strongly believe that abstraction does not imply complexity, and because of that I think it is possible to use it to teach.

  8. I’m going to go in a different direction here. All of math is subsumed under computation, and all of computation is symbolic logic and language, which is the foundation of algebra. Furthermore, all of calculus is subsumed under algebra through nonstandard analysis. So “introductory statistics should feature almost no mathematical/statistical formulas beyond simple algebra” is meaningless, because all of mathematics is algebra and the word “simple” is undefined here. We may as well say “introductory statistics should feature no math other than all of math.”

    • > Is meaningless because all of mathematics is algebra

      Well, the paper (page 4) goes into some detail, providing a list of most/all the formulas provided in an (excellent) introductory text, Intro Stats by De Veaux et al, and arguing that these specific formulas should not be taught. It is not that these formulas are evil or wrong. A second or third course might well cover them. My argument is that they don’t belong in an intro course.

  9. I’m pretty much at the other extreme from lehalle and some others here. I’ve taught the one semester stat course (both undergrad and MBA) for more than 20 years with no math beyond simple algebra. I have disagreements with my colleagues who still insist that students should compute a standard deviation from a data set with n=10 before using the computer to look at variability. I’m still not convinced that the computation by hand (and yes, it is simple algebra) is worthwhile. Now, the latest trend as people start removing the mathematics is to replace it with programming. So, let me make this more controversial – I think the intro stats course need not use any programming! I know this will be a minority view here, and I am not going to use the words “should not” but I have taught the intro course for more than 20 years without using any explicit programming (I use JMP, and of course, there is a script behind the scenes).
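    For reference, the by-hand computation at issue is just the simple algebra below. This minimal Python sketch uses a made-up n = 10 data set, and the function name is hypothetical. It is checked against `statistics.stdev` from the standard library, which also uses the n - 1 divisor.

```python
import math
import statistics


def sd_by_hand(data):
    """Sample standard deviation, spelled out step by step (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]  # made-up n = 10 data set
print(sd_by_hand(data), statistics.stdev(data))
```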

    I accept that programming is an essential part of stats education – but I don’t accept that it is a necessary part of the intro stats course. I hear statements like: we shouldn’t teach methods such as boosted trees without opening the black box by having students use programming (it was R until a few years ago; now it is mostly Python). But writing the code to implement a boosted tree in R or Python (or JMP, for that matter) is just as much of a black box as clicking on a command to execute a boosted tree. Programming does not open the black box – it does provide useful skills in logic and workflow, but in itself it does not provide any insight into what goes on in the black box.

    My take on the intro stats course (or data analysis or data science, terms which I have been unable to find meaningful distinctions between) is that the essential skills are working with data, appreciating the importance and limitations of measurement, understanding variability, thinking multidimensionally, thinking about causation and correlation, and exploring predictive modeling. I maintain that none of these require much mathematics or much programming.

    I also think too many intro courses are viewed as the entry gate to a major (this is true for more than statistics). I reject such thinking. They are often staffed by the least qualified faculty (there are exceptions, but I believe if you look nationwide you will see that tenure-track faculty, already at <25% of the overall faculty, are under-represented at the intro level). I believe the intro course should give students a deep enough taste of the subject to appreciate what it means to think like a statistician, economist, sociologist, etc. I think this comes from depth, not breadth. Save the broad overview of the subject for the majors – when that is the intro course, most students come away with no idea how professionals in that area really think. Instead they only get a sense of whether they are likely to get good grades in a subject or not – perhaps the least useful (and most dangerous) thing to learn from an intro course.

    I think that is enough for people to disagree with for now.

    • Yeah, of the things listed at the end, only two are programming:

      – Use Git and Github
      – Use an open source programming language
      – Teach randomization-based inference
      – Flip the classroom
      – Cold-call
      – Require that student work be reproducible
      – Require that student work be public
      – Require solo final projects

      I could see getting a lot of mileage out of just flipped classroom/cold-calling/public/reproducible work even without the programming (though I like the programming).

      • One part of your comment spurs me to observe: nobody ever calls for students to learn an open-source word processing program. Why is it ok to use Microsoft Word but not a commercial statistical package? The open source issue seems like a red herring to me, particularly in the intro stats course. Why not use the best tool for the purpose of the course – and if that is understanding data, why is open source a requirement?

        • From my perspective, free and open source materials are best because they are free. Many of my students have limited financial means and I don’t want that to be a barrier to entry.

          Moreover, I’ve yet to see any package or textbook that I would consider the “best” tool or text, whether free or not. Every book I’ve used has good and bad, same with every software package. So I might as well use one that doesn’t cost anything.

        • Most software publishers (and JMP, in particular) provide inexpensive site licenses to colleges, so there is no financial barrier for students to use the software in classes. What you say is relevant beyond graduation, but there is plenty of time for them to learn open source software by then – we are talking about the intro course here. Further, I wonder if you adopt the same attitude regarding word processing software, presentation software etc.

          There is plenty of reason for you to adopt a particular computing package and it may differ from my choice. But I was responding to Ben’s suggestion that an open source package be one of the guidelines for the intro course. I don’t see the rationale for this, and being free seems a poor reason to me (especially when there are free textbooks and most people use commercial texts, which are far from free).

        • > Ben’s suggestion

          Those are David’s suggestions, to be clear.

          Hmm, maybe this is like the Math thing. In that case I think the real complaint is against the formulas, not Math. Maybe ‘use open source’ is a way of saying ‘use R instead of SAS’ or something (or whatever the intro people use in their department).

          Why-not-open-source-alternative-to-Word is interesting though. I’m not sure. I guess that is OpenOffice/LibreOffice, and those certainly aren’t as widespread as the price tag might suggest they should be. There are probably some places that push LaTeX. Maybe in the future there will be people pushing various markdowns (though these are mostly attached to coding). I’ve seen people doing markdown presentations and they are decent.

          I was in MechE grad school, and we were teaching students Matlab and Labview, which both get really pricey really quickly. We also did Arduino, but they’re all different things, so I don’t think you could ditch Matlab and Labview just because of the price tag.

        • Wonder no more! I do indeed adopt the same attitude regarding word processing and presentation software. I personally use free and/or open source options on those fronts and do not require students to use Word, Powerpoint, Excel, etc. (though many still do, of course).

          When I started writing this reply, I also didn’t view “open source” as any kind of “guideline”, just a practical choice. But then I talked myself into seeing how it can be valuable for an intro class. I view one of the goals of an intro class as giving students a base that will allow them to learn more about the topic on their own. To that end, open source can make it easier for students to engage with the material outside of class and post-graduation.

          You say that students have “plenty of time” to learn open source software between the intro course and graduation, but I don’t think that’s true. Students are taking classes, working, doing theses, volunteering in labs, etc. (and, I suppose, “living”). Chances are, they will only learn a new tool if a class or job gives them the opportunity and incentive to do so. As you say, having a free or low-cost academic option doesn’t help students once they graduate. So if their intro course uses publicly available free/open source tools, they can hit the ground running.

        • Sorry to disagree, but I do. I’m sure there are some students that fit your description, but of the many different students I have taught over a 40-year period, the vast majority are taking 1 statistics course, sometimes 2 or 3. My belief is that being able to exert control over their manipulation of data so as to get closer to the truth in that data can have a lasting impact on their quantitative literacy. On the other hand, those taking one course using R – unless they follow with additional courses – almost never use the tool again. The relative pain of working with the data (again, I am not speaking of someone experienced with R, but someone exposed to it in their one statistics course) means they don’t do it any more. They major in X, Y, or Z and eschew quantitative analysis if they are allowed to (and, as we know, in many disciplines they are permitted to do so). On the other hand, if they have a powerful and easy to use tool that permits them to really work with data, they tend to use it in other classes. The more they use it, the more they feel comfortable with data and the more likely they are to keep using it – and eventually learn to use open source tools, if they don’t have good access to the costly ones.

          Clearly some will disagree with me, and some people will have had very different experiences. But I would point out an additional data point: Tableau has been an immensely successful business product despite its high cost and its limited analytical capabilities. They got some things right that many people miss – good marketing, user-centric focus (although I don’t find it as intuitive as JMP or even Power BI), and no need for coding. How else to explain their success when open source Python and R visualization packages are readily available?

        • I think we actually largely agree, but since I was responding to a comment in response to your original post, I think there might be a bit of a confusion. Conditional on using software at all, I do think using open source has some merits. But the big picture is that I actually agree with your original post that software isn’t any more integral to intro stats than math. Statistics isn’t about math and it isn’t about computation, it is about using data to reason, learn, and decide.

          You’re absolutely right that the pain involved in learning the tools can put students off, and this serves them poorly in the long run. One approach is to focus on tools that are easier to grapple with off the bat, as you say. A related approach is to minimize the pain by focusing on a limited set of use cases. For example, when using R in intro stats, I treat it like a glorified graphing calculator rather than a programming language, which largely avoids a lot of the pain that comes from trying to convey concepts like vectors and data frames which are ultimately irrelevant to what I want the students to learn.

          You can reasonably ask, then, why use R if I’m not really doing much more with it than one could with Tableau? One answer is the open source one, which I think has some merit but I concede it is not a very strong case. A better answer is that I want a tool that can simulate as well as graph data, and to make it easy to slot simulation into the pipeline. I think an important part of working with and understanding data is being able to think about what might have caused the data. This can be covered with thought exercises, but I think being able to quickly simulate makes it pop. Most GUI analysis programs don’t provide simulation functions, so I find them less useful in that context (JMP is an exception). Spreadsheet programs can simulate, but they don’t make data exploration easy (“making the damn plots”).

          So in the end, I’ve picked R not because I think it should be any kind of general guideline, but because it is a one-stop-shop for what I try to convey in the way that I use it.

        • > Why is it ok to use Microsoft Word but not a commercial statistical package?

          On average, it is much easier for students to find work — both inside and outside of academia — if they are skilled in R (or Python) relative to SAS/Stata/Mathematica/whatever. There is no comparable advantage to forcing students to use an open-source word processor.

          > Why not use the best tool for the purpose of the course

          You should! The entire point of the article is that we don’t think hard enough about (or, at least, communicate clearly enough to our students) the “purpose” (which I equate to “causal effects sought and/or achieved”) of our courses. If one of your purposes is to help students get jobs, then using R/Python will help them more than using some other language.

          Too many faculty in my experience (not necessarily you!) don’t really care too much about the causal effects that their pedagogical choices have on student employment opportunities.

        • I don’t believe anybody is going to get a job because they took intro Stats with R. I do believe that if they have an intro stats course that provides the critical thinking about data and shows them what is possible, and they then take further courses (with R or Python), they will be both eminently employable and capable.

        • Siri, show me a straw man.

          > I don’t believe anybody is going to get a job because they took intro Stats with R.

          The claim is that taking a course like mine increases a student’s chance of getting a job (which they want). Or, equivalently, take 100 students, randomly assign 50 to my course and 50 to another course. The 50 assigned to my course will, in aggregate, do better on the job market.
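          This thought experiment is exactly the setup for randomization-based inference, one of the suggestions in the list above. A minimal sketch in Python, assuming a binary outcome (1 = found a job) and a difference in means as the test statistic: reshuffle the course labels many times and see how often a difference at least as large as the observed one turns up. The outcomes below are simulated from an assumed 60% job rate in both groups, purely for illustration; they are not data from any course, and `permutation_test` is a hypothetical helper, not a library function.

```python
import random

def permutation_test(treated, control, n_perm=5_000, seed=0):
    """Two-sided randomization test for a difference in group means."""
    rng = random.Random(seed)
    n_t = len(treated)
    observed = sum(treated) / n_t - sum(control) / len(control)
    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomize who was "assigned" to each course
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float ties
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical experiment: 100 students, 50 randomly assigned to each course.
# Outcomes (1 = found a job) are SIMULATED from an assumed 60% rate in both
# groups; they are not real data.
gen = random.Random(42)
students = list(range(100))
gen.shuffle(students)
course_a, course_b = students[:50], students[50:]
outcome = {s: int(gen.random() < 0.6) for s in students}
treated = [outcome[s] for s in course_a]
control = [outcome[s] for s in course_b]

diff, p_value = permutation_test(treated, control)
print(f"difference in job rates: {diff:+.2f}, randomization p-value: {p_value:.3f}")
```

          Because both simulated groups share one job rate, the p-value here should usually be unremarkable; only outcomes from a real randomized assignment would actually test the claim.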

          Everyone agrees that if students take several courses using R/Python they will be more employable than if they don’t.

        • My 50 will do better than your 50 (p=.049). Seriously, and in the spirit of your original post, a lot depends on the audience. You teach at Harvard, I do not. What works for Harvard students might not be what works for my students. I’m not saying that my students can’t handle learning R, but I am saying that they take to it better after using JMP for a semester. Ultimately, I believe we both want our students to learn how to think about data and make sense out of it – and to get jobs. The question is what is the best pedagogical approach to realizing that. What I’ve seen is a large number of students give up on quantitative reasoning after a semester with R before they have any good experience working with data. Not all students, to be sure, but given the broad lack of any numeracy in the population, I prefer to find approaches that will reach more of them. I don’t think I give up any rigor, however; I am only dropping the coding as part of the first exposure to statistics (mostly for business students).

    • Dale said,
      “I have disagreements with my colleagues who still insist that students should compute a standard deviation from a data set with n=10 before using the computer to look at variability. I’m still not convinced that the computation by hand (and yes, it is simple algebra) is worthwhile. Now, the latest trend as people start removing the mathematics is to replace it with programming. So, let me make this more controversial – I think the intro stats course need not use any programming! I know this will be a minority view here, and I am not going to use the words “should not” but I have taught the intro course for more than 20 years without using any explicit programming (I use JMP, and of course, there is a script behind the scenes).”

      I agree that having students compute a standard deviation by hand is not worthwhile. But something I think is important, which Dale doesn’t seem to mention, is that it is worthwhile to help students understand *why* we use the formula we use. (Or to put it another way: why the formula is a reasonable way to measure variability.)
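      One way to show why the formula is reasonable, without turning it into a hand-computation drill, is to build it up one step at a time. A minimal Python sketch, using ten made-up values to echo Dale's n=10 example: the raw deviations sum to zero, squaring keeps them from cancelling, the n - 1 divisor gives the usual sample variance, and the square root returns to the data's units. The `sample_sd` helper is hypothetical; `statistics.stdev` from the standard library serves as the check.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation, built up from its definition step by step."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]
    # Raw deviations always sum to zero (up to rounding), so their average says
    # nothing about spread; squaring keeps + and - deviations from cancelling.
    squared = [d * d for d in deviations]
    # Dividing by n - 1 rather than n gives the usual sample variance.
    variance = sum(squared) / (n - 1)
    # The square root puts the answer back in the data's original units.
    return math.sqrt(variance), sum(deviations)

data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]  # ten made-up values
sd, deviation_total = sample_sd(data)
print(sd, statistics.stdev(data), deviation_total)
```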

      • How in the devil, other than by having a grip on the mathematical form of the problem, is the programmer ever going to get a grip on accuracy and loss of accuracy as they relate to different data sets and different ways of computing, say, the standard deviation? E.g., in one pass (where you risk the expression for the sample variance turning negative), in two passes, or in the iterative “Kalman filter” form, updating the mean and the SD one after the other, one new point at a time. Frankly, there are subtle problems lurking in loss of precision with the recursive computation of a mean rolling through a gigantic data set: at some point the new datum will no longer have any influence at all, not unless one comes up with a scheme for blocking the calculation. For that matter, summing a huge list of numbers in one direction or in the opposite direction may not produce the same answer at all. I bring this up because “big data” is supposed to be where the action’s at, so they say. Well, the bigger the data, the bigger the computational subtleties. If the programmer hasn’t any mental model at all for the numerics (and logic) of the computation, then that programmer isn’t welcome in my line of work, where the stakes aren’t merely heated disputes that miss the mark but things like Mars landers that miss the mark, or worse.
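        The one-pass versus two-pass contrast is easy to demonstrate. A sketch in Python, assuming IEEE double-precision floats: ten small-spread values are shifted by 10^9, so the true sample variance stays 30/9. The one-pass shortcut (sum of squares minus the squared sum over n) subtracts two nearly equal numbers around 10^19 and returns garbage, possibly even a negative number. The two-pass version and Welford's one-point-at-a-time update (one standard version of the recursive scheme described above; the choice is mine) stay accurate.

```python
import math

def var_one_pass(xs):
    """Textbook shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def var_two_pass(xs):
    """First pass computes the mean; second pass sums squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def var_welford(xs):
    """Welford's streaming update: revise the mean and M2 one point at a time."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Small spread riding on a large offset; the true sample variance is 30/9.
offset = 1e9
data = [offset + x for x in [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]]
for f in (var_one_pass, var_two_pass, var_welford):
    print(f.__name__, f(data), math.isclose(f(data), 30 / 9, rel_tol=1e-5))
```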

  10. I think there should be two little introductory statistics courses, taken in parallel. One is just discrete probability theory. It should be light on “math” that requires lots of preamble, like calculus or linear algebra, but lean heavily into proof-style reasoning, combinatorics, and maybe some induction, all of which can be very tough or tricky but don’t require much time the way integration does.

    The second is just descriptive statistics, with programming: leveraging visualizations and summary statistics to compare particular properties of datasets. Give people something like the John Snow cholera dataset and ask them to figure out what’s going on and defend their argument with plots, facts and logic.

    I don’t think there should be any inference in introductory stats. I know that’s what everyone wants to get to, but I’m not really sure that inference means anything without a solid grounding in basic probability theory.

  11. I agree with a lot of David Kane’s suggestions but simulation – as the main focus in computation – seems to be getting too little attention.

    Now, if one stresses that simulation is just a choice of mathematical analysis, one may avoid some of the pushback. “Kill the math” can be replaced with “change the style of math.”

    Some links https://statmodeling.stat.columbia.edu/2020/08/05/somethings-do-not-seem-to-spread-easily-the-role-of-simulation-in-statistical-practice-and-perhaps-theory/

    https://statmodeling.stat.columbia.edu/2020/11/25/is-causality-as-explicit-in-fake-data-simulation-as-it-should-be/

    • Agreed on the importance of simulation!

      I think in many ways, math is often used to attempt to convey ideas that simulation makes more apparent. E.g., you find the mean and SD and look up in a table the probability of seeing that result. This is meant to convey the idea that there is a sampling process in the world that produces data of a certain kind and that it produces the kind of data you saw with X probability.

      But simulation reverses that. You say, I think the world is like X, now what would I expect from that and how far is my data from that expectation? Instead of going from data (that you care about) to a lookup table (which seems rote and boring), simulation lets you go from your idea about the world straight to the data.

      Well, I say “straight to”, but it does require some kind of computational interface in between. R is pretty good, but I do think it gets in the way as much as it helps. Still, I think simulation really helps make clear the connections between the world, data, and statistics that we want intro students to get.
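      A minimal sketch of the two directions gec describes, in Python, with a hypothetical question: if a coin is assumed fair, how surprising are 60 heads in 100 flips? The table-style answer standardizes and looks up a normal tail probability (computed here with `math.erfc` in place of a printed table). The simulation answer starts from the assumed world, generates data from it many times, and counts how often the result is at least as extreme. Both helper names are illustrative.

```python
import math
import random

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal: what a printed z-table would give."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulated_upper_tail(p_heads, n_flips, observed, n_sims=20_000, seed=1):
    """Simulate the assumed world many times; count results at least as extreme."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        hits += heads >= observed
    return hits / n_sims

# Hypothetical question: a coin assumed fair lands heads 60 times in 100 flips.
n, k, p = 100, 60, 0.5
z = (k - n * p) / math.sqrt(n * p * (1 - p))
print("table-style answer:", normal_upper_tail(z))
print("simulation answer: ", simulated_upper_tail(p, n, k))
```

      The two answers differ a little: the normal approximation here has no continuity correction, while the simulation approximates the exact binomial tail of about 0.028. That gap is itself a useful classroom discussion point.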

      • gec said, ” I think simulation really helps make clear the connections between the world, data, and statistics that we want intro students to get.”

        Agreed.

  12. This whole discussion is mirroring the trend that started in computer science departments of moving away from theory. Most have dropped the automata and theory of computation component from the curriculum—CMU did that before I left in the mid-90s. And I think there’s a trend away from the Scheme-based intros which are nice for computer science theory toward Python-based intros which are nice for what one might actually do in the real world. Is that reasonable? Despite being someone who taught that theory class, I can see why they dropped it as a requirement.

    I really like Keith O’Rourke’s point about simulation not being about killing math, but about replacing it with a different kind of math. And I think that’s true at all levels. At the lower levels, I can’t recommend Jim Albert’s book Curve Ball too highly. It’s devoted to baseball, which is limiting, but it does a beautiful job of teaching uncertainty and inference. If someone could translate its spirit to a range of domains other than baseball, I think it’d make the perfect textbook for an intro class like this one. One thing that makes baseball nice is that fans bring a lot of knowledge to the table, so ideally any similar book would draw on its readers’ domain knowledge the same way.

    I’m writing an introductory math stats textbook which I view as somewhere between DeGroot and Schervish’s textbook and Casella and Berger’s. I like Casella and Berger much better, but rather than leaning on calculus, I’m leaning on simulation. I’m assuming people can understand what the integrals mean for expectations and hence posterior predictive inference, but I’m not assuming the reader can solve differential equations. So yes, it’s going to be Bayesian in perspective, which is also a big jump from all the hypothesis testing and estimation in the traditional books (too much of the former and not enough of the latter, in my opinion).
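    Leaning on simulation for expectations can be sketched in a few lines of Python: an expectation is an integral, and averaging a function over simulated draws approximates it. The Beta(3, 5) posterior and the `mc_expectation` helper are assumptions chosen for illustration, because they have closed-form moments to check against (mean a/(a+b), second moment a(a+1)/((a+b)(a+b+1))).

```python
import random

def mc_expectation(f, draw, n=100_000, seed=2):
    """Approximate E[f(theta)] by averaging f over n simulated draws of theta."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(n)) / n

# Assumed example: a Beta(3, 5) posterior for a success probability theta.
A, B = 3, 5

def draw_theta(rng):
    return rng.betavariate(A, B)

post_mean = mc_expectation(lambda t: t, draw_theta)
second_moment = mc_expectation(lambda t: t * t, draw_theta)
print("E[theta]:  ", post_mean, "exact:", A / (A + B))
print("E[theta^2]:", second_moment, "exact:", A * (A + 1) / ((A + B) * (A + B + 1)))
```

    With a Beta posterior for a success probability, the posterior predictive probability that the next trial succeeds equals the posterior mean, so the first estimate doubles as a posterior predictive calculation.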

    As far as measure theory goes, I do wave my hands on that in this book. But I talk about how it works. There are a bunch of ways the world can be (points in the sample space), and for each such way, random variables take on values. I think focusing on random variables is hugely helpful. But you need at least some idea of what it means to be a random variable. Then in concrete models, it’s very clear what the sample space is and you can leave all that measure theory stuff behind as scaffolding. A random variable under sampling is something that takes on different values with different draws. I think that’s a much simpler idea to understand than a measurable function from a sample space equipped with a sigma-algebra to R^n, with a probability measure defined on that sigma-algebra (a collection of subsets of the sample space).
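    A minimal sketch of the “ways the world can be” picture, in Python, assuming a finite sample space (two fair dice, my example) so that no measure theory is needed: each outcome gets a probability, a random variable is an ordinary function of the outcome, and expectations and event probabilities are sums over the sample space. Exact fractions keep the arithmetic visible.

```python
from fractions import Fraction
from itertools import product

# Sample space: every way two six-sided dice can land, each equally likely.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, len(omega)) for w in omega}

# Random variables are just functions from outcomes to numbers.
def total(w):
    return w[0] + w[1]

def is_double(w):
    return int(w[0] == w[1])

def expectation(rv):
    return sum(prob[w] * rv(w) for w in omega)

def probability(event):
    return sum(prob[w] for w in omega if event(w))

print(expectation(total))                    # 7
print(probability(lambda w: total(w) == 7))  # 1/6
print(expectation(is_double))                # 1/6
```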

    What is useful is thinking about densities as limits of histograms (thanks to Jonathan Auerbach for explaining this to me). That ties it into sampling and simulation. And it gives you intuitions about calculating integrals using the trapezoid rule and why integrals are related to probability, whereas densities have units of probability per unit of the variable (the density’s just the derivative of the cdf, but hardly anyone ever talks about its units).
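    The histogram-to-density idea can be sketched in Python, assuming standard normal draws. A density-scaled histogram bar is the proportion of draws in the bin divided by the bin width, so it has units of probability per unit of x, and it approaches the density as the bins narrow (given enough draws). Integrating the density with the trapezoid rule then recovers a probability that matches the proportion of draws in the interval. `histogram_density` and `trapezoid` are hypothetical helpers written for this sketch.

```python
import math
import random

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def histogram_density(draws, lo, hi):
    """Density-scaled bar height on [lo, hi): proportion of draws / bin width."""
    inside = sum(lo <= d < hi for d in draws)
    return inside / len(draws) / (hi - lo)

def trapezoid(f, a, b, n=1000):
    """Trapezoid-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

rng = random.Random(3)
draws = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

# Bars centered at x = 0 get closer to the density there as the bins narrow.
for width in (2.0, 1.0, 0.1):
    bar = histogram_density(draws, -width / 2, width / 2)
    print(f"width {width}: bar {bar:.4f}  density {normal_pdf(0.0):.4f}")

# Integrating the density over [-1, 1] recovers a probability.
area = trapezoid(normal_pdf, -1.0, 1.0)
share = sum(-1.0 <= d <= 1.0 for d in draws) / len(draws)
print(f"trapezoid integral {area:.4f}, proportion of draws {share:.4f}")
```

    Narrower bins need more draws to keep the bar heights stable, which is the usual trade-off in choosing a bin width.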

    Also, it’s really easy to illustrate the central limit theorem for coin flips using simulation. Just make log-log plots and you can read the n^(-1/2) rate off the slope (-1/2) of the simulations.
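    A sketch of that log-log exercise in Python, assuming a fair coin: for each number of flips n, simulate many runs, take the standard deviation of the proportion of heads, and fit a least-squares line to log SD against log n. Using `rng.getrandbits(n)` as n fair flips at once is a speed shortcut of mine; the helper names are hypothetical.

```python
import math
import random
import statistics

def sd_of_proportion(n_flips, reps, rng):
    """SD of the proportion of heads across `reps` simulated runs of n_flips."""
    props = []
    for _ in range(reps):
        # getrandbits(n) yields n independent fair bits: n coin flips at once.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        props.append(heads / n_flips)
    return statistics.stdev(props)

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(4)
ns = [10, 40, 160, 640, 2560, 10240]
sds = [sd_of_proportion(n, 4000, rng) for n in ns]
log_n = [math.log(n) for n in ns]
log_sd = [math.log(s) for s in sds]
print("log-log slope:", ols_slope(log_n, log_sd))  # theory says -1/2
```

    The fitted intercept should also land near log(0.5), since the exact SD of the proportion is sqrt(p(1 - p)/n) = 0.5 n^(-1/2) for a fair coin.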

    I almost always agree with Martha on this site, but I’m not sure understanding standard deviation is an important foundational concept. I know it’s traditional to evaluate estimators in terms of bias and variance and that p-values are all about tail stats, but I want to do Bayesian stats. There, I find it much easier to talk about posterior intervals as measures of uncertainty, as they directly capture probability.

    The motivation for standard deviation is subtle. Standard deviation only does a good job of capturing meaningful uncertainty in normal distributions—it’s not even defined for a Cauchy distribution and it breaks down terribly for constrained or skewed distributions. On the other hand, uncertainty intervals make sense in all of these cases. I remember back before I was even trying to learn stats (I used to work on logic-based linguistics and programming languages), I asked a colleague why stats uses standard deviation rather than absolute error. It’s hard to answer that without getting into the central limit theorem or proper scoring metrics. I get to that in my textbook, but they’re very rough topics for a basic pre-calc intro class.
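    The Cauchy point is easy to see by simulation. A sketch in Python, assuming a standard Cauchy generated by the inverse-CDF transform tan(π(u - 1/2)): the sample SD keeps jumping around as n grows because the population SD does not exist, while an empirical 80% central interval settles near (-3.08, 3.08), the exact quantiles ±tan(0.4π). The interval uses simple order statistics, one of several quantile conventions; the helpers are hypothetical.

```python
import math
import random
import statistics

def cauchy_draws(n, rng):
    """Standard Cauchy draws via the inverse CDF: tan(pi * (u - 1/2))."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def central_interval(xs, level=0.8):
    """Empirical central interval containing roughly `level` of the values."""
    s = sorted(xs)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi

rng = random.Random(5)
for n in (1_000, 10_000, 100_000):
    xs = cauchy_draws(n, rng)
    lo, hi = central_interval(xs)
    print(f"n={n}: sample SD {statistics.stdev(xs):10.1f}   80% interval ({lo:.2f}, {hi:.2f})")
print("exact 80% interval:", (-math.tan(0.4 * math.pi), math.tan(0.4 * math.pi)))
```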

    rm bloom: Addition loses precision too: when you add two numbers, the smaller one loses about as many digits as the two differ in order of magnitude. In fact, if epsilon < 10^-17 * x, then x + epsilon = x in standard (double-precision) floating point, because every single digit of epsilon is lost. This is rarely a problem in practice because 10^17 is a big number. Addition does lose some precision, but usually that's not an issue for a mean. Subtraction is the opposite: it loses precision when the numbers are relatively close together, so variance calculations on numbers that are close together can lose a lot of precision. I used to teach a class session on floating point and API design for Andrew's stats communication class, and every year at least one student would literally walk out of class (one year, the student storming off shouted during class that this isn't statistics). I never thought I'd hear myself saying, "Kids these days!"
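    Both floating-point claims can be checked directly in Python, whose floats are IEEE double precision. The absorption threshold is half the spacing between adjacent doubles, about 1.1e-16 relative to x, so the 10^-17 figure is a safely conservative bound. Subtracting nearly equal numbers then shows cancellation: the difference itself is computed exactly, but it inherits the rounding already baked into its inputs, leaving only about eight correct digits here.

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next double: 2**-52

# Absorption: an addend below half that gap vanishes entirely.
x = 1.0
print(x + 1e-17 == x)  # True
print(x + eps == x)    # False: eps is exactly one step above 1.0

# Cancellation: 1.0 + 1e-8 had to be rounded to the nearest double, and the
# subtraction exposes that rounding error relative to the small difference.
a = 1.0 + 1e-8
diff = a - 1.0
rel_err = abs(diff - 1e-8) / 1e-8
print(diff, rel_err)  # relative error roughly 6e-9, nowhere near 1e-16
```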

      • Bob: let us say we have a list of a hundred billion numbers and we want to compute its mean. Let us say (and we do not know this ahead of time) that the numbers run over several orders of magnitude in a series of waves or trends; let’s say the first 50 billion numbers are “large” and the second 50 billion numbers are “small”. The true mean is somewhere in the middle. Of course we work in floating point with fixed precision. If we add up the list in an order that runs through the “large” section first, then when we get to the “smalls” they’ll have minimal influence on the accumulation. In fact the accumulated sum will, in this circumstance, be order-dependent in quite a striking way. The problem has concrete instances that are not as contrived as it sounds when stated generically.

        • Never mind the mean: “let us say we have a list of 100×10^9 numbers and we want to compute its *sum*”. The trick to adding them up in the right order is surprisingly subtle, particularly if one is ignorant of how these numbers are distributed into subgroups of significantly different magnitudes.
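        The order dependence is easy to reproduce at toy scale. A sketch in Python, assuming one huge value followed by many ones (a stand-in for the large and small waves above; the sizes are mine). Added naively with the large value first, each 1.0 is absorbed and the ones contribute nothing, while adding the small values first gives the exact answer. On this input, Kahan's compensated summation (a standard fix, not necessarily the blocking scheme meant above) and the standard library's exactly rounded `math.fsum` both recover the correct total.

```python
import math

def naive_sum(xs):
    """Left-to-right accumulation in an explicit loop (the built-in sum()
    in Python 3.12+ already compensates for floats, which would hide the effect)."""
    total = 0.0
    for x in xs:
        total += x
    return total

def kahan_sum(xs):
    """Kahan compensated summation: carry each step's rounding error forward."""
    total, comp = 0.0, 0.0
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # what was lost when adding y
        total = t
    return total

# Toy stand-in for the "large wave then small wave" scenario.
large = [1e16]
small = [1.0] * 100_000
exact = 1e16 + 100_000  # exactly representable as a double

print(naive_sum(large + small) == exact)  # False: every 1.0 is absorbed
print(naive_sum(small + large) == exact)  # True: small values first
print(kahan_sum(large + small) == exact)  # True
print(math.fsum(large + small) == exact)  # True: exactly rounded sum
```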

    • Bob said,
      “I almost always agree with Martha on this site, I’m not sure understanding standard deviation is an important foundational concept. I know it’s traditional to evaluate estimators in terms of bias and variance and that p-values are all about tail stats, but I want to do Bayesian stats. There, I find it much easier to talk about posterior intervals as measures of uncertainty, as they directly capture probability.

      The motivation for standard deviation is subtle. Standard deviation only does a good job of capturing meaningful uncertainty in normal distributions—it’s not even defined for a Cauchy distribution and it breaks down terribly for constrained or skewed distributions. On the other hand, uncertainty intervals make sense in all of these cases. I remember back before I was even trying to learn stats (I used to work on logic-based linguistics and programming languages), I asked a colleague why stats uses standard deviation rather than absolute error. It’s hard to answer that without getting into the central limit theorem or proper scoring metrics. I get to that in my textbook, but they’re very rough topics for a basic pre-calc intro class.”

      My comment wasn’t an argument for using standard deviation — my point was intended to be, “If you introduce standard deviation, then …”

  13. Julian Simon (deceased) had an entire course founded on simulation alone. The notes used to be on his website. His point, over and over, was that while few were on a par with Bartlett and Yates and Fisher, who could derive fascinating results about the distributions of sampling statistics by manipulating the integrals and the characteristic functions and so forth, the typical student could gain a great deal of insight by “driving the computer”: pushing random samples through nonlinear devices and so on.
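    The “driving the computer” approach fits in a few lines of Python. A sketch, assuming exponential draws (a skewed population) pushed through a nonlinear statistic, the sample median of 15 values: simulate many samples, apply the statistic, and summarize the results directly instead of deriving their distribution. The `simulate_statistic` helper is hypothetical.

```python
import random
import statistics

def simulate_statistic(stat, draw, n, reps=20_000, seed=6):
    """Push `reps` random samples of size n through `stat`; return the results."""
    rng = random.Random(seed)
    return [stat([draw(rng) for _ in range(n)]) for _ in range(reps)]

def exp_draw(rng):
    """One draw from a skewed population: exponential with mean 1."""
    return rng.expovariate(1.0)

medians = simulate_statistic(statistics.median, exp_draw, n=15)
cuts = statistics.quantiles(medians, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print("mean of sample medians:", statistics.fmean(medians))
print("SD of sample medians:  ", statistics.stdev(medians))
print("middle 90%:", cuts[0], cuts[-1])
```

    This case happens to have a closed form to check against: the 8th smallest of 15 independent Exp(1) draws has mean 1/8 + 1/9 + ... + 1/15 ≈ 0.725, and the simulated mean should land close to that.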

  14. I wish my intro stats course had just been 1) looking at a ton of plots from real study data and simulation data, and 2) learning about a large variety of probability distributions and how they relate to 1).
