In a comment to my previous post on the Street-Fighting Math course, Alex wrote:
Have you thought about incorporating this material into more conventional classes? I can see this being very good material for a “principles” section of a linear modeling or other applied statistics course. It could give students a sense for how to justify their model choices by insight into a problem rather than, say, an algorithmic search over possible specifications.
Good point. I’d still like to do the full course—for one thing, that would involve going through Sanjoy Mahajan’s two books, which would have a lot of value in itself—but if we want to be able to incorporate some of these concepts in existing probability and statistics courses, it would make sense to construct a couple of one-week modules.
I’m thinking one on probability and one on statistics.
We can discuss content in a moment but first let me consider structure. I’m thinking that a module would consist of an article (equivalent to a textbook chapter, something for students to read ahead of time that would give them some background, include general principles and worked examples, and point them forward), homework assignments, and a collection of in-class activities.
It’s funny—I’ve been thinking a lot about how to create a full intro stat class with all these components, but I’ve been hung up on the all-important question of what methods to teach. Maybe it would make sense for me to get started by putting together stand-alone one-week modules.
OK, now on to the content. I think that “street-fighting math” would fit into just about any topic in probability and statistics. Some of the material in my book with Deb Nolan
Probability: Law of large numbers, central limit theorem, random walk, birthday problem (some or all of these are included in Mahajan’s books, which is fine, I’m happy to repurpose his material), lots more, I think.
Statistics: Log and log-log, approximation of unknown quantities using what Mahajan calls the “divide and conquer” method, propagation of uncertainty, sampling, regression to the mean, predictive modeling, evaluating predictive error (for example see section 1.2 of this paper), the replication crisis, and, again, lots more.
Social science: I guess we’d want a separate social science module too. Lots of ideas including coalitions, voting, opinion, negotiation, networks, really a zillion possible topics here. I’d start with things I’ve directly worked on but would be happy to include examples from others. But we can’t call it Street-Fighting Political Science. That would give the wrong impression!
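As a quick illustration of the probability side, here is a minimal Python sketch of the birthday problem done two ways. This is an illustration of the standard approximation, not code from Mahajan's books. The first function computes the exact product. The second uses the street-fighting shortcut 1 − x ≈ e^(−x), which turns the product of (1 − k/365) terms into exp(−n(n−1)/(2·365)). It assumes 365 equally likely birthdays and ignores leap days, and the function names are made up for this example.

```python
import math

def p_shared_birthday_exact(n, days=365):
    """Exact probability that at least two of n people share a birthday
    (assumes `days` equally likely birthdays, no leap days)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days  # k-th person avoids the k birthdays already taken
    return 1.0 - p_all_distinct

def p_shared_birthday_approx(n, days=365):
    """Street-fighting version: (1 - k/days) ~ exp(-k/days), and the exponents
    sum to n(n-1)/(2*days), so P(no match) ~ exp(-n(n-1) / (2*days))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * days))

for n in (10, 23, 50):
    print(n, round(p_shared_birthday_exact(n), 3), round(p_shared_birthday_approx(n), 3))
```

With 23 people both versions land near 50%. For the group sizes printed here, the approximation agrees with the exact answer to within a percentage point. That kind of check is what makes the shortcut trustworthy.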
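On the statistics side, divide and conquer and propagation of uncertainty fit together naturally. The sketch below shows one simple way to combine them, under assumptions chosen for this example rather than Mahajan's exact recipe. Each factor of a product gets a plausible range (low, high), and the point estimate is the geometric mean. The log-scale half-widths of independent factors add in quadrature. The ranges in the usage lines are hypothetical placeholders, not real data, and the function names are made up.

```python
import math

def factor_estimate(low, high):
    """Point estimate and log-scale half-width for a factor believed to lie in [low, high]."""
    point = math.sqrt(low * high)              # geometric mean = midpoint on a log scale
    log_half_width = math.log(high / low) / 2  # half the range, measured in log units
    return point, log_half_width

def divide_and_conquer(ranges):
    """Combine the factors of a product.

    Point estimates multiply; log-scale half-widths add in quadrature,
    which assumes the factors' errors are independent.
    Returns (point, low, high) for the product.
    """
    point = 1.0
    sum_sq = 0.0
    for low, high in ranges:
        p, w = factor_estimate(low, high)
        point *= p
        sum_sq += w ** 2
    w_total = math.sqrt(sum_sq)
    return point, point * math.exp(-w_total), point * math.exp(w_total)

# Hypothetical factor ranges for some quantity = A * B * C (placeholders, not real data).
hypothetical_ranges = [(2, 8), (100, 1000), (0.1, 0.5)]
estimate, low, high = divide_and_conquer(hypothetical_ranges)
print(f"estimate ~ {estimate:.3g}, plausible range ~ {low:.3g} to {high:.3g}")
```

Adding in quadrature gives a narrower interval than multiplying all the lows together and all the highs together, because independent errors tend to partially cancel. If the factors' errors are likely to be correlated, the worst-case product is the safer bound.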
> I think that “street-fighting math” would fit into just about any topic in probability and statistics.
I think this is exactly right. I’d go so far as to say that statistics (at least applied statistics) is the epitome of street-fighting math. Applied statistics is not about creating a rigorous chain of deductive proof but about applying problem-solving strategies to define good approximations for the parts of a system that we’ve decided are important. Theory here serves as a guide toward solutions that are likely to work (or, more specifically, away from solutions that we know will not work), but no more. The problem with a lot of statistical education is that we spend a lot of time being as precise as possible about this theory, and much less time on the sort of problem-attacking strategies that Mahajan presents in his books (full disclosure: I haven’t read them, but I’ve heard a lot about them). Students can leave statistics classes thinking that if they follow the right blueprint, they’ll do the “right” thing and arrive at truth from data without having to exercise judgment. Andrew, I think this underlies a lot of your frustrations with yes/no thinking, forking paths, etc., in research that applies statistics. One of the beauties of these “down and dirty” approaches is that they give you a language for expressing your judgments about a problem in plain English, so you can more naturally reason about whether your strategy for organizing complexity makes sense in the applied context.
I actually found the BDA course to be liberating in this respect — being able to specify your own arbitrary model put the role of the investigator’s (street fighter’s?) judgment front and center. Still, you could probably make this more explicit using the street fighting material. I’d love to have a week-long module where we look at a particular system (say basketball tracking data, traffic data, maybe even video — something very complex) and work through specifying models for answering interesting questions about those systems. I think a bunch of the principles in these books would be useful there, even if the examples are markedly different.
In a linear regression course, framing all of linear regression in terms of street fighting would be really fun. You can talk about the residual as the place where you stick all of the complexity that you don’t care about, think about model improvement in terms of successive approximation, and reason about functional forms and transformations using dimensional analysis. This gives students the right intuition, namely that linear modeling is a quick-and-dirty approximation to a response surface rather than a way to “prove” a pet theory (i.e., something like 90% of all sports analytics), and it gives instructors a way to work general advice about applying linear models into their courses. I think a lot of the time we end up talking about algorithms for automatic specification because model specification usually depends so strongly on the use case; Mahajan’s approximate logical forms provide a nice alternative or complement here.
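To make the dimensional-analysis point concrete, here is a hypothetical example in Python using simulated data, not data from any real study. For a simple pendulum, dimensional analysis says the period T can depend on length L and gravitational acceleration g only through sqrt(L/g), up to a dimensionless function of the swing amplitude. A regression of log T on log L should therefore have slope 1/2. The multiplicative noise term plays the role of the residual that soaks up everything we chose not to model. The noise level, the range of lengths, and the helper `fit_line` are all assumptions made for this sketch.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: small-swing pendulum periods T = 2*pi*sqrt(L/g),
# with multiplicative measurement noise (log-scale sd of 0.02, an assumption).
rng = random.Random(0)
g = 9.81
lengths = [0.1 * 1.1 ** i for i in range(50)]  # roughly 0.1 m up to about 11 m
periods = [2 * math.pi * math.sqrt(L / g) * math.exp(rng.gauss(0, 0.02)) for L in lengths]

# Regress log T on log L: the slope estimates the exponent dimensional analysis predicts (1/2).
log_L = [math.log(L) for L in lengths]
log_T = [math.log(T) for T in periods]
intercept, slope = fit_line(log_L, log_T)

# The residual is where everything we did not model ends up (here, just the simulated noise).
residuals = [y - (intercept + slope * x) for x, y in zip(log_L, log_T)]
resid_sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
print(f"fitted exponent: {slope:.3f}, residual sd (log scale): {resid_sd:.3f}")
```

A fitted slope near 0.5 is a check on the functional form. A slope far from 0.5 would hint that something the dimensional argument left out actually matters.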
How about taking a practical problem that the students will appreciate, applying the naive treatment, then teaching a little street-fighting, and asking the students to redo the analysis?
Statistics/probability was one subject that, for me, was rich in counterintuitive ideas and paradoxes. Could they be dissected in a course? Would that be useful or distracting?
E.g., Simpson’s paradox, nontransitive dice, the two envelopes problem, the Monty Hall problem, the ecological fallacy, the Bertrand paradox.
Almost sounds like there’s half a semester’s worth of work in just understanding and discussing all the counterintuitive results out there!
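Some of these are easy to dissect with a few lines of code. The dice item on the list is usually called nontransitive dice. Here is a Python sketch using Efron's dice, a well-known nontransitive set; the dictionary and function names are made up for this example. For each matchup it enumerates all 36 face pairs and computes the exact win probability as a fraction.

```python
from fractions import Fraction
from itertools import product

# Efron's dice: a standard example of a nontransitive set of four six-sided dice.
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(x, y):
    """Exact probability that a roll of die x shows a higher number than a roll of die y."""
    wins = sum(1 for a, b in product(DICE[x], DICE[y]) if a > b)
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))

# Each die beats the next one in the cycle A -> B -> C -> D -> A.
for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"P({x} beats {y}) = {p_beats(x, y)}")
```

Each of the four lines prints 2/3. A beats B, B beats C, C beats D, and yet D beats A, so "beats more often than not" is not a transitive relation.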
It feels like you really want to dive into the details, but in case some of these concepts could make it into a survey class, I wanted to point out a Coursera course by Scott E. Page called Model Thinking (https://www.coursera.org/learn/model-thinking). I loved this course. It might give you some ideas, maybe even more so for the social science side of things.
I think you should include statistical analysis of music in your class(es). I know that as I’ve gotten older, I’ve developed a sort of classification system in my brain that lets me pigeonhole pretty much anything into a category; obviously I’m doing some sort of off-the-cuff statistical classification, but I’m sure it’s been formalized somewhere. The main advantage is that you would attract many more students, many of whom would never have been interested in anything statistical (and I’m not talking about sampling, Fourier analysis, or any of the other stuff they’d teach a recording engineer; while that is of interest, I’m sure it is already taught in some class).