Syllabus for my course on Communicating Data and Statistics

Actually the course is called Statistical Communication and Graphics, but I was griping about how few students were taking the class, and someone suggested the title Communicating Data and Statistics as being a bit more appealing. So I’ll go with that for now.

I love love love this class and everything that’s come from it (including statistics diaries and ShinyStan).
Here’s the syllabus [updated]. It’s full of fun reading and great activities, in and outside of class. The only thing missing is the jitts, but I like to keep them as a surprise. So if you want to teach this class (and I think you should; indeed, I think this course should be taught everywhere and should be a standard part of the statistics and quantitative social science curriculum), you’ll just have to write your own jitts. Otherwise the course pretty much teaches itself. And remember, with your guest visitors, keep the conversations short and focused. Long rambling discussions are fun, and they’re easy on the instructor, but ultimately you want to spend lots of class time directly on feedback on student work.

Now for the next 90 seconds I’d like you to talk with your neighbor and come up with a question to ask me.

OK, start yapping!

41 thoughts on “Syllabus for my course on Communicating Data and Statistics”

  1. “But I was griping about how few students were taking the class.” I do not see any difference between the two names above for attracting students. You do not have enough sexy words in the class name and description. You need to include words such as Bayesian, data science, machine learning, BIG DATA, and such. And for the graphics part, maybe call it infovis. Something such as Bayesian Information Visualization for the Data Sciences.

  2. Cleveland is a superb book, and a landmark in statistical graphics. But it’s more about using graphics to discover than to communicate. I highly recommend “Show Me The Numbers, 2e” by Stephen Few. It’s all about how to structure tables and graphs to communicate something that you’ve already discovered. The subtitle for the book is “Designing Tables and Graphs to Enlighten”. It’s also very reasonably priced, at $27.50 for a hardcopy with lots of color printing. http://www.amazon.com/Show-Me-Numbers-Designing-Enlighten/dp/0970601972/ref=dp_ob_title_bk

  3. Have you thought about running this sort of thing as a summer school for business / industry? I would bet you’d get a lot of takers (run over a week, for example), and it is something that I think would be really useful and important for a lot of people in these areas.

  4. Andrew: I suppose this might be covered under some of the subheadings, but in terms of advertising a class on communicating statistics, I would have thought to include communicating concepts (significance levels, power, confidence levels, Bayesian concepts and distributions, correlation, replication, resampling, etc.). But maybe this is a different kind of class.

  5. Good stuff. Probably one of THE most important courses for people that use and analyze data, and one that is rarely if ever taught. Would love to see some visuals from the final projects. Our course at the UCB Information School is called Data Visualization and Communication, which has a nice ring to it: https://datascience.berkeley.edu/academics/curriculum/data-visualization/.

    “Statistical Communication” actually sounds more like “a statistical theory of communication” rather than “a way to communicate with statistics.” Definitely prefer the new name.

  6. Here’s a rather general question about effective communication of data & statistics: In all these books and courses, what they sell as the “effective” way or the recommended strategy seems guided by what the author personally thinks is the best way.

    But are there any books based on *empirical* measurements of what is actually effective in communicating statistics? I have always found it odd that, for a field that relies on data, the niche of statistical communication seems almost entirely based on ideological grounds rather than on any evidence of what actually works best.

    Is there any movement on putting effective statistical communication on an evidence based framework?

    • Rahul:

      There have been some attempts to measure statistical communication (and we discuss this a bit in our class) but it’s difficult. Regarding your other point, I would say we make many of our decisions and recommendations based on our experiences and introspection, not so much based on ideology (Rep. Chaffetz aside).

      • Andrew:

        I didn’t mean political ideology. Methodological or academic ideology.

        When a doctor or public policy maker makes decisions based on introspection alone, we try to nudge him towards RCTs & meta-analyses. Evidence-based medicine & all that. No harm in injecting a bit of that empiricism into statistical communication?

        My personal opinion is that a lot of what passes as accepted wisdom in stat. comm. will prove to be just plain prejudice when put to rigorous test.

        • Rahul:

          Introspection isn’t perfect but it is a source of data. I do not think it is the same as ideology, academic or otherwise. I think the analogy goes like this: introspection is a source of detailed but uncontrolled data. Ideology is a crude sort of model of the world. We need both data and models, and we should try to get the best possible data and the most reasonable models. I’m doing the best I can in both of these and I welcome the work of others. Communication is harder to evaluate than medicine because the outcomes are not so clear. I’ve heard that newspapers evaluate articles based on hit counts but that’s not really what we’re looking for here.

        • Andrew:

          I disagree that communication is fundamentally harder to evaluate than medicine. People just don’t try as much. Maybe there isn’t enough money in it?

          The term “communication” is too broad to measure. I’m sure smaller parts of it can be usefully defined and measured.

          When researchers routinely try to measure things as vague as “happiness” and “satisfaction”, surely communication can hardly be an exception?

        • Rahul:

          Happiness and satisfaction are hard to measure too, and the work in that area is controversial, as it should be. I think statistical communication is indeed fundamentally harder to evaluate than life or death, T-cell counts, heart rate, time on the treadmill, bone density, etc.

        • Andrew: I think the deeper problem with evaluating successful communication in statistics turns on the fact that controversy surrounds some or many of the concepts and methods. The field wouldn’t be holding that ASA powwow of statisticians (just as one example) to give guidance about the nature and value of statistical significance tests and related methods if this were not the case. Even if some officially sanctioned definitions or recommendations emerge, I can well imagine different statisticians scoring students differently if we were to envision a test of the effectiveness of communication. I don’t think this should be so, nor do I think it must be so, but it seems likely in the current climate.

        • And I can’t be the only person who has found that thorough, rigorous introspection has led me to question, modify or even reject a model I had been quite attached to.

  7. The only area where I know of attempts to measure something like the effectiveness of displays is education, where effectiveness may be measured by some type of test performance. The reason effectiveness is rarely (if ever) measured elsewhere is that effectiveness means different things to different people. I suppose a case can be made that Rep. Chaffetz may have very effective displays; it is just not what I think “should” be effective. Unfortunately, the most effective displays are often the most misleading ones, unless you have some more objective measure, such as the ability to discern what the data is really saying. The money is often better in deception than in telling the truth.

    • That’s a great point.

      If I had to design an experiment, it would probably be a SAT / GRE style test where we show a “good” or a “bad” version of a graph to a randomized cohort and then try to measure their performance on a subsequent set of graph-related questions. You could add a time constraint on the test, or a lag period between showing them the graphs and the testing.

      • Rahul:

        This sort of experiment has been done, but the challenge is in coming up with (a) reasonable comparison graphs and (b) reasonable test questions. It’s not that it can’t be done, but it’s not so easy to do well. See here, for example, for a discussion from five years ago of one such study that won an award, as I recall, but happened to be pretty much useless, in my judgment.

        • Maybe a survey experiment on MTurk (if it hasn’t been done already) would be a good way to tackle this question. The same data presented in different ways can be treatments and you can create questions which tap into “effectiveness” such as comprehensibility as measured by the amount of time it takes respondents to answer a question about the data presented, aesthetics etc. Collect a couple thousand responses and you’ll at least have a better sense about what does and doesn’t work.

        • Lefteris:

          Yes, there are lots of possibilities. I think these evaluations are difficult and I haven’t been impressed with some of what I’ve seen, but I definitely think it’s worth working on.

        • Indeed, people are doing work like this already. See for example Heike Hofmann and Dianne Cook’s studies of plot designs using Mechanical Turk: http://www.cs.tufts.edu/comp/250VIS/papers/Hofmann-graphicaltest-infovis2012.pdf

          There’s a long tradition of using experiments to evaluate graphs empirically. It dates back at least to this classic from 1984 by Cleveland and McGill: https://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf

          There are also ethnographic studies: how are statistics and graphics used in practice, “in the wild”? Here’s a talk by Amy Griffin, reporting such work in progress: http://www.ncrn.info/event/ncrn-virtual-seminar-feb-4-2015

        • Very interesting.

          It’d be interesting to take a book like Tufte’s & systematically evaluate the key recommendations, one by one.

          Perhaps too ambitious. But I think there’s hardly any money in this area. Lots of low-hanging fruit.

        • (Not that those researchers set out to evaluate Tufte as such. But his advice is common enough, and not all original to him, that other people have studied the same ideas empirically.)

          I’d be curious to see such a point-by-point summary of the research on Tufte’s principles. As far as I can tell, he argues from authority or common sense, not from experimental research.

        • Jerzy:

          To loop back to the subject of this post, Cleveland’s “Elements of Graphing Data” is one of the required books for this class.

          Kosslyn’s book I was less thrilled with, as I thought some of the actual graphs in his book demonstrate bad practice. I pretty much agree with Kosslyn’s perspective but I couldn’t bring myself to assign a book on graphics that had this sort of problem.

          Not all of Cleveland’s graphs are beautiful but they’re all pretty clean, which I like.

        • @Jerzy

          Thanks for the tips about the books. Yes, I sure would like to read some research-backed recommendations.

          Somehow, most exhortations I come across about graphs have very little actual basis in empiricism.

        • Andrew: glad to hear they’ll be reading Cleveland. Looks like a great syllabus.
          Just curious — did you ever get a chance to read Kosslyn yourself, and not just my review? Apart from that one section you hated (and his view of error bars), there’s plenty of good stuff too. But I agree Cleveland’s better for your students.

          Rahul: agreed!

        • Andrew:

          Which raises the question: Isn’t measurement a necessary step before coming up with a solution to a problem?

          i.e. If we cannot even measure the communication problem in any meaningful way, how well can we do at fixing it?

  8. Hi Andrew,

    It looks really nice. But I am wondering what class size you had before and what class size you are aiming for. The approach seems feasible only for relatively small classes, or am I wrong?

    Louis

  9. Alternative titles:

    Visually Communicating Data
    The Graphic Design of Scientific Analysis
    Visual Research Communication
    Visual Design in Data Science
    Job Skills for High-paying Consulting Firms
