The statistician over your shoulder

Xiao-Li wrote an article on his experiences putting together a statistics course for non-statistics students at Harvard. He asked for any comments, so I’m giving some right here:

I think the ideas in the article are excellent.

The challenges of getting students actively involved in statistics learning have motivated me to write a book on teaching statistics, to develop a course on training graduate students to teach statistics, and even to offer general advice on the topic.

But I have not put it all together into a successful introductory course the way Xiao-Li has, so I read his article with interest, seeking tips on how we can do better in our undergraduate teaching.

The only thing I really disagree with is Xiao-Li’s description of statisticians as “traffic cops on the information highway.” Sure, it sounds good, but often I find my most important role as a statistician is to tell people it’s ok to look at their data, it’s ok to fit their models and graph their inferences. There’s always time to go back and check for statistical significance, but I’ve found the biggest mistakes are when scientists, fearing the statistician over their shoulder, discard much of their information and don’t spend enough time looking at what they have left.

I’m certainly not arguing that simple methods are all we need. (See here for my recent advertisement for fancy modeling.) What I’m saying is that I’m happier being an enabler than a police officer. I think I’ve done more good by saying yes than by saying no.

On the other hand, in Xiao-Li’s defense, he’s prevented three false discoveries (see bottom of page 206 of his article), whereas I’ve proved one false theorem. So perhaps we just put different values on our Type 1 and Type 2 errors!

To return to XL’s article: on pages 207-208 he tells a story of a scientist who was stopped just in time before making a big mistake, because he discussed the questionable analysis with Policeman Meng, who noticed the problem. I assume we can all agree that the crucial step in this process was that the scientist (a) worried that something might be wrong and (b) went to a statistician for help. I’d like to believe that many of the readers of this article would’ve been able to find the problem, but this sort of eagle-eyed criticism is different from what I think of as the most common bit of policing, which is statisticians giving scientists a hard time about technicalities.

Or, to put it another way, I don’t mind the statistician as critic, but I don’t think we should have the police officer’s traditional power to arrest and detain people at will. Except maybe in some extraordinary cases.

To return to undergraduate education: I’ve taught undergraduate statistics several times at Berkeley and at Columbia. Berkeley had an exciting undergraduate program with about 15 juniors and seniors taking a bunch of topics classes. I have fond memories of my survey sampling and decision analysis classes and also of the department’s annual graduation ceremony, which included B.A.’s, M.A.’s, and Ph.D.’s in one big celebration. I’ve heard that the program has since grown to about 50 students. At Columbia, in contrast, we have something in the neighborhood of 0 statistics majors. It’s a feedback loop: few courses, few students, few courses, etc. I think this was the case at Harvard for many many years, although maybe it’s changed recently.

My point? The intro courses at Berkeley for non-majors were very well organized, much more so than at Columbia, at least until recently. Perhaps no coincidence. I suspect it’s easier to confidently teach statistics to non-majors if you have a good relationship with the select group of undergraduates who are interested enough in statistics to major in it. And, conversely, an excellent suite of introductory statistics classes is a great way to interest students in further study.

Teacher training is also important, as Xiao-Li indicates in the last sentence of his article. At Berkeley there was no formal course in statistics teaching, but most of the Ph.D. students went through the “boot camp” of serving as T.A.’s in large courses under the supervision of experienced lecturers such as Roger Purves; between this direct experience and word-of-mouth guidance from other students in the doctoral program, they quickly learned which way was up. At Columbia we have recently revived our course, The Teaching of Statistics at the University Level, and I hope that this course–and similar efforts at Harvard and other universities–will help move us in the right direction.

In addition, wider awareness of statistical issues outside of academia (for example, at our sister blog) will, I hope, make college students demand statistical thinking in all their classes, whether taught by statisticians or not. It wouldn’t be a bad thing for a student in a purely qualitative history class to consider the role of selection bias in the gathering of historical data (see Part 2 of A Quantitative Tour for more on this sort of thing), just as it isn’t a bad thing for a student in a statistics class to think about the social implications of some of the methods we use.

5 thoughts on “The statistician over your shoulder”

  1. This reminded me of the title of a note by Bob Abelson: Abelson, R. P. 1973. "The statistician as viper: Reply to McCauley and Stitt." J. Exp. Soc. Psychol. 9:526-27. It was his counter-response to this earlier note: Abelson, R. P. 1973. "Comment on 'Group shift to caution at the race track.'" J. Exp. Soc. Psychol. 9:517-21.

    My recollection from very, very long ago is that he felt he had to make a statistical objection but, as described in this entry, he didn't want to be inhibiting people. Mostly I made this comment because the phrase "statistician as viper" was relevant.

  2. I suspect that the choice between "cop" and "helper" is a reflection of personality as much as anything. I've found myself in both roles, but the "cop" role is the one that leads to the most memorable frustrations.

    In my prior life doing policy analysis at GAO, much of my time was spent debunking half-baked statistical analyses that "proved" some policy point. Often it was of the self-serving variety, but much of it was debunking stuff in journals that, I suspect, was driven not by pleasing the funding source but rather by the incentives to publish first and ask questions later that Xiao-Li so eloquently describes.

    I suspect that most social scientists don't know (and wouldn't care if they did know) that the time of non-partisan analysts is often spent saying, "I know the results published in prestigious journal x by a professor at ivy-league university y would argue for this policy, but
    a) the methodology is badly flawed,
    b) the data are highly selective,
    c) important variables are omitted,
    d) you can't get there from here,
    etc."

    It can be hard to explain to policy makers that you are pretty sure that the article was published because someone's student needed one more article to get tenure and someone owed someone a favor, not because anyone really buys this result. But that went with our territory.

  3. The University of Texas (Austin) has one of the oldest honors programs in the country (established in the 1930s), called "Plan II". For many years it had a required course on logic, but in more recent years it has allowed other courses on "Modes of Thinking" to substitute for this freshman-level course. I developed a course very similar to the "Happy Course" that Xiao-Li described in his paper and in the 2009 article he wrote for the Harvard Undergraduate Research Journal. I taught it five times at UT until I retired, and will teach it at the University of Vermont for the third time this fall. (At UVM I teach it to sophomores, because the structure of their very young honors program is different from UT's.)

    My course is called "Probability and Inference, Risk and Decision," and it is an introduction to Bayesian decision theory, applied strictly to finite state spaces. I also spend a significant amount of time on behavioral issues such as those studied by Kahneman and Tversky in their pioneering research on behavioral economics.

    By sticking to finite state spaces we can explore the basic ideas involved without getting sidetracked into mathematical issues that might be beyond some of the students. Since they are honors students, most of them have probably had AP calculus, but the closest I get to calculus is working out a spreadsheet-like calculation (on the board) from prior through likelihood to posterior, with the continuous distribution calculated on a discrete grid (the students can quickly calculate the numbers we need in class using their calculators). Our major tools are these spreadsheet-like calculations on the blackboard, probability trees, decision trees, and Gigerenzer-style "natural frequencies".

    The course is taught seminar-style (limited to 20-22 students). I try to ask provocative questions and depend on the students to direct me in what gets written on the board. The students divide up into teams of 3-4 and do their assignments as a team. At the end of the course each team researches a problem of its own choosing, writes a report, and gives an oral presentation to the class. I also require a 3-4 page essay each week on a topic that relates to the class in some substantive way.
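
    For concreteness, here is a minimal sketch in Python of that grid calculation (my own illustration with hypothetical numbers, not course material): evaluate a prior on a discrete grid, multiply by the likelihood, and renormalize to get the posterior.

    ```python
    import numpy as np

    # Hypothetical example: estimating a coin's success probability
    # from 7 successes in 10 trials, on a discrete grid.
    theta = np.linspace(0.01, 0.99, 99)   # grid over the parameter

    prior = np.ones_like(theta)           # flat prior, assumed for illustration
    prior /= prior.sum()

    successes, trials = 7, 10             # hypothetical data
    likelihood = theta**successes * (1 - theta)**(trials - successes)

    posterior = prior * likelihood        # prior times likelihood on the grid
    posterior /= posterior.sum()          # renormalize so the grid sums to 1

    print(theta[np.argmax(posterior)])    # posterior mode, about 0.7
    ```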

    We apply the methods to interesting problems in medicine (e.g., testing for medical conditions, deciding between treatments), business, the law (probability of guilt or innocence, what considerations a juror informed by decision theory might take into account in deciding on a verdict), insurance and why it can be sold to willing buyers by willing companies, investments, the lottery and other retirement plans, etc.
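
    For the medical-testing examples we use Gigerenzer-style natural frequencies; a minimal sketch (with hypothetical screening numbers of my own, not taken from the course) shows how counting cases out of 1000 people makes the answer transparent.

    ```python
    # Hypothetical screening test: 1% prevalence, 90% sensitivity,
    # 9% false-positive rate, expressed as counts out of 1000 people.
    population = 1000
    sick = 10                                            # 1% prevalence
    true_positives = round(0.90 * sick)                  # 9 sick people test positive
    false_positives = round(0.09 * (population - sick))  # 89 healthy people test positive

    # Of everyone who tests positive, what fraction is actually sick?
    ppv = true_positives / (true_positives + false_positives)
    print(f"P(sick | positive) = {ppv:.2f}")             # about 0.09
    ```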

    The students run the gamut of majors. Usually about 10% are premed or otherwise interested in medical careers such as physician's assistant or RN; others have been pre-law, scientists or engineers, interested in journalism, and there has even been one dance major. A number of the premed students have gone on to medical school, and I've gotten reports that the experience put them in an excellent position as regards medical uses of probability (which, as Gerd Gigerenzer has pointed out, many physicians have significant difficulties with).

    I've even had two statisticians come out of the course; one went on to Cambridge (UK) and Duke and just finished his degree with Jim Berger. The other is at Harvard.

    This course has been fun to teach and I believe it fills an important niche in these two honors programs.

  4. I used to (mis)quote Andrew Carnegie with "I don't pay you lawyers to tell me what I _can't_ do but rather how I _can_ do what I want to" as a way to downplay any anticipation that I would act like a "cop" in collaborations.
    (I believe there is usually something of value that can be learned in most applications.)

    Keith

  5. On further thought, policing assumes there are “musts” to be enforced, but in applying statistics there are only ever “shoulds”. (On the other hand, in mathematics there are “musts”: things everyone would agree you must do to get only correct answers, and perhaps mathematicians can take on the policing role.)

    But I do think this is quite serious. Apart from things like “must not make up data”, which should be dealt with by university ethics boards, administrators, and real police, I don’t think we are ever in a position to say more than “should” (again excluding mathematics, as distinguished above). I liked an earlier post by Andrew about not telling people to avoid lotteries but only advising them of the risks and consequences. The policing role is, I believe, self-aggrandizing and overly self-assured, and, as I am sure Meng knows, his advice will sometimes do more harm than good, so why frame it in any way as “enforcement” of accepted laws (things people must do)?

    So I thought I would give some examples from my experience where a “policing stance” did more harm than good (of course only recalling cases where others were wrong).

    Example 1 – a traumatized young lung-transplant researcher came to my office. He had approached the director of the consulting service at our university’s statistics department for some advice about an experiment he had conducted on two groups of 6 dogs each, and was abruptly asked to leave and not return until he had at least 20 to 30 dogs per group. He wanted to know why he had been treated so badly (almost like a criminal). When I had a look at his data, the group means were 20 SDs apart (and the 6 per group did allow a nice non-parametric assessment; see the sketch below).
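
    To see why 6 per group was plenty here, consider a minimal sketch (hypothetical numbers of my own, not the researcher’s data): when two groups of 6 are completely separated, even an exact nonparametric test is decisive.

    ```python
    from scipy.stats import mannwhitneyu

    # Hypothetical measurements: the two groups do not overlap at all,
    # mimicking group means many SDs apart.
    control = [10.1, 10.4, 9.8, 10.2, 9.9, 10.3]
    treated = [30.2, 29.8, 30.5, 30.1, 29.9, 30.4]

    # Exact two-sided Wilcoxon/Mann-Whitney test: under complete separation
    # the p-value is 2 / C(12, 6) = 2/924, about 0.002.
    result = mannwhitneyu(control, treated, alternative="two-sided", method="exact")
    print(result.pvalue)
    ```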

    Example 2 – a clinical paper with important findings whose publication was delayed for over a year while statisticians argued about which of two commonly used ways of analyzing the results (ways that gave almost identical answers) was correct – under the view that only one could possibly be right.

    Example 3 – at a grant review, one of the statisticians decided to send a proscription that, rather than analyzing the outcome as time to first injury, all injuries be considered using a Poisson regression, since analyzing just the first injury would be “silly”. Unfortunately, I only recalled later that the grant gave detailed arguments that re-injury was very different and not the focus of the study.

    Example 4 – a presentation at the ASA by a well-credentialed statistician arguing that a cumulative meta-analysis would have stopped at the third study, so the fourth study undertaken was unnecessary and even unethical. They withdrew that argument upon learning that, of these 4 observational studies, the first three all had the same serious bias while the fourth study avoided that bias. (This issue of the value of another observational study may not be widely appreciated; there is some introductory material about it in Greenland and O’Rourke in Modern Epidemiology, 3rd ed., edited by Rothman KJ, Greenland S, and Lash T. Lippincott Williams and Wilkins; 2008.) A sketch of the cumulative calculation follows.
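
    For readers unfamiliar with the technique, here is a minimal sketch of a cumulative fixed-effect (inverse-variance) meta-analysis, with hypothetical effect estimates and standard errors of my own; the running pooled estimate is recomputed as each study is added, which is exactly what can mislead when the early studies share a bias.

    ```python
    import numpy as np

    # Hypothetical log relative risks and standard errors for 4 studies;
    # suppose the first three share a bias that the fourth avoids.
    estimates = np.array([0.8, 0.7, 0.9, 0.1])
    ses = np.array([0.30, 0.25, 0.35, 0.20])

    weights = 1 / ses**2  # inverse-variance weights

    for k in range(1, len(estimates) + 1):
        pooled = np.sum(weights[:k] * estimates[:k]) / np.sum(weights[:k])
        pooled_se = np.sqrt(1 / np.sum(weights[:k]))
        print(f"after study {k}: estimate {pooled:.2f} +/- {1.96 * pooled_se:.2f}")
    ```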

    Example 5 – a published paper suggesting the use in clinical research of robust pooled estimates that downweight studies with unusual estimates. But in clinical research there are usually only a few good studies and many not-so-good ones, and if doing the studies well was important and the well-done studies gave unusual estimates, you don’t want to downweight those few good studies!

    Perhaps most of us would not make these mistakes, but there are other mistakes we would make, and I believe we should avoid giving the impression of knowing what must be done in research.

    I did enjoy reading Meng’s paper and am impressed with his undertakings. The issue of faculty sitting in on (even undergraduate) courses could perhaps be further investigated – it is something I have often done (when not too inconvenient) and have suggested to other statisticians – though it’s not quite high culture in academe.

    Hope the post was not too long.

    Keith
