April Galyardt writes:

I’m teaching my first graduate class this semester. It’s intro stats for graduate students in the college of education. Most of the students are first-year PhD students, though there are a number of master’s students who are primarily in-service teachers. The difficulties with teaching an undergraduate intro stats course are still present, in that mathematical preparation and phobia vary widely across the class.

I’ve been enjoying the class and the students, but I’d like your take on an issue I’ve been thinking about. How do I balance teaching the standard methods, like hypothesis testing, that these future researchers have to know because they are so standard, with discussing the problems with those methods (e.g., the p-value as a measure of sample size, the decline effect, not to mention multiple testing and other common mistakes)? It feels a bit like saying “OK, here’s what everybody does, but really it’s broken,” and then there’s not enough time to talk about other ideas.

My reply: One approach is to teach the classical methods in settings where they are appropriate. I think some methods are just about never appropriate (for example, so-called exact tests), but in chapters 2-5 of my book with Jennifer, we give lots of applied examples of basic statistical methods. One way to discuss the problems of a method is to show an example where the method makes sense and an example where it doesn’t.

@Galyardt

Don’t teach methods you don’t like. You are a researcher, you are at the frontier, break new ground (and bear the consequences).

@Gelman

I would not throw the baby out with the bathwater. The Lady Tasting Tea example is perhaps the oddest example out there. But note that Fisher was not interested in estimating anything; he just wanted to accept or reject a very specific proposition. Judge it in those terms.

More generally, I think simple randomization tests are a great way to start Stats 101. Why? Because they are 100% self-contained. No need for complex math (OK, some combinations), no need to appeal to laws of large numbers or central limit theorems, no need to know the Normal distribution, no need to consult those magic numbers at the back of the stats book, no conjugacy, no parameters, and no hyperparameters.

It’s just you, your experimental design, your decision criteria, and the possible outcomes. It’s absolutely beautiful in its simplicity, but it is also limiting. Then get into the problems of estimation, confidence intervals, decision making, etc. by introducing other approaches.
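To make the “self-contained” point concrete, here is a minimal sketch of the randomization test for the Lady Tasting Tea, assuming the usual textbook design of 8 cups with 4 poured milk-first (the design details and all function names below are illustrative assumptions, not taken from this thread). Under the null hypothesis of pure guessing, every choice of 4 cups out of 8 is equally likely, so the whole reference distribution can be enumerated:

```python
from itertools import combinations
from math import comb

CUPS = 8        # assumed total number of cups (textbook version of the design)
MILK_FIRST = 4  # assumed number of cups with milk poured first

# Label the truly milk-first cups 0..3; which labels we pick is arbitrary.
truth = set(range(MILK_FIRST))


def null_distribution():
    """For each possible number of correct picks, count how many of the
    equally likely guesses (choices of 4 cups out of 8) achieve it."""
    counts = {k: 0 for k in range(MILK_FIRST + 1)}
    for guess in combinations(range(CUPS), MILK_FIRST):
        counts[len(truth & set(guess))] += 1
    return counts


def p_value(correct):
    """Probability of at least `correct` right picks under pure guessing."""
    counts = null_distribution()
    total = comb(CUPS, MILK_FIRST)  # 70 possible guesses
    return sum(n for k, n in counts.items() if k >= correct) / total


print(null_distribution())
print(p_value(4))  # all four cups identified correctly
print(p_value(3))  # three or more identified correctly
```

With this design there are 70 possible guesses, so a perfect score has probability 1/70 (about 0.014) under guessing, while three or more correct has probability 17/70 (about 0.24). Nothing beyond counting is needed, which is the appeal, and also the limitation: the calculation says nothing about how good the lady actually is.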

@Fernando

Not teaching methods you don’t like may not serve the students well; they need to be able to understand what other researchers in their field are doing, why they’re doing those things, and whether (or when) those methods are sensible for the applications at hand.

Agree there is a fine balance.

But if all teaching were adaptive, we would still be teaching the noxious humors theory of disease.

At some point a break is needed. This is a view in philosophy of science.

And yes, there are risks. But conformity is also a risk. Students may be saddled with an albatross.

PS. Btw, by “don’t like” I meant it in the sense Galyardt intended, i.e., methods you believe are wrong, not stuff you don’t happen to like in the way I don’t like okra.

Dr. Gelman: The most common application of the Fisher exact test I’ve seen in genomics is the following. A subset of the genes was highlighted by some (usually complicated) experiment. You have an a priori collection of preexisting (potentially overlapping) categories of genes. You want to know if there is an association between a gene being in the highlighted subset and being in any of the categories. The categories are reasonably viewed as fixed. Your article objects to treating the size of the subset of highlighted genes as being fixed, but I don’t see why this is a major problem. The article also suggests modeling the original experiment and (I suppose) the varying number of genes that might have been highlighted. Because genomics experiments tend to be very complicated, this sounds like a huge waste of time.
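For readers who have not seen this use of the test, the calculation described above can be sketched as a hypergeometric tail probability. This is only an illustration under the fixed-margins assumption that the thread is debating; the function name and all counts are hypothetical:

```python
from math import comb


def enrichment_p_value(n_genes, n_category, n_highlighted, n_overlap):
    """One-sided Fisher exact p-value for one category: the probability of
    at least `n_overlap` category genes among the highlighted genes, if the
    highlighted set were a random draw of fixed size from all genes."""
    total = comb(n_genes, n_highlighted)
    upper = min(n_category, n_highlighted)
    tail = sum(
        comb(n_category, k) * comb(n_genes - n_category, n_highlighted - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a category of 100 genes,
# 200 highlighted genes, 5 of which fall in the category.
print(enrichment_p_value(20000, 100, 200, 5))
```

Both margins (category size and number of highlighted genes) are treated as fixed here. In practice the calculation is repeated across many overlapping categories, which is where the multiple-comparisons issue comes in.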

Morgan:

The test may be fine in practice, although once you abandon the assumptions of the model (fixed margins), it’s no longer exact. Not that an exact test is such a good thing in the context of discrete data; see the paper by Agresti and Coull.

Regarding “huge waste of time”: there are straightforward Bayesian analyses of independence models that are just as fast to compute as the so-called exact test. If the exact test is quick to compute, it doesn’t really bother me so much. The problem is that there are some settings in which the so-called exact test is actually very difficult to compute but people still do it.

> Because genomics experiments tend to be very complicated, this sounds like a huge waste of time.

This doesn’t make any sense. You have a complicated experimental design and you want to interpret it using a model that assumes the simplest possible experimental design? This is a recipe for picking up confounded correlations.

Genomics as a field seems to be particularly bad about abusing the combination of weak effects + exact tests + multiple comparison adjustments. It’s really about time to move towards better estimation approaches. Although I’m guessing Andrew won’t be a fan of the Bayes factor approach (throwing out point nulls altogether is mentioned on page 687, though), Matt Stephens highlights some of the issues with FET/p-value usage in genomics in this Nature review article:

http://stephenslab.uchicago.edu/MSpapers/Stephens2009.pdf

I hope that in teaching methods that have been criticized, you will go beyond repeating a bunch of well-known howlers and will instead consider the responses that have been given: e.g., a list of 13 criticisms and responses may be found in Mayo & Spanos (2011), “Error Statistics” (a link can be found on errorstatistics.com). Fallacious uses of methods are not indictments of the methods, and furthermore, the more popular approaches are often designed to mimic the “classical” results.

Another way to think about approaching these controversies is to consider who your “audience” is. Is it the best 1 or 2 students in the class who are already well-informed and have the background to really understand the issues you’re raising or is it everybody else, most of whom are probably struggling with the basic material and on whom the nuances of all of this will most definitely be lost? If the latter, then forgo the digressions and focus on communicating the core subjects.

I don’t agree with this. Interpreting statistics is an area where a little bit of knowledge can do real harm. If someone can’t appreciate the nuances that April mentions, they don’t know what they are doing (in a literal, not pejorative, sense) and probably shouldn’t be using these methods at all.

April’s situation sounds familiar to me as a teacher of master’s-level health care professionals. I don’t know what the answer is, although I continue to experiment with getting them to explore every question verbally and with common sense first before reaching for the stats. A lot of the time that saves them from blundering into things like confusing significance with importance, or correlating two different measures of the same thing and testing H0: r=0. However, it is very hard to defeat the attractive idea that there is a cookbook of stats full of right answers and wrong answers.

Just this morning I heard from a student who had written up her methods and referred to her chi-squared test for trend as a correlation. I thought about explaining how it actually IS, using the Cochran-Armitage formula, and I briefly imagined how the light might go on above her head and she would go on to become a skilled statistics user. Then I imagined the look of horror when I undermined the carefully built mental framework… so I picked up the red pen and changed “correlation” to “test”.
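The connection alluded to here can be checked numerically. In its common form, the Cochran-Armitage trend statistic equals N times the squared Pearson correlation between the group score and the 0/1 outcome. The sketch below uses made-up counts and hypothetical function names; note that some software reports a closely related (N - 1) r² version instead:

```python
from math import sqrt


def cochran_armitage_chi2(scores, events, totals):
    """Chi-squared statistic (1 df) for a linear trend in proportions."""
    n = sum(totals)
    p = sum(events) / n  # pooled proportion of events
    t = sum(x * (e - m * p) for x, e, m in zip(scores, events, totals))
    sx = sum(m * x for x, m in zip(scores, totals))
    sxx = sum(m * x * x for x, m in zip(scores, totals)) - sx * sx / n
    return t * t / (p * (1 - p) * sxx)


def pearson_r(scores, events, totals):
    """Pearson correlation between score and a 0/1 outcome, computed on
    the expanded subject-level data."""
    xs, ys = [], []
    for x, e, m in zip(scores, events, totals):
        xs += [x] * m
        ys += [1] * e + [0] * (m - e)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: three ordered groups of 30 subjects each.
scores = [0, 1, 2]
events = [5, 10, 18]
totals = [30, 30, 30]

chi2 = cochran_armitage_chi2(scores, events, totals)
r = pearson_r(scores, events, totals)
print(chi2, sum(totals) * r * r)  # the two numbers agree up to rounding
```

So calling the trend test “a correlation” is not as wrong as it first sounds: the test statistic is a rescaled squared correlation.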

The real problem, as April says, is “…there’s not enough time to talk about other ideas”.

Time for a GAISE college report on computationally-intensive and Bayesian methods?

First figure out why you don’t like the method. If it is because of a broad philosophical view, it might be over the heads of your students. They are taking a class outside of their comfort zone, and inside-baseball arguments will only make them more confused. But do get across why this customary approach has opponents.

If it is because the method is misused, then explain the misuse and where it breaks down. For example, I worked in a place that correctly told researchers not to use t-tests for ordinal data on short scales. But the non-parametric method implemented, an adjustment of the signed-rank test, often yielded significant results when only one subject had a change on the scale. Regardless of the math, it was not meaningful for decision making. (I wanted to look into other approaches but moved on.) The most important thing a person can walk away from a general class with is an understanding of effect size, not a magic bullet of a p-value. Regarding the genomics discussion above, there is a general problem with an excess reliance on significance as a goal in the life sciences.