# The sampling distribution of the sample mean

The hardest thing to teach in any introductory statistics course is the sampling distribution of the sample mean, a topic that is at the center of the typical intro-stat-class-for-nonmajors. All of probability theory builds up to it, and then this sample mean is used over and over again for inferences for averages, paired and unpaired differences, and regression. This is the standard sequence, as in the books by Moore and McCabe, and De Veaux et al.

The trouble is, most students don’t understand it. I’m not talking about proving the law of large numbers or central limit theorem–these classes barely use algebra and certainly don’t attempt rigorous proofs. No, I’m talking about the derivations that lead to the sample mean of independent, identically distributed measurements having a distribution with mean equal to the population mean, and sd equal to the sd of an individual measurement, divided by the square root of n.
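For reference, the result in question can be written out in a few lines. This is a sketch of the standard derivation, not a quotation from any of the textbooks mentioned; the notation ($X_1, \dots, X_n$ for the measurements, $\mu$ and $\sigma$ for the population mean and sd, $\bar{X}$ for the sample mean) is assumed here for illustration.

$$
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \\
\operatorname{E}[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} \operatorname{E}[X_i] = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \\
\operatorname{sd}(\bar{X}) &= \frac{\sigma}{\sqrt{n}}.
\end{aligned}
$$

The variance step is the only place where independence is used: without it, covariance terms would appear in the sum.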

This is key, but students typically don’t understand the derivation, don’t see the point of the result, and can’t understand it when it gets applied to examples.

What to do about this? I’ve tried teaching it really carefully, devoting more time to it, etc.–nothing works. So here’s my proposed solution: de-emphasize it. I’ll still teach the sampling distribution of the sample mean, but now just as one of many topics, rather than the central topic of the course. In particular, I will not treat statistical inference for averages, differences, etc., as special cases or applications of the general idea of the sampling distribution of the sample mean. Instead, I’ll teach each inferential topic on its own, with its own formula and derivation. Of course, they mostly won’t follow the derivations, but then at least if they’re stuck on one of them, it won’t muck up their understanding of everything else.

## 12 thoughts on “The sampling distribution of the sample mean”

1. Whew! Maybe I'm NOT crazy, or at least not unique in my aberration.

The whole hypothesis-test-on-the-sample-means approach has always struck me as pretty abstract for novices, and I've seen plenty of stats and EE majors who struggled with it. I did it pretty much by rote until I had to teach it, and discovered how scaly an idea it really is ("Let's pretend the null hypothesis is true.")

I have much better luck starting with Fisher's exact test (p-values si! alpha no!), generalizing to the chi-square goodness of fit test (how close is the data to the model? And now alpha makes sense), and THEN introducing tests on the sample mean.

Still, I end up spoiling all in the last two weeks of the semester when we discuss post hoc tests in ANOVA: A > C, but A ~ B and B ~ C takes a heap of explaining. Fortunately, by then the idea of closeness is firmly embedded in the discussion, so some students do catch on.

For my juniors, I often challenge them to develop a relatively simple, but oddball, hypothesis test or confidence interval, so they get the underlying idea.

2. I've noticed the same thing. My theory is that most people just aren't capable of understanding it. It involves a level of abstraction that is just too far beyond their experience.

But I have to wonder, if they can't understand that, is there any point to teaching them statistical inference at all? Would most people be better off just taking something else instead?

3. Ed:

I agree, some of them would be better off studying something else. However, many of my undergraduates are in degree programs like biology that require statistics, and others take my "statistics for poets" class to satisfy a quantitative literacy requirement (and avoid calculus). Academic budgets are driven in part by attendance, so we actually recruit students for our classes!

Fortunately, the "outside their experience" thing isn't as bad as all that. I've had great success explaining two-sample t-tests and post hoc tests in one-way ANOVA by likening them to tuning an old analog radio to different stations; statistical significance means being able to tune in loud and clear.

4. I think you can teach this by relating it to people's experiences. For example, if you examine bowling scores, people do understand that in an individual game your score might vary from 100 to 200, with an average of 150, but your average most likely will not vary from 100 to 200; it might vary from 140 to 160.

5. I agree that most people are able to grasp the concept that samples are noisy and don't perfectly represent populations. But this is mostly a qualitative understanding. Most students are completely unable to grasp the abstractions that are necessary to quantify the noise and that allow us to do inference. We can probably teach them cookbook methods of doing a variety of statistical tests, but I'm afraid they will have such a poor understanding of what the tests mean that I'm not sure I see the point. We may be giving them just enough rope to hang themselves.

I also realize that most students are in the class because it's required. But maybe it would be better to require something other than a class in inference. I suppose we could teach a lot of interesting things about data analysis that didn't include inference at all, or only included it as an advanced topic at the end.

6. Wouldn't examples using jackknife and bootstrap estimators help to convey quantitative notions of bias and variance? They are conceptually simple, and useful for people to know about in any event.

7. At the intro level, it does seem pointless to derive the law of large numbers. If that is attempted, students would most likely be lost in the details. Would it not be much more effective to set up some kind of simulation exercise so that the students can figure out for themselves what happens as sample size increases?

8. Kaiser:

I couldn't agree more about teaching the LLN to novices! So instead we do simulations, starting with dice. We begin with many individual rolls, and get a uniform distribution (much discussion about why the histogram is not "perfect"). Then we take the average of four rolls–the classroom is very noisy at this stage–and get a histogram with nice central tendency, and more discussion. At that point we trade the dice for a spreadsheet and "duplicate" the process with thousands of U(0,1) random numbers.* This certainly isn't a proof, but it makes a strong impression, much more than a two-minute mention in a list of theorems on the blackboard.

At this point I've covered one main idea (the LLN) and "sneaked in" two smaller ideas (simple spreadsheet simulations and summing processes), and the lecture is only half over. So next we repeat the cycle, this time rolling the dice in geometric trials. The spreadsheet follow-up is a little trickier (time to introduce some spreadsheet functions), but we get nice right-skewed histograms. Now I make an explicit contrast between summing and waiting time processes.

For a one-shot presentation, I add a grand finale. During the talk I have students write their heights on a roster. At the end I quickly draw a histogram of the heights and ask the question: summing or waiting time process?

*This and paulse's comment just gave me an idea for a simulation my third-year statistics majors can do, to show how variance decreases as sample size increases. Cool!
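A minimal sketch of that kind of simulation, written in Python rather than a spreadsheet (that substitution, the function name `sd_of_sample_means`, and the choices of sample sizes and repetitions are all assumptions for illustration). It averages n rolls of a fair die many times over and compares the spread of those averages with the sd of one roll divided by the square root of n.

```python
import random
import statistics

def sd_of_sample_means(n, reps=5000, seed=0):
    """Simulate `reps` samples of n fair-die rolls; return the sd of the sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.randint(1, 6) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

# sd of a single fair die roll: sqrt(35/12), roughly 1.71
SIGMA_ONE_ROLL = (35 / 12) ** 0.5

for n in (1, 4, 16, 64):
    simulated = sd_of_sample_means(n)
    predicted = SIGMA_ONE_ROLL / n ** 0.5
    print(f"n={n:3d}  simulated sd={simulated:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

The two columns should track each other closely, with the spread of the averages roughly halving each time n is quadrupled.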

9. Hi, all. Thanks for the comments. Just to be clear:

(1) We haven't even tried to derive the law of large numbers in the intro stat class. We tell it to them, and we do some demonstrations. The difficulty arises when we try to apply the law of large numbers (more specifically, the sampling distribution of the sample mean) to solve problems in inference (getting conf intervals and hyp tests).

(2) The key difficulty, as I've always seen it, is the idea of a sample mean having a distribution. It's that second level of abstraction. The data have a mean and variance, and under repeated sampling, these themselves have a distribution.

(3) I don't plan to eliminate this from the course. It's a key idea of statistics, to think of the distribution of what could have happened if the sample had been different. Rather, my proposal is to take away the "pivotal" quality of this topic in the course, by no longer using it to derive later results.

Thus, I'd be happy to keep all these examples of bowling, simulations, etc.–but then when I get to conf intervals and hypothesis tests, I think they just need to be presented, without deriving them from sample means.

Regarding the discussion of Ed and Mike on whether to teach inference at all: I think that inference is an important idea, and I hope it can be understood at some level without the necessity of deriving it from sampling distributions of the sample mean.

In some ways, I think that the sampling distribution of the confidence interval can be easier to understand than the sampling distribution of the sample mean. For some examples, see Chapter 8 (particularly Section 8.4) of my book on teaching statistics.
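One way to make "the sampling distribution of the confidence interval" concrete is a coverage simulation. The sketch below is not taken from the book; it assumes normally distributed data with made-up values for the population mean and sd, and uses the usual large-sample interval (sample mean plus or minus 1.96 standard errors). The function name `coverage` is hypothetical.

```python
import random
import statistics

def coverage(n=50, reps=2000, mu=10.0, sigma=3.0, seed=1):
    """Share of approximate 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
        lo, hi = mean - 1.96 * se, mean + 1.96 * se
        hits += lo <= mu <= hi                     # True counts as 1
    return hits / reps

print(f"share of intervals covering the true mean: {coverage():.3f}")
```

Each repetition produces a different interval, and the printed share should land near 0.95, which is the sense in which the interval itself has a distribution.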

10. I don't teach statistics but policy analysis, and I find I have to teach my students about thinking statistically so they can read and interpret the literature. Based upon this experience, I believe there are two problems. Part of the problem is that we underteach variance, treating it as secondary to the central tendency. As a result, things that require understanding deviation from the mean are hard to convey because the base is not there.

The other problem is the students don't understand at the gut level that we have a sample and are trying to make statements about the population. Without understanding this as the central task in inference, they are lost.

When I've tried to communicate these ideas to the class intuitively, I've used coin tosses and trying to decide whether the coin I'm tossing is a fair coin or biased coin. 6 heads out of ten, no one is willing to reject the hypothesis that the coin is unbiased, 6000 out of 10000, all are. The discussion of where the tipping point is for judging the coin to be biased makes the discussion of sample size, p-values, confidence intervals a real one.
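The two ends of the coin example can be checked with an exact binomial calculation. This is a sketch under the assumption that "biased" is judged with a two-sided test against a fair coin; the function name `two_sided_p_value` is hypothetical.

```python
def two_sided_p_value(heads, tosses):
    """Exact two-sided binomial p-value under a fair coin (p = 0.5).

    Counts every outcome at least as far from tosses/2 as the observed one.
    """
    distance = abs(heads - tosses / 2)
    count = 0   # number of equally likely toss sequences in the two tails
    ways = 1    # C(tosses, 0), updated step by step to C(tosses, k)
    for k in range(tosses + 1):
        if abs(k - tosses / 2) >= distance:
            count += ways
        ways = ways * (tosses - k) // (k + 1)   # C(n, k+1) from C(n, k)
    return count / 2 ** tosses

for heads, tosses in ((6, 10), (60, 100), (600, 1000), (6000, 10000)):
    print(f"{heads} heads in {tosses} tosses: p = {two_sided_p_value(heads, tosses):.3g}")
```

Six heads in ten gives a p-value of about 0.75, while 6000 in 10000 gives one that is vanishingly small; the intermediate rows show where the tipping point falls as the sample grows with the same 60% share of heads.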

11. What about something like the resampling method (bootstrapping for everything)? It's not perfect, but it produces good enough results for most statistical questions and can be understood by "normal" people.
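A minimal sketch of the bootstrap idea, applied to the standard error of a mean. The data here are made up (100 draws from a right-skewed distribution), and the function name `bootstrap_se_of_mean` is hypothetical; the point is only that resampling the observed data with replacement gives roughly the same answer as the textbook formula, without any derivation.

```python
import random
import statistics

def bootstrap_se_of_mean(data, n_boot=2000, seed=0):
    """Estimate the standard error of the mean by resampling data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = [
        statistics.fmean(rng.choices(data, k=n))   # one resample, same size as data
        for _ in range(n_boot)
    ]
    return statistics.stdev(boot_means)

# Made-up, right-skewed sample for illustration only.
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(100)]

formula_se = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {bootstrap_se_of_mean(sample):.4f}")
print(f"s / sqrt(n):  {formula_se:.4f}")
```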

12. Me, being a non-math type, would like to see someone explain why n=32 is the minimum sampling size needed to do meaningful statistical analysis. I assume that is the correct value?