Instructors are trying their best but they are busy and often don’t themselves understand statistics so well. How can they avoid using bad examples, or does this even matter?

Asher Meir writes:

I haven’t cried on your shoulder for a long time, but I know statistics education is one of your passions and I couldn’t resist.

My niece is studying political science at a university in Holland. She has a required course in data analysis/statistical methods. Every so often she asks me for help with her homework and every time I see black. Here is the latest example. The students are given a data set of 193 countries (all of them, according to the assignment), with regime type and education expenditure level for each one. Here are a few of the questions:

· Compute the proportion of democracies in the data set and calculate the 95% confidence interval.

· According to an earlier measurement, out of 185 countries 90 were democratic in 2000. Use the confidence interval method (i.e. compare the two confidence intervals) to decide whether the proportion of democracies significantly differs between now and then.

· According to an earlier measurement, the average education expenditure of democracies was 4,695 in 2000. Compare the mean education expenditure of democracies in 2000 and today (2019). Formulate appropriate hypotheses and investigate whether there is a statistically significant difference between the expenditure on education now and then.

The data set doesn’t have a subsample, it has all 193 countries in the world. Ergo, there is no confidence interval; if there are exactly 100 democracies then the confidence interval is [100,100]. If 90 were democratic in 2000, then the number of democracies is different at the 100% confidence level, and this would be true even if the number in 2000 was 99, compared to 100 in the new data set. The same for education expenditures. There is no sample and no indication of measurement error. If there is even a one-Euro (or dollar or whatever) difference in the expenditure, then that difference is significant at the 100% level.

Is that correct?

On the previous problem set I told my niece the questions were carelessly written, but “this is what they really want you to do.” But this one is so egregious I really don’t know what to tell her. I just told her that there is no relevance for a confidence interval if you have data for the whole population.

The problem set also has a lot of gratuitous left wing propaganda, I imagine that there is some kind of correlation between being ignorant of how to draw conclusions and taking certain conclusions for granted even when they have no empirical basis.

(Evidently what the instructor has in mind is a test which measures the following: Suppose each country observation is a random draw from a Bernoulli distribution with probability P of being a democracy/heads. We find that in a sample of 193 tosses, 100 are heads. What is the 95% confidence interval for P? But that’s not how regimes are decided, at least not in the short run.)
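For concreteness, here is a minimal sketch of the calculation the parenthetical describes, along with the "compare the two confidence intervals" step from the assignment. It assumes the usual normal-approximation (Wald) interval, which is a guess at what the course intends. The count of 100 democracies out of 193 is the letter's hypothetical, not a number taken from the data set, and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level=0.95
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 90 of 185 is the year-2000 figure quoted in the assignment.
# 100 of 193 is the letter's hypothetical count, used here only for illustration.
ci_2000 = wald_interval(90, 185)
ci_2019 = wald_interval(100, 193)

print("2000 interval:", ci_2000)
print("2019 interval:", ci_2019)
print("intervals overlap:", intervals_overlap(ci_2000, ci_2019))
```

The arithmetic goes through without complaint, which is part of the problem: nothing in the formula knows whether independent Bernoulli draws are a sensible description of how the 193 regime types arose.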

I have a few things to say about this story:

1. Is it really true that “there is no relevance for a confidence interval if you have data for the whole population”?

2. What about the particular example of the 193 countries?

3. What’s an instructor to do?

1. Is it really true that “there is no relevance for a confidence interval if you have data for the whole population”?

No, I disagree with that strong statement. The statement would be true if sampling were the only theoretical basis for confidence intervals, but sampling is not the only form of randomness. We discussed this back in 2011 in our post, “How do you interpret standard errors from a regression fit to the entire population?” The short answer is that even if you have the whole population right now, you’re still gonna want to use your model for prediction to new cases.

2. What about the particular example of the 193 countries?

Here I agree with my correspondent that his niece’s homework assignment doesn’t make sense, as the question isn’t asking about the prevalence of democracy or the relation between democracy and some other variable; it’s just flat-out asking for a 95% interval for “the proportion of democracies.” I can’t see how to wriggle out of that one and define a population.

Just by comparison, several years ago I was fitting a model to estimate the cancer death rate by county using ten years of data. Assuming the data give all the cancer deaths, the results are the results. But if a county has, say, a population of 1000 with 1 kidney cancer death, I’m comfortable saying that 0.001 is the observed kidney cancer death rate but there’s an underlying kidney cancer death rate in the county that I don’t know, for which I can perform inference. That underlying rate is just a mathematical construct but it makes sense to me to think about it. In the democracy example, I don’t see a corresponding underlying-parameter model that makes sense.
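As a rough sketch of what inference for an underlying rate can look like in the 1-death-in-1000 case, the code below computes an exact (Clopper-Pearson style) binomial interval by bisection. This is a simple stand-in chosen for illustration, not the model that was actually fit to the county data, and the function names are hypothetical. It treats the 1000 residents as independent trials with a common rate, which is itself an assumption.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, low=0.0, high=1.0, iterations=100):
    """Root of an increasing function f on [low, high], found by bisection."""
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def exact_interval(x, n, level=0.95):
    """Exact two-sided interval for a binomial rate, given x events in n trials."""
    alpha = 1 - level
    # Lower limit: the rate at which seeing x or more events has probability alpha/2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the rate at which seeing x or fewer events has probability alpha/2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# 1 death in a county of 1000, as in the example above.
lower, upper = exact_interval(1, 1000)
print(f"observed rate: {1 / 1000}, interval for underlying rate: ({lower:.6f}, {upper:.6f})")
```

The observed rate is exactly 0.001, but the interval for the underlying rate is wide, which is the sense in which there is still something to infer even with a complete count.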

3. What’s an instructor to do?

As the above examples illustrate, these problems are subtle. Where does this leave the teacher in such a course, who needs to teach the basics (for example, how to construct and interpret a confidence interval from a binomial proportion) but would like to avoid bad examples such as the democracy problem above where the model doesn’t really apply?

The instructor then has two choices.

The cleanest option is to step back and only use models from pure math, so instead of countries that are democratic or not, all the problems involve sampling balls from a very large urn, or rolling a die with an unknown bias, or drawing cards from a well-shuffled deck. Even here you can run into trouble (for example, you can’t really find a coin where, when you flip it, it has probability p of landing heads, unless p = 0.5, 0, or 1), but if you stick to urns you should be ok.

When I teach this material, my go-to example is basketball shots, where I state the probability of success and also explicitly assume the outcomes are independent. Assuming known probability and independence of outcomes of basketball shots is no more unreasonable than assuming these properties for draws of balls from an urn, and I think the hoops example is easier to visualize. You can also easily talk about players with different abilities or the probability changing as you move closer or farther from the basket. All this is easier for me to think about than, say, comparing the unknown probabilities of black and white balls in an urn. The point is that the basketball example, the way I set it up, is pure probability—it’s really a math example, just like the urn.

The other approach when teaching confidence intervals from binomial proportions is to use real data. There are many available examples from sports (for example, basketball free-throw data), but for a social science class the go-to example will be public opinion surveys. I talk about these in my classes all the time!

When using real data I think it’s important to talk about the assumptions and their failures. For a sample survey, it’s easy: the basic assumption is that we have a simple random sample of accurate measurements from the population of interest, and these assumptions can go wrong because it’s not a random sample, because the measurements have error (for example, respondents aren’t always telling the truth), or because of undercoverage (you’re not fully accessing the population of interest). For the countries-with-democracy example, the assumptions aren’t so clear, which should already be a signal that something’s going wrong.

Here’s an example, kinda like the democracy example but which fits better into the binomial modeling setting: “Suppose you say that an attempted coup d’etat has probability p of succeeding, and you have data on 185 attempted coups, of which 90 succeeded. Give an estimate and 95% confidence interval for the probability that an attempted coup succeeds.” Here the assumption is that coups are independent, each with ex-ante probability p of success. These assumps are clearly wrong—outcomes aren’t independent, and the success probabilities have to vary a lot—but so are the assumps of a political poll. The point is that the binomial model makes sense on its own terms here.
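One way to see what "makes sense on its own terms" means is to simulate the stated assumptions and check that the interval behaves as advertised. The sketch below is only an illustration: it generates many hypothetical data sets of 185 independent coup attempts, each succeeding with the same probability, and counts how often a 95% normal-approximation interval covers that probability. The value `p_true = 0.5` is an arbitrary choice made for the simulation, since the real p is unknown, and the function names are made up.

```python
import random
from math import sqrt


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, n_sims, seed=0):
    """Fraction of simulated data sets whose interval contains the true p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_sims):
        # Each attempt is an independent draw with success probability p.
        successes = sum(rng.random() < p for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= p <= high
    return hits / n_sims


p_true = 0.5  # assumed value, for illustration only
estimated_coverage = coverage(p_true, n=185, n_sims=2000)
print("share of intervals covering p:", estimated_coverage)
```

When the simulated world matches the binomial assumptions, coverage comes out near the nominal 95%. Changing the simulation so that attempts are correlated or have varying probabilities is a natural way to show students what happens when the assumptions fail.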

So my advice to instructors when creating such problems is to be explicit about the assumptions that make the math work, be clear about how these assumptions translate to real life, and, if you can’t figure out how to state the assumptions, don’t use the example.

Stating the assumptions and how they can go wrong, that takes work. But I think it’s worth it.

33 thoughts on “Instructors are trying their best but they are busy and often don’t themselves understand statistics so well. How can they avoid using bad examples, or does this even matter?”

  1. I completely agree with your point #1. It is important for students (and everyone) to understand when you have population data and use it to infer something about a different place or time. It is done all the time, and can be useful, but an exercise stressing the mechanics and avoiding the dangers and limitations of doing so, is dangerous at best. I’m not averse to an exercise where students need to look at and interpret such population data as if it were a sample, but I’d prefer to see the emphasis first on graphical exploration of the population data, and then explicit treatment of what must be assumed to then treat it as a sample (including why you might want to do so, and what could go wrong – the mechanics is the least important point, in my opinion).

    I’m not sure I understand your objection to #2. I read the question as what proportion of countries were democracies in 2000, what proportion were democracies in 2019, and has that proportion increased or decreased? I don’t see anything wrong with such a question, but I’d again ask for a description of the data, an understanding of the limitations of any analysis and conclusions, with the least important part being the mechanics of comparing confidence intervals for the proportion of countries that are democracies in 2000 and 2019. In fact, I’d want to see some attention focused on the difficulties with such binary measurements (democratic vs not). But I don’t understand your objection that the question makes no sense (perhaps I’m reading the words differently than others).

    As for what instructors are to do, I empathize with the question. Every time I teach an introductory statistics course, I struggle to find good sample data to use as first exercises – while population data for a particular time period is readily available, I don’t want to start with that given that all of the statistical theory is based on random samples. But the best random sample data is usually from complex stratified sampling and that seems too complex for students to start with. So, I usually opt for a fairly large nonrandom sample and discuss the ways in which it is not random, how that would affect the analysis, and then proceed as if it were random. But I try not to start with population data. I also try to start with fairly large multivariate data – while it makes the mechanics of the analysis less clear, I view that as a feature, not a bug. My guess from the example given above (and it is a guess, since I really can’t tell without seeing much more of the syllabus), is that the cited exercise comes from a course more focused on the mechanics of constructing and comparing confidence intervals.

    • To clarify my interpretation of #2, I read the question as asking whether democracies are increasing or decreasing over time – something that the asked-for confidence intervals would say something about (although not much, given all the required caveats).

      • Are you saying that they have 2 samples of the number of democracies and a process where they can sample the number of democracies in the future, and want to know about the future? If so I agree you can construct things that way but it’s a terrible example. In the future we aren’t drawing a new set of countries out of an urn, we are looking at changes over time to the same set of countries, it’s a process but it’s not random and standard confidence interval procedures are not appropriate to the task.

        There’s an easy way to turn this into a sampling problem though. Instead of giving data on ~200 countries … Just take a random sample of the countries, say 25 of them, and calculate both finite population and infinite population based confidence intervals.

        “If you only have access to data on these 25 countries chosen at random what would you extrapolate to the entire group of 193 countries and what are the confidence intervals for your extrapolation?” Is a legit classical stats question.
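The sampling version of the problem proposed in this comment can be sketched as follows. Everything numeric here is an assumption for illustration: the population of 193 countries with 100 democracies is hypothetical, the seed is arbitrary, and the intervals use the normal approximation with the textbook finite-population correction sqrt((N - n) / (N - 1)).

```python
import random
from math import sqrt

N_COUNTRIES = 193
SAMPLE_SIZE = 25

# Hypothetical population: 1 = democracy, 0 = not. The count of 100 is assumed.
population = [1] * 100 + [0] * (N_COUNTRIES - 100)

rng = random.Random(2019)  # arbitrary fixed seed for reproducibility
sample = rng.sample(population, SAMPLE_SIZE)  # simple random sample, no replacement

p_hat = sum(sample) / SAMPLE_SIZE
z = 1.96  # approximate 97.5th percentile of the standard normal

# "Infinite population" standard error: treats the draws as independent.
se_infinite = sqrt(p_hat * (1 - p_hat) / SAMPLE_SIZE)

# Finite-population correction shrinks the standard error because
# 25 of the 193 units have already been observed.
fpc = sqrt((N_COUNTRIES - SAMPLE_SIZE) / (N_COUNTRIES - 1))
se_finite = se_infinite * fpc

ci_infinite = (p_hat - z * se_infinite, p_hat + z * se_infinite)
ci_finite = (p_hat - z * se_finite, p_hat + z * se_finite)

print("sample proportion:", p_hat)
print("extrapolated count of democracies:", round(N_COUNTRIES * p_hat))
print("infinite-population interval:", ci_infinite)
print("finite-population interval:  ", ci_finite)
```

Because the full population is available, students can compare each interval with the true proportion, and different seeds give different samples for different students.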

    • > I don’t see anything wrong with such a question,

      Do you see anything wrong with the following answer?

      “If there are 100 democracies then the confidence interval is [100,100]. If 90 were democratic in 2000, the confidence interval was [90,90]. The number of democracies is different at the 100% confidence level.”

      (I’m sure – at the 99% confidence level – that whoever asked the question wouldn’t like this answer.)

      • I guess I’m not understanding. Your stated confidence intervals are degenerate because you rightly view the data as population data rather than sample data. I thought that was Andrew’s first point – that we often treat population data as sample data and that can (subject to important caveats) be ok. So, if there were 100 democracies in 2019, and there are 92 non-democracies, then a 95% confidence interval for the number of democracies is roughly (86.4, 113.6), using the usual normal approximation. Of course this treats the 192 countries as if they are a random sample from a large number of countries, in this case over time. I see plenty wrong with this example, but not that the question is illogical. Published work often treats such population data at a point in time as if it were a random sample from many time periods. I think it is important to understand the implications and limitations of such an assumption – and in the case of the number of democracies among countries I agree that the example is bad. But when I say I don’t see anything wrong with such a question, I’m referring to the procedure of treating population data as if it were a random sample from many time periods (again, I’m not defending doing this without understanding what is being assumed and what the dangers are of doing so). Isn’t that what Andrew is referring to in his point #1?

        • If I understood correctly you said that you don’t see anything wrong with this homework question:

          “· Compute the proportion of democracies in the data set and calculate the 95% confidence interval.

          · According to an earlier measurement, out of 185 countries 90 were democratic in 2000. Use the confidence interval method (i.e. compare the two confidence intervals) to decide whether the proportion of democracies significantly differs between now and then.”

          I was just asking if you would consider what I wrote (echoing Asher) as a correct answer to that homework question.

        • OK, I’d accept your answer. I’d prefer an open ended question that would show your confidence interval as well as the one I computed, along with a discussion of what would make each of those appropriate or inappropriate and in which ways.

        • Dale, you might accept Carlos’ answers, but I doubt very much the professor teaching this class would. There’s too much “turn your brain off and calculate the results of this formula” in stats education. If people don’t turn their brain off, they’d start to seriously question a lot of what’s in Stats 101, and then they couldn’t “cover the material” and give out grades and rubber stamp who’s the “competent scientist” and who “doesn’t understand stats”.

  2. I’d like to take issue with #1. You’d be right if the regimes in 2019 were more or less random draws from an underlying distribution whose parameters are consistent over time. And actually, I think that logic applies to a number of situations, like the performance of students in a stats class. Since it’s your class and performance reflects that, you have the entire population in a given term, but of course you have longitudinal data to work with. Yes, there are slow-moving shifts in the student population or your teaching methods that cause parameter drift, but the effects are small enough (maybe) to ignore. But the political, economic and other factors that influence the likelihood of electoral democracy change rapidly and unpredictably from year to year, no? To put it differently, aside from autocorrelation, would there be any basis for using 2019’s total of democracies to predict 2020’s?

    • “like the performance of students in a stats class” – the performance of students in the same stat class isn’t independent, as they communicate and may work together. I have seen classes that were outlying as a whole. You will quite certainly underestimate uncertainty if you treat students in a class as a random sample from some larger population, unless you assume you could put all of them together in the same classroom and have them spend breaks and free time together.

  3. I agree that it is a terrible example. But I think the exercise of inferring future “popularity” or “frequency” of democracies based on data from 2000 and 2019 (for all countries) is a standard invalid inference exercise – just the type that Andrew’s point 1 refers to. Standard confidence intervals do not apply and the sample is not random, but the academic literature is full of such examples and not all of them are worthless. But, in my opinion, the exercise should be to understand the limitations and potential value of such an approach – not to learn that you should never do it. Again, this is a particularly bad example – but I don’t think it is because of those nonrandom and population features, but more because it involves bad measurement, an ill-posed question, and inadequate data for the underlying question. Also, I don’t think randomly selecting 25 countries would be particularly useful – it suffers from the same limitations, although the statistical foundation would be better.

    • A random sample of 25 countries to extrapolate to the rest in the given year at least is meaningfully an “extrapolation to a population via a sample” problem. It doesn’t address how one measures democracy or whatever but it’s exactly analogous to something like extrapolating the number of broken widgets in a shipment from randomly sampling 5 boxes.

    • It doesn’t make sense to randomly sample 25 countries if you are doing science, but if you are teaching statistics it can make a lot of sense. One year, in my very intro class I had everyone in my course take a random sample of US counties to work with over time. Then we could see every single time we did examples that everyone got different answers but there were patterns to those differences.

      • Yeah, that’s exactly the thing I was thinking. You can even say “we do this so we can see how our estimates from the sample compare to the total population which we actually have data from in this case”, making the educational motivation clear.

        • I have used this device often. Any large data set will permit each student to have their own random sample, and the data can still be complex enough to be interesting. Sometimes, I have them take different size samples so they can compare the results. County level or zip code level data works well. Then they also have the population data to compare their samples to.

  4. In the past I’ve used as an example the CI for the proportion of students who got an A on the midterm. And then gone into a discussion of whether this is a valid thing to do — which depends on how you define the population. If the population is “this class”, then there is no uncertainty. If the population is “all students who took this class at some point in time”, then CIs make more sense, assuming that the difficulty of the class remains constant and the abilities of the students don’t change. I’ve found that it allows for an interesting discussion of the assumptions behind CIs.

    But CIs for proportion of democracies in all 193 countries? No.

  5. I agree with your point 1 that there are other sources of randomness in addition to that caused by sampling. But whether or not this is the case is surely context dependent and it is not obvious to me that confidence intervals based on sampling theory are appropriate when other sources of randomness are involved. I once asked the UK Gov Statistics organisation why they provided confidence intervals with suicide statistics. They did reply to me but didn’t answer my question to my satisfaction. In each region in England and Wales the total number of deaths/suicides is tabulated annually. Now, in each region in each year there is a finite number of death certificates and a small number of these record suicide as cause of death. The numbers vary from year to year but in each year one could regard the data as deriving from a census. However there are several potential sources of within year random variation:

    1 Some people attempt suicide but fail so do not appear in the record. Of these some are serious attempts, others not.
    2 Some deaths are incorrectly recorded as suicides
    3 Some deaths are incorrectly recorded as not being suicides.
    4 And so on

    It seems to me that to calculate appropriate confidence intervals one needs to have an understanding of the probability models corresponding to each source, and that CIs based on sampling theory are not appropriate.

  6. So often I see supposedly beginner problems like this in which, if I thought about it, I would analyze the whole thing differently.

    For example. You could figure out which countries have changed regime type in the time period 2000 to 2023 and in what direction they changed. This could be a really good example of a kind of data visualization where you display all the data.

    Related to that, why did the number of countries change? How would you display a country that was a non-democratic state in 2000 but is now 2 separate countries, one democratic and one not?

    What’s the definition of democracy anyway? Gapminder has at least 20 different indicators.
    This one is called the “Democracy Index.”

    This is a really great chance to let students think deeply about data, where it comes from, how to interpret it.

    They could think about the question “Is the world getting more democratic?” and consider whether it makes sense to

    No, I don’t like the confidence interval for this. I don’t even know what it means. I do think it’s an interesting proportion for students to calculate, and I would hope that it would be followed up with maybe cross tabulation by region or some other social or geographical variables.

  7. But what is a student to do?

    Take 15 minutes to provide the answer the professor is looking for?

    Or take 90 minutes to explain why the question is ill-posed and solve an analogous well-posed problem?

    You gotta know your professor and his/her track record really well to identify the best action.

  8. Everybody is debating the statistics, which is fine, but is nobody going to mention the weird dog whistle about “left wing propaganda” being slipped in there?

    • Maybe I missed it, do we know anything about the problem set other than what is contained in this post?

      For all I can tell, the problem set may contain left-wing propaganda, right-wing propaganda, no propaganda, flat-earth conspiracy theories, anti-vax nonsense, anti-global-warming nonsense, etc etc etc. I agree it’s a bit odd for the correspondent to mention it and perhaps even odder for Andrew to include that bit, but I’m not sure what I can say about it.

      Hmm, ok, I take that back: the poster seems to imply (but does not say) that people who promote left-wing propaganda are especially likely to be poor statistical thinkers. It’s not impossible but it’s a bit hard to square this with the fact that the right is the science-, math-, and fact-denying side on major issues (climate change, vaccine effectiveness, effects of tax cuts on the deficit, who won last presidential election, etc.).

      1/ I think statistical, mathematical, and scientific ignorance are bipartisan and 2/ I agree that it’s odd to interject a slam at the left in this post about statistics questions on an exam.

      So, ok, thanks for calling out that comment. Now I’ve done my part.

      • On the other hand it’s plausible to say that if some supposed educational material contains any strong ideologically slanted stuff it’s reasonable to think that the educational material may be less than good. People with strong ideological slants rarely care as much about facts and logic as they do about “winning”. The strongly politically oriented left and right are both deniers of reality in their various ways. For example using signs with slogans about “peer reviewed science” as the “arbiter of truth” on the left can be as bad as climate change denialism. Wansink’s stuff was peer reviewed science. Hauser’s stuff too. You might call this “problematic science denialism”.

        My only point being when it comes to teaching statistical methods it’s a good idea to leave out political digs and hobbyhorses regardless of which perspective they’re from.

  9. I think a question like this would be fine:
    “Assume that for every country there is an independent random experiment with a probability p_year according to which the country is a democracy or not. Using the corresponding model, test whether the data are compatible with p_2000=p_2019. Discuss to what extent this is a good model (valid assumptions) for these data and whether the test could reasonably be interpreted by saying that in 2019 there were significantly more or less democracies than in 2000 (or that any observed differences were insignificant).” One could also bring in confidence intervals in this way but I’m too lazy to word it.

    Note that a major issue I have with this model is that observations are actually mostly paired, which is ignored in the model. Not sure whether this has already been mentioned.

    To be honest, even though I still don’t think that this is a terribly sensible analysis, at least my wording above suggests (to me, that is) that the issue really isn’t so much that we have observed the full population, as the probability p_year can be interpreted in such a way that we could imagine that things could have turned out differently, which creates an imaginary “population” of parallel worlds from which our world can be thought of as drawn. We may still stick to this if we want to model the situation more thoroughly, taking really existing dependencies appropriately into account. I don’t think the Bayesians here would shrink away from such modelling! (Although they may not like the “population of imaginary parallel worlds” interpretation.)
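The test of p_2000 = p_2019 proposed in this comment could be sketched as a pooled two-proportion z-test. The function name is made up, the 2019 count of 100 out of 193 is the hypothetical figure from the post rather than a value from the data set, and the model deliberately ignores the pairing of countries across years that the comment flags as a major issue.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2 for independent binomial counts."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 2000: 90 of 185 (from the assignment). 2019: 100 of 193 (hypothetical count).
z, p_value = two_proportion_z_test(90, 185, 100, 193)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

With these illustrative numbers the data are compatible with equal probabilities under the stated model. Whether that model describes anything real is the discussion part of the exercise.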

    • Yes I was thinking about something along these lines. Also I feel sorry for the poor instructor who may really have only been trying to make a baseline calculation as a lead in to Chi Square or logistic regression.

    • Roger:

      Shots are not independent and they do not have equal probability of going in, either. It’s just a mathematical model! That’s fine. In class we use mathematical models and we also discuss how they are imperfect. Sometimes we even get to the part where we check the model and see where it does not fit the data.

    • Noord-Holland and Zuid-Holland are just two provinces in the West-Nederland region (there are 10 other provinces). Amsterdam, however, is located in Noord-Holland, so the correspondent’s niece might actually be studying at a university in Holland.

      By the way, it’s rather unfortunate that in English the language Nederlands came to be known as Dutch, while Deutsch is called German.

    • The country brands itself for tourism purposes as the Netherlands, or just Netherlands, of which North Holland and South Holland are major regions. As I understand it, they don’t have an ideological commitment to Netherlands but want the world to know their country contains more than Holland.

    • Strictly speaking Holland refers to the western two provinces (North and South Holland) while the Netherlands refers to the whole 12 provinces.

      The Dutch (especially those from Holland) tend to mix the two terms, but this may sometimes result in a big-city-arrogance accusation being thrown your way.

      I believe all Dutch universities where you can study political science are located in Holland though (Amsterdam, Leiden and Rotterdam) so in this case the term is applicable.

  10. FWIW, I actually kinda like this example, IF and only IF it leads to a good discussion about what the meaning (if any) of confidence intervals is in this situation. There’s a tendency for educational examples to only provide “ideal” problems, and in my opinion things like this are good for promoting critical thought instead of blindly applying the method just because you’ve been taught it.

    I suppose you can argue the 95% CI here could be used for predicting whether a newly created country is a democracy or not, assuming heroically it’s from the same population as existing countries.

    • Wow what a terrible model “countries spawn into existence by a process of rolling a 100 sided die and looking in a table to determine whether they are democracies or not.”

      This is one of the best examples I’ve heard of what’s wrong with all of Frequentist statistics.
