MA206 Program Director’s Memorandum

A couple years ago I gave a talk at West Point. It was fun. The students are all undergraduates, and most of the instructors were just doing the job for two years or so between other assignments. The permanent faculty were focused on teaching and organizing the curriculum.

As part of my visit I sat in on an intro statistics class and did a demo for them (probably it was the candy weighing but I don’t remember). At that time I picked up an information sheet for the course: “Memorandum for Academic Year (AY) 13-02 MA206 Students, United States Military Academy.” Lots of details (as one would expect, in that military-bureaucratic way), and also this list of specific objectives for the course:

1. Understanding the notion of randomness and the role of variability and sampling in making inference.

2. Apply the axioms and basic properties of probability and conditional probability to quantify the likelihood of events.

3. Employ models using discrete or continuous random variables to answer basic probability questions.

4. Be able to draw appropriate conclusions from confidence intervals.

5. Construct hypothesis tests and draw appropriate conclusions from p-values.

6. Apply and assess linear regression models for point estimation and association between explanatory and dependent variables.

7. Critically evaluate statistical arguments in print media and scientific journals.

This is all ok except for items 4 and 5, I suppose.

Also, at the end, a list of rules, beginning with:

a. All cadets are expected to maintain proper military bearing and appearance during instruction in accordance with appropriate regulations.

b. Respect others in the classroom – No profanity, unprofessional jokes, or unprofessional computer items . . .

e. Jackets are not permitted in the classroom . . .

g. Drinks must be inside a closed container (plastic bottle with a top, for example) or in the Dean-approved mug . . .

and ending with this:

j. Rules common to blackboards, written work, and examinations:

1) Draw and label figures or graphs when appropriate.

2) Report numerical answers using the appropriate number of significant digits and units of measure.

Now those are some rules I can get behind. They should be part of every statistics honor code.
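
Rule j.2 is also easy to operationalize. Here is a minimal sketch in Python (the report() helper is hypothetical, not anything from the memorandum) of what reporting a numerical answer to an appropriate number of significant digits, with its unit of measure, might look like:

# Hypothetical helper (not from the memorandum): format a numerical answer
# with a chosen number of significant digits and its unit of measure.
def report(value, unit, sig_digits=3):
    return f"{value:.{sig_digits}g} {unit}"

print(report(9.80665, "m/s^2"))      # "9.81 m/s^2"
print(report(0.000123456, "kg", 2))  # "0.00012 kg"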

32 thoughts on “MA206 Program Director’s Memorandum”

  1. On point j2 (and part II of that), I was interested in views about differences between disciplines.

    I started out as an undergraduate in the physical sciences, where there was much stress on units of measurement. Getting those right was often useful for making sure you were using the right equations, so there was immediate value in it (beyond just not losing marks on assignments). In grad school I moved into psychology. While measurement is a big topic in some parts of psychology (and there is much foundational work, e.g., the Foundations of Measurement trilogy), reporting units seems less stressed. It seems all the more important in psychology, where sometimes the measure is some combination of several rating-scale responses. Is my view of this discipline difference something other people also see?

  2. Ah…the Corps has. When I took that course some 30 years ago, there was no thought of taking coffee into the classroom. Other than that, the list of objectives looks rather familiar. That’s a core course; I would imagine the curriculum evolves more slowly than the electives.

    • Hoekstra, Rink, Richard D. Morey, Jeffrey N. Rouder, and Eric-Jan Wagenmakers. “Robust Misinterpretation of Confidence Intervals.” Psychonomic Bulletin & Review, 14 January 2014, 1–8. doi:10.3758/s13423-013-0572-3.

      http://www.ejwagenmakers.com/inpress/HoekstraEtAlPBR.pdf

      “Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students—all in the field of psychology—were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.”

        • The thing is, that CI does not mean what people *want* it to mean. It’s the same problem as with the p-value/NHST but less dangerous because it is used less strictly as a decision tool and gives a better *idea* about uncertainty.

        • Daniel:

          Indeed. Recall Larry Wasserman’s statement:

          The particle physicists have left a trail of such confidence intervals in their wake. Many of these parameters will eventually be known (that is, measured to great precision). Someday we can count how many of their intervals trapped the true parameter values and assess the coverage. The 95 percent frequentist intervals will live up to their advertised coverage claims.

          To which I replied:

          As a Bayesian, I won’t try to predict the future with the certainty that Larry claims. But I can look at the past, and in fact the frequentist intervals of physical constants did not in general live up to their advertised coverage claims (see Youden, 1962, and Henrion and Fischhoff, 1986). It is possible that researchers are more careful now than they were in the 1950s, but I would guess that systematic error will always be with us . . .

          What happened here? Maybe the Law of Large Numbers and the Central Limit Theorem weren’t actually true back then? No. The mathematics was fine, but the models were not. A little thing called systematic error got in the way. Or, to put it in Bayesian terms, the assumed likelihoods were wrong, and, as a result, fewer than 95% of the 95% intervals contained the true values.

          In fairness, I agree with Larry that Bayesian inferences would probably not have the advertised coverage either. Like the frequentist intervals, they’re only as good as their models.

          The point is, Larry wanted the confidence interval to be something it isn’t.
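
          A quick simulation sketch of that systematic-error point (an illustration with made-up numbers, not anything from Youden or Henrion and Fischhoff): when each series of measurements carries an unmodeled bias, the textbook 95% intervals cover the true value far less often than advertised.

          # Sketch: nominal 95% intervals lose coverage when there is unmodeled
          # systematic error, even though the interval math itself is fine.
          import numpy as np

          rng = np.random.default_rng(0)
          true_value = 10.0
          n_labs, n_obs = 10_000, 25
          sigma = 1.0     # within-lab measurement noise (modeled)
          bias_sd = 0.5   # between-lab systematic error (not modeled)

          covered = 0
          for _ in range(n_labs):
              bias = rng.normal(0, bias_sd)
              data = rng.normal(true_value + bias, sigma, n_obs)
              m, se = data.mean(), data.std(ddof=1) / np.sqrt(n_obs)
              covered += m - 1.96 * se <= true_value <= m + 1.96 * se

          print(f"empirical coverage: {covered / n_labs:.2f}")  # roughly 0.5 here, not 0.95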

        • Isn’t this just a criticism of model quality?

          A bad model will give badly performing CI’s, yes. But a bad model might / will also give you bad means, medians, sums, or whatever other property you may choose to extract from it.

          Am I misunderstanding?

        • It is such a common error in science (it has even inspired poetry, see below), but the variation I dislike the most is taking the implications of continuity as relevant in science, since in science continuity is only a very convenient approximation.

          “The word [model of a] butterfly is not a real butterfly. There is the word [model] and there is the butterfly. If you confuse these two items people have the right to laugh at you. Do not make so much of the word [model].”

          http://www.youtube.com/watch?v=r2XkfBWSmcs

        • I agree with Daniel that a big part of the problem is that the CI and p-value are not what people “want”. So I often discuss those concepts starting out with “what we want” and ending with “what we get.” I think (at least hope) it helps more people get it right – or at least, realize that they don’t really understand.

          Also, I’ve found (in a master’s course for high school teachers, who had already had a more-or-less standard first course in statistics) that doing a little Bayesian analysis helped them gain a better understanding of frequentist CI’s (and convinced some of them that they’d prefer the Bayesian approach).

        • Martha, that’s interesting to hear, and while I won’t be able to reach any Bayesian statistics in our introduction to quantitative methods, I might be able to try to give a better idea about the true meaning of p-values and CIs. Need to be careful not to irritate the students too much.

        • @Daniel

          So, if neither CI’s nor p-values, is there another metric that captures what people *want* it to mean?

          i.e. If we cannot rid people of their misinterpretations, can we produce a metric that is aligned with what people are expecting?

          Or are people expecting answers we just cannot deliver?

        • Well, most often CIs and the like are interpreted as if they were BCIs but those of course have their own caveats and I guess they would be misinterpreted as well. They at least pretend to deliver something more akin to what we want, though. I guess it would be important to stress more strongly though what statistics just can’t deliver. Embrace uncertainty. ;)

    • Nothing as long as the problem is simple, has no nuisance parameters, no strong prior information, and useful sufficient statistics. Outside of that problem domain, it quickly leads to nonsense.

      Then there’s the issue that even within that domain CI’s don’t have the coverage property advertised. For that coverage to hold, the system has to have approximately stable frequency distributions. This is never checked by frequentist model checking, and rarely checked by independent experiments. It’s hardly ever true, which is why, as a rule of thumb, 95% CI’s are lucky to have 30% coverage.

      Then there’s the whole issue that CI’s don’t answer the question people want answered. They don’t want an interval that works on average over some non-existent future repetitions, they want a best interval for the one repetition (data set) that exists.

      But hey, if we give frequentists another 70-year monopoly on the teaching of introductory stat, maybe they’ll somehow magically just ‘teach’ all this right and all the problems will go away.

      Or we could recognize that it was an unfortunate historical accident that Fisher, Neyman, and Pearson checked their ideas on these simple kinds of problems, which happen to give answers operationally equivalent to Bayes and so seemed to work, and wrongly assumed they’d still give good results more generally. Once this unfortunate historical accident is recognized, then we can all just use the Bayesian answers, which work well even outside this narrow problem domain.
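
      A concrete instance of that “operationally equivalent” point (a minimal sketch with made-up numbers, assuming a normal likelihood with known sigma and a flat prior on the mean): in this simple setting the 95% posterior interval and the classical 95% confidence interval coincide, which is part of why the two are so easily conflated.

      # Sketch: for a normal mean with known sigma and a flat prior, the 95%
      # posterior (credible) interval coincides with the classical 95% CI.
      import numpy as np

      rng = np.random.default_rng(2)
      sigma, n = 1.0, 20
      x = rng.normal(0.3, sigma, n)   # made-up data
      se = sigma / np.sqrt(n)

      # Classical 95% confidence interval for the mean
      ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

      # Flat prior => posterior for the mean is Normal(x.mean(), se**2),
      # so the 95% credible interval is numerically identical here.
      credible = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

      print(ci)
      print(credible)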

  3. If you are on a Crusade it’s OK to pillage a few villages on your way or maybe sack Constantinople once in a while. Otherwise, students need to know what is written in all those papers in all those journals printing p-values and CIs and such. Change does not come from the classroom; it’s a top-down approach.

  4. “most of the instructors were just doing the job for two years or so between other assignments”

    Reminds me of when I was an instructor at Rice in the early 70’s. One of the graduate students (taking one of my undergraduate courses) was in the Army and was there to get a master’s degree in math (Rice usually only admitted graduate students for a Ph.D.), after which he “would be” (his words) an instructor at West Point, where he had been an undergraduate.

    And that reminds me of the student I had a few years ago in a graduate statistics course at Texas who was in the service (a captain, I think?) and was getting a Ph.D. in business/OR, after which he “would be” in charge of a VA hospital. The military is a different world from the one I’m used to.

  5. But here’s a line of argument I have not been able to refute if students bring it up.
    From the cited paper (http://www.ejwagenmakers.com/inpress/HoekstraEtAlPBR.pdf), the following statement S1 was labeled correct (as I have also learned it):
    S1: “If we were to repeat the experiment over and over, then 95 % of the time the confidence intervals contain the true mean.”
    S2: “Then we can be 95% confident that we got one of the CIs containing the true mean”
    S3: “Then we can be 95% confident that the true mean lies between 0.1 and 0.4”
    Statement 3 was commented with: “mentions the boundaries of the CI (i.e., 0.1 and 0.4), whereas, as was stated above, a CI can be used to evaluate only the procedure and not a specific interval”.

    I have not found a good reply to students arguing S1 -> S2 -> S3. Can anyone help me out here?
    I know we should change the teaching of statistics to a Bayesian framework, but frequentist stats is so ubiquitous that they should also be able to interpret it correctly.
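
    As an aside, here is a simulation sketch of S1 (an illustration with made-up parameter values, not from the Hoekstra et al. paper): over repeated experiments roughly 95% of the intervals contain the true mean, but once a single interval such as (0.1, 0.4) is realized, it either contains the true mean or it does not; the 95% describes the procedure, not that specific interval.

    # Sketch of S1: under the assumed model, about 95% of the intervals
    # produced over many repetitions contain the true mean; any one realized
    # interval simply does or does not contain it.
    import numpy as np

    rng = np.random.default_rng(1)
    true_mean, sigma, n, reps = 0.25, 0.5, 100, 10_000

    hits = 0
    for _ in range(reps):
        x = rng.normal(true_mean, sigma, n)
        se = x.std(ddof=1) / np.sqrt(n)
        hits += x.mean() - 1.96 * se <= true_mean <= x.mean() + 1.96 * se

    print(f"fraction of intervals covering the true mean: {hits / reps:.3f}")  # ~0.95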

  6. S2 introduces the natural-language concept of “being X% confident that” – what does this really mean? (In contrast, S1 just uses the word “confidence” as an arbitrary technical label.) And S3 goes further in applying this concept to the concrete results of an actual observation (so there’s no clear sample space left). I assume you (or your students, if of a frequentist bent) would not be comfortable substituting “the probability is X% that…”. But if it’s not (Bayesian) probability, it’s hard to critique the argument further without hearing what this concept actually means.

    There are many examples where the specific interval produced by an entirely correct 95% interval procedure can be seen to be absolutely certain to contain the correct value (or, in other examples, provably cannot contain the correct value). So when we pin down what “95% confidence” means, it needs to be consistent with “and furthermore, we know for sure that the statement is false” or “and we know for sure that it is true.” So it’s fair to press a bit more on what is actually being said here, because it’s surely not obvious.

    • I guess what the S1->S2 transition means is that instead of “let’s repeat experiment A N times, then for 0.95N of them the true value will be within the CI” one says “let’s go through life doing each experiment once or only a few times; then, out of the accumulated N different non-repetitive experiments, 0.95N will have the true value within the CI”.

      • I can’t distinguish this from a statement about probability. Perhaps the meat of it then is S2 -> S3, when the step is made that we are making a statement – not about the probability of correctness across our life experiences with many tests, but a statement about a particular interval – with nothing random left. What interpretation other than Bayesian is left? What if the interval is provably wrong, as can happen?

  7. Items j.1 and j.2 have been a complaint of mine with co-workers pretty much since I was a postdoc. Widespread obliviousness to significant figures and to the importance of properly labeling x- and y-axes never ceases to amaze (dismay?) me. (My sense is that people working in academia or academic-oriented environments tend to be a bit better at it.) I’d like to believe that anyone who has been through grad school in the physical sciences, engineering, or the like – basically any field which requires that you report quantitative results in writing and/or create graphs on a regular basis – would at least have the basics down, but my experience doesn’t support it. Anecdotally**, there are far more people who are highly skilled at the technical aspects of what they do than there are people who can accurately and effectively communicate the results of their work.

    ** Caveat: The plural of anecdote is “bullshit”.

  8. “as a rule of thumb 95% CI’s are lucky to have 30% coverage.”
    Well, if the model assumptions (including iid) don’t hold precisely, the “true parameter” is not defined, and therefore it is not well defined what “coverage” means. So one may say that 95% coverage is illusory, but quite certainly there is no other, more correct “true coverage percentage” value.
