I agree with Chris that the utility of these methods is extremely limited. In fact, I know of no situation where a nonparametric test is better suited than a parametric alternative. Think about it: most of the nonparametric tests used today were invented in the 1940s and 1950s, when little computational power was available to the ordinary researcher. Tests that require no more effort than sorting and counting therefore seemed very attractive at the time. Today, however, more sophisticated methods exist that make nonparametric tests unnecessary.

One example in a science field along the lines of Option #1 is the “Physics for Future Presidents” course at Berkeley, which, instead of going through the standard toy problems of intro physics (force diagrams, inclined planes, and so on), focuses on the physics concepts underlying various topics of interest (nuclear power, earthquakes, climate change, and so forth), with few to no equations.

My challenge in teaching is that (for busy science professionals) I only get an hour at a time, maybe up to 6 episodes over the year, so there is little time for group discussion or assessment of their actual grasp of concepts :-(

But I do get to see what a subset subsequently do in their work :-)

So I go long on concepts using metaphors and simulation as here https://statmodeling.stat.columbia.edu/2019/10/15/the-virtue-of-fake-universes-a-purposeful-and-safe-way-to-explain-empirical-inference/

And with the hope that those who really want to learn will persist in running and modifying the code that is available.

“Seems like it would make sense to yes chi the big unifying idea early on!” –> “Seems like it would make sense to introduce the big unifying idea early on!”


“Seems like it would make sense to yes chi the big unifying idea” -> “Seems like it would make sense to introduce the big unifying idea”

I found the George Cobb quote:

“Our junior-senior course in ‘mathematical’ statistics sits serenely atop its mountain of prerequisites, unperturbed by first and second year students. Tradition takes it for granted that they must climb through the required courses in calculus and probability before ascending to maximum likelihood and the richness of profound ideas. Of course we want students to learn calculus and probability, but it would be *nice* if we could join all the other sciences in teaching the fundamental concepts of our subject to first year students.”

(emphasis original)

— p. 276 of: Cobb, George (2015). “Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up.” _The American Statistician_. 69:4, 266-282. DOI: 10.1080/00031305.2015.1093029

So sad! Applets can be so much more helpful in instilling conceptual understanding, which is what is important in real application.

“I still can’t get over the lack of recognition that people have a variety of learning styles and ‘statistics by proof’ does not serve all learners.”

This is not just a matter of a variety of learning styles — it’s that “statistics by proof” is not real statistics. It’s (I assume) just giving proofs, with no discussion of when they are appropriate to apply, and no discussion (including examples) of what the concepts are (and are not), of model assumptions, etc.

I’m a mathematician, but for non-math students I don’t give proofs — I focus on what a relevant theorem does and does not say in various contexts. Even when teaching math students, I emphasize the applications and only give proofs when they seem to make an important point; instead, I emphasize conceptual understanding and discussion of what the hypotheses and conclusions of the theorem say or don’t say in applications. Seeing the proof of a theorem is (usually) a waste of time, compared to learning NOT to “apply” the theorem in a garbage-in, garbage-out way.

It’s just that the calculation is a bit more straightforward than plugging into Bayes’ theorem the way we were taught it. Yes, you can interpret it as a calculation of the rate at which people with a positive test turn out to have the disease, but you can also interpret it (as Gigerenzer does) as the probability that an individual has the disease, given that that individual’s test was positive, which is definitely Bayesian.
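Gigerenzer’s natural-frequency framing is easy to show in a few lines of code. The numbers here are hypothetical (a 1% base rate, 90% sensitivity, 5% false-positive rate), chosen only to illustrate that the counting version and the textbook Bayes’ theorem version give the same answer:

```python
# Natural-frequency calculation with hypothetical rates:
# base rate 1%, sensitivity 90%, false-positive rate 5%.
population = 10_000
sick = int(population * 0.01)            # 100 people have the disease
healthy = population - sick              # 9,900 do not

true_positives = int(sick * 0.90)        # 90 sick people test positive
false_positives = int(healthy * 0.05)    # 495 healthy people test positive

# Of everyone who tests positive, what fraction is actually sick?
ppv = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {ppv:.3f}")   # ~0.154

# The same number from Bayes' theorem "the way we were taught it":
bayes = (0.90 * 0.01) / (0.90 * 0.01 + 0.05 * 0.99)
assert abs(ppv - bayes) < 1e-9
```

Both readings produce the same arithmetic; the counting version just makes the base rate visible.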

I have used this approach many times, teaching statistics-naive honors college students at two universities the basic ideas of Bayesian decision theory. It was taught as a seminar, so there were not a lot of students each time; I would guess that over the nine times I taught the course about 200 students took it (and at the University of Vermont the course has been taken over and taught several times by another faculty member since I stopped formal course teaching). I know of several students who became so interested in what they learned in that course that they went on to careers in statistics.

Why not launch it on Coursera?

MERLO stands for “meaning equivalence reusable learning object.”

A MERLO item has 5 alternative representations: some with meaning equivalence, some with only surface similarity. Learners are asked to mark the ones with meaning equivalence.

The pedagogical approach is to use MERLO items in formative assessment sessions, such as the one in the video above.

How to use this in stats education is discussed in chapter 6 of https://www.amazon.com/Information-Quality-Potential-Analytics-Knowledge-ebook/dp/B01MEERM38/ref=sr_1_3?qid=1573050181&refinements=p_27%3ARon+S.+Kenett&s=books&sr=1-3&text=Ron+S.+Kenett

See also https://iase-web.org/documents/papers/icots8/ICOTS8_1C3_ETKIND.pdf?1402524969

After completely bombing his first test, he came to me for tutoring. I was shocked that all his lecture slides were only equations, not a single plot of some data to illustrate the concepts. I couldn’t believe that this level of instruction is considered acceptable. It was easy to find good lectures online and use this material to tutor my son and his friends, but I still can’t get over the lack of recognition that people have a variety of learning styles and ‘statistics by proof’ does not serve all learners.

I would start a course with a text like Phillip Good’s Resampling Methods: lots of examples, and then build the math from the data and the problems. That would provide a reason for the math, and a background in data handling as the course progresses.
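As a sketch of what that resampling-first approach might look like on day one: bootstrap the sample mean by resampling with replacement, with no distributional formula in sight. The data and interval here are invented purely for illustration:

```python
import random
import statistics

random.seed(1)

# Hypothetical sample: reaction times (seconds) from a small experiment.
data = [1.2, 0.8, 1.5, 2.1, 0.9, 1.1, 1.7, 1.3, 0.95, 1.4]

# Bootstrap: resample the data with replacement many times and look at
# how the sample mean varies from resample to resample.
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(5000)
)

# A simple percentile interval for the mean.
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(f"mean = {statistics.mean(data):.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

Students can see the sampling distribution emerge from the data itself before any formula is introduced.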

So there are two conceptions of the problem:

A) “a Frequentist cares about determining the frequency with which a randomly selected patient from a particular population who has a certain WBC is in the sub-population of people with the disease”

That is pretty unambiguous, I think.

B) “a Bayesian cares about determining how much credence to give to the idea that a particularly given person who has a particular medical history and a particular WBC count does or does not have a disease”

Note that whether the actual person in front of you has the disease or not does not affect the answer in (A)… since the answer in (A) is about a process of selection and how often that would result in a certain event.

Whereas as information comes in from additional tests etc., the answer in (B) converges either to yes or to no… while the person’s disease status does not change… showing that the credence is a function of the available information.
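A little sketch of (B), with made-up numbers (10% prior, 90% sensitivity, 5% false-positive rate), shows the credence converging as positive results accumulate while the person’s actual status stays fixed:

```python
# Sequential updating of credence: the patient's disease status is fixed,
# but our probability moves as test results arrive. Numbers hypothetical:
# prior 10%, sensitivity 90%, false-positive rate 5%.
prior = 0.10
sens, fpr = 0.90, 0.05

def update(p, positive):
    """Posterior P(disease) after one test result."""
    like_pos = sens * p + fpr * (1 - p)   # P(positive) under current credence
    if positive:
        return sens * p / like_pos
    return (1 - sens) * p / (1 - like_pos)

p = prior
for result in [True, True, True, True]:   # four positive tests in a row
    p = update(p, result)
    print(f"after a positive test: P(disease) = {p:.4f}")
# Credence climbs toward 1 even though the person's status never changed.
```

After four positives the credence is above 0.999; a run of negatives would drive it toward 0 instead.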

The students in the course generally hate it. They’re 19, and it’s very hard for them to see the relevance of having quantitative skills. In general, the course fits some of the advice that Andrew gives on this blog – there is no single course that can cover it all or fix any deeply held misunderstandings, at least not for a majority of students. The main weakness in our approach is that they don’t have repeated consistency in later classes. We can teach them best principles in the intro course, but then they learn all kinds of things from professors in later courses that don’t match up, and they become confused and frustrated. In a perfect world, this course would be a first step and everything later would build off of it.

Carlos: that concept, and the concept of a population with a known base rate of disease, are also frequentist. The key is defining some random selection process.

This is why I say that using the “known base rate” makes the Bayes theorem calculation Frequentist. At that point there are just frequencies under repetition involved, and the question is how often a particular combined event would occur. Bayes theorem describes the math just fine, but without a probability distribution over an unobservable it isn’t particularly Bayesian; at least, it doesn’t use the main technique of Bayesian stats.

What about “how often would a random person who has this WBC count have the disease?”

Is it a Frequency (..tist?) concept? Is it a Bayesian concept? Both? Neither?

I suspect when people are wary of Bayes, they are worried about the need to formulate an explicit generative model, because this model can of course be wrong, in which case the analysis is useless (this worry is often framed in terms of “priors”, but I think people often mean “models”). Particularly given how Frequentist methods are often taught, people are led to believe that the implicit definition frees them from this kind of error, but really it just shoves the error into an assumption that is typically unstated or forgotten.

nope, that’s a Bayesian concept. The Frequency concept is “how often would a random person who has the disease have this WBC count?” The Frequency concept refuses to assign probability to the particular person… it’s all about the repetitive process of selecting a random person from the pool of diseased people and doing the WBC count…

If you are assigning probability to an individual rather than a process, you are doing Bayesian analysis

If I’m interpreting your comment correctly, the Bayesian cares about determining the probability that this patient has the autoimmune disease, which she determines using two kinds of information: prior information (e.g., medical history, reported symptoms, recent outbreaks) and the new test results. Each piece of prior information has a weight associated with it (determined empirically or theoretically or however), and their collective weights are entered into a model for the probability that the patient has the disease. This prior distribution is over that probability and is defined by its location and spread parameters. The new information from the test results receives its own weight (again, determined however) and is added to the model, which then reports a new estimate of the probability that the patient has the disease.

*IF* this interpretation is correct, then it seems the two main differences are 1) the probability distribution of interest–the score of sick people for frequentists and the probability that the patient is sick for Bayesians, and 2) whether we stop to compute a result using only the prior information before including the new information–frequentists don’t and Bayesians do. (I realize that there is some sophisticated math that happens when a Bayesian modifies priors, and it’s not the same as including a new predictor in a regression model.) You mentioned that the big difference is that Bayesians assign probabilities to specific events instead of long-run occurrences, but my interpretation is that this is the same as saying that the distribution is for the probability instead of a score (difference #2). Is all this correct?

Here’s my question: why aren’t likelihood and maximum likelihood in all intro stats classes, and somewhere near the front? I, and I think many nonstatistician scientists, are surprised when we eventually learn that most of the standard cookbook stats tests and estimators can be derived from ML, and that the same method is also vastly more flexible than the cookbook tests of STATS101. It’s not even that difficult: it can be introduced conceptually and done in lab with a hill-climbing search algorithm. Using derivative calculus to find analytic solutions for the ML estimator can be described conceptually too, and then students have some idea of where these formulas come from, and also what to do when data don’t fit in one of the cookbook boxes.
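For what it’s worth, such a lab exercise could be very small. Here is one possible sketch (the data are hypothetical: 7 heads in 10 coin flips), using a crude hill-climbing search to find the ML estimate of a coin’s success probability:

```python
import math

# Maximum likelihood by hill-climbing, on the simplest possible example:
# estimate the success probability p of a coin from observed flips.
# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

def log_lik(p):
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

# Crude hill-climbing: step uphill; shrink the step when neither
# neighbor improves the log-likelihood.
p, step = 0.5, 0.1
while step > 1e-7:
    for candidate in (p + step, p - step):
        if 0 < candidate < 1 and log_lik(candidate) > log_lik(p):
            p = candidate
            break
    else:
        step /= 2

print(f"ML estimate: {p:.4f}")   # converges to heads/flips = 0.7
```

Students can then check the search result against the analytic answer (the sample proportion), which motivates the calculus derivation.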

Seems like it would make sense to yes chi the big unifying idea early on! I saw George Cobb make this argument in a 2015 paper, but it still doesn’t seem to be catching on in intro stats.

Cheers,

Justin

First, let’s get a few brief things cleared up:

What is probability? It’s a kind of math you can do. At this level, it’s purely formal.

What does probability mean? Now we’re asking about how it can be used to match purely formal quantities to real world quantities.

Probability is a kind of math you can do to manipulate quantities that can represent multiple kinds of things, so there *isn’t a unique meaning for probability*. For example, you can use it to represent how often an infinite sequence of numbers will give you a certain result if you take the sequence in batches of a certain size and calculate the result from each batch (say, an average). But probability is also the set of rules for weighting how reasonable it is to conclude that a certain thing might be true, if you require that your weighting scheme has certain properties (this is Cox’s theorem). You can also motivate probability as pure measure theory on finite measures: for instance, probability can be used to calculate the weight of metal bars as a fraction of the weight of the full initial uncut bar when you subset the bars in certain ways, or to calculate properties of chemical reactions in terms of what fraction of atoms are participating in a certain kind of chemical in solution. So it’s not a single thing.

The relationship between Bayesian Statistics and Probability Distributions is that Bayesian Statistics calculates a kind of weight of evidence for a certain subset of parameter space using the calculus of probability.

Contrast that to Frequentist probability theory in which probability is *only* used to express a model for how often observable outcomes will occur when we repeat our observations. Frequentist inference *refuses* to assign probability to quantities that are not observable in repetition.

Bayes theorem is just a mathematical theorem about any mathematical system that obeys the probability axioms. As such, it’s a fact about the numbers, and so it holds when it comes to numbers used in *either one* of the models for probability. What tells you whether a calculation is frequentist or Bayesian is whether or not probability is assigned to things that aren’t repeatable random samples. It has nothing to do with whether or not you used Bayes Theorem to do the calculation really.

For example, if we choose a random patient and give them a test that has a random outcome, then we can use Bayes theorem to calculate *how often* this process of choosing a random patient and giving a test will result in a patient who has the disease and also tests positive…. The fact that this is a *how often* question is what makes it a Frequentist calculation.
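To make the *how often* reading concrete, a quick simulation sketch (all rates hypothetical) repeats the choose-a-random-patient-and-test process many times and simply counts, then compares the count to the Bayes theorem number:

```python
import random

random.seed(42)

# Frequentist reading of the calculation: repeat the process of drawing
# a random patient and testing them, and count how often positive testers
# turn out to have the disease. Rates are hypothetical.
base_rate, sens, fpr = 0.01, 0.90, 0.05

positives = with_disease = 0
for _ in range(200_000):
    sick = random.random() < base_rate
    tests_positive = random.random() < (sens if sick else fpr)
    if tests_positive:
        positives += 1
        with_disease += sick

freq = with_disease / positives
bayes = sens * base_rate / (sens * base_rate + fpr * (1 - base_rate))
print(f"simulated: {freq:.3f}, Bayes theorem: {bayes:.3f}")
```

The simulated frequency and the formula agree (up to simulation noise): the theorem here is just bookkeeping for counts under repetition.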

On the other hand, if we choose a *particular* person and fix them, and then give them a test and see it shows positive, we aren’t interested in how often *other* patients chosen at random would have the disease… we’re interested in the “unobservable” fact of whether or not *this patient* has the disease. So we can use probability to provide a kind of weight of credence to lend to the idea that the parameter (the true state of the disease) is equal to 1 vs equal to 0.

Doing this calculation, we can provide some prior information we have about whether the person has the disease, for example some information from the verbal history of the patient: “I was bitten by a tick last week and then I got a rash”. This lets us assign credence to the idea that “the unknown value of the disease variable is 1”. There are a number of ways to assign this credence; *sometimes* we use *how often has this sort of thing happened in the past*, but it’s not the only method of coming up with information.

So, the quick version:

Frequentist calculations are entirely about *how often OBSERVABLE things occur in repetition*. Probability is never assigned to unobservable quantities.

Bayesian calculations are about how much credence we should lend to an idea we have about an unobservable thing, conditional on what relevant information we’ve observed or assumed. To the extent that future repetitions haven’t been observed yet, they too can be assigned Bayesian probability.

You could also bring in the notion that we all act as amateur scientists all the time. We apply implicit models of events and behavior, like when we plan what we’ll wear tomorrow, or evaluate why an acquaintance is acting strangely, or decide whether a girl likes you or is just being nice. These models are devised by reflecting on the past, tested by observing and asking questions, and revised based on those results. This could lead to introducing the scientific method generally and Bayesian priors in particular.

I don’t understand your question. Bayes is entirely about probability distributions. Bayes’ theorem is a statement about probabilities! In contrast, Bayes’ theorem and Bayesian inference say nothing about “belief.” Belief is one model for probability, but only one model.

I do think that is a key insight/distinction that needs to be brought out early in one’s learning about statistics.

Though instead of a physical (closed/black-box) artifact, I use “a shadow metaphor, then a review of analytical chemistry and move to the concept of abstract fake universes (AFUs) … ones you can conveniently define using probability models where it is easy to discern what would repeatedly happen – given an exactly set truth”. https://statmodeling.stat.columbia.edu/2019/10/15/the-virtue-of-fake-universes-a-purposeful-and-safe-way-to-explain-empirical-inference/

This automates the experimenting process, and if the students can use R they can make their own boxes to explore. Here you can start to become realistic, adding such things as systematic error, selection bias, confounding, etc.
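A minimal sketch of such a box (in Python rather than R, with all numbers invented): set the truth yourself, then compare an honest random sample against a selection-biased one, so students can see bias emerge in a universe where the right answer is known:

```python
import random
import statistics

random.seed(7)

# A tiny "abstract fake universe": we set the truth ourselves, then watch
# what a biased sampling process does to a naive estimate.
TRUE_MEAN = 50.0

def fake_universe(n):
    return [random.gauss(TRUE_MEAN, 10) for _ in range(n)]

population = fake_universe(100_000)

# Honest random sample: the estimate hovers near the set truth.
honest = random.sample(population, 500)

# Selection bias: units with larger values are more likely to be observed.
biased = [x for x in population if random.random() < min(1.0, x / 100)][:500]

print(f"truth: {TRUE_MEAN}")
print(f"honest sample mean: {statistics.mean(honest):.1f}")
print(f"biased sample mean: {statistics.mean(biased):.1f}")   # pulled upward
```

From here one can swap in systematic error or confounding and rerun, which is exactly the kind of exploration the AFU idea is after.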

I do think it is helpful to view mistakes too, like the examples on this blog, and the ideas that go along with those examples. So I would include that in a course as well.

To extend your thoughts a bit, I think a course that took one or a few applied problems and worked them from start to finish would bring up most of the issues that are most problematic, such as unrealistic modeling of error, assumptions of causality without any argument for them, and unmeasured variables. The problems could not be of the sort typically presented in a textbook; they would need to be unstructured, so that structuring the problem would be a primary focus of the course, and the statistical methods a way to model reality relative to the problem at hand.

1. Formulating the question

2. Identifying sources of data that will be useful

3. Data collection/acquisition

4. Models of the problem

5. Statistical methods that would be useful for shedding light on these models

The medical residents (and docs) that I help have had Stat 101 courses. They’ve had some brief biostat lectures in their MS1 year. They don’t get it, nor should they, because I’m sure the main concern was the couple of questions on the USMLE Step 2 exam rather than learning (they have lots of other stuff to learn!). It would be far more relevant for them (and, I am sure, for a variety of other disciplines) to never take a Stat 101 course, but rather a course on how to think about research and how to read a paper.

In many ways inference is just making decisions about approximations.

I can easily imagine a course which begins with the problem or opportunity faced by a Pearson, for example, and then how he got to where he did, and then applying those ideas in practical form.

A great joy of a grad level statistics course was being set free with big data sets (census and land use, in urban planning) and simply exploring what one can make of it. That aspect of ‘play’ is both useful and unexpected, and may lead our poets to work that connects with them!

Bayesian thinking lets you escape the straitjacket of the yes/no mindset that comes with p-values and NHST, but it’s somewhat unsatisfactory if there’s a real-world yes/no decision at the end of it all and the course leaves you hanging about how to make that decision.

This is my beef with stat courses, whether Bayesian-oriented or traditional. Ironically, even as the courses progress to more advanced ones, they don’t seem to add much more material on the decision theory side.

It’s somewhat of a no man’s land that no course wants to tread on.
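A first step onto that land could be very small. Here is a sketch (the posterior and the losses are entirely hypothetical): take a posterior probability from whatever analysis came before, write down a loss for each action-outcome pair, and pick the action with the smaller expected loss.

```python
# A minimal decision-theory step that courses often omit: turn a
# posterior probability into a yes/no action by minimizing expected
# loss. All numbers are hypothetical.
p_disease = 0.15            # posterior from the preceding analysis

losses = {
    ("treat", "sick"): 1,       # treatment cost, disease caught
    ("treat", "healthy"): 10,   # unnecessary treatment
    ("wait", "sick"): 100,      # missed disease
    ("wait", "healthy"): 0,
}

def expected_loss(action, p):
    return p * losses[(action, "sick")] + (1 - p) * losses[(action, "healthy")]

best = min(("treat", "wait"), key=lambda a: expected_loss(a, p_disease))
print(f"treat: {expected_loss('treat', p_disease):.2f}, "
      f"wait: {expected_loss('wait', p_disease):.2f}  ->  {best}")
```

The point is that the yes/no decision comes from the loss table, not from thresholding a p-value; change the losses and the same posterior can justify a different action.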
