Dick,

What is the difference between “Intro Stats” and “Stats: Data and Models”? They both seem to be introductory statistics books: are they different in terms of intended audience?

Andrew, are you referring to the “Intro Stats” book or to the “Stats: Data and Models” book?

Hi Professor De Veaux, I just wanted to give an actual student testimonial for your book (I used the AP “Stats” version).

Overall, I think your textbook is outstanding. I appreciated the obvious effort you and your co-authors made to carefully think through the narrative flow. I also enjoyed how much personality and good humor come through! It really makes a huge difference in reader engagement.

The only caveat I would offer: please consider rethinking and rewriting the chapter “More About Tests and Intervals”, in which you cover topics that have been so controversial here on Andrew’s blog: p-values, NHST, Type I/II errors, etc. My novice view is that students definitely need to be exposed to all these concepts. They can certainly be misused, but they also have important, real-world, practical relevance.

I think this chapter might be helpfully expanded, and even split into two. Some of the material feels way too compressed and, frankly, a bit rushed. Given that you’re covering NHST, I feel there should be a chapter title that directly conveys that, rather than lumping it into a vaguely titled hodgepodge chapter, as at present.

FWIW, I’ve gotten into long debates here on the blog with other readers, who insist that NHST is completely unfounded, misleading, and harmful to the student, and that only Bayesian inference should be taught. Perhaps your next edition could take that bull (so to speak ;) more fully by the horns, please? It’s very confusing to hear, so forcefully, that some of the foundational material you teach is considered by some to be fundamentally wrong.

Thank you for your consideration, respectfully submitted.

And the cost of printing and sending out all those “examination copies” is absorbed in the price students pay for the textbook.

Yes, this points out a big problem in teaching intro statistics — it’s just not possible to do a good job (except possibly with really exceptional students) in one course.

Sounds like this was not exactly a top-notch math student — maybe they thought (hoped) statistics would be easier.

+1

+1

+1

I’d say it’s not usually a deliberate lie, but confusion — with the result being de facto a lie.

+1

Good question. I was thinking of Stats: Data and Models.

> Old stat is filled with (old) ideas in the language of algebra … learning ideas of statistics by practicing the language of programming

Agreed. When I proposed at a JSM panel on teaching (2014 or ’15) a shift in balance from statistical concepts to programming in the intro course, most of the panel and many in the audience pushed back on that.

I think it’s starting to happen now.

The problem here seems to be that the null hypothesis was false, and rightly detected to be false, but factors other than the drug could plausibly account for the deviation. The correct conclusion to draw from the rejected null hypothesis here is that “mean mortality rates of the treatment and placebo groups are different for some reason”. That is it.

Stats textbook authors can’t get away with that, though; too many people would ask what the point of the test is then.

>”I am not aware of a statistical approach that can adequately deal with this…”

Their “hypothesis” that “the drug works” is too vague. If instead they had a model about how it worked and deduced that “the mortality rate should drop by 15% for this dose” (or, more likely, predicted some functional relationship between dose, platelet levels, and mortality), then messing up the study isn’t likely to give you results consistent with the model. That is not the case when half the possible results are consistent with your theory (e.g., “drug leads to lower mortality rate”).

In other words, these clinical trials that just look for the existence of “an effect” are poorly designed to begin with. The problems encountered here are inherent to that design and the goal it is based on, and will not be solved by stats. So why do stats textbooks claim they can tell you whether “the drug works” from such studies? That is either a lie or result of confusion.
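To make that contrast concrete, here is a small hedged simulation (all numbers invented: a 20% baseline mortality rate, 500 patients per arm, and a drug that does nothing). Pure noise “confirms” the vague directional hypothesis about half the time, while the specific point prediction is satisfied far less often:

```python
import random

random.seed(0)

base = 0.20      # assumed placebo mortality rate (invented for illustration)
n = 500          # patients per arm
trials = 10_000  # simulated studies

def arm_rate(p):
    """Observed mortality rate in one arm of n patients."""
    return sum(random.random() < p for _ in range(n)) / n

vague = specific = 0
for _ in range(trials):
    placebo, treated = arm_rate(base), arm_rate(base)  # drug does nothing
    if treated < placebo:                      # "drug leads to lower mortality"
        vague += 1
    if abs(treated - 0.85 * placebo) < 0.01:   # "mortality drops by 15%"
        specific += 1

print(vague / trials)     # roughly half the noise-only studies
print(specific / trials)  # far fewer
```

The directional claim is consistent with about half of all possible noise outcomes, so “confirming” it carries little information; the point prediction is much harder to hit by accident.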

Andrew: Do we know whether a PNAS editor attaching his or her name to an article is current PNAS policy (all editors must do so) or the choice of the editor?

You’d have stiff competition from academics with their time already paid for. Besides, isn’t writing books a big part of what they are already paid for?

The other point is, full-time book writing (unsubsidized by a primary job) is, from a strictly economic perspective, a hard activity to justify: there’s already tons of stuff on Bayesian statistics out there, some even in the public domain. So first, one would have to be reasonably sure you’d produce stuff better than the baseline. Now, I’m not doubting your competence, but it’s a risky proposition for any author. It’s like knowing you’d end up with a “War and Peace” the day you commissioned someone to write it for you.

Then again, even if you are better than all known sources, the question is how much better. If there’s a marginal improvement over all the stuff out there, would people be willing to fund it?

Et cetera. I think there’s a reason why book writing has largely been a by-product of the academic enterprise and the itch to write and be known for your work, rather than a full-time, directly paid-for activity.

I think I see what you mean. Speaking from a student’s perspective, I like the Guttag text for how quickly the student is doing something meaningful. By about page 30, the student is already working with code for simple bisection search, Newton-Raphson, etc. The text is focused on ‘computational thinking’ combined with stats, data analytics, machine learning. Highly recommended … but then I am not a prof, so YMMV!
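For readers who haven’t seen the book, a bisection-search exercise of the kind the commenter describes might look like this (an illustrative sketch, not an excerpt from Guttag):

```python
def bisection_sqrt(x, eps=1e-6):
    """Approximate sqrt(x) for x >= 0 by repeatedly halving an interval."""
    lo, hi = 0.0, max(x, 1.0)   # sqrt(x) always lies in [0, max(x, 1)]
    guess = (lo + hi) / 2
    while abs(guess * guess - x) >= eps:
        if guess * guess < x:
            lo = guess          # answer is in the upper half
        else:
            hi = guess          # answer is in the lower half
        guess = (lo + hi) / 2
    return guess

print(bisection_sqrt(2))  # close to 1.41421
```

The appeal of starting here is that the loop makes the idea of successive approximation visible within the first few pages of programming.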

One hand: you got me there :) There might be a pretty strong preference bias going on.

Other hand: you learn a language differently than ideas. Old stat is filled with (old) ideas in the language of algebra.

My suggestion is learning ideas of statistics by practicing the language of programming. That starts with learning to program. Stat 101 presumes algebra as well. I only made it explicit.

I dare say things like GLMs or Boosting are easier understood as algorithms than in algebra. Again, might be personal preference.
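As one illustration of that claim, a GLM such as logistic regression can be written as a short loop rather than a page of algebra. This is a hedged toy sketch (data, step size, and iteration count all invented):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Invented toy data: an intercept plus one standard-normal feature.
true_w = [-0.5, 2.0]
X = [[1.0, random.gauss(0, 1)] for _ in range(200)]
y = [1 if random.random() < sigmoid(sum(w * x for w, x in zip(true_w, xi))) else 0
     for xi in X]

# The "algorithm view" of the GLM: gradient ascent on the log-likelihood.
w = [0.0, 0.0]
lr = 0.1
for _ in range(500):
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))  # residual
        for j in range(2):
            grad[j] += err * xi[j]
    for j in range(2):
        w[j] += lr * grad[j] / len(X)  # averaged-gradient step

print(w)  # should drift toward the (invented) true weights [-0.5, 2.0]
```

Seen this way, the model is just “nudge the weights in the direction that makes the data more likely,” which is arguably a gentler entry point than the matrix-algebra derivation.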

There are some more details about the book here:

Isn’t it stupid for a journal to ask you to review a paper & yet go ahead & accept the paper without even waiting for the authors in question to respond?

I cannot speak to ‘Think Bayes’ or ‘Think Stats’ (I have them both but haven’t gotten around to them yet) but ‘Think Python’ seems to me very much like the book you are critiquing. Just look at the first dozen or so chapters:

Variables, Expressions, Statements

Functions

Conditionals & Recursion

Iteration

Strings

Lists

Dictionaries

…

It is a fine book as it is, but very much a ‘traditional’ walk through a language. It is not much different from my ancient copy of “Programming and Problem Solving in Pascal”. How many programming texts have been written in the same way, with the same progression: entire chapters devoted to “types”, then “statements”, then “functions”, etc.? Most of them!

Guttag’s “Introduction to Computation and Programming Using Python” is much better. It uses the language only as a tool to teach computational thinking, and as such brings in language elements only as needed. Also, very affordable.

I feel it is somewhat half-hearted. As though they did the first one, then people said they wanted a simulation-based book because that’s what people recommend (and it’s in the Common Core), but it did not cause them to really rethink. I don’t blame them; it is a lot of work to put together a book like that, and I think only one of the authors is an academic. That said, anyone can take the files and make their own version … I looked at that idea but it’s still a huge amount of time.

> before studied with NHST no one knew what was going on, after NHST still no one knows.

Most of this disaster seems to have been the result of poor trial conduct and misreporting.

I am not aware of a statistical approach that can adequately deal with this, especially given that only regulatory agencies can get access to data and records to learn about these real problems.

Cool idea, by the way. I’d be happy to contribute pro bono.

I explain that I do this solely out of environmental considerations.

So far this “threat” has worked.

I do not get upset so much about receiving yet another intro-to-xxxx book which says exactly the same as all the other books, in way too many pages with way too many pictures. What really bothers me is the waste AND the outrageous price.

And the _right_ big picture (along with a “singular vision to be coherent” text) would need to be appreciated and grasped well enough to be taught adequately.

(Maybe then these endless JSM talks each and every year on teaching stats could be switched to a short course).

Ben:

This endorsement seems overly hopeful in the extreme for someone at the intro level: “It provides a seamless path from ignorance to insight in a few hundred clear and enlightening pages.” –Gary King, Harvard University

Do you have a sense of what the students actually learned in their one or two semesters using this book?

> math I emphasized “explain your reasoning” rather than using algorithms to come up with an answer

Interesting.

I once had a summer student who had just completed their PhD in math and thought they might want to switch to statistics.

When I tried to get them to explore/grasp the reasoning in a statistical approach (e.g. what do you think the motivation was for this novel method proposed by Efron) they retorted “isn’t there just an algorithm you look up somewhere to do that?”

Whether that was a reflection of how they were taught math or their expectations of what statistics was I am not sure.

P.S. They went into statistics and published successfully there, but the last time I reviewed one of their drafts I raised a fair number of criticisms. The draft was accepted by the journal by the time they got these, so they responded with “I guess I don’t have to respond to these” — and they didn’t, until they had subsequently published another 2 or 3 papers that addressed only a subset of them.

A 2016 version could start with the books by Allen Downey: first some programming in Think Python, then an application of Bayesian statistics in Think Bayes. It can easily fit in a semester course. Easy, accessible, and it outright skips all the stuff we learn to forget.

The linear model and then quickly more advanced stuff can be introduced later on, when the student has a feeling for working-with-data and working-with-inferences. It’s all about inference nowadays in both stat and machine learning.
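For flavor, the kind of exercise such a course might open with (an illustrative sketch, not an excerpt from Think Bayes): a grid-approximation posterior for a coin’s heads probability after observing 7 heads in 10 flips.

```python
# Grid approximation: discretize p, weight each value by prior x likelihood.
grid = [i / 100 for i in range(101)]       # candidate values of p
prior = [1.0] * len(grid)                  # flat prior
heads, tails = 7, 3

posterior = [pr * (p ** heads) * ((1 - p) ** tails) for p, pr in zip(grid, prior)]
total = sum(posterior)
posterior = [wt / total for wt in posterior]  # normalize to sum to 1

mean_p = sum(p * wt for p, wt in zip(grid, posterior))
print(round(mean_p, 3))  # near the exact Beta(8, 4) mean, 8/12 ≈ 0.667
```

A dozen lines like these give the student the whole prior-likelihood-posterior loop before any calculus or conjugacy is mentioned, which is much of the appeal of this route.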

Which one are we talking about: the “Intro Stats” or the “Stats: Data and Models”?

Joel (student)

No table of contents; that’s frustrating. It looks interesting, but it sounds like it (as with most of these books) covers much more than it is possible to meaningfully cover in a one-semester course with beginners. Having taught my last course of the semester tonight, I was thinking about this. First, I’m glad my beginners learn some programming via R, even though that took time. I’m fine with “losing” a few hours to having groups present research results during one class. I’m frustrated that I didn’t cover everything I wanted to cover. But my students are real beginners.

I do think that Franklin’s book for high school students would be very useful for a first undergraduate course if it didn’t rely on TI calculators instead of statistical software.

I want to like Open Intro so much, but I find aspects of that book really frustrating; it’s as if they didn’t think at all about how intro students learn, not to mention the graphs. I also think it isn’t sure who its audience is.

That gets tossed into the mix as well — some of it precedes the above; some follows.

I was lucky in some sense: I was not trained in statistics, but in mathematics. So when I first started teaching statistics (because I was interested in it and there was a severe shortage of statisticians at my university), I found it really frustrating that the textbooks were so “authoritarian” — i.e., “this is how you do it” with little if any “this is why we do it.” (All the more so because in teaching math I emphasized “explain your reasoning” rather than using algorithms to come up with an answer.)

Fortunately, there were a couple of statisticians around who were glad to answer my questions and point me to things to fill in gaps in my background. Also, an NSF-funded summer workshop for mathematicians-teaching-statistics was helpful in pointing me toward textbooks that were better than average.

Martha:

I agree; I too doubt that the publishers realize that it’s crap. But they do send books out for review, so my point is that they don’t care if the book is crap. Just like the manufacturer of other “lemons”: they carefully avoid assessing the quality of the product that they’re selling. They don’t care as long as they’re getting their 200 bucks. It’s not like the editor’s or publisher’s reputation is on the line.

Say what you want about Susan Fiske, at least she attached her name to those PPNAS papers. For better or for worse, she’s standing by himmicanes and the rest: those papers got approved on the strength of her reputation, and now her reputation is tied in part to those papers that she approved. For the book editors and publishers, though, the reputational link seems very weak—indeed, I didn’t mention the book or publisher names in my above post because it seems like just about every textbook publisher puts out books of this quality. It’s their bread and butter.

Martha:

I’ve come to the conclusion that it’s not enough to avoid simplistic language. I think it’s necessary to directly address the problem by stating the simplistic version, explaining that’s what most people do, and then explaining why it’s wrong.

“No, but they’re knowingly flooding the market with crap. Or, perhaps I should say, flooding the market with material that, as far as they can tell, is crap.”

I doubt that the publishers realize that it’s crap.

“Because the p-value is less than the significance level (α = 0.05), we say the null hypothesis is implausible. That is, we reject the null hypothesis in favor of the alternative and conclude that the drug is effective at reducing deaths in heart attack patients.”

When teaching hypothesis testing, I make an effort to avoid simplistic language such as that and say things like the following:

“If we obtain an unusually small p-value, then (at least) one of the following must be true:

I. At least one of the model assumptions is not true (in which case the test may be inappropriate).

II. The null hypothesis is false.

III. The sample we’ve obtained happens to be one of the small percentage (of suitable samples from the same population and of the same size as ours) that result in an unusually small p-value.

Thus, if the p-value is small enough and all the model assumptions are met, then rejecting the null hypothesis in favor of the alternate hypothesis can be considered a rational decision, based on the evidence of the data used.

However:

1. How small is “small enough” is a judgment call.

2. “Rejecting the null hypothesis” does not mean the null hypothesis is false or that the alternate hypothesis is true. (Why?)

3. The alternate hypothesis is not the same as the scientific hypothesis being tested.

For example, the scientific hypothesis might be “This reading program increases reading comprehension,” but the statistical null and alternate hypotheses would be expressed in terms of a specific measure of reading comprehension.

• Different measures (AKA different outcome variables) would give different statistical tests (that is, different statistical hypotheses).

• These different tests of the same research hypothesis might lead to different conclusions about the effectiveness of the program.”
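Point III in the quoted handout is easy to demonstrate with a small simulation (hypothetical, not part of the original comment): sample repeatedly from a population where the null hypothesis is true, and a fraction of samples close to α still yields a “significant” p-value.

```python
import math
import random

random.seed(1)

def p_value(sample):
    """Two-sided z-test of H0: mean = 0, with known sd = 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)     # sample mean over its standard error
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

n_sims, n, alpha = 10_000, 30, 0.05
rejections = sum(
    p_value([random.gauss(0, 1) for _ in range(n)]) < alpha
    for _ in range(n_sims)
)
print(rejections / n_sims)  # close to alpha, even though H0 is true every time
```

This is exactly the behavior the significance level promises, which is why a single small p-value cannot by itself rule out explanation III.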

I agree — De Veaux’s text is the best of the intro textbooks I have looked at. (But some people don’t like it — in some cases, precisely for the reasons that make it stand out above the competition.)

Jan:

That might be the one we used. It was OK. I liked it a lot at first look, but then when we actually had to put together the course I realized there was a lot that we needed that wasn’t there. My collaborators and I wanted to then write our own book but that takes a lot of work. In retrospect maybe we should’ve just whipped something out and gotten it out there. At some point I plan to get back to this project.

Did you ever try the “Randomization and Simulation” version? I am going to give it a try this Winter semester.

]]>If you can wait until March 2017, I would recommend Kosuke Imai’s book for the undergrad or master’s level

http://press.princeton.edu/titles/11025.html

even though my undergrads did not like it any more than they would have liked a traditional textbook.

Andrew:

Yeah, perhaps that is why open-source software has been so successful but open-source texts have not. Maybe software has a more modular structure, whereas a text needs a more singular vision to be coherent.

It gets me every time someone says “the late David MacKay” — he was only 49!!

The thing is, although there’s a market for textbook publishers to produce that crap book you wrote about, because they get a monopolistic captive market, is there a market for kickstarting the production of a full-fledged good open-access resource?

To release it Creative Commons at the back end requires getting paid up front for the full cost of production (or at least getting some money up front to fund production, and the rest at release). I estimate maybe $200,000–$500,000 depending on what’s produced: I’m talking books, worked examples, homework problems, code, downloadable datasets, and a USB-key image that boots to a modified Ubuntu with all the tools you could want — R, RStudio, Stan, Emacs, ESS, Maxima, Octave, MySQL, built-in datasets, the whole works.

I’m not an academic, so my salary isn’t going to come from a university while I write and produce and test the thing over a multi-year period in my intro class, etc.

It doesn’t seem reasonable to expect college students to kickstart this kind of thing… after all, the ones who would use it are currently high school students with no knowledge that they even need such a thing, and the ones who wish they had it now will be well out of their first couple years of undergrad by the time it’s produced.

So, you’ve got a big audience here, what do you think? Is there a way to raise the funds needed?

tldr: (sqrt(-textbooks))^2

Thanks, Andrew. I both appreciate your comments and agree with the criticism of the material. We’ve been trying to pull the Intro Stats market into the 21st (20th?) century with our books, so in each new edition, instead of just rearranging the deck chairs, we’ve actually tried to steer it toward more relevant stuff. We’re working on the 5th edition right now, which I think is getting closer. But progress is slow! There’s a lot of inertia out there.

It’s not pure stats, but the late David MacKay’s book is free for personal use:
