A few people pointed me to a recent news article by Stephanie Lee regarding another scandal at Stanford.

In this case the problem was an unstable mix of policy advocacy and education research. We’ve seen this sort of thing before at the University of Chicago.

**The general problem**

Why is education research particularly problematic? I have some speculations:

1. We all have lots of experience of education and lots of memories of education not working well. As a student, it was often clear to me that things were being taught wrong, and as a teacher I’ve often been uncomfortably aware of how badly I’ve been doing the job. There’s lots of room for improvement, even if the way to get there isn’t always so obvious. So when authorities make loud claims of “50% improvement in test scores,” this doesn’t seem impossible, even if we should know better than to trust them.

2. Education interventions are difficult and expensive to test formally but easy and cheap to test informally. A formal study requires collaboration from schools and teachers, and if the intervention is at the classroom level it requires many classes and thus a large number of students. Informally, though, we can come up with lots of ideas and try them out in our classes. Put these together and you get a long backlog of ideas waiting for formal study.

3. No matter how much you systematize teaching (through standardized tests, prepared lesson plans, MOOCs, or whatever), the process of *learning* still occurs at the individual level, one student at a time. This suggests that effects of any interventions will depend strongly on context, which in turn implies that the average treatment effect, however defined, won’t be so relevant to real-world implementation.

4. Continuing on that last point, the big challenge of education is student motivation. Methods for teaching X can typically be framed as some mix of, Methods for motivating students to want to learn X, and Methods for keeping students motivated to practice X with awareness. These things are possible, but they’re challenging, in part because of the difficulty of pinning down “motivation.”

5. Education is an important topic, a lot of money is spent on it, and it’s enmeshed in the political process.

Put these together and you get a mess that is not well served by the traditional push-a-button, take-a-pill, look-for-statistical-significance model of quantitative social science. Education research is full of people who are convinced that their ideas are good, with lots of personal experience that seems to support their views, but with great difficulty in getting hard empirical evidence, for reasons explained in items 2 and 3 above. So you can see how policy advocates can get frustrated and overstate the evidence in favor of their positions.

**The scandal at Stanford**

As Kinsley famously put it, the scandal isn’t what’s illegal, the scandal is what’s legal. It’s legal to respond to critics with some mixture of defensiveness and aggression that dodges the substance of the criticism. But to me it’s scandalous that such practices are so common in elite academia. The recent scandal involved the California Math Framework, a controversial new curriculum plan that has been promoted by Stanford professor Jo Boaler, who, as I learned in a comment thread, wrote a book called Mathematical Mindset that had some really bad stuff in it. As I wrote at the time, it was kind of horrible that this book by a Stanford education professor was making a false claim and backing it up with a bunch of word salad from some rando on the internet. If you can’t even be bothered to read the literature in your own field, what are you doing at Stanford in the first place?? Why not just jump over the bay to Berkeley and write uninformed op-eds and hang out on NPR and Fox News? Advocacy is fine, just own that you’re doing it and don’t pretend to be writing about research.

In pointing out Lee’s article, Jonathan Falk writes:

Plenty of scary stuff, but the two lines I found scariest were:

Boaler came to view this victory as a lesson in how to deal with naysayers of all sorts: dismiss and double down.

Boaler said that she had not examined the numbers — but “I do question whether people who are motivated to show something to be inaccurate are the right people to be looking at data.”

I [Falk] get a little sensitive about this since I’ve spent 40 years in the belief that people who are motivated to show something to be inaccurate are the perfect people to be looking at the data, but I’m even more disturbed by her asymmetry here: if she’s right, then it must also be true that people who are motivated to show something to be accurate are also the wrong people to be looking at the data. And of course people with no motivations at all will probably never look at the data ever.

We’ve discussed this general issue in many different contexts. There are lots of true believers out there. Not just political activists, also many pure researchers who believe in their ideas, and then you get some people such as discussed above who are true believers both on the research and activism fronts. For these people, I don’t think the problem is that they don’t look at the data; rather, they know what they’re looking for and so they find it. It’s the old “researcher degrees of freedom” problem. And it’s natural for researchers with this perspective to think that *everyone* operates this way, hence they don’t trust outsiders who might come to different conclusions. I agree with Falk that this is very frustrating, a Gresham process similar to the way that propaganda media are used not just to spread lies and bury truths but also to degrade trust in legitimate news media.

**The specific research claims in dispute**

Education researcher David Dockterman writes:

I know some of the players. Many educators certainly want to believe, just as many elementary teachers want to believe they don’t have to teach phonics.

Popularity with customers makes it tough for middle ground folks to issue even friendly challenges. They need the eggs. Things get pushed to extremes.

He also points to this post from 2019 by two education researchers, who point to a magazine article coauthored by Boaler and write:

The backbone of their piece includes three points:

1. Science has a new understanding of brain plasticity (the ability of the brain to change in response to experience), and this new understanding shows that the current teaching methods for struggling students are bad. These methods include identifying learning disabilities, providing accommodations, and working to students’ strengths.

2. These new findings imply that “learning disabilities are no longer a barrier to mathematical achievement” because we now understand that the brain can be changed, if we intervene in the right way.

3. The authors have evidence that students who thought they were “not math people” can be high math achievers, given the right environment.

There are a number of problems in this piece.

First, we know of no evidence that conceptions of brain plasticity or (in prior decades) lack of plasticity, had much (if any) influence on educators’ thinking about how to help struggling students. . . . Second, Boaler and Lamar mischaracterize “traditional” approaches to specific learning disability. Yes, most educators advocate for appropriate accommodations, but that does not mean educators don’t try intensive and inventive methods of practice for skills that students find difficult. . . .

Third, Boaler and Lamar advocate for diversity of practice for typically developing students that we think would be unremarkable to most math educators: “making conjectures, problem-solving, communicating, reasoning, drawing, modeling, making connections, and using multiple representations.” . . .

Fourth, we think it’s inaccurate to suggest that “A number of different studies have shown that when students are given the freedom to think in ways that make sense to them, learning disabilities are no longer a barrier to mathematical achievement. Yet many teachers have not been trained to teach in this way.” We have no desire to argue for student limitations and absolutely agree with Boaler and Lamar’s call for educators to applaud student achievement, to set high expectations, and to express (realistic) confidence that students can reach them. But it’s inaccurate to suggest that with the “right teaching” learning disabilities in math would greatly diminish or even vanish. . . .

Do some students struggle with math because of bad teaching? We’re sure some do, and we have no idea how frequently this occurs. To suggest, however, that it’s the principal reason students struggle ignores a vast literature on learning disability in mathematics. This formulation sets up teachers to shoulder the blame for “bad teaching” when students struggle.

They conclude:

As to the final point—that Boaler & Lamar have evidence from a mathematics camp showing that, given the right instruction, students who find math difficult can gain 2.7 years of achievement in the course of a summer—we’re excited! We look forward to seeing the peer-reviewed report detailing how it worked.

Indeed. Here’s the relevant paragraph from Boaler and Lamar:

We recently ran a summer mathematics camp for students at Stanford. Eighty-four students attended, and all shared with interviewers that they did not believe they were a “math person.” We worked to change those ideas and teach mathematics in an open way that recognizes and values all the ways of being mathematical: including making conjectures, problem-solving, communicating, reasoning, drawing, modeling, making connections, and using multiple representations. After eighteen lessons, the students improved their achievement on standardized tests by the equivalent of 2.7 years. When district leaders visited the camp and saw students identified as having learning disabilities solve complex problems and share their solutions with the whole class, they became teary. They said it was impossible to know who was in special education and who was not in the classes.

This sort of Ted-worthy anecdote can seem so persuasive! I kinda want to be persuaded too, but I’ve seen too many examples of studies that don’t replicate. There are just so many ways things go wrong.

**P.S.** Lee has reported on other science problems at Stanford and has afflicted the comfortable, enough that she was unfairly criticized for it.

This typo needs a correction:

“For these people, I don’t the problem is that they don’t look at the data”

As to

“When district leaders visited the camp and saw students identified as having learning disabilities solve complex problems and share their solutions with the whole class, they became teary. They said it was impossible to know who was in special education and who was not in the classes”

it is an emotional tug that is uncomfortable and, to be absolutely cynical, it reminds me of Trump’s technique of self-admiration. On the other hand, perhaps it is just jealousy on my part.

Lee’s reporting on this was fascinating to me.

In a former life, I was a special education teacher.

Prior to that, I was a Montessori teacher. When I trained in the Montessori method I saw a number of adults who said, effectively, “Oh, I always thought I wasn’t a math person, but now that I’ve used these materials for math instruction I now see it’s not that I’m not a math person, but I’m someone who was taught math in a way that wasn’t effective for me.”

In essence, they worked with materials to see how concepts like place value and squaring and cubing numbers, and squaring and cubing binomials or trinomials, and the value of pi, and calculating the area of a circle, etc., can be represented concretely and not just treated as abstract concepts.

In my subsequent work as a special education teacher I worked with some students who “didn’t get math” when taught to them as a process of following instructions to carry out algorithms taught as abstract concepts (i.e., “cross out the seven and carry the one,” or “invert and multiply”), but who began to be a person who “understood math” when the method of instruction used concrete materials as a bridge to the abstract concepts and algorithms.

All anecdotal, of course.

And I’m well aware of just how difficult educational research is, because there are so many critical variables that are so difficult to control, and the Hawthorne effect, etc.

And while I was shocked at some of the unscientific comments from Boaler, and while I respect many of the criticisms of the concepts she pushes, and while I acknowledge a bias on my part…

I just have a really strong sense that there’s much that’s core to her educational advocacy that is true and critically important.

And I’m also a bit skeptical of the reliability of “brain science” more generally, but more specifically as a way to understand education. Valuable evidence? Sure. No doubt.

Yes. All of that really helps lots of students, both learn and not hate math. And also to think more abstractly.

Boaler starts with an appealing approach: “making conjectures, problem-solving, communicating, reasoning, drawing, modeling, making connections, and using multiple representations.” I wish that the claim that such an approach is “unremarkable to most math educators” were an accurate description of most K-12 teaching. Unfortunately almost all the K-12 teaching I’m familiar with is more of the form “when you see this do that”.

But then Boaler pulls a bait-and-switch. A broad vision of what direction teaching should go cannot justify promises of miracle cures or, much worse, dishonest claims about data. As Andrew says, the real disappointment is that major institutions accept the bogus claims even when the problems are in plain view. The fundamentally religious epistemology- anyone who is not already a true believer cannot testify- is becoming more accepted even in physics education research.

“When you see this do that” is all too common for sure.

That’s how math was taught to me until say about 9th or 10th grade. I was pretty slow at arithmetic, and did very well on verbal skills so was called a non-math-person. I later went on to get a BS in math and a PhD in engineering.

I’m still slow at arithmetic, but it turns out that by hand arithmetic is almost none of mathematics. Still if you were to randomly select people from the population the vast majority of them would top out at solving for x in 2x-3 = 13 which means that by hand arithmetic is almost all of their knowledge. This is still true for most 5th grade or under teachers. They can follow the curriculum but they don’t know math per se.

Daniel:

“by hand arithmetic is almost none of mathematics.”

But it’s a huge part of every-day life!!

If you have three different supply chains feeding your medical device manufacturing line, how you gonna figure out if you have enough parts to complete the units on order?? Arithmetic. Computerized modeling is great but if you’re checking in with Receiving on what arrived today and how much you need to produce in the next X days, you better be able to do the math in your head and get on the phone if it’s not enough.

Even as a scientist the first thing you should do when you see any numerical result is run a back of the envelope calculation with basic parameters in your head and see if it makes any sense. The reason we have so much idiotic research is because most people either don’t bother to or can’t do that – or even worse they believe in quack BS that they think subverts fundamentals.

The reason teachers default to “when you see this do that” is because that’s what students – and parents – demand: they want an infallible formula to get the answer right. They don’t give a rats behind about conceptual understanding.

Joshua:

most of your returning older people who have the light go on bring something to the learning at age 20+ that they didn’t have in grade school: experience in the adult world, where they’ve dealt with more arithmetic than they’d care to acknowledge or thought they’d have to face.

And I’ve found it common for people to claim they were “never taught” this or that when in fact I know they were, since what they claim to “never” have been taught is a core part of the curriculum. So when people say “I was never taught that way”, chances are good they just weren’t paying attention, like a lot of kids in grade school, high school, and universities.

That’s why, again, controlled studies matter.

Chipmunk –

I can assure you that the people I’m talking about had never had basic mathematical concepts presented to them in the manner they were presented in the Montessori training.

Likewise, I know for a fact that the younger students I worked with, who were convinced they “couldn’t do math,” had been taught that doing math essentially boiled down to memorizing algorithms without being given thorough assistance in building the underlying concrete foundational knowledge first. Then when they didn’t understand what they were doing they internalized the message that they were stupid and concluded they can’t do math.

Of course my personal experiences aren’t validation for my view. So take my opinions for what they’re worth to you. But then again, attaining good empirical data on educational research that controls for all the critical variables is incredibly difficult.

Chipmunk, by hand arithmetic with precision is almost useless today. If you need precision everyone is carrying a supercomputer in their pocket, and if not a $5 calculator is available everywhere.

ESTIMATION on the other hand is extremely important because it can keep you from wasting time doing precise calcs that are unnecessary. Get within 10-20% of most everyday calcs is sufficiently good to determine if you need to reorder supplies or buy extra buckets of paint etc. But when I was taught grade school math there was extremely little emphasis on estimation. It was all precise long division by hand over and over again etc.

To do estimation well requires understanding more fundamental mathematical concepts, like for example 347 * 18 = (350-3)*(20-2) which is reasonably well approximated by 350*20 which is about 7000 and the answer must be a few hundred less than that, so maybe that’s enough to know that you don’t need to order more of something until after the 15th of the month because you have more than 7000 of them and it only takes 3 days for delivery.
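For anyone who wants to check the arithmetic in that estimate, here is a minimal sketch in Python. It only uses the numbers from the comment above; the variable names are made up for illustration.

```python
# Exact product versus the rounded estimate from the comment above.
exact = 347 * 18
estimate = 350 * 20

# Expanding (350 - 3) * (20 - 2) term by term shows where the gap comes from.
expanded = 350 * 20 - 350 * 2 - 3 * 20 + 3 * 2

# How far the quick estimate overshoots, in units and as a fraction.
shortfall = estimate - exact
relative_error = shortfall / exact

print(exact, estimate, shortfall, round(relative_error, 3))
```

The estimate overshoots by several hundred, which works out to a bit over 10% of the exact product, so it sits inside the 10-20% band the comment treats as good enough for a reorder decision.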

How many 8th graders are confident enough with estimation that they would choose to do that calculation though? I think even though estimation is more commonly taught now with common core, it’s still not something people are comfortable with.

“If you need precision everyone is carrying a supercomputer in their pocket…”

I’m *just* old enough to have been constantly told in school that “you won’t always have a calculator in your pocket!”. I’m guessing teachers don’t use that line much anymore.

I have a PhD in physics but, like Daniel, I’m slow at arithmetic. Not terribly slow, maybe — in a race against high school kids I’d probably be somewhere in the middle — but slow compared to what one might expect from a physics PhD, especially if you think physics requires a lot of by-hand calculation.

So, yeah, what he said.

Daniel & Phil:

” Get within 10-20% of most everyday calcs is sufficiently good to determine if you need to reorder supplies or buy extra buckets of paint etc. ”

No way. That’s a recipe for failure in business. At most auto makers, the on-hand inventory is two hours of production. Imagine one part is 20% short. Entire production lines shut down, hundreds of people are idled. The consequences are equally severe for a small business with dozens of employees if they’re idled for two or three days because of parts shortages.

Even at Home Depot, the day your customer goes to Lowes is the day s/he may never come back. OTOH, if you get caught with outdated inventory you have to chuck it or discount it. The more you do that, the more the competition will eat your lunch.

I’m not saying that you have to be instantaneous in your head at this kind of calculation. But you have to be able to do it in a few minutes and make decisions. Everyone along the line has to be able to reason with basic arithmetic. That’s just reality.

Home Depot and manufacturing plants are not doing this in their heads or by hand. Even small local supermarket chains use inventory management software. And if you’re at a very small business and you need to do a calculation that could potentially leave your business idle for days, and you decide to do it in your head instead of pulling a phone out of your pocket, you’re just an idiot.

To be extra clear for you:

I am not saying it’s useless to learn how to do arithmetic in your head

I am saying that this imaginary phone order where taking an additional 30 seconds to pull your phone from your pocket and unlock it strands a production line for 3 days is laughable. If you’re at that point, your business was dead anyways

Joshua:

I can’t speak to the benefits or lack thereof of Montessori teaching methods, since I don’t know the method. The good news is that Montessori is (AFAIK) readily available for people if that’s what they think is best.

OTOH, the United States built the most technically advanced society in world history using “old” or “standard” teaching methods. So it’s not immediately obvious why we should accept that, *on average*, those methods are deficient.

I agree that good clinical data is hard to come by, but given the above (standard teaching methods have historically been highly successful) it’s not apparent that we need research and data, particularly when “we” always think the data is great until the method used to generate it fails miserably when implemented in the real world.

IMO education is a lot like economics: there are lots of people using it to push an agenda and sell their beliefs, and it’s easy to bury invalid assumptions and to hide the fact that their special snake oil doesn’t actually work.

Have you ever left the country?

> At most auto makers, the on-hand inventory is two hours of production

Citation needed. Do we really imagine a truck comes and delivers *all* the parts needed to manufacture a car every 2 hours as they get used up? (I could call my family friend who runs a business doing auto parts quality control and remanufacturing for what used to be NUMMI and is now the Tesla plant in the bay area, but I won’t bother her because I’m pretty sure they don’t have 12 truckloads of parts a day coming through in a line, or if they do, then it’s all pre-scheduled anyway)

Also, that Joe in “parts accounting” is madly doing mental arithmetic because he broke his pencil and can’t even scribble while he figures out the precise interval he should wait before reordering a truckload of parts and that although he’s needed to order a new truck every 2 hours every day for the last 150 days since he took the job, nevertheless it’s critical for him to know precisely when to pick up the phone to within less than 20% of a 2 hour interval which is 24 minutes, so he spends maybe 1 hour running around counting things, then muttering to himself while touching his fingers together madly then sits by the phone until it’s 19 minutes to 11am and then hits redial to tell a guy “ok send in the next truck”?

And that there’s an entire class of mental arithmetic savants across the country doing this critical task because no-one allows anyone to use their supercomputer cell phone’s built in calculator, or an Excel spreadsheet, or even a Casio? Or Mrs Finkelstein will come and rap their knuckles with a ruler?

And if they had a truckload coming every 2 hours, they wouldn’t figure out that it’d be ok to maybe just have another truckload sitting in the parking lot in case a truck had a flat tire?

To the extent anyone does just in time parts management it’s 100% computer automated and the real human interaction is to enter some data for the computer to crank on that’s probably just counts of boxes on warehouse shelves. There’s a reason JIT supply chains only existed after about 1990.

I don’t know about “auto-makers.” But when you take your car in to jiffy lube or wherever, they mostly order whatever parts are required as needed. Source: multiple friends who had the job of driving around all day delivering parts in high school.

Anon: yes, because hundreds of different car types roll through the door at Jiffy Lube, so you have no idea what parts to keep in stock, except maybe the 3 most common vehicles.

but at an automaker they make a handful of variations on the same vehicle with largely the same parts needed. All the vehicles need the same blinker lever, the same headlights, the same gas pedal, the same seat rails and steering wheels and suspension springs and oil filters…

https://electrek.co/2022/01/24/tesla-operates-most-productive-car-factory-us/

Tesla’s Fremont plant makes 8550 cars a week, that’s 101 cars every 2 hours. Do they really need a guy to time the reorder to within 15 mins using pure mental arithmetic? Or do they have a continuous flow of trucks full of parts that were scheduled weeks to months in advance using predictive models calculated in a computer? What happens when the supplier runs out of steering wheels? It takes 3 weeks for them to come from China is what. It’s all pre-scheduled months in advance by computer, that’s the only conceivable way you could do 8550 vehicles a week. Either that or it’s done in big batches of 20,000 parts and stored in stacks of containers at the docks.
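The throughput figure is easy to check with a back-of-the-envelope calculation. This is a rough sketch that assumes the plant runs around the clock, seven days a week, which the comment does not state; with fewer operating hours the rate per two hours would be higher. The 20% tolerance is the one discussed earlier in the thread.

```python
CARS_PER_WEEK = 8550        # figure quoted in the comment
HOURS_PER_WEEK = 7 * 24     # assumption: continuous 24/7 operation

# Cars produced in a two-hour window at a constant rate.
cars_per_two_hours = CARS_PER_WEEK / HOURS_PER_WEEK * 2

# 20% of a two-hour delivery interval, in minutes.
tolerance_minutes = 2 * 60 * 20 / 100

print(round(cars_per_two_hours, 1), tolerance_minutes)
```

Under that assumption the rate comes out at just under 102 cars per two hours, in line with the roughly 101 quoted, and 20% of a two-hour window is 24 minutes.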

+1

Your comment matches my thoughts both with respect to what is and isn’t unremarkable, and the problems exposed in Stephanie’s piece.

Is education research any more problematic than, say, nutrition research? As I recall, it was Stephanie Lee who broke the Wansink story.

(I could make similar comments about economics, psychology, etc)

The first book to teach the NHST hybrid was EF Lindquist’s 1940 book “Statistical Analysis in Education Research”:

https://archive.org/details/in.ernet.dli.2015.18292/mode/1up

It isn’t clear whether he was the one who first combined Fisher’s significance testing with Neyman-Pearson hypothesis testing. Perhaps that was already going on in the 1930s and he just wrote it down. But anyway it seems education research is special because it was the first field to adopt this method.

Other fields like nutrition, psychology, medicine, and physics all adopted it 10-50 years later. So education research has a head start.

The way it works is that the generation in charge when it first appears still use the scientific method (replicating each others experiments, checking the predictions of their theories), and the NHST is mostly an afterthought. Then the students of that generation kind of remember science, but NHST plays a more prominent role. It is the third generation (~40 years after initial adoption) where the utter confusion sets in. At that point the field produces ~10,000 pointless or misleading publications for every useful one.

Huh, this is fascinating. Is there any longer-form piece which explores this transition?

See Gigerenzer et al., 1987, The Empire of Chance, especially chaps. 3 and 6, and the following 1993 paper by Gigerenzer: https://pure.mpg.de/rest/items/item_2547840/component/file_2566387/content ,

Quick correction: Gigerenzer et al. is 1989; and also, Gigerenzer 2004 is interesting: https://pure.mpg.de/rest/items/item_2101336/component/file_2101335/content#SOCECO377BIB21

I wonder if Boaler is able to comprehend Jaime Escalante’s success in teaching calculus at the high school level in Los Angeles. I don’t think brain plasticity was a thing back then.

What’s needed is a theoretical framework. Start here. “An Economic Model of Teaching Effectiveness” Anthony K. Lima The American Economic Review, Vol. 71, No. 5 (Dec., 1981), pp. 1056-1059 (4 pages)

https://www.jstor.org/stable/1803490

Boaler and the California Math Framework are to mathematics education what Hannah-Jones and the 1619 project are to history. Coincidentally, or not, both of them are making bank on their grift.

Education researchers remind me of sales force training consultants: much of what value they deliver is as motivational speakers. So they brim with self-confidence in whatever system they advocate, and sometimes it rubs off and does some good in firing up the teachers/sales people.

I actually believe the 2.7 years of improvements on standardized tests, however, I think that most of that improvement came from the students becoming comfortable and confident enough to actually try to answer the questions, rather than from gaining 2.7 years worth of knowledge through the course of one summer.