## Don’t get fooled by observational correlations

Gabriel Power writes:

Here’s something a little different: clever classrooms, according to which physical characteristics of classrooms cause greater learning. And the effects are large! Moving from the worst to the best design implies a gain of 67% of one year’s worth of learning!

Aside from the dubiously large effect size, it looks like the study is observational yet cause-and-effect are assumed, and there are many degrees of freedom.

I would say this is just the usual dog bites man stats mistakes, but governments might spend many millions on this…

OK, I took a look. They report a three-year study of 153 classrooms in 27 schools. They do multilevel modeling—hey, I like it already. And scatterplots! The scatterplots are kinda weird, though:

It’s a full color (or, I guess I should say, full colour) report, so why not use those colours in the graphs, instead of the confusing symbols? Also it’s not clear what happened to the urban and rural schools in the top graph, or why there are so many divisions on the x and y-axes, or why they have all those distracting gray lines.

Throughout the report there are graphs, but it doesn’t seem that much statistical thought went into them. The graphs look like they were made by an artist, not in consultation with a data analyst. For example:

Plotting by school ID makes no sense, the symbols contain zero information, the graph is overwhelmed by all these horizontal gray lines, and, most importantly, I have no idea what’s being plotted! The title says they’re plotting the “impacts of classrooms” and the vertical axis says they’re plotting “input.” Inputs and impacts, those are much different!

I’m not trying to be a graphics scold here. Do whatever graphs you want! My point is that graphics—and statistical analysis more generally—are an opportunity, a chance to learn. Making a graph that’s pretty but conveys no useful information—that’s a waste of an opportunity.

I think there’s a big problem, which is that graphs are considered as a sort of ornament to the “real analysis” which is the regression. But that’s not right. The graphs are part of the real analysis.

In particular, nowhere are there graphs of the raw data, or of classroom-level averages. Too bad; that would help a lot.
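As a rough illustration of what "classroom-level averages" could look like before any regression is run, here is a minimal sketch in Python. The records below are toy values invented purely for illustration (they are not from the report), and the function name `classroom_averages` is hypothetical. The idea is just to collapse pupil-level progress to one mean per classroom, keeping the pupil count alongside it, so that each classroom becomes a single point that could be plotted against a design measure.

```python
from collections import defaultdict
from statistics import fmean

# Toy pupil-level records, invented for illustration only:
# (classroom_id, progress in sub-levels over the year).
records = [
    ("A", 1.5), ("A", 2.5), ("A", 2.0),
    ("B", 1.0), ("B", 2.0),
    ("C", 3.0), ("C", 2.0), ("C", 2.5), ("C", 2.5),
]

def classroom_averages(rows):
    """Return {classroom_id: (mean progress, number of pupils)}."""
    by_room = defaultdict(list)
    for room, progress in rows:
        by_room[room].append(progress)
    return {room: (fmean(vals), len(vals)) for room, vals in by_room.items()}

averages = classroom_averages(records)
for room, (mean, n) in sorted(averages.items()):
    print(f"classroom {room}: mean progress {mean:.2f} (n={n})")
```

Keeping the count next to each mean matters because an average over a handful of pupils is much noisier than one over a full class, and a plot of raw averages should make that visible.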

Also, I can’t figure out what they did in their analysis. They have lots of graphs like this which make no sense:

“Colour” is not a number, so how can it have a numerical correlation with overall progress? And what exactly is “overall progress”? And why are correlations drawn as bars: they can be negative, no? Indeed, it’s suspicious that all of the correlations are between 0 and 0.18 and not one of them is negative.
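To put a rough number on why an all-positive set of correlations looks suspicious, here is a back-of-the-envelope sketch in Python. It rests on strong assumptions that are not in the report: that the features are unrelated to progress, that their sample correlations are independent of each other, and that each is equally likely to come out positive or negative. The number of bars `k` is also hypothetical, since the count is not stated here. Under those assumptions the chance that every one of `k` correlations is positive is simply 0.5 raised to the power `k`.

```python
def prob_all_positive(k: int) -> float:
    """Chance that k independent, sign-symmetric noise correlations are all positive."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 0.5 ** k

# k is a hypothetical number of plotted correlations, not a figure from the report.
for k in (5, 10, 15):
    print(f"k={k:2d}: P(all positive under pure noise) = {prob_all_positive(k):.6f}")
```

Real design features are likely correlated with each other, which would make an all-positive pattern less surprising than this calculation suggests. The sketch only shows that the pattern calls for an explanation, such as how the measures were coded or selected, rather than proving anything went wrong.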

### The big picture

Stepping back, here’s what we see. These researchers care about schools and they know a lot about schools. I’m inclined to trust their general judgment about what’s good for schools, whether or not they do a quantitative study. They have done a quantitative study and they’ve used it to inform their judgment. They don’t know a lot of statistics, but that’s fine: statistics is not their job, they’re doing their best here.

The problem is what these researchers are thinking that statistics, and quantitative methods, can do for them in this setting. Realistically, doing things like improving the lighting in classrooms will have small effects. That doesn’t mean you shouldn’t do it, it just means that it’s unrealistic to expect large consistent effects, and it’s a mistake to estimate effects from observational correlations. (And effect size, not just direction of effect, can make a difference when it comes to setting priorities and allocating scarce resources.)
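A small simulation can make the point about observational correlations concrete. The sketch below is in Python and is purely illustrative: the variable names, the noise levels, and the idea that an unobserved "school resources" factor drives both classroom design and pupil progress are all assumptions, not anything taken from the report. The true causal effect of design on progress is set to zero, yet a naive regression of progress on design still returns a clearly positive slope.

```python
import random
import statistics

def simulate(n_classrooms=153, true_effect=0.0, seed=1):
    """Simulate classrooms where an unobserved factor drives both design and progress."""
    rng = random.Random(seed)
    design, progress = [], []
    for _ in range(n_classrooms):
        resources = rng.gauss(0, 1)        # hypothetical unobserved confounder
        d = resources + rng.gauss(0, 1)    # better-resourced schools get nicer rooms
        p = true_effect * d + resources + rng.gauss(0, 1)
        design.append(d)
        progress.append(p)
    return design, progress

def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

design, progress = simulate()
naive_slope = ols_slope(design, progress)
print(f"true effect of design: 0.0, naive observational slope: {naive_slope:.2f}")
```

In this setup the slope is expected to land near 0.5 even though design does nothing, because the confounder accounts for half the variance in design and feeds directly into progress. Adjusting for measured covariates only helps to the extent that the relevant confounders were actually measured.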

1. AllanC says:

I would be really skeptical about how they quantify learning. I briefly skimmed the report and they mention sub-levels but I don’t have the faintest clue what that actually means. Apparently, the physical environment accounts for 1.3 sub-levels improvement of the 2 sub-levels an average student is expected to complete/progress. I’d imagine the only sensible way to define an expected increase in “learning” would be to have a standardized test and somehow set an expectation about how kids should perform at the end versus the start of the school year; or use the average difference as a benchmark. But then suggesting classroom differences account for a 67% level of change relative to the benchmark would require one very complex model with a bunch of assumptions about a nearly infinite number of confounders; many of which would plainly be expected to dwarf a classroom effect.

They mention a model and holding factors constant, which I guess is okay (but very likely means overinterpreting regression coefficients in the usual way), but if I’m getting a 1.3-unit marginal contribution of an expected 2 units entirely from a single factor…which, a priori, I’d expect to be limited…I’d check the model…Or the data. Probably both.

Also, their quick discussion of air quality really makes me not trust anything in this report. How can you talk about air quality and recommendations for designers when you don’t even mention the relevant ASHRAE standards (or whatever equivalent they use) and compare the relevant rates by, I don’t know, testing something? I’m sure they have engineers with access to ASHRAE Standard 62.1 across the pond….

They also say “As far as we are aware, this is the first time that clear evidence of the effect on users of the overall design of the physical learning space has been isolated in real life situations.” I am not an expert in this area, but there is an entire field dedicated to Environmental Psychology. I would be shocked if someone had not done an observational study and done some sort of regression on design characteristics; even if there was no such study on schools specifically, it would not be a hard go to relate research on the psychological effects of building design in general to schools. At the very least, some comparison to the relevant literature should have been made and the direction/magnitude of impacts discussed on this basis. The book Environmental Psychology for Design by Dak Kopec might have been a good start for these researchers….

I am far less charitable towards these researchers than Andrew. For some reason they didn’t think it would be important to discuss any relevant design standards before issuing recommendations on design. Add that to the obvious statistical issues….To borrow how Andrew thinks about their graphs….I think the authors missed an opportunity to put funds to good use with this report/study.

• AllanC says:

On second reading that was far too harsh a comment. I was simply rather struck on my limited reading that they were offering advice on design without mentioning relevant design standards or discussions with HVAC engineers.

They do provide some relevant literature on the individual factors they looked at. Which is a positive.

This study probably tells us something. It just doesn’t tell us what they claim it does.

2. Lighting could be a huge effect. It’s hard to read in a pitch black room. And air quality could be huge, really hard to learn if plumes of smoke are wafting through the room… But how many classrooms are pitch black smoke filled rooms?

Another opportunity lost to do a good job of measurement and scaling. If they just got the measurement and scaling right it would’ve been a big step forward.

3. Jonathan says:

When you’re looking for externalities that can leverage internal processes, you rarely find a large effect. I was talking about this with my kid who has worked in cancer/smoking research: that’s the rare occasion when you find a ‘holy cow that’s really bad for you / they’re selling death’. Drunk driving is another. Even toxic conditions like poison in the ground water tend to cause a few extra cases – and then we argue about the natural versus unnatural occurrence of clusters – so causation is often questionable and estimated. I mention cancer because we look for externalities that cause internalities. We actually have better luck in general finding large effects from specific internalities, notably genetic mutations that have rates of illness and/or fatality.

I was also discussing classrooms with my wife, who is a teacher. She has a friend whose classroom has no windows because the school is so over-crowded (and we aren’t well run enough to find a way to build more space). A basement classroom. Except this is one of the best school districts in the US, so one could look at the kids who go through that school, compare them to kids in better lit classrooms in Boston, and conclude they should board up the windows. In other words, the externality has little relation to the internality when it comes to education because the pressure to learn comes from family and peers. A famous example of this is Feynman’s story about going to Princeton where they had a lovely accelerator from MIT, where the accelerator was wonky and had to be fixed all the time, and his concluding that MIT produced better results because they had to tinker with the machines so much. Another vote for worse is better! And in my experience in private and Ivy: in my day the classrooms at Yale were freezing in the winter and much of the campus was run down, but the education was excellent because it was really bright kids who were motivated to learn even in WLH when the heat would come on full blast so those near the windows would pass out while the ones on the other side of the room were still shivering in their coats.

People have this weird idea that ‘controlling’ for a variable means they actually isolate the effects of larger context, meaning all the potentially affecting externalities, and the relationship of those externalities to the internal contexts of the people studied.

4. Zad Chow says:

Not sure what they were thinking with the graphs here. The lack of contrast makes it highly difficult to even inspect the data points.

5. Mat says:

I read something the other day about many children learning in classrooms where the lighting flickers thousands of times per minute, and that this poorly designed lighting can have serious effects on concentration and even cause headaches. So, it is possible that lighting could have a large effect.

• AllanC says:

The thing that really bothers me about this report is that they don’t appear to connect any of their measurements of building design features to the relevant building design standards.

There are local standards that govern lighting. It is typically specified as number of lumens per a given area. Depending on their jurisdiction there is also likely a specification about consistency.

There are standards that govern air quality. ASHRAE Standard 62.1 specifies an indoor air quality procedure in terms of number of contaminants. The last I checked there were 10 contaminants for which there are limits. From my brief review of the report they seemed to only record CO2 levels in “unventilated” rooms. Which is an utterly ridiculous assertion. There is no such thing as a room with no ventilation. Every building envelope in the world leaks. Even if there is no supplied air there are air interchanges. Are the doors hermetically sealed? No. There’s air exchange between the hallway and the classroom. Question is in what direction, which will depend on the pressures in the room relative to the hallway and outside. These are existing schools and unlikely to have been built to passive building standards, so they probably leak like a sieve across the building envelope. Not all buildings leak at the same rate….we can test for these things. It’s important to know when discussing air quality.

The same is true of all other building design elements. There are local codes that govern what is deemed acceptable.

The question shouldn’t be, if there is light, how much better do students perform…because without light, there is clearly going to be poor performance. I don’t need a study to tell me that. The same goes for indoor air quality. The same with sound transmission. Hard to concentrate with a band playing next to you all day.

What they needed to do was look at the local building design standards. See how the classrooms perform relative to those standards by testing the relevant criteria. If there is an effect of different designs once they meet the minimum standards then maybe we can talk about something important. Until then the estimates are nearly useless.

Or it may be that the existing design standards are overspecified for a learning environment. That’s cool too. But we need to actually relate the building designs that were captured by the study to the relevant design standards to actually learn something. They didn’t appear to do that.

6. anon_007 says:

Not sure why you say colors are better than symbols? I got curious… I don’t find symbols confusing, but I have a really hard time seeing colors. Maps are usually an uninterpretable mess for me. So I try to encourage people to use more symbols, patterns and shades of gray.