Obama effect on the black-white test gap?

Eric Loken writes:

Last week the New York Times published an article on a possible Obama effect on test scores of black test takers. . . . The authors claim that they gave a short academic aptitude-type test to black and white test takers. When they administered the test last summer, they noted a difference between average scores for blacks and whites. However, after (now) President Obama had received his party’s nomination and given his acceptance speech, the difference in scores disappeared. The theory is that Obama’s rise has had a positive motivating influence on test-taking performance.

Eric then gives some background:

The story has legs because there is a well-documented body of research on test performance, and how it can be affected by contextual cues. You can start with the cultural beliefs about aptitude tests in general. If there is a belief among one target group that the tests always show underperformance, then that belief can have a self-fulfilling aspect. Researchers have experimentally manipulated that contextual cue by describing tests differently to participants before they take them. Researchers have also manipulated the race and gender of the test administrator and done a variety of clever tricks to see to what extent performance can be affected by context. One enterprising team actually had women of Asian heritage take a math test, randomly dividing them into one group who answered a questionnaire designed to get them to think of their female identity, while the other group answered questions about their Asian identity. Guess what? One group underperformed relative to the other, and because the study was conducted as a randomized experiment, the authors are allowed to infer that their contextual manipulation caused the differences in performance.

But then he raises some questions:

That said, there are a couple of warning flags about the study. First, it is unclear from the Times piece whether there was any reference at all to Obama before the participants took the test. If not, then the story must be that if there was a difference in performance over time, it was because Obama was “in the air”. That’s true enough – he certainly was in the air. The country was electrified. But most studies on test-taking performance try to make the contextual cue more closely connected to the test-taking event. Lots of things happened from last summer to now…millions of jobs were lost, the stock market tanked, Tom Brady was injured, and the seasons changed.

But the more worrisome concern is the quality of the data. Based on the Times article, it seems like there were four tests, and at each occasion there were maybe 20 black participants. Furthermore, the age range of the participants was around 50 years. I don’t want to make your eyes swirl with statistical mumbo jumbo…but let me throw out these two points. The degree of sampling variability from occasion to occasion would be huge. Would you trust the results of an opinion poll that gathered a group of 20 participants? So why trust the results of a test taken by 20 people? It’s all the more problematic that the researchers are trying to prove a lack of difference. With such a small sample size, and such wide variability of participants in age and occupation, it becomes very difficult to prove that a difference exists. But as I have to remind my PhD students every day – failing to prove that there is a difference is not the same thing as proving that there is no difference. Their eyes swirl at me too.
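To put rough numbers on Eric’s sampling-variability point, here is a minimal simulation sketch. Everything in it is an assumption for illustration (scores standardized within group, a true gap of 0.5 sd, 20 test takers per group); none of these figures come from the study itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed numbers, not from the study: scores standardized to sd 1
# within each group, a true gap of 0.5 sd, 20 test takers per group.
n, true_gap, n_sims = 20, 0.5, 10_000

white = rng.normal(0.0, 1.0, (n_sims, n))
black = rng.normal(-true_gap, 1.0, (n_sims, n))
observed = white.mean(axis=1) - black.mean(axis=1)

# Analytically, the standard error of a difference in means is
# sqrt(1/n + 1/n), about 0.32 sd here: huge relative to a 0.5 sd gap.
lo, hi = np.percentile(observed, [2.5, 97.5])
print(f"sd of the observed gap across samples: {observed.std():.2f}")
print(f"95% of samples show an observed gap between {lo:.2f} and {hi:.2f} sd")
```

With spread this wide, a gap that shows up in one small sample and vanishes in the next is unremarkable, which is exactly the point about trusting a poll of 20 people.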

Come to think of it, it makes you wonder why everyone is looking at the data in this particular way. The story is that on the one testing occasion before Obama’s meteoric rise, there was a black-white difference, and then it disappeared over the next three testing occasions. The implicit reasoning is that something has happened. But why privilege the summer result so much? Why not ask “What was happening last summer that made a black-white difference show up?” Why assume that that result is somehow “true” and that it has recently “disappeared”?
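A related sketch makes the “why privilege one occasion?” point concrete: even if the true gap were constant across all four occasions, samples this small would routinely produce a pattern where the gap tests significant at some occasions and not at others. The sample sizes and effect size below are guesses based on the description in the post, not figures from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed setup: a CONSTANT 0.5 sd gap at all four occasions,
# 20 people per group per occasion.
n, gap, occasions, n_sims = 20, 0.5, 4, 10_000

mixed = 0
for _ in range(n_sims):
    significant = []
    for _ in range(occasions):
        white = rng.normal(0.0, 1.0, n)
        black = rng.normal(-gap, 1.0, n)
        significant.append(stats.ttest_ind(white, black).pvalue < 0.05)
    # "Mixed" pattern: the gap is significant at some occasions but
    # not others, even though the true gap never changed.
    mixed += 0 < sum(significant) < occasions

print(f"simulations with a mixed significance pattern: {mixed / n_sims:.0%}")
```

Under these assumptions, roughly four out of five simulated studies show the gap “appearing” and “disappearing” across occasions purely by chance.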

And, looking forward:

At any rate, more data is already in hand. There have been several administrations of the SAT during the run-up to the election, and even one since President Obama’s inauguration. Let’s take a look at the national trend based on millions of scores. I’d be very happy if there is something to write about. I personally expect that there will be something to write about over time, but I also believe that the evidence is going to take some time to develop. Let’s hope the New York Times is still paying attention then, and not just trying to front-run another study that has barely been mailed out for review.

3 thoughts on “Obama effect on the black-white test gap?”

  1. If their goal was to investigate some kind of differential Obama "treatment effect," then the study was poorly designed. Why did they have such an imbalance in the number of blacks and whites? How were the samples selected? With these small sample sizes, the fluctuations are very high. I think this study is virtually worthless.

    I would also like to know the g-loading of the test they used. We have about 8 decades of data on the black-white difference in intelligence-test scores. With low g-loading, the difference between the groups virtually disappears. But tests with low g-loading have little power to predict academic achievement. Thus far no one has come up with a test that both has predictive power and shows no racial difference.

    I suspect this result will not replicate, but that fact will get little attention in the press.

  2. Mr. Loken should be careful what he reads into national trends in SAT results, too. If a black president inspired more black students to apply to college–a "positive motivating influence"–average scores for black students would probably DROP (because the best black students would have taken the test anyway). The testing population is self-selected and may vary widely from test to test: fall testers are mostly seniors, spring testers mostly juniors; tightening family budgets may compel more students to apply to their state schools rather than more prestigious and distant schools, which may make them choose to take the ACT instead of the SAT; the same pressures may prevent some students from paying to retake the test for a higher score. He might be better off looking at the results of this spring's mandatory state administrations of the ACT. For the last few years, that test has been given to all public school juniors in Illinois, Colorado, and Michigan (and this will be the second year for Kentucky and Wyoming). That would offer a better sample, but would still not be entirely free of similar pitfalls: an increase in the dropout rate would improve scores, since the weakest students would leave school before the mandatory junior-year test.
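The self-selection point in the second comment can be made concrete with made-up numbers. This is a minimal sketch assuming, crudely, that test takers are simply the top slice of a normal ability distribution; the distribution and participation shares are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pool of potential test takers with scores ~ N(500, 100).
# Crude assumption: students who actually sit the test are the top
# slice of this pool, so widening participation pulls in lower scorers.
pool = np.sort(rng.normal(500, 100, 1_000_000))

def mean_if_top_share_tests(share):
    """Average observed score when only the top `share` of the pool tests."""
    return pool[int(len(pool) * (1 - share)):].mean()

# Expanding participation lowers the observed average even though
# no one's underlying ability changed.
print(f"top 30% take the test: mean = {mean_if_top_share_tests(0.30):.0f}")
print(f"top 40% take the test: mean = {mean_if_top_share_tests(0.40):.0f}")
```

In this toy setup, a ten-percentage-point rise in participation knocks roughly 20 points off the observed average, the kind of composition artifact that could mask or mimic a motivational effect in national score trends.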
