Quarterbacks and psychometrics

Eric Loken writes,

Criteria Corp is a company doing employee testing (basically psychometrics meets on-demand assessment). We’re also going to blog on various issues relating to psychometrics and the analysis of testing data. We’re starting slowly on the blog front, but a few days ago we did a post on employment tests for the NFL. A few scholars have argued that the NFL’s use of the Wonderlic (a cognitive aptitude measure) is silly, since it shows no connection to performance. But we showed that for quarterbacks, once you condition on some minimal amount of play, the correlation between aptitude and performance was as high as r = .5, which is quite strong. It’s the common case of regression gone bad: people don’t recognize that the predictor has a complex relationship to the outcome. There are many reasons why a quarterback doesn’t play much, so at the low end of the outcome the prediction is poor and the outcomes are widely dispersed. But there are fewer reasons for success, and if the predictor is one of them, it will show a stronger association at the high end.

Here’s their blog, and here’s Eric’s football graph:


P.S. The graph would look better with the following simple fixes:
1. Have the y-axis start at 0. “-2000.00” passing yards is just silly.
2. Label the y-axis 0, 5000, 10000. “10000.00” is just silly. Who counts hundredths of yards?
3. Label the x-axis at 10, 20, 30, 40. Again, who cares about “10.00”?
I’ve complained about R defaults, but the defaults on whatever program created the above plot are much worse! (I do like the color scheme, though. Setting the graph off in gray is a nice touch.)

6 thoughts on “Quarterbacks and psychometrics”

  1. Conditioning is arbitrary. If we move the bottom cutoff up to 2200 and eliminate one point in the upper right (Tom Brady?), then we have an amorphous cloud with a correlation of maybe .15.

    Plus, we've already picked just one position (quarterback) out of many roles on a football team.

    More statistical ink blot testing.

  2. To green apron monkey: pretty much any test that measures any human characteristic well will show different quantitative and qualitative results for different racial groups (though there is often considerable overlap). Different racial groups test differently because they're, well, different.

  3. "Ink blot testing" is a tad dismissive. The high point in the corner is not Tom Brady; it is Eli Manning. He's the Super Bowl-winning quarterback whom Malcolm Gladwell mocks for performing poorly relative to his Wonderlic score. It's interesting that one pundit singles Manning out as an example of how useless the test is, while another person claims that Manning is the only thing driving the effect.

    Yes, the sample is small, and yes, the conditioning is somewhat arbitrary. But set the line at 1800 and cut out Manning (why?), and the correlation is still .5. Or condition on 5+ starts in the first 4 years, instead of a cutoff read off this graph, and r = .5.

    The main point is that we seem to have a heterogeneous regression, where there are many reasons for low performance (both poor performance and non-performance). More data would be great to have. We didn't use any pre-2000 data because the Wonderlic scores seemed incomplete and sometimes inconsistent (more than one recorded score). The NFL would be able to fill in more data.

    Given the graph and what I've been able to see working through the data, I'd lay odds that the phenomenon remains robust as new data roll in. Now what exactly it means is another thing entirely.

  4. Thanks. Very interesting.

    Dan Marino's low score would strike me as support for the value of the Wonderlic. He comes into the NFL with the best release ever seen. In his second season he puts up the best stats in history. And then … kind of a long, slow fade as the league figures him out, but he doesn't figure out something new.
