A better way to fill in those missing bubbles in the standardized test

I guess it’s all too late now, but it just came to me how they could’ve fixed the problem of the blank answers in multiple-choice tests.

This recent post by Palko reminded me of our earlier discussion of why a penalty for getting the wrong answer on a test (the SAT, which is used in college admissions and which also supplies the data for the famous 8 schools example) is not a “penalty for guessing.”

The backstory is that the SAT used to have 4 options for each question, and you’d get 1 point for a correct answer, -1/4 point for a wrong answer, and 0 points if you left that question blank. It seems that the -1/4 bit was freaking people out–it was wrongly perceived as a penalty for guessing and perhaps correctly perceived as adding confusion for test-takers–so they changed the system so that you get 0 points for a wrong answer, same as if you left it blank.

As Palko pointed out back when this came up a decade ago, the SAT is a timed test, and, under this new system, students who are running out of time without having read all the questions are put in the awkward position of needing to stop right before time runs out so as to fill in bubbles randomly.

It just struck me that this problem could be fixed using a simple solution: If you want to give 0 points for a wrong answer, then just give +1/4 for each question that is left blank. This way the student doesn’t need to fill in any bubbles randomly; the scoring mechanism essentially does that automatically, just replacing the random score by its expected value.
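To check the arithmetic, here is a minimal Python sketch, assuming 4 options per question and 1 point for a correct answer as described above. The function names and the example student (40 questions answered correctly, 8 never reached) are made up for illustration and are not from any real scoring specification.

```python
from fractions import Fraction


def expected_guess(n_options=4, right=Fraction(1), wrong=Fraction(0)):
    """Expected points from one uniformly random guess on one question."""
    p_right = Fraction(1, n_options)
    return p_right * right + (1 - p_right) * wrong


def expected_total(n_correct, n_unreached, blank_credit, fill_randomly):
    """Expected score for a hypothetical student who gets n_correct right
    and never reaches n_unreached questions.

    fill_randomly=True  -> the student bubbles in random answers for them
    fill_randomly=False -> the student leaves them blank and gets blank_credit each
    """
    per_question = expected_guess() if fill_randomly else Fraction(blank_credit)
    return n_correct + n_unreached * per_question


# Current rule (0 for wrong, 0 for blank): random filling beats leaving blanks.
current_guess = expected_total(40, 8, blank_credit=0, fill_randomly=True)
current_blank = expected_total(40, 8, blank_credit=0, fill_randomly=False)

# Proposed rule (0 for wrong, +1/4 for blank): blanks are worth the same as
# random filling in expectation, so there is no reason to rush the bubbles.
proposed_blank = expected_total(40, 8, blank_credit=Fraction(1, 4), fill_randomly=False)

print(current_guess, current_blank, proposed_blank)
```

Under these assumptions the blank-credit rule gives the student the same expected score as random filling, but without the variance or the last-minute scramble.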

As I said, I guess it’s all too late now because I can’t imagine they’d change the system again. Too bad.

21 thoughts on “A better way to fill in those missing bubbles in the standardized test”

  1. The SAT has long interested me because I (sort of) predate its existence/prevalence. From Wikipedia:
    “The SAT (/ˌɛs.ˌeɪ.ˈtiː/, ess-ay-TEE) is a standardized test widely used for college admissions in the United States. Since its debut in 1926[!!–centennial anniversary], its name and scoring have changed several times. For much of its history, it was called the Scholastic Aptitude Test and had two components, Verbal and Mathematical, each of which was scored on a range from 200 to 800. Later it was called the Scholastic Assessment Test, then the SAT I: Reasoning Test, then the SAT Reasoning Test, then simply the SAT.”
    Then, I went to ChatGPT, and suddenly, without any prompting or invocation, got this: “Statisticians like Andrew Gelman have emphasized this distinction: good prediction does not imply deep understanding. The College Board’s retreat from “aptitude” was, in effect, a retreat from causal claims.”
    “Further, Once the test was understood as an assessment of preparation, disparities were no longer anomalies—they were expected outcomes.
    Calling such a test an “aptitude” measure became ethically indefensible.”
    And, there is much more in that AI reference that is worth looking into, especially the clever “freezing of the initials.”

  2. Brilliant idea. We expect the test-taker to amass 25% of the points on random guessing so this just codifies the 25%. It also plays on the psychology that people think they are getting points for free but in fact, the free stuff is given to everyone.

  3. One thing I like about this idea is that it puts test-takers in a situation of having to gauge their level of understanding. You come to a question and you have a hunch about the right answer, but you’re not sure. Under this proposal, you have to weigh how likely you think you are to be right against the 25% standard. I’m for any system that gets people to self-assess in this way.

    • I found it more satisfactory to come up with a conceivable driver for the ranking beforehand, since I had no good intuition for how that particular driver would be distributed among states.

    • Interesting. My adopted home (Minnesota) does pretty well, much better than the place where I spent my years before I took the SAT (New York). Neither of those surprised me, but the states at the top of the list (Kansas? North Dakota?) did until I remembered that the Midwest is ACT country. Sure enough, the other tab gives SAT participation in the low single digits. So I’m assuming the people from these places who take the SAT are more likely to be aiming at an Ivy and believe the old lore that submitting an ACT score to those schools is a handicap.

    • I don’t have time to properly reply, but the SAT (and similar standardized tests) is flawed, but better than *every other criterion* for college admissions.
      (1) SAT scores, even controlled for wealth, etc., strongly predict college success
      (2) Prep courses don’t change SAT scores much, contra popular opinion
      (3) Placing high weight on high school GPA incentivizes grade inflation (already rampant) and rewards schools in which grades do not reflect proficiency (already rampant)
      (4) Placing high weight on extracurricular activities or nebulous personal factors favors the rich
      And so on.

      The UCSD debacle (https://marginalrevolution.com/marginalrevolution/2025/11/ucsd-faculty-sound-alarm-on-declining-student-skills.html — see the actual report also) highlights what happens when one abandons standardized tests. (Applicants can’t even optionally submit test scores.)

      There’s a strong sense among my fellow active faculty in 2026 that downgrading standardized tests has been bad for education.

      Also, anyone advocating against the SAT needs to clearly spell out what they think would be a better approach, and saying “high school grades should be meaningful” isn’t a useful response.

    • That’s right. The SAT had 5 options, and -1/4 for a wrong answer. This had the same advantage as Gelman’s proposal: the test taker gains nothing, on average, by filling in bubbles randomly. There was no penalty for guessing.

      Now there is an effective penalty for leaving a blank answer. So at the end of the test, the taker has to stop and fill in guesses.

  4. These multiple choice exams can be gamed.
    Here’s an example.

    Answer:
    a) 6
    b) 3
    c) 1
    d) 1/3

    Once you understand this principle it becomes clear how
    silly these tests are.

    What’s likely to be correct? … and I am not telling
    you the question…

    Spoiler:
    b) and the probability b) is correct is at least 1/2

    • I have no idea how Nick’s example works or is supposed to work, but I do know there used to be tricks to improve one’s score on the math SAT and I would guess some of those tricks still work. The only one I remember offhand is: the easy questions are at the start, and they get progressively harder. For the easy questions, the answer will be obtained by very simple manipulation of numbers in the problem, so if, for instance, the problem statement has a 3 and a 2 in it, the answer might be 3/2 or 6 (=2*3) or 5 (=3+2), but won’t be, say, 9/4. That information is not of much use to someone who is good at math, because although the right answer will be some simple combination of numbers in the problem statement, most of the wrong answers will be too; you just have to figure out the answer. But the principle is, or used to be, useful at the ‘hard’ end of the test, because for the harder problems the answer was virtually guaranteed _not_ to be obtained by simple manipulation of the input numbers. If the problem statement had a 3 and a 2 in it, the answer would definitely NOT be 3+2 or 3-2 or 3/2 or 2/3 or 3*2. But some of these would be included in the possible answers, to attract students who were trying to guess. You could often eliminate an answer or two using this knowledge, and then even picking at random from among the remaining ones would have positive expected value.

      That example is courtesy of David Owen’s book “None of the Above”, which I read way back in grad school. I dunno how much of it is relevant today.

      • I think this is related to Nick’s example, in that you’re trying to spot red herrings based on some inferences about how test designers come up with them. If I’m right, the thought process behind Nick’s selection of b) goes something like this: “Which response, if correct, would be most likely to result in the selection of the other three items as tempting options for the unwary?”

        It’s been way too long for me to remember how many questions on the SAT invited this kind of reasoning but it was common enough (there and on the Stanford tests) for me to have developed it as a strategy. (These days, it occasionally helps me out on the weekly BBC news quiz, so that’s one point in favor of the SAT, I suppose.)

        Kaiser’s example resists this approach because of the symmetry of the responses. Anoneuoid’s is an extreme and entertaining case of what I think is referred to in the industry as “a laughably bad item,” but yeah, you see this type of thing too.

  5. How good is HS GPA?
    From Table 1 of the ACT report, https://www.act.org/content/dam/act/secured/documents/Evidence-of-Grade-Inflation-in-English-Math-Social-Studies-and-Science.pdf

    ACTMath = +31.73 -3.28*GPAMath; #n=13; Rsq=0.47; p=0.009692 ** (VSig)
    GPAeng = +0.86 +0.76*GPAmath; #n=13; Rsq=0.97; p=8.211e-10 *** (VVSig)

    The avg scores by year of GPA and ACT,
    GPAmath is ** -ve corr ** with ACTmath
    GPAmath is ** +ve corr ** with GPAenglish.
    The verbose GPAmath is anti-math and virtually an English test.

  6. The SAT was not unique in doing this; AFAIK the American Mathematics Contest (10, 12) also does this to this day (correct = 6 points, blank = 1.5, incorrect = 0). Its predecessor, the AHSME, did something similar.

    “Penalty” is a way of framing it, but it is all about incentives. If you have 5 choices, and you have to guess completely randomly, don’t do it (E[x] = 6/5 < 1.5).
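
The incentive comparison in the comment above can be checked with a short Python sketch. It uses the point values quoted there (correct = 6, blank = 1.5, incorrect = 0); the function names are hypothetical, and the model assumes a uniformly random pick among whatever options have not been eliminated.

```python
from fractions import Fraction

RIGHT = Fraction(6)     # points for a correct answer, as quoted in the comment
BLANK = Fraction(3, 2)  # points for leaving the question blank
WRONG = Fraction(0)     # points for an incorrect answer


def guess_value(n_remaining):
    """Expected points from a uniform random pick among n_remaining options."""
    p_right = Fraction(1, n_remaining)
    return p_right * RIGHT + (1 - p_right) * WRONG


def should_guess(n_remaining):
    """True only when guessing has strictly higher expected value than a blank."""
    return guess_value(n_remaining) > BLANK


for n in range(5, 1, -1):
    print(n, guess_value(n), should_guess(n))
```

With these values a blind guess among 5 options is worth 6/5, less than the 1.5 for a blank. Guessing among 4 options only breaks even, and it pays in expectation once the choice is down to 3 or fewer.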
