How schools that obsess about standardized tests ruin them as measures of success

Posted on August 16, 2016 4:22 PM by Andrew

Mark Palko and I wrote this article comparing the Success Academy chain of charter schools to Soviet-era factories:

According to the tests that New York uses to evaluate schools, Success Academies ranks at the top of the state — the top 0.3 percent in math and the top 1.5 percent in English, according to the founder of the Success Academies, Eva Moskowitz. That rivals or exceeds the performance of public schools in districts where homes sell for millions of dollars.

But it took three years before any Success Academy students were accepted into New York City’s elite high school network — and not for lack of trying. After two years of zero-percent acceptance rates, the figure rose to 11 percent this year, still considerably short of the 19 percent citywide average.

News coverage of those figures emphasized that that acceptance rate was still higher than the average for students of color (the population Success Academy mostly serves). But from a statistical standpoint, we would expect extremely high scores on the state exam to go along with extremely high scores on the high school application exams. It’s not clear why race should be a factor when interpreting one and not the other.

The explanation for the discrepancy would appear to be that in high school admissions, everybody is trying hard, so the motivational tricks and obsessive focus on tests at Success Academy schools has less of an effect. Routine standardized tests are, by contrast, high stakes for schools but low stakes for students. Unless prodded by teachers and anxious administrators, the typical student may be indifferent about his or her performance. . . .

We summarize:

In general, competition is good, as are market forces and data-based incentives, but they aren’t magic. They require careful thought and oversight to prevent gaming and what statisticians call model decay. . . .

What went wrong with Success Academy is, paradoxically, what also seems to have gone right. Success Academy schools have excelled at selecting out students who will perform poorly on state tests and then preparing their remaining students to test well. But their students do not do so well on tests that matter to the students themselves.

Like those Soviet factories, Success Academy and other charter schools have been under pressure to perform on a particular measure, and are reminding us once again what Donald Campbell told us 40 years ago: Tampering with the speedometer won’t make the car go faster.

91 thoughts on “How schools that obsess about standardized tests ruin them as measures of success”

Corey on August 16, 2016 4:35 PM at 4:35 pm said:

I think Charles Goodhart has priority — he said it 41 years ago.

Reply ↓
- Andrew on August 16, 2016 4:44 PM at 4:44 pm said:
  
  Corey:
  
  According to your link, Goodhart and Campbell said it at about the same time.
  
  Reply ↓
  - Corey on August 16, 2016 5:01 PM at 5:01 pm said:
    
    Did you ever notice how men always leave the toilet seat up?
    
    Reply ↓
  - David Manheim on August 16, 2016 7:41 PM at 7:41 pm said:
    
    Actually, the priority probably goes to Lucas. From an early draft of my recent piece on Goodhart’s Law, (ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/) which discusses the points in this post extensively, especially as regards education, I looked into the provenance of the claims;
    
    Goodhart originally formulated his law to apply to the use of aggregate macroeconomic relationships; “any observed statistical reguÌarity will tend to collapse once pressure is placed upon it for control purposes.” As noted by K. Alec Chrystal and Paul D. Mizen (http://cyberlibris.typepad.com/blog/files/Goodharts_Law.pdf), “Robert Lucas almost certainly said it first,” in 1973 or earlier, though both really followed in the footsteps of Haavelmo (https://www.jstor.org/stable/1906935), who points out that economists have been missing the boat about how to use statistics in general.
    
    Reply ↓
    - Andrew on August 16, 2016 11:24 PM at 11:24 pm said:
      
      David:
      
      Thanks for the links. The basic idea has got to be very old. Campbell etc. get credit for formulating the rule and applying it to important examples.
Tom on August 16, 2016 5:49 PM at 5:49 pm said:

There appears to be an opaque matching process in the background. Are you sure it is the test results rather than the matching process? The rate of change is informative, 0 to 11 percent, in addition to the level. What changed that all of a sudden increased admissions?

http://citylimits.org/2016/02/08/the-problem-with-nyc-high-school-admissions-its-not-just-the-test/

In addition, is there a composition effect? You didn’t report the admission rates from those multimillion dollar zip codes. Perhaps they do very well on average but the brightest students are spread out over many schools in NYC. A comparison or dispersion measure, along with the state test result, would be more informative than the average admission.

I believe the larger point but the evidence you presented doesn’t seem sufficient.

Reply ↓
Peter on August 16, 2016 7:18 PM at 7:18 pm said:

New NBER Working Paper:
Will S. Dobbie, Roland G. Fryer, Jr: Charter Schools and Labor Market Outcomes. NBER Working Paper No. 22502
August 2016
Abstract:
We estimate the impact of charter schools on early-life labor market outcomes using administrative data from Texas. We find that, at the mean, charter schools have no impact on test scores and a negative impact on earnings. No Excuses charter schools increase test scores and four-year college enrollment, but have a small and statistically insignificant impact on earnings, while other types of charter schools decrease test scores, four-year college enrollment, and earnings. Moving to school-level estimates, we find that charter schools that decrease test scores also tend to decrease earnings, while charter schools that increase test scores have no discernible impact on earnings. In contrast, high school graduation effects are predictive of earnings effects throughout the distribution of school quality. The paper concludes with a speculative discussion of what might explain our set of facts.

Reply ↓
Fernando on August 16, 2016 9:39 PM at 9:39 pm said:

tl;dr

But it seems to me that on the basis of a few anecdotes you are casting a doubt on ALL charter schools. It seems you are going well beyond the evidence.

Some RCTs show charters do well with minorities assigned by lottery.

And what is the alternative? Back to DC schools of the 80s?

Reply ↓
- Andrew on August 16, 2016 10:25 PM at 10:25 pm said:
  
  Fernando:
  
  If “casting a doubt” means doubting various hyped claims, then, yes, we cast a doubt. If “casting a doubt” means saying charter schools are in general worse than public schools, then, no, are not casting a doubt. One of our purposes in writing this article was to move the discussion beyond simplistic “charter schools good” / “charter schools bad” attitudes.
  
  But I guess this is a risk when writing about a charged topic such as charter schools, that people are going to just look at a paragraph or two and think they already know what we’re going to say.
  
  Reply ↓
  - Rahul on August 17, 2016 3:49 AM at 3:49 am said:
    
    Some of your objections sound bizarre: e.g. How can you critique charter schools for their desire to “improve the conditions under which the test is taken”. Is this really so bad?
    
    Is it really a horrible thing for a school to try to make sure that kids are well-rested and motivated to answer all the questions on test day?
    
    Wouldn’t it be more productive if you exhorted *other* schools to try & make sure their students were well-rested & motivated?
    
    Alternatively, if you really feel this “resting” is unfair perhaps lobby the PARCC to require all kids to run 20 laps before they test or something?
    
    Reply ↓
    - Andrew on August 17, 2016 9:37 AM at 9:37 am said:
      
      Rahul:
      
      1. There’s nothing wrong with a school putting in extra effort to get students to come in on test day, and it’s even better, I think, for the school to put that extra effort to get students to come in on days where there is no test, days when they are learning in the classroom.
      
      2. There are several concerns here. One concern is that the administration of these schools is so focused on test performance that they encourage teachers to use tactics to intimidate kids who aren’t scoring well on tests.
      
      3. The other problem is when these schools report success based on their students’ performance on high-stakes, low-stakes tests. As we discussed in our article, this is a misleading measure.
    - Fernando on August 17, 2016 9:53 AM at 9:53 am said:
      
      I think you are making a point about the appropriateness of surrogate outcomes, and managing to those outcomes.
    - Rahul on August 17, 2016 3:22 PM at 3:22 pm said:
      
      Andrew:
      
      Re. your #1: That’s not an apples-to-apples comparison at all. The test is just one day. No-test is all other days.
      
      Just because it is possible for teachers to motivate kids or get them well rested on test-day not at all means that teachers can, with same or similar effort, replicate this on every day.
      
      Basically you are demanding an outcome that needs an order of magnitude more resource. But that’s no reason to criticize teachers for doing at least what they do (on test day). Apparently other school teachers cannot even manage that.
- mpledger on August 19, 2016 11:06 PM at 11:06 pm said:
  
  The lottery RCT don’t compare apples with apples – 1) it compares kids who get into the school they wanted to with kids who didn’t get into the school they wanted too and 2) the kid’s peers in the lottery school are different to the kid’s peers in the non-lottery school especially with regards to parental motivation and agency.
  
  Reply ↓
A.G.McDowell on August 17, 2016 1:03 AM at 1:03 am said:

Are there not some statistical ideas that could be applied to improve the metrics that charter schools are accused of gaming? Things like intent-to-treat analysis or the use of sampling to reduce the administrative costs of collecting more detailed metrics? Perhaps the article could have covered these.

I think it is dangerous to have a system that rewards the formation or maintenance of a magic circle of co-operating individuals claiming at the same time that ever-increasing proportions of GDP should be given to education and that ideas or initiatives about education from outside this magic circle are invalid. If charter schools are not a good way of exposing any present or future magic circle to competition, is there a better one?

Reply ↓
Rahul on August 17, 2016 3:53 AM at 3:53 am said:

In my opinion, whenever you find students gaming the test, the right approach is to write better tests!

Than to criticize the teachers for “teaching to the test”, whatever that means. Barring unfair means, it ought to be entirely legitimate to “teach to the test”.

It’s a strange dichotomy to tell teachers that we are going to judge you & your students on the basis of test scores and yet it is somehow pejorative to “teach to the test”.

If your tests are so easily gamed, change the damn tests!

Reply ↓
- SerPiggy on August 17, 2016 8:39 AM at 8:39 am said:
  
  There is often a twofold purpose to tests:
  -Getting an idea of how much a student knows. What fraction of relevant facts or ideas are they familiar with?
  -How well does what they learn generalize? How well can they see and pick up on patterns, to solve related questions – but of a sort they haven’t necessarily seen before?
  
  Teaching to the test undermines both points. In the former case, teaching to the test narrows down the domain of facts and ideas one learns to closely align with those of the test. This limits the test’s inference to merely that restricted domain. Ie, your estimated proportion (say, p-hat) becomes an estimate of theta — knowledge about the test — instead of the desired p — general knowledge in the field the test is purporting to measure.
  
  On the second point, if you’ve already seen problems like those you’re given, the ability to generalize and solve novel ones is no longer tested at all.
  
  However, I confess that I do not have any actual pedagogical experience, so take my words salted.
  
  Reply ↓
- HT on August 17, 2016 9:35 AM at 9:35 am said:
  
  I know there is a lot of anger in the country, but all the angry criticisms of Andrew’s piece are missing the point. What he’s saying is:
  1. Something known as standardized tests exist
  2. Students in the Success Academies achieve high scores in these tests
  3. Their high scores did not translate too well to latter success, specifically admissions into the “elite” high school network
  
  So yes, one interpretation is that these tests are not good enough, not because they are easy to game, but because they don’t predict latter success, which is exactly what Andrew is saying.
  
  The other way of looking at it is the Success Academies are focusing too much on the near term (tests) and not enough on the longer term (high school admissions). Arguably, one can extend this ad infinitum to Ivy league admissions, lifetime income, social status, etc. but at the very least Andrew seems to be on the right track.
  
  And the issue of fairness wasn’t mentioned, or IMHO even implied, in Andrew’s piece. Being temporarily well-rested and motivated does not increase academic abilities but can improve students’ test performance, so yes more bias is introduced — against the intended aims of the test. And Prof. Gelman, the statistician, is showing animosity towards bias. Duh, what do you expect?
  
  Instead of defending these practices on grounds of fairness, you can try framing it as positive lessons for the students. For example, some challenges in life, like the Olympics or the presidential elections, are like tests, so students should appreciate that these last-minute preparations also matter, sometimes as much as your daily sweat.
  
  Personally, I agree with much of what Success Academies are preaching, but that doesn’t mean that Andrew is wrong. It just means they can still do better.
  
  Reply ↓
  - Rahul on August 17, 2016 10:54 AM at 10:54 am said:
    
    That makes sense. Even if the tests don’t predict latter success, I think criticizing the tests is a better target for out indignation than criticizing the Charter Schools (which seemed the main focus of the article) because their students do well on these tests.
    
    Indeed, you (or Andrew) may be right and the Academies may be producing a unique cohort that does specifically well on the tests but bad in long-term goals. But I just don’t any evidence of this either way. ( Personally, I think the baseline of many students is so low that even producing a breed of students that does well merely on tests is still a step upwards! )
    
    Personally, I’m agnostic to the Success Academies, but I just didn’t like the argument being made in the article entirely.
    
    Reply ↓
    - Andrew on August 17, 2016 11:13 AM at 11:13 am said:
      
      Rahul:
      
      If all that happened was that Success Academy kids did well on these tests, we could say that it’s no big deal: they’re doing well on a poor measure, we can ignore the measure and look at something else. But it’s not just that. Success Academy got in the news because of they seem to have a policy of intimidating kids: this is a scenario in which they’re responding to incentives in a bad way. Thus, there is a problem with the incentives and also a problem with the schools that are encouraging bad behavior by teachers.
    - Rahul on August 17, 2016 3:08 PM at 3:08 pm said:
      
      Andrew:
      
      I disagree. Teachers behave badly for all sorts of reasons. Even if you changed the tests, or even eliminated standardized testing all-together, a driven teacher in a competitive school might still align herself to (say) the yardstick of Ivy League college admissions or SAT scores or whatever.
      
      The key ingredient here is competitive drive. So long as *some* goal exists there’s always the risk of driven individuals overstepping their bounds. The solution is oversight, checks & balances not the elimination of standardized tests.
      
      In fact the gaming of standardized tests seem a red herring in the intimidation story: Say you had the perfect test that aligned very well with long term success & was immune to gaming. Would that then legitimize intimidation, this time to indeed achieve lasting long term gains?!
      
      Of course it’s horrible to intimidate kids. You should indeed call out Success Academy for that.
      
      But standardized tests are not the key cause here. That’s like banning athletics because some coaches might have training regimens too harsh or inhumane.
    - parent010203 on August 17, 2016 4:31 PM at 4:31 pm said:
      
      If you establish a charter school whose funding and donations and the political profile of their CEO depends on having 90%+ passing rates on tests AND
      
      If you reward the teachers and administrators who are most successful at developing “got to go” lists (but knowing better than to put them in an e-mail), not sending home renewal forms with low-scoring kids (as one celebrated and still working principal does) and who have the highest PERCENTAGE of high scoring students (without regard to how many of them left) AND
      
      If you insist you must be allowed to give extraordinarily high numbers of Kindergarten and first graders out of school suspensions – as many as necessary – because they are violent and your hand is forced in order to “protect” the other kids from their violence AND
      
      Your extremely well-funded charter school has an attrition rate that seems to be 3 or 4 times HIGHER than the far lower-performing charters who serve even more disadvantaged students
      
      Then the teachers are not “behaving badly” as can be found in any random school, but instead they are behaving exactly as the incentives of the charter school force them to behave. They are behaving in an “excellent” way in terms of keeping their jobs and being promoted to better paying ones. The ones who “behave badly” from the charter school’s POV are the ones who speak out about these things and are gone. That’s why the Success Academy teacher videotaped the way a “model” teacher – whose tactics were rewarded by the school – treated a low-performing low-income child in a school where most of the students were middle class. The teacher with the camera had complained about this teacher’s tactics to administrators and the response was that this teacher was the model teacher and her own job was in danger for thinking what the model teacher was doing wasn’t proper.
      
      Mr. Gelman, I hope you and Mr. Palko will take a close look at all of the students who win the lottery for Kindergarten spots and find out what happens to them for the next 5 or 6 years (through 5th grade). (Forget using the Oct. – June attrition rates of an entire school that the pro-charter researchers are careful to limit themselves to examining.) Are many at-risk kids disappearing? Are a high number being held back year after year? Are some new students coming in who must pass a test before being allowed to enter their rightful grade? (One Success Academy parent told the public at a news conference exactly how that worked for his daughter so presumably this is Standard Operating Procedure.) Or are they not replaced at all? Look closely at the NYSED data website at a school like Success Academy Bed Stuy 1, where 54 of 56 5th graders (a whopping 96%!) passed the state ELA exam. But at the start of 3rd grade, that cohort had 93 students, so nearly 40% of them were gone 2 years later. And what is worse, that 3rd grade cohort had 57 low-income students, and that number had shrunk to 29 by 5th grade. To define “success” by a system in which almost half of the low-income students who begin third grade are missing from the 5th grade class is a very dangerous and misleading “success”. Dangerous because if that is “success”, why shouldn’t every charter school use the same methods without regard to the many students being left behind — the very students that charters were supposed to be helping.
      
      Standardized testing has its uses, but these days it is being mis-used by people who don’t really care about teaching kids but do care about promoting charters. Passing rates are nonsense if charter schools are given carte blanche to rid themselves of low-performers and return them to the public schools for testing. The fact that not all charters take advantage of this doesn’t make it okay. Because the ones who do take advantage of this are being held up as the models for all, in a warped view of what is “possible” despite all evidence that it is only possible if you treat far too many kids like bad apples who must be discarded in order to offer the “best” product to your customers.
- Ewout ter Haar on August 17, 2016 10:45 AM at 10:45 am said:
  
  What you are advocating for is called “formative assessment”. The problem is that formative assessment is hard to do with standardized tests, which are optimized for other assessment goals (classification, for example).
  
  Reply ↓
- Seth M. Spain on August 17, 2016 12:28 PM at 12:28 pm said:
  
  I once had a thing or two to say about this.
  
  https://sethspain.wordpress.com/2012/04/26/teaching-testing-student-achievement-and-teacher-evaluation/
  
  Reply ↓
  - Curious on August 17, 2016 1:19 PM at 1:19 pm said:
    
    Seth:
    
    How would one separate the political decision to advocate for a particular test construction and analytic methodology which provides support for their political position and against a particular test construction and analytic methodology which provides support against their political position?
    
    Reply ↓
    - Seth M. Spain on August 17, 2016 2:00 PM at 2:00 pm said:
      
      That’s a tough question, if I understand you correctly. The way I see it, everything comes down to specifying the learning objectives, and there’s no one right way to do that. It could even be an explicitly political process (e.g., key constituents vote on them or engage in some sort of parliamentary process). Democratic processes can be very useful for this sort of epistemic work.
      
      On the other hand, one could try to be as empirical as possible. This could be accomplished by blinding the process, perhaps? That is, the commissioning body turns over the test construction to an independent body that is answerable to some third party (acting as independent evaluators).
      
      Validation almost has to be tied to some “independent” metric, like the high school admissions in Andrew’s example. If test scores don’t predict that well, it’s a sign that something has gone wrong (which could be that the knowledge/skills/abilities the test assesses are not very related to the outcome, that there is some supervening process, like political gamesmanship, disrupting the expected relationship, unexpected third variables distorting the observed relationship, and so on.
      
      Essentially, I think what I’m suggesting more-or-less equates to (or at least analogizes with) “pre-registering” the methods used to design, construct, and evaluate the test!
      
      And again, it’s essentially impossible to completely avoid people cheating or “gaming” the test itself.
    - Curious on August 17, 2016 4:48 PM at 4:48 pm said:
      
      I think you point out an ideal method, but also one that is idealized by people in the field to a point of be unreasonably optimistic about the art of assessment that results in the reification of multidimensional phenomena as if they are assessing something unidimensional.
      
      1. “We will let the data speak for itself”
      2. “It is an empirical question”
      3. “The only thing that matters is whether it correlates with performance”
      
      The problem is that these statements are filled with assumptions as you rightly point out in your post. And the largest assumption, one you did not mention but of which I am sure you are aware, is that of the criterion problem.
      
      All of the contributions to measurement error you mentioned about the test also exist in the criterion space. Where they are confounded, for example if there is politization that biases the design and politization that biases the criterion, then the estimated strength of the relationship will be spurious.
      
      Simply because there are groups of people who have high skills & abilities across domains and there are people with low skills and abilities across those same domains does not mean that the domains magically become the same thing and a single domain, though that is how factor models are often interpreted in the real world by people with the capacity to know better, but a bias not to.
      
      I contend that politics and subjectivity play far greater roles in these processes than is typically acknowledged. And that methods typically used to attempt to disentangle these are biased toward confirmation of a process that is inadequate to answer the question.
    - Seth M. Spain on August 18, 2016 9:18 AM at 9:18 am said:
      
      Interestingly, my dissertation was largely about the criterion problem. What you mention is true.
      
      An exercise I always used to work through with my IO undergrads was to build a model for predicting “success in college”, but they had to define what they meant by success in college. Most said GPA, a few substituted or added something along the lines of “getting a good job”, some included things like “making lifelong friends” or “finding a spouse”. The theoretical models for predicting success are very different depending on how you define the outcome!
      
      For what it’s worth, I come from something like Humphrey’s “pragmatic behaviorist” camp (at least, as an applied psychologist, though not necessarily as a more theoretical or basic psychologist). Or to use a term from Jonas & Markon (in the January 2016 issue of Psychological Review, to which my colleagues and I responded in the same issue), a “descriptivist” approach; my take on this is that a well-fitting factor model is not evidence that a “true” (ontologically *real*) latent trait underlies the measurements, but that those measurements are *summarized* pretty well by the latent model (from this standpoint, a simply sum score or average over items would work fine, too). Then, if the aggregated score predicts an outcome of interest well, this means that you have a descriptively adequate model, and it can probably be used for predictions (e.g., personnel selection activities). This use, of course, implies *nothing* about the “true” structure of the world and doesn’t even necessarily map on to a theoretical model.
      
      I definitely agree that the entire process is *necessarily* subjective, and indeed, at least partially driven by value judgments and ethical considerations (what outcomes *should* we track? etc.), and that there is often a strong “political” component to determining those considerations. That’s not even necessarily bad! But, I think you’re right that these points about subjectivity and politicking are usually implicit and buried under the technical wizardry of test construction and validation studies. If we acknowledged them more directly, we could probably have a more open process that *might* meet the goals of these programs better.
Jay on August 17, 2016 4:23 AM at 4:23 am said:

This appears to be the same situation with the UK NHS, they continue to try to find different figures to measure the system and drive the clinicians to act in a way that suits the perspective of the month. Patients are people – each one is an on-going medical experiment and therefore these ideas simply don’t work.
Just as students are going through a change in maturity and have different rates at which they understand and have appetite towards different aspects of their world.
What students need more than anything is the ability to deal with change (slow or fast, good or bad, expected or surprise); both on their own and as part of a communicating and collaborating community.
Tests … Bah!

Reply ↓
- A.G.McDowell on August 17, 2016 4:10 PM at 4:10 pm said:
  
  The UK NHS has seen everything from gaming statistics to outright fraud. The state of the art there appears to be prior to the discovery of statistical sampling, so there is scope for a lot of good to be done by improving things.
  
  The UK government (which represents the electors and is after all paying for everything) has a utility function not identical to that of any particular patient (in particular it cares about things like herd immunity). Even the most idealistic NHS worker is likely to have priorities aligned with their personal technical interests and the part of this vast organization that they can see for themselves. It is unrealistic to assume that if the government just stood back and threw in money everything would come out right.
  
  In all cases I believe the UK government has a duty to find out the consequences of its actions. In this case I believe that it can improve outcomes by using what it finds out to apply incentives. The trick, of course, is to make all of this happen in practice, and in a political situation where the UK-wide consensus is that the allocation of finite medical resources is by need, not by market forces (which is very noble, but which also deprives the system of a mechanism which in both practice and theory does tolerably well in other situations).
  
  Reply ↓
Steve on August 17, 2016 11:44 AM at 11:44 am said:

I am very disappointed in the article. First, it repeats some of the same tired attacks that the unions and their backers have been making against Success Academy as if they were established facts. I am a parent of a child at Success Academy with special needs. If anyone was going to get creamed or pushed out, it would be my son. I have not seen any evidence of that. To the contrary, I see heroic attempts to figure out how to help him perform in a normal educational environment. I cannot speak for other parents, and it would be extraordinary if Success Academy never mishandled a single student, but anecdotes are not evidence of a policy to get rid of low performers. Second, I think that the if you look at the numbers, you will see that the difference between SA scores and the scores at Department of Ed schools simply cannot be explained by “creaming” especially if you look at the schools in the same areas that most of the SA children come from. But, I have a bigger problem with your analysis and the use of the Lucas/Campbell line of thought when it comes to educational testing. Unlike many other areas, when it comes to educational testing, we cannot have this idealized notion that the there is some variable like “mathematical knowledge” that can be measured independent of any effect of the test or test setting. In other words, you can ask did Success Academy skew their results by reminding parents to get their kids to sleep early before the test, but it would be an equally legitimate question to ask if the DoE skewed their results by not reminding parents to get their kids to bed early. We might think that an academic test, any academic test, is testing two separate aspects — the underlying knowledge and the test taking skills. Some kids will have the test taking skills and some won’t. Pretending that I can measure only the former without the latter affecting the score is nonsense. And, it is very unlikely that those skills are distributed randomly. Parents with higher levels of academic attainment are more likely to impart those skills to their children than working class/poor parents who do not have experience in navigating academic testing. If SA is providing test taking skills to its students, it seems to me it is removing a source of bias rather than contributing to it. A better approach would be to make sure all of our children have test taking skills, so that the test results reflect actual competence with the material. Your example of the high school SHSAT scores illustrates my point rather than supporting yours. Everyone with kids in New York knows that the SHSAT is a test that requires preparation. The regular high school curriculum will not cover what is in the test. One population, Chinese Americans, does overwhelmingly well on the SHSAT because in Chinatown and Queens, there is an industry of low cost high quality test prep schools for the SHSAT. If Eva is to be believed the SA students who got into the specialized high schools did so without any specific SHSAT prep, which is impressive. Again, the claim seems plausible because the test prep schools are not really in the neighborhoods where SA students come from. The problem is that kids are not lab rats. They and their parents know they are being tested, and the tests are being used to evaluate them. We use standardized test to hand out educational opportunities all the way up until graduate and professional schools. Maybe we should just hand out college and graduate school admissions randomly, but until we decide to do that, high stakes testing is a fact of life and Campbell’s law is unavoidable. The only solution is to make sure that all children, poor children included know all of the fair ways to game the system (like getting enough sleep–to use your example).

Reply ↓
- parent010203 on August 17, 2016 5:10 PM at 5:10 pm said:
  
  “If Eva is to be believed the SA students who got into the specialized high schools did so without any specific SHSAT prep, which is impressive.”
  
  And what if it turns out that some of the 50 plus students did do some prep? Eva Moskowitz would have us believe that all 50 Success Academy 8th graders walked in to the SHSAT without ever seeing a “scrambled paragraph” question, let alone getting any guidance or practice on how to approach them. If it turns out that 1 or 5 or even 10 of the 50 students did do some scrambled paragraph practice, are you willing to call out the lie? The problem with Success Academy is exactly what you just stated. There seems to be a pathological need by the network to exaggerate their performance and when caught out on their dissembling, they double down and go on the attack.
  
  No doubt if we find out that some — even most! — of the Success Academy students who took the SHSAT did do some scrambled paragraphs practice, Ms. Moskowitz’ supporters will double down and attack the people who point out that she was lying. They remind me of Donald Trump’s supporters who always attack if you try to have a reasonable discussion about why their beloved candidate would claim that President Obama wasn’t born in the US, or that Obama is “the founder of ISIS”. According to Trump and his supporters, we are all supposed to accept those as fun “exaggerations’ that are just doing what all politicians do. How dare you challenge him? He always tells the truth because he cares so much about doing what is right. Never question his honesty.
  
  No doubt this parent will go on the attack if it turns out that all 50 Success Academy kids didn’t walk into the SHSAT cold. Like Trump supporters, he can’t see the larger picture because it’s all about whether his child is being served, and his child is absolutely being well-served in the school. Just like Trump may do things that help his supporters if he gets elected. So what? That doesn’t make it okay to lie. It doesn’t make it okay to be dishonest. That doesn’t make it okay to hurt more vulnerable children. And anyone who thinks that huge cohorts of disappearing at-risk kids (which most certainly do not include this well-educated poster’s child) should be excused by dishonest claims that they are violent, can only be taught in a special school for kids with very severe learning disabilities, or just have parents who hate high performing schools and love failing ones is as bad as the Trump supporters.
  
  But even if it turns out that Eva Moskowitz was less than truthful in her claim that not one of the Success Academy students who took the SHSAT did a smidgeon of prep, her supporters, like Donald Trump’s, will continue to attack the people who dare to mention it. An honest discussion is not something they seem to value.
  
  Reply ↓
  - Steve on August 18, 2016 12:00 PM at 12:00 pm said:
    
    An honest discussion would involve you taking arguments seriously and not making personal attacks. I do not understand the hatred. These kids work hard and got good test results. They got into high-schools that virtually guarantee that they will get into elite colleges and be able to obtain a level of economic and academic success that we should want for all children. These are kids from some of the poorest neighborhoods in the City. Even if they did do some prep, I don’t understand why that is a bad thing. Like I said, virtually everyone who does well on the SHSAT does prep.
    
    Reply ↓
    - Andrew on August 18, 2016 2:43 PM at 2:43 pm said:
      
      Steve:
      
      The issue is that the Success Academy kids were doing well on high-stakes, low-stakes tests, not so well on high-stakes, high-stakes tests. One difficulty here, I think, is the political climate in which charter schools are under pressure to demonstrate that they are better than public schools.
    - Rahul on August 18, 2016 11:11 PM at 11:11 pm said:
      
      What’s a “high-stakes, low-stakes test” vs a “high-stakes, high-stakes test”?
    - Andrew on August 18, 2016 11:57 PM at 11:57 pm said:
      
      Rahul:
      
      From our linked article:
      
      The explanation for the discrepancy would appear to be that in high school admissions, everybody is trying hard, so the motivational tricks and obsessive focus on tests at Success Academy schools has less of an effect. Routine standardized tests are, by contrast, high stakes for schools but low stakes for students. Unless prodded by teachers and anxious administrators, the typical student may be indifferent about his or her performance.
      
      . . .
      
      What went wrong with Success Academy is, paradoxically, what also seems to have gone right. Success Academy schools have excelled at selecting out students who will perform poorly on state tests and then preparing their remaining students to test well. But their students do not do so well on tests that matter to the students themselves.
    - parent010203 on August 19, 2016 1:05 AM at 1:05 am said:
      
      Andrew,
      
      As a statistician you are also aware that it’s impossible to draw any meaning from a “passing rate” of any single school if only a selected number of students take the exam.
      
      Not all students in a school take the SHSAT and the “passing rate” is greatly influenced by how carefully controlled the number of students in the denominator of the fraction is. If Success Academy had limited the number of students to the top 12 students who proved on practice tests to be high scoring on the SHSAT, their 6 students passing would be a nearly 50% passing rate. If 100 of the 230 students had taken the SHSAT instead of only 54, the 6 students passing would be only a 6% passing rate. The students who sit for the SHSAT are not a randomly selected population, so limiting them to the very strongest students insures a better passing rate than encouraging every student in the school to take the test “just in case”.
      
      No doubt there is a failing public school somewhere in NYC where the one student who sat for the SHSAT got a seat. How is that “100% passing rate” meaningful? Likewise, maybe 3 sat for the exam and one passed. That means that failing school had a 33% pass rate, which is nearly 4 times higher than Success Academy’s passing rate. The more the students who take the SHSAT are limited to only the top performing ones, the higher the passing rate will be.
      
      However, Success Academy did have 6 out of its 230 students pass the SHSAT so it is reasonable for them to claim that they have AT LEAST a 2.6% passing rate. How high that rate would be if all students actually took the exam is just idle speculation.
    - Rahul on August 19, 2016 1:34 AM at 1:34 am said:
      
      Andrew:
      
      Thanks!
      
      So, isn’t the solution here to rate schools by those metrics that also matter to students. e.g. Just something like SAT or ACT scores?
      
      i.e. It isn’t that standardized tests are bad, we just need to zero in on those standardized tests that *everyone* is motivated to do good on?
      
      We could do better by using other tests, but still standardized tests, to evaluate schools (after all, ACT / SAT are standardized too & I doubt you’d say those are only “high stakes” for the schools but not students)
    - Steve Sailer on August 21, 2016 4:30 AM at 4:30 am said:
      
      An interesting phenomenon is that Texas students does extremely well on the federal NAEP test within each ethnic group, while California students do poorly. Yet, California students seem to do at least as well on the high stakes SAT and ACT tests.
      
      I also wonder about the international PISA tests — how much does student motivation matter?
    - Martha (Smith) on August 21, 2016 3:51 PM at 3:51 pm said:
      
      @Steve:
      
      Two plausible explanations for the differences in performance between Texas and California on the two tests:
      
      1. Random variation
      
      2. The tests are testing for different things.
    - parent010203 on August 18, 2016 11:12 PM at 11:12 pm said:
      
      Steve says: “Even if they did do some prep, I don’t understand why that is a bad thing.”
      
      Doing prep is not a bad thing.
      
      LYING and claiming that your students are not doing any prep in order to make exaggerated claims about the education your school offers is a bad thing.
      
      How can I make that any more clear? Lying is bad. Charter school leaders who lie are bad.
      
      It really shouldn’t be so hard for you to understand.
    - mpledger on August 19, 2016 7:35 PM at 7:35 pm said:
      
      Because Success Academy are parading their high scores on the state tests as if they are some super school that knows how to educate kids. And they leverage this to gain an unfair amount of resources usually at a cost to public schools.
      
      The thing is corobarating evidence of the quality of the SA school (the number of kids who get into selective high schools) is shown to be at odds with their state test results so it calls into question whether the kids are getting a good, well-rounded education or an education (plus non-educational gaming) where the sole goal is to do well on state test.
    - Curious on August 19, 2016 7:38 PM at 7:38 pm said:
      
      Or biased selection, a causal explanation you’ve conveniently left out.
  - Rahul on August 18, 2016 11:09 PM at 11:09 pm said:
    
    @parent010203
    
    I am confused: Why is it wrong to prep for the SHSAT?
    
    I’ve heard students prep for every other test I can think of, GRE, SAT, ACT, GMAT, you name it.
    
    So what, if the kids got guidance on how to approach a “scrambled paragraph” question? It’s like demanding college kids go for a calculus test without ever having seen a trick of how to make a trigonometric substitution.
    
    Since when has it become evil to prep for a test?! The whole argument stumps me.
    
    Reply ↓
    - parent010203 on August 18, 2016 11:20 PM at 11:20 pm said:
      
      Rahul, there is nothing wrong with prepping for the SHSAT. Nearly all students do at least some prep before taking the exam.
      
      There is something wrong with LYING. How can I make that more clear to you? Steve explained quite clearly that Eva Moskowitz said that the kids at Success Academy did no prep for the SHSAT but a few still scored high enough to get in.
      
      Even Steve is now conceding that she most likely blatantly lied. But the question is why would she have to lie? As you said, there is nothing wrong with prepping. UNLESS you are pretending that your charter school is offering a very special education that so far surpasses any public school that no child would ever need to prep for the test.
      
      Ask yourself why a charter chain that is really successful would have to lie?
      
      Do you know another top notch school that blatantly lies? Does Stuy lie and claim that every student scores a 1600 SAT? That every student gets into an Ivy League school but most turn them down? When you are a truly excellent school, you don’t pathologically lie to make your accomplishments seem much better than they are.
      
      You lie when you intend to mislead.
    - Rahul on August 18, 2016 11:29 PM at 11:29 pm said:
      
      So your only point is that Eva Moskowitz is a liar? Fine. I’ll take your word on that. I doubt that’s the substantive issue here nor the focus of Andrew’s article (nor of Steve’s comment, I think).
      
      My point is that it’s perfectly OK to prep students for a test, and to have an environment where educators try to hide the fact that the did prep students (or get called out for making sure their kids were motivated or well-rested on test-day) is in itself bizarre.
    - parent010203 on August 19, 2016 12:19 AM at 12:19 am said:
      
      Yes, the point is that Eva Moskowitz is a liar. You don’t have to “take my word”. You either believe her when she said that no students did any prep or you think she is lying. The choice is up to you.
      
      Let’s assume you think she is lying. Then why wouldn’t she be lying about the reason so many 5 year olds are given out of school suspensions? Why wouldn’t she lie about whether low-performing kids are targeted for removal from her charters? Why wouldn’t she lie about why the attrition rates at her top-performing charter schools are significantly higher than attrition rates at charters that are not nearly as “good” academically and who don’t suspend 5 year olds? According to Ms. Moskowitz and her supporters like Steve, we are supposed to believe that parents of at-risk kids just don’t like top performing charters and their millions in donations as much as they like mediocre charters that get little funding. I doubt you believe that. I doubt you believe that high suspension rates and got to go lists and the type of teaching we saw on the video is irrelevant to a high performing charter school having a higher attrition rate than almost every other charter school. Common sense tells us the opposite should be true. Families who seek out charter schools should be more likely to stay at an excellent and well-funded one than to stay at a mediocre one with little extra funding. And yet the opposite is true.
      
      That’s what gaming the system is all about. That’s why you don’t compare a charter school that doesn’t have to follow rules with a public school that does. You compare a charter school with other charters serving the same kids. You stack rank every charter that serves the same kinds of students that Success Academy does to see how they are doing. Not surprisingly, not a single one comes close to Success. Not even in the same ballpark. Not in the number of kids they suspend. Not in their attrition rates. And not in their test performance rates. There is ONE outlier whose high scores cover up the mediocrity of the others.
      
      Andrew did NOT “call out” Success Academy for making sure their kids are motivated and well-rested on test day. He just pointed out that if one school does that and another school does not, the performance of their students on the day of the test does not accurately reflect what each child learned during the entire year. You are certainly free to argue that it reflects the importance that the school places on test scores and that’s a good thing. It just has nothing to do with how much the student actually learned.
      
      Getting back to my other point — if charter schools were stack ranked separately from public schools, no doubt the other charters would suddenly find religion and decide that these tests alone were useless measurements of a school. Those other charters would point out that they don’t suspend 5 year olds and don’t lose extraordinary numbers of their starting cohorts. Because if charter schools were stack ranked against each other — comparing apples to apples — and test scores alone were what matters, then it’s clear that only Success has the secret sauce to good teaching and there is “no excuse” for every charter not turning over its school to the master.
tristate on August 17, 2016 1:17 PM at 1:17 pm said:

Some say that “Asians are successful because they prepare for the tests” (alternatively, “they can afford tutors to prepare for the tests”), some say that “Asians are successful because of the tough discipline”. Looks like none of these things make you “successful”. Go figure.

I am curious if may be faculty of the Success Academies falsifies tests results – may be not always, but frequently enough? Allows students to use illicit aids when taking the tests? I remember in Georgia that was the problem

Reply ↓
Peter on August 17, 2016 1:56 PM at 1:56 pm said:

I like Andrew and Palko’s article. Their main point is very well taken: When you create a high-stakes testing system you must ensure that the system does not get gamed. Or, more generally, a data-driven system is worth little if the integrity of the data is not safeguarded.
I also agree with Rahul: “If your tests are so easily gamed, change the damn tests!” I remember that the Harvard education researcher David Koretz did some research on gaming of test scores showing that when tests get re-used test score gains over time are illusory (once the test changes, that is, once students have to take a test which is new to their teachers, scores go down appreciably). Koretz also wrote a book about the complexities of educational testing: “Measuring Up – What Educational Testing Really Tells Us” (Harvard UP, 2009)
http://www.hup.harvard.edu/catalog.php?isbn=9780674035218
Steve makes good points too. Maybe one can summarize the discussion this way: Test scores are a function of:
student knowledge,
student motivation to do well on the test,
student test taking skills, and (possibly)
gaming strategies (teaching to the test when tests are re-used, creaming the student population, making sure that academically weak students are not present on the test day, outright cheating where teachers help students with answering – I think all these things have happened in the wake of the push for test-based accountability in US K-12 education)
The tricky thing then becomes: how to interpret the results as indicating relative and absolute knowledge proficiency.

Reply ↓
- Peter on August 17, 2016 2:18 PM at 2:18 pm said:
  
  As gdanning’s comment below reminded me, when writing that “Test scores are a function of …” I assumed that tests are properly constructed from a psychometric point of view. In practice, one cannot necessarily make this assumption
  
  Reply ↓
  - Steve Sailer on August 21, 2016 4:39 AM at 4:39 am said:
    
    A lot of tests work well under original conditions, but can be gamed by an all-out assault by Tiger Mothers.
    
    For example, admissions to Manhattan’s $40k per year private kindergartens was for decades based on IQ scores on the Wechsler IQ test. It worked fine for years, but the Wechsler was designed to be a diagnostic test to be used under conditions where everybody wants to know what the honest results are. It wasn’t designed to be a competitive test that Manhattan’s most cunning and devious parents are trying to beat. Eventually, it was thoroughly compromised and had to be replaced.
    
    Reply ↓
- Rahul on August 18, 2016 11:18 PM at 11:18 pm said:
  
  Re: “teaching to the test when tests are re-used”
  
  I think the more fruitful approach in this eventuality is to target the lazy test makers who are reusing test questions!
  
  Rather than focus on the “teaching to the test” aspect. If your test is not easily predictable no one can teach to the test even if they want to!
  
  Reply ↓
  - Peter on August 19, 2016 2:54 PM at 2:54 pm said:
    
    I think the re-use of whole tests (not just of individual questions) was a decision of the school districts (maybe to save money). I agree that if the test is properly constructed and is not re-used then teaching to test seems not to be a problem.
    
    Reply ↓
    - mpledger on August 19, 2016 11:28 PM at 11:28 pm said:
      
      Would you want art students to be taught to the test or to produce their own art? And then be able to compare their work to another artist’s body of work?
      
      Would you want history students to taught to the test or be able to take multiple sources of information and write their own viewpoint?
      
      Would you want a maths student to be taught to the test or be able to go out in to the world, find a problem and be able to turn it into a mathematical model and solve it?
      
      Standardised tests only test very low level skills. Teaching to the test means that low level skills are taught in place of the more time consuming to teach, higher level skills.
    - Rahul on August 22, 2016 3:20 AM at 3:20 am said:
      
      It’d be interesting to peruse the ACT / SAT / GRE scores of the best mathematicians.
    - Martha (Smith) on August 22, 2016 12:19 PM at 12:19 pm said:
      
      Re GRE scores:
      
      This is not quite what you’re asking, but perhaps relevant.
      
      For several years (many years ago) I was on the panel to evaluate students applying for NSF graduate fellowships. As I recall, GRE scores were not very helpful, since so many students had 800 or very close on the math portion of the general exam, and over 800 on the Advanced Math subject exam.
      
      One thing I do recall being helpful was if a letter of recommendation said, “Can do the hard problems in Herstein” (referring to Herstein’s textbook Topics in Algebra).
    - Rahul on August 22, 2016 1:38 PM at 1:38 pm said:
      
      Indeed. I agree.
      
      My point is that something like GRE or any other standardized test is a necessary but not sufficient condition to be (say) a good engineer / mathematician etc.
      
      i.e. You’ll be hard pressed to find a good grad student who somehow was terrible in standardized tests but great at (more difficult) actual problems.
      
      Ergo, as a mass screening, or a first screen standardized tests are a good tool. i.e. If someone got 700 on a Quant GRE there’s very low chance that you have a gem of a mathematician here.
    - Curious on August 22, 2016 7:24 PM at 7:24 pm said:
      
      Rahul,
      
      I think what you mean to say is that the level of ability required to score at that level is required to be a good engineer/mathematician.
      
      While the score may be sufficient to provide evidence of this level of ability, it is certainly not logically necessary to do so.
Peter on August 17, 2016 2:07 PM at 2:07 pm said:

On test-taling skills:

Kaiser Fung: 5 Things Your Kids Should Do to Ace Their Exams. Dec 2014
http://www.scarymommy.com/club-mid/5-things-your-kids-should-do-to-ace-their-exams/
“The U.S. education system fails at this [teaching kids how to take exams] — I know this both because I have marked a lot of exams and because I learned exam-taking skills elsewhere.”

Reply ↓
- Andrew on August 19, 2016 11:37 PM at 11:37 pm said:
  
  Hey—I didn’t know that Kaiser writes for scarymommy.com!
  
  Reply ↓
gdanning on August 17, 2016 2:07 PM at 2:07 pm said:

I am anything but a fan of wasting valuable time teaching students how to game multiple choice tests (and when I was a high school teacher, I refused to do so, despite occasional pressure — Thanks, tenure!) but I have some issue with this: “It’s not clear why race should be a factor when interpreting one and not the other.” There might well be differences in how the tests are structured or graded that might disproportionately affect students of one race or ethnicity. Eg, in my experience, English Language Learners often stumble over things written in the passive voice. If Test A scrupulously avoided the passive voice and Test B did not, ELL students might do better on Test A than Test B, but a statistician not knowing that would say, “t’s not clear why ELL status should be a factor when interpreting one and not the other.”

Reply ↓
- Seth M. Spain on August 17, 2016 2:41 PM at 2:41 pm said:
  
  This is a really great point. In shop talk, I’ve often heard examples of items that showed differential functioning (DIF) between racial ethnic groups, where sometimes it’s easy to see why (the so-called math question used baseball as its basis, so students from cultures where baseball is not popular had a much harder time understanding it), but then others where it’s just…uhhh, nope we have no idea why this item showed DIF. Could be that the statisticians and psychometricians couldn’t see the problems in language *use*.
  
  Reply ↓
  - Martha (Smith) on August 17, 2016 4:07 PM at 4:07 pm said:
    
    I agree that language use is too often neglected in teaching and testing. In particular, in teaching math and statistics, I have learned to give explicit attention to it — especially pointing out differences between “everyday” and technical use, and in trying to give exercises that use different phrasing. (e.g., giving exercises that ask for the converse of a statement, using lots of different ways of stating an implication for students to give the converse of.)
    
    Reply ↓
- elin on August 18, 2016 1:31 PM at 1:31 pm said:
  
  We’ve used a number of assessments over the years, not high stakes for students but for our department to understand what’s happening. We always find that ELLs perform worse. The language in tests can become tricky as you try to test more challenging concepts. For example even students who know how to calculate percents in contingency tables correctly can get confused by sentences that talk about “percent of xs who are ys” versus “y percent of xs” in a confusing way. Sadly, a lot of time the “test taking skills” classes can actually hurt. For example they are given strategies for word problems that tell them to look for particular words to know the operation required, but in tests with complex language this doe not always work.
  
  Reply ↓
  - Martha (Smith) on August 18, 2016 5:55 PM at 5:55 pm said:
    
    +1
    The “look for particular words” strategy is problematical even in teaching young children to decide which arithmetic operation to use. For example, kids are (too) often taught that “and” tells them to add. So they might add if the problem says,”Andrew had five cookies and gave two to Elin. How many did he have left?”
    
    Reply ↓
  - Steve Sailer on August 21, 2016 4:44 AM at 4:44 am said:
    
    There are two main kinds of English Language Learners:
    
    – Short term ones who really are learning English
    
    – Long term English Language Learners who can speak English fine, but they can’t pass the ELL test for the same reasons they can’t pass math tests: they’re not very smart. California schools are full of kids who have been designated ELL for close to a decade who speak English with no discernible foreign accent, but still score poorly on the ELL exam, as well as all their other exams.
    
    People who make money off ELL programs aren’t excited about calling attention to this second group of ELL students.
    
    Reply ↓
    - Martha (Smith) on August 21, 2016 3:53 PM at 3:53 pm said:
      
      “but they can’t pass the ELL test for the same reasons they can’t pass math tests: they’re not very smart.”
      
      Or perhaps because they don’t give a damn about passing.
    - Rahul on August 21, 2016 3:58 PM at 3:58 pm said:
      
      How would you tell those two apart?
    - Martha (Smith) on August 21, 2016 5:06 PM at 5:06 pm said:
      
      I expect that teasing them apart would be very difficult. That’s a problem with many measurements on people. The upshot is that, to be intellectually honest, we often have to live with a lot of uncertainty — and reserve certainty for those (rare) cases where we have very strong evidence.
      
      And in this case, one of the difficulties is with different interpretations of “smart”.
- Rahul on August 18, 2016 11:15 PM at 11:15 pm said:
  
  @gdanning
  
  Can you give an example of what you mean by “teaching students how to game multiple choice tests”?
  
  How do you mean? In what way can I teach students to score well on tests that is counterproductive to actual learning?
  
  Reply ↓
  - gdanning on August 19, 2016 6:56 AM at 6:56 am said:
    
    Oh,things like this http://www.businessinsider.com/4-ways-to-outsmart-any-multiple-choice-test-2015-6 as opposed to teaching content or academic skills. And, I would not say that learning those tricks is counterproductive to actual learning, but rather that classroom time should be spent on more substantive topics.
    
    Reply ↓
    - Andrew on August 19, 2016 9:23 AM at 9:23 am said:
      
      Gdanning:
      
      I was curious so I followed the link. I might be wrong, but it’s hard for me to believe that these tips will work very well. My guess is that the reasoning required to follow the tips is more difficult than the reasoning required to just solve the problems in the first place. Except I guess the tip to always choose the longest answer: if that really works, that’s indeed an easy one.
    - Rahul on August 19, 2016 10:59 AM at 10:59 am said:
      
      +1 More than gaming tests this sounded like wasting time on strategies that didn’t work.
      
      In any case, if test options are just so easily predictable, we are just creating pathetically bad tests. Some more effort in writing good tests wouldn’t be a bad idea.
    - Andrew on August 19, 2016 11:38 PM at 11:38 pm said:
      
      Rahul:
      
      It’s not easy to write good tests! And then people copy down the test questions, and the test needs to be changed. It’s a big industry all around.
    - Rahul on August 20, 2016 2:29 PM at 2:29 pm said:
      
      Agreed. But better have the test makers work harder to create good tests than us having to conduct a witch hunt accusing teachers of teaching to tests & questioning the legitimacy of good scores due to gaming.
    - parent010203 on August 20, 2016 5:38 PM at 5:38 pm said:
      
      Rahul @2:29pm
      
      Why would you call Andrew’s research a “witch hunt”?
      
      “Success Academy schools have excelled at selecting out students who will perform poorly on state tests and then preparing their remaining students to test well. But their students do not do so well on tests that matter to the students themselves.”
      
      That’s what his research showed. What part of that conclusion is a “witch hunt”? And in terms of the “legitimacy of good scores due to gaming”, what else do you call it if large cohorts of students suddenly disappear from an extremely high performing charter school — tops in the entire city! — that those very same parents specifically sought out for their supposedly great test scores? A happy coincidence? Having nothing to do with “got to go” lists, out of school suspension rates in lower grades (including Kindergarten) that can be as high as 20%, or “model” teachers who are shown on videotape to be “modeling” the ideal way to turn to a child who doesn’t know the right answer into a “bad” kid?
    - Rahul on August 20, 2016 10:05 PM at 10:05 pm said:
      
      No, not referring to Andrew’s work at all.
      
      All I’m saying is that if a standardized test is inherently good we don’t have to second guess performance by trying to hypothesize if teachers coached kids to game the test etc.
      
      We can simply let teachers try their best without having to worry if they are teaching or teaching to the test.
  - parent010203 on August 19, 2016 3:41 PM at 3:41 pm said:
    
    @Rahul’s question: “In what way can I teach students to score well on tests that is counterproductive to actual learning?”
    
    I can answer that from a parent’s perspective having seen “test prep” aligned with the Common Core.
    
    A typical reading passage for 8 year olds might include:
    
    Mary liked having fun. She didn’t want to do things that were meaningful. She enjoyed frivolous pursuits.
    
    Private school students are given the CTP4 exam. And a multiple choice question might read:
    
    “What word best describes the kind of activities Mary like to do?” Answer choices: A. Serious B. Unimportant C. Difficult
    
    But the public school students’ “common core” question now reads:
    
    “What word in the paragraph best helps the reader to understand what the author intended when he wrote that Mary “enjoyed frivolous pursuits.” Answer choices: A. Fun B. Games C. Value.
    
    Now imagine explaining to your 8 year old to put aside all logical reasoning and just try to figure out what the question is really asking, even if it isn’t really asking it in a logical way. Yes, questions have always been poorly written, but these seems intentionally designed to trip up well-educated students who haven’t been taught that the way to answer these questions correctly is to ignore what the sentence actually says and decipher what the person who wrote the question wants you to answer. And the question that the person who wrote the question wants you to answer is: “What word best describes the kind of activities Mary likes to do?” The very question that private schools use to find out whether students have good reading comprehension. Because private schools understand that good reading comprehension is tested by straightforward questions and not convoluted ones that can trip up students who actually comprehend the reading passage perfectly.
    
    The fact that for 2 years straight, the Success Academy 8th graders with “the highest state test scores in the city” didn’t score well on the SHSAT didn’t shock me. Not because they were cheating on state tests to get those high scores. But because years of being test prepped to decipher a poorly written question correctly is counterproductive on an SHSAT test that asks questions requiring logical reasoning. No one really believes that this year the 54 students didn’t do an ounce of test prep for the SHSAT. I suspect the few who did well unlearned years of bad test prep that helps kids score higher on state tests.
    
    That is why the opt out movement against common core state exams was centered in affluent communities that still encouraged their kids to take AP courses and exams, SATs, ACTs, SATIIs, PSATs, etc. and where students received some of the highest scores in the country on those exams. Parents weren’t “afraid of testing their kids”. They didn’t want their elementary school children’s education to be warped in order to make sure that more of the well-educated students could choose the “correct” answer to a convoluted question when given 2 ambiguous choices.
    
    And they didn’t believe for a minute that whether their child could choose the so-called “correct” answer 3 out of 5 times or 4 out of 5 times or 5 out of 5 times reflected how much they had learned during the school year.
    
    If private schools “opted in” to the state tests as they are perfectly free to do, I have no doubt those convoluted questions that tell you nothing about what a child has learned would be gone. The few NYC private schools that opt in don’t have good results, but they don’t especially care nor do their parents. It doesn’t bother them to pay $30,000/year to attend a private school where only half the students are supposedly “proficient” on state exams when they could get a free education at their zoned public school that has a higher “passing rate”. It doesn’t mean their private school is terrible (and a waste of money) and their child’s teachers are bad. It means the private school placed no value in teaching their students to “game” the system. And the parents understand that is much better for their child in the long run. Especially if their student who is deemed “below proficient” on the state tests turns out to be quite above average on the CPT4.
    
    Reply ↓
    - Rahul on August 20, 2016 2:24 PM at 2:24 pm said:
      
      So why aren’t we discussing why the SHSAT has such crappy questions on it?
      
      Isn’t that the core issue here?
    - parent010203 on August 20, 2016 3:19 PM at 3:19 pm said:
      
      The strangely worded type of questions I described above can be found on the state tests given to 3rd through 8th graders. Those state tests are the ones that students at Success Academy take and perform at levels that no other public or charter school comes close to matching. The 8th graders at Success Academy schools perform better than 8th graders in virtually every other school in the city on a test that includes those kinds of “crappy” questions. Those types of questions are not found on the SHSAT.
      
      The SHSAT is a different exam that is given to all students — even the ones coming out of private schools — for admissions to specialized high schools. They don’t ask include questions like “what word in paragraph 3 best helps the reader know the intention of author when he wrote this sentence in paragraph 4?”
      
      So what you meant is “why aren’t we discussing why the state tests has such crappy questions on it”?
      
      People are discussing it. Parents are discussing it. Why do you think the opt out movement grew exponentially after common core tests with these kinds of questions were introduced? There were complaints, but because the common core exams were really about convincing parents that even the best public schools weren’t very good instead of a genuine attempt to figure out if students had educational needs that weren’t being addressed, those complaints were ignored by agencies in thrall to the privatization promoters. It took the opt out movement to get those tests looked at more carefully. Before that, the pro-charter folks delighted in telling affluent public school parents that they were just afraid for their child to take a test and why not come to our charter school where trained teachers will turn their children into scholars who learn exactly how to answer those crappy questions correctly. (Even if they have trouble in a test very similar to the SAT exam that your kids currently do very well on).
      
      Before the opt out movement, those “crappy” tests were held up as clear evidence that charter schools were much better than public schools — even the best public schools. Because only charters could achieve 99% passing rates on them. No one was saying “let’s change the tests”, they were saying “let’s change public schools so that all learning is focused on teaching kids to answer those crappy questions correctly”.
      
      Guess why the opt out movement is highest in suburban communities where parents are the most educated? And seeing how constant test prep for poorly written state tests might result in a mediocre performance on the SHSAT, it seems as if the opt out parents were correct. It was a good thing that their schools didn’t change to model themselves after charter schools who could get 99% passing rates on state tests but students struggled when it came to the SHSAT.
      
      Don’t you agree?
    - Rahul on August 20, 2016 10:13 PM at 10:13 pm said:
      
      Maybe the states should just let ACT or ETS or someone like that prepare its tests?
      
      But this doesn’t sound like “schools ruining standardized tests”. Rather crappy tests ruin students & schools.
    - parent010203 on August 21, 2016 5:12 PM at 5:12 pm said:
      
      Rahul@10:30pm
      
      I am answering the questions you ask. You asked why no one was discussing why the test had such crappy questions on it and I explained that you were wrong, and that was a big reason the opt-out movement has the most traction in affluent suburban communities where most parents are used to standardized tests and realize how terrible these are. Nonetheless, even with terrible tests, the same thing happens. The more affluent the school, the more educated the parents, the better the kids score on standardized tests. Even crappy ones. They may have shockingly low passing rates for these crappy tests, but they are still better than the worse passing rates on schools that serve large numbers of at-risk students.
      
      The only “anomaly” is that certain charter school networks — not most of them — get unusually high passing rates with at-risk students. But Andrew’s point is that since those unusually high (99%!) passing rates are only achieved by charter schools that game the system, using standardized tests to evaluate schools is a useless exercise. These charters have become so good at gaming to get results that the data has become useless to evaluate schools. (Unless you are evaluating them on how good they are at gaming results!)
      
      Models “require careful thought and oversight to prevent gaming and what statisticians call model decay. . . .” And Andrew is pointing out that model decay is a real problem. If you allow gaming to go on unabated and make test prep the focus of education, you might have high passing rates, but it tells you nothing about whether the school is “successful”.
  - mpledger on August 19, 2016 11:41 PM at 11:41 pm said:
    
    Rahul said:
    How do you mean? In what way can I teach students to score well on tests that is counterproductive to actual learning?
    ~
    You can teach kids to rote learn information but they can have no clue what that information means.
    
    Reply ↓
    - Rahul on August 20, 2016 2:19 PM at 2:19 pm said:
      
      Yes, but it’s a mighty stupid test if all its questions are so easily answerable by rote learning alone.
      
      Again, the solution is to target the people creating bad tests. Not students or teachers.
Diana Senechal on August 19, 2016 8:39 AM at 8:39 am said:

Thank you for this excellent article. Having taught for nine years in New York City public schools (and now having left to write my second book), I consider this one of the most lucid analyses of the Success Academy phenomenon that I have seen.

Yes, it makes sense that Success Academy students would have an edge over other students on the standardized tests, where the stakes for students are low, but not on the specialized school exams, where the stakes for students are high. Because everyone is “trying hard” in the latter case, those who come armed with test-taking strategies are less likely to have an advantage.

Yet I wonder whether there might not also be a difference in the tests themselves.

It seems to me that the eighth-grade mathematics exam requires minimal reasoning. (See the released 2015 questions: https://www.engageny.org/resource/released-2015-3-8-ela-and-mathematics-state-test-questions.) As long as you know how to solve the type of problem before you, you can solve it in your head in a second or two.

Example:

Jenny wants to rent a truck for one day. She contacted two companies. Laguna’s Truck Rentals charges $20 plus $2 per mile. Salvatori’s Truck Rentals charges $3 per mile. After how many miles will the total cost for both companies be the same?

A: 4
B: 6
C: 20
D: 60

20 + 2x = 3x; x = 20

In contrast, the problems on the specialized exam (http://schools.nyc.gov/NR/rdonlyres/1FDB8183-E675-42D9-A17C-237C18E4C255/0/SHSAT_StHdbk_201415.pdf) require more steps. They aren’t terribly difficult, but they do require more complex reasoning than the standardized test questions.

Example:

A one-room school has three grades—6th, 7th, and 8th. Eight students attend the school: Ann, Bob, Carla, Doug, Ed, Filomena, George,
and Heidi. In each grade there are either two or three students.
1) Ann, Doug, and Filomena are all in different grades.
2) Bob and Ed are both in the 7th grade.
3) Heidi and Carla are in the same grade.
Based only on the information above, which of the following must be true?

F. Exactly two students are in the 6th grade.
G. Carla and Doug are in the same grade.
H. Exactly three students are in the 7th
grade.
J. Heidi and Ann are in the same grade.
K. Filomena is in the 8th grade.

When you combine “1) Ann, Doug, and Filomena are all in different grades” and “2) Bob and Ed are both in the 7th grade” with the information that there are three grades (6-8), with two or three students in each, you see that H must be true. To solve this, you need to combine the right pieces of information. This takes not only several steps of reasoning but perceptive selection. One could get better at this kind of problem through practice–but it still requires more reasoning on the spot than the standardized test questions generally do.

So here’s an additional hypothesis–which combines well with that of the article: test prep has a greater effect on tests that reward it: namely, tests that expect students not to reason on their own but to recognize certain types of problems and solve them in the way they have been taught.

Reply ↓
- Rahul on August 23, 2016 12:59 AM at 12:59 am said:
  
  ” test prep has a greater effect on tests that reward it”
  
  Isn’t that tautologically true? “Activity-X has a greater effect on tests that reward Activity-X”
  
  Reply ↓
mclaren on August 19, 2016 4:52 PM at 4:52 pm said:

I think one cogent criticism of Andrew’s admirable analysis is that his basic assumption is incorrect. The assumption appears to be that admission to elite schools depends on merit, i.e., scholastic aptitude. But it seems far more likely that admission to elite schools really represents the endpoint of a social sieving process. Students get admitted to elite schools based on social class and signaling, not because of merit.

Andrew is doubtless correct that the charter schools studied did a poor job of getting their high-testing students into elite schools. But all non-elite schools which aren’t prep schools appear to do a poor job of getting their high-testing students into elite schools. The single variable that has the largest correlation with admission to an elite college is the parent’s income. And for obvious reasons: as with George W. Bush, elite colleges eagerly take rotten students with very high social standing (a student whose dad in the head of the CIA, for example) as opposed to high-scoring nudniks from East Armpit, Nowhere whose parents are dishwashers and truck drivers.

Reply ↓
- parent010203 on August 20, 2016 2:14 PM at 2:14 pm said:
  
  Andrew didn’t base his comment on “elite schools”. He based it on a specific set of elite schools in NYC where the sole admissions criteria is performance on a standardized test. And the majority of students who score well on that test for high school admissions — the SHSAT — attended public middle schools that have lower overall passing rates on the state tests than Success Academy did. Often much, much lower. And the majority of the public school students who scored high on the SHSAT are low-income students. The big 3 specialized high schools have a majority of low-income students.
  
  “…the charter schools studied did a poor job of getting their high-testing students into elite schools…” That is a very concerning statement, because this charter school is famous for doing test prep, having slam the exam rallies, posting students’ practice tests results on state tests publicly to shame (or, at the school claims, “motivate”) the poor performers and celebrate the high scoring ones.
  
  It’s absurd that anyone would give any credence to what the leader of this charter school said to excuse her students’ historically very weak performance — that not a single one of those 53 8th graders prepped for the SHSAT. These are 8th graders who were schooled for 8 years in a culture where test prep was king, and the charter school CEO insulted every one of their parents by saying they were too ignorant to think “hey, my child is about to take a high-stakes test that will impact where he goes to high school, maybe I should have him take a look at the exam, take some practice tests, work on areas where he struggles”. They have their child in a test prep culture and don’t think they need to prep for a test that is far more important to that child than the state tests, which are only important to the school? Does anyone believe that?
  
  Reply ↓
- Steve Sailer on August 21, 2016 4:53 AM at 4:53 am said:
  
  Here are the demographics of Stuyvesant H.S., the top of the NYC elite test schools:
  
  “As of the 2014-15 school year, Asian students made up 73% of the school’s population; White students, 20%; Latinos, 3%; Blacks, 1%; and unknown/other, 7%.[38][39]”
  
  So, the student body is overwhelmingly Asian.
  
  The Emperor of China started testing for mandarins about 1400 years ago. Test prepping in China probably started about 1399 years ago.
  
  One reason there are so few blacks at Stuyvesant is that high scoring blacks can often get a scholarship to Dalton or another private school. Why go to a public school with a lot of Asians when you can go to a private school with Michael Bloomberg’s grandkids?
  
  Reply ↓
Jon on August 22, 2016 1:11 PM at 1:11 pm said:

Don’t worry. Having worked at several financial companies, I can assure you that the private sector also struggles with people arbing the incentive structure. Pay customer service people on volume, they will strike lousy deals to increase volume. Pay on some measure of profitability, they will somehow find the weakness in the metric and exploit it. Measure an oversight group by resolving regulatory issues, then they will inflict high costs on other parts of the firm to avoid a slight risk of creating an issue.

Reply ↓
- Rahul on August 22, 2016 1:34 PM at 1:34 pm said:
  
  +1
  
  Pay marketing agents a commission as a percent of total sales & they will bring you lousy margins.
  
  Reply ↓

Statistical Modeling, Causal Inference, and Social Science

How schools that obsess about standardized tests ruin them as measures of success

91 thoughts on “How schools that obsess about standardized tests ruin them as measures of success”

Leave a Reply Cancel reply