Did blind orchestra auditions really benefit women?

You’re blind!
And you can’t see
You need to wear some glasses
Like D.M.C.

Someone pointed me to this post, “Orchestrating false beliefs about gender discrimination,” by Jonatan Pallesen criticizing a famous paper from 2000, “Orchestrating Impartiality: The Impact of ‘Blind’ Auditions on Female Musicians,” by Claudia Goldin and Cecilia Rouse.

We’ve all heard the story. Here it is, for example, retold in a news article from 2013 that Pallesen links to and which I also found on the internet by googling *blind orchestra auditions*:

In the 1970s and 1980s, orchestras began using blind auditions. Candidates are situated on a stage behind a screen to play for a jury that cannot see them. In some orchestras, blind auditions are used just for the preliminary selection while others use it all the way to the end, until a hiring decision is made.

Even when the screen is only used for the preliminary round, it has a powerful impact; researchers have determined that this step alone makes it 50% more likely that a woman will advance to the finals. And the screen has also been demonstrated to be the source of a surge in the number of women being offered positions.

That’s what I remembered. But Pallesen tells a completely different story:

I have not once heard anything skeptical said about that study, and it is published in a fine journal. So one would think it is a solid result. But let’s try to look into the paper. . . .

Table 4 presents the first results comparing success in blind auditions vs non-blind auditions. . . . this table unambiguously shows that men are doing comparatively better in blind auditions than in non-blind auditions. The exact opposite of what is claimed.

Now, of course this measure could be confounded. It is possible that the group of people who apply to blind auditions is not identical to the group of people who apply to non-blind auditions. . . .

There is some data in which the same people have applied to both orchestras using blind auditions and orchestras using non-blind auditions, which is presented in table 5 . . . However, it is highly doubtful that we can conclude anything from this table. The sample sizes are small, and the proportions vary wildly . . .

In the next table they instead address the issue by regression analysis. Here they can include covariates such as number of auditions attended, year, etc., hopefully correcting for the sample composition problems mentioned above. . . . This is a somewhat complicated regression table. Again the values fluctuate wildly, with the proportion of women advanced in blind auditions being higher in the finals, and the proportion of men advanced being higher in the semifinals. . . . in conclusion, this study presents no statistically significant evidence that blind auditions increase the chances of female applicants. In my reading, the unadjusted results seem to weakly indicate the opposite, that male applicants have a slightly increased chance in blind auditions; but this advantage disappears with controls.

Hmmm . . . OK, we better go back to the original published article. I notice two things from the conclusion.

First, some equivocal results:

The question is whether hard evidence can support an impact of discrimination on hiring. Our analysis of the audition and roster data indicates that it can, although we mention various caveats before we summarize the reasons. Even though our sample size is large, we identify the coefficients of interest from a much smaller sample. Some of our coefficients of interest, therefore, do not pass standard tests of statistical significance and there is, in addition, one persistent result that goes in the opposite direction. The weight of the evidence, however, is what we find most persuasive and what we have emphasized. The point estimates, moreover, are almost all economically significant.

This is not very impressive at all. Some fine words, but the punchline seems to be that the data are too noisy to support any strong conclusions. And the bit about the point estimates being “economically significant”—that doesn’t mean anything at all. That’s just what you get with a small sample and noisy data: the estimates are noisy, so some of them will happen to be big numbers.
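
To see why “big” point estimates are unremarkable here, consider a minimal simulation sketch (the numbers are my own toy example, not the paper’s): even with a true effect of exactly zero, small noisy samples routinely produce estimates that look economically significant.

set.seed(123)
# 10,000 studies, each estimating a true effect of zero from 10 noisy observations
est <- replicate(10000, mean(rnorm(10, mean = 0, sd = 1)))
mean(abs(est) > 0.3)  # about a third of the point estimates exceed 0.3 sd anyway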

But then there’s this:

Using the audition data, we find that the screen increases—by 50 percent—the probability that a woman will be advanced from certain preliminary rounds and increases by severalfold the likelihood that a woman will be selected in the final round.

That’s that 50% we’ve been hearing about. I didn’t see it in Pallesen’s post. So let’s look for it in the Goldin and Rouse paper. It’s gotta be in the audition data somewhere . . . Also let’s look for the “increases by severalfold”—that’s even more, now we’re talking effects of hundreds of percent.

The audition data are described on page 734:

We turn now to the effect of the screen on the actual hire and estimate the likelihood an individual is hired out of the initial audition pool. . . . The definition we have chosen is that a blind audition contains all rounds that use the screen. In using this definition, we compare auditions that are completely blind with those that do not use the screen at all or use it for the early rounds only. . . . The impact of completely blind auditions on the likelihood of a woman’s being hired is given in Table 9 . . . The impact of the screen is positive and large in magnitude, but only when there is no semifinal round. Women are about 5 percentage points more likely to be hired than are men in a completely blind audition, although the effect is not statistically significant. The effect is nil, however, when there is a semifinal round, perhaps as a result of the unusual effects of the semifinal round.

That last bit seems like a forking path, but let’s not worry about that. My real question is, Where’s that “50 percent” that everybody’s talkin bout?

Later there’s this:

The coefficient on blind [in Table 10] in column (1) is positive, although not significant at any usual level of confidence. The estimates in column (2) are positive and equally large in magnitude to those in column (1). Further, these estimates show that the existence of any blind round makes a difference and that a completely blind process has a somewhat larger effect (albeit with a large standard error).

Huh? Nothing’s statistically significant but the estimates “show that the existence of any blind round makes a difference”? I might well be missing something here. In any case, you shouldn’t be running around making a big deal about point estimates when the standard errors are so large. I don’t hold it against the authors—this was 2000, after all, the stone age in our understanding of statistical errors. But from a modern perspective we can see the problem.

Here’s another similar statement:

The impact for all rounds [columns (5) and (6)] [of Table 9] is about 1 percentage point, although the standard errors are large and thus the effect is not statistically significant. Given that the probability of winning an audition is less than 3 percent, we would need more data than we currently have to estimate a statistically significant effect, and even a 1-percentage-point increase is large, as we later demonstrate.

I think they’re talking about the estimates of 0.011 +/- 0.013 and 0.006 +/- 0.013. To say that “the impact . . . is about 1 percentage point” . . . that’s not right. The point here is not to pick on the authors for doing what everybody used to do, 20 years ago, but just to emphasize that we can’t really trust these numbers.
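
For concreteness, here’s a quick check of what those two reported estimates and standard errors actually allow (the plus-or-minus 2 standard error intervals are my rough gloss, not a calculation from the paper):

est <- c(0.011, 0.006)
se  <- c(0.013, 0.013)
cbind(lower = est - 2*se, upper = est + 2*se)
# both intervals run from roughly -0.02 to +0.04, i.e., anywhere from a
# 2-point drop to a 4-point gain, so "about 1 percentage point" is not a
# summary these data can support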

Anyway, where’s the damn “50 percent” and the “increases by severalfold”? I can’t find it. It’s gotta be somewhere in that paper, I just can’t figure out where.

Pallesen’s objections are strongly stated but they’re not new. Indeed, the authors of the original paper were pretty clear about its limitations. The evidence was all in plain sight.

For example, here’s a careful take posted by BS King in 2017:

Okay, so first up, the most often reported findings: blind auditions appear to account for about 25% of the increase in women in major orchestras. . . . [But] One of the more interesting findings of the study that I have not often seen reported: overall, women did worse in the blinded auditions. . . . Even after controlling for all sorts of factors, the study authors did find that bias was not equally present in all moments. . . .

Overall, while the study is potentially outdated (from 2001…using data from 1950s-1990s), I do think it’s an interesting frame of reference for some of our current debates. . . . Regardless, I think blinding is a good thing. All of us have our own pitfalls, and we all might be a little better off if we see our expectations toppled occasionally.

So where am I at this point?

I agree that blind auditions can make sense—even if they do not have the large effects claimed in that 2000 paper, or indeed even if they have no aggregate relative effects on men and women at all. What about that much-publicized “50 percent” claim, or for that matter the not-so-well-publicized but even more dramatic “increases by severalfold”? I have no idea. I’ll reserve judgment until someone can show me where that result appears in the published paper. It’s gotta be there somewhere.

P.S. See comments for some conjectures on the “50 percent” and “severalfold.”

58 thoughts on “Did blind orchestra auditions really benefit women?”

  1. “In the 1970s and 1980s, orchestras began using blind auditions. Candidates are situated on a stage behind a screen to play for a jury that cannot see them. In some orchestras, blind auditions are used just for the preliminary selection while others use it all the way to the end, until a hiring decision is made.”

    I hope this also involved turning chairs, “block” buttons, and celebrity coaches…

  2. I have also read somewhere that in the earlier blind auditions, women usually wore high heels, and the sound of them walking to the playing position was pretty distinctive. Supposedly, later on most women wore flats to the auditions, and then they tended to do better.

    Urban myth? I wouldn’t be surprised. But it does highlight that all “blind” auditions might not have been equal.

    • It is not an urban myth. I have always been told to wear flats. Also, none of my auditions have been on carpet. They have always been on stage and no carpet was placed.

  3. Andrew:
    Looking at the numbers 0.011 and 0.006: if you do (.011 – .006)/.011 you get .45 … maybe that’s what they mean? If so, wow, just wow. And “severalfold” could be that .006*2 is about the same as .011. And the “percentage point”? .011 – .006, with some kind of rounding.

    • Anon:

      It can’t be that, because both the 0.011 and the 0.006 have huge standard errors so their ratio is essentially impossible to estimate.
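
      To put a number on “essentially impossible to estimate,” here’s a quick simulation in the same spirit as the one further down the thread (treating each estimate as normal with the quoted standard error is my assumption):

      n_sims <- 10000
      a <- rnorm(n_sims, 0.011, 0.013)  # first estimate and its standard error
      b <- rnorm(n_sims, 0.006, 0.013)  # second estimate and its standard error
      quantile(a/b, c(0.025, 0.5, 0.975))
      # the simulated ratio swings across orders of magnitude and flips sign,
      # because the denominator is within half a standard error of zero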

      It’s funny, I don’t usually say this, but in this case . . . I can’t see how this got past the referee process. Not the paper being accepted for publication—I can see that, as it’s an interesting topic and they have some solid data—but where did that 50% come from? Maybe there was an explanation for it in the original version of the paper and then it got cut to save space during the review process?

      • I recently peer reviewed an article with some serious issues in one of its figures (not a statistics-intensive field, but a physical science study). I described my concern in the review and asked for explanation/clarification (perhaps I misunderstood). Four weeks later, I see the accepted manuscript. This is so discouraging to young reviewers. If I made a mistake, couldn’t they respond? Or if they responded, shouldn’t the editor send me the response? Maybe I should just contact the editor requesting it. Even if an issue is caught by peer review, it has to be enough to convince the editor.

        • It sure seems to me that since you explicitly described your concern and asked for explanation/clarification, the considerate/courteous/intellectually honest thing for them to do would have been to respond to your concern.

          I do remember once reviewing a paper submitted to a statistics education journal. I expressed concern that the proposed course of instruction in the paper neglected the problem of multiple inference. I don’t recall the exact details of the resulting correspondence with the editors, but I do remember getting a response that (as I recall) asked me to modify my position. It was awkward, but they at least gave me the courtesy of responding to my concern.

      • The 50% number appears unambiguously in the NBER working paper on p. 23. https://www.nber.org/papers/w5903.pdf
        I wonder if some people are reading the published paper and some the working paper? By the way, they don’t exactly explain the source of the 50% in a way that I followed; I simply mean that there literally exists a sentence written by the authors that says that there was a 50% increase, so in that limited sense, mystery solved.

  4. I was in the Boston classical music scene in the late ’60s through the 70s (father ran an instrument repair business*, open Baroque jam sessions every Tuesday night), and the number of women in the major orchestras was tiny. Especially in the 50s and 60s. It was an ugly, recognized problem. I remember the idea of screened auditions being talked about, and being claimed to be effective, back in those days. The 60s (a period that lasted from 1967 to spring 1972) saw a lot of changes, and women players began to have a slightly easier time of it by the mid 1970s. Slightly. So my take is that the amazing effectiveness of blind auditions was an urban legend that statistical folks then looked into. And survived just fine despite the science. Maybe. (One of father’s customers was one of the first women to get hired into the BSO, and is still in the orchestra.)

    FWIW.

    By the way, hasn’t this been discussed here before?

    *: https://pbase.com/davidjl/image/121610792
    https://pbase.com/davidjl/image/110304847 (Warning! CAT content!)

  5. NHST…

    How about: women are more concerned about their appearance than men, and blinding reduces this stress, so they perform better.

    Or: women wear more comfortable clothing when the audition is blind (e.g., shoes), allowing them to perform better.

    I’m sure people who have actually observed these auditions could come up with a long list like that.

      • Yep. Point is that they had a theory that women were being discriminated against, but “tested” the strawman hypothesis that men and women get selected at exactly the same rates. There are many reasons for the latter being wrong other than gender discrimination.

  6. From Table 5. Percentage of women hired from blind auditions: 2.7 percent. Percentage hired from non-blind auditions: 1.7 percent.
    .027/.017 = 1.59, a 59% increase. From the text: “All success rates are very low for auditions as a whole, but the female success rate is 1.6 times higher (increasing from 0.017 to 0.027) for blind than for not-blind auditions.” ln(.027/.017) = .46, rounded to 50% for journalistic convenience.

    • Jonathan:

      No, it can’t be that, as the 50% is tied to “certain preliminary rounds.” So if it’s Table 5, it would have to be the first two categories: Preliminaries without semifinals and Preliminaries with semifinals. And, indeed, (.286-.193)/.193 = 0.48 and (.200-.133)/.133 = 0.50, so that’s 50%.

      But there’s no way you should put that sort of claim in the conclusion of your paper unless you give the standard error. And if you look at the numbers in that table, you’ll see that the standard errors for these differences are large.

      Or, here’s a quick simulation:

      n_sims <- 10000
      set.seed(1)                        # for reproducibility
      a <- rnorm(n_sims, 0.286, 0.043)   # blind: point estimate and standard error
      b <- rnorm(n_sims, 0.193, 0.041)   # non-blind: point estimate and standard error
      r <- (a - b)/b                     # simulated relative increase
      quantile(r, c(0.025, 0.5, 0.975))  # median and 95% interval

      From the point estimates, the ratio r = 0.48. If you do the simulations, you'll get a median of 0.48 (fine) and a 95% interval of [-0.1, 1.7]. So, yeah, the proportion of women advanced is higher in the data (as we can see from the raw numbers), but that "50% number" is noisy as hell.

      I wonder if the authors got faked out by the random coincidence that the ratio is 0.48 for that first set of numbers and 0.50 for the second set. Given the standard errors, this alignment tells us nothing at all, but I could imagine someone flipping through all these numbers, seeing this pattern, and thinking it represented some larger truth. This sort of thing is why we ask people to put standard errors on their estimates in the first place.

      • The figures you cite are pretty clearly what the authors are referring to.

        The paper says:

        “Using the audition data, we find that the screen increases—by 50 percent—the probability that a woman will be advanced from certain preliminary rounds and increases by severalfold the likelihood that a woman will be selected in the final round.”

        Table 5 is the audition data.
        “Preliminary rounds” are the first two blocks of numbers, and “final round” is the block of numbers labeled “Finals”.

        .286/.193 =~ 1.5
        .200/.133 =~ 1.5
        .235/.087 =~ severalfold.

        They got the result they wanted. The “economic significance” was big and quotable. Statistical significance is a party pooper. Yes, “Semifinals” results go the other way, but hey it’s 3 to 1 good results to bad results.

        Am somewhat surprised that this got published in the American Economic Review, but it is morally right, which may have tipped the scales.

        • This was my guess when I originally read the paper, that these claims came from Table 5. But it does seem quite silly when it is spelled out. It’s not only ignoring the enormous standard errors; it is also ignoring the values for men.

        • Terry:

          Could be what you’re saying about the results being published because they were perceived to be morally right, and if this were a medical or public health journal, I’d say yes, definitely that was the case. For econ, though, it’s a bit different, as they often (a) look for the “politically incorrect,” and (b) favor explanations in which problems are solved by the market. In this case the authors are claiming that the market did not solve the problem, at least not until the screens were added.

          I guess this relates to the two opposing modes of thought of microeconomists.

        • I think you’re right that in an econ journal you can’t be sure they will go with the PC result. IIRC, economic journals have published papers finding diversity is not always good, and that women on boards do not improve firm performance. You don’t see the same monolithic results you see in some other disciplines.

      • Andrew writes:

        “And, indeed, (.286-.193)/.193 = 0.48 and (.200-.133)/.133 = 0.50, so that’s 50%.”

        This procedure of (A-B)/B is reminiscent of the medical world’s “relative risk increase,” which always looks more impressive than the “absolute risk difference,” (A-B). Often, a better sense of what is taking place is captured by “NNT,” the number needed to treat, which is the inverse of the absolute risk difference; for the numbers quoted this is roughly 10 or 15. So to speak, blinding the applicant’s gender results in hardly any benefit to women.
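
        A quick illustration with the Table 5 numbers quoted above (the “number needed to screen” framing is my own gloss on the NNT analogy):

        a <- c(0.286, 0.200)  # advancement rates with the screen
        b <- c(0.193, 0.133)  # advancement rates without the screen
        (a - b)/b             # relative increase: ~0.48 and ~0.50, the famous "50%"
        a - b                 # absolute difference: about 7 to 9 percentage points
        1/(a - b)             # "number needed to screen": roughly 11-15 auditions per extra advance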

  7. The “severalfold,” I think, comes from column (8) of Table A3. There we have a linear probability model, with coefficients female: .0004, blind × female: -.078, blind: .123; so we have non-blind female: .0004, and blind female: .0004 - .078 + .123 = .0454. Assuming a really low base rate, that will do it.
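
    Spelled out (the coefficients are as read from Table A3 above; the implied-probability arithmetic is my reading of the point):

    female <- 0.0004; blind_female <- -0.078; blind <- 0.123
    p_nonblind <- female                      # implied rate for women without the screen
    p_blind <- female + blind_female + blind  # implied rate for women with the screen
    c(p_nonblind, p_blind)                    # 0.0004 vs 0.0454
    # off a near-zero base rate, that is "severalfold" and then some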

  8. I can’t speak for what was going on in the social sciences, but let’s not exaggerate the state of statistical ignorance in 2000! Taking a point estimate with a huge standard error seriously is obviously absurd to any decent scientist. It would never have happened among the British codebreakers in the 1970s (or even at Bletchley Park in the ’40s (!)), or in the quant fund in which I worked in the 1990s, or in physics…

      • Agreed, that’s a laughable thing to say. I was taught that in my first year of undergraduate study, and it wasn’t a statistics-oriented degree.
        Anyway, you’re wasting your time. The result was the one they wanted and I’m sure it served its purpose. What’s more, no amount of debunking by stats nerds will stop any on-message author from referencing it in their book.

  9. I wonder what will happen to blind auditions in the future. As demands for gender equity come to the orchestras, won’t hiring become explicitly gender-conscious? Will blind auditions become a relic of an earlier time when the patriarchy enforced oppressive gender-norms using the discredited narrative of “objectivity”? Is this paper already out of date?

    • This sounds like making a mountain out of a molehill. The real breakthrough in opening up opportunities for women in orchestras came when orchestras started auditioning women for positions other than harpist. (Before that, they had been auditioning women for harpist positions simply because there were not enough male harpists to meet the demand for harpists.)

      • +1 This reminds me of the discussion of the IAT on this blog earlier. I don’t know why people want to look for subtle effects when there are pretty obvious big effects that probably explain the difference. There was a lot of outright misogyny that explained disparities in the past, and misogyny can have lasting effects. Just believing you’re going to be discriminated against, even when you aren’t, could continue to have a large effect well after the discrimination ends.

        • Steve said,
          “There was a lot of outright misogyny that explained disparities in the past, and misogyny can have lasting effects. Just believing you’re going to be discriminated against, even when you aren’t, could continue to have a large effect well after the discrimination ends.”

          Yes! For example, when I was applying for graduate programs in math to start in fall 1966, the Princeton graduate catalogue said “Admissions are normally limited to adult males,” so I didn’t bother applying there. The Yale graduate catalogue said that they had a new graduate women’s dorm, and didn’t have a statement like Princeton’s, but also didn’t have any statement indicating that women applicants would be considered in all fields — and I knew a woman who had applied to the Yale graduate math program the previous year but had not been accepted. So I ended up applying only to the two graduate programs that had admitted that woman.

    • You could be right that the snark was over the top.

      But there was a real question in there. Blind auditions are from a time when treating people neutrally was widely considered the right thing to do. Now, we are transitioning to a time when the right thing to do is often thought to be to explicitly consider gender. Under the second way of doing things, a blind audition would hinder the stated objective.

      I wasn’t arguing one way or the other as to which is right or which is wrong. I was just wondering whether, taking the change in attitudes as given, blind auditions will be phased out.

  10. It’s hard to imagine good reasons not to blind.
    Tangentially, I respect Goldin’s work on the gender pay gap, though I only know of it from the lay press. Among other conclusions, she has stated that discrimination has little or nothing to do with the gap.

    • I agree. I would presume that orchestras want the best sounding person – however they want to define best sounding. Gender, age, ethnicity, what you are wearing, how attractive you are, should have no bearing. Having an unblinded audition in this day and age is just asking for trouble.

  11. Pallesen writes: “It is possible that the group of people who apply to blind auditions is not identical to the group of people who apply to non-blind auditions.”

    I think this is the crux of the issue: there are probably large differences between the groups in blind auditions vs. non-blind. One obvious difference that even Claudia Goldin has discussed is that more women show up to audition if the auditions are blind, and so of course more women will succeed. But even if you’re just looking at the same women in blind vs. non-blind auditions, it’s reasonable to expect women to perform better in blind auditions simply because they believe that they won’t be discriminated against. If women believe they will be discriminated against — so that they won’t get the job no matter what they do — then they will probably play with less confidence and not invest much time practicing before the audition; the increased effort and time of practicing is not worth it if it will have no effect anyway. If, however, they believe that they won’t be discriminated against, then they’ll probably practice more and play with more confidence, and thus actually play better. So, whether or not any actual discrimination exists, a curtain could improve the success rates of women. I would be interested to have some objective measure of how well a given woman plays with and without a curtain to try to tease this apart.

    This is all to say: this study isn’t an especially good test of whether discrimination exists in the first place. Setting aside the empirical claim of whether women actually did do better with the curtain vs without, the study takes for granted that if they did do better then that would be evidence of discrimination. I think it’s fairly easy to explain positive effects of a curtain which have nothing at all to do with discrimination, and so we can’t really conclude anything one way or the other.

    • You apparently didn’t read the article.

      Women, on average, do worse in blind auditions than in non-blind auditions:

      The value for relative female success is the proportion of women that are successful in the audition process minus the proportion of men that are successful. The values for non-blind auditions are positive, meaning a larger proportion of women are successful, whereas the values for blind auditions are negative, meaning a larger proportion of men are successful. So, this table unambiguously shows that men are doing comparatively better in blind auditions than in non-blind auditions. The exact opposite of what is claimed.

  12. This may be slightly off-topic, but I’d always expect lots of noise in a selection process that attempts to distinguish between exceptional candidates. I would not be surprised if some of those rejected candidates found much bigger success later (a bit like top grad school admissions, NBA drafts, etc.).
    So this paper was going to be either an exemplary analysis or a botched job with a headline-friendly conclusion.

    • Koray:

      Regarding your last sentence: What struck me about that paper is that the data are weak, in part for reasons you discussed in your first paragraph.

      And one problem with the paper is the expectation, among research articles, to present strong conclusions. You can see it right there: the authors make some statements and then immediately qualify them (the results are not statistically significant, they go in both directions, etc.), but then they can’t resist making strong conclusions and presenting mysterious numbers like that “50 percent” thing. And of course the generally positive reception of this paper would just seem to retroactively validate the strategy of taking weak or noisy data and making strong claims.

      • “And one problem with the paper is the expectation, among research articles, to present strong conclusions”

        I have been wondering where all this strong conclusion-drawing, exaggeration, etc. stems from. I always view it as a sign of “bad” science, and a sign that the scientists writing these types of sentences are not being very “good” scientists. This is because I view being careful, critical, etc., as a core quality of being a scientist, and an aspect of science.

        Whenever I read back my first published paper (which basically was my thesis for graduating a research master’s degree) I can spot some things I wish I had done differently (e.g., I used a small sample size), but I am very, very glad I “stuck to my gut instinct” and did things that I felt were in line with being a “good” scientist and doing “good” science (regardless of what my professor and/or reviewers said). One of those was trying to be careful in the things I wrote and concluded.

        To illustrate the above, this is what I wrote as a summary of the conclusion of the paper in the abstract:

        “Our results underscore the idea that it might be fruitful to look for explanations of differences in the accuracy of diagnostic judgments in individual differences between psychologists (such as in thinking styles or decision making strategies used), rather than in experience level.”

        In some ways, I find the conclusion section of papers weird. Why should I write what can be concluded from something? Isn’t that something everyone could, and/or should, do themselves?

          “In some ways, I find the conclusion section of papers weird. Why should I write what can be concluded from something? Isn’t that something everyone could, and/or should, do themselves?”

          Good point. But I wonder if this has to do with a phrase I have sometimes encountered (but never really understood) called “need for closure”.

        • Quote from above: “But I wonder if this has to do with a phrase I have sometimes encountered (but never really understood) called “need for closure”.”

          Ooh, not sure about that, but I have thought about whether (certain) social scientists might have a high need for closure.

          From the wikipedia page: https://en.wikipedia.org/wiki/Closure_(psychology)

          “The need for closure is the motivation to find an answer to an ambiguous situation. This motivation is enhanced by the perceived benefits of obtaining closure, such as the increased ability to predict the world and a stronger basis for action.”

          I can totally see how (some) social scientists might have a weird combination of a desire to “improve the world” and a “need for closure” on all the questions about, and problems in, the world. This might lead to an over-emphasis on the importance of science (or “science”), data, etc. I think there was a recent post about “six signs of scientism” on this blog that could relate to this.

          The wikipedia page mentions a “need for closure” scale. Perhaps it could be interesting to measure “need for closure” in scientists from different fields (e.g. social psychology, biochemistry, etc.). I would guess there could be differences between the types of scientists with regard to “need for closure”.

        • Thanks for the link and discussion. It helps me understand (or at least believe I understand) the concept of “closure”. It does sound like people with a “high need for closure” would be resistant to one of the themes that runs through this blog, namely, that uncertainty and variability are just part of the real world that we have to live with, respect, and take into account. I would guess that people who really like the idea of firm decision rules (as contrasted with decision rules formulated by taking into account multiple aspects of a particular type of situation) are expressing a high need for closure (or at least a high need for certainty).

          I guess maybe I don’t have a high need for closure, since to me uncertainty and variability are just part of reality — and I find it frustrating when someone demands certainty when uncertainty seems more realistic (to me, at least). But I believe in trying to respect others’ otherness, so I try to be tolerant (but I don’t always succeed in being tolerant) of people who tend to stuff things into boxes of certainty.

  13. My feeling when I first heard of this was: This is the fair way to do it. Whether it leads to more female hires is irrelevant. One might just as well ask if it results in more people from states whose names begin with an “A”.

  14. Women are discouraged at the college level, in my experience. I was also taught that I had to “play [instrument] like a man.” And I have had people tell me that I do. Also, there were instances where players in an orchestra said they would not play with a woman. And perhaps resumes with women’s names get screened out. Then, through all this, women may not be given the same opportunities for professional experience as men to move along in the profession. There are minority orchestral internships, but I have not seen any for women playing the instruments where men prevail. Then, perhaps, times change, and now MAYBE there are more chances. As a man, you would be considered a “mature” player; as a woman, “old.” How much is truly being in the right place at the right time?

  15. “2000, after all, the stone age in our understanding of statistical errors.” Huh? In my first stat class in 1971 we were taught to not draw conclusions from statistically insignificant results.

  16. From BS King:

    > blind auditions appear to account for about 25% of the increase in women in major orchestras. . . . [But] One of the more interesting findings of the study that I have not often seen reported: overall, women did worse in the blinded auditions.

    How can these both be true? King does try to explain it:

    > This makes a certain amount of sense. If you sense you are a borderline candidate, but also think there may be some bias against you, you would be more likely to put your time in to an audition where you knew the bias factor would be taken out.

    But this doesn’t seem like a sufficient explanation to me. I think this would only create the observed effect if borderline-candidate women are unusually likely to choose blind auditions (as compared to borderline-candidate men + strong-candidate women + weak-candidate women), and I don’t see why this would be true. If women expect bias in auditions, wouldn’t they universally prefer blind auditions? Wouldn’t some men expect bias against themselves, and thus also prefer blind auditions?

    • > this would only create the observed effect if borderline-candidate women are unusually likely to choose blind auditions (as compared to borderline-candidate men + strong-candidate women + weak-candidate women), and I don’t see why this would be true.

      Unfortunately this could easily be the case if borderline-candidate women are more likely to blame bias for their borderline status.

      This is anecdotal, but I see it in both men and women: Those doing less well in a particular system are more likely to blame aspects of a system than themselves for their failure. This is especially the case if certain biases are offered up as possible reasons.
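
      Here’s a minimal sketch of that selection story in R (the numbers and the self-selection rule are invented for illustration): the hiring step involves zero discrimination, yet women’s relative success looks worse in blind auditions simply because weaker female candidates disproportionately choose them.

      set.seed(1)
      n <- 1e5
      skill <- rnorm(n)
      woman <- rbinom(n, 1, 0.5)
      # assumption: the weaker a woman's skill, the likelier she picks a blind audition
      p_blind <- ifelse(woman == 1, plogis(-skill), 0.5)
      blind <- rbinom(n, 1, p_blind)
      hired <- skill > 1.5  # a pure skill cutoff, identical for everyone
      # female minus male success rate, within non-blind (0) and blind (1) auditions
      tapply(hired[woman == 1], blind[woman == 1], mean) -
        tapply(hired[woman == 0], blind[woman == 0], mean)
      # positive without the screen, negative with it: Pallesen's pattern,
      # generated with no discrimination at all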

      • Have Goldin and Rouse responded to the criticism of their 2000 publication? If so, where does it appear? I have not been able to find a response — in particular — to the 2019 WSJ article arguing that the Goldin-Rouse conclusions overstate what their own data show.

        “Blind Spots in the ‘Blind Audition’ Study”
        Christina Hoff Sommers, Wall Street Journal, 21 Oct. 2019, p. A19.
