How do you measure success in an academic field such as astronomy?

David Hogg writes:

This paper is blowing up the astronomy internets for reasons that are perhaps adjacent to statistics methodology but highly relevant to statistics. The goal of the paper is to develop citation metrics that could be used to aid in hiring, but the methodology involved having famous old astronomers rank the career successes of a large collection of other astronomers of a range of ages. That procedure has very rightly pissed off the community, and seems ethically questionable (and maybe compounded by the fact that the famous oldsters are listed by name in Table 1; it’s transparent, at least!). As usual, the paper over-interprets its findings as being about predicting success, when in fact, if the results are about anything, they are about predicting the opinions of famous old people. But your take would be interesting if you’d be willing to go that far inside baseball. If you want links to some of the Twitter storm (which ranges from angry to hilarious to apologetic), I can give links.

I replied that this all seems kinda circular, no? In medical research, you could measure progress in terms of lives saved or made more comfortable. In business-related fields, you could measure progress in terms of commercial success. In social psychology, you could measure progress in terms of number of NPR, Ted, Gladwell, and Nudge appearances, minus the number of appearances in Retraction Watch. But in astronomy, the measure of “future impact of astronomy research careers” is what, exactly? Number of new stars discovered?

I’m not saying this as a diss on astronomy. Pure science is wonderful. But this sort of analysis just seems kinda circular, no? It’s like the Billboard Top 100 except that nobody outside the field gives a damn. Or am I missing something? And, no, please don’t send me any Twitter links!

Hogg responded:

Yeah, I told you that it was Inside Baseball! But yes, and presumably the pundit rankings are based on the very publications that were used in the metrics. I read some of the Twitter threads and they point out typical bad stats methods issues. But I think your point below is the main point, which has to do with what constitutes merit. (Plus a little bit of creepiness about old dudes ranking younger scientists and calling it merit.) I wouldn’t say you are missing anything!

This all reminds me of other stuff I hate, like people arguing about what are the top 10 departments, or gossip about who’s gonna get what job or whatever, that whole attitude in which academia is not about research or teaching but about the building of careers.

19 thoughts on “How do you measure success in an academic field such as astronomy?”

  1. Aside from being useless in any meaningful sense, the clear issue here is how this kind of obsession with measuring ‘impact’ is an excellent tool for further marginalizing people of color, women, and others who are systematically underrepresented in the top ranks of these fields.

    • “an excellent tool for further marginalizing people”

      That struck me as well. From the paper:

      “In choosing people to invite, I emphasized senior people who have had extensive histories of leadership and experience in judging science across discipline boundaries.”

      This is an exact echo of the language used in industry to justify the formation of a “red team.” Red teams are formed when the old guard gets uneasy and feels disenfranchised by what the new guard is doing. The purpose is to dial back the clock.

      Also from the paper:

      “I emphasize that the goal of reference (1) and of this paper is to estimate impact accrued, not impact deserved. Historically, some people who made major contributions were, at the time, undervalued by the astronomical community.”

      If you can name the beast, it can’t kill you!

  2. “In social psychology, you could measure progress in terms of number of NPR, Ted, Gladwell, and Nudge appearances, minus the number of appearances in Retraction Watch.”

    Gold

    • Other fields are just as guilty of overselling research in NPR/TED appearances, as well as ending up in Retraction Watch. It’s not a social psychology problem – the root causes are much more systemic.

  3. A paper like this would have been understandable (though still pointless) 15 years ago, when metrics like the h-Index were new and fashionable. Now, however, does anyone actually think these serve any purpose other than letting administrators or administrator wanna-be’s substitute a shallow pseudo-rigor for actual subject-matter understanding when making planning decisions? (They, in my experience, being the only ones who use these “metrics.”)

    Impact is certainly real, but it can only be assessed by actually assessing someone’s work, ideally decades after it’s done.

  4. An interesting update: https://chandra.as.utexas.edu/apology.html
    “The PNAS paper and arXiv preprint have been withdrawn as thoroughly as the publication system allows.”

    I’m not sure what I think of this. On the one hand, I dislike the approach of this (and similar) papers — see above. On the other hand, it’s disappointing that the author is *that* unsure of their own work. (I try to think of all the ways my papers are awful *before* submitting them…)

  5. Re: Parthasarathy’s update: I think it would be a wasted opportunity if the paper ends up just getting buried. I’m sure it’s embarrassing to the author, but better by far would be to append formal responses from marginalized astronomers, with a rejoinder that’s supportive of those criticisms. The data in the article are also a unique opportunity for describing bias at the highest levels of a field. It’s actually a hugely valuable thing to predict systemic bias, and when else are you going to get a bunch of “famous old people” to volunteer their biases?

  6. There seem to be three separate issues:

    1) Is it possible to come up with a reasonable way to “measure success in an academic field”? Don’t we all agree that the answer is Yes? If not, then the tenure committees at Ivy League schools are wasting a lot of time.

    2) Assuming 1), can we predict in 2021, who will be highly ranked by that measure in 2031 to 2036? This is, obviously, an empirical question, but I would bet that the answer is Yes, to at least some extent. Again, does anyone at this blog disagree?

    3) Assuming 1) and 2) are possible, the main remaining issue is: Does this paper do a good job? I don’t know! I have not read the paper. But asking experienced members of field X to evaluate other members in field X is fairly common, both outside and inside academia. Right? Again, isn’t that (partially) how tenure committees work?

    • I had a much longer comment written up, but I will just say that the tenure review process as I have witnessed it in a couple of social science fields is absolutely not measuring success in an academic field, except in a weird circular way where you’re a success because you hit the paper/grant/etc metrics you needed to hit to be considered successful by your department and/or peers. Whether the work is good is measured by proxy, if at all.

    • Isn’t part of the problem specifically that the measure of success is based on the preferences of a set of people who almost by definition are ‘insiders’ – old, well-connected, senior academics? Then as a result success in the field becomes circular – the things that are judged as successful are the things that successful people like. Even if one is unconcerned about the potential impact on members of marginalised groups, such a system would presumably disadvantage those who are unlike those senior people in various ways, perhaps not working on the kinds of problems that they favour, or using approaches that are in some sense novel or unorthodox.

      To put it another way, I wonder if this kind of system would be less objectionable if the people judging what constitutes ‘good’ research were a more balanced slice of academic astronomy, e.g. rankings of papers were crowdsourced, or derived from a jury or panel that represents a wide range of career stage/ institutional prestige/ academic subspecialty etc.

  7. D:

    As explained in my post above, it seems to me any definition of academic success in astronomy will be more circular than definitions of academic success in more applied fields. I guess the same could be said of pure math: success in pure math is the ability to solve problems that pure mathematicians consider to be important. I guess one way to do this would be to list the problems. In astronomy perhaps there is some implicit list as well. Also, some astronomy is applied in various ways. I don’t know enough about astronomy to go further. I can see why people within the field would be interested in predicting success in 2036, as measured by citation counts, but as an outsider I don’t really care. I’d be more interested in hearing what are the research areas that are considered important, and how these have changed over the years. But, yeah, sure, just because this paper reminds me of stuff I hate, that doesn’t mean it’s useless; it just means that it’s not the kind of thing that interests me.

    • I’m pretty sure I couldn’t predict who’s going to be part of a scientific breakthrough (or lesser advance); perhaps I might have an opinion of who is capable of one.

  8. > any definition of academic success in astronomy will be more circular than definitions of academic success in more applied fields

    I am confused by this. First, do you mean “applied [academic] fields?” If so, I think that the definitions of success in astronomy are very similar to those in more applied fields, like say statistics and political science.

    > success in pure math is the ability to solve problems that pure mathematicians consider to be important.

    Exactly right. And success in (academic) statistics or (academic) political science is exactly the same. And just so, I assume, in most every other academic field. You get tenure at Columbia by writing articles/books which other academics consider to be good and/or influential.

    > as an outsider I don’t really care.

    Me either! Except . . . the reaction to the paper . . . and the comment from Hogg about “ethically questionable” and the show-trial nature of the author’s apology . . . all these things reek of the influence of the woke . . .

    That is, the main complaint is not so much that this paper is a bad attempt to forecast success in astronomy (although it might be that as well!); the main complaint is that the very attempt to do so, the very claim that some astronomers are better (at academic astronomy) than other astronomers, is “ethically questionable.” And that way lies madness, and the destruction of all standards . . .

    • D:

      As I wrote above, in medical research, you could measure progress in terms of lives saved or made more comfortable. In business-related fields, you could measure progress in terms of commercial success. In statistics you could measure progress by the use of a method in applied fields. I’m not saying that it’s easy to construct such measures, just that in these applied fields there are some external measures of success, corresponding in some sense to wins in sports. In most of astronomy, I don’t see any clear external measures, only internal measures such as citations within astronomy, peer ratings, or success in solving problems as defined by the field itself.

      I agree that the author’s apology seems a bit over the top, at least to an outsider such as myself who does not know any of the context here.

      • You could use doing good science.

        I.e., making otherwise surprising predictions that turn out accurate along with figuring out methods that generate reliable/reproducible data.

      • Andrew,

        You may be lumping all sub-specialties of astronomy in one basket but I could think of a few cases where progress could be measured, if we define progress as practical use to humanity.
        For example, by mapping more and more celestial objects, navigation of satellites that use those objects as reference points can be improved. In turn we can achieve more precision when measuring climate, earth wobble, etc.
        Precision has increased over time and we have better telescopes and such due to astronomy.
        I’m not sure if that’s what you meant by measuring progress though.

  9. Why is old people judging the merit of young people creepy? That form of merit evaluation has been central to pretty much every civilization ever. The cool thing is that nearly all of us will get old, so we get to be on both sides of it. And the older version of you knows more than the younger version of you. So honestly it seems pretty fair to me.

    • The responses here are to various aspects of all this, and aren’t all consistent. From my perspective, the problem with this study is not the old-people-evaluating part — that’s fine and “expertise” is a real and valuable thing! The problem is in the metrics — their aims, use, and limitations.
