The 200-year-old mentor

Carl Reiner died just this year and Mel Brooks is, amazingly, still alive. But in any case their torch will be carried forward, as long as there are social scientists who are not in full control of their data.

The background is the much-discussed paper, “The association between early career informal mentorship in academic collaborations and junior author performance.”

Dan Weeks decided to look into the data from this study. He reports:

I [Weeks] think there are a number of problematic aspects with the data used in this paper.

See Section 13 ‘Summary’ of

How can one have a set of mentors with an average age > 200? How can one have 91 mentors?

Always always graph your data!
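Even before graphing, a crude range check will surface values like these. Here is a minimal sketch in Python. The field names (`n_mentors`, `avg_mentor_age`), the thresholds, and the toy records are all hypothetical, made up for illustration and not taken from the paper's data:

```python
from collections import Counter

# Hypothetical toy records: field names and values are invented for
# illustration and are NOT drawn from the paper's dataset.
records = [
    {"protege_id": "a", "n_mentors": 3, "avg_mentor_age": 14.0},
    {"protege_id": "b", "n_mentors": 91, "avg_mentor_age": 20.0},
    {"protege_id": "c", "n_mentors": 2, "avg_mentor_age": 212.5},
    {"protege_id": "d", "n_mentors": 5, "avg_mentor_age": 9.0},
]


def flag_implausible(rows, max_avg_age=100.0, max_mentors=50):
    """Return rows whose values exceed (assumed) plausibility thresholds."""
    return [
        r for r in rows
        if r["avg_mentor_age"] > max_avg_age or r["n_mentors"] > max_mentors
    ]


def text_histogram(values, bin_width):
    """Crude text histogram: one line per non-empty bin, one '#' per value."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return [
        f"{lo:>5}-{lo + bin_width:<5} {'#' * counts[lo]}"
        for lo in sorted(counts)
    ]


flagged = flag_implausible(records)
for row in flagged:
    print("check this row:", row)

for line in text_histogram([r["avg_mentor_age"] for r in records], 50):
    print(line)
```

The thresholds are judgment calls, not facts about the data. The point is only that a lone bar far off to the right of a histogram is hard to miss.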

Now whenever people discuss mentoring, I’m gonna hear that scratchy Mel Brooks voice in the back of my head.


  1. The big problem with this paper (and many like it) is not the speculation, it’s the problem of measurement: the thing the authors want to measure is not the thing they can measure. As I noted in an earlier comment, their measure of “quality” is actually “number of citations” — correlated, perhaps, but definitely not the same thing.

    The same problem holds for quantifying “number of mentors.” How, one wonders, can they get a massive dataset of who mentored whom? The answer is that they don’t. They count co-authors, and equate co-authorship by a senior person as mentorship. This is, as anyone who does science knows, nonsense. Almost certainly, one co-authors papers with one’s mentors. One also co-authors papers with collaborators who are certainly not mentors in any meaningful sense. How does one have 91 mentors? By the authors’ measure, this is easy — write a handful of 20-author papers (not uncommon), and you’ll easily amass “mentors.”

    The “science of science” is a trendy field, for good reason. From the edges, though, it seems plagued by this problem: the things about which one can collect a lot of data are not the things one actually cares about. It’s tempting — too tempting, apparently — to pretend the former equal the latter, maybe tossing in a few words of caution, but still acting as if one’s conclusions are much, much more meaningful than they are.

    Of course, I could write the same thing about the cottage industry of election forecasting based on polling “data…”

    • Mike says:

      +1 from me too. Unfortunately, the twitter outrage storm is not focused on these important and valid critiques of the quality or interpretation of the measurements in the article. The outrage is focused on the direction of the result (worse outcomes with more female mentors). If the outcome had been better in a different direction (better outcomes with more female mentors), one suspects that much of the twitter conversation would be congratulatory. And the measurement problems you highlighted would still be mostly ignored or misunderstood.

      • jim says:

        A “Twitter Outrage Storm”. OMFG. Can you get federal insurance in case that damages your career or reputation, just like you would with a real storm for your house?

        Andrew!!! It’s the new forecasting frontier! Twitter Weather! You could develop a whole suite of attributes to measure the probability of a Twitter Outrage Storm on any of thousands of topics! Dude! Call your company “GMAN”!! IPO that RFN!! Back door into the S&P500 with SPAC, you don’t even need a prospectus, much less an employee or a bank account. Just go on a TED talk!

    • elin says:

      Even the gender measure doesn’t measure gender. It’s based on first names, which anyone who has looked at even just the US names corpus knows is highly problematic and should not be used to attribute gender at the individual level. See some of the sources cited here.

      Reading the reviews, I can’t believe that it even got an R&R.

    • I’m curious to see what Nature Comm decides to do when their review is complete, since this paper is an interesting case study in how divergent people’s views can be on what should be done about misleading interpretations. The Science piece you shared cites this conclusion as the problematic part: “Our gender-related findings suggest that current diversity policies promoting female-female mentorships, as well-intended as they may be, could hinder the careers of women who remain in academia in unexpected ways,” and “Female scientists, in fact, may benefit from opposite-gender mentorships in terms of their publication potential and impact throughout their post-mentorship careers.” But since the authors use terms like ‘could’ and ‘may’, these sentences would seem to be making relatively weak statements about these interpretations being compatible with what they find. If the reviewers don’t find any big errors in the analysis, will they still leave a warning on it because it’s not a balanced discussion (i.e., the authors don’t acknowledge all the related work on citations and bias and gender that could provide a very different interpretation of what they find)? Or because it’s deemed too potentially harmful? And then how does this kind of thing affect incentives to report data that are believed to go against widely held values?

      • jim says:

        Maybe I’m missing something – I haven’t read the paper – but for all I can tell the results of this paper are consistent with male bias against women. In other words, the female mentors are less successful because there is a male bias against them.

        Is there anything in this paper to refute that?

        • No, there is nothing that would clearly refute that, so that explanation would be consistent with what they find. What I was commenting on was more that it seems difficult to declare them obviously right or wrong based on the results they present – there would have to be further analysis done to show that the other explanation is a lot more plausible, or major errors found in their analysis (which there may be if they allow for things like average mentor ages of +200, it’s just not clear yet what exactly would change in their results). Technically they are implying some uncertainty in their causal interpretations with terms like ‘may’ and ‘could’, they are just not mentioning what many people (including me) think are more plausible explanations based on prior work. So I find it interesting in that their errors are similar to the ways lots of published research suffers from overlooking important prior work or measurement issues, but with a much higher level of scrutiny due to what they are saying.

          • Elin says:

            It’s the jump from finding an association that basically is in line with what we know about gender inequality in academia to speculating about specific but unnamed formal mentorship programs that, for all we know, are very different from what happens in the informal co-authorship settings.
