More on that claim that scientific citations are worth $100,000 each

Earlier today we discussed a stunning claim by scholar and TED talk performer Albert-Laszlo Barabasi:

It’s possible to put actual monetary value on each citation a paper receives. We can, in other words calculate exactly how much a single citation is worth. . . . in the United States each citation is worth a whopping $100,000.

This number is ridiculous on many levels. As one commenter pointed out, there’s no particular reason to use citations here; you could just as well value papers per page, or per formula, or per word.

Also I have an existing suspicion of anything Barabasi writes, based on what I’ve read by him before, for example his claim that Einstein received only 832 letters in all of 1953. I guess maybe, I dunno, but wasn’t everyone writing letters back then? I want to see the evidence for this claim.

Anyway, I was curious to see more details on where this $100,000 number came from. I don’t have a copy of the above-discussed book, so I did some googling. First thing I found were a bunch of extremely positive reviews! But the reviews only discussed the book’s general messages, including the principle, “Performance drives success, but when performance can’t be measured, networks drive success.” At first I was going to say that describes the author’s career! But that wouldn’t be fair at all. I’d attribute his success not to connections or logrolling but rather to his ability to come up with big ideas, along with his willingness to act as if his claims were supported by evidence, when they’re not. The combination of originality, ambition, and lack of scruple can take you far in social science. Not so much in the physical sciences, I’d think: in the physical sciences, performance can be measured. Anyway, it’s striking how universally positive the reviews are, from so many different sources. People looove his message and his delivery, and nobody really seemed to be interested in the details.

I went on Amazon, did a Search Inside This Book for “citations,” and found that the author seems to be completely serious about that $100,000 per citation thing. For example, he writes that a certain physics paper that received 14,000 citations “has had a scientific impact worth an extraordinary $1.4 billion!” [Exclamation mark in the original.] He really is claiming that the formula (sorry, The Formula) applies to individual cases.

Remaining curious where the number came from, I scrolled down to the Notes section of the book and found this:

The estimate on the cost of a citation came from Esteban Moro. According to his unpublished findings, the value of a citation in the United States is slightly above $100,000—so a scientist like Weinberg who, according to Google Scholar, has a paper with 14,000 citations has an impact on science equivalent to roughly $1.4 billion.

This is slightly different because now he’s only giving Weinberg credit for a single one of his papers. He should also get credited for $700 million for his paper that was cited 7000 times, $900 million for his paper that was cited 9000 times, etc.

But let’s continue with the trail. Barabasi’s book was published a few years ago, so maybe the paper by Moro has appeared by now. We’re already at Google Scholar so let’s stay there . . . Moro has published several papers, but I don’t see anything whose title seems to match the above description.

So time to go back to our favorite research tool . . . Google! Searching on *”esteban moro” citations*, I don’t find any published papers or preprints on the dollar value of citations. I found a couple books that Moro co-edited on complex networks that had some papers on citations, but nothing by Moro himself. But it could be that there is such a paper and I couldn’t find it by googling. That happens sometimes. All I could find was this:

But this doesn’t seem right—first, it’s just a twitter post, not quite what we’d usually label as an “unpublished finding”; second, the number is $461k, not $100k; third . . . where exactly does this number come from? What’s the numerator and denominator?

OK, I can try to find out! Google *total r&d spending us* and you get this from NSF:

[The NSF figures show total U.S. R&D spending of roughly $606 billion in 2018.]

What about the denominator? Googling *total citations us* gave some interesting links but I couldn’t find any totals. Trying *total scientific citations* didn’t work either. I kept searching but I couldn’t find any aggregates. It was all samples of some sort or another. But then I searched on *what percent of scientific articles were published in the u.s.* and found this post on a market research site stating that 422,808 science and engineering articles were published in the U.S. in 2018.

Perfect! Both our data sources are for 2018. We can now divide: $606,000,000,000 / 422,000 = $1,400,000 per paper. So now we just need to know the number of citations per paper and we’re done. If the average paper gets 3 citations, then divide by 3 and you’ll get Moro’s number. Maybe the trick to getting Barabasi’s $100,000 number is to use research spending rather than R & D spending.
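To make that division concrete, here is a minimal sketch of the back-of-the-envelope arithmetic; the 3-citations-per-paper divisor is the same illustrative assumption used above, not a measured figure.

```python
# Back-of-the-envelope arithmetic using the 2018 figures cited above.
total_rd_spending = 606e9      # total U.S. R&D spending, in dollars
us_papers = 422_808            # U.S. science & engineering articles published in 2018

dollars_per_paper = total_rd_spending / us_papers
print(f"${dollars_per_paper:,.0f} per paper")        # roughly $1.4 million

# Illustrative assumption: the average paper eventually picks up about 3 citations.
assumed_citations_per_paper = 3
dollars_per_citation = dollars_per_paper / assumed_citations_per_paper
print(f"${dollars_per_citation:,.0f} per citation")  # in the neighborhood of Moro's $461k
```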

So I guess the real story is that if you start with 606 billion dollars of “value,” then any small piece of it will still be a lot. So maybe the appropriate thing to do, if you want to go this route, is to divide by lots more people. For example, take that $1.4 billion paper and credit it not just to Steven Weinberg, but to everyone who helped write it, including all his high school and college math and physics teachers, all his study partners over the years, everyone else he talked physics with, his parents, siblings (if any), wife, the grocer, the milkman, the janitor in the physics department, the people who manufactured the chalk, etc etc etc. This sounds kinda silly but otherwise I think you’re committing some sort of fallacy of counting or aggregation. Really the calculation is impossible but if you’re gonna do it, you should be more careful. If you throw everything in the numerator and nothing in the denominator you’ll get these big numbers.

To flip it around, you could say that our society is paying $100,000 per citation but that’s not really correct because citations aren’t the only product, or even the main product, of R&D spending. You might as well divide total pharmaceutical spending in the U.S. by the number of syringes and use that to say we’re paying $X per syringe. Or take the total money being spent on baseball and divide by the number of Cracker Jacks being sold and say we’re paying that much per box of Cracker Jacks.

Just to be clear: there’s nothing inherently wrong with dividing total research spending by total number of citations—it’s just a number—the problem is in the interpretation, and I don’t see any interpretation problems in Moro’s above-linked twitter post. And it’s not Moro’s fault that Barabasi took his back-of-the-envelope calculation, described it as “unpublished findings,” and gave it a ridiculous interpretation in his book. That’s on the author of the book, not the person who did the calculation.

P.S. I realize there’s another counting problem, which is that citations build up over many years. Take 600 billion dollars of spending and divide by 6 million citations and you get $100,000. But there were a lot more than 6 million citations last year. Barabasi had 18,000 all by himself! Obviously this one guy didn’t have 0.3% of all the country’s citations last year. My Columbia colleague Dave Blei had 12,000, the Why We Sleep guy had another 3000, Brian Wansink had another 3000, Marc Hauser had 3500, . . . heck, Steven Weinberg had a few too! There’s a lot of researchers out there, and when we add them up we’ll get a number that’s a lot more than a few million. So I think we’re seeing a major problem with the counting, on top of everything else.
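As a rough check on that denominator, here is a small sketch; the 30-references-per-paper figure is a made-up but plausible assumption, included only to show the order of magnitude.

```python
# How many citations per year does the $100,000-per-citation figure implicitly assume?
implied_citations = 600e9 / 100_000
print(f"{implied_citations:,.0f} implied citations per year")   # 6,000,000

# Assumption for illustration: each of the ~422,808 U.S. papers cites about 30 references.
us_papers = 422_808
assumed_refs_per_paper = 30
citations_handed_out = us_papers * assumed_refs_per_paper
print(f"{citations_handed_out:,.0f} citations handed out in one year")  # about 12.7 million
```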

24 thoughts on “More on that claim that scientific citations are worth $100,000 each”

  1. I skimmed through a bunch of reviews too and noticed the same pattern. Reviewers often seemed predisposed to endorse a book that espoused “networkism”. I’ve increasingly had the experience of reading forgettable books on the basis of positive reviews that, in retrospect, were motivated by some sort of ideology or tribal attachment.

    (Don’t ask me which books. I’ve already forgotten.)

    • Peter:

      There’s also some fear of looking foolish. If you follow the second link above, you’ll see that when asked to review one of Barabasi’s books, I gave it a largely positive review. In part this is because I do think he has some interesting ideas, but it’s mainly because . . . if you give something a positive review, nobody will mind, but if you give a negative review, you have to be really careful that you didn’t make any mistakes. So there’s a tendency to give a positive review just in case.

  2. It seems hard to make an intuitive estimate of the number of citations each paper will receive, since the numbers vary wildly. How about the number of citations each paper makes? That’s a much more well-behaved statistic. If each paper makes an average of 14 citations, we’re magically at Barabasi’s number.

    Although the interpretation is different. That’s how much it costs to make a citation, not how much a citation is worth. Clearly we should all be citing each other less; we’ll be able to get more research with the same amount of funding.
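For concreteness, and assuming the roughly $1.4 million-per-paper figure from the post, the arithmetic in this comment works out as:

```python
dollars_per_paper = 1.4e6      # from the post's division above
refs_per_paper = 14            # the commenter's assumed average number of references
print(dollars_per_paper / refs_per_paper)   # 100000.0, i.e. Barabasi's $100k
```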

    • “That’s how much it costs to make a citation, not how much a citation is worth.”

      This is an enormously important point. Barabasi is using the inputs of the process as a measure of value (harkening back to Smith’s and Marx’s labor theory of value), instead of seeing value as a function of the output (which, let me emphasize (though many academics might disagree), is not citations).

  3. >What about the denominator? Googling *total citations us* gave some interesting links but I couldn’t find any totals. Trying *total scientific citations* didn’t work either. I kept searching but I couldn’t find any aggregates. It was all samples of some sort or another. But then I searched on *what percent of scientific articles were published in the u.s.* and found this post on a market research site stating that 422,808 science and engineering articles were published in the U.S. in 2018.

    >[…]

    >I realize there’s another counting problem, which is that citations build up over many years

    Wouldn’t it be easier to find the average number of citations each paper makes, rather than the average number of citations each paper receives? The two should be equivalent, and the number of citations made per paper is easier to measure. (It’s fixed over time, it can be found by looking at the paper itself, and it has fewer outliers.)

    • This seems like a reasonable approach to figuring out the average number of citations that a paper will get (which must be equal to the average number of references that a paper cites), although this could vary a bit by field. afaik most math papers only cite three or four other papers, while immunology might cite 60 or more. Still, an assumption of around 40 seems reasonable.

      • Those two distributions are totally different so I doubt you can substitute or approximate one with the other. The distribution of citations received is extremely skewed with a spike at zero. The distribution of citations given has no zeroes and is much less variable.
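As a toy illustration of this sub-thread, here is a small sketch with an invented citation graph: when every cited paper is in the set, the mean number of citations given necessarily equals the mean number received, even though the two distributions look quite different.

```python
# Toy citation graph: each paper lists the papers it cites (all made up).
references = {
    "A": ["B", "C", "D"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
    "E": ["C", "A", "B", "D"],
}

# Citations "given" by each paper and citations "received" by each paper.
given = {p: len(cited) for p, cited in references.items()}
received = {p: 0 for p in references}
for cited_list in references.values():
    for p in cited_list:
        received[p] += 1

print(given)     # {'A': 3, 'B': 1, 'C': 0, 'D': 2, 'E': 4}
print(received)  # {'A': 1, 'B': 3, 'C': 4, 'D': 2, 'E': 0}

# The totals (and hence the means) match, because every reference is counted
# once on each side; only the shapes of the two distributions differ.
print(sum(given.values()) / len(given), sum(received.values()) / len(received))  # 2.0 2.0
```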

  4. Note as well that the R&D in the numerator comes mostly from industry where the D part dominates and the R is not really about publishing your discoveries (though it happens to some extent in some industries, in particular in healthcare).

    • Great point! Since D dominates the value creation, the real question is how much of the R actually matters, i.e. has meaningful impact on D? I would suspect very little.

  5. Would it be any better to just take total grant dollars received for a given academic and divide by the total number of citations since the grants started? I suppose you could also keep accumulating results as the career goes by; the answer is the cost per citation, and now you can compare the efficiency of different academics.

  6. “Also I have an existing suspicion of anything Barabasi writes, based on what I’ve read by him before…”

    In The Formula, he writes about the “fact that women earn roughly seventy cents to their male counterparts’ dollar” in the United States. This doesn’t even have support in the most ham-fisted unconditional research (typically that number is closer to 78 cents). And if you do put in at least a tiny bit of effort you can find research by the likes of Claudia Goldin placing the number in the high 90 cents after conditioning properly — you know, when comparing actual counterparts. Somehow no one bothered to take him to task for this.

    Maybe his typical reader thought it had the ring of Truthiness to it and so they didn’t bother with any of that tedious healthy skepticism.

  7. Regarding The Formula applying to individual cases (and this is of course tongue-in-cheek): at least all these never-cited papers are literally worth nothing…and next time I self-cite I better think about the costs to our nation’s research budget…

  8. Interesting estimates and calculations! If a paper can really “cost” 1.3 M $ on average, then that means that some researchers may get more bang for the buck (e.g., because their papers gobble up only $500,000 per paper on average) whereas others are much less cost-conscious (e.g., as much as $5 M per paper on average). To some extent this may be an issue of scientific subdisciplines. Doing research on a linear accelerator is bound to be more costly than a survey study. No surprises here.

    But WITHIN a given discipline one could derive and apply an efficiency metric (e, akin to the Hirsch index h) to evaluate how frugal a researcher was, relative to others in her field, in acquiring and spending money. Thus, if e is just total cumulative funding / total cumulative publications, then lower resulting numbers (e.g., $30,000 per paper) are better than higher numbers (e.g., $300,000 per paper). One could then derive a further index (let’s call this one e2) by dividing this number by the total number of citations a researcher has accumulated. Then a synopsis of h and e2 would be much more informative than either index alone. A person with a high h and a particularly low e2 would be someone who has achieved a lot of impact without squandering resources. A person with a high h and a high e2 would be just as impactful, but not as much into “lean production” as the former. Low h and high e2 = a hoarder who is unable to transform all the funding into research that has impact. And so on. (A rough numerical sketch of e and e2 appears below.)

    All of this with the caveat that of course indices are only approximations and may miss important points. Just think Peter Higgs here, who has a low h (because he never published much) and an even lower e2 (because he never needed much funding to do theoretical physics).

    The reason why I have been thinking about this issue is that in Germany universities just loooove to hire and retain people who rake in tons of funding money, often regardless of whether such funds are transformed into relevant and impactful research. They behave a bit like factory owners who buy all kinds of fancy, expensive machinery, but without regard for whether this results in sufficient supply of a needed product. An index like e2 could help to rein in this evidence-of-funding mania in hiring, tenure, and retention decisions.
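A minimal sketch of how the e and e2 indices proposed above might be computed; the researcher labels and all funding, paper, and citation numbers below are hypothetical.

```python
# Hypothetical researchers: (cumulative funding in dollars, papers, citations received).
researchers = {
    "frugal_theorist": (200_000, 40, 3_000),
    "big_lab_pi": (20_000_000, 100, 8_000),
    "funding_hoarder": (15_000_000, 30, 500),
}

for name, (funding, papers, citations) in researchers.items():
    e = funding / papers   # dollars of funding per publication (lower is "leaner")
    e2 = e / citations     # funding per publication, per citation received
    print(f"{name}: e = ${e:,.0f} per paper, e2 = ${e2:,.2f}")
```

Under this toy scheme the hypothetical "funding_hoarder" ends up with a much higher e2 than the others, which is the pattern the comment above is trying to flag.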

  9. >But WITHIN a given discipline one could derive and apply an efficiency metric

    Not really. Different papers, or rather different writing styles, have different amounts or different densities of content. You could split your findings over many papers and so on. The number of citations is even more worthless, as it indicates popularity (within the field or with a couple of other authors) and not quality. The more citations you have, the easier it is for others to find the paper, and the more citations the paper will get in the future.

    Also, everybody seems to think that they are judged by others by the number of citations of their work, even though they themselves don’t judge others’ individual works by their citation counts. This is rather weird.

    >The reason why I have been thinking about this issue is that in Germany universities just loooove to hire and retain people who rake in tons of funding money

    Since it’s also easier to get funds if you have gotten funds in the past.

  10. “To flip it around, you could say that our society is paying $100,000 per citation…”

    Hey, neat, that’s what I said yesterday. (“I find it useful to flip the frame around… it is also consistent with the cost of a citation being $100,000.”)

  11. Very few science papers cost anywhere close to $1.4M. In my physical chem lab, which is on the expensive side, I probably average ~$60k/paper. The ridiculous cost/paper and cost/citation numbers seem to result from the $600B of total R&D funding including a lot of applied/development work that does not lead to papers or citations. If you take the DoD system (the biggest funder) where 6.1 = basic research, 6.2 = applied research, 6.3 = advanced development, the spending increases roughly an order of magnitude from 6.1 to 6.2 and again from 6.2 to 6.3. The fraction of basic research generating papers is higher at other agencies, but for every agency other than NSF, applied research and development account for more than half the funding. https://en.wikipedia.org/wiki/Science_policy_of_the_United_States

  12. When I started work at Lawrence Berkeley National Laboratory, almost thirty years ago, there was a lot of emphasis on papers published in refereed journals. Publishing “white papers” — official Lab publications that were internally reviewed — was OK but when annual review came around it was really only journal articles that counted for anything. A common expectation was that you’d be an author of about three papers, and principal author of one of them. My first few years, I was funded by grants that explicitly had the goal and/or requirement of developing and publishing new results or methods, and that’s what the people who evaluated me at LBNL were also interested in, so that was fine.

    But post-2001 I was working on stuff that had potential relevance to terrorist attacks — how quickly do anthrax spores get tracked around a building, how helpful is it to have good vs bad air filters, that sort of thing. (I had gotten started in this general area a few years earlier, when I was working on ways of measuring air circulation and mixing in buildings, and measuring the spatial distribution of indoor pollutants, without a terrorism-specific motivation, but after 9/11 and the subsequent anthrax attacks the focus of the funders shifted to be more terrorism-specific). The funders of the new work cared a lot more about developing specific capabilities for real-world implementation than they did about scientific publication. So, for example, a grad student worked with me to develop a reasonable model for predicting how much of a toxic outdoor plume would eventually penetrate into buildings (as a function of building size and air exchange rate and so on), and this was eventually integrated into a software tool used by the National Atmospheric Release Advisory Center, which makes forecast maps of risks from things like industrial accidents and fires. We were allowed to publish papers based on this sort of work but the funders didn’t really care if we did or not…except that it takes time and effort to write a paper, and that’s time and effort that we could spend working on more of the stuff they did care about, so there was a feeling that we had to cheat a little bit in order to find the time to write the papers.

    And about ten years after that I was working in a different area — energy efficiency in buildings, specifically how to use time-series data on electricity usage to quantify various aspects of building performance and to detect faults or energy-saving opportunities — and in a few cases we were specifically told _not_ to write any papers, because the funders didn’t want us “wasting our time” on that stuff. Some of these funders wanted a piece of software, or a website with specific functionality, or whatever, and they saw scientific publications as a waste of time. Arguably, in that context they may have been right.

    And — a more general issue that cuts across many disciplines — it takes a lot of time and effort to write a paper and get it reviewed, and then when it comes out it’s usually only available to the small number of people who have a subscription to the journal…it might be better for everyone if people wrote up their discoveries on blogs, or just wrote their internally-reviewed white papers (which, at least at LBNL, are freely available online) and skipped peer-reviewed publications for almost everything. I do think peer review can be valuable, and, all else equal, I’m more likely to ‘believe’ a paper if it has been reviewed than if it hasn’t, but (as amply documented on this blog) lots and lots of crap makes it past peer review.

    All of which is a long-winded way of agreeing with what is obvious to everyone (except maybe Barabasi) right from the start: the idea that the only metric of success for government-funded R&D is the number of citations that are generated is ridiculous.
