
Post-publication peer review: who’s qualified?

Gabriel Power writes:

I don’t recall that you addressed this point in your posts on post-publication peer review [for example, here and here — ed.]. Who would be allowed to post reviews of a paper? Anyone? Only researchers? Only experts?

Science is not a democracy. A study is not valid because a majority of people think it is. One correct assessment should trump ten incorrect assessments.

In other words, how do we qualify the tradeoff between “fewer, deeper reviews” and “many, shallower reviews”? When does the former work best? The latter?

I think everyone should be allowed to post reviews. But this does raise the question of what to do if trolls start to invade. I guess we can address this problem when we get that far. Right now, the problem seems to be not enough reviews, not too many.


  1. Andrew, wouldn’t you have some time to look at this recent paper? It has generated a lot of media and public attention, but the crucial statistical technique in the methodology section (how the authors decide that a “cluster” is meaningful and non-spurious) is presented without any references to the statistical literature. It seems to me that the heuristic could identify “clusters” even in situations without any cluster structure, e.g., for some unimodal multivariate densities.

    • Andrew says:


      You can divide the population of humans into 1 cluster, or 7 billion, or any number in between! See for example here.

      • Sure, but this is the opposite of how it is presented in that extremely popular paper in Nature Human Behaviour. They identified exactly 4 “clusters” by a rather suspicious technique of their own invention, in a data set that looks featureless (apart from some minor ripples in an almost perfectly multivariate normal density), and hundreds of newspapers have taken it as fact. Why are real statisticians (the authors of the paper are not statisticians) so silent?

        OK, I know that you must be extremely busy; thank you very much for the answer above. I appreciate what you are doing.
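        The worry above is easy to demonstrate in a few lines. Here is a minimal sketch in plain NumPy, using a hand-rolled k-means as a stand-in for any clustering method (nothing here is taken from the paper itself): the algorithm carves a single featureless Gaussian blob into however many “clusters” you ask for.

```python
import numpy as np

rng = np.random.default_rng(0)
# One multivariate normal blob in 5 dimensions: no real cluster structure.
X = rng.standard_normal((2000, 5))

def kmeans(X, k, iters=50, seed=0):
    """Bare-bones Lloyd's algorithm; a stand-in for any clustering method."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers, keeping the old center if a cluster empties.
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(X, k=4)
sizes = np.bincount(labels, minlength=4)
print(sizes)  # k-means returns 4 groups whether or not the density has 4 modes
```

        The output alone cannot tell you whether the density actually has four modes, which is why a spuriousness heuristic presented without references deserves scrutiny.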

        • Andrew says:


          I’ve not looked at this particular paper in detail. In general, I’m suspicious of statistical analyses that report that there are D dimensions of some attitude or behavior, or that there are K kinds of people out there. I usually think of the number of dimensions or the number of clusters as a descriptive choice: for example, if you want to divide the population into 4 clusters, here is a useful division. Elsewhere in the paper they discuss finding 13 clusters, but I think that with enough data they should be able to find hundreds of clusters. It all depends on what you’re looking for.

          It’s not so much that I find their specific technique suspicious (although I don’t see a good reason for them to use the so-called BIC), but rather that any such approach has to be seen as a form of description of a particular dataset. In parts of their paper, the authors recognize this; for example, “different scales will lead to different factor scores—even when measuring the same five personality domains, two questionnaires might use different items. This issue will be exacerbated when considering alternative representations of the space of personality traits, for example, the 30 facets of the FFM, the 6 domains of the HEXACO inventory or the 27-dimensional SAPA Personality Inventory.” But then elsewhere they go for a more Platonic attitude that there is a universal truth they are aiming for: “An empirically justified taxonomic system of personality types offers a coarse-grained abstraction on the distribution of personality traits across individuals, in analogy to the distinction between different groups of elementary particles (for example, fermions or bosons) in physics or different species in biology. Such a classification is potentially useful in applied contexts, such as in clinical settings related to psychopathology or vocational settings.” I don’t buy the analogy to fermions and bosons at all.

          Also I’m bothered by this: “our key technical insight reveals that even state-of-the-art clustering algorithms will only find the correct solution by searching for a larger number of clusters than what could exist in the data and will fail to identify the abundance of mostly spurious clusters . . .” At a mathematical level, this statement seems reasonable to me—indeed, when I talk with students who want to do clustering or mixture modeling, I typically recommend fitting more clusters than you think you’ll need, and then check that you don’t need the extra clusters—see Section 22.4 of BDA3, where we write, “The number of clusters H can be viewed as an upper bound on the number of mixture components, as some of these components may be unoccupied. For example, if we set H = 20 it can still be the case that all n items in the study are allocated to a small number of the 20 components available, for example 3 or 4.” The part I don’t like is the idea of there being “the correct solution” in an applied problem in which the number of clusters is entirely a property of the number of people in your sample, and the density of data that is available on each person.
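          The “fit more components than you think you need, then check occupancy” advice can be sketched in a toy example. This is a plain-NumPy EM fit of a 1-D Gaussian mixture, not the Bayesian treatment in BDA3; the simulated data, component count H, and the 2% occupancy threshold are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data drawn from 3 well-separated components.
y = np.concatenate([rng.normal(-4, 1, 300),
                    rng.normal(0, 1, 300),
                    rng.normal(5, 1, 300)])

H = 10  # deliberately more components than we expect to need
w = np.full(H, 1.0 / H)
mu = rng.choice(y, size=H, replace=False)
sigma = np.full(H, y.std())

for _ in range(300):
    # E-step: responsibility of each component for each point.
    dens = w * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
    # M-step: update weights, means, and (floored) standard deviations.
    n = r.sum(axis=0) + 1e-12
    w = n / len(y)
    mu = (r * y[:, None]).sum(axis=0) / n
    sigma = np.maximum(np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / n), 0.1)

occupied = (w > 0.02).sum()
print(f"{occupied} of {H} components carry non-negligible weight")
```

          Typically some of the extra components end up with small weight, and the number of “occupied” components describes this particular dataset and sample size rather than revealing a universal K.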

          In answer to the question, “Why are real statisticians so silent?”, I guess it’s that nobody asked us!

  2. Anonymous says:

    “Science is not a democracy. A study is not valid because a majority of people think it is.”

    Recent events have led me to fear that this is exactly where (at least psychological) science is heading.

    I think they are already well on their way to adding the “appeal to the majority” to their discussion tactics, next to the still ever-popular “appeal to authority”.

    Also see the Binswanger paper “Excellence by Nonsense,” specifically the chapter on “crowding out” individuals, linked to here:

    (Side note: the link to the paper in that blog post no longer works for me, but you can read the paper here as well.)

    I also predict that many upcoming proposed “improvements” (also concerning peer review) will involve things that give power, attention, and influence to “experts” and to “the majority”/“large groups”, instead of giving power, attention, and influence to (the importance of) sound reasoning, facts, evidence, etc.

  3. Eric Rasmusen says:

    I just can’t resist being a bit smarmy. When I saw the title of this post I thought it would be about the Ted Hill double retraction kerfuffle. In light of that, maybe the more relevant question is “Secret Post-Publication Emails Arguing for Retraction: Who’s Qualified?”

    • Andrew says:


      No problem. Snark is encouraged in this part of the internet! Seriously, though, the above post was in the queue and was written about 6 months ago. It’s just that these issues keep coming up, over and over, in different forms. And, in response to your (hypothetical) question: First, I think that everyone’s qualified to send emails to anybody. I get emails from strangers all the time. Second, I would prefer if such comments were made in public, and for that it would help to have more and better forums for post-publication review. You can’t, for example, attach post-publication criticism to a paper on Arxiv, and not everyone has access to a blog with wide readership. I suppose it’s not that difficult to raise a quick fuss on Twitter if you use enough hyperbole and hashtags, but that works against sober review. For these reasons I’m supportive of efforts to make post-publication review easier and more accessible, which I hope will reduce the chances of future debacles such as what happened to the Hill and Tabachnikov paper. (And the existing Arxiv, for all its imperfections, has already served us well by preserving all the uploaded versions of that article.)

  4. gec says:

    Perhaps this is a function of my coffee being slow to kick in—and perhaps this idea already exists—but I could imagine some kind of system like this:

    1) Each registered user is able to write a review of a posted manuscript, with their name being explicitly/publicly attached to the review.

    2) In addition, each user is able to up/down vote other reviews, similar to reddit. But unlike reddit, those votes are not anonymous; the user’s name is attached and represents an explicit public endorsement of the review. Votes may, of course, be accompanied by comments (“I agree with X but not Y”), and comments may be written without the need for an up/down vote. “Commenting” is how authors would initially respond to reviewers, I imagine.

    3) Each user maintains a “bank” of positively-scored reviews they have written. Posting a manuscript “costs” N reviews which are then deducted from the bank (people say you should review 3 times as many papers as you plan to submit, so maybe N=3). In the case of multiple authors, the cost must be divvied up, perhaps just one author paying or maybe 3 authors each pay 1. This would allow, e.g., students to publish with their advisor paying the cost.

    4) Revised manuscripts could be posted as new versions of the manuscript (with review/comment strings appropriately assigned to the different versions), if the commenting system is not sufficient to respond to reviews. But submitting a revision also costs something, less than the original, but still maybe 1-2 reviews since revisions, in turn, need to be reviewed too.

    Obviously this system would not be perfect, in particular step 2, which is designed to mitigate trolls, probably wouldn’t be enough and would need editors/moderators to intervene. And the revision system would need more thought. And this system could still be gamed, either to overwhelm a manuscript with bad reviews (that then get endorsed by friends of the reviewer) or to generate an explosion of manuscripts that inflates the apparent productivity of a small community (where someone gets their friends to review, then those friends use that to pay for their own publication which then gets reviewed by the first person, etc. etc.). But as noted many times in various places on this blog, these problems already exist in the peer review system, they are just more hidden, so perhaps opening up the box is a good start?
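    The bookkeeping in steps 1–4 could be prototyped in a few lines. Everything below (the class names, the cost constants, the “only positively-scored reviews count” rule) is illustrative, not a description of any existing system:

```python
from dataclasses import dataclass

SUBMIT_COST = 3    # hypothetical: "review 3x as many papers as you submit"
REVISION_COST = 1  # hypothetical cheaper cost for posting a revision

@dataclass
class User:
    name: str
    credits: int = 0  # bank of positively-scored reviews (step 3)

class ReviewBank:
    def __init__(self):
        self.users = {}

    def register(self, name):
        self.users[name] = User(name)

    def credit_review(self, reviewer, score):
        # Only reviews that end up positively scored add to the bank.
        if score > 0:
            self.users[reviewer].credits += 1

    def submit(self, shares, cost=SUBMIT_COST):
        # shares maps author -> credits paid; any split covering the cost
        # is allowed, so e.g. an advisor can pay for a student.
        if sum(shares.values()) < cost:
            raise ValueError("not enough review credits to cover submission")
        for name, paid in shares.items():
            if self.users[name].credits < paid:
                raise ValueError(f"{name} lacks {paid} credits")
            self.users[name].credits -= paid

bank = ReviewBank()
bank.register("advisor")
bank.register("student")
for score in (+1, +1, +1, -1):  # three upvoted reviews, one downvoted
    bank.credit_review("advisor", score)
bank.submit({"advisor": 3, "student": 0})  # advisor covers the whole cost
print(bank.users["advisor"].credits)  # 0: the bank was spent on the submission
```

    Revisions (step 4) could reuse the same `submit` call with `cost=REVISION_COST`. The gaming concerns noted above (review rings, pile-ons) would live outside this accounting layer, in the moderation rules.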

    • Andrew says:


      These could be good ideas. I think the real limiting factor is the effort required, first in setting up such a system, then in getting enough people to buy into it to contribute these reviews, then in getting continued participation, and then in keeping it running smoothly.

      It could be possible—just to take two examples, both Arxiv and CRAN work well, have the trust of the scientific community, and are set up so that it is possible to use them without being overwhelmed by junk. I recently wrote about a new system that some people are implementing; maybe this will also catch on.

      Or maybe some existing commercial system such as Facebook or Google will figure out how to monetize the system (link biology papers to ads for pharmaceuticals? link psychology papers to ads for self-help books? link political science papers to campaign ads? etc), and everyone will be publishing there. I have no idea.

  5. Nick says:

    In traditional publishing, reviewers serve a gatekeeper function. I’m genuinely uncertain about what peer review even means in a post-publication model. Will we be citing articles with a number in parentheses that represents the percentage of positive reviews?

  6. Matthew Poes says:

    I think a common problem with the current peer review process is that a lot of papers are reviewed by peers at the journal who do not have sufficient statistical or methodological knowledge to suss out the problems. A lot of papers are published using inappropriate statistics, data coding, data, etc. The ideas and writing may be fine, and the study design may also be fine, but the statistics that everything hinges on are not. I do think that post-publication peer review could help. I’ve written plenty of emails to journals pointing out errors in papers that never should have been published. Nothing ever comes of it, and nobody is any the wiser.
