“We can keep debating this after 11 years, but I’m sure we all have much more pressing things to do (grants? papers? family time? attacking 11-year-old papers by former classmates? guitar practice?)”

Someone pointed me to this discussion by Lior Pachter of a controversial claim in biology.

The statistics

The statistical content has to do with a biology paper by M. Kellis, B. W. Birren, and E.S. Lander from 2004 that contains the following passage:

Strikingly, 95% of cases of accelerated evolution involve only one member of a gene pair, providing strong support for a specific model of evolution, and allowing us to distinguish ancestral and derived functions.

Here’s where the 95% came from. In Pachter’s words:

The authors identified 457 duplicated gene pairs that arose by whole genome duplication (for a total of 914 genes) in yeast. Of the 457 pairs 76 showed accelerated (protein) evolution in S. cerevisiae. The term “accelerated” was defined to relate to amino acid substitution rates in S. cerevisiae, which were required to be 50% faster than those in another yeast species, K. waltii. Of the 76 genes, only four pairs were accelerated in both paralogs. Therefore 72 gene pairs showed acceleration in only one paralog (72/76 = 95%).

In his post on the topic, Pachter asks for a p-value for this 72/76 result which the authors of the paper in question had called “surprising.”

My first thought on the matter was that no p-value is needed because 72 out of 76 is such an extreme proportion. I guess I’d been implicitly comparing to a null hypothesis of 50%. Or, to put it another way, if you have 76 pairs, that is 152 genes, of which 80 were accelerated (I think I did this right and am not butchering the technical jargon: I got 80 by counting one gene from each of the 72 pairs with only one accelerated paralog plus two from each of the 4 pairs with both paralogs accelerated), it would be extremely unlikely to see only four pairs with acceleration in both.

But, then, as I read on, I realized this isn’t an appropriate comparison. Indeed, the clue is above, where Pachter notes that there were 457 pairs in total, thus in a null model you’re working with a probability of 80/(2*457) = 0.087, and when the probability is 0.087, it’s not so unlikely that you’d only see 4 pairs out of 457 with two accelerated paralogs. (Just to get the order of magnitude, 0.087^2 = 0.0077, and 0.0077*457 = 3.5, so 4 pairs is pretty much what you’d expect.)
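
For concreteness, here’s that back-of-the-envelope calculation as a few lines of R. To be clear, the scatter-the-accelerated-genes-at-random null below is just my own rough sketch, not a model that anyone in the original exchange was fitting:

n_pairs <- 457                          # duplicated gene pairs
n_acc <- 80                             # accelerated genes: 72 + 2*4
p <- n_acc / (2 * n_pairs)              # per-gene acceleration rate under this null, about 0.087
p^2 * n_pairs                           # expected pairs accelerated in both paralogs, about 3.5
# or scatter the 80 accelerated genes at random across the 914 genes and count doubles
sims <- replicate(1e4, {
  hits <- sample(rep(1:n_pairs, 2), n_acc)   # pair label of each accelerated gene
  sum(duplicated(hits))                      # pairs with both members accelerated
})
mean(sims)                              # again about 3.5
quantile(sims, c(0.025, 0.975))         # seeing 4 such pairs is entirely unremarkable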

So it sounds like Kellis et al. got excited by this 72 out of 76 number, without being clear on the denominator. I don’t know enough about biology to comment on the implications of this calculation on the larger questions being asked.

Pachter frames his criticisms around p-values, a framing I find a bit beside the point, but I agree with his larger argument that, where possible, probability models should be stated explicitly.

The link between scientific theory and statistical theory is often a weak point in quantitative research. In this case, the science has something to do with genes and evolution, and the statistical model is what allowed Kellis et al. to consider 72 out of 76 to be “striking” and “surprising.” It is all too common for a researcher to reject a null hypothesis that is not clearly formulated, in order to then make a positive claim of support for some preferred theory. But a lot of steps are missing in such an argument.

The culture

The cultural issue is summarized in this comment by Michael Eisen:

The more this conversation goes on the more it disturbs me [Eisen]. Lior raised an important point regarding the analyses contained in an influential paper from the early days of genome sequencing. A detailed, thorough and occasionally amusing discussion ensued, the long and the short of which to any intelligent reader should be that a major conclusion of the paper under discussion was simply wrong. This is, of course, how science should proceed (even if it rarely does). People make mistakes, others point them out, we all learn something in the process, and science advances.

However, I find the responses from Manolis and Eric to be entirely lacking. Instead of really engaging with the comments people have made, they have been almost entirely defensive. Why not just say “Hey look, we were wrong. In dealing with this complicated and new dataset we did an analysis that, while perhaps technically excusable under some kind of ‘model comparison defense’ was, in hindsight, wrong and led us to make and highlight a point that subsequent data and insights have shown to be wrong. We should have known better at the time, but we’ve learned from our mistake and will do better in the future. Thanks for helping us to be better scientists.”

Sadly, what we’ve gotten instead is a series of defenses of an analysis that Manolis and Eric – who is no fool – surely know by this point was simply wrong.

In an update, Pachter amplifies upon this point:

One of the comments made in response to my post that I’d like to respond to first was by an author of KBL [Kellis, Birren, and Lander; in this case the comment was made by Kellis], who dismissed the entire premise of my challenge, writing “We can keep debating this after 11 years, but I’m sure we all have much more pressing things to do (grants? papers? family time? attacking 11-year-old papers by former classmates? guitar practice?)”

This comment exemplifies the proclivity of some authors to view publication as the encasement of work in a casket, buried deeply so as to never be opened again lest the skeletons inside it escape. But is it really beneficial to science that much of the published literature has become, as Ferguson and Heene noted, a vast graveyard of undead theories?

Indeed. One of the things I’ve been fighting against recently (for example, in my article, It’s too hard to publish criticisms and obtain data for replication, or in this discussion of some controversial comments about replication coming from a cancer biologist) is the idea that, once something is published, it should be taken as truth. This attitude, of raising a high bar to post-publication criticism, is sometimes framed in terms of fairness. But, as I like to say, what’s so special about publication in a journal? Should there be a high barrier to criticisms of claims made in Arxiv preprints? What about scrawled, unpublished lab notes??? Publication can be a good way of spreading the word about a new claim or finding, but I don’t don’t don’t don’t don’t like the norm in which something that is published should not be criticized.

To put it another way: Yes, ha ha ha, let’s spend our time on guitar practice rather than exhuming 11-year-old published articles. Fine—I’ll accept that, as long as you also accept that we should not be citing 11-year-old articles.

If a paper is worth citing, it’s worth criticizing its flaws. Conversely, if you don’t think the flaws in your 11-year-old article are worth careful examination, maybe there could be some way you could withdraw your paper from the published journal? Not a “retraction,” exactly, maybe just an Expression of Irrelevance? A statement by the authors that the paper in question is no longer worth examining as it does not relate to any current research concerns, nor are its claims of historical interest. Something like that. Keep the paper in the public record but make it clear that the authors no longer stand behind its claims.

P.S. Elsewhere Pachter characterizes a different work of Kellis as “dishonest and fraudulent.” Strong words, considering Kellis is a tenured professor at MIT who has received many awards. As an outsider to all this, I’m wondering: Is it possible that Kellis is dishonest, fraudulent, and also a top researcher? Kinda like how Linda is a bank teller who is also a feminist? Maybe Kellis is an excellent experimentalist but with an unfortunate habit of making overly broad claims from his data? Maybe someone can help me out on this.

Comments

  1. The claim of fraud relates to the supplemental material and parameters for a paper whose method may or may not be mathematically unsound. The main issue is that there was either an error or something didn’t work, and the supplemental material for the paper was changed, apparently fundamentally changing the analysis or contradicting other claims in the paper. This was done without any notice to readers. I think this was poor practice at best. Given that Lior has charged fraud, Kellis has of course been ultra-defensive about it. That is a summary of the other issues (hopefully accurate in spirit, though all the details are in the earlier posts and comments, and were also summarized recently in a comment of Lior’s in light of the p-value discussion).

    • That’s a key feature of a lot of these affairs: once someone isn’t tactful in broaching an issue and labels things strongly (“fraud”), the other person gets ultra-defensive. Ultimately it becomes something like a trial by media.

  2. Perhaps if being wrong were not a Very Bad Thing.

    It might not be the same everywhere, but scientists I have interacted with in US plant science definitely view being wrong as Very Bad. A retraction is career threatening. A correction not much better.

    Publication of something that is simply wrong (not fraudulent, not even incompetent) is, per my understanding, a catastrophic demonstration of unemployable, unfundable (attach more negative phrases) idiocy.

    However, almost by definition, every scientist publishes things that are wrong.

    This is all enshrined in many facets of science. Talks, seminars, and manuscripts portray science as a perfectly logical procession of experiments that all return usable data which carefully support the conclusions. All of the failures, wrongly conceived experiments, and mistakes in data collection are removed.

    A major component of science, as practiced, is about appearing to never be wrong. It is no surprise scientists are so highly allergic to having their work criticized.

    • This is also why academia is so dysfunctional – we learn by making mistakes. If mistakes are career-threatening, then once we become academics we stop learning, apparently.

  3. Andrew wrote:
    “So it sounds like Kellis et al. got excited by this 72 out of 76 number, without being clear on the denominator.”
    The “denominator problem” mentioned here reminds me of a point Gigerenzer often makes: relative risk vs. absolute risk. Drug companies often trumpet the former and ignore the latter when a disease is rare. That is, a medication or treatment may hardly affect the overall population yet seem to have a dramatic influence when the proper denominator is ignored.
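
    A toy version in R, with made-up numbers rather than anything from an actual trial:

    risk_control <- 2 / 10000            # invented baseline risk of a rare disease
    risk_treated <- 1 / 10000            # invented risk on the drug
    risk_treated / risk_control          # relative risk = 0.5: "cuts your risk in half!"
    risk_control - risk_treated          # absolute risk reduction = 0.0001
    1 / (risk_control - risk_treated)    # number needed to treat = 10,000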

    • Richard:

      Yes, I agree with everything at that link. Again, one way to see this is to ask where to draw the line. Is it only papers in Science and Nature that deserve the sort of deference that the authors of those papers are demanding? Or any top journal (Psychological Science, say)? Or any journal at all, including more specialized outlets such as the Journal of Theoretical Biology (notorious for publishing that beauty-and-sex-ratios paper)? Or any journal at all, including the journals that publish 90% of submissions? Or Arxiv postings? Or scrawled lab notes . . .?

  4. No bunny cites 11-year-old papers. We cite reviews that cite 11-year-old papers, which makes digging out the ossified originals doubly tough, but easier than in the old days when the library had moved them to trailers in East Nowhere that had to be unpacked to get the volume you needed. Another reason to like online access.

    • I have always had a bit of a liking for reading the original article. And not just the abstract. I have seen papers quoted that had no connection at all to the issue being addressed, papers whose actual results contradicted the summary in the review article and the paper’s abstract, etc.

      A favourite quote on this:
      One of the things I have learned from reading secondary sources on historical cooking is that you should never trust a secondary source that does not include the primary, since you have no way of knowing what liberties the author may have taken in his “interpretation” of the recipe.
      David Friedman http://www.daviddfriedman.com/Medieval/To_Milk_an_Almond.pdf

  5. It is too easy to get a theory “accepted” in general. People simply need to demand and expect precise a priori predictions before considering an idea to be more than preliminary. Usually in biology/medicine the truth is so far away from that ideal it is sad.

    Recently I looked into the CRISPR-Cas9 technique (e.g., https://www.ncbi.nlm.nih.gov/pubmed/26121415), which is supposed to allow cutting out arbitrary genome segments. There has been much hype surrounding this method over the last few years, but afaict they have no quantitative model (why should it only work for x% of cells? why is proliferation slowed by the amount it is? etc.).

    The supplements of that paper show that the modifications they claim to cause already appear to be present, in low amounts, at baseline. Take a culture consisting of 3.5% “premodified” cells (the sensitivity they claim for their assay) and have your treatment suppress proliferation of the other 96.5%. Then after 7 doublings I’d think your culture will consist of about 80% “modified” cells without cutting any DNA at all. The simplest model is N(d) = N0*2^d, where d is the number of divisions and N0 the initial number of cells. However, if only a subset of cells is dividing and we want the proportion of modified cells, it looks like this:

    d <- 0:7                        # number of doublings
    m <- 0.035 * 2^d + 0.965        # growing "premodified" fraction plus the non-dividing 96.5%
    (0.035 * 2^d) / m               # proportion "modified" after d doublings (about 0.82 at d = 7)

    We’d need to account for confluence (eventually there is no more room in the dish for more cells) and other factors to make any predictions, but that seems like a start. My point is that they have nothing like this that even gives a general idea of what is thought to be happening. Even if my simple idea above is wrong, there is no reason to be confident in their explanation. At this point, if it is some experimental artifact, too much money and hype has been thrown around to admit it.
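
    For example, one crude way to fold confluence into the same toy model (the carrying capacity and starting cell count below are invented, just to show the shape of the calculation):

    K <- 5e4                                        # invented carrying capacity of the dish
    mod <- 0.035 * 1e4                              # "premodified" cells in an invented starting culture of 10,000
    unmod <- 0.965 * 1e4                            # suppressed cells, assumed not to divide
    for (d in 1:7) mod <- min(2 * mod, K - unmod)   # double each division until the dish fills up
    mod / (mod + unmod)                             # about 0.81 here, versus about 0.82 with unlimited growth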

      • Can you link to one that rules out my explanation? I just read another one where they comment on the mysterious presence of “modified” cells in the control groups: http://www.pnas.org/content/early/2015/07/21/1512503112.abstract

        If these cells already exist in the culture and they are giving a treatment that selects for them, it can easily account for the majority or all of the modified cells they detect 4 days later. That is two papers where they do not mention this possibility. You cannot just stop at getting a “significant” result and jump to the conclusion your explanation is correct, which is what appears to be going on here. I am sure people can think of others as well. What other explanations have they ruled out?

        • There are literally hundreds of papers showing CRISPR gene editing in many different organisms. It is at this point a totally established technique being used in labs in pretty much every biology department in the world. Some examples:

          http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3714591/
          http://www.genetics.org/content/195/3/1187.abstract
          http://link.springer.com/article/10.1007%2Fs11103-014-0263-0
          http://www.cell.com/cell-reports/abstract/S2211-1247(15)00262-4

          Getting CRISPR to work in a new organism is a major undertaking, and the fact that it doesn’t work in some cases does not make the whole field “hype.” That is just dumb.

        • Thanks, I will check out those specific papers. But I’m not sure you understand my point. My point is that (from my reading of two papers) it appears they are using similar methods and getting the same result, which is great. However, afaik these results can also be explained via a different, less interesting, mechanism. Just because many people replicated the result does not mean they have interpreted it correctly.

          This is the whole “just because the null hypothesis is false doesn’t mean your theory is true” issue that is covered on this blog extensively. I am not saying that they *have* misinterpreted the data, only that the alternative explanation I raised above has not been ruled out or even mentioned in what I have read. If you know of any papers that do that experiment, please let me know.

        • Phil, in the first paper (Bassett et al. 2013) there is more evidence for what I am claiming. Look at Table 1. First, there is an inverse relationship between survival of the embryos and % modified adults (consistent with a selection effect).

          Second, in contrast to the other two papers I mention, they report 0% modified under control conditions. However, there are no sample sizes or error bars reported there. Look at Table 2, and we see that under one condition they observed 0% modification using the method of Table 1, but a more sensitive method detected 4% modified embryos. So the assay used to measure the controls was not sensitive enough to rule out the pre-existing presence of modified cells. Thus, the possibility of a selection effect remains standing.

          I really don’t care specifically about CRISPR/Cas9; this example is only meant to highlight the common logical error being made. Just because you observe a difference between two groups does not let you conclude it is there for your preferred reason.

        • Looked at the others. Waaijers et al. appears to be missing the negative control, but also reports that the treatment is toxic. Gao et al. contains a negative control, but it is of no use to us because we are considering the presence of small numbers of mutant cells. We need to know what the sensitivity of this assay is to small amounts of mutated DNA. Kistler et al. has a similar problem: we only see gels with one control example. Once again, though (Table S3), we see an inverse correlation between percent mutated and survival.

          But maybe I missed it or got confused? I am not perfect. What experiment in those papers do you think rules out my alternative explanation?

        • “I just read another one where they comment on the mysterious presence of ‘modified’ cells in the control groups:”

          Where is this comment in the paper?

  6. “Not a “retraction,” exactly, maybe just an Expression of Irrelevance? A statement by the authors that the paper in question is no longer worth examining as it does not relate to any current research concerns, nor are its claims of historical interest.”

    Perhaps another sign that the paper-and-ink model of scientific publication, still preserved even as journals have moved online, is a counterproductive relic.

    It seems that there should be some onus on authors to periodically review their past work and unequivocally comment on which papers they still stand behind, and which ones are better to just be forgotten.

    With today’s technology, this can even be done on a “line item” basis.

    • To me this raises the question of whether we want an empirical paper to represent the likelihood or the posterior. I think it possible to have both, but it would be bad for the process of knowledge generation if we retroactively censored results that did not match current understanding. Of course, if the result is found to be based on faulty assumptions then that is fine, but I think there is probably a broad grey area.

  7. Some probably irrelevant background here:
    Manolis Kellis is a computer scientist working on genomic research. Interestingly, Eric Lander (one of the founding fathers of large-scale genomic research in this country) was Kellis’s PhD advisor and was on Pachter’s thesis committee when Pachter did his PhD in math at MIT. More interestingly, Pachter’s advisor, Bonnie Berger, was also on Kellis’s thesis committee.

  8. I am not a biologist or geneticist or statistician, but whatever. Reading that discussion (instead of writing a grant for a paper about how guitar practice helps with family time), I think I understand why 72 out of 76 was a “wow” moment regardless of the denominator. There was, or maybe still is, a biological theory that duplicated genes cannot evolve however they want (Pachter’s and others’ null). Natural selection will allow only 2 models of behavior — either one copy is allowed to evolve quickly (and gain new function) while the other will maintain the old sequence and function or both of the copies will evolve at the same rate as long as changes in them are complementary. The third possibility, that neither copy will evolve fast, is simply a class of its own, not something that provides the base rate.
    Of course, I make no attempt to judge whether that hypothesis was reasonable at the time, remains reasonable now, was correctly implemented (there were accusations of researcher degrees of freedom / garden-of-forking-paths choices in the original paper), etc. Just a small reminder that it is not always obvious what is a reasonable null hypothesis.

  9. Not sure just what you’re trying to say (so I’m not sure whether my comments support or are contrary to the point you are trying to make), but
    “Natural selection will allow only 2 models of behavior — either one copy is allowed to evolve quickly (and gain new function) while the other will maintain the old sequence and function or both of the copies will evolve at the same rate as long as changes in them are complementary,” seems like an unrealistically simplistic view of natural selection.

    • Well, as I’ve said, I am not a biologist, but this is essentially the gist. The first idea goes by “neofunctionalization” and the second one by “subfunctionalization”.

    • And, yes, the idea that 2 copies of a gene evolve in a correlated manner seems to be pretty reasonable, because whatever they do, they must maintain the old functionality. But why there should be a dichotomy, I don’t know.

      • “the idea that 2 copies of a gene evolve in a correlated manner seems to be pretty reasonable, because whatever they do, they must maintain the old functionality.”

        My understanding is that the usual assumption is that *one* copy of the gene must retain the old functionality, so the other can evolve without losing the functionality for the organism as a whole.

        (I’m not a biologist, but I did for several years participate in a biology seminar where we read and discussed papers of interest to the students. Participants from non-biology fields were welcome, especially if they could contribute by clarifying questions of math, statistics, CS, etc., and especially if they didn’t mind serving on some biology Ph.D. committees.)

        • I think you are describing neofunctionalization. I guess the idea is that as soon as a loss-of-function mutation occurs in one copy, the other copy of the gene is locked by selection, but the mutated copy is free to mutate further and acquire a new function. Subfunctionalization happens if the protein the gene encodes has several functions. A mutation can diminish one function in one copy but enhance another, and from that moment the two copies will diverge by developing specialized proteins.

  10. Two thoughts:

    1) Sociologist Harry Collins, in his in-depth, near-participant-observer, long-term studies of gravitational-wave astronomy, distinguishes two doctrines of publication: “methodological individualism” and “methodological collectivism.”

    The former, which Collins finds prevalent in American science, holds that each publication stands on its own as a statement of truth and that the authors must stand behind and defend its findings. Under this norm, it is not OK to publish results that are suggestive or interesting but not conclusive. The individualists believe especially that publishing striking results that are not secure can mislead the community and divert others’ research down blind alleys.

    The latter (“collectivism”), which Collins found prevalent in Italian research, holds that each publication is just part of a giant field-wide meta-analysis, and it is wrong to withhold data just because they are hard to understand or appear anomalous. In particular, the collectivists believe that withholding these data may confuse other research groups finding similar things and believing their own findings to be outliers.

    It seems that opponents of criticizing old papers must be individualists (although the converse need not hold, i.e., individualists might be all in favor of criticizing old papers, but anyone shielding old papers could not be a collectivist).

    2) The problem of public criticism of other researchers is not restricted to empirical publications. Any time there are potential social impacts on a critic for calling attention to errors in others’ work, whether that work is theoretical, empirical, published in a journal, or in working-paper form, such criticism is likely to be muted. In my opinion, one of the advantages of a field developing competing semi-dogmatic “schools” is that these provide for more criticism of bad ideas, as a member of school A gets social support for taking down the work of a researcher in school B.
