Being polite vs. saying what we really think

We recently discussed an article by Isabel Scott and Nicholas Pound entitled "Menstrual Cycle Phase Does Not Predict Political Conservatism," in which Scott and Pound definitively shot down some research that was so ridiculous it never even deserved the dignity of being shot down. The trouble is, the original article, "The Fluctuating Female Vote: Politics, Religion, and the Ovulatory Cycle," had been published in Psychological Science, a journal that is psychology's version of Science, Nature, and PPNAS: a "tabloid" that goes for short, sloppy papers with headline-worthy claims. So Scott and Pound went to the trouble of trying and failing to replicate; as they reported:

We found no evidence of a relationship between estimated cyclical fertility changes and conservatism, and no evidence of an interaction between relationship status and cyclical fertility in determining political attitudes.

The thing that bugged me was when Scott and Pound wrote:

Our results are therefore difficult to reconcile with those of Durante et al, particularly since we attempted the analyses using a range of approaches and exclusion criteria, including tests similar to those used by Durante et al, and our results were similar under all of them.

As I wrote earlier: Huh? Why “difficult to reconcile”? The reconciliation seems obvious to me: There’s no evidence of anything going on here. Durante et al. had a small noisy dataset and went all garden-of-forking-paths on it. And they found a statistically significant comparison in one of their interactions. No news here.

Scott and Pound continued:

Lack of statistical power does not seem a likely explanation for the discrepancy between our results and those reported in Durante et al, since even after the most restrictive exclusion criteria were applied, we retained a sample large enough to detect a moderate effect . . .

Again, I feel like they're missing the elephant in the room: "Lack of statistical power" is exactly what was going on with the original study by Durante et al.: They were estimating tiny effects in a very noisy way. It was a kangaroo situation: like using a bathroom scale to weigh a feather, while the feather sits in the pouch of a kangaroo that's vigorously jumping up and down.
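
To see what "estimating tiny effects in a very noisy way" does in practice, here's a minimal simulation sketch. The effect size, noise level, and group sizes below are made-up numbers for illustration, not anything from Durante et al.'s actual design; the point is just that when the true effect is tiny relative to the noise, the rare "significant" result from a small study is necessarily a wild exaggeration of it.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.05            # hypothetical tiny true effect
sigma = 1.0                   # noise in the outcome measure
n_small, n_large = 30, 500    # hypothetical original vs. replication group sizes

def one_study(n):
    """Simulate one two-group comparison; return (estimate, significant at .05?)."""
    a = rng.normal(true_effect, sigma, n)   # e.g., "high fertility" group
    b = rng.normal(0.0, sigma, n)           # comparison group
    est = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    return est, abs(est / se) > 1.96

for label, n in [("small noisy study", n_small), ("larger replication", n_large)]:
    sims = [one_study(n) for _ in range(10_000)]
    ests = np.array([s[0] for s in sims])
    sig = np.array([s[1] for s in sims])
    print(f"{label}: rejection rate ~ {sig.mean():.2f}; "
          f"mean |estimate| when significant ~ {abs(ests[sig]).mean():.2f} "
          f"(true effect = {true_effect})")
```

With numbers like these, the small study rejects only a few percent of the time, and when it does, the estimate is typically an order of magnitude larger than the true effect. That's the kangaroo: the scale isn't precise enough for the thing being weighed.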

Anyway, I suspect (but am not sure) that Scott and Pound agree with me that Durante et al. were chasing noise and that there's no problem at all reconciling what had been found in that earlier study (bits of statistical significance in the garden of forking paths) with what was found in the replication (no pattern).

And I suspect (but am not sure) that Scott and Pound wrote the way they did in order to make a sort of minimalist argument. Instead of saying that the earlier study is consistent with pure noise, and their replication is also consistent with that, they make a weaker, innocent-sounding statement that the two studies are "difficult to reconcile," leaving the rest of us to read between the lines.

And so here’s my question: When is it appropriate to make a minimalist argument, and when is it appropriate to say what you really think?

My earlier post had a comment from Mb, who wrote:

By not being able to explain the discrepancy with problems of the replication study or “theoretically interesting” measurement differences, they are showing that the non-replication is likely due to low power etc of the original study. It is a rhetorical device to convince those skeptical of the replication.

I replied that this makes sense. I just wonder when such rhetorical devices are a good idea. The question is whether it makes sense to say what you really think, or whether it's better to understate to make a more bulletproof argument. This has come up occasionally in blog comments: I'll say XYZ and a commenter will say I should've just said XY, or even just X, because that would make my case stronger. My usual reply is that I'm not trying to make a case, I'm just trying to share my understanding of the problem. But I know that's not the only game.

P.S. Just to clarify: I think strategic communication and honesty are both valid approaches. I'm not saying my no-filter approach is best, nor do I think savviness is always the right way to go. It's probably good that there are people like me who speak our minds, and people like Scott and Pound who are more careful. (Or of course maybe Scott and Pound don't agree with me on this at all; I'm just imputing to them my attitude on the scientific questions here.)

41 thoughts on “Being polite vs. saying what we really think”

  1. From a practical point of view it's quite an easy decision. If the reviewer says it's not valid to state your opinion, then you need to use a rhetorical device or some other obfuscation strategy (or just remove the opinion from the publication).

  2. Mb (the same one as in the post):

    I don’t think this rhetorical device is inconsistent with saying what you mean, nor do I think that it necessarily means understating the case. Walking the reader through possible explanations and finding them lacking is an important rhetorical tool to show the skeptical reader that the prior result appears to be noise.

    • Mb:

      What you wrote above, I agree with.

      But what about Scott and Pound’s statement, “Lack of statistical power does not seem a likely explanation for the discrepancy between our results and those reported in Durante et al”? This seems just flat-out false. Lack of statistical power seems to me a very likely explanation for the discrepancy.

        • Martha:

          That could be, but in any case I think it’s incorrect of them to dismiss the possibility that the two studies are completely consistent with each other, with all differences attributable to random noise.

        • It is not incorrect at all. Under the assumption that the effect is real, the original study obviously doesn't have a problem with power, as it found the effect (however unlikely that was to happen a priori), and they are saying their replication study was sufficiently powered, so it is "weird" that they failed to replicate this effect. You could continue and say, under the assumption the effect is not real, the original study committed a Type I error, so an unusual/weird event also occurred. The "difficult to reconcile" refers to, in my mind, the fact that under each hypothesis, something weird has occurred. It is a very diplomatic way of arguing. They may be implying that the only way to remove the weirdness is to allow for the possibility that the Type I error was actually much higher than nominal (i.e., p-hacking).
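
          A quick back-of-the-envelope illustration of that last point (the number of comparisons k here is purely hypothetical, not a count of anything Durante et al. actually did): if the original analysis effectively had several plausible forking-path comparisons available — subgroups, interactions, exclusion rules — then the chance of at least one nominally significant result under the null is well above 5%, and the "weird" Type I error stops looking so weird.

          ```python
          # Chance of at least one "significant" result under the null when k roughly
          # independent forking-path comparisons are available; k is hypothetical.
          alpha = 0.05
          for k in (1, 5, 10, 20):
              p_any = 1 - (1 - alpha) ** k
              print(f"k = {k:2d} comparisons: P(at least one p < .05 | null) = {p_any:.2f}")
          ```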

  3. Let’s not dance around the issue at hand. We do a professional disservice to understate what we believe – professional courtesy does not require us to permit work to appear more respectable than it really is. However, our obligation is not to overstate our results either. Failure to replicate can mean many things, and there is a fine line between believing somebody has studied and reported noise and feeling we have sufficient “proof” to that effect. It is difficult to define where that line is and it may well differ between people. Our reputations should partially result from how we make that personal determination.

    To make this more concrete, when I try to replicate someone's results and am not able to do so, how far I am willing to go depends largely on whether I have been able to access the same data they used (for experimental studies, this would be more involved, since replication usually means collecting new data). When I have the same data they used and either cannot replicate their results or find that the results depend critically on which forking path they took, I am inclined to be blunt about what I believe to be true. When I can't really investigate it that fully, I tend to be more "polite" and reserved. I may still rail against the inability to get access to the data, particularly if I have tried and the authors have not cooperated.

    • Dale:

      I agree about not being overconfident, and there would be no good reason for Scott and Pound to say, “The replication was unsuccessful, therefore the claims in the earlier study are wrong.”

      But I think it is correct to say that the replication and the earlier study are both consistent with a null or very small effect. That is, I disagree with Scott and Pound’s statement that their results are “difficult to reconcile with those of Durante et al.” They’re not difficult to reconcile at all! To say that the two studies are difficult to reconcile is to create a mystery where none exists.

  4. I think the issue is that you're taking into account how the sausage is made, whereas I think the tendency, at least in psychology, is to take previous papers at face value. If authors write that they had a theory that made certain predictions and they ran the analyses they describe (and only those analyses), and they aren't described as exploratory or speculative or whatever, then there's no particular reason to argue with the outcome. In that situation, it is hard to explain why the results would differ. But you find it natural to reconcile them because you doubt the theory and think they ran (or could have run) other tests.

    So then I guess the question is why should we take papers at face value, and presumably that’s been answered by the empirical evidence of what happens when you say that someone did something other than what they wrote in their paper: whether you call it forking paths or researcher degrees of freedom or whatever else, they get upset and feel like they have been accused of lying or making up the results. At the very least, you’re telling them that their big, possibly career-making research is blatantly wrong. No one takes well to that, and science is supposed to be a place of decorum (right?), so why start a fight?

    • Related to Alex’s last two sentences: “Know your audience” is a maxim of good communication. It may be that Scott and Pound realized that stronger phrasing would prompt defensiveness from the audience they were trying to reach, so that their reserved approach might have a better chance of getting the message across.

      • Martha:

        Yes, I suspect you’re right. This is the dilemma I was raising in my post: to what extent is it a good idea to say something false (that the two studies are “difficult to reconcile”), if you think that, for your intended audience, that false statement will be more palatable than the truth?

        • Scott and Pound may have been operating on the assumption (perhaps valid) that the situation was indeed difficult to reconcile for the audience they were trying to reach, and that acknowledging that difficulty (whether cognitive or emotional or both) would be their best bet for promoting further thought in that audience.

          Or, to try to make my point another way: “difficult” is a variable, not a constant, so the truth or falsehood of the statement, “the two studies are difficult to reconcile” depends on the (neglected) question “for whom?”. The statement is false for you, but you would need very strong evidence to generalize that to all people.

  5. I get this quite a lot when I choose to use the word “plagiarism”. I get told (and it’s often well-meaning advice) that this is a bad rhetorical strategy. I usually stick to my guns because I often don’t have some other “larger” point. My argument is rooted essentially in the fact that a passage has been plagiarised. I would feel strange saying simply that it’s “difficult to reconcile” how so-and-so referenced (or failed to reference) a particular text with the facts about that text itself.

  6. Interesting. I read “difficult to reconcile” as meaning something along the lines of “these two things cannot be simultaneously true, so one of them is false.” What you call a “reconciliation” is actually a rejection of one of the two conclusions, not a synthesis.

    That is, you can reconcile two conclusions if both can be true. If researcher A does a study on mice and finds X, then researcher B does the same thing to humans and finds Y (or not X or whatever), you can “reconcile” the conclusions by saying that X is true of mice but Y is true of humans. When you do the same study the same way and get different conclusions, you can’t “reconcile” them because one of the two conclusions is false.

    I hardly think that is a "minimalist" argument. It would perhaps be clearer to say "these things are difficult to reconcile and we're pretty sure that our conclusion is the correct one," but I think any paper implicitly asserts that its conclusions are the correct ones (why else would you write it?).

    • I see what you are saying, but if you only look at the statistical results and not at the conclusions the two groups make, there isn’t a problem with reconciling two things. Two facts are both true: one under-powered study found statistically significant effects, another one with a larger sample and better research design did not. These two facts are both consistent with the same truth that there is no measurable effect in the world. Nothing hard to reconcile there.

      The quote says “Our results are therefore difficult to reconcile…” But it is only the conclusions that can’t be reconciled, not the results of the statistical tests. Those are easily reconciled by thinking about statistical power and research design.

  7. One important (and obvious) consideration regarding when and when not to employ these kinds of rhetorical devices relates to the consequences of using them.

    To get away from ovulopolitics ™, I like Thomas' plagiarism point. Say I'm a referee and I find that some paper I'm refereeing blindly has two full sentences and a bunch of framing taken from another paper that may or may not be by the same author. The consequences of me writing "Dear Editor, this paper is plagiarized" may be much different than me writing "Dear Editor, a couple of sentences from this paper seem to be taken from another paper, along with a bit of the motivation". In one case, there might be an obligation by the associate editor to notify the head editor, and it starts a whole chain reaction, one that might not even be warranted given the infraction (lazy patch-writing by someone who doesn't speak English well is different than copying full paragraphs or appropriating key insights without attribution). In the other case, the associate editor may just send a note to the author saying "be more careful" along with their rejection, which would be hurtful and embarrassing to the author, but an appropriate level of censure.

    On the other hand, if there are no consequences, then go ahead and say what you want. When a good student who can take criticism writes an awful paragraph in a draft, I’ll write back “this is absolutely terrible, re-read this to yourself and then never write like that again.” But I would only send that to students who can handle criticism because the consequences are different for those that can’t: in one case, the student learns to improve their writing and their critical reading eye; in the other case, you hurt someone’s feelings for no reason and probably make them withdraw from you and a vicious cycle starts. This ends up meaning I am much harsher on good students than weak ones, but I’m not sure that is bad (and obviously, you balance that out in other ways).

    Since Andrew thinks professors should be adults not just capable of dealing with criticism but also grateful for engagement with their work, it makes sense that he wants them to speak more plainly. But I think that, to some degree, ignores the obvious mitigating factor: the effect of plain-speaking isn’t always superior to the effect of dissembling.

    • Though it is totally different to write an email to an editor who then has to deal with the author than to write directly to the author.

      That said, they are not dissembling. They are saying "this is our conclusion; show us how we are wrong." And it is almost always better to use understatement with a wallop behind it than overstatement. Just like damning with faint praise, giving the people you are arguing against a list of points they might want to use against you, and then addressing those points, is a pretty standard technique for writing persuasively.

    • The other issue is the difference between intentions and actual results. Sometimes purely academic criticism of a junior professor's publication, harshly worded, can have an impact on their career wholly unintended by the criticizer.

  8. Andrew,
    I think that you are just running into cultural differences, both in discipline and nationality.

    My training is in psychology, and the approach does not look unusual or unlikely to me. It's not necessarily what one might do oneself, but it's not unlikely. I remember, reviewing a psychometric test, writing something along the lines of "while one does not want too much inter-correlation among subtests, some might be desirable". My readers got the point.

    Plus, look at the authors’ affiliation “Division of Psychology, Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, United Kingdom”.

    Again a cultural difference in how one would word things.

    I'm Canadian, and from both a psychology and a language point of view, I read those conclusions as pretty damning. "Difficult to reconcile," when I read it, translated into US English as something like "not a hope in hell they actually found something".


  9. It would have been interesting to read a pre-tenure version of your blog.

    That is, they are a Lecturer and a Senior Lecturer, while you are a Tenured Professor at Columbia. So that might make a difference in strategic communication.

  10. I've had good success in facilitating effective workplace dialog using Chris Argyris's (from Harvard) action science. In "Strategy, Change and Defensive Routines" by Chris Argyris, Pitman Publishing Inc., 1985, pp. 262-263, he describes commonly held values in two ways–what he terms the old way and the new way.

    Under honesty, the old way is "Tell others no lies or tell others all you think and feel." The new way is "Create conditions that make it more likely that one can reveal to others (and hear from others) without distorting what would otherwise be subject to distortion."

    Under respect for others, the old way is “Defer to others and do not confront others’ reasoning or action.” The new way is “Attribute to others a high capacity for self-examination without loss of their sense of effectiveness or their exercise of self-responsibility and choice.”

    That table also includes integrity, strength, and help and support.

    Argyris is no touchy-feely soft guy; his approach may come across as very direct and very intense. Still, I’ve been part of a couple of groups where that became the norm, and I think it was rewarding and productive for all involved.

    • Bill:

      I’m all for better communication, and I’m sure I could use a lot of tips myself.

      But there is a tradeoff here. If someone writes something false, it can mislead, right?

      For example, when Scott and Pound write, “Our results are therefore difficult to reconcile with those of Durante et al.,” I worry that people will read this and think there is some puzzle or unexplained mystery: why did the two studies differ? But from a statistical standpoint, there is no mystery at all, and I think it would be much more accurate to say, “Our results and those of Durante et al. are consistent with null or very small effects, along with natural variation in the data.”

      I suspect Scott and Pound wrote what they did to make their claims less aggressive, and I can see the reason for this (I'm not saying they made the wrong decision), but there is a cost in communication when you publish false statements. That was the point of my post: not that lay-it-all-out-on-the-table honesty is always the best communication strategy, but that I worry that excessively diplomatic communication can lead to confusion. There do seem to be people out there who think that failed replications in psychology require some special explanation, beyond just the simple story of sampling variability.

  11. I hit send too soon. In other words, I think the choice can be better framed as not between politeness and forthrightness but between effectiveness and ineffectiveness. I think politeness is often taken to be Argyris’ old way of respect, which gets us nowhere. If the opposite to politeness is “the old way” of honesty, we may not be much more effective if it ends up shutting down the other party, driving them into a huddled mass of defensiveness sitting in the corner.

    In a way, human communications shares aspects of (engineering) communications theory: how can we encode a message (design a statement or a conversation) to maximize the probability of our essential message being heard. That probably requires feedback, too, to find out what message was received so we can make any needed corrections–what Argyris might call balancing advocacy and inquiry.

    Thanks for bringing this one up, Andrew; it is important. If you’re so inclined, check out Argyris’ writings, too. You might find yourself in tune with them.

    • “.. human communications shares aspects of (engineering) communications theory: how can we encode a message (design a statement or a conversation) to maximize the probability of our essential message being heard. That probably requires feedback, too, to find out what message was received so we can make any needed corrections–what Argyris might call balancing advocacy and inquiry.”

      Yes. And I think that the real problem with what Scott and Pound said was not that it was false (as Andrew claims), but that it was ambiguous (and therefore potentially misleading). So if Scott and Pound had had more feedback (possibly they did have some feedback), they might have worded their statement differently. One first draft of such a wording (borrowing from some of what Andrew said above and undoubtedly subject to considerable improvement) might be: “Our results may seem to many to be difficult to reconcile with those of Durante et al. However, from a careful statistical standpoint, there is no conflict: Our results and those of Durante et al. are both consistent with null or very small effects, along with natural variation in the data.” (And reference to further explanation of the last point would be needed, since many researchers have only a superficial understanding of statistical inference.)

  12. Andrew,

    Oh, I'm not for ignoring the problem. I crave clear communication that lets me understand clearly what was said. And I relish what I see as your and Argyris's alignment on respect: you wrote as if they could handle having you (and they themselves) examine their reasoning without collapsing into a puddle of uselessness.

    Is it possible that the challenge comes especially in writing when the other party reacts more with defensiveness than curiosity? If you wrote some of those words about something I wrote, I might take them as an opportunity to learn (and that's likely good for both of us), or I might take them as a condemnation both of the value of my reasoning and of my ability to reason and, eventually, of my worth. Even at that point, I could explain why I thought you just screwed my future career and ask your reaction. I suspect you might explain that you weren't screwing my career but engaging in a professional dialog, even as you very much thought I had written something that was fundamentally wrong. That might lead to something good. Or I could merely blast back. That would likely not lead to anything good.

    I guess I figure that I need to convey the hard truths as I see them (sometimes for me they turn out not to be so true, and I need to be able to hear that, too!), but I also need to figure out how fast I can convey that without overrunning the receiver’s ability to hear.

    Does that make any sense?

  13. One question I have for Andrew is a thought-experiment: if it were your sister who wrote a paper with a claim you found as absurd as the articles you criticize, would you go after her with the same no-filter approach you take with people you don’t know? If not, why not?

    • Shravan:

      I don't "go after" people who make research mistakes; I just point the mistakes out. If my sister published a paper that was consistent with a pattern in noise, yes, I would say so directly. I wouldn't consider that "going after" her; I'd consider it being helpful.

      • That's a perfect situation then. I suspect that if I were to be helpful in pointing out people's mistakes publicly, I would, for example, not want to write a Slate article about something that my sister wrote, but I would want to write one if it were by someone I was not affiliated with in any way.

        • Shravan:

          My Slate article discussed several published papers that had a particular statistical flaw. Had my sister written a paper with such a flaw, I might well have mentioned it in that article.

        • Yes, I understand. When I said you were going after people, I was looking at it from the perspective of people like Tol and Tracy. Judging from their responses on your blog, they seemed to feel pretty persecuted; they certainly didn’t thank you for the helpful advice. I hope you can see that.

        • Shravan:

          I was responding to your question, “if it were your sister who wrote a paper with a claim you found as absurd as the articles you criticize, would you go after her with the same no-filter approach you take with people you don’t know? If not, why not?” My response remains that I don’t think that pointing out statistical flaws in a published article is “going after” or “persecuting” anyone. Part of being an effective scientist is being able to make use of criticism. My sister’s work gets criticized from time to time and she can handle it.

    • I know this wasn't directed towards me, but for the record, I would title that article "Why I'm Smarter than my Brother". Then again, that jerk won a National Academy of Sciences award for his friggin Masters thesis. I will never forgive him.
