Difficulties in communication, the reader is always right, etc etc

From comments to my recent 538 post, I’ve learned the following:

1. I don’t understand logarithms.

2. I really don’t understand logarithms.

3. I don’t know how to use Wikipedia.

4. I don’t know that “percent” means “divided by 100”.

5. I don’t know the difference between correlation and R-squared.

My first reaction is to respond in a snippy and sarcastic way, but when it comes to writing, the reader is (almost) always right: When someone misunderstands something I wrote, this tells me I was being unclear.

Points 1-5 above are all wrong, but they are all reasonable conclusions to be drawn from a casual reading of my post–and when writing a blog post, we certainly can’t demand more than a casual reading. On the other hand, someone familiar with my work (for example, a regular reader of my regular blog) would probably want more evidence before jumping to the conclusion that I don’t know about logarithms, correlation, R-squared, and so forth.

The real message for me is that, when communicating outside this blog and outside of technical venues (scientific journals and the like), I need to be redundant around possible points of confusion.

For example, in the above-linked post, I noted a nonlinear pattern in a graph. I could tell right away, just based on my background knowledge, that the logarithmic transformation, which was involved in some of the calculations, was not causing the nonlinearity. So I didn’t even bother mentioning it. This would be OK for general audiences who probably wouldn’t think about this issue anyway–the log transformation being essentially irrelevant here, and so I wouldn’t even bring it up. And it would be OK in class–if any student happened to focus on the log, I could redo the graph then and there, to demonstrate how it all worked. But on the blog, people can read it, think what they want, and jump to their own conclusions. If I don’t want to be misunderstood, I would need to put in extra sentences rounding the sharp corners, as it were, anticipating mistakes that the readers might make. It’s not clear it’s worth the effort–I’m doing this all for free, after all–but I think that’s what it takes.

Now for the details

In case people care . . .

1 and 2. The log transformation doesn’t have a big effect on the shape of that curve, because all the data fall in the range of a factor of 2 on income. This was an important enough point that I added a parenthetical note to the blog entry (“rounding the sharp corner”) and most of the later commenters seem to have gotten the point.
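
To make the point about the log concrete, here is a minimal sketch in Python. The income values are made up for illustration (they are not the data from the 538 post), and the helper name pearson is mine. The idea is just that, over a range spanning a factor of 2, log(income) is very nearly a straight-line function of income, so taking the log cannot bend the curve much; over a much wider range it can.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical incomes spanning a factor of 2 (assumed values, not the post's data).
narrow = [30_000 + 300 * i for i in range(101)]   # 30,000 up to 60,000
log_narrow = [math.log(x) for x in narrow]

# Hypothetical incomes spanning a factor of 100, for contrast.
wide = [1_000 + 990 * i for i in range(101)]      # 1,000 up to 100,000
log_wide = [math.log(x) for x in wide]

r_narrow = pearson(narrow, log_narrow)
r_wide = pearson(wide, log_wide)

print(f"factor of 2:   corr(income, log income) = {r_narrow:.3f}")
print(f"factor of 100: corr(income, log income) = {r_wide:.3f}")
```

With these assumed numbers, the first correlation comes out above 0.99, meaning the log scale and the raw scale are nearly interchangeable over that range, while the second is noticeably lower.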

3. Yes, I looked at Wikipedia and several other sources, many of which were linked in the post. As I’d explained there, different sources gave different formulas, and it wasn’t clear exactly how these numbers were created. I had actually played around with the formula on the Wikipedia page but couldn’t find all the numbers; then I saw an official-looking document that had a much different formula. I’d thought the general point was clear in my blog entry, but I guess another sentence would’ve helped with this.

4. Percent means “divided by 100.” 0.86 = 86%, -.10 = -10%, etc. Correlations can be anywhere between -100% and 100%. I guess if I really wanted to explain this, I could’ve said “the correlation is .86, or 86%,” although I have to admit that seems like overkill to me.

5. If the R-squared were 86%, then the correlation would be sqrt(.86) = .93. The numbers on that second graph are highly correlated, but no way do they have a correlation of .93. Decades of statistical analysis make this clear to me at a glance.
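
The arithmetic in points 4 and 5 can be checked in a few lines of Python. This is only a sketch: the function names are mine, and the square-root relation assumes a linear regression with a single predictor, where R-squared is the square of the correlation.

```python
import math

def correlation_from_r_squared(r_squared):
    """Magnitude of the correlation implied by a given R-squared,
    assuming a linear regression with one predictor."""
    return math.sqrt(r_squared)

def r_squared_from_correlation(r):
    """R-squared implied by a correlation, under the same assumption."""
    return r ** 2

def as_percent(x):
    """Write a proportion as a percent: multiply by 100 and add the sign."""
    return f"{x * 100:.0f}%"

# If R-squared were 86%, the correlation would have to be about .93.
print(f"{correlation_from_r_squared(0.86):.2f}")

# Going the other way, a correlation of .86 corresponds to an R-squared of about .74.
print(f"{r_squared_from_correlation(0.86):.2f}")

# A correlation of .86 is the same number as 86%; -.10 is -10%.
print(as_percent(0.86), as_percent(-0.10))
```

So reading the 86% as an R-squared rather than as a correlation changes the implied strength of the relationship quite a bit, which is why the distinction matters.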

The funny thing about all the above mistaken criticisms is that they require knowledge; they’re not completely ignorant comments. For example, you have to know something about statistics and mathematics to know that you’re not supposed to write a correlation of .86 as 86%, or to have heard about phrases such as “the regression explains…” or to think about the log transformation.

It’s important to communicate to this middle range of people, who know enough rules to over-think things and get confused. For one thing, these people are already thinking about the problem, so the most difficult step has already been taken.

P.S. Yeah, sure, maybe I shouldn’t read the comments in the first place. But I’d like to improve my writing!

P.P.S. I came across another one, this time a commenter on Megan McArdle’s blog who helpfully added a link that he hadn’t noticed was already in my original post. As I’d explained there (and as he unfortunately also didn’t notice), that link presented much different numbers, with different rankings, from those used in the map that got the discussion started.

This last commenter made a mistake that I’ve been noticing a lot lately, which is to find a single piece of documentation on the web and assume it’s correct, without checking it against other numbers out there. I thought I’d made this point clear in my post, but perhaps in the future I should use bold font so people don’t miss the point. Or maybe I’m just too thin-skinned: with dozens of comments spread across three different blogs, I shouldn’t be so upset that one person made this mistake.

10 thoughts on “Difficulties in communication, the reader is always right, etc etc”

  1. As commenter #5 I can just say… well, maybe I should have thought twice before posting that.
    Or asked someone with better knowledge of statistics. Whatever it is, I apologize for thinking that you could mistakes.

    Thanks for that lesson,

  2. 'that you could mistakes'?
    Wow, this whole affair becomes worse and worse for me. I meant to say 'that you could make such a basic mistake'.

  3. I think it's very valuable that you engage with how to communicate in a better way – I'd really wish more "quants" would care as deeply about being understandable and understood.
    That said, blog audiences are not just pretty short in their attention span, they are (I believe) also most likely to leave comments if they can come up with something short and snarky. So what you're seeing is – in many ways – selection bias: The many people who get what you're writing aren't commenting…

  4. I can agree with

    "the reader is (almost) always right: When someone misunderstands something I wrote, this tells me I was being unclear,"

    but I'm not so sure how representative these commenters are of your readership in general. You may irritate many readers with "the correlation is .86, or 86%," while just making a single commenter happy by doing so.

  5. I thought of some of the points above at first glance. However, I knew you wouldn't make such basic mistakes, so I didn't comment at all.

    I think it is nice that we try to make ourselves more clear!

  6. someone familiar with my work (for example, a regular reader of my regular blog) would probably want more evidence before jumping to the conclusion that I don't know about logarithms, correlation, R-squared, and so forth.

    I enjoyed this line for some reason.

  7. Yes and no on whether the readers' points mean you didn't communicate perfectly. As an example, on a law school admissions board, a person (not me) said no one gets into Northwestern's law school out of undergrad. He may have used the word "impossible," which I would think is obviously a conversational over-statement. That word turned into a series of comments which harped on the inaccuracy of that simple, of-the-moment choice of wording because about 3% of a class comes in from undergrad. This isn't academic or other rigorous writing; these are blog posts. If people don't cut you slack, that isn't your fault.

    My only suggestion is that 538 have a link to a bio or to this blog so you could then say, "Did you check my bio before you wrote that I don't know stats?"

    BTW, you must have noticed that Justin Wolfers says the correlation he found in the international data is "a massive 95 percent!" Guess he can't divide by 100 either.

  8. andrew, you have a lot of self control. I almost always just post my snippy and sarcastic comment — and then regret it 15 minutes later.
