Thinking about this beautiful text sentiment visualizer yields a surprising insight about statistical graphics

Lucas Estevem set up this website in d3 as his final project in our statistical communication and graphics class this spring.

[Screenshot of the sentiment visualizer]

Copy any text into the window, push the button, and you get this clean and attractive display showing the estimated positivity or negativity of each sentence. The length of each bar is some continuously-scaled estimate of the sentiment, and the width is proportional to the length of the sentence.

But what’s it for?

This is great. And it also leads to the surprisingly subtle question: What’s the use of this tool?

The most obvious answer is, Duh, you use it to visualize a text sentiment analysis.

But I don’t think that’s the right answer. To see why, we must first ask ourselves why we want to estimate text sentiments in the first place. Why would someone want this tool? It’s not to help us read texts. No. I think the reason you’d want to estimate the sentiments in sentences of text is that you want to classify a large number of documents and get a quick summary of each. In which case, what do you get out of a visualization? It won’t be particularly useful as part of a big loop.

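No. What you get out of visualization is model checking, as described in my 2003 article on the Bayesian foundations of exploratory data analysis. The value of a display such as the one above is that it lets us compare the analyzer’s estimates with our own reading of the text and see where the two disagree.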

An interactive display is particularly valuable because we can try out different texts, or even alter the existing document word by word, in order to reverse-engineer the sentiment analyzer and see how it works. The sentiment analyzer is far from perfect, and being able to look inside in this way can give us insight into where it will be useful, where it might mislead, and how it might be improved.
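The word-by-word probing described above can also be automated. Here is a minimal sketch in Python, assuming a stand-in scorer: `toy_score` and `TOY_LEXICON` are hypothetical, with invented weights, and are not the analyzer behind the website (whose internals are not documented here). The idea is to drop each word in turn and record how much the score moves.

```python
import re

# Hypothetical toy lexicon with invented weights. The real analyzer's
# vocabulary and weights are unknown; this only stands in for it.
TOY_LEXICON = {
    "drastically": -0.5,
    "increase": 0.5,
    "eliminate": -0.5,
    "waste": -1.0,
}


def toy_score(sentence):
    """Stand-in scorer: sum the lexicon weights of the words in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)


def word_influence(sentence, score=toy_score):
    """Drop each word in turn and record how much the score changes.

    Returns the score of the full sentence and a list of (word, delta)
    pairs, where delta = score(full) - score(sentence without that word).
    A negative delta means the word pulls the sentence toward negative.
    """
    words = sentence.split()
    base = score(sentence)
    influence = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        influence.append((word, base - score(reduced)))
    return base, influence


base, influence = word_influence("drastically increase ROAS and eliminate waste.")
print(f"full sentence: {base:+.1f}")
for word, delta in influence:
    print(f"{word:12s} {delta:+.1f}")
```

To probe a real analyzer, you would pass its scoring function as the `score` argument in place of the toy one. Words with large deltas are the ones to look at first when a sentence gets a surprising score.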

Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.

P.S. It would also be good to have a link to the source code of the sentiment analyzer and also a document explaining how it works and giving details on the data that were used to train it.

18 thoughts on “Thinking about this beautiful text sentiment visualizer yields a surprising insight about statistical graphics”

  1. Very cool! Tested “In the beginning was the Word, and the Word was with God, and the Word was God.” and it was classified as negative! A bad start? I had better reverse-engineer this…

  2. Good as a graphics demo. The “by line” mode is easier to follow — otherwise it’s a bit tricky to figure out which sentence is which bar.

    It’s a graphics project, and the text analyzer seems to give some counterintuitive results; it is likely word based. For example, in StefanP’s example, it may be that “God” in a sentence is negative because “Oh, my God” would generally occur in a negative context.

    Similarly, running a corporate blog post through this shows
    “drastically increase ROAS and eliminate waste.” [ROAS is return on advertising spending]
    as a strong negative, probably because of the words “drastically” and “waste”.

    BUT: the sentiments are generally on target and the graphic illustration is easy to see. Since that’s the point of the graphic design, I’m impressed. In the corporate blog post I clearly see the expected structure: starting off negative to set up the seriousness of the problem to be solved, followed by generally positive statements about the wonderful corporate solution.

    “there is no such thing as too much data”
    also shows up as a negative sentiment.

    Makes sense to me. I cringe whenever I hear “there is no such thing as too much data”. There absolutely is. Too much data often leads to too little data cleaning and too little thinking about appropriate analyses and the biases that can occur in big data contexts.

  3. Neat.

    Wonderful green ideas sleep very nicely.

    Harsh green ideas sleep furiously.

    Certainly “a [fun] tool for discovering and learning about anomalies.”

  4. Nice post! I’d love to see it underline the text with a colored line. (Perhaps: no underline for neutral, dashed red underline is semi-negative, solid red underline is negative, etc.). Then in the graph on the right, have a cumulative sentiment indicator (the sum of the sentence sentiments to that point). This might make it easier to understand.

  5. As an analyzer of patient satisfaction surveys, here’s my wish: not just the sentiment, but the subject of the sentiment. What are patients complaining about, what are they pleased about? Can the associated causes be extracted from the text, as well as the sentiments? That would be a great quality improvement tool!

    • There’s no reason this shouldn’t be doable. Some basic frequencies of words/phrases on specific questions should give you some idea what topics come up most often.

      On the more advanced side, you can work something up to automatically code/tag surveys into specific categories, perhaps starting with a small batch of hand-coded training data.

  6. For a bit of fun, try these:

    Trump will make America great again.

    Hillary Clinton would be the first female president.

    Bernie Sanders is a socialist.

    Surprised the first and third are positive, the second is negative.

  7. Strange results.
    Politics – positive
    Conservative, liberal, socialist – all negative
    Science, statistics, psychology – they all score negative

    OK, now let’s look at the legendary “Sex & drugs & rock ‘n roll”.
    Sex – positive
    Drugs – negative
    Rock ‘n roll – slightly positive

    Our visualizer has a fairly conservative musical taste:
    Jazz – positive
    Classical – positive
    Rock ‘n roll – slightly positive
    Dance – negative
    Rap – negative

  8. Sentiment is not a property of sentences. It’s only a property of sentences in context. “it’s loud and splashy” might be good for a Bollywood movie, but not for a blender. There’s a really cool paper on this topic:

    And then, sentences aren’t either positive or negative, but can be made up out of clauses that are both. For example, “I love the steering, but hate the transmission” in a review of a new sports car (or more likely, something more nuanced, like “very slight oversteering in corners, and a balky first to second shift that takes a while to compensate for” that comments on subaspects of performance). And it’s not just spans of text, because you can say things like “I liked but Mitzi hated that curry”, where it’s “I liked that curry” and “Mitzi hated that curry”. So at the very least, we should run two binary classifiers, one for positive and one for negative sentiment, and allow all four combinations of +/- positive and +/- negative.

    Also, we rarely have the problem of labeling sentences with sentiment. If you’re a marketing director trying to estimate sentiment about your product, you’re much more interested in figuring out what people are saying about what aspects of your product. And maybe you care in aggregate whether they like it or not, but you absolutely don’t care about correctly classifying a vague Tweet. If you’re trying to build a program-trading strategy by, say, classifying the sentiment of Tweets, you care about the overall opinion more than getting each tweet right. The reason this is important is that you may use a biased classifier to determine the base rate by de-biasing based on known (really estimated) sensitivity and specificity of your classifier. Again, the actual message content doesn’t matter so much.

    • Perhaps the nifty visualization tools will help non-linguists realize the futility of trying to analyze affect and connotation from abstracted text, without a working simulation of a theory of mind, intention, and speech acts.

      • I think we’re a long way off from a decent theory of any of those things, but we can still make better NLP tools by taking some of their lessons to heart, like that sentences only have meaning in context (if you seriously consider the analytic/synthetic distinction, you’ll realize very few sentences, and almost no sentences of interest, are truly synthetic).

        • Same thing in statistics?

          I think we’re a long way off from a decent theory of any of those things, but we can still make better STATISTICAL tools by taking some of their lessons to heart

        • I meant to say that most of the field of statistics also appears to be working on point estimation. But I may be very biased from both application-oriented work and the last five years of talks in the Columbia stats dept.

  9. What is this calibrated against?

    e.g. If I analyse the text of this blog-post itself I get a max negative sentiment for this line: “The length of each bar is some continuously-scaled estimate of the sentiment, and the width is proportional to the length of the sentence.”

    How do we explain this?

    Is there a recognized corpus of text against which to test tools like this?

  10. It’s a very interesting tool, but it does poorly at distinguishing “bad sentiment” from “nuance”, “introspection”, and “detail”. I took the first couple of pages from Crime and Punishment, Oliver Twist, Candide and Steppenwolf. Candide, also titled “Optimism”, did best, apparently evoking good sentiment. However, the text chosen was rather dull (Voltaire intentionally aimed at superficiality) relative to the other books mentioned. The passage from Steppenwolf was about 50/50 good/bad sentiment, while the passage from Crime and Punishment was over 90% bad sentiment. When you examine the content, Crime and Punishment is deeply introspective while Steppenwolf is rather banal or matter-of-fact in the beginning pages. The passage from Oliver Twist evoked bad sentiment just about as much as the passage from Crime and Punishment. However, there was hardly anything introspective in it. Instead, the passage had lots of detail, parentheses and lateral thinking.
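
Comment 8 above mentions de-biasing a classifier’s output using its estimated sensitivity and specificity to recover the base rate. A minimal sketch of one standard way to do that (often called the Rogan-Gladen correction) is below. The function name and the example numbers are illustrative assumptions, not something taken from the thread or from the visualizer.

```python
def corrected_positive_rate(observed_rate, sensitivity, specificity):
    """Estimate the true share of positive documents from a biased classifier.

    observed_rate: fraction of documents the classifier labeled positive
    sensitivity:   P(labeled positive | truly positive), estimated elsewhere
    specificity:   P(labeled negative | truly negative), estimated elsewhere

    Uses observed = sens * p + (1 - spec) * (1 - p), solved for p.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("need sensitivity + specificity > 1 (better than chance)")
    estimate = (observed_rate + specificity - 1.0) / denom
    # Sampling noise can push the estimate slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Illustrative numbers only: 40% of tweets labeled positive by a classifier
# with an assumed sensitivity of 0.8 and specificity of 0.9.
print(round(corrected_positive_rate(0.40, 0.8, 0.9), 3))
```

The correction only needs the aggregate share of positive labels, which is the commenter’s point: for this use, no individual message has to be classified correctly.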
