From Elle O’Brien comes this amusing example of a classification problem using data downloaded from the internet, with lots of detail on how to download and work with the data. Could be good for your statistics and data science classes.
And then, just for fun, check out her Komar-and-Melamid-style chocolate cookie recipes.
Isn’t it interesting that your astrological sign predicted you would be humorous long before you were born?
How would you even have an astrological sign before you’re born?
Rsm:
No more trolling, please. This blog is not the place for that. For trolling you can go to Twitter, Reddit, 4chan, etc.
In general it seems like there are a lot of underexplored research opportunities for using Reddit data.
We have this rich dataset of millions of conversations that is just sitting there underutilized.
One interesting question, for instance, would be to predict what makes some comments get more interactions than others (upvotes, replies, downvotes, etc.).
Is it possible to “engineer” a way of writing comments that gets a lot more engagement than normal?
Ethan:
Yes, good point: these are interesting questions!
“Is it possible to ‘engineer’ a way of writing comments that gets a lot more engagement than normal?”
On Reddit? :)) Don’t need software for that.
Andrew wrote, “Could be good for your statistics and data science classes.” Noting that Control F claims the word “asshole” appears 41 times in Elle O’Brien’s article, the average administration might look unkindly on such “with it” assignments. If you disagree with my last sentence, let Andrew know how things went when you have had the courage of your convictions.
It’s kind of charming that in 2022 someone would think the use of the word “asshole” in this context should be prohibited or even strongly discouraged. I don’t doubt that you’re right, Paul; there’s surely some fuddy-duddy administrator who thinks that’s over the line. Fortunately, there’s an easy solution: replace “asshole” with “scoundrel” everywhere. Problem solved!
Perhaps replace it with the British spelling – “arsehole”.
There’s another blog post on this topic here: https://www.nathancunn.com/2019-04-04-am-i-the-asshole/. I am interested in inter-rater agreement and ran an analysis using the method described here: https://www.researchgate.net/publication/281652365_A_GEOMETRIC_APPROACH_TO_CONDITIONAL_INTER-RATER_AGREEMENT. The results are here: https://furman.box.com/s/c7k03w8ef9ehfsxn1ws6xgzd9mrkhae6
TL;DR: the voters on Reddit aren’t voting independently, but if we make that assumption anyway, the agreement is way too high to be chance, e.g. on the all-important “you are-” vs “you are not-” distinction. So some combination of independent thinking and considering what others have written leads to a working consensus.
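For anyone who wants to try the "too high to be chance" comparison themselves, here is a minimal sketch. It uses Fleiss' kappa, a standard chance-corrected agreement statistic, which is an assumption on my part and not the geometric method from the linked paper. The function name `fleiss_kappa` and the vote counts are hypothetical, made up purely for illustration; they are not taken from Reddit.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Chance-corrected agreement among raters, assuming independent votes.

    counts[i][j] is the number of raters who put item i in category j.
    Every item must have the same total number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")
    n_cats = len(counts[0])

    # Observed agreement: for each item, the fraction of rater pairs
    # that chose the same category, averaged over items.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(sum(c * (c - 1) for c in row) / pairs for row in counts) / n_items

    # Expected agreement if every rater voted independently according
    # to the overall category proportions.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("all votes fall in one category; kappa is undefined")

    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy data: 4 posts, 10 voters each,
# columns are ("you are", "you are not").
toy_votes = [[9, 1], [1, 9], [8, 2], [2, 8]]
kappa = fleiss_kappa(toy_votes)
print(round(kappa, 3))  # 0.444
```

A kappa near 0 means agreement is about what independent voting at the overall proportions would produce, and a value near 1 means near-unanimous verdicts. Since real Reddit voters can see each other's comments, the independence assumption is only a baseline to compare against, not a description of how the votes arise.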