Category Archives: Text Mining

Classifying and visualizing with fastText and tSNE

Previously I wrote a three-part series on classifying text, in which I walked through the creation of a text classifier from the bottom up. It was interesting, but it was purely an academic exercise. Here I’m going to use methods … Continue reading

Posted in Machine Learning, Statistics, Text Mining, Uncategorized, Visualizations | 1 Comment

More on the Bechdel Test

I gave some theoretical insights on the Bechdel test in a previous post, but silly me, of course there is real data! The Cornell Movie-Dialogs Corpus contains conversations between characters in 617 movies. Conversations in this corpus are already separated, … Continue reading

Posted in Text Mining | Leave a comment

Subreddit Map

Reddit describes itself as the “front page of the internet”, and given how many users it has, that’s not too far off. It’s divided into subreddits, which can have either broad or narrow topics. These subreddits are (mostly) user-created, with … Continue reading

Posted in reddit, Social Media, Text Mining | 2 Comments

Properties of angry speech

Note: This post contains profanity. Sit down if you’re standing: There’s a lot of angry speech on the internet. There’s a lot of regular speech too. For exact meaning, the order and context of words is critical, but for general tone … Continue reading

Posted in Social Media, Text Mining | Leave a comment

I before e

Despite the fact that English is an absolutely terrible language and nobody should speak it, people still do. So to cope with the many irregularities and near impossibility of getting anything right, people try to come up with catchy rhymes, … Continue reading

Posted in Text Mining, Uncategorized | Leave a comment