Author Archives: jsilter

Estimating active reddit users

I’m always curious about how much activity subreddits have, and how representative the comments are of the userbase. It’s well known that the majority of people are lurkers, who just view content but don’t vote or comment. Some subset of … Continue reading

Posted in reddit, Statistics

The comments of the few outweigh the comments of the many

The Pareto Principle for businesses states that 80% of sales come from 20% of customers. Social media has the same skew; the majority of content comes from a minority of users. I’ve always been curious just how skewed this activity can be. … Continue reading

Posted in Uncategorized

More on the Bechdel Test

I gave some theoretical insights on the Bechdel test in a previous post, but silly me, of course there is real data! The Cornell Movie-Dialogs Corpus contains conversations between characters in 617 movies. Conversations in this corpus are already separated, … Continue reading

Posted in Text Mining

Some musings on statistics

A) Beware of The Wrong Summary Statistics: SlateStarCodex had a pretty interesting post entitled “Beware of Summary Statistics”, showing how they can be misleading. This isn’t exactly new; there are famous examples of how just looking at the mean and standard deviation greatly … Continue reading

Posted in Statistics

Subreddit Map

Reddit describes itself as the “front page of the internet”, and given how many users it has, that’s not too far off. It’s divided into subreddits, which can have either broad or narrow topics. These subreddits are (mostly) user-created, with … Continue reading

Posted in reddit, Social Media, Text Mining | 2 Comments