Category Archives: Statistics

Classifying and visualizing with fastText and tSNE

Previously I wrote a three-part series on classifying text, in which I walked through building a text classifier from the bottom up. It was interesting, but it was purely an academic exercise. Here I’m going to use methods …

Posted in Machine Learning, Statistics, Text Mining, Uncategorized, Visualizations

Estimating active reddit users

I’m always curious about how much activity subreddits have, and how representative the comments are of the userbase. It’s well known that the majority of people are lurkers, who just view content but don’t vote or comment. Some subset of …

Posted in reddit, Statistics

Some musings on statistics

A) Beware of The Wrong Summary Statistics

SlateStarCodex had a pretty interesting post entitled “Beware of Summary Statistics”, showing how summary statistics can be misleading. This isn’t exactly new; there are famous examples of how just looking at the mean and standard deviation greatly …
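As a minimal illustration of why the mean and standard deviation alone can mislead (a sketch with made-up numbers, not the examples used in the post), the Python snippet below builds two small datasets that share exactly the same mean (5) and population standard deviation (2). One has most of its values near the centre; the other has no value within 1 of the mean at all. The helper `summarize` and the dataset names are hypothetical, chosen only for this illustration.

```python
import statistics


def summarize(name, data):
    """Print mean and standard deviation plus facts those two numbers hide.

    Returns (mean, population standard deviation, count of values within 1 of the mean).
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    near_mean = sum(1 for x in data if abs(x - mean) <= 1)
    print(
        f"{name}: mean={mean}, sd={sd:.2f}, min={min(data)}, max={max(data)}, "
        f"values within 1 of the mean={near_mean}"
    )
    return mean, sd, near_mean


# Hypothetical datasets, picked so that mean and standard deviation agree exactly.
spread = [2, 4, 4, 4, 5, 5, 7, 9]   # clustered near the centre, with a longer right tail
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]  # two clumps and nothing at the centre

summarize("spread", spread)
summarize("bimodal", bimodal)
```

Both lines report a mean of 5 and a standard deviation of 2.00, yet `spread` has five values within 1 of the mean and `bimodal` has none; plotting the data (or checking min, max and a histogram) is what reveals the difference.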

Posted in Statistics