Father's Day was a little over a week ago. A few days before, the greatest hashtag in the world appeared: #EndFathersDay. The rise and fall happened in a remarkably short time:
To be fair, I only captured data until Saturday afternoon, but it's pretty clear that all the action was over by then.
At face value, the meaning of the tag was that Father's Day is a tool of the patriarchy, or that men are evil and don't deserve to be celebrated. Most people seemed to know better, and realized that it was started by trolls. But once the radfems saw it, they started repeating it, right?
Theory A: Radfem starts -> Other radfems retweet -> #EndFathersDay now trending -> Hijinks ensue
Theory B: Troll account(s) starts -> Other radfems are duped, retweet -> #EndFathersDay now trending -> Hijinks ensue
I decided to investigate the sentiment behind the hashtag. See, all "trending" means is that people are using the hashtag a lot. Twitter doesn't filter out all the tweets that go "#EndFathersDay is stupid"; they still count. So I wanted to see how many of these "#EndFathersDay" tweets seemed like they were actually in support of ending Father's Day.
Total recorded tweets: 67218
Total unique accounts: 39973
Most of these are retweets, possibly with an emoticon or "lol wut?" tacked on. Because stuff like that is hard on text classifiers, I decided to remove them from the dataset.
Total tweets, retweets excluded: 22214
Total unique accounts, retweets excluded: 15744
I trained a classifier (see Methods) and applied it to the non-retweet dataset. Since I required a pretty high confidence to decide on a classification, many tweets were excluded. I manually reviewed every tweet classified as "For"; most were false positives.
Tweets For: 0.95% (83)
Tweets Against: 99.05% (8774)
Accounts For: 0.47% (35)
Accounts Against: 99.54% (7403)
First thing to notice is that the percentage of tweets/accounts "for" is very small. I used percentages here because I only analyzed a subset of the data, so percentages are more informative than raw counts.
Second thing to notice is that the "for" accounts were more active than the "against" accounts: an average of ~2.4 tweets per account, versus ~1.2 for the "against" accounts. So the fact that these accounts were a bit more vocal might skew a casual observer's impression.
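The per-account averages fall straight out of the counts above:

```python
# Tweets-per-account in the machine-classified subset, computed from
# the counts reported above.
for_tweets, for_accounts = 83, 35
against_tweets, against_accounts = 8774, 7403

print(round(for_tweets / for_accounts, 1))          # 2.4
print(round(against_tweets / against_accounts, 1))  # 1.2
```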
The accuracy of the machine-classified results is certainly questionable; since "for" tweets were so rare, even a small error rate could skew the results. So we can also look at only the manually coded results. The training set was ~400 early tweets:
Tweets For: 95
Tweets Against: 310
Accounts For: 30
Accounts Against: 255
~25% of these tweets are "for", but these were the first tweets. When I looked at the last 200 tweets, I found NONE which supported ending Father's Day. We also see here that the "for" accounts averaged ~3 tweets each, compared to ~1.2 for the "against".
Most of the "for" tweets came early on. All it takes is a few people to get the ball rolling; once an offensive hashtag is trending, it gets sustained by all the people who are against it:
A shockingly small number of people were on the straightforward side of this tag. If we generalize the ~0.5% number to the total ~40,000 accounts which participated, only about 200 were "for". It's definitely possible they were all (or mostly) trolls.
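The back-of-envelope generalization is just the "for" account rate applied to the full account count:

```python
# Applying the "accounts for" rate from the classified subset to the
# full set of recorded accounts.
accounts_total = 39973   # unique accounts recorded
for_fraction = 0.0047    # 0.47% "for" rate from the classified subset

print(round(accounts_total * for_fraction))  # 188, i.e. roughly 200
```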
I went through the first ~500 tweets and coded them as either "for" or "against". There were different variations of "against": lots of people said things like "grabbing the popcorn" or "is this serious". So basically, "against" was anything other than "for".
This coded set was used to train a maximum entropy classifier from the outstanding nltk library. Reserving 30% of the coded set for testing, I saw about an 85% accuracy rate. Not bad. I tried to get it reliably above 90%, but no such luck.
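The training setup looks roughly like this. The post doesn't describe the actual feature set, so the bag-of-words features and the toy "coded" data below are stand-ins; only the nltk calls (`MaxentClassifier.train`, `prob_classify`, `accuracy`) reflect the real API:

```python
import random
from nltk.classify import MaxentClassifier
from nltk.classify.util import accuracy

def features(text):
    # Simple bag-of-words presence features; a plausible stand-in for
    # whatever features were actually used.
    return {word.lower(): True for word in text.split()}

# Toy stand-in for the hand-coded tweets.
coded = [
    ("end fathers day now", "for"),
    ("patriarchy must go end it", "for"),
    ("this tag is stupid", "against"),
    ("grabbing the popcorn lol", "against"),
    ("is this serious", "against"),
    ("wow this tag is dumb", "against"),
] * 10

random.seed(0)
random.shuffle(coded)
split = int(len(coded) * 0.7)  # reserve 30% for testing
train = [(features(t), label) for t, label in coded[:split]]
test = [(features(t), label) for t, label in coded[split:]]

classifier = MaxentClassifier.train(train, algorithm="gis", max_iter=10, trace=0)
print(accuracy(classifier, test))
```

`classifier.prob_classify(features(text)).prob("for")` then gives the per-class probability used for the confidence cutoff described below.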
I trained the classifier 5 times, randomly choosing a test set each time, and used it to determine the probability of each class for each tweet, giving a total of 5 classifications (and probabilities) for each tweet.
To determine a “consensus” classification, I required that at least 4 of the classifications agree with a minimum probability of 0.8. The value of 0.8 was chosen based on the following plot:
Here, a "positive" means classifying a tweet as "for" (this was the rarer condition, so I used it as the focus). One can see where the true positive rate starts to drop off. Since I wanted to keep as much data as possible, I picked a cutoff value right where the true positive rate starts to drop (0.8). The false positive rate is plotted as 0.001, but that's just for graphing convenience; all I can really say is that it's less than ~0.002 (based on the number I manually coded).
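The consensus rule (at least 4 of the 5 classifications agreeing, each with probability at least 0.8) can be sketched as a small function. This is an illustrative reconstruction, not the original code:

```python
def consensus(results, min_agree=4, min_prob=0.8):
    """Return a label only if enough runs agree with high enough
    probability; otherwise exclude the tweet (return None)."""
    confident = [label for label, prob in results if prob >= min_prob]
    for label in set(confident):
        if confident.count(label) >= min_agree:
            return label
    return None

# Five (label, probability) pairs, one per trained classifier.
print(consensus([("against", 0.95), ("against", 0.91), ("against", 0.88),
                 ("against", 0.85), ("for", 0.97)]))     # against
print(consensus([("for", 0.6), ("for", 0.7), ("for", 0.85),
                 ("against", 0.9), ("against", 0.95)]))  # None (excluded)
```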