Exaggeration of Science

Communicating scientific results to the public is difficult, even with the best intentions. There are all kinds of subtleties in any study which don't make it into media coverage, and caveats about interpreting results get lost along the way. A recent study, "The association between exaggeration in health related science news and academic press releases: retrospective observational study" [1], looked at the correlation between exaggerated claims in press releases and subsequent media coverage. As part of that study, the authors examined the media coverage of about 500 health-related articles, as well as the press releases put out by the universities themselves.

It occurred to me that this dataset has another potential use: one can look at just the press releases. This removes the element of the media and focuses on how scientific institutions themselves are (mis)representing their own work. That's what I did here. Spoiler alert: the problem is systemic, and I didn't see any specific villains.

And lest I be accused of exaggeration myself, I should point out some major limitations. First and foremost, I'm relying on the coding that was done by the paper above. Second, my author-based results are based on web scraping, and there are likely at least a few errors (a small number of mistakes won't affect the overall statistics, but it does mean one should double-check before picking on a specific author). And lastly, all I'm measuring here is the correlation between universities/authors and exaggerated press releases. As Goldacre pointed out [2], press releases don't have listed authors, so we can't know who exactly is responsible for writing them; we certainly can't know whether misleading statements were intentional or unintentional.


[1] Sumner Petroc, Vivian-Griffiths Solveiga, Boivin Jacky, Williams Andy, Venetis Christos A, Davies Aimée, et al. The association between exaggeration in health related science news and academic press releases: retrospective observational study.
[2] Goldacre Ben. Preventing bad reporting on health research.

Early Ebola Intervention

As I've alluded to in previous posts, I'm a big believer in being rational about charity. Ideally, one has several independent randomized controlled trials on which to judge how cost-effective an intervention is. But sometimes that just isn't possible. Disease outbreaks are a perfect example: each one is different, and by the time one is able to study the situation, so much damage has already been done.

It's now believed that the current Ebola outbreak in West Africa started in December 2013. Ebola had never been seen in that part of Africa before, so there was no reason to expect it. The only way it could've been stopped at that point is if the entire continent of Africa had been educated on recognizing Ebola, and had stockpiles of testing supplies. To say that's unrealistic is a dramatic understatement.

The first cases were confirmed in March 2014, which is when MSF declared an outbreak [1]. This is the point where a massive intervention would have been most efficacious. The number of cases was <100. Say each of those cases and ten of their closest contacts were tested, so maybe 1,000 tests. At $100/test [2], that's $100,000, which would've ended up saving at least 5,000 lives (and counting!). Even if each of those individuals needed to spend a night in quarantine, that's likely another $100 a head, so we're up to $200,000 for 5,000 lives, or $40/life.

This assumes that the quarantine capacity already existed, and that rapid testing facilities were available. One could imagine the cost increasing 10-100x (up to $20 million). Still, $4,000/life saved seems pretty attractive. The reason it's so cost-effective is that the intervention needed to happen early, before it could possibly be justified. If the CDC had gotten involved with the required material and personnel, and some rather harsh mandatory testing and quarantine procedures had been used, many lives could have been saved. And then people would be questioning why so much money was spent (and civil liberties violated) over something which turned out to be a non-issue. Was it worth it to buy that expensive umbrella when you didn't end up getting wet?
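For the curious, here's the back-of-envelope math above as a quick sketch. Every input is one of the rough guesses from the text, not real data:

    # Rough inputs, straight from the guesses above (not real figures).
    tests = 1_000            # ~100 cases plus ten close contacts each
    cost_per_test = 100      # dollars per test [2]
    quarantine_night = 100   # rough cost of one night in quarantine, per person
    lives_saved = 5_000      # conservative count for this outbreak

    base_cost = tests * (cost_per_test + quarantine_night)  # $200,000
    print(base_cost / lives_saved)         # $40 per life saved
    print(100 * base_cost / lives_saved)   # $4,000 even with a 100x cost overrun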

Granted, I have the benefit of hindsight in the other direction. After seeing that this outbreak was so terrible, it's easy to say that somebody should've done something earlier. All previous Ebola outbreaks died out after a few hundred fatalities, so throwing lots of money at this one early on could've seemed premature, especially when you consider that even now, deaths caused by HIV/AIDS and malaria are on par with those caused by Ebola this year [3]. It's difficult to prepare for some unknown, low-likelihood emergency when the day-to-day problems are so large.

Which is why the CDC, the WHO, and the international community should've gotten involved much earlier. For a long time, only Doctors Without Borders was doing anything substantial [4], and they just didn't have the resources needed. The US recently committed 3,000 troops and up to $500 million [5]. Pay a little now or pay a lot later.

-Jacob

[1] http://www.msf.org.uk/article/guinea-ebola-epidemic-declared
[2] http://www.bostonglobe.com/opinion/2014/10/11/stop-ebola-epidemic-must-able-diagnose-quickly/LFWpKNwHTGPqfcWRyKOKqK/story.html
[3] https://i.imgur.com/At2nqgB.png
[4] http://newsinfo.inquirer.net/613145/doctors-without-borders-ebola-out-of-control
[5] http://time.com/3380545/u-s-to-commit-500-million-deploy-3000-troops-in-ebola-fight/

Anatomy of a hashtag

Father's Day was a little over a week ago. A few days before it, the greatest hashtag in the world appeared: #EndFathersDay. The rise and fall happened in a remarkably short time:

[Figure: histogram of tweet volume over time]

To be fair, I only captured data until Saturday afternoon, but it's pretty clear that all the action was over by then.

At face value, the meaning of the tag was that Father's Day is a tool of the patriarchy, or that men are evil and don't deserve to be celebrated. Most people seemed to know better, and realized that it was started by trolls [1]. But once the radfems saw it, they started repeating it, right?

Theory A: Radfem starts -> Other radfems retweet -> #EndFathersDay now trending -> Hijinks ensue

Theory B: Troll account(s) starts -> Other radfems are duped, retweet -> #EndFathersDay now trending -> Hijinks ensue

I decided to investigate the sentiment behind the hashtag. See, all "trending" means is that people are using the hashtag a lot. Twitter doesn't filter out all the tweets that go "#EndFathersDay is stupid"; they still count. So I wanted to see how many of these #EndFathersDay tweets actually seemed to be in support of ending Father's Day.

Total recorded tweets: 67218
Total unique accounts: 39973

Most of these are retweets, possibly with an emoticon or "lol wut?" tacked on. Because stuff like that is hard on text classifiers, I decided to remove them from the dataset.
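I won't claim this is exactly how I'd filter them today, but a minimal sketch, assuming the classic "RT @user" text convention and tweets stored as dicts with a "text" field (API-fetched tweets may also carry a "retweeted_status" field), looks like:

    def is_retweet(tweet):
        # Manual retweets conventionally start with "RT @user:";
        # API objects for native retweets carry "retweeted_status".
        text = tweet.get("text", "")
        return text.startswith("RT @") or "retweeted_status" in tweet

    originals = [t for t in tweets if not is_retweet(t)]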

Total tweets, retweets excluded: 22214
Total unique accounts, retweets excluded: 15744

I trained a classifier (see Methods) and applied it to the non-retweet dataset. Since I required a pretty high confidence to decide on a classification, many tweets were excluded. Any tweet which was classified as “For” I manually reviewed; most were false positives.

Tweets For:  0.95% (83)
Tweets Against: 99.05% (8774)

Total: 8857

Accounts For: 0.47% (35)
Accounts Against: 99.54% (7403)

Total: 7438

The first thing to notice is that the percentage of tweets/accounts "for" is very small. I used percentages here because I only analyzed a subset of the data, so percentages are more informative than raw counts.

The second thing to notice is that the "for" accounts were more active than the "against" ones: an average of ~2 tweets per account, versus ~1.2 for the "against" accounts. The fact that these accounts were a bit more vocal might skew a casual observer's impression.

The accuracy of the machine-classified results is certainly questionable. Since "for" tweets were so rare, even a small error rate could skew the results. Instead, we can look at only the manually coded results. The training set was ~400 early tweets:

Tweets For: 95
Tweets Against: 310

Accounts For: 30
Accounts Against: 255

25% of these tweets are "for", and remember, these were the very first tweets. I also looked at the last 200 tweets and found NONE which supported ending Father's Day. We also see here that the "for" accounts had ~3 tweets each on average, compared to ~1.2 for the "against".

"For" tweets by time

Most of the "for" tweets came early on. All it takes is a few people to get the ball rolling; once an offensive hashtag is trending, it gets sustained by all the people who are against it:

[Figure: the cycle of terrible]

A shockingly small number of people were on the straightforward side of this tag. If we generalize the 0.5% number, then out of the ~40,000 accounts which participated, only about 200 were "for". It's definitely possible they were all (or mostly) trolls.

 

Methods

I went through the first ~500 tweets and coded them as either "for" or "against". There were different variations of "against"; lots of people said things like "grabbing the popcorn" or "is this serious?". So basically, "against" was anything other than "for".

This coded set was used to train a maximum entropy classifier from the outstanding nltk library. Reserving 30% of the coded set for testing, I saw about an 85% accuracy rate. Not bad. I tried to get it reliably above 90%, but no such luck.

I trained the classifier 5 times, randomly choosing a test set each time, and used it to determine the probability of each class for each tweet, giving a total of 5 classifications (and probabilities) for each tweet.

To determine a “consensus” classification, I required that at least 4 of the classifications agree with a minimum probability of 0.8. The value of 0.8 was chosen based on the following plot:

[Figure: true and false positive rates vs. probability cutoff, log scale]

Here, a "positive" means classifying a tweet as "for" (this was the rarer condition, so I used it as the focus). One can see where the true positive rate starts to drop off. Since I wanted to keep as much data as possible, I picked a cutoff right where the true positive rate (~0.9) starts to drop. The false positive rate is plotted as 0.001, but that's just for graphing convenience; all I can really say is that it's less than ~0.002 (limited by the number I manually coded).
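To make the pipeline concrete, here's a minimal sketch of the train-five-times-and-vote scheme described above. The bag-of-words feature extractor and the coded_tweets variable (a list of (text, label) pairs) are my assumptions, not the original code:

    import random
    import nltk
    from nltk.classify import MaxentClassifier

    def features(text):
        # Simple bag-of-words features (an assumption; the post doesn't say).
        return {word: True for word in text.lower().split()}

    labeled = [(features(text), label) for text, label in coded_tweets]

    # Train 5 classifiers, each on a random 70/30 train/test split.
    classifiers = []
    for _ in range(5):
        random.shuffle(labeled)
        cut = int(0.7 * len(labeled))
        train_set, test_set = labeled[:cut], labeled[cut:]
        clf = MaxentClassifier.train(train_set, max_iter=25, trace=0)
        print("accuracy:", nltk.classify.accuracy(clf, test_set))  # ~0.85
        classifiers.append(clf)

    def consensus(text, min_prob=0.8, min_votes=4):
        # Accept a label only if at least 4 of the 5 classifiers assign it
        # with probability >= 0.8; otherwise the tweet is excluded.
        votes = {}
        for clf in classifiers:
            dist = clf.prob_classify(features(text))
            label = dist.max()
            if dist.prob(label) >= min_prob:
                votes[label] = votes.get(label, 0) + 1
        for label, n in votes.items():
            if n >= min_votes:
                return label
        return None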

[1] 4chan /pol/ thought it would be a hoot to spread this hashtag. But the very first tweets predate that 4chan post, so who knows who really started it.

Properties of angry speech

Note: This post contains profanity

Sit down if you're standing: There's a lot of angry speech on the internet. There's a lot of regular speech too. For exact meaning, the order and context of words are critical, but for general tone one can get pretty far just by looking at word choice.

There were two text corpora analyzed here:

0. The “Blog Authorship Corpus” [1]

1. The text from an internet rant site.

“Rant” sites are an anonymous way for people to vent their anger online, the theory being it is cathartic to express that anger in a safe setting. It also provides a nice dataset of text which is guaranteed to have much more anger than normal.

Methods: The blog corpus already had formatting stripped. I lowercased all the text, removed punctuation, and simply counted all the unique words. No spelling correction or stemming was done. Words in the Python Natural Language Toolkit (nltk) "stopwords" corpus were removed.
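As a rough sketch, the counting step looks something like this (the regex-based punctuation stripping is my guess at the details):

    import re
    from collections import Counter
    from nltk.corpus import stopwords  # may require nltk.download("stopwords")

    STOP = set(stopwords.words("english"))

    def word_counts(text):
        # Lowercase, strip punctuation, split on whitespace, drop stopwords.
        words = re.sub(r"[^\w\s]", "", text.lower()).split()
        return Counter(w for w in words if w not in STOP)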

Wordclouds from each are shown below:

[Figure: word cloud from the blog corpus]

[Figure: word cloud from the rant site]

It's a bit hard to interpret these, so here are bar charts of the top 20 words:

[Figure: top 20 words from the blog corpus]

[Figure: top 20 words from the rant site corpus]

See the difference? It took me a minute too. For the most part, they are very similar. "fuck", "fucking", and "hate" show up in the rant corpus and not in the blog corpus, leading to the unsurprising conclusion that people use those words a lot more when they're angry.

To better illustrate the difference, I ranked each word in each corpus, took the difference between the ranks, and plotted the words with the highest differences. For instance, if "fuck" were the 10th most commonly used word on the rant site and the 200th most commonly used word in the blog corpus, it would have a difference of 190. The words with the highest rank difference are shown below.

[Figure: difference in rank between rant corpus and blog corpus]
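A minimal sketch of that ranking step, reusing the word_counts helper above (rant_counts and blog_counts are assumed to be the Counters for each corpus; the names are mine):

    def ranks(counts):
        # Map each word to its frequency rank (1 = most common).
        return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

    rant_rank = ranks(rant_counts)
    blog_rank = ranks(blog_counts)

    # Positive differences mark words far more prominent in the rant corpus,
    # e.g. rank 200 in the blogs minus rank 10 on the rant site = 190.
    diffs = {w: blog_rank[w] - rant_rank[w]
             for w in rant_rank if w in blog_rank}
    top20 = sorted(diffs, key=diffs.get, reverse=True)[:20]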

The top differences are colloquialisms. Note that "be" would be stripped out as a stopword, but "bee" would not. This could reflect a difference in the userbase of the two sites rather than the emotional state of the users: both corpora contained "people" in their top twenty words, but "peeps" was much more frequently used on the rant site.

Conclusions? I was surprised at how similar these two corpora were. It could be that bloggers are angrier than I give them credit for, or it could be that the hallmarks of angry speech are subtler than I expected. Or that people don’t bother to find creative ways to say “stupid fucking shit”.

-Jacob

[1] J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging. In Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs. http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm


Lifetime of Fortune 500 Companies

I often find myself discussing what the future world will look like. I might wonder out loud what will come after Facebook or Google or whatever. Frequently the response is “nothing”. As in, these companies are so huge that they must simply last forever.

Well, my quantitative brain simply will not abide that. Taking it to the extreme, I highly doubt Google will still exist in 1,000 years. It might, but I doubt it. There are some companies [1] which are seriously old, but these tend to be small-ish companies sustained by family tradition.

To examine the "big company -> lasts long" argument, I went to the Fortune 500 list; Money magazine has data going back to 1955 [2]. For technical reasons I only examined data from 1955 to 2007. The basic question is how long a company that's on the list one year will stay on the list. That plot, for various starting years, is shown below.
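The post doesn't include code, but the core computation is simple. Here's a sketch, under the assumption that the scraped data lives in a dict mapping each year to the set of company names on that year's list:

    def still_on_list(lists_by_year, start_year):
        # For a given starting year, count how many of that year's
        # companies appear on the list in each subsequent year.
        start = lists_by_year[start_year]
        return {year: len(start & lists_by_year[year])
                for year in sorted(lists_by_year) if year >= start_year}

    counts_1955 = still_on_list(lists_by_year, 1955)  # lists_by_year assumed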

[Figure: number of companies still on the list, by starting year]

[Figure: number of companies still on the list, as a function of time after the starting year]

These plots are very similar; the second uses an x-axis of relative time instead of absolute so that the different starting years can be more easily compared. I'm not sure what happened in 1994-1995, so that massive drop might be an artifact or a change in methodology.

I should also mention that a firm changing its name, or undergoing a merger, would appear to disappear for the purposes of this graph. So this metric is more about the changing business landscape than about whether a company still exists. For example, Twitter being bought by Facebook would be much different than Twitter going out of business, but both are much different than Twitter continuing to exist as an independent entity.

We see that the “half-life” of companies is about 25 years. A decent amount of time, but still much less than the average lifetime of a human being. Think about that next time somebody complains about big businesses not offering pension plans.

One might expect the larger firms to survive longer. I'm sure this is true, but if GM became the new Kia, I think that counts as a big change in the business landscape. So this next graph takes each company's rank in 1955, organizes the companies into quintiles, and shows how long they remain in at least their starting quintile. GM, ranked #1 in 1955, would need to stay in the top 100 (it was ranked #3 in 2007, so it passed that test).
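The quintile test itself is nearly a one-liner. A sketch, with my assumed conventions (ranks run 1 through 500, and a company missing from the list fails):

    import math

    def passes_quintile_test(start_rank, later_rank):
        # Quintile 1 = ranks 1-100 (top), quintile 5 = ranks 401-500.
        start_q = math.ceil(start_rank / 100)
        if later_rank is None:          # fell off the list entirely
            return False
        later_q = math.ceil(later_rank / 100)
        return later_q <= start_q       # must stay at or above starting quintile

    passes_quintile_test(1, 3)     # True: GM, #1 in 1955 and #3 in 2007
    passes_quintile_test(450, 90)  # True: bottom-quintile firms can only rise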

[Figure: how many companies remain at their starting (1955) quintile or higher]

This methodology slightly advantages companies with a lower starting rank, since they just need to stay on the list and still have plenty of room to grow. Even so, reality has a way of penalizing smaller companies: the top quintile has the longest half-life, about 27 years, while the lowest quintile's is only 5. Whether this is because smaller companies are more likely to fail, or simply more likely to be acquired, I don't know.

In summary, the market is a harsh mistress. Everybody knows that nothing lasts forever, but forever can be a pretty short time.

-Jacob

[1] http://en.wikipedia.org/wiki/List_of_oldest_companies
[2] http://money.cnn.com/magazines/fortune/fortune500/