Plotting Hubway trip data

Hubway, a bike-rental service, opened up in Cambridge this summer. It's $5 for a half-hour trip, and a yearly membership gets you unlimited half-hour trips. Not a bad idea. I’ve used the service exactly once, but it was handy.

They just released two years’ worth of trip data (plus station status data, occupancy and all that, though as of this writing I haven’t made use of it) and asked the community (that’s me!) to visualize it in creative ways. My creativity is limited, but I do like tinkering, especially on things unrelated to my day job.

The web app is here. If you’re not interested in the tech details of building it, stop reading now.

The most important element is the Google Maps API. I don’t have much to say about it because it’s all positive: very easy to use, does everything I needed, and performs well. Which is great; it would’ve been hard to build this app without a good mapping API.

Google also has an experimental product called Fusion Tables, which is basically an easy-to-use database. It can be useful even for non-programmers: it has built-in facilities for generating interactive charts, which can then be embedded in a page. Even geographic data can sometimes be handled automatically; data can be “geocoded”, so you enter “35 Binney Street” and Google figures out where it is. I was lucky enough to already have latitude and longitude data.

Because I customized the visualization, I used their API and did the drawing myself, but it was a huge time-saver not to have to set up a database and server. I have my own hosting, so it wouldn’t have been the biggest deal, but still, one fewer thing to worry about.

Google explicitly and repeatedly states that Fusion Tables is still experimental, and I can understand why. In terms of reliability, I’d guess it’s at about 90-95%; during testing a fair number of requests returned server errors, usually on the initial page load, which is just about the worst time. This is tolerable for my app, which doesn’t cost or make me any money, but for a more serious application it just wouldn’t be good enough. I imagine they’re working on it.
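One common way to paper over intermittent server errors like these is to retry with exponential backoff. This is a minimal, generic Python sketch, not necessarily something this app does; TransientServerError is a hypothetical stand-in for whatever a failed request raises in your client library, and the delays are arbitrary.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for a 5xx response from the table service."""


def with_retries(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TransientServerError, wait base_delay * 2**i seconds and retry.

    Re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return fn()
        except TransientServerError:
            if i == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** i)  # 0.5s, 1s, 2s, ... between tries
```

Passing `sleep` in as a parameter keeps the helper easy to test without actually waiting.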

Getting data from a Fusion table follows basic SQL syntax, which most programmers have experience with, and even for those who don’t, the simple features (which are all Fusion supports) are intuitive. One pretty nifty feature is location-based filtering, which makes it dead simple to find the items in your table closest to a given location. I guess I could have done something like figure out which Hubway stations are within 100 meters of a Dunkin Donuts... but I didn’t.
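Fusion does that filtering server-side. As a rough local equivalent, here is a minimal Python sketch using the haversine great-circle formula; it is a swapped-in technique, not necessarily how Fusion computes distances, and the station names and coordinates below are made-up placeholders.

```python
import math

# Mean Earth radius in meters, the usual approximation for haversine.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(stations, lat, lng, radius_m):
    """Return (name, distance_m) for stations within radius_m of (lat, lng), nearest first."""
    hits = []
    for name, s_lat, s_lng in stations:
        d = haversine_m(lat, lng, s_lat, s_lng)
        if d <= radius_m:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical stations: one about 56 m north of the query point, one about 1.1 km north.
stations = [("near", 42.3605, -71.09), ("far", 42.37, -71.09)]
print(within_radius(stations, 42.36, -71.09, 100))
```

A linear scan like this is fine for a few dozen stations; a larger table would want a spatial index, which is exactly the work Fusion saves you.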

One thing that really irked me was how it handles times. Which is to say, it doesn’t. Their help page on DATETIME formats lists a bunch of date formats but makes no mention of times. I tried several formats, but each time Fusion seemed to treat the value as a string. Which isn’t so bad: yyyy-mm-dd has the same order chronologically as lexicographically, and ditto for 24-hour HH:MM. The only snafu is dealing with time intervals that span midnight, which I ended up excluding from the app. I sent some feedback about this to Google, and they actually responded asking for clarification; hopefully I provided enough detail and they’ll address it. I was rather surprised they have location-based queries but not time-based queries.
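The ordering claim is easy to check. This minimal Python sketch compares plain string sorting against sorting by parsed time; it also shows that the trick depends on zero padding, since un-padded hours like "9:05" sort after "13:30" as text.

```python
from datetime import datetime


def sorts_chronologically(stamps, fmt):
    """True if sorting the strings as text matches sorting them by parsed time."""
    as_text = sorted(stamps)
    as_time = sorted(stamps, key=lambda s: datetime.strptime(s, fmt))
    return as_text == as_time


print(sorts_chronologically(["2012-10-03", "2011-07-28", "2012-01-15"], "%Y-%m-%d"))
print(sorts_chronologically(["23:45", "00:15", "09:05", "13:30"], "%H:%M"))
print(sorts_chronologically(["9:05", "13:30"], "%H:%M"))  # no zero padding
```

The first two checks hold; the un-padded one does not.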

The other feature they are sorely lacking, although at least this is properly documented, is the boolean OR operator. So I can’t search for “start_time < 00:15 OR start_time >= 23:45”. They have an IN operator (which I used for days of week), and one could hack something up that way, but I couldn’t bring myself to generate an IN clause that long. Call me irrational if you must.
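For the curious, here is roughly what that IN clause would look like, as a minimal Python sketch. It assumes start_time is stored as a zero-padded HH:MM string at minute resolution (if seconds were stored too, exact-match IN values wouldn't line up), and that string literals are single-quoted; the helper names are hypothetical and the output is just a WHERE fragment, not a tested Fusion query.

```python
def minutes_in_window(start, end):
    """Every HH:MM from start (inclusive) to end (exclusive), wrapping past midnight.

    Returns an empty list when start == end.
    """
    sh, sm = map(int, start.split(":"))
    eh, em = map(int, end.split(":"))
    cur, stop = sh * 60 + sm, eh * 60 + em
    out = []
    while cur != stop:
        out.append(f"{cur // 60:02d}:{cur % 60:02d}")
        cur = (cur + 1) % (24 * 60)  # roll 23:59 over to 00:00
    return out


def in_clause(column, values):
    """Build `column IN ('a', 'b', ...)`; no escaping, so values must be trusted."""
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{column} IN ({quoted})"


clause = in_clause("start_time", minutes_in_window("23:45", "00:15"))
print(clause[:60] + "...")
```

For 23:45 to 00:15 that's 30 quoted values. Another route would be to issue two queries, one on each side of midnight, and merge the results client-side.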

Other than that, the Google APIs worked pretty well. The population data took a bit of finagling to get into the right format [1], but it wasn’t too bad. Adding the population and bike path overlays was extremely easy. The population overlay can be styled in the Fusion GUI; since this isn’t the main data, I decided not to give the user controls over it.
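The finagling in footnote [1] mostly comes down to swapping each coordinate pair. Here is a minimal Python sketch, assuming the points start out as (latitude, longitude) tuples in Maps order. KML's coordinates element expects whitespace-separated longitude,latitude tuples, and a polygon's LinearRing must end on its starting point. The function name is hypothetical.

```python
def to_kml_coordinates(latlng_pairs, close_ring=False):
    """Turn Maps-style (lat, lng) pairs into a KML coordinates string of lng,lat tuples.

    With close_ring=True, repeat the first point at the end if needed,
    as a KML LinearRing requires.
    """
    pts = list(latlng_pairs)
    if close_ring and pts and pts[0] != pts[-1]:
        pts.append(pts[0])
    # Note the swap: KML wants longitude first.
    return " ".join(f"{lng},{lat}" for lat, lng in pts)


print(to_kml_coordinates([(42.36, -71.09), (42.37, -71.10)]))
```

That prints `-71.09,42.36 -71.1,42.37`.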

Next time: Ideas for ways to analyze this data that I never implemented.





[1] Maps takes all its coordinates in latitude, longitude order, while KML uses longitude, latitude. I wish Google had been a bit more consistent.