For the last couple of days I have been wondering what I could do to save on my commute time. Every day I travel ~25 km, and in Bangalore traffic that takes away 90 minutes of my time. Of course, with experience we learn which route is better and when traffic is lighter, but it would be good to have data backing those hunches and helping me save a few more minutes.
I wanted to be able to answer questions like:
- Can I start a little later on Mondays than on Fridays?
- What is the optimum time to start my commute?
- Which route is good for which day?
- If it rained an hour back, how does it affect my commute?
- How is the traffic different for different months?
- At what rate is the traffic slowing me down every month?
- When should I take out my car instead of my motorcycle?
… and more
I started thinking of doing a small hobby project: a small analytics platform to help me collect and analyze data. I initially thought of building a Raspberry Pi data logger, but decided to start with a cheaper version using my Android phone. There are existing Android apps that track your location and can plot a map or speed chart, but I wanted raw data so that I have the flexibility to come up with my own queries.
I wrote a native Android app that I start when I begin my commute every day and stop at the end. The app takes the location data from the device GPS and stores it in its embedded SQLite database. Later, whenever my device is connected to the internet, the app lets me post the collected data to cloud storage.
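The app itself is native Android, but the store-locally-then-upload idea is easy to sketch in Python with the standard `sqlite3` module. This is only an illustration: the table name `fixes`, its columns, and the `synced` flag are my assumptions here, not the app's actual schema.

```python
import json
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the local log. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fixes (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               ts     REAL NOT NULL,   -- Unix timestamp in seconds
               lat    REAL NOT NULL,
               lon    REAL NOT NULL,
               speed  REAL,            -- m/s, if the GPS reports it
               synced INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def log_fix(conn, ts, lat, lon, speed=None):
    """Store one GPS fix locally; works with no network connection."""
    conn.execute(
        "INSERT INTO fixes (ts, lat, lon, speed) VALUES (?, ?, ?, ?)",
        (ts, lat, lon, speed),
    )
    conn.commit()


def pending_payload(conn):
    """Return the ids of unsynced rows and a JSON body ready to POST."""
    rows = conn.execute(
        "SELECT id, ts, lat, lon, speed FROM fixes WHERE synced = 0 ORDER BY ts"
    ).fetchall()
    ids = [row[0] for row in rows]
    body = json.dumps(
        [{"ts": r[1], "lat": r[2], "lon": r[3], "speed": r[4]} for r in rows]
    )
    return ids, body


def mark_synced(conn, ids):
    """Flag rows as uploaded once the server has accepted them."""
    conn.executemany(
        "UPDATE fixes SET synced = 1 WHERE id = ?", [(i,) for i in ids]
    )
    conn.commit()
```

Marking rows as synced only after a successful upload means a dropped connection loses nothing: the same rows are simply picked up again on the next attempt.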
I wanted to store the collected data in a cloud database so that syncing is easy. I wrote a very simple REST API that accepts JSON via POST and hosted it on Heroku. This API receives the logged data from the mobile device and in turn persists it in MongoHQ, which generously gives out 512 MB of free storage. For now this felt sufficient; I cared more about low cost than about scalability.
My requirement here is to be able to fetch data from my Mongo cloud, slice and dice it on the fly, plot charts, and do some statistical analysis interactively. I used Python, and it's a great fit here: it has a MongoDB client, matplotlib for plotting charts, almost no learning curve compared to R, IPython and its notebook interface, and plenty of modules for data analytics.
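As a rough sketch of the kind of slicing I have in mind, here is how the "Mondays vs. Fridays" question could be answered once the trips are in Python. The record shape (dicts with ISO-format `start` and `end` strings) is an assumption for illustration. In practice the records would be fetched from MongoDB first; here they are just passed in as a plain list.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Explicit names avoid locale-dependent output from strftime("%A").
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_duration_by_weekday(trips):
    """Average commute duration in minutes, grouped by weekday of departure.

    `trips` is an iterable of dicts with hypothetical keys "start" and
    "end", each an ISO 8601 timestamp string.
    """
    by_day = defaultdict(list)
    for trip in trips:
        start = datetime.fromisoformat(trip["start"])
        end = datetime.fromisoformat(trip["end"])
        minutes = (end - start).total_seconds() / 60
        by_day[WEEKDAYS[start.weekday()]].append(minutes)
    return {day: mean(values) for day, values in by_day.items()}
```

The resulting dict can go straight into a matplotlib bar chart, and the same grouping trick works for month, route, or departure hour by swapping the key.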
For a test run I took a short ride out at night, so there wasn't any traffic, but I did halt for a minute in between before coming back. There is still some scope for calibrating the logging frequency: my logger showed 1.2 km in total against the 1.6 km on my bike's odometer. Still, it was quite a good test run to start with, as it did capture my halt!
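For the curious, here is a minimal sketch of how total distance and halts can be derived from raw GPS fixes. It assumes each fix is a dict with `ts` (seconds), `lat`, and `lon`, and the halt thresholds are arbitrary values I picked for illustration. It also hints at one likely reason for the 1.2 km vs. 1.6 km gap: summing straight-line hops between sparse fixes cuts corners, so a low logging frequency under-reads on a winding route.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def track_length_km(fixes):
    """Sum of straight-line hops between consecutive fixes."""
    return sum(
        haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        for a, b in zip(fixes, fixes[1:])
    )


def find_halts(fixes, max_speed_kmh=2.0, min_seconds=30.0):
    """Return (start_ts, end_ts) spans where the speed stayed near zero.

    Both thresholds are illustrative defaults, not calibrated values.
    """
    halts = []
    start = end = None
    for a, b in zip(fixes, fixes[1:]):
        dt = b["ts"] - a["ts"]
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        hop_km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        speed_kmh = hop_km / dt * 3600
        if speed_kmh <= max_speed_kmh:
            if start is None:
                start = a["ts"]
            end = b["ts"]
        else:
            if start is not None and end - start >= min_seconds:
                halts.append((start, end))
            start = None
    # Close a halt that lasts until the final fix.
    if start is not None and end - start >= min_seconds:
        halts.append((start, end))
    return halts
```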
The plan is to keep collecting data, try some predictive analytics with this and keep exploring insights.
If you would like to collect and play with your commute data feel free to fork my project on GitHub: