Keywords: Cluster Analysis, Python, Google Maps API
This project analyzes New York City taxi trip data. It uses cluster analysis to identify the locations with the most pick-ups and the locations that generate the most lucrative trips. The results are displayed as a heat map using the Google Maps API, which can help taxi drivers decide where to wait for passengers.
NYC cab data is available from the NYC Taxi & Limousine Commission’s Trip Record Data site: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml.
The demonstration code uses the March 2016 'Green cab' trip data downloaded from the link above.
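As a starting point, the pick-up coordinates can be read from the downloaded CSV with Python's standard `csv` module. This is a minimal sketch, not the project's own loader: the column names `Pickup_latitude` and `Pickup_longitude` are assumptions about the March 2016 green cab file header, and `load_pickups` is a hypothetical helper. Rows with missing or zero coordinates are skipped because they would otherwise pull clusters toward (0, 0).

```python
import csv
import io

# Assumed column names; check them against the header of the downloaded file.
LAT_COL = "Pickup_latitude"
LNG_COL = "Pickup_longitude"


def load_pickups(lines, lat_col=LAT_COL, lng_col=LNG_COL):
    """Return a list of (lat, lng) pick-up points from CSV text lines.

    Rows with missing, malformed, or zero coordinates are skipped.
    """
    points = []
    for row in csv.DictReader(lines):
        try:
            lat = float(row[lat_col])
            lng = float(row[lng_col])
        except (KeyError, TypeError, ValueError):
            continue  # column missing or value not a number
        if lat == 0.0 or lng == 0.0:
            continue  # placeholder coordinates, not a real location
        points.append((lat, lng))
    return points


if __name__ == "__main__":
    sample = io.StringIO(
        "Pickup_longitude,Pickup_latitude,Fare_amount\n"
        "-73.95,40.80,12.5\n"
        "0,0,8.0\n"
        "-73.89,40.75,20.0\n"
    )
    print(load_pickups(sample))  # [(40.8, -73.95), (40.75, -73.89)]
```

For the real file, pass an open file object (opened with `newline=""`) after confirming the column names against its header.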
Use k-means cluster analysis to identify:
- Locations with the most pick-ups.
- Pick-up locations of the most lucrative trips.
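The clustering step can be sketched in plain Python as Lloyd's k-means algorithm: assign each pick-up to its nearest centre, move each centre to the mean of its points, and repeat until nothing changes. This is an illustrative sketch, not the code in `cluster_analysis.py`, which may use a library such as scikit-learn instead; `kmeans` and `busiest_locations` are hypothetical names. Two simplifications are assumed: initial centres are picked by farthest-first traversal to keep the example deterministic, and distance is Euclidean on raw degrees, which ignores that a degree of longitude is shorter than a degree of latitude at New York's latitude.

```python
import math


def _nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means on (lat, lng) tuples; returns (centroids, sizes)."""
    # Farthest-first initialisation: deterministic and spreads the centres out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = [_nearest(p, centroids) for p in points]  # assignment step
        updated = []
        for c in range(k):  # update step: move each centre to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centre
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    labels = [_nearest(p, centroids) for p in points]
    sizes = [labels.count(c) for c in range(k)]
    return centroids, sizes


def busiest_locations(points, k, top=3):
    """Centres of the `top` largest clusters, with their pick-up counts."""
    centroids, sizes = kmeans(points, k)
    ranked = sorted(zip(sizes, centroids), reverse=True)
    return [(centre, size) for size, centre in ranked[:top]]


if __name__ == "__main__":
    pickups = [(40.80, -73.95), (40.81, -73.94), (40.80, -73.94),
               (40.70, -73.80), (40.71, -73.81)]
    for centre, size in busiest_locations(pickups, k=2, top=2):
        print(f"{size} pick-ups around ({centre[0]:.3f}, {centre[1]:.3f})")
```

Ranking clusters by size makes the centres of the largest clusters the "most pick-ups" locations; the choice of `k` controls how coarse those locations are.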
Here, lucrative trips are defined as those that generate the highest fare for the least amount of time spent.
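One way to turn this definition into a number is fare per minute of trip time; the highest-scoring trips are then kept and their pick-up points clustered in the same way as all pick-ups. The sketch assumes that scoring rule, a 10% cut-off (`fraction=0.1`), and a `%Y-%m-%d %H:%M:%S` timestamp format; `fare_per_minute` and `most_lucrative` are hypothetical helpers rather than the project's code.

```python
from datetime import datetime

# Assumed timestamp format of the raw trip records, e.g. "2016-03-01 08:00:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def fare_per_minute(fare, pickup_time, dropoff_time):
    """Fare earned per minute of trip time, or None for non-positive durations."""
    start = datetime.strptime(pickup_time, TIME_FORMAT)
    end = datetime.strptime(dropoff_time, TIME_FORMAT)
    minutes = (end - start).total_seconds() / 60
    if minutes <= 0:
        return None  # bad record: a trip that took no time cannot be scored
    return fare / minutes


def most_lucrative(trips, fraction=0.1):
    """Pick-up points of the top `fraction` of trips ranked by fare per minute.

    Each trip is (lat, lng, fare, pickup_time, dropoff_time).
    """
    scored = []
    for lat, lng, fare, t0, t1 in trips:
        rate = fare_per_minute(fare, t0, t1)
        if rate is not None:
            scored.append((rate, (lat, lng)))
    scored.sort(reverse=True)  # highest fare per minute first
    keep = max(1, int(len(scored) * fraction))
    return [point for _, point in scored[:keep]]


if __name__ == "__main__":
    trips = [
        (40.80, -73.95, 10.0, "2016-03-01 08:00:00", "2016-03-01 08:20:00"),
        (40.70, -73.80, 30.0, "2016-03-01 09:00:00", "2016-03-01 09:15:00"),
        (40.75, -73.90, 5.0, "2016-03-01 10:00:00", "2016-03-01 10:00:00"),
    ]
    print(most_lucrative(trips, fraction=0.5))  # [(40.7, -73.8)]
```

Fare per minute favours short, expensive trips; a different cut-off or a fare-per-distance score would shift which pick-up points are clustered.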
Steps to generate the heat map:
- Run cluster_analysis.py to produce location.csv
- cat heatmap-start.txt > heatmap.html
- python latlng.py location1.csv >> heatmap.html
- cat heatmap-end.txt >> heatmap.html
- open heatmap.html
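The pipeline above splices generated JavaScript between two fixed text fragments. A plausible sketch of `latlng.py` prints one `google.maps.LatLng` literal per CSV row. It assumes each row starts with latitude and longitude, and that `heatmap-start.txt` opens a JavaScript array which `heatmap-end.txt` closes and passes to a `google.maps.visualization.HeatmapLayer`; the CSV layout, the contents of the two text files, and the helper name `latlng_lines` are all assumptions to check against the actual files.

```python
import csv
import sys


def latlng_lines(rows):
    """Yield one JavaScript google.maps.LatLng literal per (lat, lng) row."""
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or short lines
        try:
            lat, lng = float(row[0]), float(row[1])
        except ValueError:
            continue  # e.g. a header line
        # Trailing comma: each line is one element of a JavaScript array literal.
        yield f"new google.maps.LatLng({lat}, {lng}),"


# Only runs when a CSV path is given, e.g.:
#   python latlng.py location1.csv >> heatmap.html
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as f:
        for line in latlng_lines(csv.reader(f)):
            print(line)
```

Under those assumptions, a row `40.8,-73.95` becomes `new google.maps.LatLng(40.8, -73.95),` in heatmap.html.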
The interactive output can be found in the googlemap repository.
Reference: https://github.com/parrt/msan692/blob/master/notes/sfpd.md