salami162/gps-fun
GPS for Fun! How to Cluster Location Data to Find Popular Destinations
Women Who Code - CONNECT Conference Workshop
What does this repo have?
- Data - A list of (lat,lng) coordinates around San Francisco. Look under gps-fun/data.
- A Frontend Page - To visualize the raw data and clustered results on the map.
- Scripting entries - Run clustering algorithms to generate clusters and find the most representative center of a cluster. Centers can be visualized on the frontend page.
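As a rough illustration of what "most representative center" can mean, the sketch below picks the member of a cluster that lies closest to the cluster's mean position, using the haversine great-circle distance. This is not the repo's own code: the function names and sample coordinates are hypothetical, and averaging raw lat/lng values is an approximation that is reasonable only over a small area such as a single city.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def representative_center(points):
    """Return the member point closest to the cluster's mean (lat, lng)."""
    mean = (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))
    return min(points, key=lambda p: haversine_km(p, mean))


# Hypothetical cluster of three points in San Francisco.
cluster = [(37.7749, -122.4194), (37.7755, -122.4180), (37.8080, -122.4177)]
center = representative_center(cluster)
```

Returning an actual data point rather than the raw mean keeps the center on a location that someone really visited, which tends to look more sensible on a map.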
Setup
The following steps will help you set up a development environment.
- Install pip
$ sudo easy_install pip
- Install virtualenv
$ sudo pip install virtualenv
- Clone this repository
$ git clone git@github.com:salami162/gps-fun.git
$ cd gps-fun
- Create and activate a virtual environment
$ virtualenv venv
$ source venv/bin/activate
- Install the required Python packages
$ pip install -r requirements.txt
- Find Help/Available options
$ python manage.py --help
- Running the server
$ python manage.py runserver
This will launch a server on localhost at port 5000. Open the index page at http://localhost:5000/
The Start/Stop toggle button in the top right corner starts polling for changes in the trained clusters. Before you hit it for the first time, make sure you've run at least one round of clustering so that ./data/trained_output.csv exists. To run one, see the next step.
- Running KMeans
$ python manage.py kmeans -c 4 -src './data/wwc_conf_dataset_tiny.csv' -dest './data/trained_output.csv'
Given a CSV file of locations, this generates clusters and writes the cluster centers to another CSV file. The above command produces 4 clusters, with the lat/lng of each center in ./data/trained_output.csv.
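To give a feel for what a k-means run does with the points, here is a minimal pure-Python sketch of Lloyd's algorithm on in-memory (lat, lng) tuples. It is an illustration only and not the code behind manage.py: the function name, parameters, and sample points are hypothetical, file reading and writing are left out, and distances are plain squared differences in degrees, which is a reasonable shortcut only over a small area.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on (lat, lng) tuples; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2
                + (p[1] - centers[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for old, group in zip(centers, groups):
            if group:
                new_centers.append((sum(p[0] for p in group) / len(group),
                                    sum(p[1] for p in group) / len(group)))
            else:
                new_centers.append(old)  # leave an empty cluster where it is
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers


# Hypothetical sample: two points on each side of the Bay.
points = [(37.77, -122.42), (37.78, -122.41),
          (37.80, -122.27), (37.81, -122.26)]
centers = sorted(kmeans(points, 2))
```

With this sample, the two centers settle near the middle of each pair of points. On real data, a library implementation such as sklearn.cluster.KMeans is likely a better choice than a hand-rolled loop.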
Recommended Reading
Here's some reading to help familiarize yourself with clustering, k-means clustering, and hierarchical clustering.
Documentation
sklearn clustering - Docs for the Python package that implements various clustering algorithms.