Clustering-Geolocation-Data-Intelligently

My learning outcomes from, and follow-up on, a well-taught Coursera guided project by Ari Anastassiou.

We were given taxi rank location data for the North American region and asked to identify the key clusters of taxi ranks, so that service stations could be built for all the taxis operating in each area.

Project Outline

Task 1: Exploratory Data Analysis

Task 2: Visualizing Geographical Data

Task 3: Clustering Strength / Performance Metric

Task 4: K-Means Clustering

Task 5: DBSCAN

Task 6: HDBSCAN

Task 7: Addressing Outliers

Skills Developed

Visualization
Machine Learning
Clustering
Data Analysis
Map Building

Task 1: Exploratory Data Analysis

Understanding the problem and data provided through basic data analysis and visualizations.

Checking for duplicate and empty data cells
Removing the redundant data
Finally, plotting the cleaned data
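
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names `LON`, `LAT` and `NAME` and the sample rows are hypothetical, and plotting is left out.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset and its column names may differ.
df = pd.DataFrame({
    "LON": [-74.00, -74.00, -73.95, None, -73.90],
    "LAT": [40.71, 40.71, 40.75, 40.80, None],
    "NAME": ["Rank A", "Rank A", "Rank B", "Rank C", "Rank D"],
})

# Count rows that repeat an earlier coordinate pair.
n_duplicates = int(df.duplicated(subset=["LON", "LAT"]).sum())

# Count rows where either coordinate is missing.
n_missing = int(df[["LON", "LAT"]].isna().any(axis=1).sum())

# Drop incomplete rows first, then keep one row per coordinate pair.
clean = (
    df.dropna(subset=["LON", "LAT"])
      .drop_duplicates(subset=["LON", "LAT"], keep="first")
      .reset_index(drop=True)
)

print(n_duplicates, n_missing, len(clean))
```

Dropping missing values before de-duplicating keeps the duplicate check focused on complete coordinate pairs.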

Task 2: Visualizing Geographical Data

Trying various interactive visualizations to learn more about the data.

Plotting the data on a world map using the coordinates provided

Task 3: Clustering Strength / Performance Metric

Evaluating the strength of a clustering algorithm.

Calculating the silhouette score
Plotting the graph for various blobs
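
To make the metric concrete, here is a from-scratch sketch of the mean silhouette score using NumPy. For each point, `a` is the mean distance to the other points in its own cluster and `b` is the smallest mean distance to any other cluster; the point's score is `(b - a) / max(a, b)`. The function name `silhouette` and the toy points are my own; in practice a library routine such as scikit-learn's `silhouette_score` would be used instead.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette score; assumes at least two clusters and Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # common convention: a singleton cluster scores 0
        # Mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, so divide by n_same - 1).
        a = dists[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == c].mean() for c in unique if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the two tight pairs
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the pairs
print(good, bad)
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores suggest points sit closer to another cluster than to their own.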

Task 4: K-Means Clustering

Learning the theory behind the k-means clustering algorithm and applying it to our data.

Visualizing k-means on sample data
Calculating the best silhouette score for our data
Plotting the data on the basis of the algorithm
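
A rough sketch of choosing the number of clusters by silhouette score is shown below. It uses synthetic blobs from scikit-learn rather than the taxi rank data, and the range of `k` values and the blob centres are arbitrary assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the coordinates: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for several values of k and record the silhouette score of each.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the k with the highest score.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real geolocation data the best score is usually less clear-cut than on these blobs, so it helps to inspect the whole score curve rather than only its maximum.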

Task 5: DBSCAN

Gaining theoretical and practical knowledge of Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Calculating the best silhouette score for our data
Plotting the data on the map for the density-based approach
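
As an illustration of the density-based approach, the sketch below runs scikit-learn's `DBSCAN` on synthetic blobs with a few far-away points added. The `eps` and `min_samples` values are assumptions chosen for this toy data, not the values tuned in the project. DBSCAN labels noise points as `-1`, so they are excluded before scoring.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two dense synthetic blobs plus three isolated points far from both.
X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [5, 5]], cluster_std=0.3, random_state=0
)
outliers = np.array([[20.0, 20.0], [-20.0, 15.0], [15.0, -20.0]])
X = np.vstack([X, outliers])

# eps is the neighbourhood radius; min_samples is the density threshold.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise with the label -1.
noise = labels == -1
n_clusters = len(set(labels[~noise]))

# Score only the points that were actually assigned to a cluster.
score = silhouette_score(X[~noise], labels[~noise])
print(n_clusters, int(noise.sum()), round(score, 3))
```

Unlike k-means, the number of clusters is not specified up front; it follows from the density parameters.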

Task 6: HDBSCAN

Gaining theoretical and practical knowledge of Hierarchical DBSCAN (HDBSCAN), which alleviates some constraints of classical DBSCAN.

Calculating the best silhouette score for our data
Plotting the data on the map for the density-based approach

Task 7: Addressing Outliers

Handling the points that the density-based models classify as outliers.

Using a k-nearest neighbours classifier to assign the outliers to clusters and calculating the resulting silhouette score
Comparing the hybrid and k-means approaches
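
One way to build such a hybrid is sketched below: cluster with a density-based model, train a k-nearest neighbours classifier on the points that received a cluster, and let it assign the remaining noise points. This is my reading of the approach rather than the project's exact code; the data, `eps`, `min_samples` and `n_neighbors=1` are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

# Two synthetic blobs plus three stragglers that sit outside both.
X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [5, 5]], cluster_std=0.3, random_state=0
)
stragglers = np.array([[1.5, 1.5], [3.5, 3.5], [-2.0, 0.0]])
X = np.vstack([X, stragglers])

# Step 1: density-based clustering; noise points get the label -1.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
noise = labels == -1

# Step 2: train a classifier on the clustered points only.
hybrid = labels.copy()
if noise.any():
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X[~noise], labels[~noise])
    # Step 3: give every noise point the cluster of its nearest clustered neighbour.
    hybrid[noise] = knn.predict(X[noise])

# Every point now has a cluster, so the score covers the full dataset.
hybrid_score = silhouette_score(X, hybrid)
print(int(noise.sum()), round(hybrid_score, 3))
```

Because the hybrid labels cover every point, the score can be compared directly with a k-means score computed on the same data.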

Outcome

After completing this project, I can thoroughly carry out the basic data manipulations that any data processing work requires and present the results through various visual means.
I also gained deeper insight into how various clustering algorithms differ from each other and how to evaluate their strength on different data.
Lastly, this project gave me a good sense of how some real-world problems can be solved using these techniques.

digamjain/Clustering-Geolocation-Data-Intelligently