digamjain/Clustering-Geolocation-Data-Intelligently
My learning outcomes and follow-up from a well-taught Coursera guided project by Ari Anastassiou.
Clustering-Geolocation-Data-Intelligently
We were provided with taxi rank location data for the North American region and had to identify the key clusters of these taxi ranks, so that service stations could be built for all taxis operating in each area.
Project Outline
Task 1: Exploratory Data Analysis
Task 2: Visualizing Geographical Data
Task 3: Clustering Strength / Performance Metric
Task 4: K-Means Clustering
Task 5: DBSCAN
Task 6: HDBSCAN
Task 7: Addressing Outliers
Skills Developed
- Visualization
- Machine Learning
- Clustering
- Data Analysis
- Map Building
Task 1: Exploratory Data Analysis
Understanding the problem and data provided through basic data analysis and visualizations.
- Checking for duplicate and empty data cells
- Removing the redundant data
- Finally plotting the cleaned data
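The cleaning steps above can be sketched with pandas. This is a minimal illustration rather than the project's actual notebook: the column names `LON`, `LAT` and `NAME` and the sample values are assumptions made up for the example, and plotting is left out to keep the sketch small.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the taxi rank data.
# Column names and values are assumptions, not taken from the project.
df = pd.DataFrame({
    "LON": [-73.98, -73.98, -73.95, np.nan, -74.00],
    "LAT": [40.75, 40.75, 40.78, 40.70, 40.72],
    "NAME": ["Rank A", "Rank A", "Rank B", "Rank C", None],
})

# Count repeated coordinate pairs and missing values per column.
n_duplicates = int(df.duplicated(subset=["LON", "LAT"]).sum())
n_missing = df.isna().sum()

# Drop rows without coordinates, then drop repeated coordinate pairs.
clean = (
    df.dropna(subset=["LON", "LAT"])
      .drop_duplicates(subset=["LON", "LAT"], keep="first")
      .reset_index(drop=True)
)

print(f"duplicates: {n_duplicates}, rows before: {len(df)}, rows after: {len(clean)}")
```

Rows with a missing name are kept here because only the coordinates matter for clustering; whether that is appropriate depends on the data set.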
Task 2: Visualizing Geographical Data
Trying various interactive visualizations to better understand the data.
- Plotting the data on the world map with the co-ordinates provided
Task 3: Clustering Strength / Performance Metric
Evaluating the strength of a clustering algorithm.
- Calculating the silhouette score
- Plotting the graph for various blobs
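The silhouette score compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b), and averages the result over all points. As a small sketch of how the score reacts to cluster separation, the example below uses scikit-learn's `make_blobs` with the true blob labels; the blob parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two synthetic data sets with the same centres:
# one with tight blobs, one with heavily overlapping blobs.
X_tight, y_tight = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X_loose, y_loose = make_blobs(n_samples=300, centers=3, cluster_std=3.0, random_state=0)

# Score the true blob assignment in each case.
score_tight = silhouette_score(X_tight, y_tight)
score_loose = silhouette_score(X_loose, y_loose)

print(f"tight blobs: {score_tight:.3f}")
print(f"loose blobs: {score_loose:.3f}")
```

Scores lie between -1 and 1; well-separated blobs score closer to 1, while overlapping blobs score closer to 0.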
Task 4: K-Means Clustering
Learning the theory behind the k-means clustering algorithm and applying it to our data.
- Visualizing the K-means on sample data
- Calculating the best silhouette score for our data
- Plotting the data on the basis of the algorithm
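Choosing the number of clusters by silhouette score can be sketched as a simple loop over candidate values of k. The points below are synthetic (four made-up centres with small Gaussian scatter), and plain Euclidean distance on latitude and longitude is used as an approximation; the range of k is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical (lat, lon) points scattered around four made-up centres.
centres = np.array([[40.70, -74.00], [40.80, -73.95], [40.75, -73.85], [40.65, -73.90]])
X = np.vstack([c + rng.normal(scale=0.005, size=(50, 2)) for c in centres])

# Try several values of k and keep the one with the highest silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"best k: {best_k}, silhouette: {best_score:.3f}")
```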
Task 5: DBSCAN
Gaining theoretical and practical knowledge of Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
- Calculating the best silhouette score for our data
- Plotting the data on the map for the density-based approach
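A minimal DBSCAN sketch with scikit-learn follows. The data is synthetic, and the `eps` and `min_samples` values are assumptions tuned to that data, not the settings used in the project. DBSCAN labels noise points as -1, so they are excluded before computing the silhouette score.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)

# Three dense groups of hypothetical (lat, lon) points ...
centres = np.array([[40.70, -74.00], [40.80, -73.95], [40.75, -73.85]])
dense = np.vstack([c + rng.normal(scale=0.003, size=(60, 2)) for c in centres])
# ... plus three isolated points far from every group.
isolated = np.array([[40.60, -74.10], [40.90, -73.80], [40.55, -73.75]])
X = np.vstack([dense, isolated])

# eps is expressed in degrees here because the distance is plain Euclidean.
labels = DBSCAN(eps=0.01, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Score only the points that were assigned to a cluster.
mask = labels != -1
score = silhouette_score(X[mask], labels[mask])

print(f"clusters: {n_clusters}, noise points: {n_noise}, silhouette: {score:.3f}")
```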
Task 6: HDBSCAN
Gaining theoretical and practical knowledge of Hierarchical DBSCAN (HDBSCAN), which alleviates some constraints of classical DBSCAN.
- Calculating the best silhouette score for our data
- Plotting the data on the map for the density-based approach
Task 7: Addressing Outliers
Addressing the outliers identified by the density-based models.
- Using a k-nearest neighbours classifier and calculating the resulting silhouette score
- Comparing Hybrid and K-Means Approaches
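One way to build such a hybrid is to train a k-nearest neighbours classifier on the points a density-based model did cluster, and then use it to assign each outlier to a cluster. The sketch below assumes DBSCAN as the density-based model and `n_neighbors=1`; the data and all parameter values are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Hypothetical (lat, lon) data: three dense groups plus three isolated points.
centres = np.array([[40.70, -74.00], [40.80, -73.95], [40.75, -73.85]])
dense = np.vstack([c + rng.normal(scale=0.003, size=(60, 2)) for c in centres])
isolated = np.array([[40.60, -74.10], [40.90, -73.80], [40.55, -73.75]])
X = np.vstack([dense, isolated])

# Step 1: density-based clustering; outliers get the label -1.
labels = DBSCAN(eps=0.01, min_samples=5).fit_predict(X)
noise = labels == -1

# Step 2: train a classifier on the clustered points and label the outliers.
hybrid = labels.copy()
if noise.any() and (~noise).any():
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X[~noise], labels[~noise])
    hybrid[noise] = knn.predict(X[noise])

# Step 3: compare with k-means using the same number of clusters.
k = len(set(hybrid))
kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

hybrid_score = silhouette_score(X, hybrid)
kmeans_score = silhouette_score(X, kmeans_labels)

print(f"hybrid: {hybrid_score:.3f}, k-means: {kmeans_score:.3f}")
```

Because every point now has a cluster label, the hybrid score is computed over the full data set and is directly comparable with the k-means score.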
Outcome
After completing this project, I can thoroughly carry out the basic data manipulations required in any data processing field and present the results through various visual means.
I also gained a deeper insight into how various clustering algorithms differ from each other and how I can evaluate their strength on different kinds of data.
Lastly, this project provided good insight into how some real-world problems can be solved using these techniques.






