Sardhendu/Cluster-Analysis
{Python}: An efficient and robust way of clustering categorical datasets.
This code showcases a robust and efficient way of clustering categorical data.
- Order-Purchase:
  - crt_dataset_models: This module contains the core code, including building the similarity matrix for categorical attributes. It also selects good initial centroids instead of choosing them at random, and runs k-means efficiently for faster retrieval.
  - feature_selection: This module uses information gain to identify which features are more valuable. However, it plays no role in the clustering process itself and is used only for analysis.
  - main: This module creates and stores the dataset, builds and stores the model, and writes the output clusters to disk for further analysis.
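
As a rough illustration of what crt_dataset_models is described as doing, here is a minimal sketch of a similarity matrix for categorical rows together with a non-random way of picking initial centroids. The README does not say which similarity measure or seeding rule the repository uses, so simple matching (the fraction of attributes on which two rows agree) and the "most central row, then least similar row" seeding below are assumptions. The names `simple_matching_similarity`, `similarity_matrix`, and `pick_seeds` are hypothetical and do not come from the repository.

```python
from typing import Hashable, List, Sequence

Row = Sequence[Hashable]


def simple_matching_similarity(a: Row, b: Row) -> float:
    """Fraction of attribute positions where both rows hold the same category."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of attributes")
    if len(a) == 0:
        raise ValueError("rows must have at least one attribute")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(rows: Sequence[Row]) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise simple-matching similarities."""
    n = len(rows)
    sim = [[1.0] * n for _ in range(n)]  # every row is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            s = simple_matching_similarity(rows[i], rows[j])
            sim[i][j] = sim[j][i] = s
    return sim


def pick_seeds(sim: List[List[float]], k: int) -> List[int]:
    """Assumed seeding rule (not taken from the repository): start with the
    row that is most similar to all others, then repeatedly add the row
    whose highest similarity to the already chosen seeds is lowest."""
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    seeds = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(sim[i][s] for s in seeds)))
    return seeds


# Toy data: five rows with four categorical attributes each.
rows = [
    ("red", "small", "round", "matte"),
    ("red", "small", "round", "glossy"),
    ("red", "small", "square", "matte"),
    ("blue", "large", "square", "glossy"),
    ("blue", "large", "round", "glossy"),
]
sim = similarity_matrix(rows)
seeds = pick_seeds(sim, k=2)
print(seeds)
```

The seed indices can then serve as the starting centroids for a k-means style loop, so that repeated runs on the same data start from the same points.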
Please note that the code was used only for analysing the algorithm on different datasets taken from the UCI repository; it is not in its most optimized form.
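
For readers unfamiliar with the information gain idea mentioned for feature_selection, the following is a minimal sketch of how it can be computed for a categorical feature. It assumes the gain is measured against a known class label (as is available in many UCI datasets); the README does not state this, and the function names `entropy` and `information_gain` are hypothetical rather than taken from the repository.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their expected entropy after
    grouping the rows by the value of the feature."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy example: one feature that separates the classes and one that does not.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["x", "y", "x", "y"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

A feature with a higher gain tells more about the label, which is one way to rank features when analysing a dataset.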