hanfei1986/Impute-missing-data-with-KNNImputer-and-IterativeImputer

When a significant amount of data is missing, what can we do? Impute the missing data with the mean or median? In fact, scikit-learn provides two powerful imputers, KNNImputer and IterativeImputer, which can do this work effectively.

Impute-missing-data-with-KNNImputer-and-IterativeImputer

When a significant amount of data is missing, what can we do? Impute the missing data with the mean or median? That can be a disaster: every gap in a column receives the same value, which ignores the relationships between features and shrinks the column's variance. In fact, scikit-learn provides two more powerful imputers, KNNImputer and IterativeImputer. The former imputes each sample's missing values using the mean value from the n_neighbors nearest neighbors found in the training set. The latter, inspired by R's MICE package, imputes missing values by modeling each feature with missing values as a function of the other features in a round-robin fashion.
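A minimal sketch of both imputers on a made-up toy array with a single gap, assuming scikit-learn 0.22 or later (where KNNImputer was added); IterativeImputer is still experimental and has to be enabled through `sklearn.experimental` before it can be imported. With `n_neighbors=2`, KNNImputer fills the gap with the mean of the two nearest rows that have that column observed, which the manual check reproduces. IterativeImputer instead fits a regressor for the incomplete column on the other columns.

```python
import numpy as np
from sklearn.impute import KNNImputer

# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data: the last column is roughly twice the first one
X = np.array([
    [1.0, 2.0, 2.1],
    [2.0, 3.0, 3.9],
    [3.0, 4.0, np.nan],
    [4.0, 5.0, 8.2],
    [5.0, 6.0, 9.8],
])

# KNNImputer: fill each gap with the mean of that feature over the
# n_neighbors nearest rows (distances use only the non-missing coordinates)
knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X)

# Manual check: rows 1 and 3 are the two nearest donors of row 2,
# so the gap equals the mean of their last column
manual = X[[1, 3], 2].mean()

# IterativeImputer: regress each feature that has gaps on the other features,
# cycling over the features round-robin for up to max_iter rounds
it = IterativeImputer(max_iter=10, random_state=0)
X_it = it.fit_transform(X)

print(X_knn[2, 2], manual, X_it[2, 2])
```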

Before imputation, a significant number of "Cost" values, a few "Weight" values, and many "Ingredient Number" values are missing from the dataset.

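Counting the gaps per column takes one line in pandas. The DataFrame below is a hypothetical stand-in: only the column names "Cost", "Weight" and "Ingredient Number" come from this README; the values are invented for illustration, and the notebook's own data loading is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's dataset (values are made up)
df = pd.DataFrame({
    "Cost": [10.5, np.nan, 12.0, np.nan, 9.8, np.nan],
    "Weight": [1.2, 1.5, np.nan, 1.1, 1.3, 1.4],
    "Ingredient Number": [5, np.nan, 7, np.nan, np.nan, 6],
})

# Number and share of missing entries in each column
missing = df.isna().sum()
missing_share = df.isna().mean()
print(pd.DataFrame({"missing": missing, "share": missing_share.round(2)}))
```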

After imputation, all the columns are filled.

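A sketch of filling those columns with either imputer, using the same hypothetical DataFrame. Both imputers return NumPy arrays, so each result is wrapped back into a DataFrame with the original column names and index; `n_neighbors=2` and `random_state=0` are illustrative choices, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Hypothetical data with the README's column names (values are made up)
df = pd.DataFrame({
    "Cost": [10.5, np.nan, 12.0, np.nan, 9.8, np.nan],
    "Weight": [1.2, 1.5, np.nan, 1.1, 1.3, 1.4],
    "Ingredient Number": [5, np.nan, 7, np.nan, np.nan, 6],
})

# Impute, then restore the DataFrame structure around the returned arrays
knn_df = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df),
    columns=df.columns, index=df.index,
)
iter_df = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(df),
    columns=df.columns, index=df.index,
)

# Every column should now be fully populated; observed values are left unchanged
print(knn_df.isna().sum().sum(), iter_df.isna().sum().sum())
```

Because KNNImputer relies on distances, a column with a large numeric range (such as "Cost" here) dominates the neighbor search. Standardizing the features first, for example with StandardScaler (which leaves NaNs in place), is a common precaution.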

Let's have a look at the result of the imputation. Amazing!

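The README judges the effect visually. One way to put a number on it, sketched here as an assumption rather than the notebook's method, is to hide entries from a complete dataset, impute them, and measure the error against the hidden true values, with mean imputation (SimpleImputer) as the baseline. The synthetic data and the helper `rmse_on_masked` are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

rng = np.random.default_rng(0)

# Hypothetical complete data whose columns are strongly related
n = 300
x = rng.normal(size=n)
X_true = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=n),
    -x + rng.normal(scale=0.1, size=n),
])

# Hide about 20% of the entries at random, keeping the true values for scoring
mask = rng.random(X_true.shape) < 0.2
X_missing = X_true.copy()
X_missing[mask] = np.nan


def rmse_on_masked(imputer):
    """Hypothetical helper: RMSE of the imputed values on the hidden entries."""
    X_filled = imputer.fit_transform(X_missing)
    return np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2))


scores = {
    "mean": rmse_on_masked(SimpleImputer(strategy="mean")),
    "knn": rmse_on_masked(KNNImputer(n_neighbors=5)),
    "iterative": rmse_on_masked(IterativeImputer(random_state=0)),
}
print(scores)
```

On data with related columns like these, both model-based imputers should recover the hidden values far better than the column mean; on data with independent columns the advantage would largely disappear.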

Languages: Jupyter Notebook 100.0%


Created August 10, 2023
Updated October 31, 2023