
shreeyajoshi2013/Prediction-of-Electricity-Consumption

Prediction of electricity consumption in household units using a Random Forest Regressor

Prediction of Electricity Consumption

The goal of this project is to build a model that predicts electricity consumption, stored in the KWH field of the dataset.
The dataset contains information on energy costs and usage for heating, cooling, appliances and other end uses, collected from a sample of housing units.

The dataset was taken from this link.

Dataset size: approx. 12,000 rows and approx. 940 columns.

Built with

  • Google Colab

Highlights

  • Random Forest Regressor

Libraries used

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn

What is being done?

  1. Data understanding
    • Data exploration
  2. Data preparation
    • One-Hot Encoding the categorical columns
    • Handling NaN values
    • Removing unnecessary columns
    • Assumptions and considerations:
      • Columns starting with 'Z' are imputation flags for other variables. They are removed because they do not contribute to the prediction.
      • Columns measured in thermal units other than KWH are assumed to be unhelpful and are removed.
      • Columns that give the total electricity consumption across several end uses are redundant, because the individual contributions of those end uses are already present in the data. They are removed to avoid redundancy.
  3. Data Analysis
    • Computing the correlation of each feature with the output variable (KWH) and visualizing the results
  4. Random Forest Regressor
    • Using GridSearchCV to select the optimal hyperparameters for the model
    • Selecting the most important features using the model's feature importances
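The column-removal, NaN-handling and one-hot-encoding steps above could be sketched with pandas roughly as follows. This is a minimal illustration under assumptions, not the notebook's code: the helper `prepare_features`, the toy column names (`ROOMS`, `ZROOMS`, `BTU_HEAT`, `REGION`) and the median imputation are all hypothetical, since the README does not say how NaN values were filled or list the exact columns that were dropped.

```python
import numpy as np
import pandas as pd


def prepare_features(df, target="KWH", flag_prefix="Z", drop_columns=()):
    """Hypothetical cleaning helper mirroring the README's preparation steps."""
    out = df.copy()
    # 1. Drop imputation-flag columns (names starting with 'Z') plus any
    #    explicitly listed columns, e.g. non-KWH units or redundant totals.
    to_drop = [c for c in out.columns if c.startswith(flag_prefix)]
    to_drop += [c for c in drop_columns if c in out.columns and c not in to_drop]
    out = out.drop(columns=to_drop)
    # 2. Rows without a target value cannot be used for training.
    out = out.dropna(subset=[target])
    # 3. Assumption: fill remaining numeric NaNs with each column's median.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # 4. One-hot encode every non-numeric (categorical) column.
    cat_cols = [c for c in out.columns if c not in num_cols]
    return pd.get_dummies(out, columns=cat_cols)


# Toy frame with hypothetical column names, not taken from the real dataset.
raw = pd.DataFrame({
    "KWH": [5000.0, 7200.0, np.nan, 6100.0],
    "ZROOMS": [0, 1, 0, 0],            # imputation flag for ROOMS
    "BTU_HEAT": [1.0, 2.0, 3.0, 4.0],  # stands in for a non-KWH unit column
    "ROOMS": [5, np.nan, 7, 6],
    "REGION": ["NE", "S", "S", "W"],
})
clean = prepare_features(raw, drop_columns=["BTU_HEAT"])
print(clean)
```

Median imputation is only one option; dropping sparse columns entirely or using a model-based imputer would fit the same step.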
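The correlation analysis in step 3 could look roughly like this, assuming Pearson correlation (the pandas default) computed over numeric columns only. The helper `target_correlations` and the synthetic `ROOMS` and `NOISE` columns are hypothetical and exist only to show how features can be ranked against KWH.

```python
import numpy as np
import pandas as pd


def target_correlations(df, target="KWH"):
    """Pearson correlation of each numeric feature with the target,
    ordered by absolute strength (strongest first)."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic data: KWH depends on ROOMS; NOISE is unrelated to KWH.
rng = np.random.default_rng(0)
n = 200
rooms = rng.integers(2, 10, size=n)
demo = pd.DataFrame({
    "ROOMS": rooms,
    "NOISE": rng.normal(size=n),
    "KWH": 800 * rooms + rng.normal(scale=500, size=n),
})
corr = target_correlations(demo)
print(corr)
```

With matplotlib installed, `corr.plot.barh()` draws this ranking as a horizontal bar chart; a seaborn heatmap of `df.corr()` is another common way to visualize it.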
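Hyperparameter tuning with GridSearchCV, as listed in step 4, might be set up as in this sketch, which scores candidates by R². The parameter grid, the 3-fold cross-validation and the synthetic data are assumptions for illustration; the README does not document the grid the project actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the cleaned feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search space; every combination is cross-validated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="r2",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out split.
test_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Because `refit=True` is the default, `search.best_estimator_` is retrained on the whole training split and can be used directly for prediction and for reading `feature_importances_`.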

Conclusion

After several steps of data cleaning, processing and feature engineering, about 14 features in the dataset were found to contribute the most to electricity consumption.
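One way to narrow a wide feature set to its most important columns is to rank a fitted forest's impurity-based `feature_importances_` and keep the top k. The sketch below is hypothetical: the helper `top_k_features`, the synthetic columns `F0` to `F5` and `k=2` are illustrative only, and a value such as `k=14` would match the count reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def top_k_features(model, feature_names, k):
    """Return the k features with the highest impurity-based importance."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Synthetic data: only F2 and F5 influence the target.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"F{i}" for i in range(6)])
y = 4 * X["F2"] + 2 * X["F5"] + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top = top_k_features(model, X.columns, k=2)
print(top)

# Keep only the selected columns for a smaller, retrained model.
X_reduced = X[top.index]
```

Impurity-based importances can favour high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.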

The Random Forest Regressor gives a fair prediction of consumption in kilowatt-hours (KWH), with an R² score of 0.875. Further data exploration and manipulation could improve the predictions.
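For reference, the R² score compares the model's squared errors with the variance of KWH around its mean: R² = 1 - SS_res / SS_tot. An R² of 0.875 therefore means the residual sum of squares is 12.5% of the total sum of squares on the data the model was evaluated on. The sketch below uses made-up values to compute it by hand and check it against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up KWH values, not results from the project.
y_true = [4000, 5500, 7000, 8500]
y_pred = [4200, 5300, 7100, 8200]
print(r2(y_true, y_pred), r2_score(y_true, y_pred))
```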

Further Tasks

Other models, such as neural networks, could be tried for the prediction.

The features could be examined in more depth through further EDA and with libraries such as FeatureSelector, to improve the model and refine the feature importance analysis.


Languages

Jupyter Notebook 100.0%


Created August 13, 2022
Updated May 31, 2025