Abrar2652/IEEE-CIS-Fraud-Detection-Project
This is the first project completed in the Upskill ISA program with Intelligent Machines. The project was done after the competition had ended. The XGBClassifier used in this model obtained a public score of 0.950844 on Kaggle.
IEEE-CIS-Fraud-Detection-Project
Description
This project is part of the Machine Learning course provided by the Upskill Income Sharing Agreement program with Intelligent Machines.
The dataset of an already-ended competition was selected for this project, and different machine learning models were benchmarked on it. The data contains real-world e-commerce transactions from Vesta, with a wide range of features from device type to product features. Competitors were asked to develop a machine learning model that predicts whether a transaction is fraudulent or not. The project aims to improve the efficacy of fraudulent transaction alerts for millions of people around the world, helping hundreds of thousands of businesses reduce their fraud loss and increase their revenue.
Important Links
Getting Started
The main challenge of this project is the very large number of features. It is difficult to remove unnecessary features when we don't know which factors to consider while choosing them. Training the machine learning models on all of these features wastes a lot of time and will not necessarily give a better score. The starting point should be data exploration, data cleaning, handling null values, and feature engineering.
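As a rough illustration of the cleaning step described above, the sketch below drops columns that are almost entirely null and fills the remaining gaps. It is not the notebook's actual code: the helper names `drop_sparse_columns` and `fill_missing`, the 0.9 threshold, the `-999` sentinel, and the toy data are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.9):
    """Keep only columns whose share of null values is at most the threshold."""
    null_fraction = df.isna().mean()  # per-column fraction of missing values
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return df[keep]


def fill_missing(df, numeric_sentinel=-999, categorical_label="missing"):
    """Fill nulls with a sentinel for numeric columns and a label for the rest."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(numeric_sentinel)
        else:
            out[col] = out[col].fillna(categorical_label)
    return out


# Toy frame standing in for the real transaction table (values are made up).
toy = pd.DataFrame({
    "TransactionAmt": [10.0, 25.5, np.nan, 99.0],
    "DeviceType": ["mobile", None, "desktop", "mobile"],
    "mostly_empty": [np.nan, np.nan, np.nan, np.nan],
})

cleaned = fill_missing(drop_sparse_columns(toy, max_null_fraction=0.9))
print(cleaned)
```

The threshold and fill values are tuning choices; tree-based models such as XGBoost can also handle missing values natively, so filling may not always be needed.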
Dependencies
Programming language: Python
Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, XGBoost (XGBClassifier)
Environment: Kaggle Notebook
Executing program
- Go to nbviewer to view the Jupyter notebook if it fails to open on GitHub
- Copy and paste the URL of the .ipynb file into the input box on nbviewer: https://github.com/Abrar2652/IEEE-CIS-Fraud-Detection-Project/blob/main/ieee-cis-fraud-detection.ipynb
Help
If you face difficulties running the model on your local machine or in a Google Colab notebook, check whether the kernel is running on CPU or GPU. If it is running on CPU, change the runtime to GPU. I ran this notebook on a machine with 4 GB RAM and a 2.4 GHz Intel(R) Core(TM) i3 CPU and faced many difficulties, including sudden shutdowns due to overheating and running out of resources. The Kaggle environment worked well for me.
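If memory is the bottleneck on a small machine, one common workaround is to downcast numeric columns to smaller dtypes before training. The sketch below shows the idea with `pandas.to_numeric`; the helper name `downcast_numeric` and the toy data are assumptions for illustration, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy with numeric columns stored in the smallest fitting dtype."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        else:
            # Floats go down to float32 at most, which reduces precision.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy frame with default 64-bit dtypes (values are made up).
toy = pd.DataFrame({
    "count": np.array([1, 2, 3, 4], dtype=np.int64),
    "amount": np.array([10.5, 20.25, 30.0, 40.75], dtype=np.float64),
})

small = downcast_numeric(toy)
print(toy.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting floats trades precision for memory, so it is worth checking that model scores do not change noticeably afterwards.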
Authors
Md. Abrar Jahin
License
This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details
Acknowledgments
StackOverflow, Towards Data Science articles, Data Exploration and Feature Engineering Techniques of Kaggle Grandmasters, DataCamp