lmo272/instacart-product-recommender
This repository contains the prototype of a product recommender based on data from online grocer Instacart. It was created as a group project for the Machine Learning Course for MSc Business Analytics at Nova School of Business and Economics.
Instacart Product Recommender
ℹ️ General Information
The project was part of the 2487-S2 Machine Learning course for the MSc in Business Analytics taught at Nova School of Business and Economics. The topic and scope of the project could be freely chosen by the students, based on given datasets.
👨‍💻 Group members
- Frederik Søegaard - 44898
- Lennart Max Oser - 44379
- Niclas Frederic Sturm - 45914
💡 About the project
This repository contains the prototype of a product recommender based on data from online grocer Instacart.
The goal was first to identify a business problem faced by e-commerce companies such as Instacart, second to explore the available data to understand what we could work with, and finally to prototype a product recommendation engine based on the products in a user's basket. In addition to the Jupyter notebooks, we created a command-line interface (CLI) for experimenting with the recommendation engine. We also created an API to demonstrate how such an engine could be used as a microservice within a company such as Instacart.
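To make the idea of a basket-based recommender concrete, here is a minimal sketch, not the implementation from the notebooks. It assumes each product already has an embedding vector (for example, one learned by an Item2Vec-style model), averages the vectors of the products in the basket, and ranks all other products by cosine similarity to that average. The function names and the toy two-dimensional embeddings are hypothetical and invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(basket, embeddings, top_n=3):
    """Rank products outside the basket by similarity to the basket's mean vector."""
    known = [embeddings[p] for p in basket if p in embeddings]
    if not known:
        return []  # nothing in the basket has an embedding
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    scores = {
        product: cosine(centroid, vec)
        for product, vec in embeddings.items()
        if product not in basket
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical 2-d embeddings; real vectors would come from a trained model.
toy_embeddings = {
    "banana": [0.9, 0.1],
    "strawberries": [0.8, 0.2],
    "avocado": [0.7, 0.3],
    "dish soap": [0.1, 0.9],
    "sponges": [0.2, 0.8],
}

print(recommend(["banana", "strawberries"], toy_embeddings, top_n=2))
```

With these toy vectors, a basket of fruit ranks `avocado` first because its vector points in nearly the same direction as the basket average. Averaging is only one way to summarise a basket; the notebooks may combine product vectors differently.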
Files overview
We divided the project into a total of 6 parts, numbered from 0 to 5. Additionally, there is a data folder, which has to be created by following the instructions below. Here is an overview of the structure:
├── 0_Introduction # containing the business-to-ML problem part
│   └── 0_Introduction.ipynb
├── 1_Exploratory_Data_Analysis # classical EDA based on the six available data sets
│   └── 1_exploratory_data_analysis.ipynb
├── 2_Clustering # containing the feature engineering, a PCA and the actual clustering algorithm
│   └── 2_clustering.ipynb
├── 3_Item2Vec # containing the Item2Vec algorithm and the testing of the recommender engine
│   ├── 3_0_Item2Vec.ipynb
│   └── 3_1_Recommendation_Testing.ipynb
├── 4_Command_Line_Interface # containing the Python file for CLI handling
│   ├── CLI_Specification.md
│   └── recommend_me_something.py
├── 5_Recommender_API # containing the API
│   ├── API_Specification.md
│   ├── engine
│   │   └── recommender_engine.py
│   └── recommender_api.py
├── data # data folder with all the required data files
│   ├── aisles.csv
│   ├── departments.csv
│   ├── order_products__prior.csv
│   ├── order_products__train.csv
│   ├── orders.csv
│   ├── products.csv
│   └── sample_submission.csv
├── environment.yml
└── README.md

💻 Usage
To run the code in the same environment as we did, create a virtual environment by running `conda env create -f environment.yml`.
After doing so, you should be able to select the new environment, called `instacart`, in your preferred IDE.
To download the data, run the following steps:
- In your CLI, run `mkdir data` or manually create a folder called `data`
- Run `cd data` in your CLI to get into the right directory
- Now run the following command to download the data: `kaggle competitions download -c instacart-market-basket-analysis`. If you prefer, you can also download the data manually
- Extract the zip files using the CLI or whatever method you prefer
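If you would rather do the extraction step from Python, the following is a minimal sketch using only the standard library. It assumes all archives sit directly in the `data` folder and that the download may contain further zip files inside it, so it keeps extracting until no new archive appears. The helper names `extract_all` and `missing_files` are hypothetical and not part of this repository; the expected file names are taken from the files overview above.

```python
import zipfile
from pathlib import Path

# File names as listed in the files overview of this README.
EXPECTED_FILES = [
    "aisles.csv",
    "departments.csv",
    "order_products__prior.csv",
    "order_products__train.csv",
    "orders.csv",
    "products.csv",
    "sample_submission.csv",
]


def extract_all(data_dir):
    """Extract every zip in data_dir, including zips unpacked along the way."""
    data_dir = Path(data_dir)
    done = set()
    while True:
        pending = [z for z in sorted(data_dir.glob("*.zip")) if z not in done]
        if not pending:
            break
        for archive in pending:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(data_dir)
            done.add(archive)


def missing_files(data_dir):
    """Return the expected CSV files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root, i.e. the folder that contains `data`.
    extract_all("data")
    print("Missing files:", missing_files("data") or "none")
```

An empty result from `missing_files` suggests the `data` folder matches the layout the notebooks expect.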