Niraj-Khatri/Video_Game_Reviews

This project demonstrates an ETL process for big data. Raw Amazon video game review data was downloaded from a public site, loaded into an AWS-hosted database, and queried with PySpark and SQL to find out whether Amazon Vine reviews influenced customer feedback.

PySpark-AWS Project

The goal of this project was to extract Amazon product review data, clean it, and load it into a Postgres database hosted on AWS RDS. Afterwards, I analyzed the data to determine whether certified Amazon Vine reviewers provided more helpful reviews than non-Vine reviewers.

ETL

I extracted Amazon video game review data from the following site:
https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Video_Games_v1_00.tsv.gz
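
A minimal sketch of this extraction step, assuming the file has already been downloaded to a path Spark can read and that `spark` is an existing `SparkSession`. The helper name `read_reviews` and the use of `inferSchema=True` are illustrative assumptions, not taken from the project's notebook.

```python
# Source file named in this README.
REVIEWS_URL = (
    "https://s3.amazonaws.com/amazon-reviews-pds/tsv/"
    "amazon_reviews_us_Video_Games_v1_00.tsv.gz"
)


def read_reviews(spark, path):
    """Read the tab-separated review file at `path` into a Spark DataFrame.

    `spark` is an existing SparkSession. Spark decompresses .gz files
    transparently, so the downloaded file can be passed as-is.
    """
    return spark.read.csv(path, sep="\t", header=True, inferSchema=True)


# Hypothetical usage (requires PySpark and a local copy of the file):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("video_game_reviews").getOrCreate()
# reviews_df = read_reviews(spark, "amazon_reviews_us_Video_Games_v1_00.tsv.gz")
```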

I cleaned the data and created four tables for later analysis: customers, products, reviews, and vines.
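
The exact cleaning rules and table schemas are not listed here, so the following is only a sketch of how one raw DataFrame might be split into the four tables. The column names come from the public Amazon review TSV layout. The cleaning rule (drop rows with nulls and duplicate `review_id` values), the columns chosen for each table, the `review_count` column, and the helper name `split_tables` are all assumptions.

```python
def split_tables(reviews_df):
    """Split the raw reviews DataFrame into four narrower DataFrames.

    Assumed cleaning rule: drop rows containing nulls, then keep one row
    per review_id. The project's actual rules may differ.
    """
    clean = reviews_df.dropna().dropDuplicates(["review_id"])

    # One row per customer with the number of reviews they wrote.
    customers = (
        clean.groupBy("customer_id")
        .count()
        .withColumnRenamed("count", "review_count")
    )

    # One row per product.
    products = clean.select("product_id", "product_title").dropDuplicates(["product_id"])

    # Review-level keys and dates.
    reviews = clean.select("review_id", "customer_id", "product_id", "review_date")

    # Rating and vote columns used in the Vine analysis.
    vines = clean.select("review_id", "star_rating", "helpful_votes", "total_votes", "vine")

    return {
        "customers": customers,
        "products": products,
        "reviews": reviews,
        "vines": vines,
    }
```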

I created an AWS RDS instance and used an SQL script to create the four tables in Postgres.

With PySpark, I loaded the four tables into Postgres.
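
A sketch of this load step using Spark's JDBC writer. The endpoint, database name, credentials, and both helper names are placeholders, and the PostgreSQL JDBC driver has to be on Spark's classpath for the write to succeed.

```python
def jdbc_url(host, database, port=5432):
    """Build a PostgreSQL JDBC URL for an RDS endpoint."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def load_table(df, table, url, user, password, mode="append"):
    """Write a Spark DataFrame to an existing Postgres table over JDBC."""
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    df.write.jdbc(url=url, table=table, mode=mode, properties=properties)


# Hypothetical usage; replace the placeholders with real RDS values.
# url = jdbc_url("<rds-endpoint>", "<database>")
# load_table(vines_df, "vines", url, "<user>", "<password>")
```

Using `mode="append"` keeps the tables created by the SQL script intact instead of letting Spark recreate them.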

Data Analysis

I wanted to analyze the Amazon video game data to determine whether Amazon Vine reviewers provided more helpful reviews than non-Vine reviewers.


First, using the vine table, I kept only reviews with at least 20 total votes, of which at least 50% were helpful votes.
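
This filter can be written as a single SQL condition. The sketch below assumes the vine table has integer `helpful_votes` and `total_votes` columns (the names used in the public Amazon review data) and that `vines_df` is a Spark DataFrame; the constant and helper names are illustrative. Comparing `helpful_votes * 2` against `total_votes` expresses "at least 50% helpful" without division, which sidesteps integer-division differences between SQL engines.

```python
# Keep reviews with at least 20 total votes, of which at least half are helpful.
HELPFUL_CONDITION = "total_votes >= 20 AND helpful_votes * 2 >= total_votes"


def filter_helpful(vines_df):
    """Return only the rows of `vines_df` that satisfy HELPFUL_CONDITION."""
    # DataFrame.filter accepts a SQL expression string.
    return vines_df.filter(HELPFUL_CONDITION)
```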

Next, I counted the Vine reviews and non-Vine reviews in the filtered data set.

Finally, I wanted to look at top-rated products (5 stars). I narrowed the filtered data set to 5-star reviews and calculated the percentage of 5-star reviews among Vine and non-Vine reviews.
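
One way to get the per-group counts and the 5-star share in a single pass is a Spark SQL aggregation. The sketch assumes `filtered_df` is the filtered Spark DataFrame with a `vine` column (`Y` or `N`, as in the public Amazon review data) and an integer `star_rating` column. The view name `filtered_vines` and the helper name are made up for illustration.

```python
FIVE_STAR_QUERY = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews,
       100.0 * SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) / COUNT(*) AS five_star_pct
FROM filtered_vines
GROUP BY vine
ORDER BY vine
"""


def five_star_summary(spark, filtered_df):
    """Return one row per `vine` value with its total, 5-star count, and 5-star %."""
    # Register the DataFrame under the name the query expects.
    filtered_df.createOrReplaceTempView("filtered_vines")
    return spark.sql(FIVE_STAR_QUERY)
```

Multiplying by `100.0` before dividing forces a non-integer result, so the percentage is not truncated.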

Conclusion: Half of the Vine reviews that were found helpful gave the product 5 stars. This is 10% more than among non-Vine reviews. It may suggest that 5-star reviews are found more helpful overall, since they give the reader confidence to buy the product.

Languages

Jupyter Notebook: 100.0%

Created August 20, 2021
Updated October 5, 2024