ViinayKumaarMamidi/Airflow_GCP_GCS_to_BQ_Looker_Data_Engineering_Project
This repo describes how to run Airflow on a GCP VM instance and build an end-to-end data engineering project using multiple GCP services.
In this project, I used multiple GCP services to perform ELT on a Global Health data CSV file. I loaded the file into a GCS bucket, then used Airflow deployed on a VM instance to load the CSV file into a staging dataset table.
I then transformed the data by splitting the raw data into multiple transformed tables, one per country.
I created views that rename columns and filter the data as required, with one view for each country table.
Using Looker, I connected to the BigQuery views, built an India health data report, enabled publishing and email notifications, and received the report by email in PDF format.
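The split-by-country and view steps described above can be sketched as SQL generated from Python. This is only an illustration, not the SQL from the linked DAG script: the project, dataset, table, and column names (`my-project`, `staging_dataset`, `country`, and so on) are assumptions made up for this example.

```python
from typing import Dict, Optional

# All identifiers below are hypothetical placeholders, not the project's real names.
PROJECT = "my-project"
STAGING_TABLE = f"{PROJECT}.staging_dataset.global_health_data"
TRANSFORM_DATASET = f"{PROJECT}.transform_dataset"
REPORTING_DATASET = f"{PROJECT}.reporting_dataset"


def table_suffix(country: str) -> str:
    """Turn a country name into a lowercase, underscore-separated identifier."""
    return country.strip().lower().replace(" ", "_")


def country_table_sql(country: str) -> str:
    """Build SQL that copies one country's rows from staging into its own table.

    The country value is interpolated directly, so this is only suitable for
    a trusted, hard-coded list of countries.
    """
    name = table_suffix(country)
    return (
        f"CREATE OR REPLACE TABLE `{TRANSFORM_DATASET}.{name}_table` AS\n"
        f"SELECT * FROM `{STAGING_TABLE}`\n"
        f"WHERE country = '{country}'"
    )


def country_view_sql(
    country: str, columns: Dict[str, str], where: Optional[str] = None
) -> str:
    """Build SQL for a view that renames columns and optionally filters rows.

    `columns` maps source column names to the aliases shown in the report.
    """
    name = table_suffix(country)
    select_list = ",\n".join(
        f"  `{src}` AS `{alias}`" for src, alias in columns.items()
    )
    sql = (
        f"CREATE OR REPLACE VIEW `{REPORTING_DATASET}.{name}_view` AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM `{TRANSFORM_DATASET}.{name}_table`"
    )
    if where:
        sql += f"\nWHERE {where}"
    return sql


if __name__ == "__main__":
    print(country_table_sql("India"))
    print(country_view_sql("India", {"year": "Year", "disease_name": "Disease"}))
```

Generating the statements from a list of countries keeps one template per step, so adding a country means adding one list entry rather than copying a task.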
Resource: Vishal Bulbule's resources were used to understand and build this end-to-end data engineering project. Thanks, Vishal.
GCP Services used:
- GCS: Google Cloud Storage
- BQ: BigQuery
- VM Instance: To install Airflow and perform ELT
- Looker: For Reporting and Scheduling
GCS Bucket Details
BigQuery Datasets and Tables Information
Complete Airflow GCS to BQ Tables and View DAG:
Python Script URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/Airflow_GCS_to_BQ_Tranformation_DAG_Script.py
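For readers who want a feel for the first step of such a DAG before opening the script, here is a minimal sketch of a BigQuery load-job configuration for the GCS-to-staging load. It is not taken from the linked script, which may use a different operator or different settings. The bucket, object, project, dataset, and table names are hypothetical; the dictionary keys follow the BigQuery jobs API load configuration.

```python
from typing import Any, Dict


def gcs_to_staging_load_config(
    bucket: str, object_name: str, project: str, dataset: str, table: str
) -> Dict[str, Any]:
    """Build a BigQuery load-job configuration for one CSV file in GCS."""
    return {
        "load": {
            # Source file in the GCS bucket.
            "sourceUris": [f"gs://{bucket}/{object_name}"],
            # Staging table that receives the raw rows.
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": table,
            },
            "sourceFormat": "CSV",
            # Assumes the CSV has a single header row.
            "skipLeadingRows": 1,
            # Let BigQuery infer the schema; an explicit schema is safer in production.
            "autodetect": True,
            # Replace the staging table contents on every run.
            "writeDisposition": "WRITE_TRUNCATE",
        }
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    config = gcs_to_staging_load_config(
        bucket="my-health-data-bucket",
        object_name="global_health_data.csv",
        project="my-project",
        dataset="staging_dataset",
        table="global_health_data",
    )
    print(config["load"]["sourceUris"][0])
```

In an Airflow DAG, a dictionary of this shape can be passed as the `configuration` argument of `BigQueryInsertJobOperator` from the Google provider package. Using `WRITE_TRUNCATE` makes reruns idempotent, at the cost of reloading the full file each time.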
Looker Report
Looker PDF Report URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/India_Health_Data_Report.pdf
Looker Report Subscriptions/Scheduling
The report is scheduled to be sent daily at 4 PM EST.
