ViinayKumaarMamidi/Airflow_GCP_GCS_to_BQ_Looker_Data_Engineering_Project

This repo contains details about running Airflow on a GCP VM instance and building an end-to-end data engineering project using multiple GCP services.

In this project, I used multiple GCP services to perform ELT on a Global Health data CSV file. I loaded the file into a GCS bucket, then used Airflow deployed on a VM instance to load the CSV file into a table in the staging dataset.
I then transformed the data by splitting the raw data into multiple transformed tables, one per country.
I created a view for each country table, renaming columns and filtering the data as per the requirements.
Using Looker, I connected to the BigQuery views, built an India health data report, and enabled publishing and email notifications. I received the report by email in PDF format.
Resource: Vishal Bulbule's resources were used to understand and build this end-to-end data engineering project. Thanks, Vishal.

GCP Services used:

  1. GCS: Google Cloud Storage
  2. BQ: BigQuery
  3. VM Instance: to install Airflow and perform ELT
  4. Looker: for reporting and scheduling

Data Flow Architecture

GCS Bucket Details

Source_GCS_Bucket_1_Million_Records_CSV_File

BigQuery Datasets and Tables Information

BigQuery_Tables_Views_Information
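
The per-country tables and views described above can be sketched as generated SQL. This is a minimal illustration, not the statements from the actual DAG script: the project, dataset, table, and column names (`my-project`, `staging_dataset`, `transform_dataset`, `reporting_dataset`, `global_data`, `country`, `Year`, `Disease_Name`) and the `_table` / `_view` naming are all assumptions made for the example.

```python
def country_table_sql(project, staging_dataset, transform_dataset, source_table, country):
    """Build a statement that copies one country's rows into its own table.

    Assumes the staging table has a `country` column and that `country`
    comes from a fixed, trusted list (it is interpolated directly).
    """
    table_name = country.lower().replace(" ", "_") + "_table"
    return (
        f"CREATE OR REPLACE TABLE `{project}.{transform_dataset}.{table_name}` AS\n"
        f"SELECT *\n"
        f"FROM `{project}.{staging_dataset}.{source_table}`\n"
        f"WHERE country = '{country}'"
    )


def country_view_sql(project, transform_dataset, reporting_dataset, country,
                     renames, condition=None):
    """Build a view over a country table that renames columns and filters rows.

    `renames` maps existing column names to the names exposed by the view.
    `condition` is an optional SQL predicate for the WHERE clause.
    """
    base = country.lower().replace(" ", "_")
    columns = ",\n  ".join(f"`{old}` AS `{new}`" for old, new in renames.items())
    sql = (
        f"CREATE OR REPLACE VIEW `{project}.{reporting_dataset}.{base}_view` AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{project}.{transform_dataset}.{base}_table`"
    )
    if condition:
        sql += f"\nWHERE {condition}"
    return sql


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the output.
    print(country_table_sql("my-project", "staging_dataset",
                            "transform_dataset", "global_data", "India"))
    print(country_view_sql("my-project", "transform_dataset", "reporting_dataset",
                           "India", {"Year": "year", "Disease_Name": "disease_name"},
                           condition="Year >= 2020"))
```

Each generated string could then be passed to a BigQuery query task, so the same two functions cover every country in the list.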

Complete Airflow GCS to BQ Tables and View DAG:

Python Script URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/Airflow_GCS_to_BQ_Tranformation_DAG_Script.py

Final_GCS_Bucket_To_BigQuery_Tables_to_BigQuery_Views_Airflow_DAG_Flow
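
The flow above (GCS file, then staging table, then per-country tables, then per-country views) can be sketched without Airflow installed. The example below is an illustration under assumptions, not the linked DAG script: the bucket, object, table, task ids, and country list are hypothetical. The keyword names in `staging_load_kwargs` follow the `GCSToBigQueryOperator` from the Airflow Google provider, and the ordering uses the standard library's `graphlib` to mimic how Airflow resolves task dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical values; replace with the real bucket, object, and table names.
BUCKET = "my-health-data-bucket"
CSV_OBJECT = "global_health_data.csv"
STAGING_TABLE = "my-project.staging_dataset.global_data"
COUNTRIES = ["India", "USA", "Japan"]


def staging_load_kwargs(bucket, object_name, destination):
    """Keyword arguments for loading a CSV with a header row into staging.

    In a DAG file these would be passed as:
        from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
            GCSToBigQueryOperator,
        )
        load = GCSToBigQueryOperator(**staging_load_kwargs(...))
    """
    return {
        "task_id": "load_csv_to_staging",
        "bucket": bucket,
        "source_objects": [object_name],
        "destination_project_dataset_table": destination,
        "source_format": "CSV",
        "skip_leading_rows": 1,                 # skip the header row
        "write_disposition": "WRITE_TRUNCATE",  # replace the table on each run
        "autodetect": True,                     # let BigQuery infer the schema
    }


def build_dependencies(countries):
    """Map each task id to the set of task ids it must wait for."""
    deps = {"load_csv_to_staging": set()}
    for country in countries:
        key = country.lower()
        deps[f"create_table_{key}"] = {"load_csv_to_staging"}
        deps[f"create_view_{key}"] = {f"create_table_{key}"}
    return deps


def execution_order(deps):
    """Return one valid run order in which every task follows its upstreams."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(staging_load_kwargs(BUCKET, CSV_OBJECT, STAGING_TABLE))
    for task_id in execution_order(build_dependencies(COUNTRIES)):
        print(task_id)
```

Generating the table and view tasks in a loop keeps the DAG the same shape when a country is added or removed: only the list changes.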

Looker Report

Looker_Reporting_Web_UI

Looker PDF Report URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/India_Health_Data_Report.pdf

Looker Report Subscriptions/Scheduling
I scheduled the report to be emailed daily at 4 PM EST.

Looker_Report_Email_Attachment

Languages

Python 100.0%

Created April 20, 2025
Updated May 1, 2025