
Ratnesh-181998/GCP-Data-Engineering

🚀 Master GCP Data Engineering! Covers GCS, BigQuery, Dataproc, Dataflow & Airflow. Build 6+ industrial projects, including Flight Booking pipelines, real-time Uber alerts and Fraud Detection, using PySpark, Medallion Architecture & CI/CD. 🛠️ Tech: Python, SQL, Spark, Beam & Streaming.

🚀 GCP Services for Data Engineering with Projects

Welcome to the comprehensive repository for GCP Services for Data Engineering with Projects. This course takes you from beginner to advanced level in Google Cloud Platform (GCP) data engineering, covering everything from fundamental services to complex, industrial-scale data pipelines.


🛠️ Tech Stack & Tools

🌩️ Cloud Platform & Core Services

Google Cloud
Google Cloud Storage
IAM
GCP Pub/Sub

📊 Big Data & Analytics

BigQuery
Dataproc
Dataflow
BigTable
Looker Studio

⚙️ Processing & Orchestration

Apache Spark
Apache Beam
Apache Airflow
Apache Iceberg
Apache Hive

💻 Infrastructure & Serverless

Cloud Run Functions
Compute Engine
Terraform
Docker

🐍 Languages & Libraries

Python
SQL
Pandas
PyTest

🚀 CI/CD & Others

GitHub Actions
Cloud Build
Secret Manager

📝 Detailed Tech Stack Breakdown

| Category | Technologies |
|---|---|
| ☁️ Cloud Platform | Google Cloud Platform (GCS, IAM, Pub/Sub, Secret Manager) |
| 🗄️ Data Warehouse & NoSQL | BigQuery (SQL), BigTable, Cloud SQL (MySQL) |
| ⚙️ Processing Frameworks | Apache Spark (PySpark), Apache Beam, Apache Iceberg, Hive |
| 📈 Compute & Analytics | Dataproc (Serverless), Dataflow, Looker Studio |
| 🔄 Orchestration | Apache Airflow (GCP Composer), GCP Workflows, Cloud Scheduler |
| 🛠️ Serverless & DevOps | Cloud Run Functions, Docker, Terraform, Cloud Build, GitHub Actions |
| 🐍 Languages & Libraries | Python, SQL, Pandas, PyTest |

📚 Modules Breakdown (Live Content Coming Soon)

🔹 Module 1: GCP Foundations & Serverless

  • Fundamentals: GCS (Google Cloud Storage), IAM (Identity and Access Management).
  • Compute: GCP Compute Engine, Cloud Monitoring & Logging.
  • Serverless: GCP Cloud Run Functions, Cloud Build.
  • Messaging & Orchestration: GCP Pub/Sub, Cloud Scheduler.
  • Key Labs:
    • GCS Bucket management via CLI & Python.
    • Setting up Compute Engines and monitoring health.
    • Implementing HTTP & Event-driven Cloud Run Functions.
    • CI/CD with Cloud Build & GitHub.
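For the event-driven Cloud Run Functions lab, the first thing a Pub/Sub-triggered handler usually does is unpack the message: Pub/Sub delivers the body base64-encoded under `message.data`. The sketch below assumes the publisher sent UTF-8 JSON; the field names in the sample (`bucket`, `file`) are hypothetical. In a deployed function you would typically wrap this in a handler decorated with `@functions_framework.cloud_event` and pass it `cloud_event.data`; that wiring is left out so the snippet runs with the standard library alone.

```python
import base64
import json


def decode_pubsub_event(event_data: dict) -> dict:
    """Decode the JSON payload carried by a Pub/Sub-triggered event.

    Pub/Sub puts the message body, base64-encoded, under
    event_data["message"]["data"]. Assumes the body is UTF-8 JSON.
    """
    message = event_data.get("message", {})
    raw = message.get("data")
    if raw is None:
        return {}  # no payload (e.g. attribute-only message)
    decoded = base64.b64decode(raw).decode("utf-8")
    return json.loads(decoded)


if __name__ == "__main__":
    # Simulate the envelope a Pub/Sub trigger would hand to the function.
    body = json.dumps({"bucket": "demo-bucket", "file": "orders.csv"})
    sample = {"message": {"data": base64.b64encode(body.encode("utf-8")).decode("ascii")}}
    print(decode_pubsub_event(sample))
```

Keeping the decoding in a plain function makes it easy to unit-test with PyTest without deploying anything.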

🔹 Module 2: Databases, Streaming & BigQuery

  • Storage & DB: Cloud SQL (MySQL), Secret Manager, BigTable.
  • Data Processing: GCP Dataflow, Apache Beam (Batch & Streaming).
  • Analytics: BigQuery Architecture (Capacitor, Colossus, Dremel).
  • Modern Architecture: Medallion Architecture (Bronze, Silver, Gold layers).
  • Key Labs:
    • CDC (Change Data Capture) streams from BigTable to Pub/Sub.
    • Dataflow Flex Templates for custom pipelines.
    • Advanced BigQuery operations: Partitioning, Clustering, Time Travel.
    • Geolocation & AI-assisted analysis with Gemini.
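The Medallion Architecture listed above separates raw (Bronze), cleaned (Silver) and business-ready (Gold) data. As a toy, in-memory sketch of that flow: in the course pipelines these layers would more likely be BigQuery tables or GCS paths, and the order fields used here (`order_id`, `country`, `amount`) are hypothetical.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Bronze -> Silver: cast types, drop malformed rows, de-duplicate on order_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        try:
            order_id = row["order_id"].strip()
            amount = float(row["amount"])
            country = (row.get("country") or "").strip().upper()
        except (KeyError, AttributeError, TypeError, ValueError):
            continue  # malformed record: it stays in Bronze only
        if not order_id or order_id in seen:
            continue  # empty key or duplicate delivery
        seen.add(order_id)
        silver.append({"order_id": order_id, "country": country, "amount": amount})
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: a business-level aggregate (revenue per country)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    bronze = [
        {"order_id": "A1", "country": "in", "amount": "100.0"},
        {"order_id": "A1", "country": "in", "amount": "100.0"},  # duplicate
        {"order_id": "A2", "country": "us", "amount": "oops"},   # malformed
        {"order_id": "A3", "country": "us", "amount": "50"},
        {"order_id": "A4", "country": "IN", "amount": "25.5"},
    ]
    print(to_gold(to_silver(bronze)))
```

The point of the layering is that Bronze keeps everything as received, so a bug in the Silver rules can be fixed and replayed without re-ingesting from the source.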

🔹 Module 3: Big Data Processing & Orchestration

  • Managed Clusters: GCP Dataproc (Hadoop, YARN, Spark, Hive).
  • Serverless Spark: Dataproc Serverless.
  • Orchestration: GCP Composer (Managed Apache Airflow).
  • Key Labs:
    • Incremental data ingestion (SCD Type 2) in BigQuery.
    • Spark Structured Streaming with Iceberg on GCS.
    • Designing complex Airflow DAGs for automated workflows.
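The SCD Type 2 lab keeps history by expiring the old version of a changed row and inserting a new current version, rather than overwriting in place. In BigQuery this is usually written as a `MERGE` statement; the in-memory sketch below only illustrates the row-level logic. The column names (`valid_from`, `valid_to`, `is_current`) and the 9999-12-31 open end date are common conventions assumed here, not necessarily the ones used in the course.

```python
from datetime import date

# Conventional "open-ended" end date for the current version (an assumption).
OPEN_END = date(9999, 12, 31)


def apply_scd2(dimension, snapshot, key, tracked, load_date):
    """Apply SCD Type 2 changes from `snapshot` to `dimension`.

    Returns a new list; the input rows are copied, not mutated.
    Keys missing from the snapshot (hard deletes) are not handled here.
    """
    result = [dict(row) for row in dimension]
    # Index the current version of each business key.
    current = {row[key]: row for row in result if row["is_current"]}
    for incoming in snapshot:
        existing = current.get(incoming[key])
        if existing is not None and all(existing[c] == incoming[c] for c in tracked):
            continue  # no change in tracked attributes
        if existing is not None:
            # Expire the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        new_row = {key: incoming[key], **{c: incoming[c] for c in tracked}}
        new_row.update(valid_from=load_date, valid_to=OPEN_END, is_current=True)
        result.append(new_row)
        current[incoming[key]] = new_row
    return result


if __name__ == "__main__":
    dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2024, 1, 1),
            "valid_to": OPEN_END, "is_current": True}]
    snap = [{"customer_id": 1, "city": "Delhi"}, {"customer_id": 2, "city": "Mumbai"}]
    for row in apply_scd2(dim, snap, "customer_id", ["city"], date(2024, 6, 1)):
        print(row)
```

Re-running the same snapshot is a no-op, which is what makes incremental ingestion safe to retry.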

🔹 Module 4: No-Code Pipelines & Workflows

  • ETL/ELT: GCP Data Fusion (Wrangler Transformations).
  • Orchestration: GCP Workflows.
  • Key Labs:
    • Building no-code pipelines for Fintech & Hacker News data.
    • Orchestrating Cloud Run Functions & Dataproc jobs using GCP Workflows.
    • Handling event-driven triggers for SCD2 operations.
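For the event-driven SCD2 trigger, a common first step is deciding whether an incoming Cloud Storage "object finalized" event should start the pipeline at all; that event's payload carries the `bucket` and object `name`. A minimal filter, assuming a hypothetical `incoming/` prefix and `.csv` suffix convention, might look like this; the downstream call (starting a Workflows execution or a Dataproc job) is intentionally left out.

```python
def scd2_input_uri(event_data, prefix="incoming/", suffix=".csv"):
    """Return the gs:// URI to process, or None if the event should be ignored.

    `event_data` mirrors the payload of a GCS object-finalized event.
    The prefix/suffix convention is a hypothetical example.
    """
    bucket = event_data.get("bucket")
    name = event_data.get("name") or ""
    if not bucket or not name.startswith(prefix) or not name.endswith(suffix):
        return None  # not an SCD2 input file: skip quietly
    return f"gs://{bucket}/{name}"


if __name__ == "__main__":
    print(scd2_input_uri({"bucket": "demo-bucket", "name": "incoming/customers.csv"}))
    print(scd2_input_uri({"bucket": "demo-bucket", "name": "archive/customers.csv"}))
```

Filtering early avoids launching an expensive job for unrelated uploads such as archived files or marker objects.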

🏗️ Industrial Projects

This repository includes several end-to-end industrial projects:

| # | Project Name | Tech Stack |
|---|---|---|
| 1 | ✈️ Flight Booking Data Pipeline | Airflow, PySpark, Dataproc Serverless, BigQuery, CI/CD with GitHub Actions |
| 2 | 🌦️ Weather Forecast Data Processing | OpenWeather API, Composer, PySpark, Dataproc Serverless, BigQuery |
| 3 | 🚗 Uber Car Idle Realtime Alerts | Pub/Sub, Dataflow (Apache Beam), BigQuery, Cloud Run Functions |
| 4 | 💳 Credit Card Fraud Alert Pipeline | Airflow, Dataproc Serverless, Python, BigQuery, PyTest, Looker Studio |
| 5 | 🏆 Realtime Game Leaderboard | Pub/Sub, Dataflow, BigQuery, Cloud Run Functions, Scheduled Queries |
| 6 | 📺 YouTube Wide Trending Engagement | GCS, Composer, PySpark, Dataproc, Iceberg, BigQuery |
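The core rule of a real-time idle alert like the Uber project can be illustrated without any GCP services: track, per car, when it stopped moving and alert once the stop exceeds a threshold. This is a plain-Python sketch, not the project's Beam code; the `Ping` fields, the 300-second threshold and the 1 km/h "stopped" cutoff are assumptions. In a Dataflow pipeline the per-car bookkeeping would typically live in keyed state rather than in local dicts.

```python
from dataclasses import dataclass


@dataclass
class Ping:
    car_id: str
    ts: int            # event time, seconds since epoch
    speed_kmh: float


def idle_alerts(pings, idle_seconds=300, speed_threshold=1.0):
    """Yield (car_id, idle_since) once per idle episode lasting >= idle_seconds.

    Assumes pings for each car arrive in event-time order.
    """
    idle_since = {}   # car_id -> ts when the current idle episode started
    alerted = set()   # car_ids already alerted for their current episode
    for p in pings:
        if p.speed_kmh < speed_threshold:
            start = idle_since.setdefault(p.car_id, p.ts)
            if p.ts - start >= idle_seconds and p.car_id not in alerted:
                alerted.add(p.car_id)
                yield p.car_id, start
        else:
            # Car is moving again: end its idle episode.
            idle_since.pop(p.car_id, None)
            alerted.discard(p.car_id)


if __name__ == "__main__":
    stream = [Ping("c1", 0, 0.0), Ping("c2", 0, 40.0), Ping("c1", 200, 0.0),
              Ping("c1", 320, 0.0), Ping("c1", 400, 0.0), Ping("c1", 450, 30.0)]
    print(list(idle_alerts(stream)))
```

Alerting once per episode, and resetting when the car moves, keeps a stationary car from flooding the alert topic on every ping.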

🚀 Getting Started

  1. GCP Account: Set up a GCP Free Tier Account.
  2. SDK: Install and initialize the Google Cloud SDK.
  3. Python: Ensure Python 3.9+ is installed.
  4. Clone the Repo:
    git clone https://github.com/Ratnesh-181998/GCP-Data-Engineering.git
    cd GCP-Data-Engineering
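A quick local pre-flight check for the Getting Started steps might look like the sketch below. It only confirms the Python version and that `gcloud` and `git` are on your PATH; it does not verify that the SDK has been initialized or that your credentials are valid.

```python
import shutil
import sys


def preflight(min_python=(3, 9)):
    """Report whether the local prerequisites look satisfied."""
    return {
        "python_ok": sys.version_info >= min_python,
        "gcloud_on_path": shutil.which("gcloud") is not None,
        "git_on_path": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```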

📞 CONTACT & NETWORKING 📞

💼 Professional Networks

LinkedIn
GitHub
X
Portfolio
Email
Medium
Stack Overflow

🚀 AI/ML & Data Science (1620+ AI/ML Problems Solved)

Streamlit
HuggingFace
Kaggle

💻 Competitive Programming (5000+ Problems Solved Across All Coding Platforms)

LeetCode
HackerRank
CodeChef
Codeforces
GeeksforGeeks
HackerEarth
InterviewBit
