Ratnesh-181998/GCP-Data-Engineering
🚀 Master GCP Data Engineering! Covers GCS, BigQuery, Dataproc, Dataflow & Airflow. Build 6+ industrial projects: Flight Booking pipelines, Real-time Uber alerts & Fraud Detection using PySpark, Medallion Arch & CI/CD. 🛠️ Tech: Python, SQL, Spark, Beam & Streaming.
🚀 GCP Services for Data Engineering with Projects
Welcome to the repository for GCP Services for Data Engineering with Projects. This course takes you from beginner to advanced level in Google Cloud Platform (GCP) data engineering, covering everything from fundamental services to complex, industrial-scale data pipelines.
🛠️ Tech Stack & Tools
🌩️ Cloud Platform & Core Services
📊 Big Data & Analytics
⚙️ Processing & Orchestration
💻 Infrastructure & Serverless
🐍 Languages & Libraries
🚀 CI/CD & Others
📝 Detailed Tech Stack Breakdown
| Category | Technologies |
|---|---|
| ☁️ Cloud Platform | Google Cloud Platform (GCS, IAM, Pub/Sub, Secret Manager) |
| 🗄️ Data Warehouse & NoSQL | BigQuery (SQL), Bigtable, Cloud SQL (MySQL) |
| ⚙️ Processing Frameworks | Apache Spark (PySpark), Apache Beam, Apache Iceberg, Hive |
| 📈 Compute & Analytics | Dataproc (Serverless), Dataflow, Looker Studio |
| 🔄 Orchestration | Apache Airflow (GCP Composer), GCP Workflows, Cloud Scheduler |
| 🛠️ Serverless & DevOps | Cloud Run Functions, Docker, Terraform, Cloud Build, GitHub Actions |
| 🐍 Languages & Libraries | Python, SQL, Pandas, PyTest |
📚 Modules Breakdown (Live Content Coming Soon)
🔹 Module 1: GCP Foundations & Serverless
- Fundamentals: GCS (Google Cloud Storage), IAM (Identity and Access Management).
- Compute: GCP Compute Engine, Cloud Monitoring & Logging.
- Serverless: GCP Cloud Run Functions, Cloud Build.
- Messaging & Orchestration: GCP Pub/Sub, Cloud Scheduler.
- Key Labs:
  - GCS Bucket management via CLI & Python.
  - Setting up Compute Engines and monitoring health.
  - Implementing HTTP & Event-driven Cloud Run Functions.
  - CI/CD with Cloud Build & GitHub.
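Event-driven functions triggered by Pub/Sub receive the message payload base64-encoded inside a `message.data` field. The sketch below shows one way to decode such an envelope, assuming the publisher sent JSON. The function name `decode_pubsub_push`, the sample bucket, file, project, and subscription names are all hypothetical, and the envelope is built locally so the example runs without GCP access. It is an illustration of the message shape, not the course's lab code.

```python
import base64
import json


def decode_pubsub_push(envelope: dict) -> dict:
    """Extract a JSON payload and attributes from a Pub/Sub-style envelope."""
    message = envelope.get("message")
    if not message or "data" not in message:
        raise ValueError("envelope has no message.data field")
    # Pub/Sub delivers the payload as base64-encoded bytes.
    raw = base64.b64decode(message["data"])
    # Assumption: the publisher serialized the payload as UTF-8 JSON.
    payload = json.loads(raw.decode("utf-8"))
    # Attributes are optional string key/value pairs set by the publisher.
    return {"payload": payload, "attributes": message.get("attributes", {})}


# Hypothetical event, constructed locally for illustration.
sample_payload = {"bucket": "demo-bucket", "name": "flights.csv"}
sample_envelope = {
    "message": {
        "data": base64.b64encode(
            json.dumps(sample_payload).encode("utf-8")
        ).decode("ascii"),
        "attributes": {"eventType": "OBJECT_FINALIZE"},
        "messageId": "1",
    },
    "subscription": "projects/demo-project/subscriptions/demo-sub",
}

result = decode_pubsub_push(sample_envelope)
print(result["payload"]["name"], result["attributes"])
```

Keeping the decoding in a plain function like this makes it easy to unit-test with PyTest before wiring it into a deployed function.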
🔹 Module 2: Databases, Streaming & BigQuery
- Storage & DB: Cloud SQL (MySQL), Secret Manager, Bigtable.
- Data Processing: GCP Dataflow, Apache Beam (Batch & Streaming).
- Analytics: BigQuery Architecture (Capacitor, Colossus, Dremel).
- Modern Architecture: Medallion Architecture (Bronze, Silver, Gold layers).
- Key Labs:
  - Change Data Capture (CDC) streams from Bigtable to Pub/Sub.
  - Dataflow Flex Templates for custom pipelines.
  - Advanced BigQuery operations: Partitioning, Clustering, Time Travel.
  - Geolocation & AI-assisted analysis with Gemini.
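The Medallion Architecture mentioned in this module moves data through Bronze (raw), Silver (cleaned), and Gold (aggregated) layers. A minimal pure-Python sketch of that flow is shown here, assuming a made-up booking dataset. The field names (`booking_id`, `route`, `fare`) and the functions `to_silver` and `to_gold` are hypothetical; in the course the layers would typically live in GCS or BigQuery rather than in in-memory lists.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested, including duplicates and bad values.
bronze = [
    {"booking_id": "B1", "route": "DEL-BOM", "fare": "120.50"},
    {"booking_id": "B2", "route": "del-bom", "fare": "99.00"},
    {"booking_id": "B2", "route": "del-bom", "fare": "99.00"},  # duplicate
    {"booking_id": "B3", "route": "BLR-HYD", "fare": "not-a-number"},  # invalid
    {"booking_id": "B4", "route": "BLR-HYD", "fare": "80.00"},
]


def to_silver(rows):
    """Silver: deduplicate by booking_id, cast types, normalize route codes."""
    seen = set()
    silver = []
    for row in rows:
        if row["booking_id"] in seen:
            continue  # drop repeated bookings
        try:
            fare = float(row["fare"])
        except ValueError:
            continue  # drop rows whose fare cannot be parsed
        seen.add(row["booking_id"])
        silver.append(
            {
                "booking_id": row["booking_id"],
                "route": row["route"].upper(),
                "fare": fare,
            }
        )
    return silver


def to_gold(rows):
    """Gold: business-level aggregate, here total fare per route."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["route"]] += row["fare"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The point of the layering is that each stage has one job: Bronze preserves the source, Silver enforces quality rules, and Gold answers a specific analytical question.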
🔹 Module 3: Big Data Processing & Orchestration
- Managed Clusters: GCP Dataproc (Hadoop, YARN, Spark, Hive).
- Serverless Spark: Dataproc Serverless.
- Orchestration: GCP Composer (Managed Apache Airflow).
- Key Labs:
  - Incremental data ingestion (SCD Type 2) in BigQuery.
  - Spark Structured Streaming with Iceberg on GCS.
  - Designing complex Airflow DAGs for automated workflows.
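SCD Type 2 keeps history by closing the current version of a changed record and appending a new version, rather than overwriting in place. The sketch below illustrates that logic in plain Python under stated assumptions: the helper `apply_scd2`, the bookkeeping columns (`valid_from`, `valid_to`, `is_current`), and the sample customer rows are all hypothetical. In BigQuery the same idea would usually be expressed in SQL; this is only a way to see the row-level behavior.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with SCD Type 2 changes applied.

    dimension: existing rows with valid_from, valid_to, is_current columns
    incoming:  latest snapshot rows containing the key and tracked columns
    """
    result = [dict(row) for row in dimension]  # copy, do not mutate the input
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])
        if old is not None:
            if all(old[col] == new[col] for col in tracked):
                continue  # no change in tracked columns, keep current version
            # Close the outgoing version instead of overwriting it.
            old["is_current"] = False
            old["valid_to"] = as_of
        version = {
            key: new[key],
            **{col: new[col] for col in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(version)
        current[new[key]] = version
    return result


dimension = [
    {"customer_id": 1, "city": "Pune", "valid_from": date(2024, 1, 1),
     "valid_to": None, "is_current": True},
    {"customer_id": 2, "city": "Delhi", "valid_from": date(2024, 1, 1),
     "valid_to": None, "is_current": True},
]
incoming = [
    {"customer_id": 1, "city": "Mumbai"},   # changed
    {"customer_id": 2, "city": "Delhi"},    # unchanged
    {"customer_id": 3, "city": "Chennai"},  # new
]

updated = apply_scd2(dimension, incoming, "customer_id", ["city"], date(2024, 6, 1))
for row in updated:
    print(row)
```

Customer 1 ends up with two rows (the closed Pune version and the current Mumbai version), customer 2 is untouched, and customer 3 is inserted as a first version.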
🔹 Module 4: No-Code Pipelines & Workflows
- ETL/ELT: GCP Data Fusion (Wrangler Transformations).
- Orchestration: GCP Workflows.
- Key Labs:
  - Building no-code pipelines for Fintech & HackerNews data.
  - Orchestrating Cloud Run Functions & Dataproc jobs using GCP Workflows.
  - Handling event-driven triggers for SCD2 operations.
🏗️ Industrial Projects
This repository includes the following end-to-end industrial projects:
| # | Project Name | Tech Stack |
|---|---|---|
| 1 | Flight Booking Pipeline | Airflow, PySpark, Dataproc Serverless, BigQuery, CI/CD with GitHub Actions |
| 2 | 🌦️ Weather Forecast Data Processing | Open Weather API, Composer, PySpark, Dataproc Serverless, BigQuery |
| 3 | 🚗 Uber Car Idle Realtime Alerts | Pub/Sub, Dataflow (Apache Beam), BigQuery, Cloud Run Functions |
| 4 | 💳 Credit Card Fraud Alert Pipeline | Airflow, Dataproc Serverless, Python, BigQuery, PyTest, Looker Studio |
| 5 | 🏆 Realtime Game Leaderboard | Pub/Sub, Dataflow, BigQuery, Cloud Run Functions, Scheduled Queries |
| 6 | 📺 YouTube Wide Trending Engagement | GCS, Composer, PySpark, Dataproc, Iceberg, BigQuery |
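The streaming projects in the table (for example the realtime leaderboard) depend on grouping events into time windows before aggregating them. As a rough illustration of fixed-window aggregation, here is a pure-Python sketch. It is not the repository's Dataflow or Beam code; the event fields (`ts`, `player`, `points`), the function names, and the 60-second window are assumptions chosen for the example.

```python
from collections import defaultdict


def fixed_window_start(ts: int, size: int) -> int:
    """Map an event timestamp (seconds) to the start of its fixed window."""
    return ts - (ts % size)


def leaderboard(events, window_seconds, top_n):
    """Sum points per player within each fixed window and keep the top N."""
    scores = defaultdict(lambda: defaultdict(int))
    for event in events:
        window = fixed_window_start(event["ts"], window_seconds)
        scores[window][event["player"]] += event["points"]
    return {
        window: sorted(
            players.items(),
            key=lambda kv: (-kv[1], kv[0]),  # highest score first, then name
        )[:top_n]
        for window, players in sorted(scores.items())
    }


# Hypothetical score events; ts is seconds since an arbitrary start.
events = [
    {"ts": 5, "player": "ana", "points": 10},
    {"ts": 20, "player": "bo", "points": 15},
    {"ts": 59, "player": "ana", "points": 10},
    {"ts": 61, "player": "bo", "points": 5},
    {"ts": 90, "player": "cy", "points": 7},
]

board = leaderboard(events, window_seconds=60, top_n=2)
print(board)
```

A real streaming pipeline also has to deal with late and out-of-order events, which this in-memory sketch ignores.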
🚀 Getting Started
- GCP Account: Set up a GCP Free Tier Account.
- SDK: Install and initialize the Google Cloud SDK.
- Python: Ensure Python 3.9+ is installed.
- Clone the Repo:
    git clone https://github.com/Ratnesh-181998/GCP-Data-Engineering.git
    cd GCP-Data-Engineering
📞 CONTACT & NETWORKING 📞
💼 Professional Networks
🚀 AI/ML & Data Science: 1620+ problems solved
💻 Competitive Programming: 5000+ problems solved across all coding platforms
Apache License 2.0
Created February 22, 2026
Updated February 23, 2026