ThomasShikalepo/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics

📊 Data Warehouse and Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀
This portfolio project showcases a complete end-to-end data warehousing and analytics solution, from raw data ingestion to business intelligence reporting. It follows industry best practices in data engineering and analytics.


🏗️ Data Architecture

This project follows the Medallion Architecture, structured into three layers:

(Diagram: data_architecture)

  • Bronze Layer: Stores raw data ingested as-is from source systems (CSV files) into a SQL Server database.
  • Silver Layer: Processes and transforms data with cleansing, standardization, and normalization techniques.
  • Gold Layer: Contains business-ready, analytics-optimized data modeled using a star schema.
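The three layers above can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table, view, and column names (bronze_customers, silver_customers, gold_dim_customers) are hypothetical, not taken from the repository's scripts.

```python
import sqlite3

def build_layers(rows):
    """Load raw rows into bronze, clean them into silver, expose a gold view."""
    con = sqlite3.connect(":memory:")
    # Bronze: raw values stored as-is (everything kept as text).
    con.execute("CREATE TABLE bronze_customers (id TEXT, name TEXT, country TEXT)")
    con.executemany("INSERT INTO bronze_customers VALUES (?, ?, ?)", rows)
    # Silver: trimmed, standardized, de-duplicated copy of bronze.
    con.execute("""
        CREATE TABLE silver_customers AS
        SELECT DISTINCT
            CAST(TRIM(id) AS INTEGER) AS customer_id,
            TRIM(name)                AS customer_name,
            UPPER(TRIM(country))      AS country
        FROM bronze_customers
        WHERE TRIM(id) <> ''
    """)
    # Gold: business-ready dimension exposed as a view.
    con.execute("""
        CREATE VIEW gold_dim_customers AS
        SELECT customer_id, customer_name, country
        FROM silver_customers
    """)
    return con

# Hypothetical raw input with stray spaces, mixed case, a duplicate and an empty key.
raw = [(" 1", " Ana ", "de"), ("1", "Ana", "DE"), ("2", "Ben", "us "), ("", "ghost", "xx")]
con = build_layers(raw)
gold = con.execute("SELECT * FROM gold_dim_customers ORDER BY customer_id").fetchall()
print(gold)
```

Keeping bronze untyped and untouched means a faulty cleansing rule in silver can be corrected and rerun without re-importing the source files.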

📖 Project Overview

This project involves:

  • Data Architecture: Building a modern warehouse with Medallion Architecture (Bronze, Silver, Gold).
  • ETL Pipelines: Extracting, transforming, and loading data from ERP and CRM CSVs.
  • Data Modeling: Designing fact and dimension tables for optimized analytical queries.
  • Analytics & Reporting: Creating SQL-based reports and dashboards for actionable business insights.

🧰 Tools & Resources

Every tool used here is available free of charge!

  • 📂 Datasets: ERP and CRM CSV files
  • 🧩 SQL Server Express: Lightweight SQL Server instance
  • 🖥️ SQL Server Management Studio (SSMS): GUI for SQL Server
  • 🧠 Draw.io: For data modeling and architecture diagrams
  • 💡 Notion: Project templates and documentation
  • 💻 GitHub: For version control and collaboration

🚀 Project Requirements

🧱 Part 1: Building the Data Warehouse (Engineering)

Goal: Develop a modern data warehouse using SQL Server for unified, analytics-ready sales data.

Specifications:

  • Import data from two sources (ERP and CRM, in CSV format).
  • Cleanse and resolve data quality issues.
  • Integrate data into a single analytical model.
  • Focus on the most recent data (no historization required).
  • Document the data model for stakeholders and analysts.
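As a rough sketch of the specifications above, the example below loads two CSV extracts, keeps only the most recent record per customer (no history), and joins them into one result. It again uses sqlite3 in place of SQL Server and needs SQLite 3.25 or later for window functions. The CSV contents and the names crm_cust, erp_loc, cst_id, cid, and updated are invented for illustration; the real columns are defined by the files in the datasets folder.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extracts standing in for the CRM and ERP source files.
CRM_CSV = "cst_id,cst_name,updated\n1,Ana,2024-01-01\n1,Ana Maria,2024-03-01\n2,Ben,2024-02-01\n"
ERP_CSV = "cid,country\n1,Germany\n2,\n"

def load_csv(con, table, text):
    """Create a text-only staging table from CSV content and fill it."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    con.execute(f"CREATE TABLE {table} ({', '.join(c + ' TEXT' for c in header)})")
    marks = ", ".join("?" for _ in header)
    con.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)

con = sqlite3.connect(":memory:")
load_csv(con, "crm_cust", CRM_CSV)
load_csv(con, "erp_loc", ERP_CSV)

# Keep only the latest CRM row per customer, then enrich it with the
# ERP country, replacing blank values with 'n/a'.
integrated = con.execute("""
    SELECT c.cst_id, c.cst_name,
           COALESCE(NULLIF(e.country, ''), 'n/a') AS country
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY cst_id ORDER BY updated DESC) AS rn
        FROM crm_cust
    ) AS c
    LEFT JOIN erp_loc AS e ON e.cid = c.cst_id
    WHERE c.rn = 1
    ORDER BY c.cst_id
""").fetchall()
print(integrated)
```

The ROW_NUMBER pattern is also available in T-SQL, so the same query shape should carry over to SQL Server with little change.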

📊 Part 2: Business Intelligence & Reporting (Analysis)

Goal: Use SQL to analyze data and generate business insights.

Insights Provided:

  • Customer Behavior
  • Product Performance
  • Sales Trends

These insights support data-driven decision-making.
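To make the kind of analysis concrete, here is a small sketch of two such queries against a toy star schema: one for product performance and one for sales trends. The tables fact_sales and dim_products, their columns, and the sample rows are assumptions made for this example, and sqlite3 is used in place of SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical star schema: one fact table keyed to one dimension table.
con.executescript("""
    CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_key INTEGER, order_date TEXT, sales_amount INTEGER);
    INSERT INTO dim_products VALUES (1, 'Bikes'), (2, 'Accessories');
    INSERT INTO fact_sales VALUES
        (1, '2024-01-15', 500), (1, '2024-02-10', 700),
        (2, '2024-01-20', 50),  (2, '2024-02-05', 30);
""")

# Product performance: revenue per category, highest first.
by_category = con.execute("""
    SELECT p.category, SUM(f.sales_amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_products AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()

# Sales trend: revenue per calendar month (first 7 characters of the ISO date).
by_month = con.execute("""
    SELECT substr(order_date, 1, 7) AS month, SUM(sales_amount) AS revenue
    FROM fact_sales
    GROUP BY month
    ORDER BY month
""").fetchall()
print(by_category, by_month)
```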

📄 For full details, see docs/requirements.md


📂 Repository Structure

data-warehouse-project/
│
├── datasets/                      # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                          # Project documentation and architecture details
│   ├── etl.drawio                 # Draw.io file showing ETL techniques and flow
│   ├── data_architecture.drawio   # Diagram of the overall data warehouse architecture
│   ├── data_catalog.md            # Metadata and field descriptions of datasets
│   ├── data_flow.drawio           # Visual data flow from source to destination
│   ├── data_models.drawio         # Star schema and data model designs
│   ├── naming-conventions.md      # Standards for naming tables, fields, and files
│
├── scripts/                       # SQL scripts for ETL and transformation
│   ├── bronze/                    # Scripts for loading raw data (Bronze layer)
│   ├── silver/                    # Scripts for data cleansing and transformation (Silver layer)
│   ├── gold/                      # Scripts for building the analytical model (Gold layer)
│
├── tests/                         # Data quality checks and testing scripts
│
├── README.md                      # Project overview and setup instructions
├── LICENSE                        # License file for this repository
├── .gitignore                     # Git ignore rules for files and folders
└── requirements.txt               # Required software/tools and setup dependencies
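
The tests/ folder is described as holding data quality checks. One common check of that kind, sketched here under assumptions, counts missing and duplicated keys in a cleansed table. The helper key_issues and the table silver_customers are hypothetical, and sqlite3 stands in for SQL Server; the repository's actual test scripts may look quite different.

```python
import sqlite3

def key_issues(con, table, key):
    """Count NULL keys and surplus duplicate keys in one table column."""
    nulls = con.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]
    # For each key that appears n > 1 times, n - 1 rows are surplus.
    dupes = con.execute(
        f"SELECT COALESCE(SUM(n - 1), 0) FROM ("
        f"  SELECT COUNT(*) AS n FROM {table}"
        f"  WHERE {key} IS NOT NULL GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {"null_keys": nulls, "duplicate_keys": dupes}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE silver_customers (customer_id INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO silver_customers VALUES (?, ?)",
    [(1, "Ana"), (1, "Ana"), (2, "Ben"), (None, "ghost")],
)
report = key_issues(con, "silver_customers", "customer_id")
print(report)
```

A clean table yields zero for both counts, so a check like this can run after each load and flag any non-zero result.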
