dyneth02/SQL-Exploritory-Data-Analysis-Project
SQL-based EDA project exploring a retail data warehouse. Includes database setup and analysis scripts for data exploration, dimension profiling, date range discovery, measures analysis, magnitude breakdowns and ranking using aggregates and window functions.
๐ SQL Data Analytics Project โ Retail Data Warehouse EDA
This project showcases SQL-based Exploratory Data Analysis (EDA) on a structured retail data warehouse.
It includes scripts for database creation, metadata exploration, dimensional profiling, date analysis, metric computation, magnitude breakdowns, and ranking using advanced SQL functions.
๐ Project Structure
00_init_database.sql
Creates the DataWarehouseAnalytics database, defines the gold schema, and loads dimension & fact tables via BULK INSERT.
Tables:
gold.dim_customersgold.dim_productsgold.fact_sales
:contentReference[oaicite:0]{index=0}
01_database_exploration.sql
Explores metadata using INFORMATION_SCHEMA.
- Lists all tables
- Inspects columns and data types
Useful for schema validation before EDA.
:contentReference[oaicite:1]{index=1}
02_dimensions_exploration.sql
Profiles dimension tables (dim_customers, dim_products).
Uses DISTINCT + ORDER BY to understand:
- Countries
- Categories & subcategories
- Product diversity
:contentReference[oaicite:2]{index=2}
03_date_range_exploration.sql
Analyzes temporal boundaries using MIN(), MAX(), DATEDIFF().
Includes:
- First/last order date
- Youngest/oldest customers
:contentReference[oaicite:3]{index=3}
04_measures_exploration.sql
Computes key business metrics:
- Total sales, quantity, orders
- Average price
- Distinct customers
- Combined metric report via
UNION ALL
:contentReference[oaicite:4]{index=4}
05_magnitude_analysis.sql
Breakdown of data magnitude using GROUP BY:
- Customers by country/gender
- Products by category
- Average cost per category
- Revenue by category & customer
- Item distribution across regions
:contentReference[oaicite:5]{index=5}
06_ranking_analysis.sql
Ranks products and customers using:
TOPRANK()ROW_NUMBER()
Includes:- Top & bottom performing products
- Top revenue-generating customers
- Customers with fewest orders
:contentReference[oaicite:6]{index=6}
๐ฏ Project Highlights
- Fully SQL-based, no external tools required
- Clear logical progression from schema โ dimensions โ dates โ metrics โ magnitude โ ranking
- Uses advanced SQL: window functions, grouping, distinct profiling, metadata queries
- Excellent template for analytics portfolios and data engineering foundations
๐ How to Use
- Run
00_init_database.sqlto create and populate the warehouse. - Execute each exploration script in order:
- Use any SQL client supporting T-SQL (Azure Data Studio, SSMS).
๐ง Skills Demonstrated
- Data warehousing fundamentals
- SQL EDA methodology
- Window functions (RANK, ROW_NUMBER)
- Aggregations & grouping
- Dimensional modeling analysis
- Schema introspection
- Retail analytics logic
๐ซ Academic Context
- Built as a personal analytics project inspired by coursework and real-world data warehouse approaches.
- Suitable for demonstrating SQL proficiency in analytics, BI, and data engineering.
๐จโ๐ป Author
D.G.A. DINETH HIRUSHA