amalphonse/retail-analytics-dbt
Analytics Data Engineering Project using dbt
# Retail Revenue Analytics Mart (dbt + DuckDB)
## Overview
This project builds a production-style analytics mart from raw transactional retail data using dbt.
Using the UCI Online Retail dataset (2010–2011 UK e-commerce transactions), I transform raw invoice line items into a clean, tested star schema and generate revenue reporting models with built-in data quality enforcement.
This project demonstrates how modern Analytics Engineers:
- Separate raw ingestion from transformation logic
- Define clear model grain and data contracts
- Implement automated data quality checks
- Build documented, reproducible analytics pipelines
- Design warehouse-portable dbt projects (DuckDB → Snowflake)
## Business Context
Retail transactional data often contains:
- Returns and cancellations
- Negative quantities
- Inconsistent identifiers
- Revenue miscalculations due to unhandled edge cases
Without proper modeling and validation, downstream dashboards and financial reporting can be incorrect.
This project solves that by:
- Standardizing raw data in a staging layer
- Modeling clear fact and dimension tables
- Explicitly handling cancellation logic
- Enforcing integrity via dbt tests
- Producing trustworthy revenue reporting outputs
## Data Model

### Raw Layer (Seed)
- `raw.online_retail`
- Grain: one row per invoice line item

### Staging Layer
- `stg_online_retail`
- Typed and standardized fields
- Derived fields:
  - `invoice_date`
  - `line_revenue`
  - `is_cancellation` (business rule logic)

### Mart Layer (Star Schema)

**Dimensions**
- `dim_customers`: one row per customer
- `dim_products`: one row per product (`StockCode`)

**Fact**
- `fct_order_items`: one row per invoice line item, with revenue cleaned and business-validated

**Reporting**
- `rpt_daily_revenue`: aggregated daily revenue, excluding cancellation rows
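As a sketch of the reporting grain, a minimal `rpt_daily_revenue` model could aggregate the fact table as below. This is illustrative only; the exact column names in this repo (in particular `invoice_no`) are assumptions:

```sql
-- models/marts/rpt_daily_revenue.sql (illustrative sketch)
select
    cast(invoice_date as date)  as order_date,
    count(distinct invoice_no)  as order_count,   -- invoice_no is assumed
    sum(line_revenue)           as daily_revenue
from {{ ref('fct_order_items') }}
where not is_cancellation   -- returns excluded from reported revenue
group by 1
order by 1
```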
## Tech Stack
- dbt Core
- DuckDB (local analytics warehouse)
- Snowflake-ready configuration
- SQL
- YAML (schema + test configuration)
- Git / GitHub
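The Snowflake-ready configuration typically comes down to a second output in the same dbt profile; a minimal sketch, in which the target names and all account values are placeholders, not the project's actual settings:

```yaml
# Illustrative profiles.yml with both a DuckDB and a Snowflake target
retail_analytics_dbt:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
    prod:
      type: snowflake
      account: "<your-account>"
      user: "<your-user>"
      password: "<your-password>"
      role: "<your-role>"
      database: "<your-database>"
      warehouse: "<your-warehouse>"
      schema: "<your-schema>"
```

Switching platforms is then `dbt build --target prod`; the SQL models themselves stay unchanged.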
Dataset source: UCI ML Repository, Online Retail
## How to Run Locally
### 1. Clone the repository

```bash
git clone <your-repo-url>
cd retail_analytics_dbt
```

### 2. Create a virtual environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install dbt-core dbt-duckdb
```

### 3. Configure the dbt profile

Create `~/.dbt/profiles.yml`:

```yaml
retail_analytics_dbt:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "retail.duckdb"
```

### 4. Verify the connection

```bash
dbt debug
```

### 5. Build the full pipeline

```bash
dbt seed
dbt build
```

### 6. View documentation & lineage

```bash
dbt docs generate
dbt docs serve
```

## Data Quality & Testing Strategy
This project treats data tests as contracts, not afterthoughts.
### Generic Tests
- `unique` and `not_null` on dimension keys
- `relationships` tests between the fact and dimensions
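These generic tests are declared in a schema YAML file; a minimal sketch, assuming `customer_id` is the dimension key (the actual column names and file layout in this repo may differ):

```yaml
# models/marts/schema.yml (illustrative sketch)
version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: fct_order_items
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```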
### Business Logic Test
A custom SQL test ensures that non-cancellation revenue is never negative.
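A singular dbt test for this rule could look like the following sketch (the file name and column names are assumptions):

```sql
-- tests/assert_no_negative_noncancel_revenue.sql (illustrative sketch)
-- dbt treats any rows returned by a singular test as failures, so this
-- passes only when no non-cancellation line has negative revenue.
select *
from {{ ref('fct_order_items') }}
where not is_cancellation
  and line_revenue < 0
```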
### Cancellation Handling Logic
The dataset contains returns that appear as:
- Invoice numbers beginning with "C"
- Negative quantities
- Negative revenue patterns
The staging layer introduces a derived is_cancellation flag to ensure:
- Revenue reporting excludes returns
- Downstream aggregates are financially accurate
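The flag derivation in staging might look like this sketch, using the raw UCI column names (`InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`); the seed reference name and output aliases are assumptions about this repo:

```sql
-- Excerpt of a staging-style model (illustrative sketch)
select
    InvoiceNo                               as invoice_no,
    cast(InvoiceDate as timestamp)          as invoice_date,
    Quantity * UnitPrice                    as line_revenue,
    -- UCI Online Retail marks cancellations with a leading 'C' on the
    -- invoice number; returns also carry negative quantities
    (InvoiceNo like 'C%' or Quantity < 0)   as is_cancellation
from {{ ref('online_retail') }}
```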
## Results & Impact
- ✅ Built a fully reproducible analytics mart from raw retail data
- ✅ Enforced primary and foreign key integrity
- ✅ Identified and corrected dataset-specific revenue edge cases
- ✅ Designed project for seamless warehouse portability
- ✅ Full dataset (~540k rows) builds locally in under 1 second
This pipeline ensures revenue metrics are reliable before reaching BI dashboards or executive reporting.
## Key Learnings
- Clearly defining model grain prevents metric inflation
- Data tests should reflect business logic, not just schema constraints
- Cancellation logic must be explicitly modeled in retail datasets
- Warehouse abstraction in dbt enables seamless platform migration
- Lineage documentation improves transparency and trust
## Why This Project Matters
This is not a dashboard exercise.
It demonstrates:
- Production-style analytics engineering practices
- Version-controlled transformation logic
- Data contract enforcement
- Business-aware modeling
- Reproducible local development
- A clean migration path to an enterprise warehouse
## Author
Anju Mercian
Platform Engineer | Data Platform | Modern Data Stack