
Omnichannel Sales Engine

A complete data engineering portfolio project demonstrating ETL pipelines, dimensional modeling, orchestration, and analytics visualization.

Python · PostgreSQL · Airflow · Streamlit · Docker

🎯 Project Overview

This project scrapes product data from multiple e-commerce platforms (Amazon, eBay, Shopee), transforms it into a star schema data warehouse, and visualizes insights through an interactive dashboard.
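The extraction step described above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual scraper code: the real scrapers use BeautifulSoup with per-platform selectors, while this minimal version uses the standard-library HTML parser, and the class names (`product-name`, `product-price`) are hypothetical.

```python
from html.parser import HTMLParser

# Minimal sketch of product extraction from a listing snippet.
# The real scrapers use BeautifulSoup; selectors here are illustrative.
class ProductParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._field = None   # which field the next text node belongs to
        self.products = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "product-name" in classes:
            self._field = "name"
            self.products.append({})
        elif "product-price" in classes:
            self._field = "price"

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()
            self._field = None

html = '<div class="product-name">Wireless Mouse</div><span class="product-price">19.99</span>'
parser = ProductParser()
parser.feed(html)
print(parser.products)  # [{'name': 'Wireless Mouse', 'price': '19.99'}]
```

Each parsed record would then land in `data/raw/` as CSV/JSON before validation.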

Skills Demonstrated

Area             | Technologies
---------------- | ------------------------------------------------
Data Ingestion   | Web scraping with BeautifulSoup, API integration
Data Modeling    | Star schema (fact/dimension tables), PostgreSQL
ETL Pipeline     | Python, SQLAlchemy, data validation
Orchestration    | Apache Airflow DAGs
Containerization | Docker, Docker Compose
Visualization    | Streamlit, Plotly

πŸ—οΈ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   E-commerce    │     │    Scrapers     │     │   Raw Storage   │
│   Platforms     │────▶│  (Python)       │────▶│   (CSV/JSON)    │
│ Amazon/eBay/... │     │                 │     │   data/raw/     │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                        ┌─────────────────┐              ▼
                        │   Airflow       │     ┌─────────────────┐
                        │  Orchestration  │────▶│   Processors    │
                        │                 │     │ Validate/Clean  │
                        └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐              ▼
│   Streamlit     │◀────│   PostgreSQL    │◀────┌─────────────────┐
│   Dashboard     │     │  Star Schema    │     │  Transformer    │
│                 │     │                 │     │ Staging → DW    │
└─────────────────┘     └─────────────────┘     └─────────────────┘

πŸ“ Directory Structure

omnichannel-sales-engine/
├── airflow/dags/           # Airflow DAG definitions
├── config/                 # Configuration and settings
├── dashboard/              # Streamlit app (multi-page)
├── database/               # SQLAlchemy models and migrations
├── processors/             # Data validation, transformation, loading
├── scrapers/               # Platform-specific scrapers
├── sql/schema/             # Raw SQL table definitions
├── tests/                  # Unit and integration tests
├── docker-compose.yml      # Full stack configuration
└── Makefile                # Common commands

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.10+ (for local development)

1. Clone and Setup

git clone https://github.com/syfqfrhnn/omnichannel-sales-engine.git
cd omnichannel-sales-engine
cp .env.example .env  # Edit with your settings

2. Start with Docker

# Start all services
docker-compose up -d

# Wait for services to be healthy
docker-compose ps

# View logs
docker-compose logs -f

3. Access Services

Service    | URL                   | Credentials
---------- | --------------------- | ---------------------------
Airflow    | http://localhost:8080 | admin / admin
Dashboard  | http://localhost:8501 | -
PostgreSQL | localhost:5432        | sales_user / sales_password

4. Initialize Database with Sample Data

# Enter the dashboard container
docker-compose exec dashboard bash

# Initialize tables and seed sample data
python database/init_db.py
python database/seed_data.py

🔧 Local Development

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r config/requirements.txt

# Set environment variables
export DATABASE_URL=postgresql://sales_user:sales_password@localhost:5432/sales_analytics

# Run dashboard locally
streamlit run dashboard/app.py
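The `DATABASE_URL` exported above bundles every connection parameter into one string. A small sketch of how the app side could split it back into parts (in the real dashboard this string would come from `os.environ["DATABASE_URL"]`):

```python
from urllib.parse import urlparse

# Split the connection URL shown above into its components.
url = "postgresql://sales_user:sales_password@localhost:5432/sales_analytics"
parts = urlparse(url)
print(parts.username)              # sales_user
print(parts.hostname, parts.port)  # localhost 5432
print(parts.path.lstrip("/"))      # sales_analytics
```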

📊 Data Warehouse Schema

Star Schema Design

Dimension Tables:

  • dim_product - Product attributes (name, brand, URL)
  • dim_platform - E-commerce platforms
  • dim_category - Product categories (hierarchical)
  • dim_date - Date dimension for time analysis

Fact Tables:

  • fact_price_history - Daily price snapshots
  • fact_sales - Sales transactions
  • fact_reviews_daily - Aggregated review metrics

Staging Tables:

  • stg_products - Raw scraped product data
  • stg_reviews - Raw scraped reviews
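The fact/dimension split above can be exercised end to end in a toy form. This sketch uses in-memory SQLite purely for illustration (the project itself targets PostgreSQL), and the column lists are trimmed-down assumptions, not the repo's actual DDL:

```python
import sqlite3

# Toy star schema: two dimensions plus one fact table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE dim_platform (platform_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_price_history (
    product_id    INTEGER REFERENCES dim_product(product_id),
    platform_id   INTEGER REFERENCES dim_platform(platform_id),
    snapshot_date TEXT,
    price         REAL
);
""")
con.execute("INSERT INTO dim_product VALUES (1, 'Wireless Mouse', 'Acme')")
con.execute("INSERT INTO dim_platform VALUES (1, 'Amazon')")
con.execute("INSERT INTO fact_price_history VALUES (1, 1, '2024-01-01', 19.99)")

# Typical star-schema query: join the fact row back to its dimensions.
row = con.execute("""
    SELECT p.name, pl.name, f.price
    FROM fact_price_history f
    JOIN dim_product  p  ON p.product_id   = f.product_id
    JOIN dim_platform pl ON pl.platform_id = f.platform_id
""").fetchone()
print(row)  # ('Wireless Mouse', 'Amazon', 19.99)
```

The point of the design is visible in the query: facts stay narrow (keys and measures), and descriptive attributes are joined in from the dimensions only when needed.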

βš™οΈ Airflow DAGs

DAG          | Schedule    | Description
------------ | ----------- | ----------------------------------
daily_scrape | 6:00 AM UTC | Scrape products from all platforms
etl_pipeline | 7:30 AM UTC | Transform staging → warehouse
data_quality | 8:00 AM UTC | Run quality checks
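The staggered times encode the pipeline order: scrape first, then transform, then verify. A plain-Python stand-in for that ordering (the project expresses it as Airflow DAGs; this is not Airflow code):

```python
from datetime import time

# The three daily DAGs and their UTC start times, as listed above.
SCHEDULE = [
    (time(6, 0),  "daily_scrape"),
    (time(7, 30), "etl_pipeline"),
    (time(8, 0),  "data_quality"),
]

# Sorting by start time recovers the intended execution order.
order = [name for _, name in sorted(SCHEDULE)]
print(order)  # ['daily_scrape', 'etl_pipeline', 'data_quality']
```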

📈 Dashboard Pages

  1. Overview - KPIs, product distribution, price trends
  2. Products - Search and browse products
  3. Price Trends - Historical price analysis, discounts
  4. Platform Comparison - Cross-platform analytics

🧪 Testing

# Run all tests
make test

# Run with coverage
pytest --cov=scrapers --cov=processors tests/
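For a sense of the shape these tests take: a bare-assert unit test for a price-cleaning helper. Both `clean_price` and the test are hypothetical examples, not code from `processors/` or `tests/`:

```python
# Hypothetical processor helper and the unit test that would cover it.
def clean_price(raw: str) -> float:
    """Strip currency symbols, thousands separators, and whitespace."""
    return float(raw.replace("$", "").replace("RM", "").replace(",", "").strip())

def test_clean_price():
    assert clean_price("$1,299.00") == 1299.0
    assert clean_price("RM 19.90") == 19.9

test_clean_price()
print("ok")
```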

πŸ“ Make Commands

make setup          # Create virtual environment
make install        # Install dependencies
make db-init        # Initialize database
make db-seed        # Seed sample data
make scrape-all     # Run all scrapers
make process        # Run data processing
make dashboard      # Start Streamlit
make docker-up      # Start Docker stack
make docker-down    # Stop Docker stack
make test           # Run tests
make clean          # Clean cache files

πŸ” Environment Variables

# Database
DATABASE_URL=postgresql://user:pass@host:5432/db
POSTGRES_USER=sales_user
POSTGRES_PASSWORD=sales_password

# Airflow
AIRFLOW_USER=admin
AIRFLOW_PASSWORD=admin

# Scraper
SCRAPER_USER_AGENT=Mozilla/5.0...
SCRAPER_DELAY=2
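Variables in this format can be loaded without extra dependencies. A minimal sketch, assuming simple `KEY=value` lines as above (the stack may well use python-dotenv or docker-compose's own `.env` handling instead):

```python
# Minimal .env-style loader: skips blanks, comments, and malformed lines.
def load_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "# Scraper\nSCRAPER_DELAY=2\nSCRAPER_USER_AGENT=Mozilla/5.0..."
cfg = load_env(sample)
print(cfg["SCRAPER_DELAY"])  # 2
```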

🚧 Roadmap

  • Add more e-commerce platforms (Lazada, Alibaba)
  • Implement sentiment analysis on reviews
  • Add price alert notifications
  • Deploy to cloud (AWS/GCP)
  • Add real-time streaming with Kafka

📄 License

MIT License - feel free to use for your portfolio!


Built with ❀️ for data engineering portfolios
