# Omnichannel Sales Engine
A complete data engineering portfolio project demonstrating ETL pipelines, dimensional modeling, orchestration, and analytics visualization.
## Project Overview
This project scrapes product data from multiple e-commerce platforms (Amazon, eBay, Shopee), transforms it into a star schema data warehouse, and visualizes insights through an interactive dashboard.
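At its core, the ingestion step is just fetching a product page and pulling structured fields out of the HTML. A minimal sketch with BeautifulSoup (the CSS selectors and field names here are invented for illustration, not taken from `scrapers/`):

```python
# Hypothetical sketch of the scraping stage: parse a product card out of
# HTML. Selectors and field names are illustrative assumptions.
from bs4 import BeautifulSoup

def parse_product_card(html: str) -> dict:
    """Extract name and price from a single product card."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.select_one(".product-name").get_text(strip=True)
    raw_price = soup.select_one(".product-price").get_text(strip=True)
    # Prices arrive as strings like "$199.99"; normalize to float.
    price = float(raw_price.lstrip("$").replace(",", ""))
    return {"name": name, "price": price}

card = (
    '<div><span class="product-name">Sofa</span>'
    '<span class="product-price">$199.99</span></div>'
)
print(parse_product_card(card))  # {'name': 'Sofa', 'price': 199.99}
```

Each platform scraper would emit dicts like this into `data/raw/` as CSV/JSON before any cleaning happens.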
### Skills Demonstrated
| Area | Technologies |
|---|---|
| Data Ingestion | Web scraping with BeautifulSoup, API integration |
| Data Modeling | Star schema (fact/dimension tables), PostgreSQL |
| ETL Pipeline | Python, SQLAlchemy, data validation |
| Orchestration | Apache Airflow DAGs |
| Containerization | Docker, Docker Compose |
| Visualization | Streamlit, Plotly |
## Architecture

```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   E-commerce     │     │     Scrapers     │     │   Raw Storage    │
│   Platforms      │────▶│     (Python)     │────▶│   (CSV/JSON)     │
│  Amazon/eBay/... │     │                  │     │    data/raw/     │
└──────────────────┘     └──────────────────┘     └────────┬─────────┘
                                                           │
┌──────────────────┐                                       ▼
│     Airflow      │     ┌──────────────────┐
│  Orchestration   │────▶│    Processors    │
│                  │     │  Validate/Clean  │
└──────────────────┘     └────────┬─────────┘
                                  │
┌──────────────────┐     ┌──────────────────┐              ▼
│    Streamlit     │◀────│    PostgreSQL    │◀────┌──────────────────┐
│    Dashboard     │     │   Star Schema    │     │   Transformer    │
│                  │     │                  │     │   Staging → DW   │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
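The Validate/Clean stage in the diagram can be sketched roughly like this (field names and rules are assumptions for illustration; the real logic lives in `processors/`):

```python
# Illustrative sketch of the Validate/Clean stage: drop rows missing
# required fields and coerce price to a non-negative float.
# Field names are assumptions, not the repo's actual schema.

REQUIRED = ("name", "platform", "price")

def clean_records(raw_rows):
    """Yield only rows that pass basic validation, with price normalized."""
    for row in raw_rows:
        if any(not row.get(field) for field in REQUIRED):
            continue  # reject incomplete rows
        try:
            price = float(row["price"])
        except (TypeError, ValueError):
            continue  # reject unparseable prices
        if price < 0:
            continue  # reject negative prices
        yield {**row, "price": price}

rows = [
    {"name": "Sofa", "platform": "ebay", "price": "199.99"},
    {"name": "", "platform": "ebay", "price": "10"},        # missing name
    {"name": "Lamp", "platform": "amazon", "price": "n/a"}, # bad price
]
print(list(clean_records(rows)))  # only the Sofa row survives
```

Rejected rows would typically be logged rather than silently dropped, so data-quality checks can count them.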
## Directory Structure

```
omnichannel-sales-engine/
├── airflow/dags/        # Airflow DAG definitions
├── config/              # Configuration and settings
├── dashboard/           # Streamlit app (multi-page)
├── database/            # SQLAlchemy models and migrations
├── processors/          # Data validation, transformation, loading
├── scrapers/            # Platform-specific scrapers
├── sql/schema/          # Raw SQL table definitions
├── tests/               # Unit and integration tests
├── docker-compose.yml   # Full stack configuration
└── Makefile             # Common commands
```
## Quick Start

### Prerequisites
- Docker and Docker Compose
- Python 3.10+ (for local development)
### 1. Clone and Setup

```bash
git clone https://github.com/syfqfrhnn/omnichannel-sales-engine.git
cd omnichannel-sales-engine
cp .env.example .env  # Edit with your settings
```

### 2. Start with Docker

```bash
# Start all services
docker-compose up -d

# Wait for services to be healthy
docker-compose ps

# View logs
docker-compose logs -f
```

### 3. Access Services
| Service | URL | Credentials |
|---|---|---|
| Airflow | http://localhost:8080 | admin / admin |
| Dashboard | http://localhost:8501 | - |
| PostgreSQL | localhost:5432 | sales_user / sales_password |
### 4. Initialize Database with Sample Data

```bash
# Enter the dashboard container
docker-compose exec dashboard bash

# Initialize tables and seed sample data
python database/init_db.py
python database/seed_data.py
```

## Local Development
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r config/requirements.txt

# Set environment variables
export DATABASE_URL=postgresql://sales_user:sales_password@localhost:5432/sales_analytics

# Run the dashboard locally
streamlit run dashboard/app.py
```

## Data Warehouse Schema
### Star Schema Design
**Dimension Tables:**

- `dim_product` - Product attributes (name, brand, URL)
- `dim_platform` - E-commerce platforms
- `dim_category` - Product categories (hierarchical)
- `dim_date` - Date dimension for time analysis

**Fact Tables:**

- `fact_price_history` - Daily price snapshots
- `fact_sales` - Sales transactions
- `fact_reviews_daily` - Aggregated review metrics

**Staging Tables:**

- `stg_products` - Raw scraped product data
- `stg_reviews` - Raw scraped reviews
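Two of these tables might be declared with SQLAlchemy (which the project already uses) roughly as follows. Column names here are illustrative guesses; the real definitions live in `database/` and `sql/schema/`:

```python
# Minimal SQLAlchemy sketch of a dimension and a fact table.
# Column names are assumptions, not the repo's actual schema.
from sqlalchemy import (
    Column, Date, ForeignKey, Integer, Numeric, String, create_engine,
)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class DimProduct(Base):
    __tablename__ = "dim_product"
    product_key = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    brand = Column(String)
    url = Column(String)

class FactPriceHistory(Base):
    __tablename__ = "fact_price_history"
    id = Column(Integer, primary_key=True)
    product_key = Column(Integer, ForeignKey("dim_product.product_key"))
    snapshot_date = Column(Date, nullable=False)
    price = Column(Numeric(10, 2), nullable=False)

# Sanity check: create the tables against an in-memory SQLite engine.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
print(sorted(Base.metadata.tables))  # ['dim_product', 'fact_price_history']
```

The fact table carries only keys, a date, and measures; all descriptive attributes stay on the dimensions, which is what keeps star-schema queries simple.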
## Airflow DAGs

| DAG | Schedule | Description |
|---|---|---|
| `daily_scrape` | 6:00 AM UTC | Scrape products from all platforms |
| `etl_pipeline` | 7:30 AM UTC | Transform staging → warehouse |
| `data_quality` | 8:00 AM UTC | Run quality checks |
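A `daily_scrape` DAG matching the schedule above might look something like this sketch. The task id and callable are placeholders; the actual definitions live in `airflow/dags/`:

```python
# Hedged sketch of a daily scrape DAG (Airflow 2.x style).
# Task names and the callable body are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def scrape_all_platforms():
    ...  # would invoke the platform scrapers in scrapers/

with DAG(
    dag_id="daily_scrape",
    start_date=datetime(2025, 1, 1),
    schedule="0 6 * * *",  # 6:00 AM UTC
    catchup=False,
) as dag:
    PythonOperator(
        task_id="scrape_all",
        python_callable=scrape_all_platforms,
    )
```

The downstream `etl_pipeline` and `data_quality` DAGs run on their own cron schedules rather than as tasks in this DAG, which keeps each stage independently retryable.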
## Dashboard Pages
- Overview - KPIs, product distribution, price trends
- Products - Search and browse products
- Price Trends - Historical price analysis, discounts
- Platform Comparison - Cross-platform analytics
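The kind of query behind the Price Trends page (latest price vs. the period high) can be demonstrated against an in-memory SQLite stand-in for `fact_price_history`; the real dashboard would run the equivalent against PostgreSQL, and the numbers below are made up:

```python
# Illustrative discount computation over a price-history fact table.
# SQLite stands in for PostgreSQL purely for a runnable example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_price_history "
    "(product_key INT, snapshot_date TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO fact_price_history VALUES (?, ?, ?)",
    [(1, "2025-01-01", 250.0), (1, "2025-01-15", 200.0), (1, "2025-01-30", 175.0)],
)

high, latest = conn.execute("""
    SELECT MAX(price),
           (SELECT price FROM fact_price_history
            ORDER BY snapshot_date DESC LIMIT 1)
    FROM fact_price_history
    WHERE product_key = 1
""").fetchone()

discount_pct = round(100 * (high - latest) / high, 1)
print(discount_pct)  # 30.0 — latest price is 30% below the period high
```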
## Testing

```bash
# Run all tests
make test

# Run with coverage
pytest --cov=scrapers --cov=processors tests/
```

## Make Commands
```bash
make setup        # Create virtual environment
make install      # Install dependencies
make db-init      # Initialize database
make db-seed      # Seed sample data
make scrape-all   # Run all scrapers
make process      # Run data processing
make dashboard    # Start Streamlit
make docker-up    # Start Docker stack
make docker-down  # Stop Docker stack
make test         # Run tests
make clean        # Clean cache files
```

## Environment Variables
```bash
# Database
DATABASE_URL=postgresql://user:pass@host:5432/db
POSTGRES_USER=sales_user
POSTGRES_PASSWORD=sales_password

# Airflow
AIRFLOW_USER=admin
AIRFLOW_PASSWORD=admin

# Scraper
SCRAPER_USER_AGENT=Mozilla/5.0...
SCRAPER_DELAY=2
```

## Roadmap
- Add more e-commerce platforms (Lazada, Alibaba)
- Implement sentiment analysis on reviews
- Add price alert notifications
- Deploy to cloud (AWS/GCP)
- Add real-time streaming with Kafka
## License
MIT License - feel free to use for your portfolio!
Built with ❤️ for data engineering portfolios
Created December 29, 2025 · Updated January 2, 2026