
Evaluating Naive Baselines and Simple Seasonal Models for Cloud Capacity Planning. A reproducible analytics engineering project using dbt, DuckDB, and Python to aggregate global VM demand and benchmark rolling-average heuristics against interpretable seasonal time series models using Snowflake’s Shaved Ice dataset for ICPE 2026.

ICPE 2026 Data Challenge - Shaved Ice Dataset Analysis

Deadline: January 28, 2026
Status: Submitted 🎉

Weekly Seasonality in Cloud Demand: Lessons from Snowflake's Shaved Ice Dataset

> **ICPE 2026 Data Challenge Submission** | Under Peer Review

The Problem

Cloud providers offer steep discounts for long-term capacity commitments — but under-forecasting demand is 2.1x more expensive than over-forecasting. Most capacity planning teams use rolling averages as their baseline forecast. We asked a simple question: do those rolling averages actually capture what's happening in the data?

They don't. Snowflake's VM demand drops ~25% every weekend like clockwork — a pattern rolling averages structurally cannot capture.
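To see why, here is a minimal, self-contained sketch on synthetic data (not the Shaved Ice numbers) comparing a 7-day rolling mean with a seasonal-naive baseline on a series that dips every weekend:

```python
import numpy as np
import pandas as pd

# Synthetic demand: 100 units on weekdays, 25% lower on weekends
# (the weekend dip described above, not the real dataset).
idx = pd.date_range("2025-01-06", periods=56, freq="D")
demand = pd.Series(np.where(idx.dayofweek >= 5, 75.0, 100.0), index=idx)

# Baseline 1: 7-day rolling mean, shifted one day so it is a true forecast.
rolling = demand.rolling(7).mean().shift(1)

# Baseline 2: seasonal naive, i.e. repeat the value from 7 days ago.
seasonal_naive = demand.shift(7)

errors = pd.DataFrame({
    "rolling_mae": (demand - rolling).abs(),
    "seasonal_naive_mae": (demand - seasonal_naive).abs(),
}).dropna()
print(errors.mean())
# The rolling mean forecasts ~92.9 every day (a weekday/weekend blend),
# so it is wrong on weekdays AND weekends; seasonal naive is exact here.
```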

Overview

This project analyzes Snowflake's "Shaved Ice" VM demand dataset for the ICPE 2026 Data Challenge. It uses dbt + DuckDB for data transformations and Python for variance-aware forecasting analysis.

The core insight: In capacity planning, the outliers are the problem. A forecast that nails the mean but misses the variance causes downtime or wastes money.
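One way to encode that asymmetry is an asymmetric cost function. The sketch below uses the 2.1x under-forecast penalty quoted above; the function name and weights are illustrative, not the project's actual scoring code:

```python
import numpy as np

def capacity_cost(actual, forecast, under_penalty=2.1, over_penalty=1.0):
    """Illustrative asymmetric cost: a unit of shortfall (under-forecast)
    costs 2.1x as much as a unit of idle, over-provisioned capacity."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    shortfall = np.clip(actual - forecast, 0, None)  # demand left uncovered
    surplus = np.clip(forecast - actual, 0, None)    # capacity paid for, unused
    return float((under_penalty * shortfall + over_penalty * surplus).sum())

# A forecast that nails the mean can still lose on cost:
actual = np.array([100, 100, 100, 100, 100, 75, 75])  # one weekday/weekend week
flat = np.full(7, actual.mean())        # correct mean, wrong shape
seasonal = actual.astype(float)         # correct shape
print(capacity_cost(actual, flat))      # ~110.7
print(capacity_cost(actual, seasonal))  # 0.0
```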

Tech Stack

| Layer | Technology |
|---|---|
| Containerization | Docker + Docker Compose |
| Data Engineering | dbt-duckdb (SQL transformations) |
| Database | DuckDB (local analytical database) |
| Analysis | Python 3.13 (pandas, statsmodels) |
| Visualization | matplotlib, seaborn |
| Notebooks | Jupyter |

🚀 Quick Start

Option A: Docker

```bash
# 1. Clone the repository
git clone https://github.com/hpena212/snowflake-shaved-ice.git "Shaved Ice"
cd "Shaved Ice"

# 2. Clone the dataset
mkdir -p data/raw && cd data/raw
git clone https://github.com/Snowflake-Labs/shavedice-dataset.git
cd ../..

# 3. Build and start Docker
docker-compose build
docker-compose up -d

# 4. Run dbt to build the database
docker exec shaved-ice-project dbt run --profiles-dir .

# 5. Open Jupyter
# Navigate to: http://localhost:8888
```

Option B: Virtual Environment (Windows PowerShell)

```powershell
# 1. Clone the repository
git clone https://github.com/hpena212/snowflake-shaved-ice.git "Shaved Ice"
cd "Shaved Ice"

# 2. Create virtual environment (Python 3.10-3.13)
py -3.13 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

# 3. Clone the dataset
mkdir data\raw -Force; cd data\raw
git clone https://github.com/Snowflake-Labs/shavedice-dataset.git
cd ..\..

# 4. Run dbt
mkdir data\processed -Force
dbt run --profiles-dir .

# 5. Launch Jupyter
jupyter notebook
```

📁 Project Structure

```
Shaved Ice/
├── data/
│   ├── raw/shavedice-dataset/       # Downloaded dataset
│   └── processed/shaved_ice.duckdb  # Built by dbt
├── notebooks/
│   └── 01_data_exploration.ipynb
├── sql/models/                  # dbt SQL models
│   ├── staging/                 # Raw → clean columns
│   ├── intermediate/            # Daily aggregations
│   └── marts/                   # Analysis-ready tables
├── src/                         # Python modules
│   ├── data_loader.py
│   ├── duckdb_loader.py
│   ├── forecasting.py
│   ├── plotting.py
│   └── utils.py
├── Dockerfile                   # Container definition
├── docker-compose.yml           # Container orchestration
├── dbt_project.yml              # dbt configuration
├── profiles.yml                 # DuckDB connection
├── requirements.txt             # Python dependencies
└── verify_setup.py              # Setup verification script
```
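For reference, a setup check like verify_setup.py can be as simple as confirming the expected artifacts exist. This is a hypothetical sketch using the paths above, not the script's actual contents:

```python
import sys
from pathlib import Path

# Paths from the project structure above (adjust if your layout differs).
CHECKS = {
    "dataset cloned": Path("data/raw/shavedice-dataset"),
    "dbt database built": Path("data/processed/shaved_ice.duckdb"),
}

def main() -> int:
    failures = 0
    for label, path in CHECKS.items():
        ok = path.exists()
        print(f"[{'OK' if ok else 'MISSING'}] {label}: {path}")
        failures += not ok
    return failures  # non-zero exit code if anything is missing

if __name__ == "__main__":
    sys.exit(main())
```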

🔧 Commands Reference

Docker Commands

| Task | Command |
|---|---|
| Start container | `docker-compose up -d` |
| Stop container | `docker-compose down` |
| Run dbt | `docker exec shaved-ice-project dbt run --profiles-dir .` |
| Open shell | `docker exec -it shaved-ice-project bash` |
| View logs | `docker-compose logs -f` |
| Rebuild | `docker-compose build --no-cache` |

dbt Commands

| Task | Command |
|---|---|
| Run all models | `dbt run --profiles-dir .` |
| Run specific model | `dbt run --select mart_forecast_input --profiles-dir .` |
| Test data quality | `dbt test --profiles-dir .` |
| Generate docs | `dbt docs generate --profiles-dir .` |

📊 How It Works

1. `dbt run` → Transforms raw parquet data into clean DuckDB tables
2. Python → Loads tables for forecasting analysis
3. Jupyter → Interactive exploration and visualization

```python
# Load dbt output in your Python scripts
from src.duckdb_loader import load_mart_data

df = load_mart_data()  # Returns a clean pandas DataFrame
```
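Internally, such a loader only needs a few lines of the DuckDB Python API. A sketch under two assumptions — the mart table is `mart_forecast_input` (the model name used in the dbt commands above) and the database sits at `data/processed/shaved_ice.duckdb` (per the project structure); the real `src/duckdb_loader.py` may differ:

```python
from pathlib import Path

import duckdb
import pandas as pd

DB_PATH = Path("data/processed/shaved_ice.duckdb")  # built by `dbt run`

def load_mart_data(table: str = "mart_forecast_input") -> pd.DataFrame:
    """Read one dbt mart table from the local DuckDB file into pandas."""
    con = duckdb.connect(str(DB_PATH), read_only=True)
    try:
        return con.execute(f"SELECT * FROM {table}").df()
    finally:
        con.close()
```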

🎯 Research Focus

Core Question: Do simple seasonal time-series models provide materially better demand forecasts than rolling-average baselines for aggregated cloud infrastructure?
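To make "simple seasonal model" concrete, here is a sketch of the kind of comparison the question implies, using a statsmodels Holt-Winters model with a 7-day period on synthetic data (the project's actual models live in `src/forecasting.py` and may differ):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic daily demand with a weekly dip (a stand-in for the real series).
idx = pd.date_range("2025-01-06", periods=84, freq="D")
rng = np.random.default_rng(0)
y = pd.Series(
    np.where(idx.dayofweek >= 5, 75.0, 100.0) + rng.normal(0, 2, len(idx)),
    index=idx,
)
train, test = y.iloc[:-14], y.iloc[-14:]

# Rolling-average baseline: mean of the last 7 days, held flat over the horizon.
baseline = pd.Series(train.iloc[-7:].mean(), index=test.index)

# Simple, interpretable seasonal model: additive Holt-Winters, 7-day period.
fit = ExponentialSmoothing(
    train, trend=None, seasonal="add", seasonal_periods=7
).fit()
seasonal_fc = fit.forecast(14)

def mae(fc):
    return (test - fc).abs().mean()

print(f"rolling-average baseline MAE: {mae(baseline):.2f}")
print(f"seasonal model MAE:           {mae(seasonal_fc):.2f}")
```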


✅ Progress

- Project setup with Docker support
- dbt pipeline configured (4 models)
- Dataset integration
- Multi-device workflow documented
- Variance analysis
- Forecasting models
- Paper visualizations
- ICPE 2026 submission

License

This project was developed for the ICPE 2026 Data Challenge submission.