danijeun/foresee-app
AI-powered AutoML platform that transforms CSV files into ML insights in 60 seconds. Upload data → Get AI-ranked target recommendations (Gemini 2.5) → Auto-train 3 models (LR, DT, XGBoost) → Download professional PDF reports with visualizations. Built with React, Flask, Snowflake & Google Gemini AI.
ForeSee - AI-Powered AutoML Platform
Automated Machine Learning Analysis & Reporting with Google Gemini AI
ForeSee is an intelligent web application that transforms raw data into actionable ML insights. Upload a CSV file and get professional ML analysis reports with AI-powered target variable recommendations, automated model training, and comprehensive PDF reports—all in minutes.
🎯 What Does ForeSee Do?
From CSV to ML Insights in 5 Simple Steps:
- Upload your dataset (CSV format)
- AI Analysis - Google Gemini automatically suggests the best target variables to predict
- Select your prediction target from ranked recommendations
- Auto-Train - The system automatically trains 3 ML models (Logistic Regression, Decision Tree, XGBoost)
- Download a comprehensive PDF report with insights, metrics, and recommendations
✨ Key Features
🤖 AI-Powered Target Selection (Google Gemini 2.5)
- Uses Google Gemini 2.5 Flash (`gemini-2.5-flash-preview-05-20`) to intelligently analyze your dataset
- Recommends the top 5 most valuable prediction targets with importance scores (1-100)
- Distinguishes between target variables (outcomes) and features (predictors)
- Provides detailed business rationale, predictability assessment, and suggested features
- Runs in parallel with EDA for faster results
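As an illustration of what the ranked output can look like, here is a small sketch that parses and sorts LLM-suggested targets by importance score. The field names are assumptions for illustration, not the agent's actual schema:

```python
import json

def rank_target_suggestions(llm_response_text, top_n=5):
    """Parse the JSON a Gemini prompt might return and rank targets by
    importance score, descending. Field names here are illustrative."""
    suggestions = json.loads(llm_response_text)
    ranked = sorted(suggestions, key=lambda s: s["importance_score"], reverse=True)
    return ranked[:top_n]

# Hypothetical LLM response for a churn dataset
raw = json.dumps([
    {"column": "monthly_spend", "importance_score": 80, "problem_type": "regression"},
    {"column": "churned", "importance_score": 95, "problem_type": "classification"},
    {"column": "signup_month", "importance_score": 40, "problem_type": "classification"},
])
top = rank_target_suggestions(raw)
print(top[0]["column"])  # highest-ranked target: churned
```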
📊 Automatic Exploratory Data Analysis (EDA)
- Snowflake-based comprehensive statistical analysis
- Analyzes all column types: numeric, categorical, datetime, text
- Detects data types, missing values, duplicates, and cardinality
- Calculates metrics: mean, std, quartiles, skewness, kurtosis, top values
- Stores all results in Snowflake for querying and persistence
- Parallel execution with Target Analysis for optimal performance
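The per-column metrics above can be sketched in plain Python; in the app itself they are computed in Snowflake SQL, so this is only an illustration of what gets stored:

```python
import statistics

def numeric_column_stats(values):
    """Illustrative per-column metrics mirroring what the EDA agent stores
    (mean, std, quartiles, completeness)."""
    clean = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(clean, n=4)  # quartile cut points
    return {
        "count": len(values),
        "null_count": len(values) - len(clean),
        "completeness": len(clean) / len(values),
        "mean": statistics.fmean(clean),
        "std": statistics.stdev(clean),
        "min": min(clean),
        "max": max(clean),
        "q1": q1, "q2": q2, "q3": q3,
    }

stats = numeric_column_stats([10, 12, None, 14, 16, 18])
print(stats["mean"], stats["null_count"])  # 14.0 1
```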
🚀 Multi-Model Machine Learning
Trains 3 models sequentially after target selection:
| Model | Description | Key Metrics |
|---|---|---|
| Logistic Regression | Fast, interpretable baseline | Accuracy, Precision, Recall, F1, ROC-AUC |
| Decision Tree | Non-linear pattern detection | Tree depth, leaves, feature importance |
| XGBoost | State-of-the-art gradient boosting | N-estimators, max depth, learning rate |
Each model provides:
- Performance metrics (train & test)
- Confusion matrices
- Feature importance rankings
- Model-specific recommendations
- Data quality assessments
📄 Professional PDF Reports (AI-Generated)
- Natural language insights generated by Google Gemini
- Executive summary with best-performing model
- Data quality and EDA insights
- Model performance comparisons
- Feature importance analysis
- Actionable recommendations
- Professional charts and visualizations
❄️ Snowflake Data Platform
- Isolated workflow schemas - Each upload creates a `WORKFLOW_<UUID>` schema
- Scalable data warehouse for enterprise datasets
- SQL-based data processing and storage
- Persistent storage for all EDA and ML results
- Clean separation between workflows
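A minimal sketch of how a per-upload schema name might be derived; the actual naming logic lives in the app's workflow manager:

```python
import uuid

def new_workflow_schema():
    """Derive an isolated per-upload schema name (illustrative).
    Unquoted Snowflake identifiers are case-insensitive, so uppercase the
    UUID and drop hyphens to keep the name a plain identifier."""
    return f"WORKFLOW_{uuid.uuid4().hex.upper()}"

schema = new_workflow_schema()
print(schema)  # e.g. WORKFLOW_3F2A...
```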
⚡ Modern Web Interface (React + Tailwind)
- Drag-and-drop file upload
- Real-time progress tracking with rotating status messages
- Interactive podium display for top 3 target recommendations
- Responsive design for all devices
- Smooth animations with AOS (Animate On Scroll)
- In-browser PDF viewing and download
🏗️ Architecture
┌─────────────────────────────────────────────────────────────────┐
│ FRONTEND (React 19 + Vite) │
│ │
│ • Drag & drop file upload • Podium target display │
│ • Real-time progress tracking • PDF viewer │
│ • Target variable selection • Responsive UI │
└─────────────────────┬───────────────────────────────────────────┘
│ REST API (CORS enabled)
│
┌─────────────────────▼───────────────────────────────────────────┐
│ BACKEND (Flask 3.0 API) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ MULTI-AGENT SYSTEM │ │
│ │ │ │
│ │ 1️⃣ EDA Agent (Parallel) │ │
│ │ → Analyze dataset │ │
│ │ → Store stats in Snowflake │ │
│ │ │ │
│ │ 2️⃣ Target Variable Agent (Parallel) - Gemini 2.5 │ │
│ │ → Sample data │ │
│ │ → LLM analysis │ │
│ │ → Rank top 5 targets (importance scores) │ │
│ │ │ │
│ │ 3️⃣ ML Training Agents (Sequential after target select) │ │
│ │ → Logistic Regression Agent │ │
│ │ → Decision Tree Agent │ │
│ │ → XGBoost Agent │ │
│ │ │ │
│ │ 4️⃣ Natural Language Agent - Gemini 2.5 │ │
│ │ → Collect EDA & ML results │ │
│ │ → Generate insights (LLM) │ │
│ │ → Create PDF report (ReportLab) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ Services: │
│ • Workflow Manager • Snowflake Ingestion │
│ • EDA Service • Config Management │
└─────────────────────┬───────────────────────────────────────────┘
│ Snowflake Connector
│
┌─────────────────────▼───────────────────────────────────────────┐
│ SNOWFLAKE DATA PLATFORM │
│ │
│ Isolated Schemas: WORKFLOW_<UUID> │
│ │
│ Tables per Workflow: │
│ • WORKFLOW_METADATA → Workflow info │
│ • WORKFLOW_EDA_SUMMARY → EDA results │
│ • COLUMN_STATS → Column metrics │
│ • LOGISTIC_REGRESSION_SUMMARY → LR model results │
│ • DECISION_TREE_SUMMARY → DT model results │
│ • XGBOOST_SUMMARY → XGB model results │
│ • RAW_DATA_TABLE → Original CSV data │
└─────────────────────────────────────────────────────────────────┘
🔄 Application Workflow
Phase 1: Upload & Parallel Analysis (45-60s)
User uploads CSV
│
├─→ Store in Snowflake (temp file → Snowflake table)
│
├─→ 🧵 Thread 1: EDA Agent
│ └─→ Analyze all columns
│ └─→ Save to WORKFLOW_EDA_SUMMARY
│
└─→ 🧵 Thread 2: Target Variable Agent (Gemini 2.5)
└─→ Sample 100 rows
└─→ LLM analysis
└─→ Return top 5 targets (ranked)
Time Saved: ~40% faster than sequential execution
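The two-thread phase above can be sketched with `ThreadPoolExecutor`, which the Performance Optimizations section notes the backend uses; the agent bodies here are placeholders for the real Snowflake and Gemini calls:

```python
from concurrent.futures import ThreadPoolExecutor

def run_eda(table):
    # Placeholder for the EDA agent's Snowflake analysis.
    return f"eda:{table}"

def suggest_targets(table):
    # Placeholder for the Gemini-backed target variable agent.
    return f"targets:{table}"

def parallel_analysis(table):
    """Run both phase-1 agents concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        eda_future = pool.submit(run_eda, table)
        target_future = pool.submit(suggest_targets, table)
        return eda_future.result(), target_future.result()

eda, targets = parallel_analysis("RAW_DATA_TABLE")
print(eda, targets)
```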
Phase 2: Target Selection (User interaction)
Frontend displays 3 recommendations in podium:
🥇 Gold (Rank 1 - Highest importance)
🥈 Silver (Rank 2)
🥉 Bronze (Rank 3)
+ "Other Options" button → Shows all 5 recommendations
Each recommendation includes:
• Importance Score (1-100)
• Problem Type (regression/classification)
• Why Important (business value)
• Predictability (HIGH/MEDIUM/LOW)
• Suggested Features (top predictors)
User selects target → Saved to workflow_metadata
Phase 3: Sequential ML Training (10-15s)
Automatic training after target selection:
1. Logistic Regression Agent
├─→ Feature engineering
├─→ Train/test split (80/20)
├─→ Model training (max_iter=1000)
├─→ Performance evaluation
└─→ Save to LOGISTIC_REGRESSION_SUMMARY
2. Decision Tree Agent
├─→ Feature engineering
├─→ Train/test split (80/20)
├─→ Model training (max_depth=10)
├─→ Performance evaluation
└─→ Save to DECISION_TREE_SUMMARY
3. XGBoost Agent
├─→ Feature engineering
├─→ Train/test split (80/20)
├─→ Model training (n_estimators=100, max_depth=6)
├─→ Performance evaluation
└─→ Save to XGBOOST_SUMMARY
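The sequential phase can be sketched with the stack's scikit-learn models; the XGBoost step is analogous and omitted here. The synthetic dataset is illustrative, while the 80/20 split, `max_iter=1000`, and `max_depth=10` follow the phase description above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for the uploaded table
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80/20 split

results = {}
for name, model in [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("decision_tree", DecisionTreeClassifier(max_depth=10)),
]:
    model.fit(X_train, y_train)          # train sequentially, as described
    results[name] = accuracy_score(y_test, model.predict(X_test))

print(results)
```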
Phase 4: Report Generation (5-10s)
Natural Language Agent (Gemini 2.5)
│
├─→ Collect EDA insights from Snowflake
├─→ Collect ML results (all 3 models)
├─→ Generate narrative with Gemini LLM
├─→ Create visualizations (Matplotlib)
├─→ Generate PDF (ReportLab)
└─→ Save to backend/pdf/
Total Time: ~60-85 seconds from upload to PDF
🛠️ Technology Stack
Frontend
| Library | Version | Purpose |
|---|---|---|
| React | 19.1.1 | UI framework |
| React Router | 7.9.3 | Navigation |
| Vite | 7.1.7 | Build tool & dev server |
| Tailwind CSS | 4.1.14 | Styling framework |
| AOS | 2.3.4 | Scroll animations |
| ESLint | 9.36.0 | Code linting |
Backend
| Library | Version | Purpose |
|---|---|---|
| Flask | 3.0.0 | REST API framework |
| Flask-CORS | 4.0.0 | Cross-origin support |
| python-dotenv | 1.0.1 | Environment config |
AI & Machine Learning
| Library | Version | Purpose |
|---|---|---|
| google-generativeai | 0.8.3 | Google Gemini 2.5 API |
| scikit-learn | 1.5.0 | Logistic Regression, Decision Tree |
| XGBoost | 2.1.0 | Gradient boosting |
| SHAP | 0.44.0 | Model explainability |
| pandas | 2.2.0 | Data manipulation |
| NumPy | 1.26.0 | Numerical operations |
Data Platform
| Library | Version | Purpose |
|---|---|---|
| snowflake-connector-python | 3.12.0 | Snowflake connectivity |
| snowflake-snowpark-python | 1.39.1 | Snowpark DataFrame API |
Reporting & Visualization
| Library | Version | Purpose |
|---|---|---|
| ReportLab | 4.0.7 | PDF generation |
| Matplotlib | 3.8.0 | Charts & visualizations |
📋 Project Structure
foresee-app/
├── frontend/ # React Application
│ ├── src/
│ │ ├── pages/
│ │ │ ├── Home.jsx # Landing page
│ │ │ ├── Foresee.jsx # Main app (upload, analysis, results)
│ │ │ ├── AboutUs.jsx # Team information
│ │ │ └── Help.jsx # User guide
│ │ ├── components/
│ │ │ ├── TopBanner.jsx # Navigation header
│ │ │ └── Footer.jsx # Footer
│ │ ├── App.jsx # Main component & routing
│ │ └── main.jsx # Entry point
│ ├── package.json # Frontend dependencies
│ └── vite.config.js # Vite configuration
│
├── backend/ # Flask API + ML Agents
│ ├── app.py # Main Flask API (1949 lines)
│ │
│ ├── agents/ # AI/ML Agents
│ │ ├── eda_agent/ # EDA Agent (Snowflake-based)
│ │ │ ├── agent.py # Main EDA orchestration
│ │ │ ├── config.py # EDA configuration
│ │ │ ├── database/
│ │ │ │ ├── connection.py # Snowflake connection
│ │ │ │ ├── schema.py # Schema management
│ │ │ │ └── storage.py # Results storage
│ │ │ ├── metrics/ # Metric calculators
│ │ │ │ ├── basic_metrics.py # Basic stats
│ │ │ │ ├── numeric_metrics.py # Numeric stats
│ │ │ │ ├── categorical_metrics.py
│ │ │ │ ├── datetime_metrics.py
│ │ │ │ ├── text_metrics.py
│ │ │ │ └── target_metrics.py
│ │ │ └── utils/
│ │ │ ├── helpers.py
│ │ │ ├── logger.py
│ │ │ └── validators.py
│ │ │
│ │ ├── target_variable_agent.py # Gemini-powered target suggestions
│ │ ├── logistic_regression_agent.py
│ │ ├── decision_tree_agent.py
│ │ ├── xgboost_agent.py
│ │ └── natural_language_agent.py # Gemini-powered PDF generation
│ │
│ ├── services/
│ │ ├── workflow_manager.py # Workflow & schema management
│ │ ├── snowflake_ingestion.py # CSV → Snowflake
│ │ ├── eda_service.py # EDA orchestration
│ │ └── config.py # Configuration loader
│ │
│ ├── insights/ # Generated JSON insights
│ └── pdf/ # Generated PDF reports
│
├── requirements.txt # Python dependencies
├── .env # Environment variables (not tracked)
├── .gitignore
├── start.bat # Windows startup script
├── start.sh # Linux/Mac startup script
└── README.md
🚀 Getting Started
Prerequisites
- Python 3.11+ (Download)
- Node.js 18+ (Download)
- Snowflake Account (Sign up)
- Google Gemini API Key (Get free key)
Installation
1. Clone the repository

```bash
git clone https://github.com/yourusername/foresee-app.git
cd foresee-app
```

2. Set up backend
```bash
# Create virtual environment
python -m venv myenv

# Activate virtual environment
# Windows:
myenv\Scripts\activate
# macOS/Linux:
source myenv/bin/activate

# Install Python dependencies
pip install -r requirements.txt
```

3. Configure environment variables

Create a .env file in the project root:
```env
# Snowflake Configuration
SNOWFLAKE_ACCOUNT=your_account_identifier
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_DATABASE=your_database
SNOWFLAKE_SCHEMA=PUBLIC
INGESTION_WAREHOUSE=your_warehouse

# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here
```

Get your Gemini API key:
- Visit https://aistudio.google.com/app/apikey
- Sign in with your Google account
- Click "Create API Key"
- Copy and paste the key into .env
4. Set up frontend

```bash
cd frontend
npm install
cd ..
```

🎬 Running the Application
Option 1: Quick Start (Recommended) ⭐
Windows:

```bash
start.bat
```

macOS/Linux:

```bash
chmod +x start.sh   # First time only
./start.sh
```

This automatically:
- Activates the Python virtual environment
- Starts the Flask backend (port 5000)
- Starts the Vite frontend (port 5173)

Option 2: Using npm

```bash
cd frontend
npm run dev:all
```

Uses `concurrently` to run both servers simultaneously.
Option 3: Manual (Two Terminals)
Terminal 1 - Backend:

```bash
# Activate virtual environment
myenv\Scripts\activate        # Windows
# or
source myenv/bin/activate     # macOS/Linux

# Start Flask server
cd backend
python app.py
```

Terminal 2 - Frontend:

```bash
cd frontend
npm run dev
```

Access the Application

- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
- Health Check: http://localhost:5000/api/health
📖 Usage Guide
1. Upload Dataset
- Navigate to "Foresee" in the top menu
- Drag & drop your CSV file or click "Choose File"
- Click "Upload & Analyze"
The system will:
- Upload data to Snowflake
- Run parallel EDA + Target Analysis (~45-60s)
- Display progress with rotating status messages
2. Select Target Variable
After analysis, you'll see:
Podium Display (Top 3):
- 🥇 Gold - Most important target (Rank 1)
- 🥈 Silver - Second best (Rank 2)
- 🥉 Bronze - Third option (Rank 3)
Click "Other Options" to see all 5 recommendations.
Each recommendation shows:
- Importance Score (1-100) - Quantitative ranking
- Problem Type - regression/classification
- Why Important - Business value explanation
- Predictability - HIGH/MEDIUM/LOW feasibility
- Suggested Features - Best predictor columns
3. Model Training (Automatic)
After selecting a target, the system automatically:
- Trains Logistic Regression model
- Trains Decision Tree model
- Trains XGBoost model
- Generates Natural Language Insights (Gemini)
- Creates PDF Report (ReportLab)
Total Time: ~15-25 seconds (training plus report generation, per the workflow phases above)
4. View/Download Report
When complete:
- Click "View Report" → Opens PDF in browser
- Click "Download Report" → Saves PDF to your computer
📊 What's in the PDF Report?
1. Executive Summary
- Dataset overview (rows, columns)
- Selected target variable
- Best-performing model
- Key findings
2. Data Quality Analysis
- Missing value analysis
- Duplicate detection
- Column type breakdown
- Data completeness metrics
3. Exploratory Data Analysis
- Numeric column statistics (mean, std, quartiles, skewness, kurtosis)
- Categorical distributions (top values, cardinality)
- Datetime patterns
- Text metrics
4. Model Performance Comparison (example values)
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.85 | 0.82 | 0.88 | 0.85 | 0.91 |
| Decision Tree | 0.83 | 0.80 | 0.87 | 0.83 | 0.89 |
| XGBoost | 0.88 | 0.86 | 0.90 | 0.88 | 0.94 |
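Selecting the "best-performing model" for the executive summary reduces to a comparison like the following; metric names and values here are illustrative, mirroring the example table above:

```python
def best_model(metrics, key="test_f1_score"):
    """Return the model name with the highest value for the given metric."""
    return max(metrics, key=lambda name: metrics[name][key])

metrics = {
    "Logistic Regression": {"test_f1_score": 0.85, "test_roc_auc": 0.91},
    "Decision Tree":       {"test_f1_score": 0.83, "test_roc_auc": 0.89},
    "XGBoost":             {"test_f1_score": 0.88, "test_roc_auc": 0.94},
}
print(best_model(metrics))  # XGBoost
```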
5. Feature Importance
- Top 10 most important features
- Feature importance scores
- Model-specific interpretations
6. Model-Specific Insights
- Confusion matrices
- Decision tree depth/leaves
- XGBoost hyperparameters
- Performance summaries
7. Recommendations
- Best model selection advice
- Data quality improvements
- Feature engineering suggestions
- Next steps for deployment
🎯 API Reference
Core Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check |
| POST | /api/upload | Upload CSV & run parallel analysis |
| GET | /api/workflows | List all workflows |
| DELETE | /api/workflow/<id> | Delete workflow & schema |
| POST | /api/query | Execute SQL query on workflow |
Target Variable Selection
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/target-suggestions/<workflow_id>/<table_name> | Get AI recommendations (Gemini) |
| POST | /api/workflow/<id>/select-target | Save target & auto-train models |

POST Body:

```json
{
  "target_variable": "column_name",
  "table_name": "table_name",
  "problem_type": "classification",
  "importance_score": 95
}
```

Manual Model Training (Optional)
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/workflow/<id>/train-logistic-regression | Train LR model |
| POST | /api/workflow/<id>/train-decision-tree | Train DT model |
| POST | /api/workflow/<id>/train-xgboost | Train XGBoost model |
Model Results
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/workflow/<id>/logistic-regression-results | Get LR results |
| GET | /api/workflow/<id>/decision-tree-results | Get DT results |
| GET | /api/workflow/<id>/xgboost-results | Get XGB results |
Report Generation
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/workflow/<id>/generate-insights | Generate insights & PDF (Gemini) |
| GET | /api/workflow/<id>/report/view | View PDF in browser |
| GET | /api/workflow/<id>/report/download | Download PDF |
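A client-side sketch of calling the select-target endpoint with the documented POST body. The URL and field values are placeholders, and the `select_target` call itself requires a running backend:

```python
import json
from urllib import request

API_BASE = "http://localhost:5000/api"  # adjust for your deployment

def build_select_target_payload(target, table, problem_type, score):
    """Assemble the documented body for /api/workflow/<id>/select-target."""
    return {
        "target_variable": target,
        "table_name": table,
        "problem_type": problem_type,
        "importance_score": score,
    }

def select_target(workflow_id, payload):
    """POST the payload to a running backend and return the JSON response."""
    req = request.Request(
        f"{API_BASE}/workflow/{workflow_id}/select-target",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_select_target_payload("churned", "RAW_DATA_TABLE", "classification", 95)
print(payload["problem_type"])  # classification
```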
🗄️ Snowflake Database Schema
Each workflow creates an isolated schema: WORKFLOW_<UUID>
Tables Created Per Workflow
1. WORKFLOW_METADATA
```sql
CREATE TABLE WORKFLOW_METADATA (
    key VARCHAR,
    value VARIANT,
    updated_at TIMESTAMP
);
```

Stores workflow-level metadata (selected target, configuration).
2. WORKFLOW_EDA_SUMMARY
```sql
CREATE TABLE WORKFLOW_EDA_SUMMARY (
    analysis_id VARCHAR PRIMARY KEY,
    table_name VARCHAR,
    total_rows INTEGER,
    total_columns INTEGER,
    duplicate_rows INTEGER,
    target_column VARCHAR,
    analysis_type VARCHAR,
    created_at TIMESTAMP
);
```

3. COLUMN_STATS
```sql
CREATE TABLE COLUMN_STATS (
    column_name VARCHAR,
    data_type VARCHAR,
    null_count INTEGER,
    unique_count INTEGER,
    completeness FLOAT,
    -- Numeric metrics
    mean FLOAT,
    std FLOAT,
    min FLOAT,
    max FLOAT,
    q1 FLOAT,
    q2 FLOAT,
    q3 FLOAT,
    skewness FLOAT,
    kurtosis FLOAT,
    -- Categorical metrics
    mode VARCHAR,
    top_values VARIANT,
    cardinality INTEGER,
    -- Text metrics
    avg_length FLOAT,
    max_length INTEGER,
    -- Datetime metrics
    date_range VARIANT
);
```

4. LOGISTIC_REGRESSION_SUMMARY
```sql
CREATE TABLE LOGISTIC_REGRESSION_SUMMARY (
    analysis_id VARCHAR PRIMARY KEY,
    table_name VARCHAR,
    target_variable VARCHAR,
    model_type VARCHAR,
    problem_type VARCHAR,
    test_accuracy FLOAT,
    test_precision FLOAT,
    test_recall FLOAT,
    test_f1_score FLOAT,
    test_roc_auc FLOAT,
    train_accuracy FLOAT,
    total_samples INTEGER,
    total_features INTEGER,
    n_classes INTEGER,
    confusion_matrix ARRAY,
    top_features ARRAY,
    performance_summary VARCHAR,
    recommendations VARCHAR,
    created_at TIMESTAMP
);
```

5. DECISION_TREE_SUMMARY
Same as Logistic Regression, plus these additional columns:

```sql
tree_depth INTEGER,
n_leaves INTEGER,
max_depth INTEGER,
min_samples_split INTEGER,
min_samples_leaf INTEGER
```

6. XGBOOST_SUMMARY
Same as Logistic Regression, plus these additional columns:

```sql
n_estimators INTEGER,
max_depth INTEGER,
learning_rate FLOAT,
subsample FLOAT,
colsample_bytree FLOAT
```

⚙️ Configuration
Backend Configuration (backend/services/config.py)
```python
import os
from dotenv import load_dotenv

load_dotenv()

class Config:
    SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT")
    SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER")
    SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD")
    SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE")
    SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA")
    INGESTION_WAREHOUSE = os.getenv("INGESTION_WAREHOUSE")
```

Frontend Configuration (frontend/src/pages/Foresee.jsx)
```js
const API_BASE_URL = "http://localhost:5000/api";
```

Change this to your backend URL in production.
Flask Configuration (backend/app.py)

```python
ALLOWED_EXTENSIONS = {'csv'}
app.config['MAX_CONTENT_LENGTH'] = 500 * 1024 * 1024  # 500MB max
```

🤖 Google Gemini Integration
Models Used
| Agent | Model | Purpose |
|---|---|---|
| Target Variable Agent | gemini-2.5-flash-preview-05-20 | Analyze data & rank targets |
| Natural Language Agent | gemini-2.5-flash-preview-05-20 | Generate PDF insights |
API Configuration
```python
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv('GEMINI_API_KEY'))
model = genai.GenerativeModel('models/gemini-2.5-flash-preview-05-20')

# Generate content
response = model.generate_content(prompt)
```

Rate Limits & Pricing
- Free Tier: 15 requests/minute, 1500 requests/day
- Paid Tier: Higher limits available
- Check current pricing: https://ai.google.dev/pricing
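When the free-tier limit is hit (a RESOURCE_EXHAUSTED error), a simple retry with exponential backoff is a common remedy. This sketch is illustrative, not the app's actual error handling; the exception type and delays are assumptions:

```python
import time

def call_with_backoff(fn, retries=3, base_delay=1.0):
    """Retry a callable on rate-limit-style errors with exponential backoff.
    RuntimeError stands in for the SDK's rate-limit exception."""
    for attempt in range(retries):
        try:
            return fn()
        except RuntimeError:
            if attempt == retries - 1:
                raise                          # out of retries: give up
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Simulated flaky API: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("RESOURCE_EXHAUSTED")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```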
🔒 Security & Privacy
Data Isolation
- Each workflow creates an isolated Snowflake schema (`WORKFLOW_<UUID>`)
- No data mixing between workflows
- Automatic cleanup on workflow deletion
API Security
- CORS enabled for frontend-backend communication
- File size limits (500MB max)
- File type validation (CSV only)
- Secure filename sanitization
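Filename sanitization can be illustrated as follows; the app more likely relies on a helper such as werkzeug's `secure_filename` than on this exact logic:

```python
import re

def sanitize_filename(name):
    """Illustrative sanitizer: strip path components, neutralize unusual
    characters, and refuse hidden/empty names."""
    name = name.replace("\\", "/").rsplit("/", 1)[-1]   # drop any path parts
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)        # neutralize odd chars
    return name.lstrip(".") or "upload.csv"             # no hidden/empty names

print(sanitize_filename("../../etc/passwd ; rm -rf.csv"))
```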
Data Privacy
- Data stored in your Snowflake account (not ours)
- AI models (Gemini) don't retain your data
- Stateless API calls
- No data sent to third parties
Temporary Files
- Uploaded files stored in system temp directory
- Automatically deleted after Snowflake upload
- No persistent local storage
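The temp-file lifecycle described above, as a sketch with the Snowflake loader stubbed out:

```python
import os
import tempfile

def ingest_csv(file_bytes):
    """Spill the upload to a temp file, hand it to the (stubbed) Snowflake
    loader, then delete it so no persistent local copy remains."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(file_bytes)
        # upload_to_snowflake(path)  # real ingestion would happen here
        return os.path.exists(path)
    finally:
        os.remove(path)  # cleanup runs even if the upload step fails

existed_during_upload = ingest_csv(b"a,b\n1,2\n")
print(existed_during_upload)  # True, but the file is gone afterwards
```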
🐛 Troubleshooting
Backend won't start
```bash
# Check Python version
python --version   # Should be 3.11+

# Verify .env file exists
cat .env    # macOS/Linux
type .env   # Windows

# Test Snowflake connection
python -c "from services.config import Config; print(Config.SNOWFLAKE_ACCOUNT)"
```

Frontend can't connect to backend
```bash
# Verify backend is running
curl http://localhost:5000/api/health

# Check CORS is enabled in backend/app.py
# CORS(app) should be present

# Verify API_BASE_URL in frontend matches backend port
```

Upload fails
Possible causes:
- Invalid CSV format - Verify the file has headers and proper encoding
- Snowflake credentials - Check the .env variables
- Warehouse not running - Start the warehouse in the Snowflake UI
- Insufficient credits - Check Snowflake billing
Gemini API errors
```bash
# Verify API key is set
echo $GEMINI_API_KEY    # macOS/Linux
echo %GEMINI_API_KEY%   # Windows

# Test API key manually
curl -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}' \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-05-20:generateContent?key=YOUR_API_KEY"
```

Common errors:
- INVALID_ARGUMENT: Invalid API key
- RESOURCE_EXHAUSTED: Rate limit exceeded (wait 1 minute)
- PERMISSION_DENIED: API not enabled (enable in Google Cloud Console)
Model training fails
Possible causes:
- Target has only 1 unique value - Select different target
- Target has excessive nulls (>50%) - Data quality issue
- Insufficient samples (<50 rows) - Upload larger dataset
- All features are null - Data quality issue
📈 Performance Optimizations
Parallel Execution
- EDA + Target Analysis run in parallel using `ThreadPoolExecutor`
- Saves ~40% time compared to sequential execution
- Typical time saved: 20-30 seconds
Snowflake Optimizations
- Uses the `PUT` command for fast bulk loading
- Isolated schemas reduce query overhead
- Warehouse auto-suspend to reduce costs
Frontend Optimizations
- Vite for fast builds and HMR (Hot Module Replacement)
- Code splitting with React Router
- Lazy loading of heavy components
📝 Development Guidelines
Python Code Style
- Follow PEP 8 style guide
- Use type hints for function parameters
- Docstrings for all functions/classes
- Maximum line length: 100 characters
React Code Style
- Use functional components with hooks
- ESLint for code linting
- Consistent file naming (PascalCase for components)
- PropTypes for type checking (optional)
Git Workflow
```bash
# Create feature branch
git checkout -b feature/your-feature-name

# Make changes and commit
git add .
git commit -m "Add: your feature description"

# Push to remote
git push origin feature/your-feature-name

# Open Pull Request on GitHub
```

🤝 Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
📝 License
This project is licensed under the MIT License.
🙏 Acknowledgments
Technologies
- Snowflake - Enterprise data platform
- Google Gemini - AI-powered insights (gemini-2.5-flash-preview-05-20)
- React - Modern web framework
- Flask - Lightweight Python API
- scikit-learn & XGBoost - ML frameworks
- ReportLab - Professional PDF generation
Team
Built with ❤️ by the Foresee Team
📞 Support
For issues, questions, or suggestions:
- 🐛 Issues: GitHub Issues
- 📧 Email: support@foresee-app.com
- 📚 Documentation: GitHub Wiki
🚀 Roadmap
Planned Features
- Support for more ML models (Random Forest, Neural Networks)
- Advanced hyperparameter tuning (GridSearchCV)
- Time series forecasting support
- Interactive charts in PDF reports
- Model deployment API (FastAPI)
- Scheduled re-training
- User authentication (OAuth 2.0)
- Multi-user workspaces
- Excel file support (.xlsx)
- Real-time model monitoring dashboard
- SHAP value visualizations
- Model versioning & comparison
Happy Analyzing! 🚀📊
Transform your data into insights with the power of AI.