Autonomous Metal
An applied AI system that simulates the workflow of a commodity market analyst — combining time-series forecasting, SHAP-driven explainability, and LLM-powered narrative generation to produce institutional-style market research reports.
The project currently focuses on London Metal Exchange (LME) Aluminum, but the architecture is designed to generalize to other commodities and structured economic domains.
Table of Contents
- Motivation
- Sample Report
- Quick Start
- Repository Structure
- Architecture
- Pipeline Lifecycle
- Forecast Model Performance
- Model Architecture
- Core Modules
- API Reference
- Dataset
- Technology Stack
- Design Principles
- Disclaimer
- Author
Motivation
Traditional forecasting systems are primarily focused on answering:
What will happen?
Real-world analysts must answer:
What will happen — and why?
Autonomous Metal bridges this gap by evolving from a predictive ML pipeline into an autonomous analytical system capable of producing structured market interpretations. The goal is not just prediction accuracy, but machine-assisted market reasoning.
Sample Report
A generated analyst report produced from the current system is available at `/sample-report.md`.
Quick Start
Ensure `.env` is configured from `.env.example` before running any scripts.
Environment Setup
```shell
cp .env.example .env
# Fill in your Kaggle credentials, Groq API key, and forecast parameters
```

`.env` variables:
| Variable | Description |
|---|---|
| `KAGGLE_USERNAME` | Kaggle account username |
| `KAGGLE_KEY` | Kaggle API key |
| `KAGGLE_DATASET` | Target dataset identifier |
| `GROQ_API_KEY` | Groq LLM inference key |
| `GROQ_LLM_MODEL` | LLM model name (e.g. `llama-3.3-70b-versatile`) |
| `FORECAST_HORIZON` | Number of forecast days ahead |
| `LAG_WINDOW` | Historical lookback window size |
| `TRAINING_CUTOFF` | Train/validation split date |
Install Dependencies
```shell
pip install -r requirements.txt
```

Run the API
Start the FastAPI service to generate LME Aluminum analyst reports on demand.
```shell
chmod +x run-api.sh
./run-api.sh
```

Once running, open the interactive docs at: http://localhost:8000/docs
Retrain Models
Execute the full training and experimentation pipeline.
```shell
chmod +x run-experiment.sh
./run-experiment.sh
```

This runs the complete ML lifecycle: data ingestion → feature engineering → model training → evaluation → artifact updates.
Repository Structure
```
autonomous-metal/
├── .github/
│   └── workflows/
│       └── pylint.yml          # Automated static analysis on push
├── api/
│   └── main.py                 # FastAPI service and report generation endpoint
├── artifacts/                  # Generated experiment outputs (models, scalers, plots)
├── core/
│   ├── graph.py                # LangGraph reasoning workflow and analyst engine
│   ├── logging.py              # Centralized LoggerFactory
│   ├── model.py                # CNN forecasting model architecture
│   ├── prompts.py              # Structured LLM prompt templates
│   └── utils.py                # Data acquisition, feature prep, visualization
├── pipelines/
│   ├── fetch-data-kaggle-pipeline.py
│   ├── label-preparation-pipeline.py
│   ├── feature-engineering-pipeline.py
│   ├── prepare-training-data-pipeline.py
│   ├── forecast-model-training-pipeline.py
│   └── performance-evaluation-pipeline.py
├── .env.example
├── pyproject.toml
├── requirements.txt
├── run-api.sh
├── run-experiment.sh
└── sample-report.md
```
The repository is organized as a modular, production-oriented ML system rather than a notebook-based experiment. Each directory represents a distinct responsibility, enabling reproducibility, scalability, and clear separation between data processing, modeling, and reporting.
Architecture
```mermaid
flowchart TD
    A[Kaggle Dataset] --> B(fetch-data-kaggle-pipeline)
    B --> C[(SQLite Market Database)]
    C --> D(label-preparation-pipeline)
    C --> E(feature-engineering-pipeline)
    D --> F[labels.csv]
    E --> G[features.csv]
    F --> H(prepare-training-data-pipeline)
    G --> H
    H --> I[(training-x.pkl)]
    H --> J[(training-y.pkl)]
    H --> K[(features-set.pkl)]
    H --> L[(spot-prices.csv)]
    I --> M(forecast-model-training-pipeline)
    J --> M
    M --> N[(feature-scaler.pkl)]
    M --> O[(Forecast Models — 1–5 Horizons)]
    M --> P[(Training Loss Plots)]
    O --> Q(performance-evaluation-pipeline)
    N --> Q
    G --> Q
    F --> Q
    Q --> R[MAPE Metrics]
    Q --> S[Directional Accuracy]
    O --> T[SHAP Explainers]
    G --> T
    T --> U[Feature Attribution Scores]
    U --> V[StructuredFeatureMarketAnalyst]
    O --> V
    L --> V
    V --> W[LangGraph Workflow]
    W --> X[Structured Prompts]
    X --> Y[Groq LLM Reasoning]
    Y --> Z[Analyst Insights]
    Z --> AA[Automated Market Report]
    AA --> AB[FastAPI Service]
```
Pipeline Lifecycle
Raw market data flows through six sequential, reproducible stages:
```
Data Acquisition → Label Construction → Feature Engineering
        ↓
Training Dataset Preparation → Model Training → Performance Evaluation
        ↓
Explainability (SHAP) → LLM Reasoning → Analyst Report
```
fetch-data-kaggle-pipeline.py
Authenticates with Kaggle and downloads the commodity dataset, ensuring reproducibility across environments. Acts as the data ingestion entry point — all downstream pipelines depend on this stage.
label-preparation-pipeline.py
Constructs supervised learning targets by computing forward-looking price returns from historical LME Aluminum spot prices. Targets are expressed as percentage returns rather than raw prices, aligning the model with directional market movement.
Label definition: `y = (future_price − current_price) / current_price`

Output: `dataset/labels.csv`
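As an illustrative sketch, the label definition above reduces to a few lines of plain Python (the actual pipeline implements this inside `PrepareLabels` in `core/utils.py`, which may differ in detail):

```python
# Illustrative sketch of forward-return label construction.
def forward_returns(prices, horizon):
    """y_t = (price[t + horizon] - price[t]) / price[t].

    The last `horizon` observations have no realized future price,
    so they receive no label.
    """
    return [
        (prices[t + horizon] - prices[t]) / prices[t]
        for t in range(len(prices) - horizon)
    ]
```

For example, a spot series of `[2000, 2100, 2079]` with `horizon=1` yields returns of +5% and −1%.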
feature-engineering-pipeline.py
Aggregates explanatory variables from multiple economic datasets into a unified feature matrix. Integrates LME spot prices, Google Finance indices, LME inventory levels, and the Baltic Dry Index into a macro-aware representation of commodity markets.
Outputs: `dataset/features.csv`, `artifacts/features.csv`
prepare-training-data-pipeline.py
Transforms tabular features and labels into temporal tensors suitable for sequence-based forecasting. Constructs sliding windows aligned with each forecast horizon.
Generated artifacts:
| Artifact | Purpose |
|---|---|
| `training-x.pkl` | Model input sequences |
| `training-y.pkl` | Forecast targets |
| `features-set.pkl` | Feature ordering metadata |
| `spot-prices.csv` | Historical reference prices |
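The sliding-window step can be sketched as follows (a simplified illustration of the idea, not the pipeline's exact code):

```python
def make_windows(features, labels, lookback):
    """Pair each label at time t with the feature rows [t - lookback + 1 .. t]."""
    X, y = [], []
    for t in range(lookback - 1, len(features)):
        # One window = the `lookback` most recent feature rows ending at t
        X.append(features[t - lookback + 1 : t + 1])
        y.append(labels[t])
    return X, y
```

Each window only looks backward from its label's timestamp, which is what keeps the later chronological evaluation leak-free.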
forecast-model-training-pipeline.py
Trains independent models per forecast horizon (1–5 days ahead) using RobustScaler for outlier-resistant feature normalization. Saves trained models and loss visualizations to artifacts/.
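RobustScaler centers each feature on its median and scales by its interquartile range, so a handful of price spikes does not dominate the feature scale. Conceptually (a hand-rolled equivalent for illustration; the pipeline uses scikit-learn's implementation):

```python
import numpy as np

def robust_fit(X):
    """Per-feature median and interquartile range, as RobustScaler computes."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return median, q3 - q1

def robust_transform(X, median, iqr):
    # Guard against a zero IQR for constant features
    return (X - median) / np.where(iqr == 0, 1.0, iqr)
```

An outlier like a 10x price spike shifts the mean and standard deviation heavily, but barely moves the median and IQR, which is why this scaling suits commodity data.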
performance-evaluation-pipeline.py
Evaluates models via chronological backtesting using two metrics:
- MAPE (Mean Absolute Percentage Error) — price magnitude accuracy
- Directional Accuracy — whether the predicted direction (up/down) is correct
Directional accuracy is the primary metric, as market usefulness depends on predicting the sign of price movement.
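Both metrics reduce to a few lines (a sketch, assuming lists of realized and predicted values; the pipeline's own implementation may differ):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error over predicted prices."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def directional_accuracy(y_true, y_pred):
    """Fraction of predictions whose sign matches the realized return."""
    same_sign = sum((t > 0) == (p > 0) for t, p in zip(y_true, y_pred))
    return same_sign / len(y_true)
```

Note the asymmetry: a forecast can have low MAPE yet still call the direction wrong near zero returns, which is why the two metrics are reported separately.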
Forecast Model Performance
Evaluation follows a strict chronological split — future data is never available during training. Models perform direct multi-horizon forecasting using a fixed historical lookback across 14 market drivers.
Price Accuracy (MAPE)
| Days Ahead | Train | Validation |
|---|---|---|
| 1 | 0.87% | 0.96% |
| 2 | 1.23% | 1.23% |
| 3 | 1.56% | 1.46% |
| 4 | 1.95% | 2.24% |
| 5 | 2.21% | 2.22% |
Validation performance closely tracks training error, indicating minimal overfitting.
Directional Accuracy (Primary Metric)
| Days Ahead | Train | Validation |
|---|---|---|
| 1 | 63.3% | 57.4% |
| 2 | 63.5% | 58.0% |
| 3 | 62.6% | 60.5% |
| 4 | 56.0% | 58.5% |
| 5 | 55.8% | 58.3% |
Directional accuracy remains consistently above random-walk behavior across all horizons.
Benchmarking Context
Financial markets exhibit extremely low signal-to-noise ratios and near-random short-term dynamics.
| Accuracy | Interpretation |
|---|---|
| ~50% | Random walk |
| 52–55% | Weak predictive signal |
| 55–58% | Strong ML performance |
| 58–61% | Research-level forecasting |
Autonomous Metal achieves approximately 57–60% directional accuracy, placing it within modern deep-learning commodity forecasting ranges reported in applied quantitative research.
Model Architecture
The forecasting model is a lightweight temporal convolutional network designed for noisy financial time-series.
```
Input (lookback × features)
        ↓
Conv1D — Temporal Feature Extraction
        ↓
Batch Normalization
        ↓
Flatten Projection
        ↓
Regularized Dense Forecast Head (L1 + L2)
        ↓
Multi-Horizon Output (tanh activation)
```
Architectural decisions:
- Conv1D over RNN/LSTM — parallel computation, stable gradients, strong performance on structured time-series
- GELU activation — improved gradient smoothness and training stability
- Batch Normalization — stabilizes training under non-stationary market distributions
- tanh output — bounds predictions, stabilizes return forecasting
Directional Penalty Loss
Financial usefulness depends on predicting direction, not only magnitude. The model uses a custom loss combining MSE with a directional agreement penalty:
```python
import tensorflow as tf

def _directional_penalty_loss(y_true, y_pred, sample_weight=None):
    mse = tf.keras.losses.mean_squared_error(y_true, y_pred)
    directional_accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.sign(y_true), tf.sign(y_pred)), tf.float32)
    )
    directional_penalty = 2 / (1 + directional_accuracy)
    return mse * directional_penalty
```

Correct direction → smaller penalty. Incorrect direction → stronger corrective gradient. Magnitude learning is preserved through the MSE backbone.
Training Configuration
| Component | Setting |
|---|---|
| Optimizer | Adam |
| Loss | Directional Penalty Loss |
| Metric | MSE |
| Early stopping | patience=10, monitors val_loss, restores best weights |
Deterministic Training Controls
- Python, NumPy, and TensorFlow seed synchronization
- `TF_DETERMINISTIC_OPS=1`
- GPU disabled (`CUDA_VISIBLE_DEVICES="-1"`)

Full bitwise determinism cannot be guaranteed due to floating-point behavior across hardware.
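These controls amount to a few lines executed at process start (a sketch; the TensorFlow seed call is commented out here to keep the snippet dependency-free):

```python
import os
import random

import numpy as np

def set_determinism(seed: int = 42) -> None:
    """Pin the seeds and environment flags the training pipeline relies on."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["TF_DETERMINISTIC_OPS"] = "1"   # request deterministic TF kernels
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # force CPU-only execution
    random.seed(seed)
    np.random.seed(seed)
    # tf.random.set_seed(seed)  # add once TensorFlow is imported
```

Environment variables must be set before TensorFlow is imported, which is why this belongs at the very top of the training entry point.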
Core Modules
core/model.py — AutonomousForecastModelArchitecture
The quantitative engine that converts engineered market features into forward-looking forecasts.
Forecasting formulation, for timestamp `T`:
- Input `X`: `[T − lookback + 1 … T]`
- Output `y`: `[T + 1 … T + horizon]`
This is sequence-to-multi-horizon regression — all future steps are predicted in a single forward pass, avoiding error accumulation.
Usage:
```python
model = AutonomousForecastModelArchitecture(
    seed=42,
    input_horizon_space=60,
    input_feature_space=12,
    output_horizon_space=10
)
model.fit(X_train, y_train, validation_data=(X_val, y_val))
predictions = model.predict(X_test)
model.save("artifacts/model.keras")
loaded_model = AutonomousForecastModelArchitecture.load("artifacts/model.keras")
```

core/graph.py — StructuredFeatureMarketAnalyst
The central orchestration engine that transforms forecasts and SHAP explanations into structured economic narratives.
Conceptual workflow:
```
Forecast Models → SHAP Explainability → Structured Feature Analyst
        ↓
LLM Reasoning → Automated Market Report
```
LangGraph execution graph:
```
get_feature_information
        ↓
get_feature_timeseries
        ↓
get_forecasting
        ↓
get_shap_scores
        ↓
get_llm_insight
        ↓
END
```
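Stripped of LangGraph specifics, this graph is a linear fold over a shared state: each node reads the accumulated state and contributes its updates. A dependency-free stand-in (node names mirror the graph above, but their bodies here are hypothetical; the real nodes live in `core/graph.py`):

```python
def run_workflow(state, nodes):
    """Run each node in order; every node returns a dict of state updates."""
    for node in nodes:
        state = {**state, **node(state)}
    return state

# Hypothetical stand-ins for two of the real LangGraph nodes
def get_forecasting(state):
    return {"forecast": [0.004, 0.007]}  # illustrative multi-horizon returns

def get_llm_insight(state):
    return {"insight": f"{len(state['forecast'])}-step forecast explained"}
```

LangGraph adds typed state schemas, checkpointing, and conditional edges on top of this core pattern, which is why the project uses it rather than a plain loop.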
Key components:
| Component | Role |
|---|---|
| `FeatureInsight` | Pydantic schema for structured analyst output |
| `FeatureAnalystState` | Workflow state container |
| `StructuredFeatureMarketAnalyst` | Main orchestration engine |
The LLM operates via Groq at deterministic temperature with schema-constrained outputs, preventing hallucinated structure while maintaining machine-readable responses.
core/prompts.py — Prompt Architecture
Enforces reasoning boundaries, prevents hallucinations, and standardizes analyst communication.
```
Forecast Models → SHAP Explainability → Structured Feature Analyst
        ↓
Prompt Layer → LLM Output → Market Research Report
```
| Component | Role |
|---|---|
| `StructuredSystemPrompt` | Global behavioral contract — interpret model outputs only |
| `StructuredUserPrompt` | Injects forecast trajectory, feature metadata, and SHAP signals |
| `StructuredAnalystReportPrompt` | Generates the final institutional research report |
Core rule: The forecast already exists and is final — explanation only.
Required report structure:
```markdown
# LME Aluminum Market Outlook Report
## Executive Summary
## Key Market Insight
## Aluminum Market Fundamentals
...
## Conclusion
```
core/utils.py — Data Utilities
| Component | Role |
|---|---|
| `FetchFromKaggle` | Authenticated dataset acquisition and storage |
| `PlotHistory` | TensorFlow training visualization via Plotly |
| `PrepareLabels` | Forward-looking return label generation |
| `FetchRawFeatures` | Multi-source feature assembly from SQLite |

`FetchRawFeatures` integrates:
- LME Aluminum spot prices
- Google Finance indices
- LME inventory levels
- Baltic Dry Index
Constructs a macro-aware feature space combining supply signals, logistics indicators, and financial market conditions.
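As an illustration of multi-source assembly, a minimal SQLite join over date-aligned tables (table and column names here are hypothetical, not the dataset's actual schema):

```python
import sqlite3

def assemble_features(conn: sqlite3.Connection):
    """Join price, inventory, and shipping signals on their shared date key."""
    query = """
        SELECT s.date, s.spot_price, i.inventory_level, b.bdi_value
        FROM spot_prices s
        JOIN inventories i ON i.date = s.date
        JOIN baltic_dry  b ON b.date = s.date
        ORDER BY s.date
    """
    return conn.execute(query).fetchall()
```

Inner joins on the date key drop days where any source is missing, a simple way to keep the feature matrix fully aligned before windowing.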
core/logging.py — LoggerFactory
Centralized logging factory providing a unified log format, standardized output destinations, controlled log rotation, and consistent logging levels across all system components.
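A minimal sketch of what such a factory might look like (hypothetical; the project's actual `LoggerFactory` in `core/logging.py` may differ in format and destinations):

```python
import logging

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single, consistently formatted handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # configure each named logger exactly once
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s | %(name)s | %(levelname)s | %(message)s"
        ))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Because `logging.getLogger` caches by name, every module asking for the same logger shares one configuration, which is the point of centralizing it in a factory.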
API Reference
The api/main.py module exposes Autonomous Metal as a deployable analytical service via FastAPI.
POST /create_report
Generates an analyst-style Markdown report for a given Friday date.
Request:
```json
{
  "friday_date": "YYYY-MM-DD"
}
```

Validation rules:
| Rule | Requirement |
|---|---|
| Format | YYYY-MM-DD |
| Day | Must be Friday |
| Range | 2015-01-14 to 2026-02-05 |
Friday-only execution is enforced because the forecasting system is aligned with LME weekly settlement logic.
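The three validation rules amount to a short check (a sketch of equivalent logic; the service's actual validator lives in `api/main.py`):

```python
from datetime import date, datetime

def validate_friday_date(value: str) -> date:
    """Parse YYYY-MM-DD and require a Friday inside the supported data range."""
    d = datetime.strptime(value, "%Y-%m-%d").date()
    if d.weekday() != 4:  # Monday = 0 ... Friday = 4
        raise ValueError("report date must be a Friday")
    if not date(2015, 1, 14) <= d <= date(2026, 2, 5):
        raise ValueError("report date outside supported range")
    return d
```

`strptime` rejects malformed strings, the `weekday()` check enforces Friday-only execution, and the range check keeps requests inside the dataset's coverage.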
Response:
```json
{
  "status": "success",
  "report_date": "YYYY-MM-DD",
  "markdown_report": "..."
}
```

Interactive docs: http://localhost:8000/docs
Dataset
Autonomous Metal is built upon a purpose-designed dataset available on Kaggle:
LME Aluminum Forecasting Dataset — Explainability
This dataset was engineered to enable not just price forecasting, but the explanation of why forecasts occur — making it suitable for SHAP attribution and LLM reasoning workflows.
Data Composition
| Category | Signals |
|---|---|
| Market price | LME Aluminum spot prices, historical reference prices |
| Supply-side | LME inventory levels, stock movement signals |
| Logistics | Baltic Dry Index (global shipping demand proxy) |
| Financial | Major financial indices, cross-asset sentiment indicators |
Integration With Pipelines
| Pipeline | Dataset Usage |
|---|---|
| `fetch-data-kaggle-pipeline` | Automated dataset acquisition |
| `feature-engineering-pipeline` | Feature extraction and alignment |
| `label-preparation-pipeline` | Future return target generation |
| `prepare-training-data-pipeline` | Temporal training tensor construction |
| `performance-evaluation-pipeline` | Ground-truth validation |
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Language | Python ≥ 3.11 | Primary implementation language |
| Deep Learning | TensorFlow / Keras 2.15 | CNN forecasting model, custom loss |
| Numerical | NumPy 1.26 | Tensor operations, window generation |
| Preprocessing | Scikit-learn 1.4 | RobustScaler for outlier-resistant scaling |
| Data | Pandas 2.2 | Time-series alignment, feature merging |
| Storage | SQLite | Structured local market database |
| Data Acquisition | Kaggle API 1.5 | Automated dataset ingestion |
| Explainability | SHAP 0.44 | Feature attribution and model interpretation |
| LLM Orchestration | LangChain ≥ 0.2 | Structured prompt templating |
| Workflow | LangGraph ≥ 0.2 | Deterministic reasoning state graphs |
| LLM Inference | Groq (`langchain-groq`) | Fast inference with schema constraints |
| API | FastAPI 0.134, Uvicorn 0.41 | REST service for report generation |
| Visualization | Plotly 5.24 | Interactive training loss curves |
| Serialization | Pickle | Artifact storage for tensors and scalers |
| CI | GitHub Actions + Pylint | Automated static code analysis |
Design Principles
Separation of Concerns — Core logic, pipelines, artifacts, and configuration are isolated to maintain clarity and extensibility.
Reproducibility — Pipelines are structured so datasets and models can be rebuilt consistently from scratch.
Modularity — Components can evolve independently as the system grows toward autonomous analysis.
Explainability First — Forecasting outputs are designed to be interpretable and directly usable in analyst-style reporting.
Controlled AI Reasoning — LLMs operate under strict analytical constraints: they interpret model outputs, never generate independent forecasts.
Local-First — Experiments run deterministically on a single machine without requiring external ML infrastructure.
This structure supports the gradual evolution of Autonomous Metal from a forecasting pipeline into an autonomous market research assistant capable of reasoning about market drivers and producing decision-support insights.
Disclaimer
This project is developed strictly for research, educational, and experimental purposes.
Autonomous Metal is an applied machine learning and analytical systems project intended to explore forecasting methodologies, explainability techniques, and AI-assisted market reasoning. Nothing contained in this repository, generated reports, forecasts, or analytical outputs should be interpreted as financial advice, investment recommendations, or trading guidance.
Users are solely responsible for any decisions made based on information produced by this system.
Author
Tanul Kumar Srivastava
Applied Data Scientist & ML Systems Engineer
Licensed under the MIT License.