Autonomous Metal
An applied AI system that simulates the workflow of a commodity market analyst — combining time-series forecasting, SHAP-driven explainability, and LLM-powered narrative generation to produce institutional-style market research reports.
The project currently focuses on London Metal Exchange (LME) Aluminum, but the architecture is designed to generalize to other commodities and structured economic domains.
Table of Contents
- Motivation
- Sample Report
- Quick Start
- Repository Structure
- Architecture
- Pipeline Lifecycle
- Forecast Model Performance
- Model Architecture
- Core Modules
- API Reference
- Dataset
- Technology Stack
- Design Principles
- Disclaimer
- Author
Motivation
Traditional forecasting systems are primarily focused on answering:
What will happen?
Real-world analysts must answer:
What will happen — and why?
Autonomous Metal bridges this gap by evolving from a predictive ML pipeline into an autonomous analytical system capable of producing structured market interpretations. The goal is not just prediction accuracy, but machine-assisted market reasoning.
Sample Report
A generated analyst report produced from the current system is available at `/sample-report.md`.
Quick Start
Ensure `.env` is configured from `.env.example` before running any scripts.
Environment Setup
```shell
cp .env.example .env
# Fill in your Kaggle credentials, Groq API key, and forecast parameters
```

`.env` variables:
| Variable | Description |
|---|---|
| `KAGGLE_USERNAME` | Kaggle account username |
| `KAGGLE_KEY` | Kaggle API key |
| `KAGGLE_DATASET` | Target dataset identifier |
| `GROQ_API_KEY` | Groq LLM inference key |
| `GROQ_LLM_MODEL` | LLM model name (e.g. `llama-3.3-70b-versatile`) |
| `FORECAST_HORIZON` | Number of forecast days ahead |
| `LAG_WINDOW` | Historical lookback window size |
| `TRAINING_CUTOFF` | Train/validation split date |
Install Dependencies
```shell
pip install -r requirements.txt
```

Run the API
Start the FastAPI service to generate LME Aluminum analyst reports on demand.
```shell
chmod +x run-api.sh
./run-api.sh
```

Once running, open the interactive docs at: http://localhost:8000/docs
Retrain Models
Execute the full training and experimentation pipeline.
```shell
chmod +x run-experiment.sh
./run-experiment.sh
```

This runs the complete ML lifecycle: data ingestion → feature engineering → model training → evaluation → artifact updates.
Repository Structure
```
autonomous-metal/
├── .github/
│   └── workflows/
│       └── pylint.yml          # Automated static analysis on push
├── api/
│   └── main.py                 # FastAPI service and report generation endpoint
├── artifacts/                  # Generated experiment outputs (models, scalers, plots)
├── core/
│   ├── graph.py                # LangGraph reasoning workflow and analyst engine
│   ├── logging.py              # Centralized LoggerFactory
│   ├── model.py                # CNN forecasting model architecture
│   ├── prompts.py              # Structured LLM prompt templates
│   └── utils.py                # Data acquisition, feature prep, visualization
├── pipelines/
│   ├── fetch-data-kaggle-pipeline.py
│   ├── label-preparation-pipeline.py
│   ├── feature-engineering-pipeline.py
│   ├── prepare-training-data-pipeline.py
│   ├── forecast-model-training-pipeline.py
│   └── performance-evaluation-pipeline.py
├── .env.example
├── pyproject.toml
├── requirements.txt
├── run-api.sh
├── run-experiment.sh
└── sample-report.md
```
The repository is organized as a modular, production-oriented ML system rather than a notebook-based experiment. Each directory represents a distinct responsibility, enabling reproducibility, scalability, and clear separation between data processing, modeling, and reporting.
Architecture
```mermaid
flowchart TD
    A[Kaggle Dataset] --> B(fetch-data-kaggle-pipeline)
    B --> C[(SQLite Market Database)]
    C --> D(label-preparation-pipeline)
    C --> E(feature-engineering-pipeline)
    D --> F[labels.csv]
    E --> G[features.csv]
    F --> H(prepare-training-data-pipeline)
    G --> H
    H --> I[(training-x.pkl)]
    H --> J[(training-y.pkl)]
    H --> K[(features-set.pkl)]
    H --> L[(spot-prices.csv)]
    I --> M(forecast-model-training-pipeline)
    J --> M
    M --> N[(feature-scaler.pkl)]
    M --> O[(Forecast Models — 1–5 Horizons)]
    M --> P[(Training Loss Plots)]
    O --> Q(performance-evaluation-pipeline)
    N --> Q
    G --> Q
    F --> Q
    Q --> R[MAPE Metrics]
    Q --> S[Directional Accuracy]
    O --> T[SHAP Explainers]
    G --> T
    T --> U[Feature Attribution Scores]
    U --> V[StructuredFeatureMarketAnalyst]
    O --> V
    L --> V
    V --> W[LangGraph Workflow]
    W --> X[Structured Prompts]
    X --> Y[Groq LLM Reasoning]
    Y --> Z[Analyst Insights]
    Z --> AA[Automated Market Report]
    AA --> AB[FastAPI Service]
```
Pipeline Lifecycle
Raw market data flows through six sequential, reproducible stages:
```
Data Acquisition → Label Construction → Feature Engineering
        ↓
Training Dataset Preparation → Model Training → Performance Evaluation
        ↓
Explainability (SHAP) → LLM Reasoning → Analyst Report
```
fetch-data-kaggle-pipeline.py
Authenticates with Kaggle and downloads the commodity dataset, ensuring reproducibility across environments. Acts as the data ingestion entry point — all downstream pipelines depend on this stage.
label-preparation-pipeline.py
Constructs supervised learning targets by computing forward-looking price returns from historical LME Aluminum spot prices. Targets are expressed as percentage returns rather than raw prices, aligning the model with directional market movement.
Label definition: `y = (future_price − current_price) / current_price`

Output: `dataset/labels.csv`
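As an illustrative sketch, the label definition above reduces to a few lines of plain Python (the actual pipeline implements this inside `PrepareLabels` in `core/utils.py`, which may differ in detail):

```python
# Illustrative sketch of forward-return label construction.
def forward_returns(prices, horizon):
    """y_t = (price[t + horizon] - price[t]) / price[t].

    The last `horizon` observations have no realized future price,
    so they receive no label.
    """
    return [
        (prices[t + horizon] - prices[t]) / prices[t]
        for t in range(len(prices) - horizon)
    ]
```

For example, a spot series of `[2000, 2100, 2079]` with `horizon=1` yields returns of +5% and −1%.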
feature-engineering-pipeline.py
Aggregates explanatory variables from multiple economic datasets into a unified feature matrix. Integrates LME spot prices, Google Finance indices, LME inventory levels, and the Baltic Dry Index into a macro-aware representation of commodity markets.
Outputs: `dataset/features.csv`, `artifacts/features.csv`
prepare-training-data-pipeline.py
Transforms tabular features and labels into temporal tensors suitable for sequence-based forecasting. Constructs sliding windows aligned with each forecast horizon.
Generated artifacts:
| Artifact | Purpose |
|---|---|
| `training-x.pkl` | Model input sequences |
| `training-y.pkl` | Forecast targets |
| `features-set.pkl` | Feature ordering metadata |
| `spot-prices.csv` | Historical reference prices |
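The sliding-window step can be sketched as follows (a simplified illustration of the idea, not the pipeline's exact code):

```python
def make_windows(features, labels, lookback):
    """Pair each label at time t with the feature rows [t - lookback + 1 .. t]."""
    X, y = [], []
    for t in range(lookback - 1, len(features)):
        # One window = the `lookback` most recent feature rows ending at t
        X.append(features[t - lookback + 1 : t + 1])
        y.append(labels[t])
    return X, y
```

Each window only looks backward from its label's timestamp, which is what keeps the later chronological evaluation leak-free.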
forecast-model-training-pipeline.py
Trains independent models per forecast horizon (1–5 days ahead) using RobustScaler for outlier-resistant feature normalization. Saves trained models and loss visualizations to artifacts/.
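RobustScaler centers each feature on its median and scales by its interquartile range, so a handful of price spikes does not dominate the feature scale. Conceptually (a hand-rolled equivalent for illustration; the pipeline uses scikit-learn's implementation):

```python
import numpy as np

def robust_fit(X):
    """Per-feature median and interquartile range, as RobustScaler computes."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return median, q3 - q1

def robust_transform(X, median, iqr):
    # Guard against a zero IQR for constant features
    return (X - median) / np.where(iqr == 0, 1.0, iqr)
```

An outlier like a 10x price spike shifts the mean and standard deviation heavily, but barely moves the median and IQR, which is why this scaling suits commodity data.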
performance-evaluation-pipeline.py
Evaluates models via chronological backtesting using two metrics:
- MAPE (Mean Absolute Percentage Error) — price magnitude accuracy
- Directional Accuracy — whether the predicted direction (up/down) is correct
Directional accuracy is the primary metric, as market usefulness depends on predicting the sign of price movement.
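Both metrics reduce to a few lines (a sketch, assuming lists of realized and predicted values; the pipeline's own implementation may differ):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error over predicted prices."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def directional_accuracy(y_true, y_pred):
    """Fraction of predictions whose sign matches the realized return."""
    same_sign = sum((t > 0) == (p > 0) for t, p in zip(y_true, y_pred))
    return same_sign / len(y_true)
```

Note the asymmetry: a forecast can have low MAPE yet still call the direction wrong near zero returns, which is why the two metrics are reported separately.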
Forecast Model Performance
Evaluation follows a strict chronological split — future data is never available during training. Models perform direct multi-horizon forecasting using a fixed historical lookback across 14 market drivers.
Price Accuracy (MAPE)
| Days Ahead | Train | Validation |
|---|---|---|
| 1 | 0.87% | 0.96% |
| 2 | 1.23% | 1.23% |
| 3 | 1.56% | 1.46% |
| 4 | 1.95% | 2.24% |
| 5 | 2.21% | 2.22% |
Validation performance closely tracks training error, indicating minimal overfitting.
Directional Accuracy (Primary Metric)
| Days Ahead | Train | Validation |
|---|---|---|
| 1 | 63.3% | 57.4% |
| 2 | 63.5% | 58.0% |
| 3 | 62.6% | 60.5% |
| 4 | 56.0% | 58.5% |
| 5 | 55.8% | 58.3% |
Directional accuracy remains consistently above random-walk behavior across all horizons.
Benchmarking Context
Financial markets exhibit extremely low signal-to-noise ratios and near-random short-term dynamics.
| Accuracy | Interpretation |
|---|---|
| ~50% | Random walk |
| 52–55% | Weak predictive signal |
| 55–58% | Strong ML performance |
| 58–61% | Research-level forecasting |
Autonomous Metal achieves approximately 57–60% directional accuracy, placing it within modern deep-learning commodity forecasting ranges reported in applied quantitative research.
Model Architecture
The forecasting model is a lightweight temporal convolutional network designed for noisy financial time-series.
```
Input (lookback × features)
        ↓
Conv1D — Temporal Feature Extraction
        ↓
Batch Normalization
        ↓
Flatten Projection
        ↓
Regularized Dense Forecast Head (L1 + L2)
        ↓
Multi-Horizon Output (tanh activation)
```
Architectural decisions:
- Conv1D over RNN/LSTM — parallel computation, stable gradients, strong performance on structured time-series
- GELU activation — improved gradient smoothness and training stability
- Batch Normalization — stabilizes training under non-stationary market distributions
- tanh output — bounds predictions, stabilizes return forecasting
Directional Penalty Loss
Financial usefulness depends on predicting direction, not only magnitude. The model uses a custom loss combining MSE with a directional agreement penalty:
```python
import tensorflow as tf

def _directional_penalty_loss(y_true, y_pred, sample_weight=None):
    mse = tf.keras.losses.mean_squared_error(y_true, y_pred)
    directional_accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.sign(y_true), tf.sign(y_pred)), tf.float32)
    )
    directional_penalty = 2 / (1 + directional_accuracy)
    return mse * directional_penalty
```

Correct direction → smaller penalty. Incorrect direction → stronger corrective gradient. Magnitude learning is preserved through the MSE backbone.
Training Configuration
| Component | Setting |
|---|---|
| Optimizer | Adam |
| Loss | Directional Penalty Loss |
| Metric | MSE |
| Early stopping | patience=10, monitors val_loss, restores best weights |
Deterministic Training Controls
- Python, NumPy, and TensorFlow seed synchronization
- `TF_DETERMINISTIC_OPS=1`
- GPU disabled (`CUDA_VISIBLE_DEVICES="-1"`)

Full bitwise determinism cannot be guaranteed due to floating-point behavior across hardware.
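These controls amount to a few lines executed at process start (a sketch; the TensorFlow seed call is commented out here to keep the snippet dependency-free):

```python
import os
import random

import numpy as np

def set_determinism(seed: int = 42) -> None:
    """Pin the seeds and environment flags the training pipeline relies on."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["TF_DETERMINISTIC_OPS"] = "1"   # request deterministic TF kernels
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # force CPU-only execution
    random.seed(seed)
    np.random.seed(seed)
    # tf.random.set_seed(seed)  # add once TensorFlow is imported
```

Environment variables must be set before TensorFlow is imported, which is why this belongs at the very top of the training entry point.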
Core Modules
core/model.py — AutonomousForecastModelArchitecture
The quantitative engine that converts engineered market features into forward-looking forecasts.
Forecasting formulation, for timestamp `T`:
- Input `X`: `[T − lookback + 1 … T]`
- Output `y`: `[T + 1 … T + horizon]`
This is sequence-to-multi-horizon regression — all future steps are predicted in a single forward pass, avoiding error accumulation.
Usage:
```python
model = AutonomousForecastModelArchitecture(
    seed=42,
    input_horizon_space=60,
    input_feature_space=12,
    output_horizon_space=10
)
model.fit(X_train, y_train, validation_data=(X_val, y_val))
predictions = model.predict(X_test)
model.save("artifacts/model.keras")
loaded_model = AutonomousForecastModelArchitecture.load("artifacts/model.keras")
```

core/graph.py — StructuredFeatureMarketAnalyst
The central orchestration engine that transforms forecasts and SHAP explanations into structured economic narratives.
Conceptual workflow:
```
Forecast Models → SHAP Explainability → Structured Feature Analyst
        ↓
LLM Reasoning → Automated Market Report
```
LangGraph execution graph:
```
get_feature_information
        ↓
get_feature_timeseries
        ↓
get_forecasting
        ↓
get_shap_scores
        ↓
get_llm_insight
        ↓
END
```
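Stripped of LangGraph specifics, this graph is a linear fold over a shared state: each node reads the accumulated state and contributes its updates. A dependency-free stand-in (node names mirror the graph above, but their bodies here are hypothetical; the real nodes live in `core/graph.py`):

```python
def run_workflow(state, nodes):
    """Run each node in order; every node returns a dict of state updates."""
    for node in nodes:
        state = {**state, **node(state)}
    return state

# Hypothetical stand-ins for two of the real LangGraph nodes
def get_forecasting(state):
    return {"forecast": [0.004, 0.007]}  # illustrative multi-horizon returns

def get_llm_insight(state):
    return {"insight": f"{len(state['forecast'])}-step forecast explained"}
```

LangGraph adds typed state schemas, checkpointing, and conditional edges on top of this core pattern, which is why the project uses it rather than a plain loop.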
Key components:
| Component | Role |
|---|---|
| `FeatureInsight` | Pydantic schema for structured analyst output |
| `FeatureAnalystState` | Workflow state container |
| `StructuredFeatureMarketAnalyst` | Main orchestration engine |
The LLM operates via Groq at deterministic temperature with schema-constrained outputs, preventing hallucinated structure while maintaining machine-readable responses.
core/prompts.py — Prompt Architecture
Enforces reasoning boundaries, prevents hallucinations, and standardizes analyst communication.
```
Forecast Models → SHAP Explainability → Structured Feature Analyst
        ↓
Prompt Layer → LLM Output → Market Research Report
```
| Component | Role |
|---|---|
| `StructuredSystemPrompt` | Global behavioral contract — interpret model outputs only |
| `StructuredUserPrompt` | Injects forecast trajectory, feature metadata, and SHAP signals |
| `StructuredAnalystReportPrompt` | Generates the final institutional research report |
Core rule: The forecast already exists and is final — explanation only.
Required report structure:
```markdown
# LME Aluminum Market Outlook Report
## Executive Summary
## Key Market Insight
## Aluminum Market Fundamentals
...
## Conclusion
```
core/utils.py — Data Utilities
| Component | Role |
|---|---|
| `FetchFromKaggle` | Authenticated dataset acquisition and storage |
| `PlotHistory` | TensorFlow training visualization via Plotly |
| `PrepareLabels` | Forward-looking return label generation |
| `FetchRawFeatures` | Multi-source feature assembly from SQLite |

`FetchRawFeatures` integrates:
- LME Aluminum spot prices
- Google Finance indices
- LME inventory levels
- Baltic Dry Index
Constructs a macro-aware feature space combining supply signals, logistics indicators, and financial market conditions.
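As an illustration of multi-source assembly, a minimal SQLite join over date-aligned tables (table and column names here are hypothetical, not the dataset's actual schema):

```python
import sqlite3

def assemble_features(conn: sqlite3.Connection):
    """Join price, inventory, and shipping signals on their shared date key."""
    query = """
        SELECT s.date, s.spot_price, i.inventory_level, b.bdi_value
        FROM spot_prices s
        JOIN inventories i ON i.date = s.date
        JOIN baltic_dry  b ON b.date = s.date
        ORDER BY s.date
    """
    return conn.execute(query).fetchall()
```

Inner joins on the date key drop days where any source is missing, a simple way to keep the feature matrix fully aligned before windowing.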
core/logging.py — LoggerFactory
Centralized logging factory providing a unified log format, standardized output destinations, controlled log rotation, and consistent logging levels across all system components.
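A minimal sketch of what such a factory might look like (hypothetical; the project's actual `LoggerFactory` in `core/logging.py` may differ in format and destinations):

```python
import logging

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single, consistently formatted handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # configure each named logger exactly once
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s | %(name)s | %(levelname)s | %(message)s"
        ))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Because `logging.getLogger` caches by name, every module asking for the same logger shares one configuration, which is the point of centralizing it in a factory.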
API Reference
The api/main.py module exposes Autonomous Metal as a deployable analytical service via FastAPI.
POST /create_report
Generates an analyst-style Markdown report for a given Friday date.
Request:
```json
{
  "friday_date": "YYYY-MM-DD"
}
```

Validation rules:
| Rule | Requirement |
|---|---|
| Format | YYYY-MM-DD |
| Day | Must be Friday |
| Range | 2015-01-14 to 2026-02-05 |
Friday-only execution is enforced because the forecasting system is aligned with LME weekly settlement logic.
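The three validation rules amount to a short check (a sketch of equivalent logic; the service's actual validator lives in `api/main.py`):

```python
from datetime import date, datetime

def validate_friday_date(value: str) -> date:
    """Parse YYYY-MM-DD and require a Friday inside the supported data range."""
    d = datetime.strptime(value, "%Y-%m-%d").date()
    if d.weekday() != 4:  # Monday = 0 ... Friday = 4
        raise ValueError("report date must be a Friday")
    if not date(2015, 1, 14) <= d <= date(2026, 2, 5):
        raise ValueError("report date outside supported range")
    return d
```

`strptime` rejects malformed strings, the `weekday()` check enforces Friday-only execution, and the range check keeps requests inside the dataset's coverage.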
Response:
```json
{
  "status": "success",
  "report_date": "YYYY-MM-DD",
  "markdown_report": "..."
}
```

Interactive docs: http://localhost:8000/docs
Dataset
Autonomous Metal is built upon a purpose-designed dataset available on Kaggle:
LME Aluminum Forecasting Dataset — Explainability
This dataset was engineered to enable not just price forecasting, but the explanation of why forecasts occur — making it suitable for SHAP attribution and LLM reasoning workflows.
Data Composition
| Category | Signals |
|---|---|
| Market price | LME Aluminum spot prices, historical reference prices |
| Supply-side | LME inventory levels, stock movement signals |
| Logistics | Baltic Dry Index (global shipping demand proxy) |
| Financial | Major financial indices, cross-asset sentiment indicators |
Integration With Pipelines
| Pipeline | Dataset Usage |
|---|---|
| `fetch-data-kaggle-pipeline` | Automated dataset acquisition |
| `feature-engineering-pipeline` | Feature extraction and alignment |
| `label-preparation-pipeline` | Future return target generation |
| `prepare-training-data-pipeline` | Temporal training tensor construction |
| `performance-evaluation-pipeline` | Ground-truth validation |
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Language | Python ≥ 3.11 | Primary implementation language |
| Deep Learning | TensorFlow / Keras 2.15 | CNN forecasting model, custom loss |
| Numerical | NumPy 1.26 | Tensor operations, window generation |
| Preprocessing | Scikit-learn 1.4 | RobustScaler for outlier-resistant scaling |
| Data | Pandas 2.2 | Time-series alignment, feature merging |
| Storage | SQLite | Structured local market database |
| Data Acquisition | Kaggle API 1.5 | Automated dataset ingestion |
| Explainability | SHAP 0.44 | Feature attribution and model interpretation |
| LLM Orchestration | LangChain ≥ 0.2 | Structured prompt templating |
| Workflow | LangGraph ≥ 0.2 | Deterministic reasoning state graphs |
| LLM Inference | Groq (`langchain-groq`) | Fast inference with schema constraints |
| API | FastAPI 0.134, Uvicorn 0.41 | REST service for report generation |
| Visualization | Plotly 5.24 | Interactive training loss curves |
| Serialization | Pickle | Artifact storage for tensors and scalers |
| CI | GitHub Actions + Pylint | Automated static code analysis |
Design Principles
Separation of Concerns — Core logic, pipelines, artifacts, and configuration are isolated to maintain clarity and extensibility.
Reproducibility — Pipelines are structured so datasets and models can be rebuilt consistently from scratch.
Modularity — Components can evolve independently as the system grows toward autonomous analysis.
Explainability First — Forecasting outputs are designed to be interpretable and directly usable in analyst-style reporting.
Controlled AI Reasoning — LLMs operate under strict analytical constraints: they interpret model outputs, never generate independent forecasts.
Local-First — Experiments run deterministically on a single machine without requiring external ML infrastructure.
This structure supports the gradual evolution of Autonomous Metal from a forecasting pipeline into an autonomous market research assistant capable of reasoning about market drivers and producing decision-support insights.
Disclaimer
This project is developed strictly for research, educational, and experimental purposes.
Autonomous Metal is an applied machine learning and analytical systems project intended to explore forecasting methodologies, explainability techniques, and AI-assisted market reasoning. Nothing contained in this repository, generated reports, forecasts, or analytical outputs should be interpreted as financial advice, investment recommendations, or trading guidance.
Users are solely responsible for any decisions made based on information produced by this system.
Author
Tanul Kumar Srivastava
Applied Data Scientist & ML Systems Engineer
Licensed under the MIT License.