
Tksrivastava/autonomous-metal

Autonomous Metal is an autonomous AI workflow designed to mimic a quantitative commodity analyst, transforming market data and economic indicators into explainable forecasts and analyst-style insights for LME Aluminum price movements.

Autonomous Metal

An applied AI system that simulates the workflow of a commodity market analyst — combining time-series forecasting, SHAP-driven explainability, and LLM-powered narrative generation to produce institutional-style market research reports.

The project currently focuses on London Metal Exchange (LME) Aluminum, but the architecture is designed to generalize to other commodities and structured economic domains.


Table of Contents

  • Motivation
  • Sample Report
  • Quick Start
  • Repository Structure
  • Architecture
  • Forecast Model Performance
  • Model Architecture
  • Core Modules
  • API Reference
  • Dataset
  • Technology Stack
  • Design Principles
  • Disclaimer
  • Author

Motivation

Traditional forecasting systems are primarily focused on answering:

What will happen?

Real-world analysts must answer:

What will happen — and why?

Autonomous Metal bridges this gap by evolving from a predictive ML pipeline into an autonomous analytical system capable of producing structured market interpretations. The goal is not just prediction accuracy, but machine-assisted market reasoning.


Sample Report

A generated analyst report produced from the current system is available at /sample-report.md.


Quick Start

Ensure .env is configured from .env.example before running any scripts.

Environment Setup

cp .env.example .env
# Fill in your Kaggle credentials, Groq API key, and forecast parameters

.env variables:

Variable Description
KAGGLE_USERNAME Kaggle account username
KAGGLE_KEY Kaggle API key
KAGGLE_DATASET Target dataset identifier
GROQ_API_KEY Groq LLM inference key
GROQ_LLM_MODEL LLM model name (e.g. llama-3.3-70b-versatile)
FORECAST_HORIZON Number of forecast days ahead
LAG_WINDOW Historical lookback window size
TRAINING_CUTOFF Train/validation split date
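
The variables above can be read at runtime along these lines — a minimal sketch only: the helper name and the default values are illustrative, not the repository's actual configuration loader.

```python
import os

def load_forecast_config() -> dict:
    """Read the .env-backed settings documented above.

    Credentials are required; the remaining values fall back to
    illustrative defaults (the repository may use different ones).
    """
    return {
        "kaggle_username": os.environ["KAGGLE_USERNAME"],
        "kaggle_key": os.environ["KAGGLE_KEY"],
        "kaggle_dataset": os.environ["KAGGLE_DATASET"],
        "groq_api_key": os.environ["GROQ_API_KEY"],
        "groq_llm_model": os.environ.get("GROQ_LLM_MODEL", "llama-3.3-70b-versatile"),
        "forecast_horizon": int(os.environ.get("FORECAST_HORIZON", "5")),
        "lag_window": int(os.environ.get("LAG_WINDOW", "60")),
        "training_cutoff": os.environ.get("TRAINING_CUTOFF", "2023-01-01"),
    }
```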

Install Dependencies

pip install -r requirements.txt

Run the API

Start the FastAPI service to generate LME Aluminum analyst reports on demand.

chmod +x run-api.sh
./run-api.sh

Once running, open the interactive docs at: http://localhost:8000/docs

Retrain Models

Execute the full training and experimentation pipeline.

chmod +x run-experiment.sh
./run-experiment.sh

This runs the complete ML lifecycle: data ingestion → feature engineering → model training → evaluation → artifact updates.


Repository Structure

autonomous-metal/
├── .github/
│   └── workflows/
│       └── pylint.yml          # Automated static analysis on push
├── api/
│   └── main.py                 # FastAPI service and report generation endpoint
├── artifacts/                  # Generated experiment outputs (models, scalers, plots)
├── core/
│   ├── graph.py                # LangGraph reasoning workflow and analyst engine
│   ├── logging.py              # Centralized LoggerFactory
│   ├── model.py                # CNN forecasting model architecture
│   ├── prompts.py              # Structured LLM prompt templates
│   └── utils.py                # Data acquisition, feature prep, visualization
├── pipelines/
│   ├── fetch-data-kaggle-pipeline.py
│   ├── label-preparation-pipeline.py
│   ├── feature-engineering-pipeline.py
│   ├── prepare-training-data-pipeline.py
│   ├── forecast-model-training-pipeline.py
│   └── performance-evaluation-pipeline.py
├── .env.example
├── pyproject.toml
├── requirements.txt
├── run-api.sh
├── run-experiment.sh
└── sample-report.md

The repository is organized as a modular, production-oriented ML system rather than a notebook-based experiment. Each directory represents a distinct responsibility, enabling reproducibility, scalability, and clear separation between data processing, modeling, and reporting.


Architecture

flowchart TD

A[Kaggle Dataset] --> B(fetch-data-kaggle-pipeline)
B --> C[(SQLite Market Database)]

C --> D(label-preparation-pipeline)
C --> E(feature-engineering-pipeline)

D --> F[labels.csv]
E --> G[features.csv]

F --> H(prepare-training-data-pipeline)
G --> H

H --> I[(training-x.pkl)]
H --> J[(training-y.pkl)]
H --> K[(features-set.pkl)]
H --> L[(spot-prices.csv)]

I --> M(forecast-model-training-pipeline)
J --> M

M --> N[(feature-scaler.pkl)]
M --> O[(Forecast Models — 1–5 Horizons)]
M --> P[(Training Loss Plots)]

O --> Q(performance-evaluation-pipeline)
N --> Q
G --> Q
F --> Q

Q --> R[MAPE Metrics]
Q --> S[Directional Accuracy]

O --> T[SHAP Explainers]
G --> T
T --> U[Feature Attribution Scores]

U --> V[StructuredFeatureMarketAnalyst]
O --> V
L --> V

V --> W[LangGraph Workflow]
W --> X[Structured Prompts]
X --> Y[Groq LLM Reasoning]

Y --> Z[Analyst Insights]
Z --> AA[Automated Market Report]
AA --> AB[FastAPI Service]

Pipeline Lifecycle

Raw market data flows through six sequential, reproducible pipeline stages, followed by explainability and LLM reasoning:

Data Acquisition  →  Label Construction  →  Feature Engineering
        ↓
Training Dataset Preparation  →  Model Training  →  Performance Evaluation
        ↓
Explainability (SHAP)  →  LLM Reasoning  →  Analyst Report

fetch-data-kaggle-pipeline.py

Authenticates with Kaggle and downloads the commodity dataset, ensuring reproducibility across environments. Acts as the data ingestion entry point — all downstream pipelines depend on this stage.

label-preparation-pipeline.py

Constructs supervised learning targets by computing forward-looking price returns from historical LME Aluminum spot prices. Targets are expressed as percentage returns rather than raw prices, aligning the model with directional market movement.

Label definition: y = (future_price − current_price) / current_price

Output: dataset/labels.csv
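
The label rule can be sketched in pandas as follows — a minimal illustration of the formula above, with a hypothetical helper name and made-up example prices:

```python
import pandas as pd

def make_forward_return_labels(spot: pd.Series, horizon: int) -> pd.Series:
    """Forward-looking percentage return: y_t = (p_{t+h} - p_t) / p_t."""
    future = spot.shift(-horizon)      # price `horizon` days ahead
    return (future - spot) / spot      # NaN for the final `horizon` rows

prices = pd.Series([100.0, 102.0, 101.0, 104.0])
labels = make_forward_return_labels(prices, horizon=1)
# labels.iloc[0] -> 0.02, i.e. (102 - 100) / 100
```

Because targets are returns rather than raw prices, a prediction's sign directly encodes the forecast direction.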

feature-engineering-pipeline.py

Aggregates explanatory variables from multiple economic datasets into a unified feature matrix. Integrates LME spot prices, Google Finance indices, LME inventory levels, and the Baltic Dry Index into a macro-aware representation of commodity markets.

Outputs: dataset/features.csv, artifacts/features.csv

prepare-training-data-pipeline.py

Transforms tabular features and labels into temporal tensors suitable for sequence-based forecasting. Constructs sliding windows aligned with each forecast horizon.

Generated artifacts:

Artifact Purpose
training-x.pkl Model input sequences
training-y.pkl Forecast targets
features-set.pkl Feature ordering metadata
spot-prices.csv Historical reference prices
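
The window construction can be sketched as follows — the alignment convention (window ending at t predicts the label at t) is an assumption for illustration; the repository's exact alignment may differ:

```python
import numpy as np

def build_windows(features: np.ndarray, labels: np.ndarray, lookback: int):
    """Turn a (T, F) feature matrix into (N, lookback, F) sequences."""
    X, y = [], []
    for t in range(lookback - 1, len(features)):
        X.append(features[t - lookback + 1 : t + 1])  # trailing window
        y.append(labels[t])                            # aligned target
    return np.stack(X), np.array(y)

feats = np.arange(20, dtype=float).reshape(10, 2)  # 10 days, 2 features
labs = np.arange(10, dtype=float)
X, y = build_windows(feats, labs, lookback=3)
# X.shape == (8, 3, 2), y.shape == (8,)
```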

forecast-model-training-pipeline.py

Trains independent models per forecast horizon (1–5 days ahead) using RobustScaler for outlier-resistant feature normalization. Saves trained models and loss visualizations to artifacts/.
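
Scaling 3-D window tensors with RobustScaler typically means flattening to a 2-D (samples × steps, features) view so statistics are computed per feature — a sketch under that assumption, with made-up shapes:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Illustrative only: 100 windows of 60 steps over 14 market drivers.
X = np.random.default_rng(0).normal(size=(100, 60, 14))

scaler = RobustScaler()                       # median/IQR based, outlier-resistant
flat = X.reshape(-1, X.shape[-1])             # (samples * steps, features)
X_scaled = scaler.fit_transform(flat).reshape(X.shape)
```

The fitted scaler is what gets persisted as feature-scaler.pkl so inference applies identical normalization.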

performance-evaluation-pipeline.py

Evaluates models via chronological backtesting using two metrics:

  • MAPE (Mean Absolute Percentage Error) — price magnitude accuracy
  • Directional Accuracy — whether the predicted direction (up/down) is correct

Directional accuracy is the primary metric, as market usefulness depends on predicting the sign of price movement.
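
The two metrics, as commonly defined (the repository's exact evaluation code may differ), can be sketched with made-up return values:

```python
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def directional_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of forecasts whose sign matches the realized return."""
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))

true_r = np.array([0.01, -0.02, 0.005, -0.01])
pred_r = np.array([0.02, -0.01, -0.004, -0.03])
# Direction is correct on 3 of 4 forecasts -> 0.75
```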


Forecast Model Performance

Evaluation follows a strict chronological split — future data is never available during training. Models perform direct multi-horizon forecasting using a fixed historical lookback across 14 market drivers.

Price Accuracy (MAPE)

Days Ahead Train Validation
1 0.87% 0.96%
2 1.23% 1.23%
3 1.56% 1.46%
4 1.95% 2.24%
5 2.21% 2.22%

Validation performance closely tracks training error, indicating minimal overfitting.

Directional Accuracy (Primary Metric)

Days Ahead Train Validation
1 63.3% 57.4%
2 63.5% 58.0%
3 62.6% 60.5%
4 56.0% 58.5%
5 55.8% 58.3%

Directional accuracy remains consistently above random-walk behavior across all horizons.

Benchmarking Context

Financial markets exhibit extremely low signal-to-noise ratios and near-random short-term dynamics.

Accuracy Interpretation
~50% Random walk
52–55% Weak predictive signal
55–58% Strong ML performance
58–61% Research-level forecasting

Autonomous Metal achieves approximately 57–60% directional accuracy, placing it within modern deep-learning commodity forecasting ranges reported in applied quantitative research.


Model Architecture

The forecasting model is a lightweight temporal convolutional network designed for noisy financial time-series.

Input (lookback × features)
        ↓
Conv1D — Temporal Feature Extraction
        ↓
Batch Normalization
        ↓
Flatten Projection
        ↓
Regularized Dense Forecast Head  (L1 + L2)
        ↓
Multi-Horizon Output  (tanh activation)

Architectural decisions:

  • Conv1D over RNN/LSTM — parallel computation, stable gradients, strong performance on structured time-series
  • GELU activation — improved gradient smoothness and training stability
  • Batch Normalization — stabilizes training under non-stationary market distributions
  • tanh output — bounds predictions, stabilizes return forecasting
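
The stack above can be sketched with the Keras functional API — layer widths, kernel size, and regularizer strengths here are illustrative guesses, not the repository's exact configuration:

```python
import tensorflow as tf

def build_forecast_model(lookback: int, n_features: int, n_horizons: int) -> tf.keras.Model:
    """Sketch of the Conv1D -> BatchNorm -> Flatten -> Dense stack above."""
    inputs = tf.keras.Input(shape=(lookback, n_features))
    x = tf.keras.layers.Conv1D(32, kernel_size=3, activation="gelu")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(
        64, activation="gelu",
        kernel_regularizer=tf.keras.regularizers.l1_l2(1e-5, 1e-4),  # L1 + L2 head
    )(x)
    outputs = tf.keras.layers.Dense(n_horizons, activation="tanh")(x)  # bounded returns
    return tf.keras.Model(inputs, outputs)

model = build_forecast_model(lookback=60, n_features=14, n_horizons=5)
```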

Directional Penalty Loss

Financial usefulness depends on predicting direction, not only magnitude. The model uses a custom loss combining MSE with a directional agreement penalty:

import tensorflow as tf

def _directional_penalty_loss(y_true, y_pred, sample_weight=None):
    # Magnitude term: standard mean squared error on returns.
    mse = tf.keras.losses.mean_squared_error(y_true, y_pred)

    # Fraction of predictions whose sign matches the realized return.
    directional_accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.sign(y_true), tf.sign(y_pred)), tf.float32)
    )

    # Scales the MSE by 2x when every direction is wrong (accuracy 0)
    # down to 1x when every direction is right (accuracy 1).
    directional_penalty = 2 / (1 + directional_accuracy)

    return mse * directional_penalty

Correct direction → smaller penalty. Incorrect direction → stronger corrective gradient. Magnitude learning is preserved through the MSE backbone.

Training Configuration

Component Setting
Optimizer Adam
Loss Directional Penalty Loss
Metric MSE
Early stopping patience=10, monitors val_loss, restores best weights

Deterministic Training Controls

  • Python, NumPy, and TensorFlow seed synchronization
  • TF_DETERMINISTIC_OPS=1
  • GPU disabled (CUDA_VISIBLE_DEVICES="-1")

Full bitwise determinism cannot be guaranteed due to floating-point behavior across hardware.
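
These controls can be sketched as follows — the repository's exact seeding code may differ, but the ordering matters: the environment flags must be set before TensorFlow is imported.

```python
import os
import random
import numpy as np

SEED = 42

# Environment flags, set before TensorFlow is first imported.
os.environ["TF_DETERMINISTIC_OPS"] = "1"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # force CPU-only execution

# Seed synchronization across Python, NumPy, and TensorFlow.
random.seed(SEED)
np.random.seed(SEED)
try:
    import tensorflow as tf
    tf.random.set_seed(SEED)
except ImportError:
    pass  # TensorFlow absent in this illustration environment
```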


Core Modules

core/model.py — AutonomousForecastModelArchitecture

The quantitative engine that converts engineered market features into forward-looking forecasts.

Forecasting formulation: for timestamp T:

  • Input X: [T − lookback + 1 … T]
  • Output y: [T + 1 … T + horizon]

This is sequence-to-multi-horizon regression — all future steps are predicted in a single forward pass, avoiding error accumulation.

Usage:

model = AutonomousForecastModelArchitecture(
    seed=42,
    input_horizon_space=60,
    input_feature_space=12,
    output_horizon_space=10
)

model.fit(X_train, y_train, validation_data=(X_val, y_val))
predictions = model.predict(X_test)

model.save("artifacts/model.keras")
loaded_model = AutonomousForecastModelArchitecture.load("artifacts/model.keras")

core/graph.py — StructuredFeatureMarketAnalyst

The central orchestration engine that transforms forecasts and SHAP explanations into structured economic narratives.

Conceptual workflow:

Forecast Models  →  SHAP Explainability  →  Structured Feature Analyst
        ↓
LLM Reasoning  →  Automated Market Report

LangGraph execution graph:

get_feature_information
        ↓
get_feature_timeseries
        ↓
get_forecasting
        ↓
get_shap_scores
        ↓
get_llm_insight
        ↓
END

Key components:

Component Role
FeatureInsight Pydantic schema for structured analyst output
FeatureAnalystState Workflow state container
StructuredFeatureMarketAnalyst Main orchestration engine

The LLM operates via Groq at deterministic temperature with schema-constrained outputs, preventing hallucinated structure while maintaining machine-readable responses.
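
The node sequence above amounts to a linear state-passing pipeline. A LangGraph-free sketch — every node body here is a stand-in for the repository's implementation, and the state keys are hypothetical:

```python
from typing import Callable

# Each node reads and extends a shared state dict, in the spirit of the
# FeatureAnalystState container described above.
def get_feature_information(state: dict) -> dict:
    return {**state, "feature_info": {"name": state["feature"]}}

def get_feature_timeseries(state: dict) -> dict:
    return {**state, "timeseries": [2310.0, 2325.5, 2318.0]}   # dummy prices

def get_forecasting(state: dict) -> dict:
    return {**state, "forecast": [0.004, 0.006]}               # dummy returns

def get_shap_scores(state: dict) -> dict:
    return {**state, "shap": {state["feature"]: 0.31}}         # dummy attribution

def get_llm_insight(state: dict) -> dict:
    return {**state, "insight": f"{state['feature']} is the dominant driver."}

NODES: list[Callable[[dict], dict]] = [
    get_feature_information, get_feature_timeseries,
    get_forecasting, get_shap_scores, get_llm_insight,
]

def run_workflow(state: dict) -> dict:
    for node in NODES:       # linear edges, terminating at END
        state = node(state)
    return state

result = run_workflow({"feature": "LME inventory"})
```

In the actual system each node is a LangGraph graph node, which adds checkpointing and typed state on top of this linear composition.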


core/prompts.py — Prompt Architecture

Enforces reasoning boundaries, prevents hallucinations, and standardizes analyst communication.

Forecast Models  →  SHAP Explainability  →  Structured Feature Analyst
        ↓
Prompt Layer  →  LLM Output  →  Market Research Report

Component Role
StructuredSystemPrompt Global behavioral contract — interpret model outputs only
StructuredUserPrompt Injects forecast trajectory, feature metadata, and SHAP signals
StructuredAnalystReportPrompt Generates the final institutional research report

Core rule: The forecast already exists and is final — explanation only.

Required report structure:

# LME Aluminum Market Outlook Report
## Executive Summary
## Key Market Insight
## Aluminum Market Fundamentals
...
## Conclusion

core/utils.py — Data Utilities

Component Role
FetchFromKaggle Authenticated dataset acquisition and storage
PlotHistory TensorFlow training visualization via Plotly
PrepareLabels Forward-looking return label generation
FetchRawFeatures Multi-source feature assembly from SQLite

FetchRawFeatures integrates:

  • LME Aluminum spot prices
  • Google Finance indices
  • LME inventory levels
  • Baltic Dry Index

Constructs a macro-aware feature space combining supply signals, logistics indicators, and financial market conditions.


core/logging.py — LoggerFactory

Centralized logging factory providing a unified log format, standardized output destinations, controlled log rotation, and consistent logging levels across all system components.


API Reference

The api/main.py module exposes Autonomous Metal as a deployable analytical service via FastAPI.

POST /create_report

Generates an analyst-style Markdown report for a given Friday date.

Request:

{
  "friday_date": "YYYY-MM-DD"
}

Validation rules:

Rule Requirement
Format YYYY-MM-DD
Day Must be Friday
Range 2015-01-14 to 2026-02-05

Friday-only execution is enforced because the forecasting system is aligned with LME weekly settlement logic.
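
The three rules can be sketched with stdlib datetime — bounds are taken from the table above; the helper name and error messages are illustrative, not the service's actual code:

```python
from datetime import date, datetime

def validate_friday_date(raw: str) -> date:
    """Apply the documented format, weekday, and range rules."""
    try:
        d = datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("friday_date must use the YYYY-MM-DD format")
    if d.weekday() != 4:                       # Monday == 0, Friday == 4
        raise ValueError("friday_date must fall on a Friday")
    if not date(2015, 1, 14) <= d <= date(2026, 2, 5):
        raise ValueError("friday_date is outside the supported range")
    return d

validate_friday_date("2024-06-14")   # a valid Friday
```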

Response:

{
  "status": "success",
  "report_date": "YYYY-MM-DD",
  "markdown_report": "..."
}

Interactive docs: http://localhost:8000/docs


Dataset

Autonomous Metal is built upon a purpose-designed dataset available on Kaggle:

LME Aluminum Forecasting Dataset — Explainability

This dataset was engineered to enable not just price forecasting, but the explanation of why forecasts occur — making it suitable for SHAP attribution and LLM reasoning workflows.

Data Composition

Category Signals
Market price LME Aluminum spot prices, historical reference prices
Supply-side LME inventory levels, stock movement signals
Logistics Baltic Dry Index (global shipping demand proxy)
Financial Major financial indices, cross-asset sentiment indicators

Integration With Pipelines

Pipeline Dataset Usage
fetch-data-kaggle-pipeline Automated dataset acquisition
feature-engineering-pipeline Feature extraction and alignment
label-preparation-pipeline Future return target generation
prepare-training-data-pipeline Temporal training tensor construction
performance-evaluation-pipeline Ground-truth validation

Technology Stack

Layer Technology Purpose
Language Python ≥ 3.11 Primary implementation language
Deep Learning TensorFlow / Keras 2.15 CNN forecasting model, custom loss
Numerical NumPy 1.26 Tensor operations, window generation
Preprocessing Scikit-learn 1.4 RobustScaler for outlier-resistant scaling
Data Pandas 2.2 Time-series alignment, feature merging
Storage SQLite Structured local market database
Data Acquisition Kaggle API 1.5 Automated dataset ingestion
Explainability SHAP 0.44 Feature attribution and model interpretation
LLM Orchestration LangChain ≥ 0.2 Structured prompt templating
Workflow LangGraph ≥ 0.2 Deterministic reasoning state graphs
LLM Inference Groq (langchain-groq) Fast inference with schema constraints
API FastAPI 0.134, Uvicorn 0.41 REST service for report generation
Visualization Plotly 5.24 Interactive training loss curves
Serialization Pickle Artifact storage for tensors and scalers
CI GitHub Actions + Pylint Automated static code analysis

Design Principles

Separation of Concerns — Core logic, pipelines, artifacts, and configuration are isolated to maintain clarity and extensibility.

Reproducibility — Pipelines are structured so datasets and models can be rebuilt consistently from scratch.

Modularity — Components can evolve independently as the system grows toward autonomous analysis.

Explainability First — Forecasting outputs are designed to be interpretable and directly usable in analyst-style reporting.

Controlled AI Reasoning — LLMs operate under strict analytical constraints: they interpret model outputs, never generate independent forecasts.

Local-First — Experiments run deterministically on a single machine without requiring external ML infrastructure.

This structure supports the gradual evolution of Autonomous Metal from a forecasting pipeline into an autonomous market research assistant capable of reasoning about market drivers and producing decision-support insights.


Disclaimer

This project is developed strictly for research, educational, and experimental purposes.

Autonomous Metal is an applied machine learning and analytical systems project intended to explore forecasting methodologies, explainability techniques, and AI-assisted market reasoning. Nothing contained in this repository, generated reports, forecasts, or analytical outputs should be interpreted as financial advice, investment recommendations, or trading guidance.

Users are solely responsible for any decisions made based on information produced by this system.


Author

Tanul Kumar Srivastava
Applied Data Scientist & ML Systems Engineer


Licensed under the MIT License.
