AI-Powered Quantitative Trading System (LSTM)

📈 Executive Summary

This project implements an end-to-end algorithmic trading system leveraging Deep Learning (LSTM) to predict stock price movements and generate alpha.

Unlike traditional price prediction models, this system focuses on Cross-Sectional Alpha Scoring—ranking stocks based on their relative strength against the market (SPY) and volatility metrics. The project demonstrates a complete quantitative workflow: from data ingestion and complex feature engineering to model training and event-driven backtesting.
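
To make "relative strength against the market" concrete, the sketch below computes a forward excess return (stock return minus SPY return over the same horizon) and ranks tickers by it for a single date. The 5-day horizon, the function names, and the prices are illustrative assumptions; the actual Alpha Target produced by feature_engineering.py may be defined differently.

```python
def forward_return(prices, t, horizon):
    """Simple return from day t to day t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0

def excess_return(stock_prices, spy_prices, t, horizon=5):
    """Stock forward return minus benchmark forward return (assumed target)."""
    return (forward_return(stock_prices, t, horizon)
            - forward_return(spy_prices, t, horizon))

def rank_cross_section(scores):
    """Return tickers sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Made-up closing prices, for illustration only.
spy = [100, 101, 102, 101, 103, 104]
universe = {
    "AAA": [50, 51, 52, 53, 54, 55],   # rises faster than SPY
    "BBB": [20, 20, 20, 20, 20, 20],   # flat while SPY rises
}
scores = {ticker: excess_return(px, spy, t=0) for ticker, px in universe.items()}
ranking = rank_cross_section(scores)
```

Ranking on excess return rather than raw return means a stock that merely rises with the market does not score well.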

Key Performance Highlight (Backtest):

Return: CAGR of 79.9% (total return +483%) vs. the S&P 500 benchmark (CAGR +23%) over the 3-year out-of-sample period.
Strategy: Dynamic position sizing with compounding capital.
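
As a quick consistency check on the figures above, the standard conversion between cumulative return and CAGR can be sketched as follows. The function names are illustrative; the only inputs are the numbers quoted in this README. A +483% total return over 3 years works out to roughly 80% per year, which agrees with the reported CAGR up to rounding.

```python
def cagr(total_return, years):
    """Annualized growth rate implied by a cumulative return over `years`."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0

def total_return_from_cagr(annual_rate, years):
    """Cumulative return implied by compounding `annual_rate` for `years`."""
    return (1.0 + annual_rate) ** years - 1.0

strategy_cagr = cagr(4.83, 3)                       # +483% over 3 years
benchmark_total = total_return_from_cagr(0.23, 3)   # 23% CAGR over 3 years

print(f"Strategy CAGR: {strategy_cagr:.2%}")
print(f"Benchmark total return: {benchmark_total:.2%}")
```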

📊 Performance Analysis

1. Equity Curve vs. Benchmark (see backtest_trades.csv for the trades the model made)

The strategy significantly outperformed the S&P 500 benchmark during the out-of-sample testing period. The compounding effect and dynamic position sizing allowed the portfolio to capitalize on high-confidence signals.

(Blue: AI Strategy | Grey: S&P 500 Benchmark)

Strategy Logic:

Alpha Selection (Top-K): The model screens the S&P 500 universe daily. It enters a position only when the predicted Alpha Score exceeds 0.2, selecting the highest-ranked candidates among those that qualify. For reference, a score above 0 means the model has more than 50% confidence that the stock will outperform the S&P 500 index, so the 0.2 threshold requires a margin beyond that.
Dynamic Position Sizing (Compounding): The portfolio is capped at 5 positions (20% allocation each). Crucially, trade sizes are dynamically calculated based on Current Total Equity rather than initial capital, allowing the portfolio to compound gains aggressively during winning streaks.
Condition-Based Exit: Holdings are reviewed weekly (every 7 days). A position is liquidated if the model's predicted score turns negative (Score < 0), which limits exposure to deteriorating trends.
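
The three rules above can be sketched as small functions. This is a minimal illustration under stated assumptions, not the code in backtest.py: the function names, the tie-breaking order, and the treatment of a held ticker with no score (kept rather than sold) are all hypothetical. The thresholds and the 5-position cap come from the description above.

```python
ENTRY_THRESHOLD = 0.2   # enter only if the predicted score exceeds this
EXIT_THRESHOLD = 0.0    # liquidate if the score is negative at review
MAX_POSITIONS = 5       # each slot receives 1/5 = 20% of current equity

def select_exits(scores, held):
    """At a weekly review, flag holdings whose predicted score is negative.
    Assumption: a held ticker with no score is treated as 0.0 and kept."""
    return [t for t in held if scores.get(t, 0.0) < EXIT_THRESHOLD]

def select_entries(scores, held, max_positions=MAX_POSITIONS):
    """Fill free slots with the highest-scoring tickers above the threshold."""
    free_slots = max(max_positions - len(held), 0)
    candidates = [t for t, s in scores.items()
                  if s > ENTRY_THRESHOLD and t not in held]
    candidates.sort(key=scores.get, reverse=True)
    return candidates[:free_slots]

def position_size(current_equity, max_positions=MAX_POSITIONS):
    """Dollar allocation per new position, based on current (not initial) equity."""
    return current_equity / max_positions

# Made-up scores for one review day.
scores = {"AAA": 0.35, "BBB": 0.25, "CCC": 0.10, "DDD": -0.05}
held = {"DDD"}

exits = select_exits(scores, held)        # DDD has a negative score
held -= set(exits)
entries = select_entries(scores, held)    # CCC is below the 0.2 threshold
size = position_size(current_equity=150_000)
```

Because `position_size` takes current equity, a portfolio that has grown from 100,000 to 150,000 allocates 30,000 per slot instead of 20,000, which is the compounding effect described above.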

2. Risk & Return Distribution

The analysis of closed trades shows a positive skew in returns. The "Realized PnL" chart demonstrates steady capital appreciation with controlled drawdowns.
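
For readers who want to reproduce these two observations from their own trade log, skewness and maximum drawdown can be computed as sketched below. The function names and the sample numbers are illustrative assumptions and are not taken from backtest_trades.csv.

```python
from statistics import mean, pstdev

def skewness(returns):
    """Population skewness; a positive value means a longer right tail."""
    mu, sigma = mean(returns), pstdev(returns)
    return mean([((r - mu) / sigma) ** 3 for r in returns])

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Made-up data: many small losses and one large gain give a positive skew.
trade_returns = [-0.02, -0.01, 0.00, 0.01, 0.12]
equity = [100, 110, 99, 120, 114]

trade_skew = skewness(trade_returns)
worst_dd = max_drawdown(equity)
```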

3. Model Interpretability (Feature Importance)

Permutation Importance identified the long-term trend indicator (sma100_gap) and the momentum oscillator (rsi14) as the most influential inputs to the LSTM's predictions.
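
The idea behind permutation importance is to shuffle one feature across samples and measure how much the model's error grows. The sketch below shows it on flat feature rows with a toy stand-in model; the names `permutation_importance`, `toy_model`, and the use of MSE are assumptions for illustration. For an LSTM, each sample is a window of time steps, so analyze_features.py presumably permutes a feature across whole windows, which this toy version does not cover.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` on rows X against targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Stand-in for a trained model: uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]

data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model relies on raises the error.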

🛠 Project Structure

The codebase is modularized to mimic a production-grade quantitative pipeline:

├── getdata.py              # Data Ingestion: Downloads historical data via yfinance
├── indicator.py            # Library: Custom implementation of technical indicators (RSI, MACD, BOLL, etc.)
├── feature_engineering.py  # ETL Pipeline: Cleans data, generates factors, calculates Alpha Targets
├── train.py                # Modeling: PyTorch LSTM implementation with sliding window datasets
├── analyze_features.py     # Analysis: Permutation importance to interpret "Black Box" models
├── backtest.py             # Simulation: Event-driven backtester with dynamic portfolio management
└── analyze_trade.py        # Reporting: Generates visualizations and financial metrics (Sharpe, Win Rate)
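
The "sliding window datasets" mentioned for train.py can be pictured with the sketch below, which pairs each run of consecutive feature rows with a target. The window length, the choice to align the target with the last row of the window, and the function name are assumptions; a PyTorch version would typically wrap the same indexing in a `torch.utils.data.Dataset`.

```python
def sliding_windows(features, targets, window):
    """Pair each run of `window` consecutive rows with the target of its last row."""
    samples = []
    for end in range(window, len(features) + 1):
        samples.append((features[end - window:end], targets[end - 1]))
    return samples

# Made-up single-feature series with one target per day.
features = [[0.1], [0.2], [0.3], [0.4], [0.5]]
targets = [0, 1, 0, 1, 1]

samples = sliding_windows(features, targets, window=3)
```

Five rows with a window of 3 yield three overlapping samples, each a (sequence, target) pair suitable as one LSTM input.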

💻 Installation & Usage

Prerequisites

pip install torch pandas numpy matplotlib seaborn yfinance scikit-learn tqdm joblib

Workflow

Download Data:
```
python getdata.py
```
Generate Features:
```
python feature_engineering.py
```
Train Model:
```
python train.py
```
Run Backtest:
```
python backtest.py
```

Analyze Results:

python analyze_features.py
python analyze_trade.py
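
The two metrics named for analyze_trade.py (Sharpe, Win Rate) follow standard definitions, sketched here. The function names, the zero risk-free rate, the 252-day annualization, and the sample numbers are assumptions, and the script itself may compute them differently.

```python
from math import sqrt
from statistics import mean, stdev

def sharpe_ratio(period_returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic returns (sample standard deviation)."""
    excess = [r - risk_free for r in period_returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)

def win_rate(trade_pnls):
    """Fraction of closed trades with strictly positive PnL."""
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)

# Made-up daily returns and per-trade PnL values.
daily_returns = [0.01, -0.005, 0.002, 0.007, -0.003]
trade_pnls = [120.0, -40.0, 75.0, -10.0]

sharpe = sharpe_ratio(daily_returns)
wins = win_rate(trade_pnls)
```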

ImlyChlung/LSTM-Stock-Predication