ImlyChlung/LSTM-Stock-Predication

End-to-end algorithmic trading system using PyTorch LSTM. Features a custom Cross-Sectional Alpha ranking engine, dynamic portfolio compounding, and event-driven backtesting (+483% return vs SPY).

AI-Powered Quantitative Trading System (LSTM)

📈 Executive Summary

This project implements an end-to-end algorithmic trading system leveraging Deep Learning (LSTM) to predict stock price movements and generate alpha.

Unlike traditional price prediction models, this system focuses on Cross-Sectional Alpha Scoring: ranking stocks by their relative strength against the market (SPY) and by volatility metrics. The project demonstrates a complete quantitative workflow, from data ingestion and feature engineering to model training and event-driven backtesting.
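One way to read "cross-sectional alpha scoring" is as a stock's forward return minus SPY's forward return over the same horizon, ranked across the universe each day. The sketch below illustrates that reading only. The 5-day horizon, the function names, and the sample prices are assumptions, not taken from feature_engineering.py.

```python
def forward_return(prices, t, horizon):
    """Simple return from day t to day t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def alpha_targets(stock_prices, spy_prices, t, horizon=5):
    """Excess forward return of each ticker over SPY.

    Hypothetical target definition; the repository may define its
    Alpha Target differently (for example, with volatility scaling).
    """
    benchmark = forward_return(spy_prices, t, horizon)
    return {
        ticker: forward_return(prices, t, horizon) - benchmark
        for ticker, prices in stock_prices.items()
    }


def rank_cross_section(scores):
    """Tickers ordered from strongest to weakest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up prices, for illustration only.
spy = [100, 101, 102, 103, 104, 105]
stocks = {
    "AAA": [50, 51, 52, 53, 54, 55],
    "BBB": [20, 20, 20, 20, 20, 20],
    "CCC": [10, 10.2, 10.4, 10.5, 10.6, 10.7],
}

alphas = alpha_targets(stocks, spy, t=0, horizon=5)
ranking = rank_cross_section(alphas)
print(ranking)
```

Ranking on the excess return rather than the raw return means a stock that rises less than SPY still scores negative, which is what separates this target from plain price prediction.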

Key Performance Highlight (Backtest):

  • Total Return: +483% (79.9% CAGR) vs. a 23% CAGR for the S&P 500 benchmark over the 3-year out-of-sample period.
  • Strategy: Dynamic position sizing with compounding capital.
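As a quick sanity check on how the total-return and CAGR figures relate, the conversion can be sketched as below. It assumes the out-of-sample period is exactly 3 years; the function names are illustrative.

```python
def cagr(total_return, years):
    """Compound annual growth rate implied by a total return."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def total_return_from_cagr(rate, years):
    """Total return implied by compounding `rate` for `years` years."""
    return (1.0 + rate) ** years - 1.0


# +483% total return over an assumed 3.0-year window.
strategy_cagr = cagr(4.83, 3)

# Total return implied by the benchmark's 23% CAGR over the same window.
benchmark_total = total_return_from_cagr(0.23, 3)

print(f"strategy CAGR:   {strategy_cagr:.1%}")
print(f"benchmark total: {benchmark_total:.1%}")
```

Under that assumption, +483% works out to roughly 80% per year, in line with the reported 79.9% up to rounding, and a 23% CAGR corresponds to a total return of about +86%.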

📊 Performance Analysis

1. Equity Curve vs. Benchmark (see backtest_trades.csv for the trades the AI made)

The strategy significantly outperformed the S&P 500 benchmark during the out-of-sample testing period. The compounding effect and dynamic position sizing allowed the portfolio to capitalize on high-confidence signals.

Equity Curve
(Blue: AI Strategy | Grey: S&P 500 Benchmark)

Strategy Logic:

  • Alpha Selection (Top-K): The model screens the S&P 500 universe daily. It only enters a position when the predicted Alpha Score is above 0.2, selecting the highest-ranked candidates. (A score above 0 indicates that the model has more than 50% confidence that the stock will outperform the S&P 500 index.)
  • Dynamic Position Sizing (Compounding): The portfolio is capped at 5 positions (20% allocation each). Crucially, trade sizes are dynamically calculated based on Current Total Equity rather than initial capital, allowing the portfolio to compound gains aggressively during winning streaks.
  • Condition-Based Exit: Holdings are reviewed on a weekly basis (every 7 days). A position is liquidated if the model's predicted score turns negative (Score < 0), ensuring capital is protected from deteriorating trends.
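The three rules above can be sketched as small pure functions. This is a minimal illustration under stated assumptions, not the logic in backtest.py: the function names are hypothetical, a held ticker with no score is treated as neutral, and transaction costs and the weekly review schedule are left out.

```python
MAX_POSITIONS = 5       # portfolio cap from the strategy description
ENTRY_THRESHOLD = 0.2   # enter only when the score is above this
EXIT_THRESHOLD = 0.0    # liquidate when the score drops below this


def select_exits(scores, held):
    """Held tickers whose predicted score has turned negative.

    Assumption: a held ticker missing from `scores` counts as 0.0 (kept).
    """
    return [t for t in held if scores.get(t, 0.0) < EXIT_THRESHOLD]


def select_entries(scores, held):
    """Highest-ranked new candidates that fit into the free slots."""
    free_slots = MAX_POSITIONS - len(held)
    if free_slots <= 0:
        return []
    candidates = [
        t for t, s in scores.items() if s > ENTRY_THRESHOLD and t not in held
    ]
    candidates.sort(key=lambda t: scores[t], reverse=True)
    return candidates[:free_slots]


def position_size(total_equity):
    """Equal slice of *current* equity, so gains compound into new trades."""
    return total_equity / MAX_POSITIONS


# Made-up scores and holdings, for illustration only.
scores = {"AAA": 0.45, "BBB": 0.25, "CCC": 0.10, "DDD": -0.30, "EEE": 0.31}
held = ["DDD", "CCC"]

to_sell = select_exits(scores, held)
held = [t for t in held if t not in to_sell]
to_buy = select_entries(scores, held)
size = position_size(120_000.0)
print(to_sell, to_buy, size)
```

Note that CCC, with a score between 0 and 0.2, is neither sold nor eligible as a new entry: the gap between the entry and exit thresholds acts as a hold band.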

2. Risk & Return Distribution

The analysis of closed trades shows a positive skew in returns. The "Realized PnL" chart demonstrates steady capital appreciation with controlled drawdowns.
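For reference, win rate and skewness of closed-trade returns can be computed as in the sketch below. The trade returns are made up for illustration and are not taken from backtest_trades.csv. The skewness here is the population (biased) moment estimator, which may differ from whatever analyze_trade.py reports.

```python
def win_rate(returns):
    """Fraction of trades that closed with a positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)


def skewness(returns):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    return m3 / m2 ** 1.5


# Hypothetical per-trade returns: several small losses, one large win.
trade_returns = [-0.02, -0.01, -0.03, 0.01, 0.02, 0.25]

rate = win_rate(trade_returns)
skew = skewness(trade_returns)
print(f"win rate: {rate:.0%}, skewness: {skew:.2f}")
```

A positive skew means the right tail dominates: a few large winners outweigh many small losers, so a strategy can be profitable even with a win rate near 50%.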

Metrics Distribution

3. Model Interpretability (Feature Importance)

Using Permutation Importance, we identified that long-term trend indicators (sma100_gap) and momentum oscillators (rsi14) were the most critical drivers for the LSTM's decision-making process.
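The idea behind permutation importance is to shuffle one feature at a time and measure how much the model's error grows. The sketch below shows it in a model-agnostic way. It uses flat feature rows and a toy model for brevity, whereas an LSTM would take windowed sequences; the function names and the MSE metric are assumptions, not the code in analyze_features.py.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(X):
    """Stand-in model that only looks at feature 0."""
    return [2.0 * row[0] for row in X]


# Synthetic data: y depends on feature 0 only; feature 1 is noise.
X = [[float(i), float((i * 7) % 10)] for i in range(20)]
y = [2.0 * row[0] for row in X]

imp = permutation_importance(toy_predict, X, y)
print(imp)
```

Shuffling the feature the model relies on raises the error, while shuffling the ignored feature leaves it unchanged, so the first importance is positive and the second is zero.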

Feature Importance


🛠 Project Structure

The codebase is modularized to mimic a production-grade quantitative pipeline:

├── getdata.py              # Data Ingestion: Downloads historical data via yfinance
├── indicator.py            # Library: Custom implementation of technical indicators (RSI, MACD, BOLL, etc.)
├── feature_engineering.py  # ETL Pipeline: Cleans data, generates factors, calculates Alpha Targets
├── train.py                # Modeling: PyTorch LSTM implementation with sliding window datasets
├── analyze_features.py     # Analysis: Permutation importance to interpret "Black Box" models
├── backtest.py             # Simulation: Event-driven backtester with dynamic portfolio management
└── analyze_trade.py        # Reporting: Generates visualizations and financial metrics (Sharpe, Win Rate)
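The "sliding window datasets" mentioned for train.py can be pictured as follows: each sample is a block of consecutive daily feature rows paired with the target on the block's last day. This is a plain-Python sketch under that assumption. The window length, the target alignment, and the function name are illustrative, and the real implementation presumably wraps this in a PyTorch Dataset.

```python
def sliding_windows(features, targets, window):
    """Pair each run of `window` consecutive rows with the last day's target.

    features: list of per-day feature rows
    targets:  list of per-day targets, same length as features
    """
    samples = []
    for end in range(window, len(features) + 1):
        samples.append((features[end - window:end], targets[end - 1]))
    return samples


# Six days of two made-up features and a made-up target.
features = [[i, i * 10] for i in range(6)]
targets = [0.1 * i for i in range(6)]

samples = sliding_windows(features, targets, window=3)
print(len(samples))   # number of (window, target) pairs
print(samples[0])     # first window and its target
```

With 6 days and a window of 3 this yields 4 samples. Aligning the target with the last day of each window keeps the inputs from looking past the day being predicted from.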

💻 Installation & Usage

Prerequisites

pip install torch pandas numpy matplotlib seaborn yfinance scikit-learn tqdm joblib

Workflow

  1. Download Data:
    python getdata.py
  2. Generate Features:
    python feature_engineering.py
  3. Train Model:
    python train.py
  4. Run Backtest:
    python backtest.py
  5. Analyze Results:
    python analyze_features.py
    python analyze_trade.py