# GRL: Generalized Reinforcement Learning

*Actions as Operators on State Space*
## What is GRL?
Generalized Reinforcement Learning (GRL) redefines the concept of "action" in reinforcement learning. Instead of treating actions as discrete indices or fixed-dimensional vectors, GRL models actions as parametric operators that transform the state space.
```mermaid
flowchart TB
    subgraph TRL["Traditional RL"]
        direction LR
        S1["<b>State</b><br/>s"] --> P1["<b>Policy</b><br/>π"]
        P1 --> A1["<b>Action Symbol</b><br/>a ∈ A"]
        A1 --> NS1["<b>Next State</b><br/>s'"]
    end
    TRL --> GRL
    subgraph GRL["Generalized RL"]
        direction LR
        S2["<b>State</b><br/>s"] --> P2["<b>Policy</b><br/>π"]
        P2 --> AP["<b>Operator Params</b><br/>θ"]
        AP --> OP["<b>Operator</b><br/>Ô<sub>θ</sub>"]
        OP --> ST["<b>State Transform</b><br/>s' = Ô<sub>θ</sub>(s)"]
    end
    style S1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style NS1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style A1 fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
    style P1 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style S2 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style ST fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style AP fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
    style OP fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
    style P2 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style TRL fill:#fafafa,stroke:#666,stroke-width:2px
    style GRL fill:#fafafa,stroke:#666,stroke-width:2px
    linkStyle 4 stroke:#666,stroke-width:2px
```
This formulation, inspired by the least-action principle in physics, leads to policies that are not only optimal but also physically grounded: they prefer smooth, efficient transformations over abrupt changes.
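As a concrete toy illustration (not the repo's API), an action can be a parametric affine operator Ô_θ acting on a state vector, with a quadratic "effort" penalty in the spirit of the least-action principle; `make_operator` and `action_cost` are hypothetical names:

```python
import numpy as np

def make_operator(theta):
    """Build a toy affine operator O_theta(s) = R(angle) @ s + shift.

    theta = (angle, dx, dy) is a point on a continuous operator
    manifold, not an index into a discrete action set.
    """
    angle, dx, dy = theta
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    return lambda s: R @ s + np.array([dx, dy])

def action_cost(theta):
    """Quadratic effort penalty: small, smooth transformations are cheap."""
    return float(np.sum(np.asarray(theta) ** 2))

s = np.array([1.0, 0.0])
op = make_operator((np.pi / 2, 0.0, 0.0))  # rotate the state by 90 degrees
s_next = op(s)                             # ~ [0, 1]
```

Because θ lives on a continuous manifold, nearby parameters yield nearby transformations, which is what makes energy-based regularization of actions meaningful.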
## Tutorial Papers

### Part I: Reinforcement Fields – Particle-Based Learning

Status: In progress (9/10 chapters complete)

Particle-based belief representation, energy landscapes, and functional learning over augmented state-action space.

Start Learning → | Research Roadmap →
| Section | Chapters | Topics |
|---|---|---|
| Foundations | 0, 1, 2, 3 | Augmented space, particles, RKHS, energy |
| Field & Memory | 4, 4a, 5, 6, 6a | Functional fields, Riesz theorem, belief states, MemoryUpdate, advanced memory |
| Algorithms | 7 | RF-SARSA (next) |
| Interpretation | 8-10 | Soft transitions, POMDP, synthesis |
### Part II: Reinforcement Fields – Emergent Structure & Spectral Abstraction

Status: Planned (after Part I)
Spectral discovery of hierarchical concepts through functional clustering in RKHS.
| Section | Chapters | Topics |
|---|---|---|
| Functional Clustering | 11 | Clustering in function space |
| Spectral Concepts | 12 | Concepts as eigenmodes |
| Hierarchical Control | 13 | Multi-level abstraction |
Based on: Section V of the original paper
Reading time: ~10 hours total (both parts)
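The spectral idea behind Part II can be previewed with a toy, self-contained sketch (an assumed setup, not Part II's actual construction): leading eigenvectors of an RBF Gram matrix over sampled behaviors concentrate on clusters, acting as soft indicators of emergent "concepts":

```python
import numpy as np

rng = np.random.default_rng(0)
# Two tight clusters of sampled behaviors (1-D stand-ins for functions in RKHS)
points = np.concatenate([rng.normal(-2.0, 0.1, 10), rng.normal(2.0, 0.1, 14)])

# RBF Gram matrix: K[i, j] = exp(-(x_i - x_j)^2 / 2)
diff = points[:, None] - points[None, :]
K = np.exp(-diff ** 2 / 2.0)

# Eigenmodes of K (ascending eigenvalues); each leading mode
# concentrates on one cluster, i.e. one candidate "concept"
eigvals, eigvecs = np.linalg.eigh(K)
lead = eigvecs[:, -1]     # lives on the larger cluster (points[10:])
second = eigvecs[:, -2]   # lives on the smaller cluster (points[:10])
mass_lead = float(np.sum(lead[:10] ** 2))      # ~0
mass_second = float(np.sum(second[:10] ** 2))  # ~1
```

Cross-cluster kernel entries are nearly zero, so the Gram matrix is approximately block diagonal and its top eigenvectors separate the blocks, which is the intuition behind "concepts as eigenmodes."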
### Quantum-Inspired Extensions

Status: Advanced topics (9 chapters complete)
Mathematical connections to quantum mechanics and novel probability formulations for ML.
| Theme | Chapters | Topics |
|---|---|---|
| Foundations | 01, 01a, 02 | RKHS-QM parallel, state vs. wavefunction, amplitude interpretation |
| Complex RKHS | 03 | Complex-valued kernels, interference, phase semantics |
| Projections | 04, 05, 06 | Action/state fields, concept subspaces, belief dynamics |
| Learning & Memory | 07, 08 | Beyond GP, memory dynamics, principled consolidation |
Novel Contributions:
- Amplitude-based RL: Complex-valued value functions with phase semantics
- MDL consolidation: Information-theoretic memory management
- Concept-based MoE: Hierarchical RL via subspace projections
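The amplitude idea can be seen in a few lines of self-contained toy code (illustrative only, not the chapters' formalism): complex amplitudes add before squaring, so opposite phases cancel in a way classical probabilities cannot:

```python
import numpy as np

# Two normalized complex "beliefs" (toy stand-ins for complex RKHS elements)
psi1 = np.array([1.0, 1.0j]) / np.sqrt(2)
psi2 = np.array([1.0, -1.0j]) / np.sqrt(2)

# Inner products act as amplitude overlaps
overlap_12 = np.vdot(psi1, psi2)   # 0: opposite phases cancel exactly

def prob(amplitude):
    """Born-style rule: probability = |amplitude|^2."""
    return float(np.abs(amplitude) ** 2)

# Interference: amplitudes add *before* squaring
a = 1.0 / np.sqrt(2)               # amplitude along path 1
b = -1.0 / np.sqrt(2)              # amplitude along path 2, opposite phase
p_interfere = prob(a + b)          # 0.0: destructive interference
p_classical = prob(a) + prob(b)    # 1.0: probabilities merely add
```

Phase is the extra degree of freedom that direct probabilities lack; the Complex RKHS chapters give it temporal/contextual semantics.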
## Key Innovations
| Aspect | Classical RL | GRL |
|---|---|---|
| Action | Discrete index or vector | Parametric operator |
| Action Space | Finite or bounded | Continuous manifold |
| Value Function | Q(s, a) function | Reinforcement field |
| Experience | Replay buffer | Particle memory in RKHS |
| Policy | Learned function | Inferred from energy landscape |
| Uncertainty | External (dropout, ensembles) | Emergent from particle sparsity |
### GRL as a Unifying Framework

**Key Insight:** Traditional RL algorithms (Q-learning, DQN, PPO, SAC, RLHF for LLMs) are special cases of GRL!

When you:
- Discretize actions → GRL recovers Q-learning
- Use neural networks → GRL recovers DQN
- Apply Boltzmann policies → GRL recovers REINFORCE/Actor-Critic
- Fine-tune LLMs → GRL generalizes RLHF

See: Recovering Classical RL from GRL →
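The first reduction above can be sketched in a few lines (a toy value field and grid, hypothetical names, not the formal derivation): coarsening a continuous operator-parameter space collapses GRL action selection into an ordinary Q-table argmax:

```python
import numpy as np

def q_field(theta):
    """Toy continuous value field over operator parameters theta in [-1, 1]."""
    return -(theta - 0.3) ** 2   # peaks at theta = 0.3

# GRL view: maximize over a continuous parameter manifold
# (a fine grid stands in for continuous optimization here)
thetas = np.linspace(-1.0, 1.0, 2001)
theta_star = thetas[np.argmax(q_field(thetas))]   # ~0.3, full precision

# Classical view: discretize to a handful of symbols -> a plain Q-table
actions = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
q_table = q_field(actions)             # Q[a] for each discrete action
a_star = actions[np.argmax(q_table)]   # Q-learning-style argmax: 0.5
```

The discrete agent is stuck with the best available symbol (0.5), while the continuous view recovers the true optimum (0.3), which is the precision argument behind GRL's continuous action manifolds.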
### Why GRL?
- Generalization: Subsumes existing methods as special cases
- Continuous actions: No discretization, full precision
- Smooth interpolation: Nearby parameters → similar behavior
- Compositional: Operators can be composed (operator algebra)
- Uncertainty: Sparse particles = high uncertainty (no ensembles needed)
- Interpretability: Energy landscapes, particle inspection
- Modern applications: Applies to RLHF, prompt optimization, neural architecture search
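The "compositional" bullet can be made concrete with a minimal sketch (illustrative names, assuming affine operators as in the toy example above): composing two operators yields another operator, so behaviors chain algebraically:

```python
import numpy as np

def affine(A, b):
    """Toy operator O(s) = A @ s + b acting on state vectors."""
    return lambda s: A @ s + b

def compose(op2, op1):
    """Operator algebra: (op2 . op1)(s) = op2(op1(s)) is again an operator."""
    return lambda s: op2(op1(s))

scale = affine(2.0 * np.eye(2), np.zeros(2))     # stretch the state
shift = affine(np.eye(2), np.array([0.0, 1.0]))  # translate the state

move = compose(shift, scale)          # first scale, then shift
s_next = move(np.array([1.0, 1.0]))   # [2, 3]
```

Closure under composition is what a fixed, finite action set cannot offer: composing two action indices is meaningless, while composing two operators is just another point in the operator space.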
## Quick Start

### Installation
```bash
# Clone the repository
git clone https://github.com/pleiadian53/GRL.git
cd GRL

# Create environment with mamba/conda
mamba env create -f environment.yml
mamba activate grl

# Install in development mode
pip install -e .

# Verify installation (auto-detects CPU/GPU/MPS)
python scripts/verify_installation.py
```

See INSTALL.md for detailed instructions.
### First Steps
- Read the tutorial: Start with Chapter 0: Overview
- Explore concepts: Work through Chapter 1: Core Concepts
- Understand algorithms: See the algorithm chapters (coming soon)
- Implement: Follow the implementation guide
## Project Structure
```
GRL/
├── src/grl/                 # Core library
│   ├── core/                # Particle memory, kernels
│   ├── algorithms/          # MemoryUpdate, RF-SARSA
│   ├── envs/                # Environments
│   └── visualization/       # Plotting tools
├── docs/                    # Public documentation
│   ├── GRL0/                # Tutorial paper (Reinforcement Fields)
│   ├── tutorials/           # Tutorial chapters (6/10 complete)
│   ├── paper/               # Paper-ready sections
│   └── implementation/      # Implementation specs
├── notebooks/               # Jupyter notebooks
│   └── vector_field.ipynb   # Vector field demonstrations
├── examples/                # Runnable examples
├── scripts/                 # Utility scripts
├── tests/                   # Unit tests
└── configs/                 # Configuration files
```
## Documentation

### Tutorial Papers: Reinforcement Fields (Two Parts)
**Part I: Particle-Based Learning** (6/10 chapters complete)
- Start Here → Overview
- Tutorials → Chapter-by-chapter learning
- Implementation → Technical specifications

**Part II: Emergent Structure & Spectral Abstraction** (Planned)

### Additional Resources
- Installation Guide – Detailed setup instructions
- Interactive Notebooks – Jupyter demos with visualizations (best viewed on Pages)
- View source – Raw notebooks in the repository
## Research Papers

### Original Paper (arXiv 2022)
Po-Hsiang Chiu, Manfred Huber
arXiv:2208.04822 (2022), 37 pages, 15 figures
The foundational work introducing particle-based belief states, reinforcement fields, and concept-driven learning.
### Tutorial Papers (This Repository)

Reinforcement Fields Framework – enhanced exposition with modern formalization

**Part I: Particle-Based Learning**
- Functional fields over augmented state-action space
- Particle memory as belief state in RKHS
- MemoryUpdate and RF-SARSA algorithms
- Emergent soft state transitions, POMDP interpretation
Status: Tutorial in progress (6/10 chapters complete)

**Part II: Emergent Structure & Spectral Abstraction**
- Functional clustering (clustering functions, not points)
- Spectral methods on kernel matrices
- Concepts as coherent subspaces of the reinforcement field
- Hierarchical policy organization
Status: Planned (after Part I)
### Planned Extensions
| Paper | Title | Status | Progress |
|---|---|---|---|
| Paper A | Generalized Reinforcement Learning: Actions as Operators<br/>Operator algebra, generalized Bellman equation, energy regularization | Draft complete | ~70% (complete draft, 3/7 figures, proofs outlined) |
| Paper B | Operator Policies: Learning State-Space Operators with Neural Operator Networks (tentative)<br/>Neural operators, scalable training, operator-actor-critic | Planned | ~0% (after Paper A) |
| Paper C | Applications of GRL to Physics, Robotics, and Differentiable Control (tentative)<br/>Physics-based control, compositional behaviors, transfer learning | Planned | ~0% (after Paper B) |
Timeline:
- Paper A: Target submission April 2026 (NeurIPS/ICML)
- Paper B: Target submission June 2026 (ICML/NeurIPS)
- Paper C: Target submission July 2026 (CoRL)
See: Research Roadmap for detailed timeline and additional research directions.
## How GRL Works: Particle-Based Learning
```mermaid
flowchart LR
    A["<b>State</b><br/>s"] --> B["<b>Query</b><br/>Memory Ω"]
    B --> C["<b>Compute</b><br/>Field Q⁺"]
    C --> D["<b>Infer</b><br/>Action θ"]
    D --> E["<b>Execute</b><br/>Operator"]
    E --> F["<b>Observe</b><br/>s', r"]
    F --> G["<b>Create</b><br/>Particle"]
    G --> H["<b>Update</b><br/>Memory"]
    H -->|Loop| B
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
    style B fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
    style C fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
    style D fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
    style E fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
    style F fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style G fill:#f8bbd0,stroke:#c2185b,stroke-width:3px,color:#000
    style H fill:#b2dfdb,stroke:#00796b,stroke-width:3px,color:#000
```
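The Query → Compute → Infer steps of the loop above can be sketched in a few self-contained lines; the particle tuples, the `rbf` lengthscale, and the grid search are illustrative assumptions, not the repo's actual `ParticleMemory` API:

```python
import numpy as np

# Toy particles in augmented (state, action-parameter) space: (s, theta, weight)
particles = [
    (np.array([0.0]), 0.2, 1.0),
    (np.array([0.0]), 0.8, 0.3),
    (np.array([1.0]), 0.5, 0.7),
]

def rbf(x, y, lengthscale=0.5):
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * lengthscale ** 2)))

def field(s, theta):
    """Q+(s, theta): kernel-weighted sum over the particle memory."""
    z = np.append(s, theta)
    return sum(w * rbf(z, np.append(s_i, th_i)) for s_i, th_i, w in particles)

def infer_action(s):
    """Pick the action parameter that maximizes the field at state s."""
    grid = np.linspace(0.0, 1.0, 101)
    values = [field(s, th) for th in grid]
    return float(grid[int(np.argmax(values))])

theta = infer_action(np.array([0.0]))  # pulled toward the heavy particle at 0.2
```

Note that nothing here is a learned policy network: the action falls out of the energy landscape induced by the stored particles, which is the sense in which the policy is "inferred" rather than trained.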
### Code Example

```python
from grl.core import ParticleMemory
from grl.core import RBFKernel
from grl.algorithms import MemoryUpdate, RFSarsa

# Create particle memory (the agent's belief state)
memory = ParticleMemory()

# Define similarity kernel
kernel = RBFKernel(lengthscale=1.0)

# Learning loop
# (env, num_episodes, max_steps, infer_action, and memory_update
#  are supplied by your own setup code)
for episode in range(num_episodes):
    state = env.reset()
    for step in range(max_steps):
        # Infer action from particle memory
        action = infer_action(memory, state, kernel)

        # Execute and observe
        next_state, reward, done = env.step(action)

        # Update particle memory (belief transition)
        memory = memory_update(memory, state, action, reward, kernel)
        state = next_state
```

## Citation
### Original arXiv Paper
The foundational work is available on arXiv:
Chiu, P.-H., & Huber, M. (2022). Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts. arXiv:2208.04822.
```bibtex
@article{chiu2022generalized,
  title={Generalized Reinforcement Learning: Experience Particles, Action Operator,
         Reinforcement Field, Memory Association, and Decision Concepts},
  author={Chiu, Po-Hsiang and Huber, Manfred},
  journal={arXiv preprint arXiv:2208.04822},
  year={2022},
  url={https://arxiv.org/abs/2208.04822}
}
```

### Tutorial Papers (This Repository)
The tutorial series provides enhanced exposition and modern formalization:
**Part I: Particle-Based Learning** (In progress)
```bibtex
@article{chiu2026part1,
  title={Reinforcement Fields: Particle-Based Learning},
  author={Chiu, Po-Hsiang and Huber, Manfred},
  journal={In preparation},
  year={2026}
}
```

**Part II: Emergent Structure & Spectral Abstraction** (Planned)
```bibtex
@article{chiu2026part2,
  title={Reinforcement Fields: Emergent Structure and Spectral Abstraction},
  author={Chiu, Po-Hsiang and Huber, Manfred},
  journal={In preparation},
  year={2026}
}
```

**Operator Extensions** (Future Work)
```bibtex
@article{chiu2026operators,
  title={Generalized Reinforcement Learning: Actions as Operators},
  author={Chiu, Po-Hsiang},
  journal={In preparation},
  year={2026+}
}
```

## License
This project is licensed under the MIT License - see the LICENSE file for details.
## The GRL Framework
GRL (Generalized Reinforcement Learning) is a family of methods that rethink how actions are represented and learned.
Original paper: arXiv:2208.04822 (Chiu & Huber, 2022)
### Reinforcement Fields (This Repository)
Two-Part Tutorial Series:
Part I: Particle-Based Learning
- Actions as continuous parameters in augmented state-action space
- Particle memory as belief state, kernel-induced value functions
- Learning through energy landscape navigation
Part II: Emergent Structure & Spectral Abstraction
- Concepts emerge from functional clustering in RKHS
- Spectral methods discover hierarchical structure
- Multi-level policy organization
Key Innovation: Learning emerges from particle dynamics in function space, not explicit policy optimization.
### Actions as Operators (Paper A – In Development)
Core Idea: Actions as parametric operators that transform state space, with operator algebra providing compositional structure.
Key Innovation: Operator manifolds replace fixed action spaces, enabling compositional behaviors and physical interpretability.
## Acknowledgments

### Mathematical Foundations
Core Framework:
- Formulated in Reproducing Kernel Hilbert Spaces (RKHS), the functional framework for particle-based belief states
- Kernel methods define the geometry and similarity structure of augmented state-action space
- Inspired by the least-action principle in classical mechanics
Quantum-Inspired Probability:
- Probability amplitudes instead of direct probabilities: RKHS inner products as amplitude overlaps
- Complex-valued RKHS enabling interference effects and phase semantics for temporal/contextual dynamics
- Wave function analogy: the reinforcement field as a superposition of particle basis states
- This formulation is novel to mainstream ML and opens new directions for probabilistic reasoning
See: Quantum-Inspired Extensions for technical details.
### Conceptual Connections
- Energy-based models (EBMs): control as energy landscape navigation
- POMDPs and belief-based control: particle ensembles as implicit belief states
- Score-based methods: energy gradients guide policy inference
### Implementation Tools
- Gaussian process regression can model scalar energy fields (but is not essential to the framework)
- Neural operators for learning parametric action transformations
- Diffusion models share the gradient-field perspective
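As a hedged illustration of the first bullet, kernel ridge regression (whose predictor coincides with a GP posterior mean) can fit a scalar energy field from noisy samples; the data and hyperparameters below are toy assumptions, not the repo's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(-3.0, 3.0, 25)
E = X ** 2 + 0.1 * rng.normal(size=X.size)  # noisy samples of a toy energy field

def gram(a, b, lengthscale=1.0):
    """RBF Gram matrix between point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * lengthscale ** 2))

# Kernel ridge: alpha = (K + sigma^2 I)^-1 E, i.e. the GP posterior mean weights
K = gram(X, X)
alpha = np.linalg.solve(K + 0.01 * np.eye(X.size), E)

def energy(x):
    """Predicted energy at query points x."""
    return gram(np.atleast_1d(x), X) @ alpha

# The fitted field's minimum sits near x = 0, as the data suggest
grid = np.linspace(-3.0, 3.0, 301)
x_min = grid[int(np.argmin(energy(grid)))]
```

Once a smooth `energy` function exists, control amounts to descending it, which connects this bullet to the energy-landscape view used throughout the framework.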