# fleet-rlm

Secure, cloud-sandboxed Recursive Language Models (RLM) with DSPy and Modal.
fleet-rlm gives AI agents a secure cloud sandbox for long-context code and document work, with a Web UI-first experience, recursive delegation, and DSPy-aligned tooling.
Paper | Docs | Contributing
## Quick Start
Install and launch the Web UI in under a minute:
```bash
# Option 1: install as a runnable tool
uv tool install fleet-rlm
fleet web
```

Or in your active environment:

```bash
# Option 2: regular environment install
uv pip install fleet-rlm
fleet web
```

Open http://localhost:8000 in your browser.
`fleet web` is the primary interactive interface. The published package already includes the built frontend assets, so end users do not need `bun` or a separate frontend toolchain.
## What You Get
- Browser-first RLM chat (`fleet web`)
- A focused Web UI with `RLM Workspace`, `Volumes`, and `Settings`
- Secure Modal-backed long-context execution for code/doc workflows
- WS-first runtime streaming for chat and execution events
- `GET /api/v1/auth/me` as the canonical frontend identity/bootstrap surface
- Multitenant Entra auth with Neon-backed tenant admission when `AUTH_MODE=entra`
- Runtime configuration and diagnostics from the Web UI settings
- MLflow-backed trace correlation, feedback capture, offline evaluation, and DSPy optimization workflows
- Optional MCP server surface (`fleet-rlm serve-mcp`)
## Common Commands
```bash
# Standalone terminal chat
fleet-rlm chat --trace-mode compact

# Explicit API server
fleet-rlm serve-api --port 8000

# MCP server
fleet-rlm serve-mcp --transport stdio

# Scaffold assets for Claude Code
fleet-rlm init --list
```

## Runtime Notes
- The current Web UI shell supports `RLM Workspace`, `Volumes`, and `Settings`.
- Legacy `taxonomy`, `skills`, `memory`, and `analytics` browser routes redirect to the supported surfaces.
- Product chat transport is WS-first (`/api/v1/ws/chat`).
- Frontend identity/bootstrap is `GET /api/v1/auth/me`.
- Runtime model updates from Settings are hot-applied in-process (`/api/v1/runtime/settings`) and reflected on `/api/v1/runtime/status`.
- Secret inputs in Runtime Settings are write-only.
- In `AUTH_MODE=entra`, bearer tokens are validated against Entra JWKS and admitted only for active Neon tenants.
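The WS-first chat transport can be exercised from a script. The sketch below only builds the connection URL (the path comes from the notes above) and an opening message; the message field names (`type`, `content`) are assumptions for illustration, not fleet-rlm's documented wire schema:

```python
import json


def chat_ws_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the product chat WebSocket URL (path per the Runtime Notes)."""
    return f"ws://{host}:{port}/api/v1/ws/chat"


def first_message(text: str) -> str:
    """Encode an opening chat message; field names here are illustrative
    assumptions, not the documented schema."""
    return json.dumps({"type": "user_message", "content": text})


# To actually connect, a WebSocket client library could be used, e.g.:
#   import asyncio, websockets
#   async def run():
#       async with websockets.connect(chat_ws_url()) as ws:
#           await ws.send(first_message("hello"))
#           print(await ws.recv())
#   asyncio.run(run())
```

Check the streamed event payloads against the Frontend ↔ Backend integration guide before relying on any field names.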
## Running From Source (Contributors)
```bash
# from repo root
uv sync --extra dev --extra server
uv run fleet web
uv run fastapi dev
```

For release/packaging workflows, `uv build` now runs the frontend build sync automatically (requires `bun` in repo checkouts that include `src/frontend`).
The full contributor setup and quality gates are documented in AGENTS.md and CONTRIBUTING.md.
## MLflow Workflows
fleet-rlm now supports MLflow as the GenAI tracing and evaluation plane on top of the existing PostHog runtime telemetry.
```bash
# from repo root
make mlflow-server

# in another shell
export MLFLOW_ENABLED=true
export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
export MLFLOW_EXPERIMENT=fleet-rlm
uv run fleet web
```

- Live chat turns and offline runner entry points emit MLflow-correlated traces with `mlflow_trace_id`/`mlflow_client_request_id` on final payloads when MLflow is enabled.
- Human feedback can be recorded through `POST /api/v1/traces/feedback`.
- Contributors can export annotated traces, run MLflow GenAI evaluation, and optimize DSPy programs with the scripts documented in `docs/how-to-guides/mlflow-workflows.md`.
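The feedback route can be called from any HTTP client. A minimal stdlib sketch follows; the payload fields (`trace_id`, `rating`, `comment`) are assumptions about the request body rather than documented schema, so verify them against the HTTP API reference:

```python
import json
from urllib import request


def build_feedback(trace_id: str, rating: int, comment: str = "") -> bytes:
    """Serialize a feedback payload; field names are illustrative assumptions."""
    return json.dumps(
        {"trace_id": trace_id, "rating": rating, "comment": comment}
    ).encode()


def post_feedback(base_url: str, payload: bytes) -> int:
    """POST the payload to the documented feedback route; return HTTP status."""
    req = request.Request(
        f"{base_url}/api/v1/traces/feedback",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # needs a running server
        return resp.status


payload = build_feedback("tr-123", rating=1, comment="helpful answer")
# post_feedback("http://localhost:8000", payload)  # uncomment with a live server
```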
## Architecture Overview
Read this after the quick start if you want the full system picture (entry points, ReAct orchestration, tools, Modal execution, persistent storage).
```mermaid
graph TB
    subgraph entry ["🚪 Entry Points"]
        CLI["fleet / fleet-rlm CLI"]
        WebUI["Web UI<br/>(React SPA)"]
        API["FastAPI<br/>(WS/REST)"]
        TUI["Ink TUI<br/>(standalone runtime)"]
        MCP["MCP Server"]
    end
    subgraph orchestration ["🧠 Orchestration Layer"]
        Agent["RLMReActChatAgent<br/>(dspy.Module)"]
        LMs["Planner / Delegate LMs"]
        History["Chat History"]
        Memory["Core Memory<br/>(Persona/Human/Scratchpad)"]
        DocCache["Document Cache"]
    end
    subgraph tools ["🔧 ReAct Tools"]
        DocTools["📄 load_document<br/>read_file_slice<br/>chunk_by_*"]
        RecursiveTools["🔄 rlm_query<br/>llm_query<br/>(recursive delegation)"]
        ExecTools["⚡ execute_code<br/>edit_file<br/>search_code"]
    end
    subgraph execution ["⚙️ Execution Layer"]
        Interpreter["ModalInterpreter<br/>(JSON protocol)"]
        Profiles["Execution Profiles:<br/>ROOT | DELEGATE | MAINTENANCE"]
    end
    subgraph cloud ["☁️ Cloud & Persistence"]
        Sandbox["Modal Sandbox<br/>(Python REPL + Driver)"]
        Volume[("💾 Modal Volume<br/>/data/<br/>• workspaces<br/>• docs/metadata")]
        Neon[("🐘 Neon Postgres<br/>• runs / steps<br/>• artifacts<br/>• tenants")]
        PostHog["📈 PostHog<br/>(LLM Observability)"]
    end
    WebUI -->|"WS / REST"| API
    CLI --> Agent
    API --> Agent
    TUI --> Agent
    MCP --> Agent
    Agent --> LMs
    Agent --> History
    Agent --> Memory
    Agent --> DocCache
    Agent --> DocTools
    Agent --> RecursiveTools
    Agent --> ExecTools
    API -.->|"Persistence"| Neon
    Agent -.->|"Traces"| PostHog
    DocTools --> Interpreter
    RecursiveTools --> Interpreter
    ExecTools --> Interpreter
    Interpreter --> Profiles
    Interpreter -->|"stdin/stdout<br/>JSON commands"| Sandbox
    Sandbox -->|"read/write"| Volume
    style entry fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style orchestration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style tools fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style execution fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style cloud fill:#fce4ec,stroke:#c2185b,stroke-width:2px
```
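The diagram shows the `ModalInterpreter` driving the sandbox over stdin/stdout with JSON commands. A conceptual sketch of such a line-delimited protocol is below; the message fields (`op`, `code`, `ok`, `output`) are illustrative assumptions, not fleet-rlm's actual wire format, and `exec()` stands in for the real sandboxed REPL:

```python
import json


def encode_command(code: str) -> str:
    """One JSON command per line, written to the sandbox driver's stdin."""
    return json.dumps({"op": "exec", "code": code}) + "\n"


def handle_command(line: str) -> str:
    """Toy driver loop body: run the command and reply on stdout.
    A real sandbox isolates execution; bare exec() here is only illustrative."""
    cmd = json.loads(line)
    scope: dict = {}
    try:
        exec(cmd["code"], scope)  # no isolation; illustration only
        return json.dumps({"ok": True, "output": scope.get("result")})
    except Exception as err:
        return json.dumps({"ok": False, "error": str(err)})


reply = handle_command(encode_command("result = 2 + 3"))
```

The same request/response framing works unchanged whether the driver runs locally or inside a Modal sandbox, which is what makes the interpreter swappable across execution profiles.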
## Docs and Guides
- Documentation index
- Explanation index
- Quick install + setup
- Configure Modal
- Runtime settings (LM/Modal diagnostics)
- MLflow tracing, feedback, eval, and optimization
- Deploying the server
- Using the MCP server
- Frontend ↔ Backend integration
- CLI reference
- HTTP API reference
- Auth modes
- Database architecture
- Source layout
## Advanced Features (Docs-First)

fleet-rlm also supports runtime diagnostics endpoints, WebSocket execution streams (`/api/v1/ws/execution`), multi-tenant Neon-backed persistence, and opt-in PostHog LLM analytics. Those workflows are documented in the guides and reference docs rather than front-loaded here.
## Contributing
Contributions are welcome. Start with CONTRIBUTING.md, then use AGENTS.md for repo-specific commands and quality gates.
## License
MIT License — see LICENSE.
Based on Recursive Language Modeling research by Alex L. Zhang (MIT CSAIL), Omar Khattab (Stanford), and Tim Kraska (MIT).