
fleet-rlm


Secure, cloud-sandboxed Recursive Language Models (RLM) with DSPy and Modal.

fleet-rlm gives AI agents a secure cloud sandbox for long-context code and document work, with a Web UI-first experience, recursive delegation, and DSPy-aligned tooling.

Paper | Docs | Contributing


Quick Start

Install and launch the Web UI in under a minute:

# Option 1: install as a runnable tool
uv tool install fleet-rlm
fleet web

Or in your active environment:

# Option 2: regular environment install
uv pip install fleet-rlm
fleet web

Open http://localhost:8000 in your browser.

fleet web is the primary interactive interface. The published package already includes the built frontend assets, so end users do not need bun or a separate frontend toolchain.

What You Get

  • Browser-first RLM chat (fleet web)
  • A focused Web UI with RLM Workspace, Volumes, and Settings
  • Secure Modal-backed long-context execution for code/doc workflows
  • WebSocket-first (WS) runtime streaming for chat and execution events
  • GET /api/v1/auth/me as the canonical frontend identity/bootstrap surface
  • Multitenant Entra auth with Neon-backed tenant admission when AUTH_MODE=entra
  • Runtime configuration and diagnostics from the Web UI settings
  • MLflow-backed trace correlation, feedback capture, offline evaluation, and DSPy optimization workflows
  • Optional MCP server surface (fleet-rlm serve-mcp)

Common Commands

# Standalone terminal chat
fleet-rlm chat --trace-mode compact

# Explicit API server
fleet-rlm serve-api --port 8000

# MCP server
fleet-rlm serve-mcp --transport stdio

# Scaffold assets for Claude Code
fleet-rlm init --list

Runtime Notes

  • The current Web UI shell supports RLM Workspace, Volumes, and Settings.
  • Legacy taxonomy, skills, memory, and analytics browser routes redirect to the supported surfaces.
  • Product chat transport is WS-first (/api/v1/ws/chat).
  • Frontend identity/bootstrap is GET /api/v1/auth/me.
  • Runtime model updates from Settings are hot-applied in-process (/api/v1/runtime/settings) and reflected on /api/v1/runtime/status.
  • Secret inputs in Runtime Settings are write-only.
  • In AUTH_MODE=entra, bearer tokens are validated against Entra JWKS and admitted only for active Neon tenants.
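The "write-only secrets" note above means secret values can be submitted through Runtime Settings but are never echoed back on reads. A minimal sketch of that masking idea is below; the field names (api_key, modal_token_secret) are illustrative assumptions, not fleet-rlm's actual settings schema.

```python
# Sketch of write-only secret handling: secrets are accepted on writes,
# but read responses replace them with a placeholder.
# Field names here are illustrative, not fleet-rlm's actual schema.

SECRET_FIELDS = {"api_key", "modal_token_secret"}

def mask_secrets(settings: dict) -> dict:
    """Return a read-safe copy with non-empty secret values masked."""
    return {
        k: ("********" if k in SECRET_FIELDS and v else v)
        for k, v in settings.items()
    }

stored = {"model": "gpt-4o-mini", "api_key": "sk-live-abc123"}
print(mask_secrets(stored))  # api_key is masked, model passes through
```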

Running From Source (Contributors)

# from repo root
uv sync --extra dev --extra server
uv run fleet web
uv run fastapi dev

For release/packaging workflows, uv build runs the frontend build sync automatically (bun is required in repo checkouts that include src/frontend).

See AGENTS.md and CONTRIBUTING.md for the full contributor setup and quality gates.

MLflow Workflows

fleet-rlm supports MLflow as a GenAI tracing and evaluation plane on top of the existing PostHog runtime telemetry.

# from repo root
make mlflow-server

# in another shell
export MLFLOW_ENABLED=true
export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
export MLFLOW_EXPERIMENT=fleet-rlm
uv run fleet web

  • Live chat turns and offline runner entry points emit MLflow-correlated traces with mlflow_trace_id / mlflow_client_request_id on final payloads when MLflow is enabled.
  • Human feedback can be recorded through POST /api/v1/traces/feedback.
  • Contributors can export annotated traces, run MLflow GenAI evaluation, and optimize DSPy programs with the scripts documented in docs/how-to-guides/mlflow-workflows.md.
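As a sketch of the feedback capture step, the helper below builds a JSON body for POST /api/v1/traces/feedback. Only the endpoint path and the mlflow_trace_id correlation are documented above; the score/comment field names are assumptions for illustration.

```python
import json

# Illustrative feedback payload for POST /api/v1/traces/feedback.
# mlflow_trace_id correlates the feedback to an emitted trace; the
# score/comment fields are assumed names, not a documented schema.

def build_feedback(trace_id: str, score: int, comment: str = "") -> str:
    """Serialize a feedback record keyed by its MLflow trace id."""
    if not 1 <= score <= 5:
        raise ValueError("score must be in 1..5")
    return json.dumps({
        "mlflow_trace_id": trace_id,
        "score": score,
        "comment": comment,
    })

payload = build_feedback("tr-123", 5, "accurate answer")
print(payload)
```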

Architecture Overview

Read this after the quick start if you want the full system picture (entry points, ReAct orchestration, tools, Modal execution, persistent storage).
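The recursive delegation that rlm_query / llm_query provide can be illustrated with a toy sketch: when a context exceeds one call's budget, split it, delegate sub-queries, and aggregate the partial answers. This is a minimal stand-in, not fleet-rlm's implementation; stub_llm_query replaces a real LM call, which in fleet-rlm runs against the Modal sandbox.

```python
# Toy sketch of recursive delegation over a long context.
# `stub_llm_query` stands in for a real LM call.

def stub_llm_query(prompt: str, words: list[str]) -> str:
    # Pretend the "model" counts occurrences of the prompt's last word.
    keyword = prompt.split()[-1]
    return str(words.count(keyword))

def rlm_query(prompt: str, words: list[str], budget: int = 256) -> str:
    """Delegate recursively whenever the context exceeds the word budget."""
    if len(words) <= budget:
        return stub_llm_query(prompt, words)
    mid = len(words) // 2
    left = rlm_query(prompt, words[:mid], budget)
    right = rlm_query(prompt, words[mid:], budget)
    return str(int(left) + int(right))  # aggregate partial counts

doc = ["error"] * 300 + ["ok"] * 500
print(rlm_query("count occurrences of error", doc))  # → 300
```

The aggregation step here (summing counts) is task-specific; in a real RLM the root model decides how to combine delegate answers.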

graph TB
    subgraph entry ["🚪 Entry Points"]
        CLI["fleet / fleet-rlm CLI"]
        WebUI["Web UI<br/>(React SPA)"]
        API["FastAPI<br/>(WS/REST)"]
        TUI["Ink TUI<br/>(standalone runtime)"]
        MCP["MCP Server"]
    end

    subgraph orchestration ["🧠 Orchestration Layer"]
        Agent["RLMReActChatAgent<br/>(dspy.Module)"]
        LMs["Planner / Delegate LMs"]
        History["Chat History"]
        Memory["Core Memory<br/>(Persona/Human/Scratchpad)"]
        DocCache["Document Cache"]
    end

    subgraph tools ["🔧 ReAct Tools"]
        DocTools["📄 load_document<br/>read_file_slice<br/>chunk_by_*"]
        RecursiveTools["🔄 rlm_query<br/>llm_query<br/>(recursive delegation)"]
        ExecTools["⚡ execute_code<br/>edit_file<br/>search_code"]
    end

    subgraph execution ["⚙️ Execution Layer"]
        Interpreter["ModalInterpreter<br/>(JSON protocol)"]
        Profiles["Execution Profiles:<br/>ROOT | DELEGATE | MAINTENANCE"]
    end

    subgraph cloud ["☁️ Cloud & Persistence"]
        Sandbox["Modal Sandbox<br/>(Python REPL + Driver)"]
        Volume[("💾 Modal Volume<br/>/data/<br/>• workspaces<br/>• docs/metadata")]
        Neon[("🐘 Neon Postgres<br/>• runs / steps<br/>• artifacts<br/>• tenants")]
        PostHog["📈 PostHog<br/>(LLM Observability)"]
    end

    WebUI -->|"WS / REST"| API
    CLI --> Agent
    API --> Agent
    TUI --> Agent
    MCP --> Agent

    Agent --> LMs
    Agent --> History
    Agent --> Memory
    Agent --> DocCache

    Agent --> DocTools
    Agent --> RecursiveTools
    Agent --> ExecTools

    API -.->|"Persistence"| Neon
    Agent -.->|"Traces"| PostHog

    DocTools --> Interpreter
    RecursiveTools --> Interpreter
    ExecTools --> Interpreter

    Interpreter --> Profiles
    Interpreter -->|"stdin/stdout<br/>JSON commands"| Sandbox
    Sandbox -->|"read/write"| Volume

    style entry fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style orchestration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style tools fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style execution fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style cloud fill:#fce4ec,stroke:#c2185b,stroke-width:2px
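The "stdin/stdout JSON commands" arrow between ModalInterpreter and the sandbox driver can be pictured as a JSON-lines loop: one command in per line, one result out per line. The sketch below is a toy illustration of that shape; the op names ("ping", "exec") and response fields are assumptions, not fleet-rlm's actual protocol.

```python
import io
import json

# Toy JSON-lines driver loop in the spirit of the interpreter->sandbox
# arrow above. Command and field names are illustrative assumptions.

def handle(cmd: dict) -> dict:
    if cmd.get("op") == "ping":
        return {"ok": True, "result": "pong"}
    if cmd.get("op") == "exec":
        # A real driver would run this inside the sandboxed REPL.
        local_vars: dict = {}
        exec(cmd.get("code", ""), {}, local_vars)
        return {"ok": True, "vars": sorted(local_vars)}
    return {"ok": False, "error": f"unknown op: {cmd.get('op')}"}

def driver_loop(stdin, stdout):
    """Read one JSON command per line, write one JSON result per line."""
    for line in stdin:
        stdout.write(json.dumps(handle(json.loads(line))) + "\n")

inp = io.StringIO('{"op": "ping"}\n{"op": "exec", "code": "x = 1 + 1"}\n')
out = io.StringIO()
driver_loop(inp, out)
print(out.getvalue())
```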

Docs and Guides

Advanced Features (Docs-First)

fleet-rlm also supports runtime diagnostics endpoints, WebSocket execution streams (/api/v1/ws/execution), multi-tenant Neon-backed persistence, and opt-in PostHog LLM analytics. Those workflows are documented in the guides/reference docs rather than front-loaded here.

Contributing

Contributions are welcome. Start with CONTRIBUTING.md, then use AGENTS.md for repo-specific commands and quality gates.

License

MIT License — see LICENSE.

Based on Recursive Language Modeling research by Alex L. Zhang (MIT CSAIL), Omar Khattab (Stanford), and Tim Kraska (MIT).
