sattyamjjain/ferrumdeck

Production-grade AgentOps control plane for safe AI agent execution. Dual-plane architecture: Rust governance engine + Python LLM runtime + Next.js dashboard. Deny-by-default policies, budget enforcement, approval gates & audit logging.


FerrumDeck

AgentOps Control Plane — A production-grade platform for running agentic AI workflows with deterministic governance, comprehensive observability, and measurable reliability.

Overview
Key Features
Architecture
Quick Start
Project Structure
Components
API Reference
Configuration
Security Model
Observability
Evaluation Framework
Development
Deployment
License

Overview

FerrumDeck solves the critical challenge of running AI agents safely in production. While LLMs are probabilistic and unpredictable, production systems require deterministic governance, audit trails, and budget controls.

The Problem

AI agents can make costly mistakes (token spend, wrong tool calls)
Prompt injection attacks can bypass safety measures
No visibility into what agents are doing in production
Difficult to reproduce and debug agent failures
Compliance requirements demand audit trails

The Solution

FerrumDeck provides a dual-plane architecture:

Control Plane (Rust)	Data Plane (Python)
Deterministic state	Probabilistic execution
Policy enforcement	LLM interactions
Budget tracking	Tool calls via MCP
Audit logging	Step execution
Approval gates	Artifact storage

Key Features

Governance

Deny-by-Default Tools: Only explicitly allowed tools can be called
Approval Gates: High-risk actions require human approval before execution
Budget Enforcement: Runs are terminated automatically when a limit is exceeded (tokens, cost, time)
Policy Engine: Configurable rules for tool access and risk management

Observability

OpenTelemetry Integration: Full distributed tracing with GenAI semantic conventions
Cost Tracking: Real-time token counting and cost calculation per run
Jaeger UI: Visual trace exploration and debugging
Audit Trail: Immutable logging of every action for compliance

Reproducibility

Versioned Registry: Agents, tools, and prompts are version-controlled
Step-Level Replay: Debug specific steps with exact inputs
Sortable IDs: ULID-based identifiers for time-ordered, collision-resistant tracking

Quality

Evaluation Framework: Deterministic test suites for agent workflows
Regression Gating: CI blocks merges if agent quality degrades
Baseline Comparisons: Track performance across versions

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              Clients                                      │
│              (Dashboard / CLI / SDK / CI Pipelines)                      │
└─────────────────────────────────────────────────────────────────────────┘
        │                           │                           │
        ▼                           ▼                           ▼
┌─────────────────┐    ┌──────────────────────────────────────────────────┐
│    DASHBOARD    │    │                CONTROL PLANE (Rust)               │
│   (Next.js)     │    │                                                   │
│                 │    │  ┌───────────┐  ┌──────────┐  ┌──────────────┐   │
│ • Runs Monitor  │◀──▶│  │  Gateway  │  │  Policy  │  │   Registry   │   │
│ • Approvals     │    │  │  (Axum)   │  │  Engine  │  │  (Versioned) │   │
│ • Analytics     │    │  │           │  │          │  │              │   │
│ • Audit Trail   │    │  │ • REST    │  │ • Budget │  │ • Agents     │   │
│ • Evals UI      │    │  │ • SSE     │  │ • Rules  │  │ • Tools      │   │
│                 │    │  │ • Auth    │  │ • Gates  │  │ • Versions   │   │
└─────────────────┘    │  └───────────┘  └──────────┘  └──────────────┘   │
   :3001/:8000         │                                                   │
                       │  ┌───────────┐  ┌──────────┐  ┌──────────────┐   │
                       │  │   Audit   │  │   DAG    │  │    OTEL      │   │
                       │  │    Log    │  │Scheduler │  │    Setup     │   │
                       │  └───────────┘  └──────────┘  └──────────────┘   │
                       └──────────────────────────────────────────────────┘
                                              │
                          ┌───────────────────┼───────────────────┐
                          ▼                   ▼                   ▼
                   ┌───────────────┐   ┌───────────────┐   ┌───────────┐
                   │   PostgreSQL  │   │     Redis     │   │   Jaeger  │
                   │   (pgvector)  │   │    Streams    │   │    UI     │
                   │               │   │               │   │           │
                   │ • runs/steps  │   │ • Job Queue   │   │ • Traces  │
                   │ • agents/tools│   │ • Pub/Sub     │   │ • GenAI   │
                   │ • audit_events│   │               │   │   Spans   │
                   └───────────────┘   └───────┬───────┘   └───────────┘
                        :5433                  │                :16686
                                               ▼
              ┌───────────────────────────────────────────────────────────┐
              │                      DATA PLANE (Python)                    │
              │                                                             │
              │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
              │  │    Worker    │  │     LLM      │  │    MCP Router    │  │
              │  │              │  │   Executor   │  │                  │  │
              │  │ • Poll Queue │  │              │  │ • GitHub MCP     │  │
              │  │ • Execute    │  │ • Claude     │  │ • Filesystem MCP │  │
              │  │ • Report     │  │ • GPT-4      │  │ • Custom Tools   │  │
              │  │ • Retry      │  │ • litellm    │  │ • Policy Checks  │  │
              │  └──────────────┘  └──────────────┘  └──────────────────┘  │
              └───────────────────────────────────────────────────────────┘

Data Flow

Client creates a run via POST /v1/runs
Gateway authenticates, validates, creates run in PostgreSQL
Gateway enqueues first step to Redis Stream
Worker polls Redis, fetches step details from Gateway
Worker executes step (LLM call, tool call, etc.) with tracing
Worker reports result back to Gateway
Gateway updates state, checks budget, enqueues next step
Repeat until run completes or fails
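
The loop above can be mimicked in a few lines. The sketch below is an in-memory stand-in, not FerrumDeck code: the `Gateway` class, `run_worker_once`, the `deque` used in place of the Redis Stream, and the rule that every step makes one tool call are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    steps_total: int                 # hypothetical: a fixed plan of N steps
    steps_done: int = 0
    tool_calls: int = 0
    status: str = "created"
    log: list = field(default_factory=list)


class Gateway:
    """Toy stand-in for the control plane: owns state, budget, and the queue."""

    def __init__(self, max_tool_calls: int):
        self.queue = deque()         # stands in for the Redis Stream
        self.runs = {}
        self.max_tool_calls = max_tool_calls

    def create_run(self, run_id: str, steps_total: int) -> Run:
        run = Run(run_id, steps_total)
        self.runs[run_id] = run
        run.status = "queued"
        self.queue.append((run_id, 0))            # enqueue the first step
        return run

    def report(self, run_id: str, step_index: int, used_tool: bool) -> None:
        run = self.runs[run_id]
        run.steps_done += 1
        run.tool_calls += int(used_tool)
        run.log.append(step_index)
        if run.tool_calls > self.max_tool_calls:  # budget check after each step
            run.status = "budget_killed"
        elif run.steps_done == run.steps_total:
            run.status = "completed"
        else:
            run.status = "running"
            self.queue.append((run_id, step_index + 1))  # enqueue the next step


def run_worker_once(gateway: Gateway) -> bool:
    """Poll one job, pretend to execute it, report back. Returns False when idle."""
    if not gateway.queue:
        return False
    run_id, step_index = gateway.queue.popleft()
    gateway.report(run_id, step_index, used_tool=True)
    return True


gateway = Gateway(max_tool_calls=2)
ok = gateway.create_run("run_a", steps_total=2)
over = gateway.create_run("run_b", steps_total=5)
while run_worker_once(gateway):
    pass
print(ok.status, over.status)
```

The second run stops early because the state owner, not the worker, decides whether another step may be enqueued.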

Service Ports

Service	Port	Description
Gateway	`8080`	REST API (Rust control plane)
Dashboard	`3001` / `8000`	Next.js UI (dev) / Static server
PostgreSQL	`5433`	Database (pgvector enabled)
Redis	`6379`	Queue and cache
Jaeger UI	`16686`	Distributed tracing
OTel Collector	`4317` / `4318`	gRPC / HTTP endpoints

Quick Start

Prerequisites

Rust 1.80+ (rustup.rs)
Python 3.12+
Docker & Docker Compose
uv (docs.astral.sh/uv) - Fast Python package manager

1. Clone and Setup

git clone https://github.com/sattyamjjain/ferrumdeck.git
cd ferrumdeck

# Copy environment file
cp .env.example .env

# Start infrastructure (PostgreSQL, Redis, Jaeger)
make dev-up

# Install all dependencies
make install

# Run database migrations
make db-migrate

# Build everything
make build

2. Start Services

# Terminal 1: Start the Gateway (Rust)
make run-gateway
# Gateway running at http://localhost:8080

# Terminal 2: Start a Worker (Python)
make run-worker

3. Create Your First Run

# Use the development API key (dev mode)
export API_KEY="fd_dev_key_abc123"

# Create a run
curl -X POST http://localhost:8080/v1/runs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agt_safe_pr_agent",
    "input": {
      "task": "Review the latest changes in the repository"
    }
  }'

# Check run status (set RUN_ID to the "id" field returned by the create call)
curl "http://localhost:8080/v1/runs/$RUN_ID" \
  -H "Authorization: Bearer $API_KEY"

4. Open the Dashboard

# Start the dashboard (static server)
make run-dashboard
# Open http://localhost:8000

# Or run the Next.js development server
cd nextjs && npm run dev
# Open http://localhost:3001

The dashboard provides a complete UI for:

Monitoring runs in real-time
Approving/rejecting tool calls
Managing agents and tools
Viewing analytics and audit trails

5. View Traces

Open Jaeger UI at http://localhost:16686 to see distributed traces.

Project Structure

ferrumdeck/
├── .github/
│   └── workflows/           # CI/CD pipelines
│       └── ci.yml          # Main CI (lint, test, build, eval gate)
│
├── contracts/               # API Contracts
│   ├── openapi/            # OpenAPI 3.1 specifications
│   │   └── control-plane.openapi.yaml
│   └── jsonschema/         # JSON Schema definitions
│       ├── run.schema.json
│       ├── policy.schema.json
│       ├── tool.schema.json
│       └── workflow.schema.json
│
├── rust/                    # Control Plane (Rust)
│   ├── crates/             # Shared libraries
│   │   ├── fd-core/        # IDs, errors, config, time utilities
│   │   ├── fd-policy/      # Policy engine, budgets, rules
│   │   ├── fd-registry/    # Agent/tool versioning
│   │   ├── fd-audit/       # Audit logging, redaction
│   │   ├── fd-storage/     # PostgreSQL repos + Redis queue
│   │   ├── fd-dag/         # DAG scheduler
│   │   └── fd-otel/        # OpenTelemetry setup
│   └── services/
│       └── gateway/        # Axum HTTP API service
│
├── python/                  # Data Plane (Python)
│   └── packages/
│       ├── fd-runtime/     # Workflow execution, tracing, client
│       ├── fd-worker/      # Queue consumer, step execution
│       ├── fd-mcp-router/  # MCP tool routing with policy checks
│       ├── fd-mcp-tools/   # MCP server implementations (git, test runner)
│       ├── fd-cli/         # Command-line interface
│       └── fd-evals/       # Evaluation framework with scorers
│
├── nextjs/                  # Dashboard (Next.js 16.1)
│   ├── src/
│   │   ├── app/            # App Router pages
│   │   │   └── (dashboard)/ # Dashboard route group
│   │   │       ├── runs/       # Run monitoring & detail
│   │   │       ├── approvals/  # Approval queue
│   │   │       ├── agents/     # Agent registry
│   │   │       ├── tools/      # Tool registry
│   │   │       ├── workflows/  # Workflow management
│   │   │       ├── analytics/  # Usage charts
│   │   │       ├── audit/      # Audit trail viewer
│   │   │       ├── evals/      # Evaluation results
│   │   │       ├── policies/   # Policy management
│   │   │       ├── logs/       # Container logs
│   │   │       └── settings/   # API keys & config
│   │   ├── components/     # React components (shadcn/ui)
│   │   ├── hooks/          # Custom React hooks
│   │   ├── lib/            # API client, utilities
│   │   └── types/          # TypeScript interfaces
│   └── Dockerfile          # Multi-stage production build
│
├── evals/                   # Evaluation Suite
│   ├── suites/             # Test suite definitions (YAML)
│   │   ├── smoke.yaml      # Quick smoke tests
│   │   └── regression.yaml # Full regression suite
│   ├── datasets/           # Test datasets
│   ├── agents/             # Agent configs for testing
│   ├── scorers/            # Scorer configurations
│   └── reports/            # Generated reports (gitignored)
│
├── examples/                # Example Agents
│   └── safe-pr-agent/      # PR review agent example
│       ├── agent.yaml      # Agent configuration
│       └── workflow.yaml   # Multi-step workflow
│
├── deploy/
│   └── docker/
│       ├── compose.dev.yaml    # Local development stack
│       ├── Dockerfile.gateway  # Gateway Docker build
│       └── Dockerfile.worker   # Worker Docker build
│
├── config/
│   └── mcp-config.json     # MCP server configuration
│
├── observability/
│   └── otel/
│       └── collector.yaml  # OTel Collector configuration
│
├── docs/                    # Documentation
│   ├── architecture/       # System design docs
│   ├── adr/                # Architecture decisions
│   ├── security/           # Security documentation
│   └── runbooks/           # Operational guides
│
├── Cargo.toml              # Rust workspace manifest
├── pyproject.toml          # Python workspace manifest (uv)
├── Makefile                # Development commands
└── .env.example            # Environment template

Components

Control Plane (Rust)

fd-core — Foundation Primitives

Type-safe IDs, error handling, and configuration.

ID System (ULID-based with prefixes):

TenantId     // ten_01ARZ3NDEKTSV4RRFFQ69G5FAV
AgentId      // agt_01ARZ3NDEKTSV4RRFFQ69G5FAV
RunId        // run_01ARZ3NDEKTSV4RRFFQ69G5FAV
StepId       // stp_01ARZ3NDEKTSV4RRFFQ69G5FAV
PolicyRuleId // pol_01ARZ3NDEKTSV4RRFFQ69G5FAV
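
To make the ID format concrete, here is a minimal Python sketch of a prefixed ULID, assuming the standard ULID layout (48-bit millisecond timestamp followed by 80 random bits, Crockford base32). The helper names `new_ulid` and `new_id` are hypothetical and are not taken from fd-core.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford base32 alphabet


def new_ulid(timestamp_ms=None) -> str:
    """Return a 26-character ULID: 48-bit ms timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    randomness = int.from_bytes(os.urandom(10), "big")
    value = (timestamp_ms << 80) | randomness   # 128 bits in total
    chars = []
    for _ in range(26):                         # 26 * 5 = 130 bits, top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_id(prefix: str, timestamp_ms=None) -> str:
    """Attach a type prefix such as 'run' or 'stp' to a fresh ULID."""
    return f"{prefix}_{new_ulid(timestamp_ms)}"


earlier = new_id("run", timestamp_ms=1_700_000_000_000)
later = new_id("run", timestamp_ms=1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the leading characters, IDs created in different milliseconds sort chronologically as plain strings.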

Error Types:

NotFound, Validation, Unauthorized, Forbidden
PolicyDenied, BudgetExceeded, ApprovalRequired
Database, Queue, ExternalService, Internal

fd-policy — Policy Engine

Governance rules enforcement with deny-by-default security.

Tool Allowlist:

pub struct ToolAllowlist {
    allowed_tools: Vec<String>,      // Explicitly allowed
    approval_required: Vec<String>,  // Require human approval
    denied_tools: Vec<String>,       // Explicitly denied
}
// Priority: Denied > Approval Required > Allowed > Default Deny
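
The priority order in the comment above can be sketched in Python as follows. This is an illustration rather than the fd-policy implementation: the status strings "denied" and "requires_approval" mirror the router example later in this README, while "allowed" and the use of sets are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAllowlist:
    allowed_tools: set = field(default_factory=set)
    approval_required: set = field(default_factory=set)
    denied_tools: set = field(default_factory=set)

    def check(self, tool_name: str) -> str:
        # Priority: Denied > Approval Required > Allowed > Default Deny
        if tool_name in self.denied_tools:
            return "denied"
        if tool_name in self.approval_required:
            return "requires_approval"
        if tool_name in self.allowed_tools:
            return "allowed"
        return "denied"  # anything unlisted is denied by default


allowlist = ToolAllowlist(
    allowed_tools={"read_file", "write_file"},
    approval_required={"write_file"},
    denied_tools={"delete_file"},
)
for tool in ("read_file", "write_file", "delete_file", "send_email"):
    print(tool, "->", allowlist.check(tool))
```

Note that `write_file` appears in two lists and resolves to the stricter outcome, and `send_email` is denied without being listed anywhere.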

Budget System:

pub struct Budget {
    max_input_tokens: Option<u64>,   // Default: 100,000
    max_output_tokens: Option<u64>,  // Default: 50,000
    max_total_tokens: Option<u64>,   // Default: 150,000
    max_tool_calls: Option<u32>,     // Default: 50
    max_wall_time_ms: Option<u64>,   // Default: 5 minutes
    max_cost_cents: Option<u64>,     // Default: $5.00
}
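
A budget check over these fields might look like the Python sketch below. The `Usage` record and the `exceeded_limits` helper are hypothetical names, and treating `None` as "no limit" is an assumption that mirrors the `Option` types above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_input_tokens: Optional[int] = 100_000
    max_output_tokens: Optional[int] = 50_000
    max_total_tokens: Optional[int] = 150_000
    max_tool_calls: Optional[int] = 50
    max_wall_time_ms: Optional[int] = 300_000   # 5 minutes
    max_cost_cents: Optional[int] = 500         # $5.00


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: int = 0
    wall_time_ms: int = 0
    cost_cents: int = 0


def exceeded_limits(budget: Budget, usage: Usage) -> list:
    """Return the name of every limit the usage has gone past (None = no limit)."""
    observed = {
        "max_input_tokens": usage.input_tokens,
        "max_output_tokens": usage.output_tokens,
        "max_total_tokens": usage.input_tokens + usage.output_tokens,
        "max_tool_calls": usage.tool_calls,
        "max_wall_time_ms": usage.wall_time_ms,
        "max_cost_cents": usage.cost_cents,
    }
    return [
        name for name, value in observed.items()
        if getattr(budget, name) is not None and value > getattr(budget, name)
    ]


usage = Usage(input_tokens=120_000, output_tokens=40_000, tool_calls=12, cost_cents=90)
print(exceeded_limits(Budget(), usage))
```

A non-empty result is the signal on which a run would be terminated.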

Tool Risk Levels:

Level	Description	Examples
Low	Read-only operations	read_file, list_directory
Medium	Limited mutations	write_file (with approval)
High	External communications	send_email, create_pr
Critical	Security-sensitive	deploy, payment, delete

fd-registry — Versioned Registry

Immutable, version-controlled storage for agents and tools.

// Agent versions are immutable - changes require new versions
pub struct AgentVersion {
    id: AgentVersionId,
    agent_id: AgentId,
    version: String,           // Semantic version: "1.2.3"
    system_prompt: String,
    model: String,             // "claude-sonnet-4-20250514"
    allowed_tools: Vec<String>,
    model_params: Value,       // temperature, max_tokens, etc.
    changelog: String,
}
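
The "changes require new versions" rule can be illustrated with a frozen dataclass in Python. This is a sketch of the idea only; the reduced field set and the `next_version` helper are assumptions, not the fd-registry API.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentVersion:
    agent_id: str
    version: str
    system_prompt: str
    model: str
    allowed_tools: tuple = ()   # a tuple, so the tool list cannot be mutated in place
    changelog: str = ""


def next_version(current: AgentVersion, new_version: str, changelog: str, **changes):
    """Derive a new immutable version instead of editing the existing one."""
    return replace(current, version=new_version, changelog=changelog, **changes)


v1 = AgentVersion(
    agent_id="agt_safe_pr_agent",
    version="1.2.3",
    system_prompt="Review pull requests.",
    model="claude-sonnet-4-20250514",
    allowed_tools=("read_file",),
)
v2 = next_version(
    v1, "1.3.0", "Allow listing directories",
    allowed_tools=v1.allowed_tools + ("list_directory",),
)

try:
    v1.model = "another-model"
except FrozenInstanceError:
    print("v1 is read-only")
print(v1.version, v2.version, v2.allowed_tools)
```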

fd-storage — Database & Queue

PostgreSQL repositories with SQLx compile-time checked queries:

RunsRepo, StepsRepo, AgentsRepo, ToolsRepo
PoliciesRepo, ApiKeysRepo, AuditRepo, WorkflowsRepo

Redis Streams for reliable job queuing:

Consumer groups for horizontal scaling
Automatic acknowledgment and retry
Message format: StepJob with context

fd-audit — Audit Trail

Append-only, immutable event logging:

Run creation/completion
Tool calls (allowed/denied)
Policy decisions
Approval resolutions
API key usage

Gateway Service

Axum-based HTTP API with middleware:

Authentication: API keys (SHA256 hashed) or OAuth2 JWT
Rate Limiting: Per-tenant request limiting
Request ID: X-Request-ID for distributed tracing
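
As an illustration of the hashed API key check, the Python sketch below stores only SHA-256 digests and compares them in constant time. The function names and the in-memory `stored_hashes` mapping are hypothetical; the gateway itself is written in Rust and its lookup is not shown in this README.

```python
import hashlib
import hmac
from typing import Optional


def hash_api_key(api_key: str) -> str:
    """Hex SHA-256 digest of the key; only this value would be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def extract_bearer_token(authorization_header: str) -> Optional[str]:
    """Return the token from 'Bearer <token>', or None if malformed."""
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authenticate(authorization_header: str, stored_hashes: dict) -> Optional[str]:
    """Return the tenant whose stored hash matches the presented key, else None."""
    token = extract_bearer_token(authorization_header)
    if token is None:
        return None
    presented = hash_api_key(token)
    for tenant_id, stored in stored_hashes.items():
        if hmac.compare_digest(presented, stored):  # constant-time comparison
            return tenant_id
    return None


stored_hashes = {"ten_example": hash_api_key("fd_dev_key_abc123")}
print(authenticate("Bearer fd_dev_key_abc123", stored_hashes))
print(authenticate("Bearer wrong_key", stored_hashes))
```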

Data Plane (Python)

fd-runtime — Runtime Primitives

Models:

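from enum import Enum, auto
from pydantic import BaseModel

# auto() is a placeholder; the actual member values are not shown in this README.
class RunStatus(Enum):
    CREATED = auto()
    QUEUED = auto()
    RUNNING = auto()
    WAITING_APPROVAL = auto()
    COMPLETED = auto()
    FAILED = auto()
    BUDGET_KILLED = auto()
    POLICY_BLOCKED = auto()

class StepType(Enum):
    LLM = auto()
    TOOL = auto()
    RETRIEVAL = auto()
    SANDBOX = auto()
    APPROVAL = auto()

class Budget(BaseModel):
    max_input_tokens: int = 100_000
    max_output_tokens: int = 50_000
    max_total_tokens: int = 150_000
    max_tool_calls: int = 50
    max_wall_time_ms: int = 300_000  # 5 minutes
    max_cost_cents: int = 500        # $5.00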

Control Plane Client:

client = ControlPlaneClient(base_url, api_key)
run = await client.create_run(agent_id, input_data)
await client.submit_step_result(run_id, step_id, output, status)

Tracing (GenAI Semantic Conventions):

with trace_llm_call(model="claude-sonnet-4", run_id=run.id) as span:
    response = await llm.complete(messages)
    set_llm_response_attributes(span, response)
    # Automatically tracks: tokens, cost, latency

fd-worker — Step Executor

Queue consumer that executes individual steps:

async def run_worker():
    consumer = RedisQueueConsumer(redis_url)
    executor = StepExecutor(
        control_plane_url,
        api_key,
        mcp_servers=load_mcp_config(),
        tool_allowlist=allowlist,
    )

    while running:
        job = await consumer.poll()
        if job:
            await executor.execute(job)

Retry Strategy (exponential backoff):

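# Assumes tenacity, whose wait_exponential bounds are given in seconds.
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(RETRYABLE_EXCEPTIONS),
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=30),  # 1 s floor, 30 s ceiling
)
async def execute_with_retry(step):
    ...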
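
To see what an exponential backoff schedule looks like, the sketch below computes the wait before each retry in plain Python, assuming a 1 second floor, a 30 second ceiling, and doubling between attempts. It is a stand-alone illustration; `backoff_delays` is a hypothetical helper and does not reproduce any retry library's exact formula.

```python
def backoff_delays(retries: int, min_delay: float = 1.0,
                   max_delay: float = 30.0, base: float = 2.0) -> list:
    """Wait (in seconds) before each retry: min_delay * base**n, capped at max_delay."""
    delays = []
    for attempt in range(retries):
        raw = min_delay * base ** attempt
        delays.append(min(max_delay, raw))
    return delays


print(backoff_delays(6))
# With a limit of three attempts there are at most two waits:
print(backoff_delays(2))
```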

fd-mcp-router — Tool Router

Deny-by-default MCP tool routing:

class MCPRouter:
    async def call_tool(self, tool_name: str, args: dict) -> ToolResult:
        # 1. Check allowlist (deny-by-default)
        status = self.allowlist.check(tool_name)
        if status == "denied":
            return ToolResult(success=False, error="Tool not allowed")
        if status == "requires_approval":
            # Pause and wait for human approval
            ...

        # 2. Find server and execute
        server = self.find_server(tool_name)
        return await server.call(tool_name, args)

Supported MCP Servers:

GitHub (@modelcontextprotocol/server-github)
Filesystem (@modelcontextprotocol/server-filesystem)
Custom servers (stdio or HTTP-based)

fd-cli — Command Line Interface

# Runs
fd run create --agent agt_xxx --input '{"task": "..."}'
fd run status <run_id>
fd run logs <run_id> --follow

# Registry
fd agent list
fd agent get <agent_id>
fd tool list

# Approvals
fd approval list
fd approval approve <approval_id>
fd approval reject <approval_id> --reason "..."

# Evaluations
fd eval run --dataset evals/datasets/safe-pr-agent.jsonl
fd eval report --output reports/latest.html

fd-evals — Evaluation Framework

Deterministic testing for agent workflows:

runner = EvalRunner(
    scorers=[
        FilesChangedScorer(),
        PRCreatedScorer(),
        TestPassScorer(),
        LintScorer(),
    ],
    control_plane_url=url,
)

summary = runner.run_eval(
    dataset_path="evals/datasets/safe-pr-agent.jsonl",
    agent_id="agt_safe_pr_agent",
    max_tasks=20,
)
# Returns: pass_rate, avg_score, cost_per_task, regressions
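
How per-task results might roll up into such a summary is sketched below in Python. The `TaskResult` fields, the 0.0 to 1.0 score scale, and reporting `pass_rate` as a percentage are all assumptions; the real summary schema is not shown here, and regression detection is left out.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    score: float      # hypothetical scale: 0.0 (worst) to 1.0 (best)
    passed: bool
    cost_cents: int


def summarize(results: list) -> dict:
    """Aggregate per-task results into pass rate, average score, and cost per task."""
    if not results:
        raise ValueError("cannot summarize an empty result set")
    count = len(results)
    return {
        "pass_rate": 100.0 * sum(r.passed for r in results) / count,
        "avg_score": sum(r.score for r in results) / count,
        "cost_per_task": sum(r.cost_cents for r in results) / count,
    }


results = [
    TaskResult("pr-review-001", 1.0, True, 10),
    TaskResult("pr-review-002", 1.0, True, 20),
    TaskResult("pr-review-003", 0.5, True, 30),
    TaskResult("pr-review-004", 0.0, False, 40),
]
print(summarize(results))
```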

fd-mcp-tools — MCP Server Implementations

Built-in MCP tool servers for common operations:

# Git operations server
from fd_mcp_tools import GitMCPServer

# Test runner server
from fd_mcp_tools import TestRunnerMCPServer

Dashboard (Next.js)

A professional admin UI built with Next.js 16.1.1, React 19.2, and Tailwind CSS 4.

Key Pages

Page	Description
`/overview`	Dashboard home with key metrics and recent activity
`/runs`	Real-time run monitoring with step timeline visualization
`/runs/{runId}`	Detailed run view with step-by-step execution trace
`/approvals`	Approval queue with approve/reject actions
`/agents`	Agent registry with version management
`/tools`	Tool registry and MCP server status
`/workflows`	Multi-step workflow definitions and runs
`/analytics`	Usage charts, cost tracking, performance metrics
`/audit`	Immutable audit trail viewer with filtering
`/evals`	Evaluation suite results and comparisons
`/policies`	Policy configuration and management
`/threats`	Security threat detection and monitoring
`/logs`	Container and service logs viewer
`/settings`	API key management and configuration

Technology Stack

Next.js 16.1.1      # App Router with standalone output
React 19.2.3        # Concurrent features, Server Components
Tailwind CSS 4      # Utility-first styling with dark theme
TanStack Query 5    # Server state with polling (2-3s intervals)
TanStack Table 8    # Data tables with sorting/filtering
Radix UI            # Accessible component primitives
shadcn/ui           # Pre-built component library
Recharts 3          # Analytics visualizations
nuqs 2              # URL state management
sonner 2            # Toast notifications

Running the Dashboard

# Development (hot reload)
cd nextjs && npm install && npm run dev
# Open http://localhost:3001

# Production build
npm run build
npm start  # Runs on port 3001

# Static dashboard (simple HTTP server)
make run-dashboard
# Open http://localhost:8000

# Docker
docker build -t ferrumdeck-dashboard nextjs/
docker run -p 3001:3001 \
  -e GATEWAY_URL=http://gateway:8080 \
  -e FD_API_KEY=fd_dev_key_abc123 \
  ferrumdeck-dashboard

Environment Variables

GATEWAY_URL=http://localhost:8080     # Control plane URL
FD_API_KEY=fd_dev_key_abc123          # API key for authentication
NEXT_PUBLIC_POLL_INTERVAL=2000        # Polling interval (ms)

API Proxy (BFF Pattern)

The dashboard proxies all API calls through /api/v1/* routes:

// src/app/api/v1/[...path]/route.ts
// Forwards requests to GATEWAY_URL with authentication

API Reference

Authentication

All API requests require authentication via Authorization header:

# API Key
Authorization: Bearer fd_tenant_abc123xyz

# Or OAuth2 JWT
Authorization: Bearer eyJhbGciOiJSUzI1NiIs...

Endpoints

Runs

Method	Endpoint	Description
POST	`/v1/runs`	Create a new run
GET	`/v1/runs`	List runs with filtering
GET	`/v1/runs/{runId}`	Get run details
POST	`/v1/runs/{runId}/cancel`	Cancel a running run
GET	`/v1/runs/{runId}/steps`	List steps in a run
POST	`/v1/runs/{runId}/steps/{stepId}`	Submit step result (worker)
POST	`/v1/runs/{runId}/check-tool`	Check tool policy before execution

Registry

Method	Endpoint	Description
GET	`/v1/registry/agents`	List agents
POST	`/v1/registry/agents`	Create agent
GET	`/v1/registry/agents/{agentId}`	Get agent details
GET	`/v1/registry/agents/{agentId}/versions`	List agent versions
POST	`/v1/registry/agents/{agentId}/versions`	Create agent version
GET	`/v1/registry/agents/{agentId}/stats`	Get agent statistics
GET	`/v1/registry/tools`	List tools
POST	`/v1/registry/tools`	Create tool
GET	`/v1/registry/tools/{toolId}`	Get tool details
GET	`/v1/registry/mcp-servers`	List MCP servers

Approvals

Method	Endpoint	Description
GET	`/v1/approvals`	List pending approvals
PUT	`/v1/approvals/{approvalId}`	Approve or reject

Policies

Method	Endpoint	Description
GET	`/v1/policies`	List policies
POST	`/v1/policies`	Create policy
GET	`/v1/policies/{policyId}`	Get policy details
PATCH	`/v1/policies/{policyId}`	Update policy
DELETE	`/v1/policies/{policyId}`	Delete policy

API Keys

Method	Endpoint	Description
GET	`/v1/api-keys`	List API keys
GET	`/v1/api-keys/{keyId}`	Get API key details
POST	`/v1/api-keys/{keyId}/revoke`	Revoke an API key

Workflows

Method	Endpoint	Description
POST	`/v1/workflows`	Create workflow
GET	`/v1/workflows`	List workflows
GET	`/v1/workflows/{workflowId}`	Get workflow
GET	`/v1/workflows/{workflowId}/runs`	List workflow runs
POST	`/v1/workflow-runs`	Execute workflow
GET	`/v1/workflow-runs/{runId}`	Get execution status
POST	`/v1/workflow-runs/{runId}/cancel`	Cancel workflow run
GET	`/v1/workflow-runs/{runId}/executions`	List step executions
POST	`/v1/workflow-runs/{runId}/executions`	Create step execution
POST	`/v1/workflow-runs/{runId}/executions/{executionId}`	Submit step result

Health & Documentation

Method	Endpoint	Description
GET	`/health`	Liveness probe
GET	`/ready`	Readiness probe
GET	`/docs`	Swagger UI documentation
GET	`/api-docs/openapi.json`	OpenAPI specification

Example: Create a Run

curl -X POST http://localhost:8080/v1/runs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agt_safe_pr_agent",
    "input": {
      "task": "Review PR #123 in repo owner/repo",
      "repository": "owner/repo",
      "pr_number": 123
    },
    "config": {
      "budget": {
        "max_total_tokens": 50000,
        "max_cost_cents": 100
      }
    }
  }'

Response:

{
  "id": "run_01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "agent_id": "agt_safe_pr_agent",
  "status": "queued",
  "created_at": "2024-12-24T10:00:00Z"
}
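
The same request can be issued from Python with only the standard library. The sketch below builds the request without sending it, so it runs without a gateway; `build_create_run_request` and `send` are hypothetical helper names, and the body shape is copied from the curl example above.

```python
import json
import urllib.request


def build_create_run_request(base_url, api_key, agent_id, task, max_cost_cents=None):
    """Build (but do not send) a POST /v1/runs request."""
    body = {"agent_id": agent_id, "input": {"task": task}}
    if max_cost_cents is not None:
        body["config"] = {"budget": {"max_cost_cents": max_cost_cents}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the JSON response (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_create_run_request(
    "http://localhost:8080", "fd_dev_key_abc123", "agt_safe_pr_agent",
    "Review PR #123 in repo owner/repo", max_cost_cents=100,
)
print(request.get_method(), request.full_url)
# With the gateway running:
# run = send(request); print(run["id"], run["status"])
```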

Configuration

Environment Variables

Create a .env file from .env.example:

# ============================================
# Application
# ============================================
FERRUMDECK_ENV=development
FERRUMDECK_LOG_LEVEL=debug
FERRUMDECK_LOG_FORMAT=pretty  # or "json" for production

# ============================================
# Gateway
# ============================================
GATEWAY_HOST=0.0.0.0
GATEWAY_PORT=8080
GATEWAY_WORKERS=4

# ============================================
# Database (PostgreSQL)
# ============================================
DATABASE_URL=postgres://ferrumdeck:ferrumdeck@localhost:5433/ferrumdeck
DATABASE_MAX_CONNECTIONS=20
DATABASE_MIN_CONNECTIONS=5

# ============================================
# Queue (Redis)
# ============================================
REDIS_URL=redis://localhost:6379
REDIS_QUEUE_PREFIX=fd:queue:

# ============================================
# LLM Providers
# ============================================
ANTHROPIC_API_KEY=sk-ant-api03-xxx
OPENAI_API_KEY=sk-xxx
DEFAULT_MODEL=claude-sonnet-4-20250514

# ============================================
# OpenTelemetry
# ============================================
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=ferrumdeck
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0

# ============================================
# Worker
# ============================================
FD_API_KEY=fd_dev_key_abc123
CONTROL_PLANE_URL=http://localhost:8080
WORKER_CONCURRENCY=4
WORKER_MAX_RETRIES=3

# ============================================
# OAuth2 (Optional)
# ============================================
OAUTH2_ENABLED=false
OAUTH2_JWKS_URI=https://your-provider/.well-known/jwks.json
OAUTH2_ISSUER=https://your-provider/
OAUTH2_AUDIENCE=api://ferrumdeck
OAUTH2_TENANT_CLAIM=tenant_id

MCP Server Configuration

Configure MCP servers in config/mcp-config.json:

{
  "servers": [
    {
      "name": "github",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    {
      "name": "filesystem",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    }
  ],
  "allowlist": {
    "allowed": [
      "read_file", "list_directory", "search_files",
      "get_file_contents", "list_commits", "get_pull_request"
    ],
    "approval_required": [
      "write_file", "create_file", "create_pull_request",
      "create_issue", "push_files"
    ],
    "denied": [
      "delete_file", "delete_branch", "merge_pull_request"
    ]
  }
}
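
The `${GITHUB_TOKEN}` placeholder implies that environment values are substituted when the file is loaded. How FerrumDeck does this is not shown here; the Python sketch below is one way to do it, with `expand_placeholders` as a hypothetical helper and a trimmed copy of the config inlined for the example.

```python
import json
from string import Template

CONFIG_TEXT = """
{
  "servers": [
    {
      "name": "github",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"}
    }
  ]
}
"""


def expand_placeholders(value, env):
    """Recursively replace ${NAME} in strings; raise KeyError if NAME is unset."""
    if isinstance(value, str):
        return Template(value).substitute(env)
    if isinstance(value, list):
        return [expand_placeholders(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_placeholders(item, env) for key, item in value.items()}
    return value


# In real use, pass os.environ instead of a literal mapping.
config = expand_placeholders(json.loads(CONFIG_TEXT), {"GITHUB_TOKEN": "ghp_example"})
print(config["servers"][0]["env"])
```

Failing with `KeyError` on an unset variable is a deliberate choice in this sketch: a server started with an empty token is harder to debug than a config that refuses to load.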

Security Model

Defense in Depth

FerrumDeck implements multiple security layers:

┌─────────────────────────────────────────────────────────┐
│ Layer 1: Authentication                                  │
│   • API Keys (SHA256 hashed, scoped)                    │
│   • OAuth2/JWT with tenant claims                       │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Deny-by-Default Tools                          │
│   • Explicit allowlist required                         │
│   • Risk level classification                           │
│   • Per-agent tool restrictions                         │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Budget Enforcement                             │
│   • Token limits (input, output, total)                 │
│   • Cost limits (in cents)                              │
│   • Time limits (wall clock)                            │
│   • Automatic run termination                           │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Approval Gates                                 │
│   • Human-in-the-loop for sensitive actions             │
│   • Configurable per tool                               │
│   • Timeout with auto-rejection                         │
├─────────────────────────────────────────────────────────┤
│ Layer 5: Audit Trail                                    │
│   • Immutable event logging                             │
│   • Every action recorded                               │
│   • Compliance-ready                                    │
└─────────────────────────────────────────────────────────┘

Threat Model

Assumption: Prompt injection cannot be fully prevented.

Strategy: Containment, not prevention.

Threat	Mitigation
Malicious tool calls	Deny-by-default allowlist
Token exhaustion	Budget limits with auto-kill
Data exfiltration	Allowlist blocks unauthorized tools
Privilege escalation	Scoped API keys, tenant isolation
Audit tampering	Append-only, immutable logging

Observability

OpenTelemetry Integration

FerrumDeck uses OpenTelemetry with GenAI semantic conventions:

Tracked Attributes:

gen_ai.system              = "anthropic" | "openai"
gen_ai.request.model       = "claude-sonnet-4-20250514"
gen_ai.usage.input_tokens  = 1234
gen_ai.usage.output_tokens = 5678
gen_ai.usage.cost_usd      = 0.0234

ferrumdeck.run.id          = "run_xxx"
ferrumdeck.step.id         = "stp_xxx"
ferrumdeck.agent.id        = "agt_xxx"
ferrumdeck.tenant.id       = "ten_xxx"

Jaeger UI

Access traces at http://localhost:16686:

Search by run ID, agent ID, or error status
View step execution timeline
Analyze token usage and costs
Debug failures with full context

Cost Tracking

Automatic cost calculation based on model pricing:

Model	Input ($/1M)	Output ($/1M)
claude-opus-4	$15.00	$75.00
claude-sonnet-4	$3.00	$15.00
gpt-4o	$2.50	$10.00
gpt-4o-mini	$0.15	$0.60
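
Given the prices in the table, per-call cost is a linear function of token counts. The Python sketch below uses `Decimal` to avoid floating-point drift; the function names are hypothetical, and rounding up to whole cents is an assumption about how a cents-based budget would be charged.

```python
from decimal import ROUND_CEILING, Decimal

# (input, output) price in USD per 1M tokens, copied from the table above.
PRICING_USD_PER_MILLION = {
    "claude-opus-4": (Decimal("15.00"), Decimal("75.00")),
    "claude-sonnet-4": (Decimal("3.00"), Decimal("15.00")),
    "gpt-4o": (Decimal("2.50"), Decimal("10.00")),
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
}
ONE_MILLION = Decimal(1_000_000)


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one call: tokens / 1M * price, summed over input and output."""
    input_price, output_price = PRICING_USD_PER_MILLION[model]
    return (input_price * input_tokens + output_price * output_tokens) / ONE_MILLION


def cost_cents(model: str, input_tokens: int, output_tokens: int) -> int:
    """Same cost in whole cents, rounded up (an assumption, see lead-in)."""
    cents = cost_usd(model, input_tokens, output_tokens) * 100
    return int(cents.to_integral_value(rounding=ROUND_CEILING))


print(cost_usd("claude-sonnet-4", 1234, 5678))
print(cost_cents("claude-sonnet-4", 1234, 5678))
```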

Example Agents

Safe PR Agent

A flagship example demonstrating FerrumDeck's governance features. Located in examples/safe-pr-agent/.

Agent Configuration (agent.yaml):

name: safe-pr-agent
description: |
  Reads a repository, analyzes code, proposes changes,
  runs tests in sandbox, and creates a pull request.
  Every action is permissioned, traced, and cost-accounted.

default_model: claude-sonnet-4-20250514

# Read-only tools allowed by default
allowed_tools:
  - read_file
  - list_files
  - search_code

# These require human approval
approval_required_tools:
  - write_file
  - create_pr

# Governance limits
budget:
  max_input_tokens: 50000
  max_output_tokens: 20000
  max_tool_calls: 30
  max_wall_time_ms: 180000  # 3 minutes
  max_cost_cents: 100       # $1

Create Your Own Agent:

# Copy the example
cp -r examples/safe-pr-agent examples/my-agent

# Edit the configuration
vim examples/my-agent/agent.yaml

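# Register with the control plane. The request body is sent as JSON, so
# convert the YAML first (this assumes PyYAML is available to python3).
python3 -c 'import json, sys, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout)' \
  < examples/my-agent/agent.yaml |
curl -X POST http://localhost:8080/v1/registry/agents \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary @-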

Evaluation Framework

Running Evaluations

# Run full evaluation suite
./scripts/run-evals.sh

# Run specific dataset
fd eval run \
  --dataset evals/datasets/safe-pr-agent.jsonl \
  --agent agt_safe_pr_agent \
  --output evals/reports/latest.json

# Compare against baseline
fd eval compare \
  --baseline evals/reports/baseline.json \
  --current evals/reports/latest.json

Evaluation Dataset Format

{"task_id": "pr-review-001", "input": {"task": "Review PR #1"}, "expected": {"files_changed": true}}
{"task_id": "pr-review-002", "input": {"task": "Review PR #2"}, "expected": {"files_changed": true}}
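
Each line is a standalone JSON object (JSONL). A small Python loader for this shape could look like the following sketch; `EvalTask` and `parse_dataset` are hypothetical names, and only the three keys shown above are assumed.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = {"task_id", "input", "expected"}


@dataclass
class EvalTask:
    task_id: str
    input: dict
    expected: dict


def parse_dataset(text: str) -> list:
    """Parse JSONL text into EvalTask records, skipping blank lines."""
    tasks = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        tasks.append(EvalTask(record["task_id"], record["input"], record["expected"]))
    return tasks


SAMPLE = """\
{"task_id": "pr-review-001", "input": {"task": "Review PR #1"}, "expected": {"files_changed": true}}

{"task_id": "pr-review-002", "input": {"task": "Review PR #2"}, "expected": {"files_changed": true}}
"""
tasks = parse_dataset(SAMPLE)
print([task.task_id for task in tasks])
```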

CI Integration

Evaluations run automatically on PRs to main:

# .github/workflows/evals.yml
- name: Run evaluations
  run: fd eval run --suite smoke --parallel 4

- name: Check for regressions
  run: |
    if ! jq -e '.pass_rate >= 80' report.json > /dev/null; then
      echo "Eval gate FAILED: Pass rate below 80%"
      exit 1
    fi
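
The same gate can be written in Python, which makes the handling of fractional or missing pass rates explicit. This is a sketch under assumptions: `eval_gate` is a hypothetical helper, and `pass_rate` is taken to be a percentage in the report, as the 80 threshold above suggests.

```python
import json


def eval_gate(report_text: str, min_pass_rate: float = 80.0) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    report = json.loads(report_text)
    pass_rate = report.get("pass_rate")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(pass_rate, bool) or not isinstance(pass_rate, (int, float)):
        return 1  # a missing or malformed pass rate fails the gate
    return 0 if pass_rate >= min_pass_rate else 1


print(eval_gate('{"pass_rate": 92.5}'))
print(eval_gate('{"pass_rate": 79.9}'))
print(eval_gate('{}'))
```

In a CI step, the return value would be passed to `sys.exit` so that a failing gate blocks the merge.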

Development

Prerequisites

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Docker
# See: https://docs.docker.com/get-docker/

Common Commands

# Start development infrastructure
make dev-up

# Stop infrastructure
make dev-down

# Install all dependencies
make install

# Build everything
make build

# Run all tests
make test

# Format code
make fmt

# Lint code
make lint

# Run full CI checks locally
make check

# Run database migrations
make db-migrate

# Start gateway
make run-gateway

# Start worker
make run-worker

Running Tests

# All tests
make test

# Rust tests
cargo test --workspace

# Python tests
uv run pytest python/packages/fd-evals/tests/ -v
uv run pytest python/packages/fd-worker/tests/ -v

# Specific package
cargo test -p fd-policy
uv run pytest python/packages/fd-runtime

# With coverage
cargo tarpaulin --out Html
uv run pytest --cov=fd_runtime --cov-report=html

# Next.js type checking
cd nextjs && npx tsc --noEmit

Code Quality

# All checks
make check

# Rust
cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings

# Python
uv run ruff check python/
uv run ruff format --check python/
uv run pyright python/

# Next.js
cd nextjs && npm run lint

Deployment

Production Checklist

Docker Deployment

# Build all images
docker build -t ferrumdeck-gateway -f deploy/docker/Dockerfile.gateway .
docker build -t ferrumdeck-worker -f deploy/docker/Dockerfile.worker .
docker build -t ferrumdeck-dashboard nextjs/

# Run with Docker Compose (development)
docker compose --env-file .env -f deploy/docker/compose.dev.yaml up -d

# Services will be available at:
#   Gateway:   http://localhost:8080
#   Dashboard: http://localhost:3001
#   Jaeger:    http://localhost:16686

Kubernetes

Helm charts coming soon. For now, use the Docker images with your preferred orchestration.

Minimum resources per service:

Gateway: 512MB RAM, 0.5 CPU
Worker: 1GB RAM, 1 CPU (scales horizontally)
Dashboard: 256MB RAM, 0.25 CPU

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (make check)
Commit (git commit -m 'feat: add amazing feature')
Push (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Rust: Follow rustfmt defaults, clippy warnings as errors
Python: Follow ruff rules (see pyproject.toml), pyright type checking
TypeScript: ESLint with Next.js config
Commits: Use conventional commits (feat:, fix:, docs:, etc.)
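For the commit convention, a local check of the subject line might look like the sketch below. The list of accepted types is a commonly used subset and an assumption here; the project's own tooling, if any, may accept a different set.

```python
import re

# Commonly used Conventional Commits types; the project's accepted set may differ.
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w./-]+\))?"  # optional scope, e.g. (gateway)
    r"!?"               # optional breaking-change marker
    r": \S.*"           # colon, space, then a non-empty description
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line follows the pattern above."""
    return COMMIT_PATTERN.match(subject) is not None
```

For example, `is_conventional("fix(gateway): reject expired keys")` is true, while a subject without a type prefix is rejected.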

See AGENTS.md for detailed coding guidelines and single-test commands.

License

Apache-2.0 — see LICENSE for details.

Acknowledgments

Rust Control Plane:

Axum — Web framework
SQLx — Async SQL with compile-time checks
Tower — Middleware framework
Tokio — Async runtime

Python Data Plane:

litellm — Unified LLM interface
MCP — Model Context Protocol
Pydantic — Data validation
Tenacity — Retry with backoff

Dashboard:

Next.js — React framework
Tailwind CSS — Utility-first CSS
shadcn/ui — Component library
TanStack Query — Server state management
Radix UI — Accessible primitives
Recharts — Chart library

Observability:

OpenTelemetry — Tracing framework
Jaeger — Distributed tracing UI

Created December 23, 2025

Updated February 14, 2026
