Claude Code Orchestra
An orchestration layer for Claude Code that routes tasks to the right LLM, enforces safety rules via hooks, and keeps context across sessions. Built for my own workflow managing seven production services -- sharing it because the patterns are reusable.
What This Does (30-Second Version)
I open an issue like "add retry logic to the payment API." The system:
- Plans -- an architect agent scopes the work and picks the right tools
- Implements -- a code agent writes the changes using GPT-5.2 Codex
- Validates -- hooks block the commit until schema is verified, tests pass, and imports trace back to entry points
- Reviews -- a judge agent runs a hostile review looking for dead code, security issues, and missed edge cases
I can stop or override at each step. Nothing ships without my approval.
What's Inside
| Component | What It Does |
|---|---|
| Skills (~24) | Workflow templates: deploy, debug, design-to-code, pipeline repair, anti-perfectionism gate |
| Hooks (~26) | Pre/post validation: blocks SQL without schema query, blocks commits with orphan files, requires proof before "done" |
| Rules (6) | Domain modules: DB safety, code quality, visual validation, Azure deploy, voice agent tuning |
| Agents (11) | Specialized roles: architect, coder, judge, researcher, reasoning specialist |
| MCP Servers (7) | LLM integrations: Gemini, GPT-5, Grok, Perplexity, DeepSeek, Playwright, LunarCrush |
| Capabilities (68) | Task-to-tool mappings with cost and latency metadata |
Architecture
```
User Request
     |
     v
[Intent Detection] --> Capability Registry (68 entries)
     |
     v
[Agent Selection] --> architect-planner | code-worker | code-judge
                      research-specialist | reasoning-specialist
                      realtime-specialist | gemini-specialist
     |
     v
[MCP Tool Dispatch] --> Gemini 3 Pro | GPT-5.2 | Grok 4 | Perplexity | DeepSeek
     |
     v
[Hook Validation] --> Pre-tool gates | Post-tool verification | Session lifecycle
     |
     v
[Output with Proof] --> Tests passed | Screenshots taken | API responses verified
```
Quick Start
```bash
# Clone and copy into your Claude Code config
git clone https://github.com/YOUR_USERNAME/claude-code-orchestra.git
cp -r claude-code-orchestra/.claude/* ~/.claude/

# Or cherry-pick what you need:
cp claude-code-orchestra/.claude/rules/code-quality.md ~/.claude/rules/
cp claude-code-orchestra/.claude/hooks/schema-verify.sh ~/.claude/hooks/
```
The 12 Hard Rules
Every rule exists because something broke in production. Rules with hooks are actively enforced: violations are blocked, not merely documented.
| # | Rule | Hook | What Happens |
|---|---|---|---|
| 1 | NO mock/fake data | -- | Show real errors or "NOT CONNECTED" |
| 2 | NO claiming "done" without proof | stop-verify.sh | Must show tests, screenshots, API responses |
| 3 | NO SQL against assumed schema | schema-verify.sh | Must query information_schema first |
| 4 | NO orphan files in commits | dead-code-check.sh | Must trace import path from entry point |
| 5 | NO bypassing debug | debug-first.sh | Must read logs before rewriting |
| 6 | NO verifying before pipeline completes | deploy-gate.sh | Must wait for CI/CD |
| 7 | NO cross-project DB access | -- | Check pwd first |
| 8 | NO pushing without CI | -- | Pipeline must pass |
| 9 | NO hardcoded credentials | -- | Key Vault/env vars only |
| 10 | NO destructive queries without WHERE | -- | Safety gate |
| 11 | Understand before changing | -- | Read status, search patterns, map deps |
| 12 | Generate options; human decides | -- | Present 2-3 approaches |
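Rule 10's safety gate, for example, reduces to a few lines of shell. This is an illustrative sketch, not the repository's actual hook; the `destructive_gate` name and exact matching logic are invented:

```shell
# destructive_gate SQL: hypothetical sketch of Rule 10.
# Blocks DELETE/UPDATE statements that carry no WHERE clause.
destructive_gate() {
  upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$upper" in
    DELETE*|UPDATE*)
      case "$upper" in
        *WHERE*) return 0 ;;                                 # scoped: allow
        *) echo "BLOCKED: no WHERE clause" >&2; return 2 ;;  # fail closed
      esac ;;
    *) return 0 ;;                                           # not destructive: allow
  esac
}

destructive_gate "DELETE FROM users"            && echo allow || echo block  # → block
destructive_gate "DELETE FROM users WHERE id=1" && echo allow || echo block  # → allow
```

The point is the fail-closed default: the gate returns non-zero unless the statement proves it is scoped.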
Skills Library (24)
Skills are autonomous workflow templates that combine agents, MCPs, and validation gates.
Core Workflow Skills
| Skill | Lines | What It Does |
|---|---|---|
| /multi-model-debate | 227 | 6-model council (GPT-5, Gemini, Grok, DeepSeek, Claude, Perplexity) with 5 rounds of cross-critique |
| /orchestrator | 571 | Planner -> Implementer -> Verifier flow with iteration loops |
| /enforce-capabilities | 242 | Auto-enriches plans with proper agent/skill/MCP usage |
| /smart-router | 285 | Intent-aware routing to the optimal agent based on task context |
Development Skills
| Skill | Lines | What It Does |
|---|---|---|
| /frontend | 3,869 | Design-to-code from screenshots, Figma, or specs |
| /fix-pipeline | 199 | Auto-diagnoses and repairs CI/CD failures |
| /scrap-reimplement | 140 | Destructive recovery after 3+ failed fix attempts |
| /pre-mortem | 176 | Risk assessment before high-stakes tasks |
| /ship-it | 199 | Anti-perfectionism gate -- declare "good enough" |
Operations Skills
| Skill | Lines | What It Does |
|---|---|---|
| /end-of-session | 651 | Handover docs, git sync, session persistence |
| /learning-loop | 393 | Extract patterns, update decisions across sessions |
| /morning-update | 198 | Daily status rollup, blockers, session scan |
Hook System (26)
Hooks enforce rules at tool boundaries. They run as shell scripts before/after every tool call.
Pre-Tool Hooks (Prevent Bad States)
schema-verify.sh -> Blocks SQL until schema is queried
dead-code-check.sh -> Blocks commits with orphan files
debug-first.sh -> Blocks rewrites until logs are read
deploy-gate.sh -> Blocks verification until pipeline completes
capability-enforcer -> Validates tool availability before use
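To make the pre-tool pattern concrete, here is a minimal sketch in the shape Claude Code hooks take: the hook receives a JSON description of the pending tool call, and a non-zero exit blocks it. The repository's actual schema-verify.sh is not reproduced here; the `sql_gate` function, the marker-file handshake, and the raw-string matching are simplifications for illustration (a real hook would parse the JSON payload properly, e.g. with jq):

```shell
# sql_gate PAYLOAD [MARKER]: hypothetical sketch of a pre-tool gate.
# Blocks SQL-looking commands until a marker file records that the
# schema was queried first.
sql_gate() {
  payload="$1"
  marker="${2:-/tmp/.schema-verified}"
  case "$payload" in
    *SELECT*|*"INSERT INTO"*|*"UPDATE "*|*"DELETE FROM"*)
      [ -f "$marker" ] && return 0        # schema already checked: allow
      echo "Query information_schema before running SQL" >&2
      return 2 ;;                         # block the tool call
  esac
  return 0                                # not SQL: allow
}

marker=$(mktemp -u)                       # marker absent: gate is closed
sql_gate '{"tool_input":{"command":"SELECT * FROM users"}}' "$marker" \
  && echo allow || echo block             # → block
touch "$marker"                           # schema verified: gate opens
sql_gate '{"tool_input":{"command":"SELECT * FROM users"}}' "$marker" \
  && echo allow || echo block             # → allow
rm -f "$marker"
```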
Post-Tool Hooks (Validate and Learn)
stop-verify.sh -> Requires proof before session end
quality-validation -> Auto-lint, type-check after code generation
test-result-tracker -> Tracks test pass/fail ratio over time
visual-verify.sh -> Screenshot validation for UI changes
Session Lifecycle Hooks
session-start-enhanced.sh -> Loads git context, project state, memory
pre-compact-save.sh -> Saves critical state before context compaction
session-end-learning.sh -> Extracts learnings, updates patterns
Agent System (11 Consolidated)
Originally 26 agents, consolidated to 11 after finding that a smaller set of specialists, each with a clear MCP focus, outperforms many narrow single-purpose agents:
| Agent | Role | MCP Focus |
|---|---|---|
| architect-planner | Design plans, first-principles reasoning | Gemini (thinking=high) |
| code-worker | Execute plans, multi-file coding | GPT-5.2 Codex |
| code-judge | Hostile review, dead code audit | Azure + Grok |
| research-specialist | Academic, SEC, geo-based research | Perplexity |
| reasoning-specialist | Math, algorithms, complex logic | DeepSeek V3.2 |
| realtime-specialist | X/Twitter, trending, social intel | Grok |
| gemini-specialist | Multimodal: vision, PDFs, video | Gemini 3 Pro |
| azure-devops-specialist | CI/CD pipelines, infrastructure | Azure CLI |
| worktree-specialist | Git branching, parallel dev | Git |
| cleanup-specialist | Archive, refactor, tech debt | Code analysis |
| voice-specialist | Voice AI, TTS tuning, SSML | ElevenLabs |
MCP Integrations (7)
| MCP Server | Key Tools | Use Case |
|---|---|---|
| Gemini 3 Pro | vision, image gen, deep research, search | Multimodal analysis, document parsing |
| Azure AI Foundry | GPT-5.2, GPT-5 Pro, DeepSeek V3.2 | Code generation, brainstorming, reasoning |
| Grok 4 | chat, code, X/Twitter search, social pulse | Real-time data, social intelligence |
| Perplexity | research, reason, search | Evidence-based research with citations |
| Playwright | browser automation, screenshots | Visual testing, authenticated browsing |
| ElevenLabs | TTS, STT, conversational AI | Voice agent development |
| LunarCrush | crypto social metrics | Sentiment analysis, social dominance |
Capability Registry
The capabilities-registry.json is the brain of the system -- 68 entries mapping task patterns to optimal tools:
```json
{
  "id": "codex-builder",
  "name": "Codex Builder (GPT-5.2)",
  "triggers": ["build feature", "implement", "refactor"],
  "mcp": "azure-ai-foundry",
  "model": "gpt-5.2-codex",
  "cost_tier": "high",
  "latency": "medium"
}
```
Plans are automatically enriched with capability annotations before execution:
```
3. Implement OAuth2 flow
   -> Agent: codex-builder
   -> Skills: azure-unified
   -> MCP: azure-ai-foundry, memory
   -> Confidence: 0.85
```
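Trigger matching itself can be sketched as a simple dispatch. This is illustrative only: `route`, the research triggers, and the architect-planner fallback are invented here; the real lookup reads capabilities-registry.json with its full metadata.

```shell
# route TASK: illustrative trigger dispatch. The codex-builder triggers
# mirror the registry entry above; everything else is invented.
route() {
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *"build feature"*|*implement*|*refactor*) echo "codex-builder" ;;
    *research*)                               echo "research-specialist" ;;
    *)                                        echo "architect-planner" ;;
  esac
}

route "Implement OAuth2 flow"        # → codex-builder
route "Research competitor pricing"  # → research-specialist
```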
Rules System (6 Domain Modules)
Rules load on-demand based on task context:
| Rule Module | Trigger | Key Enforcements |
|---|---|---|
| code-quality.md | new file, refactor, commit | Schema-first SQL, serialization tests, bias detection |
| db-safety.md | SQL, migration, database | Cross-DB isolation, pre-query verification, safe migrations |
| visual-validation.md | screenshot, UI, design | Playwright + Gemini validation, B2B SaaS standards |
| azure-deploy.md | deploy, pipeline, Azure | Post-push verification, rollback procedures |
| voice-agent-tuning.md | voice, ElevenLabs, TTS | Arabic/Hebrew TTS, 3-lever humanization, SSML rules |
| project-config.md | project setup, workspace | Session lifecycle, context management, FPF-Lite reasoning |
Where It's Used
In daily use across seven services I maintain -- Azure Function Apps for a trading platform (174K+ monthly executions), voice AI transcription, compliance tools, and internal utilities. The hooks and rules evolved from real incidents: a bulk file deletion that took down a function app, a SQL query against the wrong database, a deploy that passed CI but had zero working functions.
Numbers from actual usage:
- 15 Azure Function Apps managed through this system
- 174,000+ function executions monitored monthly
- 300+ automated tests maintained across projects
- 148+ development sessions with cross-session memory
Design Notes (From an Ops Brain)
The architecture choices reflect an operations background more than a CS background:
- Fail closed, not open: Hooks block bad actions by default. A commit with orphan files is stopped, not warned about.
- Enforce, don't document: Every rule has a corresponding hook. "Don't hardcode credentials" is a nice guideline; a pre-commit hook that greps for API keys is an actual gate.
- Human-in-the-loop: The system generates 2-3 options; I pick. No autonomous decision-making on architectural choices (Rule #12).
- Explicit over magical: Capability routing uses a JSON registry with cost/latency metadata, not hidden heuristics.
- Hostile review by default: code-judge runs adversarial audits looking for dead code, schema drift, and security gaps.
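The credential gate mentioned above can be sketched like this. The function name and regex are hypothetical, not the repository's actual hook; real scanners use much broader pattern sets.

```shell
# secret_scan TEXT: hypothetical sketch of a credential gate.
# Flags assignments of a quoted literal to a credential-shaped name.
secret_scan() {
  if printf '%s\n' "$1" | grep -Eiq \
      '(api[_-]?key|secret|password)[[:space:]]*=[[:space:]]*"'; then
    echo "BLOCKED: possible hardcoded credential" >&2
    return 1
  fi
  return 0
}

secret_scan 'API_KEY = "sk-live-123"'     && echo clean || echo blocked  # → blocked
secret_scan 'key = os.environ["API_KEY"]' && echo clean || echo blocked  # → clean
```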
Repository Structure
```
.claude/
  CLAUDE.md                     # Root config: identity, 12 rules, routing
  capabilities-registry.json    # 68 capability entries with triggers + metadata
  rules/
    code-quality.md             # Language standards, schema-first, test gates
    db-safety.md                # Cross-DB isolation, migration safety
    visual-validation.md        # Playwright + Gemini screenshot validation
    azure-deploy.md             # CI/CD safety, rollback procedures
    voice-agent-tuning.md       # ElevenLabs, Arabic TTS, SSML
    project-config.md           # Session lifecycle, context management
  hooks/
    schema-verify.sh            # Pre-tool: block SQL without schema query
    dead-code-check.sh          # Pre-tool: block commits with orphan files
    stop-verify.sh              # Post-tool: require proof before "done"
    deploy-gate.sh              # Pre-tool: block premature verification
    session-start-enhanced.sh   # Session: load git context + project state
    README.md                   # Hook system documentation
  skills/
    multi-model-debate/
      instructions.md           # 6-model council protocol
    enforce-capabilities/
      instructions.md           # Plan enrichment with capability annotations
    ship-it/
      instructions.md           # Anti-perfectionism gate
    pre-mortem/
      instructions.md           # Risk assessment protocol
    README.md                   # How to create skills
  agents/
    architect-planner.md        # Design/planning agent
    code-worker.md              # Implementation agent
    code-judge.md               # Hostile review agent
    README.md                   # Agent system overview
  docs/
    architecture.md             # Detailed architecture with diagrams
```
What You Can Learn From This
- How to structure a multi-agent Claude Code setup -- not just one CLAUDE.md, but a full system
- How to enforce rules with hooks -- shell scripts that run at tool boundaries
- How to route tasks to optimal models -- capability registry with cost/latency metadata
- How to persist context across sessions -- status.json, decisions.log, Memory MCP
- Real production lessons -- every "Origin:" comment is a real incident that shaped a rule
What This Isn't
- Not a framework or library -- it's a configuration system for Claude Code (skills, hooks, rules, agents)
- Not generalized for arbitrary setups -- it reflects my specific stack (Azure, PostgreSQL, Python) and my specific paranoia (schema verification, dead code detection)
- Not a replacement for proper CI/CD -- the hooks add local safety gates, but production deploys still go through Azure DevOps pipelines
- The agent/skill counts evolve and may not match the README exactly at any given commit
Why Not LangGraph / CrewAI / AutoGen?
Those are application frameworks for building multi-agent systems. This is a development environment -- it orchestrates how I write and ship code, not how end-users interact with AI. The closest analogy is a heavily customized IDE config, not a product architecture.
Contributing
PRs welcome. This is a living system that evolves with new models and tools.
If you add a new rule, include an "Origin:" comment explaining what production incident created it. Rules without origin stories are just opinions.
Motivation & Lessons Learned
This started after I accidentally deployed a function app with an orphan file that imported a module I'd already deleted. The deploy succeeded, the health check passed, and the first real request crashed. I wrote a pre-commit hook that night to trace imports from entry points. That was hook #1. The rest grew from similar incidents.
The hardest lesson was about schema drift. I wrote a SQL migration that assumed a column existed because the plan said it would. It didn't. The query silently returned empty results for two days before anyone noticed. Now every SQL operation queries information_schema first, no exceptions, even in dev. That rule alone has prevented more bugs than any other.
What surprised me: the multi-model debate pattern (routing the same question to 6 different LLMs and synthesizing disagreements) consistently caught architectural issues that no single model flagged. The models disagree in useful ways: one catches security issues, another catches performance implications, a third spots edge cases. The disagreements are the signal.
Skills Demonstrated
Multi-agent AI orchestration, production safety hooks, schema-first SQL development, multi-model consensus patterns, session persistence, CI/CD integration (Azure DevOps), MCP Protocol integration, cost/latency-aware model routing, defensive coding practices.
License
MIT