
RAG in Production

Status: actively building, dropping soon. Star the repo to get notified.

Production-grade RAG on Azure with Node.js and LangGraph. The next step after rag-from-scratch.


What this is

If you've built a RAG prototype, you already know the fundamentals - embeddings, chunking, retrieval, generation. That part is well-covered.

This repo is about what comes after that. The decisions that only surface when you try to run RAG reliably at scale: infrastructure that doesn't drift, an index that stays fresh when your data changes, agents that make retrieval actually trustworthy, and security that holds up without secrets scattered across config files.

It's a reference architecture. Opinionated, fully working, built on Azure - every layer wired together the way it needs to be in production, not just in a demo.


The 7 pillars

1. Infrastructure as Code with Azure Bicep

Every Azure resource - AI Search, CosmosDB, Azure OpenAI, Functions, Web Apps, Key Vault - is declared in Bicep. Nothing is created manually through the portal. You deploy the whole stack with a single command, tear it down, and redeploy identically. That's the guarantee IaC gives you.
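The single-command deploy might look like this with the Azure CLI (the resource group, template, and parameter names here are placeholders, not taken from the repo):

```shell
# Create the resource group, then deploy every resource declared in main.bicep.
az group create --name rag-prod-rg --location westeurope
az deployment group create \
  --resource-group rag-prod-rg \
  --template-file main.bicep \
  --parameters environment=prod
```

Tearing down is the inverse: delete the resource group and rerun the deployment to recreate it identically.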

2. Event-driven ingestion pipeline

Documents change. Policies get updated, manuals get revised, data gets added. Most RAG demos ignore this entirely and assume a static corpus. This repo uses Azure Functions triggered by Event Grid to detect changes in Blob Storage and re-index automatically. Your vector index stays in sync without manual intervention.
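The trigger's core decision - mapping a storage event to an index operation - can be sketched in plain TypeScript. The type and function names are illustrative, not from the repo, but the event types and subject format are the ones Event Grid actually emits for Blob Storage:

```typescript
// Event Grid delivers blob events with a well-known eventType and a subject
// of the form "/blobServices/default/containers/<container>/blobs/<path>".
type BlobEvent = { eventType: string; subject: string };
type IndexAction = { action: "upsert" | "delete"; blobPath: string };

// Decide what the ingestion function should do for a given event.
function planIndexAction(event: BlobEvent): IndexAction | null {
  const blobPath = event.subject.split("/blobs/")[1] ?? event.subject;
  switch (event.eventType) {
    case "Microsoft.Storage.BlobCreated":
      return { action: "upsert", blobPath }; // re-chunk, re-embed, re-index
    case "Microsoft.Storage.BlobDeleted":
      return { action: "delete", blobPath }; // purge the stale chunks
    default:
      return null; // ignore unrelated events
  }
}
```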

3. Hybrid search with Azure AI Search

Pure vector search is not always the right tool. Hybrid search - combining dense vector similarity with traditional keyword matching - consistently outperforms either approach alone, especially for domain-specific content with exact terminology. Azure AI Search handles both in a single query.
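Azure AI Search merges the keyword and vector result lists using Reciprocal Rank Fusion. A minimal sketch of the idea (this is an illustration of the algorithm, not the service's actual implementation):

```typescript
// Reciprocal Rank Fusion: each result list contributes 1/(k + rank) per
// document, so documents ranked well in *both* lists float to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document that appears mid-list in both the keyword ranking and the vector ranking can outrank one that tops a single list - which is why hybrid works well for exact terminology.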

4. Stateful agent orchestration with LangGraph

This is where reliability comes from. Rather than passing a raw user query directly to the LLM, a LangGraph agent:

  • Rewrites the query for better retrieval
  • Retrieves grounded chunks from the vector index
  • Optionally performs a structured SQL lookup
  • Loads conversation history from CosmosDB
  • Assembles context and streams the response

The LLM is the last step, not the only step. Responses are grounded and auditable.
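The flow above can be sketched as a chain of state-transforming nodes, which is the core idea LangGraph builds on. This plain-TypeScript version is illustrative only - the names and the toy corpus are not the repo's code or the LangGraph API:

```typescript
type AgentState = {
  query: string;
  rewritten?: string;
  chunks?: string[];
  answer?: string;
};
type AgentNode = (state: AgentState) => AgentState;

// Stand-in corpus; in the real stack this is the Azure AI Search index.
const corpus = ["refund policy: 30 days", "shipping rates by region"];

const rewriteQuery: AgentNode = (s) => ({
  ...s,
  rewritten: s.query.trim().toLowerCase(),
});
const retrieve: AgentNode = (s) => ({
  ...s,
  chunks: corpus.filter((c) => c.includes(s.rewritten ?? s.query)),
});
const assemble: AgentNode = (s) => ({
  ...s,
  answer: `Grounded in ${s.chunks?.length ?? 0} chunk(s).`,
});

// Run the nodes in order, threading state through - the LLM call comes last.
const runAgent = (nodes: AgentNode[]) => (state: AgentState) =>
  nodes.reduce((acc, node) => node(acc), state);
```

Because each node only reads and extends the state, every intermediate step (the rewritten query, the retrieved chunks) is available for logging and auditing.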

5. Persistent memory with CosmosDB

Conversation history and agent state are persisted across sessions in CosmosDB. A user's follow-up question carries context from earlier in the conversation. Without this, every message is treated as if it's the first - fine for demos, not for production.
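The memory contract itself is small. A sketch with an in-memory stand-in for CosmosDB - the interface and class names are illustrative, not the repo's actual code:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// In production this interface would be backed by a CosmosDB container,
// keyed by session id so follow-up questions carry their history.
interface MemoryStore {
  load(sessionId: string): Promise<Message[]>;
  append(sessionId: string, msg: Message): Promise<void>;
}

class InMemoryStore implements MemoryStore {
  private sessions = new Map<string, Message[]>();

  async load(sessionId: string): Promise<Message[]> {
    return this.sessions.get(sessionId) ?? [];
  }

  async append(sessionId: string, msg: Message): Promise<void> {
    const history = this.sessions.get(sessionId) ?? [];
    history.push(msg);
    this.sessions.set(sessionId, history);
  }
}
```

Keeping the store behind an interface means the agent code doesn't change when the in-memory version is swapped for CosmosDB.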

6. Zero-secret security with Managed Identity

No API keys in .env files. No connection strings committed to repos. Every service - Web Apps, Azure Functions, GitHub Actions - authenticates to every other service via Azure Managed Identity and OIDC. Azure Key Vault stores the few secrets that can't be avoided. Azure Policy enforces it.
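Under the hood this is Azure RBAC: each identity gets only the role it needs on the resource it talks to. A hedged example - the principal id, scope, and even the exact role used in the repo are placeholders here:

```shell
# Let the Web App's system-assigned identity query the search index - no key needed.
az role assignment create \
  --assignee "<web-app-principal-id>" \
  --role "Search Index Data Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>"
```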

7. Observability with Application Insights

You can't improve what you can't measure. From day one, the stack tracks:

  • Retrieval quality per query
  • LLM latency and token counts
  • Ingestion pipeline success/failure rates
  • End-to-end response times

When retrieval silently degrades - and it will - you'll know before your users do.
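One way to catch silent degradation is to compare a recent window of retrieval scores against the longer-run baseline. A minimal sketch - the function name, window size, and threshold are illustrative, not from the repo:

```typescript
// Flag degradation when the mean top-result score over the last `window`
// queries drops below `threshold` times the baseline mean.
function isDegraded(scores: number[], window = 5, threshold = 0.8): boolean {
  if (scores.length < window * 2) return false; // not enough data yet
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const baseline = mean(scores.slice(0, -window));
  const recent = mean(scores.slice(-window));
  return recent < baseline * threshold;
}
```

In the real stack the scores would come from the per-query retrieval-quality metrics tracked in Application Insights, with the alert wired up in Azure Monitor.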


Stack

Layer                  Technology
Language               Node.js / TypeScript
Agent orchestration    LangGraph
LLM + embeddings       Azure OpenAI
Vector index           Azure AI Search
Document storage       Azure Blob Storage
Memory / state         CosmosDB
Ingestion trigger      Azure Functions + Event Grid
Hosting                Azure Web Apps
Infrastructure         Azure Bicep
CI/CD                  GitHub Actions
Security               Azure Entra ID · Key Vault · Managed Identity
Observability          Application Insights · Azure Monitor

Who this is for

  • Developers who have worked through RAG fundamentals and want to see what a full production setup looks like
  • Architects evaluating Azure as the platform for a production RAG system
  • Teams who need a reference they can fork and adapt rather than build from scratch

Some familiarity with Azure and Node.js is helpful. You don't need to be an Azure expert - the Bicep templates handle the infrastructure, and the repo will be documented step by step.


What's coming

  • Full Bicep templates for every resource
  • Ingestion pipeline with Event Grid trigger
  • LangGraph agent with query rewriting and hybrid retrieval
  • CosmosDB memory integration
  • Streaming responses over WebSockets
  • Application Insights setup and dashboard
  • Step-by-step deployment guide
  • Architecture deep-dives for each pillar

Prerequisites (when it drops)

  • Azure subscription
  • Node.js 20+
  • Azure CLI
  • Basic familiarity with RAG concepts - if you're new to RAG, start with rag-from-scratch first

rag-from-scratch - the fundamentals: embeddings, chunking, retrieval, generation. Start here if you're new to RAG.


Stay updated

Star the repo and GitHub will notify you when the first release drops.

Questions, ideas, or feedback before launch? Open a Discussion.


Built by Patric Gutersohn