
RAG in Production

Status: actively building, dropping soon. Star the repo to get notified.

Production-grade RAG on Azure with Node.js and LangGraph. The next step after rag-from-scratch.


What this is

If you've built a RAG prototype, you already know the fundamentals - embeddings, chunking, retrieval, generation. That part is well-covered.

This repo is about what comes after that. The decisions that only surface when you try to run RAG reliably at scale: infrastructure that doesn't drift, an index that stays fresh when your data changes, agents that make retrieval actually trustworthy, and security that holds up without secrets scattered across config files.

It's a reference architecture. Opinionated, fully working, built on Azure - every layer wired together the way it needs to be in production, not just in a demo.


The 7 pillars

1. Infrastructure as Code with Azure Bicep

Every Azure resource - AI Search, CosmosDB, Azure OpenAI, Functions, Web Apps, Key Vault - is declared in Bicep. Nothing is created manually through the portal. You deploy the whole stack with a single command, tear it down, and redeploy identically. That's the guarantee IaC gives you.
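The single-command deploy might look like this with the Azure CLI (the resource group, template, and parameter names here are placeholders, not taken from the repo):

```shell
# Create the resource group, then deploy every resource declared in main.bicep.
az group create --name rag-prod-rg --location westeurope
az deployment group create \
  --resource-group rag-prod-rg \
  --template-file main.bicep \
  --parameters environment=prod
```

Tearing down is the inverse: delete the resource group and rerun the deployment to recreate it identically.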

2. Event-driven ingestion pipeline

Documents change. Policies get updated, manuals get revised, data gets added. Most RAG demos ignore this entirely and assume a static corpus. This repo uses Azure Functions triggered by Event Grid to detect changes in Blob Storage and re-index automatically. Your vector index stays in sync without manual intervention.
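The trigger's core decision - mapping a storage event to an index operation - can be sketched in plain TypeScript. The type and function names are illustrative, not from the repo, but the event types and subject format are the ones Event Grid actually emits for Blob Storage:

```typescript
// Event Grid delivers blob events with a well-known eventType and a subject
// of the form "/blobServices/default/containers/<container>/blobs/<path>".
type BlobEvent = { eventType: string; subject: string };
type IndexAction = { action: "upsert" | "delete"; blobPath: string };

// Decide what the ingestion function should do for a given event.
function planIndexAction(event: BlobEvent): IndexAction | null {
  const blobPath = event.subject.split("/blobs/")[1] ?? event.subject;
  switch (event.eventType) {
    case "Microsoft.Storage.BlobCreated":
      return { action: "upsert", blobPath }; // re-chunk, re-embed, re-index
    case "Microsoft.Storage.BlobDeleted":
      return { action: "delete", blobPath }; // purge the stale chunks
    default:
      return null; // ignore unrelated events
  }
}
```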

3. Hybrid search with Azure AI Search

Pure vector search is not always the right tool. Hybrid search - combining dense vector similarity with traditional keyword matching - consistently outperforms either approach alone, especially for domain-specific content with exact terminology. Azure AI Search handles both in a single query.
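Azure AI Search merges the keyword and vector result lists using Reciprocal Rank Fusion. A minimal sketch of the idea (this is an illustration of the algorithm, not the service's actual implementation):

```typescript
// Reciprocal Rank Fusion: each result list contributes 1/(k + rank) per
// document, so documents ranked well in *both* lists float to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document that appears mid-list in both the keyword ranking and the vector ranking can outrank one that tops a single list - which is why hybrid works well for exact terminology.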

4. Stateful agent orchestration with LangGraph

This is where reliability comes from. Rather than passing a raw user query directly to the LLM, a LangGraph agent:

  • Rewrites the query for better retrieval
  • Retrieves grounded chunks from the vector index
  • Optionally performs a structured SQL lookup
  • Loads conversation history from CosmosDB
  • Assembles context and streams the response

The LLM is the last step, not the only step. Responses are grounded and auditable.
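The flow above can be sketched as a chain of state-transforming nodes, which is the core idea LangGraph builds on. This plain-TypeScript version is illustrative only - the names and the toy corpus are not the repo's code or the LangGraph API:

```typescript
type AgentState = {
  query: string;
  rewritten?: string;
  chunks?: string[];
  answer?: string;
};
type AgentNode = (state: AgentState) => AgentState;

// Stand-in corpus; in the real stack this is the Azure AI Search index.
const corpus = ["refund policy: 30 days", "shipping rates by region"];

const rewriteQuery: AgentNode = (s) => ({
  ...s,
  rewritten: s.query.trim().toLowerCase(),
});
const retrieve: AgentNode = (s) => ({
  ...s,
  chunks: corpus.filter((c) => c.includes(s.rewritten ?? s.query)),
});
const assemble: AgentNode = (s) => ({
  ...s,
  answer: `Grounded in ${s.chunks?.length ?? 0} chunk(s).`,
});

// Run the nodes in order, threading state through - the LLM call comes last.
const runAgent = (nodes: AgentNode[]) => (state: AgentState) =>
  nodes.reduce((acc, node) => node(acc), state);
```

Because each node only reads and extends the state, every intermediate step (the rewritten query, the retrieved chunks) is available for logging and auditing.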

5. Persistent memory with CosmosDB

Conversation history and agent state are persisted across sessions in CosmosDB. A user's follow-up question carries context from earlier in the conversation. Without this, every message is treated as if it's the first - fine for demos, not for production.
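The memory contract itself is small. A sketch with an in-memory stand-in for CosmosDB - the interface and class names are illustrative, not the repo's actual code:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// In production this interface would be backed by a CosmosDB container,
// keyed by session id so follow-up questions carry their history.
interface MemoryStore {
  load(sessionId: string): Promise<Message[]>;
  append(sessionId: string, msg: Message): Promise<void>;
}

class InMemoryStore implements MemoryStore {
  private sessions = new Map<string, Message[]>();

  async load(sessionId: string): Promise<Message[]> {
    return this.sessions.get(sessionId) ?? [];
  }

  async append(sessionId: string, msg: Message): Promise<void> {
    const history = this.sessions.get(sessionId) ?? [];
    history.push(msg);
    this.sessions.set(sessionId, history);
  }
}
```

Keeping the store behind an interface means the agent code doesn't change when the in-memory version is swapped for CosmosDB.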

6. Zero-secret security with Managed Identity

No API keys in .env files. No connection strings committed to repos. Every service - Web Apps, Azure Functions, GitHub Actions - authenticates to every other service via Azure Managed Identity and OIDC. Azure Key Vault stores the few secrets that can't be avoided. Azure Policy enforces it.
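Under the hood this is Azure RBAC: each identity gets only the role it needs on the resource it talks to. A hedged example - the principal id, scope, and even the exact role used in the repo are placeholders here:

```shell
# Let the Web App's system-assigned identity query the search index - no key needed.
az role assignment create \
  --assignee "<web-app-principal-id>" \
  --role "Search Index Data Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>"
```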

7. Observability with Application Insights

You can't improve what you can't measure. From day one, the stack tracks:

  • Retrieval quality per query
  • LLM latency and token counts
  • Ingestion pipeline success/failure rates
  • End-to-end response times

When retrieval silently degrades - and it will - you'll know before your users do.
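One way to catch silent degradation is to compare a recent window of retrieval scores against the longer-run baseline. A minimal sketch - the function name, window size, and threshold are illustrative, not from the repo:

```typescript
// Flag degradation when the mean top-result score over the last `window`
// queries drops below `threshold` times the baseline mean.
function isDegraded(scores: number[], window = 5, threshold = 0.8): boolean {
  if (scores.length < window * 2) return false; // not enough data yet
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const baseline = mean(scores.slice(0, -window));
  const recent = mean(scores.slice(-window));
  return recent < baseline * threshold;
}
```

In the real stack the scores would come from the per-query retrieval-quality metrics tracked in Application Insights, with the alert wired up in Azure Monitor.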


Stack

Layer                  Technology
Language               Node.js / TypeScript
Agent orchestration    LangGraph
LLM + embeddings       Azure OpenAI
Vector index           Azure AI Search
Document storage       Azure Blob Storage
Memory / state         CosmosDB
Ingestion trigger      Azure Functions + Event Grid
Hosting                Azure Web Apps
Infrastructure         Azure Bicep
CI/CD                  GitHub Actions
Security               Azure Entra ID · Key Vault · Managed Identity
Observability          Application Insights · Azure Monitor

Who this is for

  • Developers who have worked through RAG fundamentals and want to see what a full production setup looks like
  • Architects evaluating Azure as the platform for a production RAG system
  • Teams who need a reference they can fork and adapt rather than build from scratch

Some familiarity with Azure and Node.js is helpful. You don't need to be an Azure expert - the Bicep templates handle the infrastructure, and the repo will be documented step by step.


What's coming

  • Full Bicep templates for every resource
  • Ingestion pipeline with Event Grid trigger
  • LangGraph agent with query rewriting and hybrid retrieval
  • CosmosDB memory integration
  • Streaming responses over WebSockets
  • Application Insights setup and dashboard
  • Step-by-step deployment guide
  • Architecture deep-dives for each pillar

Prerequisites (when it drops)

  • Azure subscription
  • Node.js 20+
  • Azure CLI
  • Basic familiarity with RAG concepts - if you're new to RAG, start with rag-from-scratch first

rag-from-scratch - the fundamentals: embeddings, chunking, retrieval, generation. Start here if you're new to RAG.


Stay updated

Star the repo and GitHub will notify you when the first release drops.

Questions, ideas, or feedback before launch? Open a Discussion.


Built by Patric Gutersohn