67 results for “topic:rag-evaluation”
🐢 Open-Source Evaluation & Testing library for LLM Agents
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
RAG evaluation without the need for "golden answers"
Red-teaming Python framework for testing chatbots and GenAI systems.
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query-enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo session storage, a Celery ingestion pipeline, a Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs); retrieval-metric and rank-fusion sketches appear after this list.
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Open source framework for evaluating AI Agents
smallevals — offline retrieval evaluation for RAG systems with tiny QA models: fast on CPU, blazing on GPU.
Evaluation Framework for LLM applications in Java and Kotlin
Compares different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Learn Retrieval-Augmented Generation (RAG) from scratch using LLMs from Hugging Face, with LangChain or plain Python.
RAG Chatbot for Financial Analysis
EntRAG - Enterprise RAG Benchmark
A modular, multi-model AI assistant UI built on .NET 9, featuring RAG, extensible tools, and deep code + database knowledge through semantic search.
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics
MCP server for llamator: automates LLM red-teaming workflows.
Problem Statement-1: Multilingual NCERT Doubt-Solver using an OPEA-based RAG Pipeline. A multilingual doubt-solving system for Grades 5–10, built solely on NCERT textbooks, using OCR ingestion, grade-aware retrieval, conversational Q&A with citations, feedback capture, and reliable out-of-scope fallback, exposed through a web-friendly chat interface.
RAG-powered PDF QA system with self-reflection and multiple retrieval strategies (Stuff/Map Reduce/Refine). Includes monitoring via Langfuse & LangSmith and containerization with Docker
Rag Badger is a minimal open-source toolkit for efficiently evaluating RAG-based systems, best suited for evaluating RAG pipelines with Ollama's local LLMs and embedding models.
Multi-table RAG QA telemetry + decision-grade RAG Ops notebook for retrieval attribution, hallucination risk slicing, and quality×cost×latency trade-offs.
Synthetic multi-table RAG QA telemetry benchmark (corpus→chunks→retrieval→eval): labels for correctness/faithfulness/hallucination + cost/latency for RAG evaluation and dashboards.
AIE7: Certification Challenge
Deploy your RAG pipeline with MLflow, using LlamaIndex, LangChain, and Ollama/Hugging Face LLMs/Groq.
Python SDK
AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt-testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project; a minimal Ragas usage sketch appears after this list.
RAG pipeline evaluation and monitoring on AWS using Ragas.
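
For readers comparing the evaluation suites above: Hit-Rate and MRR, the two metrics named in the boilerplate entry, reduce to a few lines of Python. A minimal sketch, with illustrative function names and list-based inputs rather than any repo's actual API:

```python
# Minimal sketch of Hit-Rate and MRR for retrieval evaluation.
# Function names and inputs are illustrative assumptions, not a repo's API.

def hit_rate(retrieved: list[list[str]], relevant: list[str], k: int = 10) -> float:
    """Fraction of queries whose relevant chunk id appears in the top-k results."""
    hits = sum(1 for ranked, rel in zip(retrieved, relevant) if rel in ranked[:k])
    return hits / len(relevant)

def mean_reciprocal_rank(retrieved: list[list[str]], relevant: list[str]) -> float:
    """Mean of 1/rank of the first relevant chunk; 0 when it is never retrieved."""
    total = 0.0
    for ranked, rel in zip(retrieved, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant)

# Two queries, each with one known-relevant chunk id.
ranked_lists = [["c3", "c7", "c1"], ["c9", "c2", "c5"]]
gold_ids = ["c1", "c4"]
print(hit_rate(ranked_lists, gold_ids, k=3))         # 0.5 (only query 1 hits)
print(mean_reciprocal_rank(ranked_lists, gold_ids))  # (1/3 + 0) / 2 ≈ 0.167
```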
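The same entry's hybrid search merges a BM25 ranking with a dense one. Its description does not say how the two are combined; reciprocal rank fusion (RRF) is a common choice and serves as a hedged stand-in here:

```python
# Hedged sketch of merging BM25 and dense rankings with reciprocal rank
# fusion (RRF); the actual fusion method in the repo above is unstated.
# k=60 is the constant conventionally used with RRF.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each chunk by the sum of 1/(k + rank) over every ranking it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["c2", "c7", "c1"]   # lexical ranking (e.g. from BM25)
dense_ranking = ["c1", "c2", "c9"]  # semantic ranking (e.g. cosine similarity)
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# ['c2', 'c1', 'c7', 'c9']: chunks present in both lists float to the top
```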
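Several entries (e.g. the fintech QA project and the AWS pipeline) evaluate with Ragas. A minimal usage sketch, assuming the classic 0.1-era Ragas interface; the sample row and printed output are invented for illustration:

```python
# Minimal sketch of the classic Ragas interface (roughly the 0.1-era API;
# column names and the evaluate() signature have shifted across releases).
# The sample data is invented, and the default judge LLM needs an OpenAI key.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall, faithfulness

data = {
    "question": ["What is the overdraft fee?"],
    "answer": ["The overdraft fee is $35 per transaction."],
    "contexts": [["Overdraft fees are $35 per transaction, capped at three per day."]],
    "ground_truth": ["$35 per transaction."],
}

# Each metric is scored 0-1 by an LLM judge over the QA rows above.
result = evaluate(
    Dataset.from_dict(data),
    metrics=[context_precision, context_recall, faithfulness],
)
print(result)  # e.g. {'context_precision': ..., 'context_recall': ..., 'faithfulness': ...}
```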