67 results for “topic:rag-evaluation”
🐢 Open-Source Evaluation & Testing library for LLM Agents
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
RAG evaluation without the need for "golden answers"
Red-teaming Python framework for testing chatbots and GenAI systems.
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query-enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo session storage, a Celery ingestion pipeline, a Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs); retrieval-metric and rank-fusion sketches appear after this list.
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Open source framework for evaluating AI Agents
smallevals — offline retrieval evaluation for RAG systems with tiny QA models: fast on CPU, blazing on GPU.
Evaluation Framework for LLM applications in Java and Kotlin
Compares different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Learn Retrieval-Augmented Generation (RAG) from scratch using LLMs from Hugging Face, with LangChain or plain Python.
RAG Chatbot for Financial Analysis
EntRAG - Enterprise RAG Benchmark
A modular, multi-model AI assistant UI built on .NET 9, featuring RAG, extensible tools, and deep code + database knowledge through semantic search.
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics
MCP server for llamator: automates LLM red-teaming workflows.
Problem Statement-1: Multilingual NCERT Doubt-Solver using an OPEA-based RAG Pipeline. A multilingual doubt-solving system for Grades 5–10, built solely on NCERT textbooks, using OCR ingestion, grade-aware retrieval, conversational Q&A with citations, feedback capture, and reliable out-of-scope fallback, exposed through a web-friendly chat interface.
RAG-powered PDF QA system with self-reflection and multiple retrieval strategies (Stuff/Map Reduce/Refine). Includes monitoring via Langfuse & LangSmith and containerization with Docker
Rag Badger is a minimal open-source toolkit for efficiently evaluating RAG-based systems, best suited for evaluating RAG pipelines with Ollama's local LLMs and embedding models.
Multi-table RAG QA telemetry + decision-grade RAG Ops notebook for retrieval attribution, hallucination risk slicing, and quality×cost×latency trade-offs.
Synthetic multi-table RAG QA telemetry benchmark (corpus→chunks→retrieval→eval): labels for correctness/faithfulness/hallucination + cost/latency for RAG evaluation and dashboards.
AIE7: Certification Challenge
Deploy your RAG pipeline with MLflow, using LlamaIndex, LangChain, and Ollama/Hugging Face LLMs/Groq.
Python SDK
AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt-testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project; a minimal Ragas usage sketch appears after this list.
RAG pipeline evaluation and monitoring on AWS using Ragas.
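
For readers comparing the evaluation suites above: Hit-Rate and MRR, the two metrics named in the boilerplate entry, reduce to a few lines of Python. A minimal sketch, with illustrative function names and list-based inputs rather than any repo's actual API:

```python
# Minimal sketch of Hit-Rate and MRR for retrieval evaluation.
# Function names and inputs are illustrative assumptions, not a repo's API.

def hit_rate(retrieved: list[list[str]], relevant: list[str], k: int = 10) -> float:
    """Fraction of queries whose relevant chunk id appears in the top-k results."""
    hits = sum(1 for ranked, rel in zip(retrieved, relevant) if rel in ranked[:k])
    return hits / len(relevant)

def mean_reciprocal_rank(retrieved: list[list[str]], relevant: list[str]) -> float:
    """Mean of 1/rank of the first relevant chunk; 0 when it is never retrieved."""
    total = 0.0
    for ranked, rel in zip(retrieved, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant)

# Two queries, each with one known-relevant chunk id.
ranked_lists = [["c3", "c7", "c1"], ["c9", "c2", "c5"]]
gold_ids = ["c1", "c4"]
print(hit_rate(ranked_lists, gold_ids, k=3))         # 0.5 (only query 1 hits)
print(mean_reciprocal_rank(ranked_lists, gold_ids))  # (1/3 + 0) / 2 ≈ 0.167
```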
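The same entry's hybrid search merges a BM25 ranking with a dense one. Its description does not say how the two are combined; reciprocal rank fusion (RRF) is a common choice and serves as a hedged stand-in here:

```python
# Hedged sketch of merging BM25 and dense rankings with reciprocal rank
# fusion (RRF); the actual fusion method in the repo above is unstated.
# k=60 is the constant conventionally used with RRF.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each chunk by the sum of 1/(k + rank) over every ranking it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["c2", "c7", "c1"]   # lexical ranking (e.g. from BM25)
dense_ranking = ["c1", "c2", "c9"]  # semantic ranking (e.g. cosine similarity)
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# ['c2', 'c1', 'c7', 'c9']: chunks present in both lists float to the top
```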
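Several entries (e.g. the fintech QA project and the AWS pipeline) evaluate with Ragas. A minimal usage sketch, assuming the classic 0.1-era Ragas interface; the sample row and printed output are invented for illustration:

```python
# Minimal sketch of the classic Ragas interface (roughly the 0.1-era API;
# column names and the evaluate() signature have shifted across releases).
# The sample data is invented, and the default judge LLM needs an OpenAI key.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall, faithfulness

data = {
    "question": ["What is the overdraft fee?"],
    "answer": ["The overdraft fee is $35 per transaction."],
    "contexts": [["Overdraft fees are $35 per transaction, capped at three per day."]],
    "ground_truth": ["$35 per transaction."],
}

# Each metric is scored 0-1 by an LLM judge over the QA rows above.
result = evaluate(
    Dataset.from_dict(data),
    metrics=[context_precision, context_recall, faithfulness],
)
print(result)  # e.g. {'context_precision': ..., 'context_recall': ..., 'faithfulness': ...}
```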