hetvidoshi22/AI-DPR-Evaluation-System

NLP-based system to automate compliance checks and risk analysis of government DPR PDFs using hybrid RAG and rule-based validation.

🤖 AI-Based DPR Evaluation System (SIH – MDoNER)

📌 Overview

The Ministry of Development of North Eastern Region (MDoNER) evaluates hundreds of Detailed Project Reports (DPRs) for infrastructure and socio-economic development projects. Manual evaluation is time-consuming, inconsistent, and prone to human error, often delaying project approvals.

This project implements an offline, AI-powered DPR evaluation system. It extracts text from DPR PDFs and analyzes and evaluates it using NLP, OCR, and Retrieval-Augmented Generation (RAG) to support faster, more consistent, and explainable decision-making.


🎯 Key Objectives

  • Automate DPR evaluation for completeness, consistency, and risk
  • Handle scanned, multilingual, and unstructured PDFs
  • Provide explainable compliance validation
  • Enable offline deployment for low-connectivity regions

🔄 System Workflow

  1. DPR PDF upload via FastAPI
  2. Hybrid PDF extraction (PyMuPDF → pdfplumber → OCR fallback)
  3. Sentence-aware semantic chunking (200–500 words)
  4. Retrieval-Augmented Generation (RAG) for context-aware analysis
  5. Rule-based compliance and consistency checks
  6. Evidence extraction from source text
  7. Structured JSON output for decision support
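
Step 3 can be illustrated with a minimal sketch of sentence-aware chunking. This is not the project's actual code: the function names, the regex-based sentence splitter (used here instead of NLTK to keep the example self-contained), and the greedy packing strategy are all assumptions. Only the 200–500 word range comes from the workflow above.

```python
import re


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: a real pipeline would more likely use a trained tokenizer
    such as NLTK's; the regex keeps this sketch dependency-free.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences, min_words=200, max_words=500):
    """Greedily pack whole sentences into chunks of at most max_words.

    Sentences are never split, so a single sentence longer than max_words
    becomes its own oversized chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it past the cap.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        tail = " ".join(current)
        fits = bool(chunks) and len(chunks[-1].split()) + count <= max_words
        # Merge a short tail into the previous chunk only if it still fits.
        if count < min_words and fits:
            chunks[-1] = chunks[-1] + " " + tail
        else:
            chunks.append(tail)
    return chunks
```

Because chunk boundaries always fall between sentences, each chunk stays readable on its own, which is the property the workflow relies on when passing chunks to retrieval.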

✨ Core Features

  • Hybrid PDF extraction for digital and scanned DPRs
  • OCR fallback using PaddleOCR and Tesseract
  • Semantic chunking preserving contextual information
  • RAG-based context-aware analysis
  • Deterministic rule-based compliance engine
  • Offline-first design using local LLMs (Ollama)
  • FastAPI backend with a single /process endpoint
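
To make the "deterministic rule-based compliance engine" concrete, here is a minimal sketch of how such checks might be expressed. Everything in it is hypothetical: the `Rule` and `Finding` types, the rule IDs, and the regular expressions are illustrative placeholders, not the rules the project actually applies.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    pattern: str  # regular expression that must match somewhere in the text


@dataclass
class Finding:
    rule_id: str
    passed: bool
    evidence: Optional[str]  # snippet of source text around the match, if any


def evidence_snippet(text: str, start: int, end: int, margin: int = 40) -> str:
    """Return the matched span plus some surrounding context, whitespace-normalized."""
    lo, hi = max(0, start - margin), min(len(text), end + margin)
    return " ".join(text[lo:hi].split())


def run_rules(text: str, rules: List[Rule]) -> List[Finding]:
    """Apply every rule; the same input always yields the same findings."""
    findings = []
    for rule in rules:
        match = re.search(rule.pattern, text, flags=re.IGNORECASE)
        evidence = evidence_snippet(text, match.start(), match.end()) if match else None
        findings.append(Finding(rule.rule_id, match is not None, evidence))
    return findings


# Hypothetical example rules, for illustration only.
EXAMPLE_RULES = [
    Rule("COST-01", "States an estimated project cost", r"estimated\s+(project\s+)?cost"),
    Rule("TIME-01", "States an implementation schedule", r"implementation\s+schedule"),
    Rule("ENV-01", "Mentions environmental clearance", r"environmental\s+clearance"),
]
```

Keeping the rules as plain data, separate from the LLM, is one way to get results that are repeatable and that point back to the exact text that satisfied each check.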

🛠 Tech Stack

Main

  • Python
  • FastAPI
  • NLP (RAG)
  • Rule-Based Systems

Libraries & Tools

  • PyMuPDF, pdfplumber
  • PaddleOCR, pytesseract, Pillow
  • Sentence Transformers
  • Ollama
  • NLTK
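
The extraction libraries listed above could be combined into a fallback chain along the following lines. This is a sketch under assumptions, not the repository's implementation: the function names, the `min_chars` threshold used to decide that an extractor "failed", and the 300 dpi render setting are all invented for illustration, and PaddleOCR is left out to keep the example short. The third-party imports are deferred so the chain itself can be exercised without them installed.

```python
from typing import Callable, List, Optional, Tuple

Extractor = Callable[[str], str]


def extract_with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_ocr(path: str) -> str:
    import fitz
    import pytesseract
    from PIL import Image
    texts = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=300)  # render the page to an RGB bitmap
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            texts.append(pytesseract.image_to_string(image))
    return "\n".join(texts)


def extract_text(
    path: str,
    extractors: List[Tuple[str, Extractor]],
    min_chars: int = 50,  # assumed threshold for "usable" text
) -> Tuple[Optional[str], str]:
    """Try each extractor in order; return (name, text) from the first usable one."""
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception:
            continue  # fall through to the next extractor
        if len(text.strip()) >= min_chars:
            return name, text
    return None, ""


DEFAULT_CHAIN = [
    ("pymupdf", extract_with_pymupdf),
    ("pdfplumber", extract_with_pdfplumber),
    ("ocr", extract_with_ocr),
]
```

A call such as `extract_text("report.pdf", DEFAULT_CHAIN)` would try the fast text-layer extractors first and only pay the cost of OCR for scanned documents.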

📊 Output

The system generates:

  • Extracted text in JSON format
  • Semantic text chunks
  • Answers with evidence snippets
  • Structured compliance and risk insights
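
One possible shape for this output is sketched below. The field names and nesting are assumptions made for illustration; the repository's actual JSON schema is not documented here and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Answer:
    question: str
    answer: str
    evidence: List[str] = field(default_factory=list)  # snippets from the source text


@dataclass
class DprResult:
    extracted_text: str
    chunks: List[str]
    answers: List[Answer]
    compliance: Dict[str, bool]  # hypothetical: rule ID -> passed
    risks: List[str]


def to_json(result: DprResult) -> str:
    """Serialize the result; ensure_ascii=False keeps multilingual text readable."""
    return json.dumps(asdict(result), indent=2, ensure_ascii=False)


# Placeholder values only, to show the structure.
example = DprResult(
    extracted_text="<full extracted text>",
    chunks=["<chunk 1>", "<chunk 2>"],
    answers=[Answer("<question>", "<answer>", ["<evidence snippet>"])],
    compliance={"<rule id>": True},
    risks=["<risk description>"],
)
```

Calling `to_json(example)` produces a single JSON document that groups the four outputs listed above, so a reviewer or downstream tool can read them together.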

👥 Team & Contribution

  • Developed as part of the Smart India Hackathon (SIH).