BhaveshBytess/Research-Paper-Analyzer
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Research Paper Analyzer
Automated extraction of structured data from scientific papers with evidence grounding and validation.
Live Demo
Try it now: https://research-paper-analyzer-ack6bpdauvevnlnfbx7gpz.streamlit.app
Note: Demo uses DeepSeek v3.1 free tier. First run may take 30-60 seconds for model initialization.
Demo
Overview
Research Paper Analyzer transforms scientific PDFs into structured, machine-readable JSON with page-level evidence grounding. Built for researchers, ML engineers, and literature review automation, it extracts methods, results, datasets, and claims while maintaining traceability to source text.
Key differentiator: Evidence-grounded extraction with numeric consistency validation, not just raw LLM scraping.
PDF Input → Layout Analysis → LLM Extraction → Schema Validation → Evidence Linking → Structured JSON
Why This Exists
The Problem
- Manual paper analysis doesn't scale
- Existing tools extract text but lose structure
- LLM outputs are unreliable without validation
- No traceability from claims to source evidence
This Solution
- ✅ Structured extraction with an enforced schema
- ✅ Evidence grounding: every claim links to a page + snippet
- ✅ Numeric consistency checks: catches hallucinated metrics
- ✅ Model-agnostic: works with DeepSeek, Gemma, Claude, GPT
- ✅ Production-validated: 100% success rate on 10 diverse papers
Features
Core Pipeline
- PDF Parsing: Multi-layout understanding (text, figures, tables, equations)
- Context Building: Semantic chunking for 5 extraction heads (metadata, methods, results, limitations, summary)
- LLM Extraction: Parallel extraction with automatic repair
- Schema Enforcement: Pydantic models + JSON schema validation
- Evidence Attachment: Fuzzy matching (85% threshold) with page references
- Consistency Validation: Range checks, baseline logic, unit verification
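The evidence-attachment step (fuzzy matching against page text at an 85% threshold) can be sketched with the standard library. `find_evidence` and the sample pages are illustrative only, not the project's actual `evidence_matcher.py` API, which likely uses a more efficient token-level matcher:

```python
from difflib import SequenceMatcher

def find_evidence(claim: str, pages: list[str], threshold: float = 0.85):
    """Return (page_number, snippet) of the best fuzzy match, or (None, None)."""
    best_page, best_snippet, best_score = None, None, 0.0
    for page_no, text in enumerate(pages, start=1):
        # Slide a claim-sized window over the page text (O(n*m); fine for a sketch)
        for start in range(max(1, len(text) - len(claim) + 1)):
            snippet = text[start:start + len(claim)]
            score = SequenceMatcher(None, claim.lower(), snippet.lower()).ratio()
            if score > best_score:
                best_page, best_snippet, best_score = page_no, snippet, score
    if best_score >= threshold:
        return best_page, best_snippet
    return None, None
```

A claim that appears nearly verbatim in the source scores close to 1.0 and gets a page reference; anything below the threshold is reported as ungrounded rather than guessed.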
Evaluation Metrics (Production-Validated)
| Metric | Score | Status |
|---|---|---|
| JSON Validity | 100% | ✅ Schema compliance |
| Evidence Precision | 81% | ✅ Grounding quality |
| Field Coverage | 100% | ✅ Complete extraction |
| Numeric Consistency | 100% | ✅ Zero hallucinations |
| Summary Alignment | 58% | 🟡 Context matching |
Benchmarked on 10 real papers (7-29 pages) including "Attention is All You Need"
User Interfaces
- Streamlit Web UI: Interactive upload, extraction, visualization
- CLI Tool: Batch processing with checkpoint/resume
- Python API: Programmatic access for pipelines
Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                         INPUT LAYER                         │
│   PDF Upload → PyMuPDF Parser → Text + Layout Extraction    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                      PROCESSING LAYER                       │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐              │
│  │ Metadata  │   │  Methods  │   │  Results  │              │
│  │ Extractor │   │ Extractor │   │ Extractor │              │
│  └───────────┘   └───────────┘   └───────────┘              │
│        │               │               │                    │
│        └───────────────┼───────────────┘                    │
│  ┌───────────────────────────────────────────┐              │
│  │       LLM Backend (DeepSeek/Gemma)        │              │
│  └───────────────────────────────────────────┘              │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                      VALIDATION LAYER                       │
│    JSON Repair → Schema Validation → Numeric Consistency    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                       EVIDENCE LAYER                        │
│     Fuzzy Matching → Page Linking → Snippet Extraction      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                        OUTPUT LAYER                         │
│       Structured JSON + Evidence + Evaluation Metrics       │
└─────────────────────────────────────────────────────────────┘
```
Quick Start
Installation

```bash
# Clone repository
git clone https://github.com/BhaveshBytess/research-paper-analyzer.git
cd research-paper-analyzer

# Create virtual environment (Python 3.10+)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set API key (OpenRouter for DeepSeek)
export OPENROUTER_API_KEY="your-key-here"
```

Usage
Web UI (Recommended)

```bash
# Local
cd research-paper-analyzer
streamlit run app/app.py

# Or visit the live demo:
# https://research-paper-analyzer-ack6bpdauvevnlnfbx7gpz.streamlit.app
```

CLI (Single Paper)

```bash
python run_now.py /path/to/paper.pdf
```

CLI (Batch Processing)

```bash
python batch_deepseek_inline.py
# Processes 2 papers at a time with auto-resume
# Results saved to batch_eval_results/
```

Python API
```python
from research_paper_analyzer import extract_paper

result = extract_paper(
    pdf_path="paper.pdf",
    model="deepseek",
    validate=True,
    attach_evidence=True
)
print(result.json(indent=2))
```

Output Schema
Core Fields
```json
{
  "title": "string",
  "authors": ["string"],
  "year": 2024,
  "venue": "string | null",
  "arxiv_id": "string | null",
  "methods": [
    {
      "name": "string",
      "category": "CNN | Transformer | GNN | ...",
      "components": ["string"],
      "description": "string"
    }
  ],
  "results": [
    {
      "dataset": "string",
      "metric": "string",
      "value": 0.95,
      "unit": "% | points | null",
      "split": "test | val | train",
      "higher_is_better": true,
      "baseline": "string | null",
      "ours_is": "string | null",
      "confidence": 0.9
    }
  ],
  "tasks": ["string"],
  "datasets": ["string"],
  "limitations": "string | null",
  "ethics": "string | null",
  "summary": "string",
  "evidence": {
    "title": [{"page": 1, "snippet": "..."}],
    "methods": [{"page": 3, "snippet": "..."}],
    "results": [{"page": 7, "snippet": "..."}]
  }
}
```

Validation Rules
- ✅ All numeric results must have a valid `value` (not null)
- ✅ Percentages constrained to [0, 100]
- ✅ Confidence scores constrained to [0, 1]
- ✅ `higher_is_better` logic enforced against the baseline
- ✅ Evidence keys must match extracted fields
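The range rules above can be expressed as a small consistency checker over `results` entries. `check_result` is an illustrative sketch, not the project's actual `eval_metrics.py`:

```python
def check_result(r: dict) -> list[str]:
    """Return a list of consistency violations for one result entry."""
    errors = []
    value = r.get("value")
    if value is None:
        errors.append("value must not be null")
    elif r.get("unit") == "%" and not 0 <= value <= 100:
        errors.append("percentage outside [0, 100]")
    conf = r.get("confidence")
    if conf is not None and not 0 <= conf <= 1:
        errors.append("confidence outside [0, 1]")
    return errors
```

An LLM that "remembers" a 120% accuracy or emits a confidence of 1.5 fails these checks, which is how hallucinated metrics get caught before they reach the output JSON.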
Benchmarks
Performance (10 Papers, Mixed Domains)
| Metric | Target | Achieved | Notes |
|---|---|---|---|
| JSON Validity | 100% | 100% | All outputs schema-compliant |
| Evidence Precision | ≥70% | 81% | Grounding to source text |
| Field Coverage | 100% | 100% | No missing required fields |
| Numeric Consistency | 100% | 100% | Zero hallucinated metrics |
| Processing Speed | <2 min/paper | ~2 min | On free-tier API |
Test Set Details
- Papers: 10 (GNN methods, transformers, graph learning)
- Page range: 7-29 pages
- Venues: ICLR, NIPS, arXiv
- Success rate: 100% (10/10 papers extracted)
- Perfect papers: 2 (all metrics = 1.00)
Landmark paper tested: "Attention is All You Need" (Vaswani et al.). Successfully extracted all 8 authors, the transformer components, and BLEU scores.
Project Structure
```
research-paper-analyzer/
├── research-paper-analyzer/
│   ├── app.py                    # Streamlit UI
│   ├── pdf_parser.py             # PyMuPDF extraction
│   ├── llm_extractor.py          # LLM extraction logic
│   ├── schema.py                 # Pydantic models
│   ├── evidence_matcher.py       # Fuzzy evidence linking
│   └── eval_metrics.py           # Consistency validation
├── batch_deepseek_inline.py      # Batch evaluation script
├── create_visualizations.py      # Metric visualization
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── batch_eval_results/           # Evaluation results
│   ├── results.csv               # Metrics table
│   ├── visualizations/           # 8 analysis charts
│   └── summary/                  # Detailed reports
├── samples/                      # Test papers + results
└── datastore/                    # Cache + intermediate data
```
Development
Running Tests
```bash
# Unit tests (TODO: expand coverage)
pytest tests/

# Integration test on sample paper
python test_consistency.py
```

Adding a New LLM Backend

1. Implement the `BaseLLMExtractor` interface in `llm_extractor.py`
2. Add the model config to `schema.py`
3. Update `run_now.py` with the new model option
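Assuming `BaseLLMExtractor` exposes a single completion method (the method name and signature here are illustrative guesses, not the repo's actual interface), a new backend might look like:

```python
from abc import ABC, abstractmethod

class BaseLLMExtractor(ABC):
    """Sketch of the backend interface; the real one lives in llm_extractor.py."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt to the model and return the raw text response."""

class EchoExtractor(BaseLLMExtractor):
    """Trivial stand-in backend, used here only to show the required shape."""

    def complete(self, prompt: str) -> str:
        return '{"title": "stub"}'
```

Because extraction, repair, and validation only see the returned text, any backend that satisfies this interface (DeepSeek, Gemma, Claude, GPT) plugs into the same pipeline.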
Contributing
See CONTRIBUTING.md for:
- Code style (Black, isort)
- PR checklist
- Issue templates
- Architecture decisions
Known Limitations
Current Scope
- ❌ No OCR support: requires digital PDFs (not scanned images)
- ❌ No figure extraction: text-only for now
- ❌ English papers only: no multilingual support yet
- ⚠️ Free-tier rate limits: 16 req/min on OpenRouter (manageable for batch)
Improvement Areas
- 🟡 Summary alignment (58%): threshold tuning needed
- 🟡 Complex table parsing: nested tables occasionally missed
- 🟡 Citation extraction: not yet implemented
Non-Issues
- ✅ Numeric consistency: validated at 100% (production-ready)
- ✅ Schema compliance: 100% across all tests
- ✅ Evidence grounding: 81% precision, above the 70% target
Roadmap
v1.1 (Current)
- Core extraction pipeline
- Evidence grounding
- Numeric consistency validation
- Batch evaluation system
- Comprehensive benchmarks
v1.2 (Next)
- OCR support (scanned PDFs)
- Figure caption extraction
- Citation graph parsing
- Multi-paper comparison UI
- Active learning for uncertain extractions
v2.0 (Future)
- Multilingual support (non-English papers)
- Table structure extraction
- Equation parsing (LaTeX)
- Real-time collaboration (multi-user annotation)
- API service deployment (FastAPI + Docker)
Citation
If you use this tool in your research, please cite:
```bibtex
@software{research_paper_analyzer_2024,
  author = {Bhavesh Bytess},
  title  = {Research Paper Analyzer: Evidence-Grounded PDF Extraction},
  year   = {2024},
  url    = {https://github.com/BhaveshBytess/research-paper-analyzer}
}
```

License

MIT License - see LICENSE for details.
Acknowledgments
- PyMuPDF for robust PDF parsing
- OpenRouter for LLM API access
- DeepSeek for high-quality extraction
- Streamlit for rapid UI prototyping
Contact & Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: 10bhavesh7.11@gmail.com
Maintained by: Bhavesh Bytess
Status: Active development, production-validated, seeking contributors
Quick Links
- Live Demo: try it in your browser
- Batch Evaluation Results
- Visualizations
- Test Papers
- Deployment Guide
- Project Completion Report
- API Documentation (coming soon)
- Contribution Guide
Last Updated: 2025-11-03
Version: 1.1.0
Production Status: ✅ Validated (100% success rate on 10 papers)
