"topic:vlm-ocr" — Search

14 results for “topic:vlm-ocr”

bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python · 8.9k stars · 748 forks · Updated 1 day ago

Topics: document-analysis, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-parser, python, vlm-ocr

vlm-run/vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

Python · 541 stars · 22 forks · Updated 1 week ago

Topics: ai, computer-vision, etl, genai, json, multimodal, pydantic, pydantic-models, vlm, vlm-ocr

Roots-Automation/GutenOCR

Open-source tools for training and evaluating Vision Language Models for OCR

Python · 173 stars · 17 forks · Updated 1 day ago

Topics: llms, multigpu, ocr, vllm, vlm-ocr, vlms

video-db/ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

Python · 47 stars · 4 forks · Updated 1 month ago

Topics: arxiv, benchmark, easyocr, ocr, rapidocr, research-paper, videodb, vlm-ocr, vlms

seanpedrick-case/doc_redaction

Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedrickcase/document_redaction or try it with VLMs: https://huggingface.co/spaces/seanpedrickcase/document_redaction_vlm

Python · 42 stars · 8 forks · Updated 1 week ago

Topics: documents, gradio, nlp, pdf, pii, pii-detection, redaction, vlm, vlm-ocr

OmarSamirz/ImageFromTextGenerator

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

Python · 20 stars · 1 fork · Updated 3 months ago

Topics: artificial-noise, augmentation, data-augmentation, dataset-generation, dataset-generator, image, image-processing, noise, noise-adder, noise-addition, noise-augmentation, ocr, optical-character-recognition, synthetic, synthetic-data, synthetic-data-generation, text, training-dataset-augmentation, training-dataset-generator, vlm-ocr

aeilot/OCRit

OCR it on macOS with DeepSeek-OCR

Swift · 9 stars · 0 forks · Updated 3 months ago

Topics: ai, deepseek, macos, ocr, swift, swiftui, vlm, vlm-ocr

morkev/vlm-yolo-detector

Document image retrieval via MCP or API for agentic systems using semantic embeddings, YOLO, and VLM classification.

Python · 2 stars · 1 fork · Updated 1 month ago

Topics: api, diagram-extraction, image-classification, image-retrieval, machine-classification, manufacturing, mcp, ollama, ollama-api, pdf-document-processor, pymupdf, schematic-diagram, schematics, semantic-embedding, vlm, vlm-ocr, vlm-yolo, yolo

posicube-services/KolmOCR

Korean olmOCR

HTML · 2 stars · 2 forks · Updated 1 week ago

Topics: document, vlm, vlm-ocr

Niraya666/DocuLingo

DocuLingo is a powerful document-parsing tool built with multimodal large language models to enhance RAG (Retrieval-Augmented Generation) workflows.

Python · 1 star · 0 forks · Updated 5 months ago

Topics: document-converting, rag, vlm-ocr

davccavalcante/CyberTechVLMDetector

The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural-language commands; its development includes the author's own creations, HIM™ and MAIC™.

Python · 1 star · 2 forks · Updated 5 months ago

Topics: camera, davidccavalcante, detector, python, read, takk-ag, takk-design, takk8is, view, vlm, vlm-ocr, vlms

CVidalG/HackathonHN-2026-HTR

2026 hackathons of the Digital Humanities master's program (Enc-PSL): results of the HTR team

Python · 0 stars · 0 forks · Updated 1 month ago

Topics: digital-humanities, few-shot-learning, historical-manuscripts, htr, vlm-ocr

AGasthya283/Play-with-Images

Python, Streamlit, U-Net, custom neural networks, VLMs, OpenCV

Jupyter Notebook · 0 stars · 0 forks · Updated 3 weeks ago

Topics: ai, image-processing, python, pytorch, streamlit, unet, vlm, vlm-ocr

sevkaz/Vision_Language_Video_Scanner

VLM-first video frame scanner that analyzes video frames with a vision-language model and optional OCR.

Python · 0 stars · 0 forks · Updated 6 days ago

Topics: ai, frames, vlm, vlm-ocr, youtube