14 results for “topic:vlm-ocr”
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A hub for various industry-specific schemas to be used with VLMs.
Open-source tools for training and evaluating Vision Language Models for OCR
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedrickcase/document_redaction or try it with VLMs: https://huggingface.co/spaces/seanpedrickcase/document_redaction_vlm
IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.
OCR it on macOS with DeepSeek-OCR
Document image retrieval via MCP or API for agentic systems using semantic embeddings, YOLO, and VLM classification.
Korean olmOCR
DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval-Augmented Generation) workflows.
The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural language commands. Its development includes my creation of HIM™ and MAIC™.
2026 hackathons of the Digital Humanities master's program (Enc-PSL): results from the HTR group
Python, Streamlit, UNET, custom-neural-networks, VLMs, OpenCV
VLM-first video frame scanner that analyzes video frames with a vision-language model and optional OCR.