159 results for “topic:document-ai”
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
A Repo For Document AI
A curated list of resources for Document Understanding (DU) topic
PDF to markdown using vision LLMs — tables, layouts, and structure preserved
AI-powered StartUp Accelerator Engine built with Next.js, LangChain, PostgreSQL + pgvector. Upload, organize, and chat with documents. Includes predictive missing-document detection, role-based workflows, and page-level insight extraction.
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Algorithms, papers, datasets, performance comparisons for Document AI.
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading and writing by AI agents through MCP integration.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI 2023)
The AI Document Assistant for PSPDFKit demo shows how to interact with PDFs through natural-language commands powered by AI, integrated with PSPDFKit for Web.
A Model Context Protocol (MCP) server implementation that integrates with the Nutrient Document Web Service (DWS) Processor API, providing powerful PDF processing capabilities for AI assistants.
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-documentai-toolbox
Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, achieving results comparable to (or better than) Table Transformer (TATR) with smaller models.
This repository includes all computer vision, audio, document AI, and multimodal projects.
🚀 100% local RAG system with one-command setup. Your data never leaves your server. AGPL-3.0
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
A curated list of resources on Table Structure Recognition
Ray-based accelerator for the MinerU VLM inference pipeline. Lightweight, low-intrusion, multi-GPU friendly PDF → Markdown processing, also aimed at domestic Chinese compute hardware.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
🎙️ Voice-native document intelligence using Gemini, ElevenLabs STT/TTS, and Datadog observability — turning text documents into spoken conversations.
This repository contains the implementation of DANIEL (a fast Document Attention Network for Information Extraction and Labeling of handwritten documents).
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.
A chatbot for document analysis.