68 results for “topic:document-understanding”
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
A Repo For Document AI
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A curated list of resources for the Document Understanding (DU) topic
Parsing-free RAG supported by VLMs
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
A Curated List of Awesome Table Structure Recognition (TSR) Research. Includes models, papers, datasets, and code. Continuously updated.
Algorithms, papers, datasets, performance comparisons for Document AI.
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Object Detection Model for Scanned Documents
Checkbox Detection Model for Scanned Documents
🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval.
Datasets and Evaluation Scripts for CompHRDoc
3DCF / doc2dataset: token-efficient document layer with NumGuard numeric integrity and multi-framework exports for RAG & fine-tuning.
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
Docling4j brings Docling's document-understanding functionality to Java® projects
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
A monorepo containing various utility scripts, tools, and applications for development, automation, and AI-powered tasks.
Run optical character recognition with PyTesseract from the FiftyOne App!
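Connects a Yuque (语雀) knowledge base to large language models to build a RAG (Retrieval-Augmented Generation) question-answering system. Supports FastAPI and is compatible with the OpenAI API and local Ollama models.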
Official evaluation scripts and baseline prompts for the DocVQA 2026 (ICDAR 2026) Competition on Multimodal Reasoning over Documents.
Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.