"topic:layout-analysis" — Search

87 results for “topic:layout-analysis”

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python · 8.9k · 751 · Updated 3 months ago

document-analysis layout-analysis ocr parser pdf pdf-converter pdf-parser python vlm-ocr

Layout-Parser/layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python · 5.7k · 525 · Updated 1 year ago

computer-vision deep-learning detectron2 document-image-processing document-layout-analysis layout-analysis layout-detection layout-parser object-detection ocr

breezedeus/Pix2Text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

Jupyter Notebook · 3.1k · 265 · Updated 1 month ago

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

UglyToad/PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

C# · 2.4k · 311 · Updated 6 days ago

alto-xml csharp document-analysis hocr layout-analysis netstandard page-xml pdf pdf-document pdf-document-processor pdf-extractor pdf-files pdf-generation pdfbox

kotaro-kinoshita/yomitoku

YomiToku is a Python package that provides an AI-powered document image analysis engine designed specifically for the Japanese language.

Python · 1.4k · 51 · Updated 2 days ago

deep-learning layout-analysis ocr python pytorch

mittagessen/kraken

OCR engine for all the languages

Python · 967 · 158 · Updated 2 days ago

alto-xml handwritten-text-recognition hocr htr layout-analysis neural-networks ocr optical-character-recognition page-xml

BobLd/DocumentLayoutAnalysis

Document Layout Analysis resources repos for development with PdfPig.

C# · 633 · 69 · Updated 2 years ago

alto alto-xml csharp docstrum document-layout-analysis hocr hocr-documents layout-analysis page-segmentation page-xml pdf pdfpig recursive-xy-cut table-extraction tei xy-cut xycut

mindspore-lab/mindocr

A toolbox of ocr models and algorithms based on MindSpore

Python · 298 · 62 · Updated 8 months ago

crnn dbnet deep-learning key-information-extraction layout-analysis layoutxlm mindspore ocr ocr-large-model table-recognition tablemaster text-detection text-recognition vary-toy

RapidAI/RapidLayout

Analysis of Chinese and English layouts

Python · 269 · 21 · Updated 2 weeks ago

cdla doclayout-yolo layout layout-analysis pp-structure

RapidAI/RapidDocEx

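📝 Extracts content from document images and reproduces it one-to-one in Word or Txt for further use or processing. Planned: support for PDF/image input, with output in JSON, Txt, Word, and Markdown formats.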

Python · 205 · 8 · Updated 1 year ago

layout-analysis layout-recover

FreeOCR-AI/yolo-doclaynet

YOLO models trained on DocLayNet: power your document intelligence with layout analysis

Python · 153 · 20 · Updated 1 week ago

doclaynet document-analysis layout-analysis ultralytics yolo yolov8

andreagemelli/doc2graph

Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

Jupyter Notebook · 137 · 25 · Updated 5 months ago

deep-learning document-understanding geometric-deep-learning gnn key-information-extraction layout-analysis nlp pytorch table-detection

xushengfeng/eSearch-OCR

A nodejs library based on paddleOCR

TypeScript · 120 · 12 · Updated 1 month ago

layout-analysis nodejs ocr onnx paddleocr

NormXU/Layout2Graph

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

Python · 81 · 12 · Updated 2 years ago

gnn-framework layout-analysis

CycloneBoy/pdf_table

A Unified Toolkit for Deep Learning-Based Table Extraction

Python · 59 · 9 · Updated 1 year ago

ai document-parsing layout-analysis ocr pdf pdf-to-html table table-recognition

JPLeoRX/detectron2-publaynet

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

Python · 50 · 8 · Updated 2 years ago

artificial-intelligence computer-vision deep-learning detectron2 document-analysis document-classification document-layout document-layout-analysis faster-rcnn instance-segmentation layout-analysis machine-learning neural-network neural-networks object-detection publaynet python python3 pytorch

MaitySubhajit/SelfDocSeg

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

Python · 42 · 2 · Updated 2 years ago

computer-vision document-segmentation layout-analysis self-supervised-learning

empressabyss/nordrassil

A keyboard layout that provides an elegant and balanced typing experience by its use of a thumb-alpha, emphasis on middle fingers, deprioritisation of pinkies, and arcane keys.

36 · 1 · Updated 4 months ago

arcane arcane-key dactyl keyboard-layout keyboards layout-analysis layouts qmk warcraft

dell-research-harvard/HJDataset

A Large Dataset of Historical Japanese Documents with Complex Layouts

Jupyter Notebook · 36 · 4 · Updated 3 years ago

dataset detectron2 layout-analysis python

CaseDrive/publaynet-models

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

Python · 29 · 2 · Updated 2 years ago

BobLd/PdfPigMLNetBlockClassifier

Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

C# · 28 · 6 · Updated 6 years ago

classifier csharp document-layout document-layout-analysis layout-analysis lightgbm machine-learning ml-net pdf pdf-document pdf-document-processor pdfpig publaynet

jiangnanboy/layout_analysis4j

Layout detection with java-yolov8: java-yolov8 is used to detect the layout of Chinese document images.

Java · 27 · 11 · Updated 2 years ago

cdla java layout-analysis yolo yolov8

MBAigner/PDFSegmenter

This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified and returned. Tables are retrieved formatted as a CSV.

Python · 23 · 3 · Updated 5 years ago

annotations cluster-analysis csv detection-model document-processing layout-analysis page-segmentation pdf python table

aidayang/MinerU-OneClick

MinerU one-click launch bundle, deployable without installation

18 · 3 · Updated 4 months ago

ai4science document-analysis extract-data layout-analysis markdown mineru ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser pdftojson pdftomarkdown python

qyhou/curated-document-layout-analysis

A curated list of resources on Document Layout Analysis

11 · 1 · Updated 7 months ago

document-ai document-hierarchy-extraction document-intelligence document-layout-analysis document-structure-analysis document-structure-extraction layout-analysis page-object-detection

FitLayout/FitLayout

An extensible web page segmentation and analysis framework.

Java · 10 · 4 · Updated 1 week ago

document-analysis layout-analysis page-segmentation web web-archiving

pleb631/pdfLayoutDet

pdfDet aims to simplify PDF layout detection tasks for users.

Python · 9 · 1 · Updated 1 year ago

document-analysis layout-analysis layout-detection layout-parser pdf-document-processor

calfa-co/rasam-dataset

Open Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi (ICDAR 2021, CHR 2024)

9 · 2 · Updated 1 year ago

arabic dataset historical-manuscripts htr layout-analysis text-recognition

yuvaraj-kannan/preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

Python · 7 · 3 · Updated 3 weeks ago

computer-vision document-analysis document-classification document-intelligence document-processing document-understanding file-analysis image-processing layout-analysis ocr ocr-detection opencv pdf pdf-analysis pdf-parsing preprocessing python python-library text-detection text-extraction
