116 results for “topic:document-intelligence”
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
ContextGem: Effortless LLM extraction from documents
A curated list of resources on the topic of Document Understanding (DU)
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
Local-first AI-powered document intelligence platform for investigative journalism
A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.
ReadingBank: A Benchmark Dataset for Reading Order Detection
The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.
Knwler is a lightweight, single-file Python tool that extracts structured knowledge graphs from documents using AI. Feed it a PDF or text file and receive a richly connected network of entities, relationships, and topics — complete with an interactive HTML report and exports ready for your favorite graph analytics platform.
Course Website
A curated list of resources on Table Structure Recognition
This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as an invoice, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.
An explainable AI system that combines Graph Intelligence, Vector Search, and Retrieval-Augmented Generation (RAG) to deliver grounded answers and transparent reasoning paths. Includes a FastAPI backend, Streamlit UI, FAISS vector index, and an in-memory knowledge graph for hybrid retrieval and recommendations.
BoundaryNet - A Semi-Automatic Layout Annotation Tool
A curated list of resources on Document Layout Analysis
An experiment that provides the template-training capabilities of Azure AI Document Intelligence Studio for use in a feedback loop
The missing cognitive primitive for AI agents. Decompose any text into classified semantic units — authority, risk, attention, entities. No LLM. Deterministic.
Guidance on deploying a generative AI document analysis solution with Amazon Bedrock AgentCore. Auto-classifies, enhances, and aggregates multi-type documents using Gestalt-informed vision prompts. Custom analyzer creation wizard. Scripted CDK deployment. Gradio frontend included.
Enklayve is a free, local, private, and secure personal AI desktop application, built with Tauri and llama.cpp/Qwen, that provides robust document intelligence capabilities using fast embeddings.
Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.
🤖 Laravel AI Docs — Intelligent PDF & OCR Processing for Laravel. Convert PDF to JSON, Extract Tables, Ask PDF with AI, Image & Audio to Text (GPT, Claude, Gemini)
Advanced multimodal RAG system for querying PDF documents with text, images, and tables using vector embeddings, semantic chunking, and LLMs via Groq API
LangChain document loader for Kreuzberg
Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.
A live, evolving collection of open-source AI agents and real examples showing how businesses can use AI to automate work, save time, and explore new ideas.
CV Shortlist is an AI-powered web portal designed to help professional recruiters and HR departments streamline their hiring process.
A next-gen AI document extraction system capable of parsing text, tables, and layouts from native PDFs, scanned images, and various document formats with high precision. Built with Docling & Streamlit.