Top Repositories
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
A Python client for the Unstructured Platform API
Repositories
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Documentation for all Unstructured products and libraries
Store Dockerfiles and Packer configs for images to use as a base to build upon
A JavaScript/TypeScript client for the Unstructured Platform API
A Python client for the Unstructured Platform API
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Pipeline from two sources (S3, Elasticsearch) to a RAG database.
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Pipeline for converting PDFs to raw text with PaddleOCR
⚡ Building applications with LLMs through composability ⚡
Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services
A Python wrapper for Google Tesseract
FedRAMP formatted model cards
Script to accompany the AWS blog post on unstructured data ETL with the Unstructured Ingest library
Preprocessing pipeline notebooks and API supporting text extraction from receipt images
Pairing Technical Challenge