
Unstructured

Unstructured-IO

Languages

Python 46% · Jupyter Notebook 27% · HTML 12% · TypeScript 8% · MDX 4% · Shell 4%

Repositories (41)
Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

HTML · 14.2k stars · 1.2k forks · Updated just now
Topics: data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing
Unstructured-IO/unstructured-ingest

No description provided.

HTML10557Updated 2 hours ago
Unstructured-IO/unstructured-inference

No description provided.

Python20675Updated 11 hours ago
Unstructured-IO/unstructured-api

No description provided.

Python885187Updated 2 days ago
Unstructured-IO/docs

Documentation for all Unstructured products and libraries

MDX725Updated 3 days ago
Unstructured-IO/base-images

Store Dockerfiles and Packer configs for images to use as a base to build upon

Shell · 6 stars · 3 forks · Updated 3 days ago
Unstructured-IO/UNS-MCP

No description provided.

Jupyter Notebook4222Updated 4 days ago
Unstructured-IO/unstructured-platform-plugins

No description provided.

Python · 6 stars · 3 forks · Updated 6 days ago
Unstructured-IO/unstructured-js-client

A JavaScript/TypeScript client for the Unstructured Platform API

TypeScript5815Updated 1 week ago
Unstructured-IO/unstructured-eval-metrics

No description provided.

Python · 6 stars · 3 forks · Updated 2 weeks ago
Unstructured-IO/unstructured-python-client

A Python client for the Unstructured Platform API

Python11420Updated 1 month ago
Unstructured-IO/notebooks

No description provided.

Jupyter Notebook · 2 stars · 0 forks · Updated 1 month ago
Unstructured-IO/unstructured-mlk-archive-public

No description provided.

HTML · 8 stars · 1 fork · Updated 1 month ago
Unstructured-IO/pipeline-sec-filings (Archived)

Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

Jupyter Notebook14835Updated 1 month ago
Unstructured-IO/unstructured.PaddleOCR (Fork)

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)

Python416Updated 3 months ago
Unstructured-IO/rag-over-hybrid-data-sources

Pipeline from two sources (S3, Elasticsearch) into a RAG database.

Jupyter Notebook · 1 star · 1 fork · Updated 4 months ago
Unstructured-IO/rag-over-evolving-enterprise-knowledge

No description provided.

Jupyter Notebook · 0 stars · 0 forks · Updated 5 months ago
Unstructured-IO/danswer (Fork)

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

Python112Updated 5 months ago
Unstructured-IO/.github

No description provided.

0 stars · 2 forks · Updated 6 months ago
Unstructured-IO/community (Archived)

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

297Updated 6 months ago
Topics: community, data-pipeline, deep-learning, document-ai, document-parsing, machine-learning, nlp-parsing, ocr-python, open-source, preprocessing-data
Unstructured-IO/irs-manual-demo

No description provided.

Python156Updated 9 months ago
Unstructured-IO/pipeline-paddleocr

Pipeline for converting PDFs to raw text with PaddleOCR

Jupyter Notebook237Updated 10 months ago
Unstructured-IO/langchain (Fork)

⚡ Building applications with LLMs through composability ⚡

Python · 8 stars · 1 fork · Updated 10 months ago
Unstructured-IO/azure-ai-hub-gateway-solution-accelerator (Fork)

Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services

1 star · 1 fork · Updated 11 months ago
Unstructured-IO/unstructured.pytesseract (Fork)

A Python wrapper for Google Tesseract

Python · 4 stars · 1 fork · Updated 1 year ago
Unstructured-IO/model-cards

FedRAMP-formatted model cards

1 star · 0 forks · Updated 1 year ago
Unstructured-IO/chat-isw-reports (Fork)

No description provided.

Python · 6 stars · 1 fork · Updated 1 year ago
Unstructured-IO/aws-blog-post-example

Script to accompany the AWS blog post on unstructured data ETL with the Unstructured Ingest library

Python · 0 stars · 0 forks · Updated 1 year ago
Unstructured-IO/pipeline-receipts

Preprocessing pipeline notebooks and API supporting text extraction from receipt images

Jupyter Notebook · 2 stars · 3 forks · Updated 1 year ago
Unstructured-IO/pairing-technical-challenge

Pairing Technical Challenge

TypeScript · 0 stars · 0 forks · Updated 1 year ago
