EPIC Data Lab
ucbepic
Effective Programming Interaction and Computation with Data
Top Repositories
A system for agentic LLM-powered data processing and ETL
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the benchmark taxonomy across the enterprise data categories.
Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. WebApp implemented in Flask/Jinja2 with infer and train pipelines managed by FlorDB
Repositories
A system for agentic LLM-powered data processing and ETL
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the benchmark taxonomy across the enterprise data categories.
Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. WebApp implemented in Flask/Jinja2 with infer and train pipelines managed by FlorDB
Examples of docetl pipelines
Introduction to FlorDB with PyTorch and TensorFlow