GitHunt

EPIC Data Lab

ucbepic

Effective Programming Interaction and Computation with Data

United States of America

Languages

Python 78% · JavaScript 11% · Jupyter Notebook 11%

Top Repositories

Repositories (9)
ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

Python · 3.7k · 385 · Updated 1 hour ago
agents, data, data-pipelines, document-analysis, document-processing, elt, etl, llm, python, semantic-data, unstructured-data, unstructured-data-analysis, workflow
ucbepic/DataAgentBench

No description provided.

Python · 2 · 0 · Updated 17 hours ago
ucbepic/TWIX

TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

Python · 21317 · Updated 2 weeks ago
document-data-extraction, document-processing
ucbepic/BARGAIN

Low-Cost LLM-Powered Data Processing with Theoretical Guarantees

Python · 356 · Updated 4 weeks ago
ai, data, document-processing, llm
ucbepic/task-cascades

No description provided.

Python · 2 · 0 · Updated 1 month ago
ucbepic/data-agent-benchmark-study

Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the benchmark taxonomy across the enterprise data categories.

Python · 112 · Updated 3 months ago
ucbepic/pdf_parser

Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. Web app implemented in Flask/Jinja2, with inference and training pipelines managed by FlorDB.

JavaScript · 9 · 2 · Updated 6 months ago
ucbepic/docetl-examples

Examples of docetl pipelines

Python · 2 · 1 · Updated 10 months ago
ucbepic/ml_tutorial

Introduction to FlorDB with PyTorch and TensorFlow

Jupyter Notebook · 0 · 0 · Updated 2 years ago
