886 results for “topic:data-transformation”
A high-performance observability data pipeline.
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Official Repository of "LLM × DATA" Survey Paper
Advanced and Fast Data Transformation in R
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Like awk, but with SQL and table joins
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
📄 Concise selector to extract JSON from HTML.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
A curated list of Clojure resources for dealing with domain-specific languages.
A simple Spark-powered ETL framework that just works 🍺
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Data transformation and utility functions for R
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
A visual data pipeline builder with various backends
Wrangler Transform: A DMD system for transforming Big Data
Python package implementing ML feature engineering and pre-processing for polars or pandas dataframes.
A schema-aware Scala library for data transformation
Publisher is the open-source semantic model server for the Malloy data language. It lets you define semantic models once — and use them everywhere.
Reference Architectures for Datalakes on AWS
Data transformation toolkit
breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.
A modular ecosystem under this. namespace.