339 results for “topic:data-ingestion”
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
ingestr is a CLI tool to seamlessly copy data between any two databases with a single command.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Concurrent and multi-stage data ingestion and data processing with Elixir
Pravega - Streaming as a new software-defined storage primitive
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Copy to/from Parquet in S3, Azure Blob Storage, Google Cloud Storage, http(s) stores, local files, or the standard input stream from within PostgreSQL
The Supabase of the AI era. A modular, open-source backend for building AI-native software - designed for knowledge, not static data.
Orbital automates integration between data sources (APIs, Databases, Queues and Functions): BFFs, API composition, and ETL pipelines that adapt as your specs change.
Use SQL to build ELT pipelines on a data lakehouse.
Apache Paimon Rust: the Rust implementation of Apache Paimon.
The Data Engineering Book: a data engineering book written in Thai, by Thais, for Thais.
Apache Spark examples exclusively in Java
Build complete API integrations with YAML and SQL. Rapid development without vendor lock-in and per-row costs.
Claude Skills for connecting Claude.ai to local Weaviate vector databases - manage collections, ingest data, and query with RAG
Enables custom tracing of Java applications in Dynatrace
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"
Download and warehouse historical trading data
OpenKit Java Reference Implementation
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
Enables custom tracing of Python applications in Dynatrace
Oracle AI Data Platform Workbench Samples
Describes technical concepts of Dynatrace OneAgent SDK
Enables custom tracing of .NET applications in Dynatrace
Enables custom tracing of Node.js applications in Dynatrace
Convert documents and images to high-quality Markdown using Vision LLMs. Built for RAG ingestion pipelines.
A robust, configuration-driven ETL and data import framework for Laravel. Handles CSV/Excel streaming, queues, validation, and relationships.