654 results for “topic:data-integration”
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Turns Data and AI algorithms into production-ready web applications in no time.
An orchestration platform for the development, production, and observation of data assets.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Flink CDC is a streaming data integration tool
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
Upserts, Deletes And Incremental Processing on Big Data.
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Privacy- and security-focused Segment alternative, in Golang and React
A data integration framework
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Hop Orchestration Platform
Insightful Tutorials and Papers about Knowledge Graphs
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
NicheNet: predict active ligand-target links between interacting cells
Fast, sensitive and accurate integration of single-cell data with Harmony
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
A collection of Apache Hudi-related resources
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"
Reference mapping for single-cell genomics