57 results for “topic:data-orchestration”
Event Driven Orchestration & Scheduling Platform for Mission Critical Applications
Alluxio, data orchestration for analytics and machine learning in the cloud
cloud-native distributed storage
An open source, standard data file format for graph data storage and retrieval.
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
A collection of AI skills for working with Dagster
Build and ship production ML pipelines faster: a pipeline library with an optional self-hosted visual layer for modular, reproducible workflows, local testing, and experiment tracking.
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), and Cloud Provider Services
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Data-aware orchestration with Dagster, dbt, and Airbyte
Asset-first data orchestration for Elixir/BEAM. Dagster-inspired with OTP fault tolerance, LiveView dashboard, lineage tracking, checkpoint gates, and distributed execution via Oban.
Powerful, developer-experience centric, blazingly fast and extensible job scheduler and workflow orchestration platform
This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.
A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran
Get started with Dagster ASAP
CI/CD repository template to automate deployments of your production flows
An operator for managing an Alluxio system on a Kubernetes cluster
A simple pipeline infrastructure with an ETL pipeline contained in a Docker environment, using Apache Airflow for orchestration and Postgres for data warehousing
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
Introduction to using and scaling Dagster
A TCG-style web game for mastering Apache Airflow SQL check operators. Battle the DQ Monster, deploy Astro Observe, and learn about data quality patterns through a simple card game.
Develop a real-time data ingestion pipeline using Kafka and Spark. Collect minute-level stock data from Yahoo Finance, ingest it into Kafka, and process it with Spark Streaming, storing the results in Cassandra. Orchestrate the workflow using Airflow deployed on Docker.
MCP-native knowledge graph orchestrator that unifies data silos with GraphRAG, dynamic connectors, and local AI.
EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics
Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.
Data orchestration repo with Docker deployment
Build an ELT pipeline with Dagster and dbt to schedule loading of HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard for exploring insights.
Prefect - Data orchestration tool practice & learning
A real-time BIM-to-LCA data orchestrator that calculates embodied carbon and ESG metrics directly within Autodesk Revit.
Cloud-agnostic Airflow MLOps sandbox combining parallelized data pipelines with ML engineering tooling (MinIO, MLflow, Qdrant, RAPIDS) for end-to-end experimentation and observability.