7,783 results for “topic:data-engineering”
Apache Superset is a Data Visualization and Data Exploration Platform
Learn how to develop, deploy and iterate on production-grade ML applications.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Turns Data and AI algorithms into production-ready web applications in no time.
Workflow Engine for Kubernetes
An orchestration platform for the development, production, and observation of data assets.
The Data Engineering Cookbook
Roadmap to becoming a data engineer in 2021
Always know what to expect from your data.
🐚 Python-powered shell. Full-featured, cross-platform and AI-friendly.
Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Fancy stream processing made operationally mundane
Open Source Feature Flags, Experimentation, and Product Analytics
The Open Source Feature Store for AI/ML
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
lakeFS - Data version control for your data lake | Git for data
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Privacy and Security focused Segment-alternative, in Golang and React
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Data Science Roadmap from A to Z
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
A list of useful resources to learn Data Engineering from scratch