926 results for “topic:spark-sql”
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
A Scala kernel for Jupyter
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
A big data platform for e-commerce user behavior analysis
🐍 Quick reference guide to common patterns & functions in PySpark.
Qubole Sparklens tool for performance tuning Apache Spark
The Internals of Spark SQL
New-Generation Open-Source Data Stack Demo
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Use SQL to build ELT pipelines on a data lakehouse.
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
Apache Spark™ and Scala Workshops
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
A prototype of a big data platform; the source code for the book Big Data Platform Architecture and Prototype
An encrypted data analytics platform
Spark Structured Streaming / Kafka / Cassandra / Elastic
Spark SQL implementations of ItemCF, UserCF, and Swing: recommender systems, recommendation algorithms, collaborative filtering
Apache Spark 3 - Structured Streaming Course Material
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Spark Connector to read and write with Pulsar
Apache Spark Connect Client for Rust
Movie recommendation system and movie recommendation engine, built with Spark
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The end of the course also covers a few case studies.
A library for Spark DataFrame using MinIO Select API
Apache Spark Course Material
A cloud-native data pipeline and visualization project analyzing Formula 1 racing data using Azure, Databricks, Delta Lake, Tableau, and Python for insightful EDA and interactive dashboards.