"topic:spark-sql" — Search

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

C# · 309 · 91 · Updated 2 months ago

apache-spark azure big-data cosmosdb docker eventhub hdinsight iot iothub kafka kafka-streams nodejs react servicefabric spark spark-sql spark-streaming sparksql streaming streaming-data

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

JavaScript · 289 · 28 · Updated 4 days ago

apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook

DataWithBaraa/databricks_bootcamp_2026

End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.

Jupyter Notebook · 284 · 133 · Updated 22 hours ago

ai apache-spark data-analytics data-engineering data-engineering-project data-lakehouse data-pipeline databricks etl lakehouse medallion-architecture protfolio-project pyspark python spark spark-sql unity-catalog

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

HTML · 265 · 149 · Updated 1 week ago

apache-spark spark spark-mllib spark-sql spark-structured-streaming spark-workshops workshop

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Scala · 235 · 24 · Updated 3 months ago

big-data data-lakehouse datasource sampling scala spark spark-sql

Chabane/bigdata-playground

A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, and GraphQL

TypeScript · 210 · 74 · Updated 3 weeks ago

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

bluishglc/bdp

A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype

Java · 198 · 145 · Updated 2 months ago

bigdata demo kafka middle-end middle-office oozie prototype quickstart redis spark spark-demo spark-examples spark-sql spark-streaming spark-streaming-examples sparksql sqoop sqoop-import

mc2-project/opaque-sql

An encrypted data analytics platform

Scala · 187 · 72 · Updated 1 week ago

analytics enclave machine-learning privacy security spark spark-sql

polomarcus/Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Scala · 186 · 75 · Updated 1 week ago

cassandra kafka spark spark-sql structured-streaming

xiaogp/recsys_spark

Spark SQL implementations of ItemCF, UserCF, and Swing; recommender systems, recommendation algorithms, collaborative filtering

Scala · 141 · 47 · Updated 2 months ago

collaborative-filtering recommender-system spark-sql

LearningJournal/Spark-Streaming-In-Python

Apache Spark 3 - Structured Streaming Course Material

Python · 126 · 164 · Updated 1 week ago

apache-spark big-data bigdata data-lake pyspark python spark-sql spark-streaming

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Dockerfile · 119 · 44 · Updated 3 weeks ago

cdc change-data-capture data-warehouse data-warehousing datalake debezium delta delta-lake deltalake elasticsearch flink flink-sql hoodie hudi iceberg kafka real-time-data-warehouse spark spark-sql sql

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Scala · 118 · 51 · Updated 2 weeks ago

apache-pulsar apache-spark batch-processing data-processing data-science flink spark spark-sql stream-processing structured-streaming

sjrusso8/spark-connect-rs

Apache Spark Connect Client for Rust

Rust · 117 · 23 · Updated 3 weeks ago

grpc-client spark spark-connect spark-sql

wangj1106/recommendMoteur

Movie recommender system and movie recommendation engine, built with Spark

Scala · 115 · 38 · Updated 1 month ago

als flume kafka movies recommendation recommendation-engine recommender-system spark spark-sql spark-streaming

martandsingh/ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The end of the course also covers a few case studies.

Python · 104 · 68 · Updated 4 weeks ago

apachespark data-analysis data-engineering database databricks datalake deltalake etl etl-pipeline hadoop hive pyspark spark spark-sql spark-streaming sql timetravel

minio/spark-select

A library for Spark DataFrame using MinIO Select API

Scala · 101 · 19 · Updated 1 week ago

amazon-s3 bigdata minio parquet-files pyspark sbt select spark spark-sql

LearningJournal/SparkProgrammingInScala

Apache Spark Course Material

Scala · 96 · 160 · Updated 1 week ago

apache-spark big-data bigdata data-lake datalake scala spark spark-scala spark-sql

Smars-Bin-Hu/azure-cloud-datapipeline-EDA

A cloud-native data pipeline and visualization project analyzing Formula 1 racing data using Azure, Databricks, Delta Lake, Tableau, and Python for insightful EDA and interactive dashboards.

Jupyter Notebook · 94 · 9 · Updated 5 days ago

azure-data-factory azure-data-lake-storage-gen2 azure-databricks batch-and-stream-unification bi-dashboard cloud-native-etl data-visualization delta-lake exploratory-data-analysis lakehouse matplotlib medallion-architecture multivariate-analysis pyspark seaborn spark-sql storage-compute-separation tableau unity-catalog
