"topic:spark" — Search

9,341 results for “topic:spark”

Apache Spark - A unified analytics engine for large-scale data processing

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

Jupyter Notebook · 39.0k · 7.8k · Updated just now

course, data-engineering, dbt, docker, free, kafka, kestra, spark

donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python · 28.9k · 8.0k · Updated 4 hours ago

aws, big-data, caffe, data-science, deep-learning, hadoop, kaggle, keras, machine-learning, mapreduce, matplotlib, numpy, pandas, python, scikit-learn, scipy, spark, tensorflow, theano

getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Python · 28.3k · 4.6k · Updated 2 hours ago

analytics, athena, bi, bigquery, business-intelligence, dashboard, databricks, hacktoberfest, javascript, mysql, postgresql, python, redash, redshift, spark, spark-sql, visualization

yeasy/docker_practice

The latest Docker container technology: learn best practices from real-world cases! | Learn and understand Docker & container technologies, with real DevOps practice!

Go · 25.9k · 5.8k · Updated 3 hours ago

book, cloud-computing, container, devops, docker, kubernetes, linux, mesos, spark, swarm, virtualization

heibaiying/BigData-Notes

A beginner's guide to big data ⭐

Java · 16.9k · 4.3k · Updated 12 hours ago

azkaban, big-data, bigdata, flume, hadoop, hbase, hdfs, hive, kafka, mapreduce, phoenix, scala, spark, sqoop, storm, yarn, zookeeper

FavioVazquez/ds-cheatsheets

List of Data Science Cheatsheets to rule the world

16.2k · 4.1k · Updated 1 day ago

cheatsheet, datascience, jupyter, programming, python, r, spark

GaiZhenbiao/ChuanhuChatGPT

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

Python · 15.4k · 2.3k · Updated 8 hours ago

chatbot, chatglm, chatgpt-api, claude, dalle3, ernie, gemini, gemma, inspurai, llama, midjourney, minimax, moss, ollama, qwen, spark, stablelm

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

Java · 15.1k · 3.7k · Updated 4 hours ago

agent, ai, bigquery, database, dbt, delta-lake, elt, hudi, iceberg, lakehouse, olap, paimon, query-engine, real-time, redshift, snowflake, spark, sql

zhisheng17/flink-learning

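Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, the DataStream API, and the Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" (《大数据实时计算引擎 Flink 实战与性能优化》) is welcome.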

Java · 15.1k · 4.0k · Updated 15 hours ago

clickhouse, elasticsearch, flink, hbase, influxdb, kafka, loki, mysql, opentsdb, rabbitmq, redis, rocketmq, spark, stream-processing, streaming

aalansehaiyang/technology-talk

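[Big Tech Interview Column] A technical guide for Java programmers, covering interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!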

14.7k · 3.8k · Updated 16 hours ago

dubbo, es6, git, hbase, java, kafka, mycat, spark, spring, springboot

horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python · 14.7k · 2.3k · Updated 2 days ago

baidu, deep-learning, deeplearning, keras, machine-learning, machinelearning, mpi, mxnet, pytorch, ray, spark, tensorflow, uber

deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

Java · 14.2k · 3.8k · Updated 14 hours ago

artificial-intelligence, clojure, deeplearning, deeplearning4j, dl4j, gpu, hadoop, intellij, java, linear-algebra, matrix-library, neural-nets, python, scala, spark

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interview preparation; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

10.4k · 3.3k · Updated 4 hours ago

azkaban, bigdata, flink, flume, hadoop, hbase, hdfs, hive, kafka, spark, zookeeper

tobymao/sqlglot

Python SQL Parser and Transpiler

Python · 9.0k · 1.1k · Updated just now

bigquery, clickhouse, databricks, duckdb, hive, mysql, optimizer, parser, postgres, presto, python, redshift, snowflake, spark, sql, sqlite, sqlparser, transpiler, trino, tsql

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Python · 8.7k · 911 · Updated 1 hour ago

artificial-intelligence, data, data-engineering, data-integration, data-pipelines, data-science, dbt, elt, etl, machine-learning, orchestration, pipeline, pipelines, python, reverse-etl, spark, sql, transformation

delta-io/delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala · 8.6k · 2.0k · Updated just now

acid, analytics, big-data, delta-lake, spark

h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Jupyter Notebook · 7.5k · 2.0k · Updated 5 hours ago

automl, big-data, data-science, deep-learning, distributed, ensemble-learning, gbm, gpu, h2o, h2o-automl, hadoop, java, machine-learning, naive-bayes, opensource, pca, python, r, random-forest, spark

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Java · 7.2k · 3.0k · Updated 8 hours ago

alluxio, data-analysis, data-orchestration, hadoop, memory-speed, presto, spark, tensorflow, virtual-distributed-filesystem

Angel-ML/angel

A Flexible and Powerful Parameter Server for large-scale machine learning

Java · 6.8k · 1.6k · Updated 4 days ago

high-dimensional, machine-learning, model, online-learning, parameter-server, scala, spark, spark-streaming

apache/zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Java · 6.6k · 2.8k · Updated 2 hours ago

big-data, database, flink, java, javascript, nosql, scala, spark, zeppelin

donnemartin/dev-setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Python · 6.3k · 1.1k · Updated 8 hours ago

android-development, aws, bash, cli, cloud, elasticsearch, git, iterm2, linux, mac, macos, mongodb, mysql, nodejs, postgresql, python, redis, spark, sublime-text, vim

microsoft/SynapseML

Simple and Distributed Machine Learning

Scala · 5.2k · 855 · Updated 1 day ago

ai, apache-spark, azure, big-data, cognitive-services, data-science, databricks, deep-learning, http, lightgbm, machine-learning, microsoft, ml, model-deployment, onnx, opencv, pyspark, scala, spark, synapse

tencentmusic/cube-studio

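cube studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models: full MLOps workflow, compute leasing platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, annotation platform with automated labeling, SFT fine-tuning / reward model / reinforcement learning training for large models such as deepseek, multi-node large-model inference with vllm/ollama/mindie, private knowledge base, AI model marketplace, support for Chinese domestic CPU/GPU/NPU hardware and the Ascend ecosystem, RDMA support, and support for pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano and other distributed...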

Python · 4.9k · 861 · Updated 3 hours ago