211 results for “topic:rdd”
C# and F# language binding and extensions to Apache Spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Ruby wrapper for Apache Spark
Spark RDD with Lucene's query and entity linkage capabilities
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
A framework for Spatio-Temporal Data Analytics on Spark
PySpark in Google Colab: A simple machine learning (Linear Regression) model
Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
InfluxDB connector to Apache Spark on top of Chronicler
Causal Inference Using Quasi-Experimental Methods
SQLRDD for Harbour and Harbour++
Spark access to Common Information Model (CIM) files
PySpark DataFrame made easy
Guide to Clojure REPL Driven Development with Emacs Doom
Sentiment Analysis and Data Visualization
Preview your Markdown locally as it would appear on GitHub, with live updating
openmrs - mysql - debezium - kafka - spark - scala
rddapp: Regression Discontinuity Design Application
A bunch of low-level basic methods for data processing and monitoring with Scala Spark
One Ring is a framework to unify, unite and bind Apache Spark-based computing modules, and run them in parametrized chains
Imitate and rewrite Spark's RDD (core)
Big Data Recipes
Apache Spark Basics - Java Examples
PySpark is a distributed data processing library for Python that processes large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and management.
Apache Spark machine learning project using PySpark
A library having Java and Scala examples for Spark 2.x
sparkhit - analyzing large scale genomic data on the cloud
A package providing a Java implementation of big-data genetic programming for Apache Spark
Reading, writing and deleting from HBase with Spark RDD