"topic:apache-spark-cluster" — Search

11 results for “topic:apache-spark-cluster”

nchammas/flintrock

A command-line tool for launching Apache Spark clusters.

Python651119Updated 2 months ago

apache-spark apache-spark-cluster ec2 orchestration spark-ec2

PiercingDan/spark-Jupyter-AWS

A guide to setting up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support.

Jupyter Notebook26118Updated 3 weeks ago

apache-spark apache-spark-cluster aws aws-ec2 aws-s3 ebs-volumes ec2 ec2-instance jupyter jupyter-notebook spark spark-clusters

aamargajbhiye/big-data-projects

This project contains customizations, such as custom data sources and plugins, written for distributed systems like Apache Spark and Apache Ignite.

Java3422Updated 11 months ago

apache-ignite apache-spark apache-spark-cluster igfs spark-java

josemarialuna/ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.

Scala91Updated 2 years ago

apache-spark apache-spark-cluster clustering-evaluation clustering-validation cvi scala spark-ml spark-mllib

blnkoff/docker-spark-cluster

Apache Spark standalone cluster with JupyterLab on Docker. Local development and multi-worker setup ready.

Jupyter Notebook10Updated 2 months ago

apache-spark apache-spark-cluster docker docker-compose docker-image spark-cluster

ayush-adh/Distributed_Analytics_of_US_Residential_Zoning

This project performs distributed analytics on a spatial dataset using clusters. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage the segregation of zoning types and promote inclusivity.

Jupyter Notebook10Updated 10 months ago

apache-spark apache-spark-cluster distributed-computing distributed-systems hdfs machine-learning spark-sql

bjam24/agh-large-scale-data-analysis

This repository contains projects made for the Large Scale Data Analysis course at the AGH UST in 2024.

HTML10Updated 6 days ago

agh apache-spark apache-spark-cluster graphframes rdd spark-streaming sql structured-data

akaltsikis/Markov_Cluster_Algorithm

Implementations of the Markov Cluster Algorithm (MCL) and the Regularized Markov Cluster Algorithm (R-MCL) in Apache Spark.

Scala01Updated 8 years ago

apache-spark apache-spark-cluster big-data cluster-computing clustering-algorithm distributed-computing markov-cluster-algorithm mcl spark sparse-matrices

erjan/data_engineering_japan_visas_pyspark

Data engineering project: visualize the number of visas issued by Japan, by country and time of issue.

HTML00Updated 2 years ago

apache-spark-cluster aws-ec2 data-engineering ec2-instance project pyspark

SayamAlt/Bank-Customer-Churn-Prediction-using-PySpark

A machine learning model built with PySpark that classifies whether a bank customer will churn, with an accuracy of more than 86% on the test set.

Jupyter Notebook00Updated 1 year ago

apache-spark apache-spark-cluster azure-databricks binary-classification classification cross-validation data-exploration-and-preprocessing data-processing-pipelines data-visualization feature-engineering feature-transformation hyperparameter-tuning machine-learning model-training-and-evaluation pyspark spark-ml

savvydatainsights/spark

Apache Spark cluster lab.

Java00Updated 2 years ago

ansible apache-spark apache-spark-cluster vagrant