José María Luna

josemarialuna

PhD in Computer Science and AI at the University of Seville, Spain. #Python #Scala #Spark #DataScientist

University of Seville
Seville, Spain

Languages

Scala 47%, Python 33%, Jupyter Notebook 13%, Java 7%

Repositories (18)
josemarialuna/ing-datos-big-data-US

Docker-based Big Data environment with Hadoop, Hive, Trino, Kafka, and Airflow. Includes configurations, scripts, and MapReduce examples for distributed data analysis.

Python · 0 · 0 · Updated 3 months ago
bigdata, docker, hadoop, python
josemarialuna/Chi-Index

Clustering Validity Index based on Chi Square as Python package

Python120Updated 10 months ago
clustering, data-science, machine-learning
josemarialuna/ClusterIndices

This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.

Scala103Updated 1 year ago
big-data, clustering-evaluation, scala, spark-mllib, validity-indices
josemarialuna/ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.

Scala · 9 · 1 · Updated 2 years ago
apache-spark, apache-spark-cluster, clustering-evaluation, clustering-validation, cvi, scala, spark-ml, spark-mllib
josemarialuna/RandomClustersGenerator

📊 Python tool for creating datasets with clusters using a normal distribution. Customize clusters, significant columns, and add variability with dummy columns. Ideal for testing clustering algorithms.

Python · 6 · 0 · Updated 2 years ago
clustering, datasets, machine-learning
josemarialuna/Clustering-Datasets (Fork)

This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.

0 · 0 · Updated 2 years ago
josemarialuna/SeminarioDeRiquelme

Clustering

Jupyter Notebook · 0 · 0 · Updated 2 years ago
josemarialuna/CreateRandomDataset

This package contains the code for generating Big Data random datasets in Spark.

Scala · 4 · 1 · Updated 2 years ago
big-data, bigdata, clustering, dataset, random, scala, spark
josemarialuna/josemarialuna

No description provided.

0 · 0 · Updated 2 years ago
josemarialuna/MethodComparisonsInPython (Fork)

Friedman tests for comparing multiple methods across datasets in python

Jupyter Notebook · 0 · 0 · Updated 3 years ago
josemarialuna/ClassificationPython

No description provided.

Python · 4 · 0 · Updated 4 years ago
josemarialuna/smallDataIndex

This package contains the code for executing clustering validity indices in Java by using K-means from Weka. The package includes the following clustering validity indices: Silhouette, Dunn, BD-Silhouette, BD-Dunn, Davies-Bouldin, Calinski-Harabasz, MaximumDiameter, SquaredDistance, AverageDistance, AverageBetweenClusterDistance, MinimumDistance.

Java · 2 · 1 · Updated 4 years ago
clustering, cvi, indexing, indices, kmeans, validation, weka
josemarialuna/electric (Fork)

No description provided.

0 · 0 · Updated 4 years ago
josemarialuna/SurvivalAnalysis

No description provided.

Python · 0 · 0 · Updated 6 years ago
josemarialuna/matrixGeneration

No description provided.

Scala · 0 · 0 · Updated 7 years ago
josemarialuna/BasicSpark

No description provided.

Scala · 1 · 0 · Updated 7 years ago
josemarialuna/linkage

No description provided.

Scala · 1 · 1 · Updated 8 years ago
josemarialuna/clusterEmpleo

No description provided.

Scala · 0 · 0 · Updated 9 years ago
