GitHunt
DA

Helpful Java Spark stuff

spark

Helpful Java Spark library.

Pipelines:

  • ConcatColumns - When you need to concat a bunch of columns together
  • DropColumns - Pipelines do get messy
  • ExplodeVector - All those models sure like results in vectors
  • FloorCeiling - For putting caps on your variables
  • MultiStringIndexer - For the lazy
  • TopCategories - When you want to limit the categorical variables
  • XGBoostEstimator - Just using the Scala one, but added all the missing "sets" and "gets"

Params:

  • MapParam - When you just need to use a Map

Misc:

  • FindContinuousColumns - Dataset helper
  • FindCategoricalColumns - Dataset helper
  • FindNullColumns - Annoyingly, I have some datasets where columns can always be NULL.
  • SmartUnion - For when your columns don't line up and the datasets don't always have the same ones.
  • JavaToScala helpers
  • SparkUnitTest helpers
  • SparkSettings sysouts
  • ...

Languages

Java100.0%

Contributors

MIT License
Created April 22, 2018
Updated October 11, 2022