mathijs81/java-dataframes
A quick test of a couple of data frame libraries for Java
Java dataframes test
This is the companion repository to the following Medium post: Doing cool data science in Java: how 3 DataFrame libraries stack up
Data
The data was extracted from Eurostat at the beginning of September 2018. I opened the extracted CSV in LibreOffice and saved it again, because the Eurostat output contained some illegal UTF-8 characters that some CSV importers couldn't handle directly.
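For anyone who would rather do this cleanup in code than through LibreOffice, here is a minimal sketch of one possible approach. It is not what was done for this repository: the class name `Utf8Sanitizer` and the command-line arguments are made up for illustration, and it assumes that replacing each malformed byte sequence with U+FFFD is acceptable for your data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of this repository.
public class Utf8Sanitizer {

    /** Decodes bytes as UTF-8, replacing malformed sequences with U+FFFD. */
    static String sanitize(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Utf8Sanitizer <input.csv> <output.csv>");
            return;
        }
        // Read the raw bytes, decode leniently, and write clean UTF-8 back out.
        byte[] raw = Files.readAllBytes(Path.of(args[0]));
        Files.writeString(Path.of(args[1]), sanitize(raw), StandardCharsets.UTF_8);
    }
}
```

Valid text passes through unchanged, so the output should differ from the input only where the original bytes were not valid UTF-8.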
Results [June 2025]
| Library | Maintained | Version | Time (ms) |
|---|---|---|---|
| DuckDB | Y | 1.3.0 | 93 |
| DFLib | Y | 1.3.0 | 226 |
| Kotlin DataFrame | Y | 1.0-beta2 | 816 |
| Tablesaw | Y | 0.44.1 | 820 |
| Joinery | N | 1.9 | 1,478 |
| Krangl | N | 0.18.4 | 1,796 |
| Morpheus | N | 0.9.23 | * |
* Morpheus is no longer maintained and doesn't seem to work on later Java versions (error related to accessing sun.util.calendar.ZoneInfo).
Code
The code for the three libraries compared in the post is in the Test{libraryname}.java files. They all use CheckResult.java to run a basic correctness check on the top-growing cities.
As described in the Medium post, I couldn't find a good way to do the pivot step in DataVec, but I included the code I had written up to that point.