Polars
Blazingly fast DataFrames in Rust & Python
Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model is backed by Apache Arrow.
It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark.
To learn more about the inner workings of Polars, read the WIP book.
Among other things, Polars offers the following functionality:
| Functionality | Eager | Lazy (DataFrame) | Lazy (Series) |
|---|---|---|---|
| Filters | ✔ | ✔ | ✔ |
| Shifts | ✔ | ✔ | ✔ |
| Joins | ✔ | ✔ | |
| GroupBys + aggregations | ✔ | ✔ | |
| Comparisons | ✔ | ✔ | ✔ |
| Arithmetic | ✔ | ✔ | |
| Sorting | ✔ | ✔ | ✔ |
| Reversing | ✔ | ✔ | ✔ |
| Closure application (User Defined Functions) | ✔ | ✔ | |
| SIMD | ✔ | ✔ | |
| Pivots | ✔ | ✗ | |
| Melts | ✔ | ✗ | |
| Filling nulls + fill strategies | ✔ | ✗ | ✔ |
| Aggregations | ✔ | ✔ | ✔ |
| Moving Window aggregates | ✔ | ✗ | ✗ |
| Find unique values | ✔ | ✗ | |
| Rust iterators | ✔ | ✔ | |
| IO (csv, json, parquet, Arrow IPC) | ✔ | ✗ | |
| Query optimization: (predicate pushdown) | ✗ | ✔ | |
| Query optimization: (projection pushdown) | ✗ | ✔ | |
| Query optimization: (type coercion) | ✗ | ✔ | |
| Query optimization: (simplify expressions) | ✗ | ✔ | |
| Query optimization: (aggregate pushdown) | ✗ | ✔ | |
Note that almost all eager operations supported on Series/ChunkedArrays can be used in the lazy API via UDFs.
Documentation
Want to know about all the features Polars supports? Read the docs!
Rust
Python
- installation guide: pip install py-polars
- the book
- Reference guide
Performance
Polars is written to be performant, and it is! But don't take my word for it; take a look at the results in
h2oai's db-benchmark.
Cargo Features
Additional cargo features:
- temporal (default) - Conversions between Chrono and Polars for temporal data
- simd (nightly) - SIMD operations
- parquet - Read Apache Parquet format
- json - JSON serialization
- ipc - Arrow's IPC format serialization
- random - Generate arrays with randomly sampled values
- ndarray - Convert from DataFrame to ndarray
- lazy - Lazy API
- strings - String utilities for Utf8Chunked
- object - Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These are downcastable from Series through the Any trait.
- parallel - ChunkedArrays can be used by rayon::par_iter()
Contribution
Want to contribute? Read our contribution guideline.
Env vars
- POLARS_PAR_SORT_BOUND -> sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
- POLARS_FMT_MAX_COLS -> maximum number of columns shown when formatting DataFrames.
- POLARS_FMT_MAX_ROWS -> maximum number of rows shown when formatting DataFrames.
- POLARS_TABLE_WIDTH -> width of the tables used during DataFrame formatting.
- POLARS_MAX_THREADS -> maximum number of threads used in the join algorithm. Default is unbounded.
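
As a rough illustration, the environment variables listed above could be set from Python before Polars is used. This is only a sketch: the variable names come from this README, but every value below is an arbitrary example, the helper `apply_polars_env` is hypothetical, and it is an assumption that the variables need to be in place before Polars reads them.

```python
import os

# Example values only; the variable names are from the list above,
# the numbers are arbitrary choices for illustration.
POLARS_ENV = {
    "POLARS_PAR_SORT_BOUND": "500000",  # row count from which parallel sorting is used
    "POLARS_FMT_MAX_COLS": "12",        # columns shown when formatting DataFrames
    "POLARS_FMT_MAX_ROWS": "20",        # rows shown when formatting DataFrames
    "POLARS_TABLE_WIDTH": "120",        # table width used during formatting
    "POLARS_MAX_THREADS": "4",          # thread cap for the join algorithm
}


def apply_polars_env(settings, environ=os.environ):
    """Hypothetical helper: copy settings into environ, keeping values the user already set."""
    applied = {}
    for name, value in settings.items():
        if name not in environ:
            environ[name] = value
            applied[name] = value
    return applied


# Assumption: call this before importing or using Polars.
applied = apply_polars_env(POLARS_ENV)
```

Values already present in the environment are left untouched, so settings exported in the shell take precedence over the defaults in the script.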