rolandtannous/cudf-pandas-polars-performance
Comparing performance of rapids cudf with pandas and polars
Data Science Workbench
This repository contains various data science experiments and tutorials using RAPIDS, Polars, Pandas, and other data processing tools.
Requirements
To run the code in this repository, you'll need:
- Python 3.8+
- RAPIDS cuDF (GPU-accelerated DataFrame library)
- Polars (Rust-based DataFrame library)
- Pandas
- NumPy
- Matplotlib
- Seaborn
Installation
- Install RAPIDS cuDF following the official instructions:
# Follow installation guide at: https://github.com/rapidsai/cudf
- Install other Python dependencies:
pip install polars pandas numpy matplotlib seaborn
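After installing, one way to confirm that the packages listed under Requirements are visible to the current interpreter is a small check like the sketch below. It is not part of this repository: the helper name `missing_packages` is hypothetical, and the list assumes that `cudf` is the import name for RAPIDS cuDF. It uses only the standard library, so it runs even when nothing else is installed.

```python
import importlib.util

# Import names for the packages listed under Requirements.
# Assumption: RAPIDS cuDF is imported as "cudf".
REQUIRED = ["cudf", "polars", "pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

A package being found does not guarantee that it imports cleanly; cuDF in particular may still fail at import time without a compatible GPU and driver.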
Project Details
10minstocudf
Contains practice files for the RAPIDS cuDF tutorial:
- Based on: https://docs.rapids.ai/api/cudf/nightly/user_guide/10min/
- Main dependencies: cudf, cupy, dask_cudf, pandas
nvidia_summit
Contains practice files from NVIDIA Summit 2023 training:
- Based on: https://www.nvidia.com/en-us/on-demand/session/nvaidatasciencesummit23-02/
- Includes examples using Ibis and other GPU-accelerated tools
Usage
To run any of the notebooks or scripts:
- Ensure you have the required dependencies installed
- For GPU-accelerated code, make sure you have compatible NVIDIA hardware
- Download the QEDCorpus dataset for the Arabic dataset experiments
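For the GPU requirement in the checklist above, a quick way to see whether an NVIDIA driver is reachable is to ask `nvidia-smi` for the device names. The sketch below is an illustration, not code from this repository: the function name `nvidia_gpu_names` is hypothetical, and it assumes `nvidia-smi` is on the `PATH` when a driver is installed. It returns an empty list rather than raising when no GPU tooling is found.

```python
import shutil
import subprocess


def nvidia_gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling on PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but failed, timed out, or could not be started.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    print("Detected GPUs:", gpus if gpus else "none")
```

A detected GPU is only a first check; whether it is compatible with a given cuDF release depends on the requirements in the official RAPIDS installation guide.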
# Example: Running the 10minstocudf tutorial
cd 10minstocudf
python main.py
License
This project is licensed under the MIT License - see the LICENSE file for details.
Languages: Jupyter Notebook 99.5%, Python 0.5%
Created March 20, 2025
Updated March 20, 2025