
rolandtannous/cudf-pandas-polars-performance

Comparing the performance of RAPIDS cuDF with Pandas and Polars

Data Science Workbench

This repository contains various data science experiments and tutorials using RAPIDS, Polars, Pandas, and other data processing tools.
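As an illustration of the kind of comparison these experiments make, here is a minimal timing sketch. It is not taken from the repository: the helper names (`time_call`, `build_benchmarks`), the synthetic `key`/`value` columns, and the choice of a group-by sum as the workload are all assumptions made for this example. Libraries that are missing or unusable on the current machine (for example cuDF without a GPU) are skipped.

```python
import time
from statistics import median


def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def make_pandas(data):
    import pandas as pd
    df = pd.DataFrame(data)
    return lambda: df.groupby("key")["value"].sum()


def make_polars(data):
    import polars as pl
    df = pl.DataFrame(data)
    # `group_by` is the spelling in recent Polars releases; older ones used `groupby`.
    return lambda: df.group_by("key").agg(pl.col("value").sum())


def make_cudf(data):
    import cudf
    df = cudf.DataFrame(data)
    return lambda: df.groupby("key")["value"].sum()


def build_benchmarks(n_rows=100_000):
    """Map library name -> zero-argument callable running the same group-by sum."""
    # Hypothetical synthetic data: 100 distinct keys, one float value per row.
    data = {
        "key": [i % 100 for i in range(n_rows)],
        "value": [float(i) for i in range(n_rows)],
    }
    factories = (("pandas", make_pandas), ("polars", make_polars), ("cudf", make_cudf))
    benchmarks = {}
    for name, factory in factories:
        try:
            fn = factory(data)
            fn()  # warm-up run; also confirms the library works on this machine
        except Exception:
            continue  # library missing or unusable here, so leave it out
        benchmarks[name] = fn
    return benchmarks


if __name__ == "__main__":
    for name, fn in build_benchmarks().items():
        print(f"{name}: {time_call(fn):.6f} s")
```

Taking the median of several runs after a warm-up call reduces the influence of one-off costs such as lazy initialization. Results depend heavily on data size and hardware, so a small synthetic workload like this one says little about real datasets.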

Requirements

To run the code in this repository, you'll need:

  • Python 3.8+
  • RAPIDS cuDF (GPU-accelerated DataFrame library)
  • Polars (Rust-based DataFrame library)
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
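A quick way to see which of the requirements above are available in the current environment is sketched below. The function names are hypothetical and not part of this repository, and the import names are assumed to match the package names listed (`cudf`, `polars`, `pandas`, `numpy`, `matplotlib`, `seaborn`). The check only looks up each module without importing it, so it does not prove that cuDF can actually reach a GPU.

```python
import sys
from importlib import util

# Import names assumed to correspond to the requirements listed above.
REQUIRED_MODULES = ("cudf", "polars", "pandas", "numpy", "matplotlib", "seaborn")


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def check_requirements(modules=REQUIRED_MODULES):
    """Return {module name: True if it can be found, False otherwise}."""
    return {name: util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```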

Installation

  1. Install RAPIDS cuDF following the official instructions:

    # Follow installation guide at:
    https://github.com/rapidsai/cudf
  2. Install other Python dependencies:

    pip install polars pandas numpy matplotlib seaborn

Project Details

10minstocudf

Contains practice files for the RAPIDS cuDF tutorial.

nvidia_summit

Contains practice files from NVIDIA Summit 2023 training.

Usage

To run any of the notebooks or scripts:

  1. Ensure you have the required dependencies installed
  2. For GPU-accelerated code, make sure you have compatible NVIDIA hardware
  3. Download the QEDCorpus dataset for the Arabic dataset experiments

    # Example: Running the 10minstocudf tutorial
    cd 10minstocudf
    python main.py
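For step 2, one way to check whether an NVIDIA GPU is visible before running GPU-accelerated code is sketched below. The function name `list_nvidia_gpus` is hypothetical, and the sketch assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH. An empty result means no GPU was detected this way. A non-empty result only shows that a GPU is present, not that it meets the RAPIDS hardware requirements.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one description string per GPU reported by `nvidia-smi -L`.

    Returns an empty list if nvidia-smi is not installed or fails to run.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print("\n".join(gpus))
    else:
        print("No NVIDIA GPU detected via nvidia-smi")
```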

License

This project is licensed under the MIT License - see the LICENSE file for details.

Languages

Jupyter Notebook 99.5%, Python 0.5%

MIT License
Created March 20, 2025
Updated March 20, 2025