Mark Saroufim
msaroufim
CUDA uninstallation failed. Please contact support for assistance
Top Repositories
Software Architecture for ML engineers
Awesome utilities for performance profiling
Description of commonly done compiler optimizations in C
Code Companion to Joel Grus' book
Preview PDFs locally within the Discord UI!
Repositories (209)
Puzzles for learning Triton
Leaderboards backed by Redis in Python
Software Architecture for ML engineers
Awesome utilities for performance profiling
No description provided.
The best ChatGPT that $100 can buy.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
GPU MODE 100 lectures thumbnail montage video generator
An intro for people who want to ship, not just read, code
A prettier rocm-smi output with color-coded GPU stats
Fetch and sort Steam library by playtime
No description provided.
new blog, who dis?
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Description of commonly done compiler optimizations in C
No description provided.
Preview PDFs locally within the Discord UI!
VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings
Code Companion to Joel Grus' book
⚡️Data & models versioning for ML projects, make them shareable and reproducible
Help Claude know about your library by giving it the main APIs in a prompt and integrating it into VS Code
Official implementation of Half-Quadratic Quantization (HQQ)
CUDA Python: Performance meets Productivity
No description provided.
PyTorch per-step fault tolerance (actively under development)
The world's best GPU community
An extremely fast Python linter and code formatter, written in Rust.
Tile primitives for speedy kernels
No description provided.
PyTorch native quantization and sparsity for training and inference