Repositories
PyTorch native quantization and sparsity for training and inference
A PyTorch native platform for training generative AI models
FlashInfer: Kernel Library for LLM Serving
FBGEMM (Facebook General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Fast and Furious AMD Kernels
Distributed Compiler based on Triton for Parallel Systems
Ongoing research training transformer models at scale
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance and lower memory utilization in both training and inference.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
AI Tensor Engine for ROCm
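Several of the libraries above center on low-precision formats such as FP8. As a minimal illustrative sketch (not any of these libraries' actual APIs), the snippet below shows the core scaling mechanics behind FP8-style quantization: values are rescaled so the largest magnitude maps to the format's maximum representable value (about 448 for E4M3), rounded to a coarse grid, and rescaled back on dequantization. The integer rounding here is a simplification closer to int8 than true FP8 mantissa rounding, but it demonstrates why a per-tensor scale factor is needed.

```python
# Illustrative sketch only; assumes nothing about torchao / Transformer
# Engine internals. E4M3_MAX is the largest finite FP8 E4M3 value.
E4M3_MAX = 448.0

def quantize(values):
    """Scale floats so max |v| hits E4M3_MAX, then round to a coarse grid.

    Note: true FP8 rounds to a 3-bit mantissa; integer rounding is a
    simplification used here to keep the example self-contained.
    """
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    q = [round(v * scale, 0) for v in values]
    return q, scale

def dequantize(q, scale):
    """Undo the scaling to recover approximate original values."""
    return [v / scale for v in q]

vals = [0.1, -1.5, 3.2]
q, s = quantize(vals)
deq = dequantize(q, s)
```

In real libraries the scale (often derived from a running "amax" history, so-called delayed scaling) is tracked per tensor so that activations, weights, and gradients each stay inside the narrow FP8 dynamic range.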