Top Repositories
Open Neural Network Exchange
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
Benchmark SGLang on SLURM
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Fast and memory-efficient exact attention
Transformer-related optimization, including BERT and GPT
Repositories
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
Benchmark SGLang on SLURM
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Fast and memory-efficient exact attention
Transformer-related optimization, including BERT and GPT
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Fully open reproduction of DeepSeek-R1
CUDA Templates for Linear Algebra Subroutines
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
8-bit CUDA functions for PyTorch
Development repository for the Triton language and compiler
Inference code for LLaMA models
An innovation library for efficient LLM inference via low-bit quantization and sparsity
A high-throughput and memory-efficient inference and serving engine for LLMs
Robust Speech Recognition via Large-Scale Weak Supervision
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
ONNX Runtime: cross-platform, high performance scoring engine for ML models
Open Neural Network Exchange
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Tutorials for creating and using ONNX models
Samples for Windows ML.