Repositories (30)
vllm
Public: A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-omni
Public: A framework for efficient model inference with omni-modality models
llm-compressor
Public: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
tpu-inference
Public: TPU inference for vLLM, with unified JAX and PyTorch support.
vllm-metal
Public: Community-maintained hardware plugin for vLLM on Apple Silicon
production-stack
Public: vLLM’s reference system for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
guidellm
Public: Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
speculators
Public: A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
vllm-ascend
Public: Community-maintained hardware plugin for vLLM on Ascend
semantic-router
Public: System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
router
Public: A high-performance and lightweight router for large-scale vLLM deployment
aibrix
Public: Cost-efficient and pluggable infrastructure components for GenAI inference
vllm-xpu-kernels
Public: The vLLM XPU kernels for Intel GPU
vllm-daily
Public: vLLM Daily Summarization of Merged PRs
vllm-gaudi
Public: Community-maintained hardware plugin for vLLM on Intel Gaudi
vllm-spyre
Public: Community-maintained hardware plugin for vLLM on Spyre
compressed-tensors
Public: A safetensors extension to efficiently store sparse quantized tensors on disk
flash-attention
Public Fork: Fast and memory-efficient exact attention
vllm-neuron
Public: Community-maintained hardware plugin for vLLM on AWS Neuron
vllm-project.github.io
Public
vllm-skills
Public: Agent skills for vLLM
recipes
Public: Common recipes to run vLLM
ci-infra
Public: This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
dashboard
Public: vLLM performance dashboard
perf-dashboard
Public: Performance dashboard for vLLM
bart-plugin
Public: vLLM Model plugin for the encoder-decoder BART model
FlashMLA
Public Fork
vllm-openvino
Public
media-kit
Public: vLLM Logo Assets