vLLM

vllm-project

2.8k followers · 0 following

Repositories

30

vllm

Public

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 72.4k · 14.1k

vllm-omni

Public

A framework for efficient model inference with omni-modality models

Python · 3.0k · 492

llm-compressor

Public

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 2.8k · 422

tpu-inference

Public

TPU inference for vLLM, with unified JAX and PyTorch support.

Python250118

vllm-metal

Public

Community maintained hardware plugin for vLLM on Apple Silicon

Python59659

production-stack

Public

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python · 2.2k · 372

guidellm

Public

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python899136

speculators

Public

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python26248

vllm-ascend

Public

Community maintained hardware plugin for vLLM on Ascend

Python · 1.7k · 891

semantic-router

Public

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Go · 3.3k · 550

router

Public

A high-performance and lightweight router for vLLM large-scale deployment

Rust14148

aibrix

Public

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go · 4.7k · 532

vllm-xpu-kernels

Public

The vLLM XPU kernels for Intel GPU

C++2227

vllm-daily

Public

vLLM Daily Summarization of Merged PRs

463

vllm-gaudi

Public

Community maintained hardware plugin for vLLM on Intel Gaudi

Python28113

vllm-spyre

Public

Community maintained hardware plugin for vLLM on Spyre

Python4744

compressed-tensors

Public

A safetensors extension to efficiently store sparse quantized tensors on disk

Python26265

flash-attention

Public Fork

Fast and memory-efficient exact attention

Python115125

vllm-neuron

Public

Community maintained hardware plugin for vLLM on AWS Neuron

Python249

vllm-project.github.io

Public
HTML3171

vllm-skills

Public

Agent skills for vLLM

Shell449

recipes

Public

Common recipes to run vLLM

Jupyter Notebook482163

ci-infra

Public

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

HCL3263

dashboard

Public

vLLM performance dashboard

Python439

perf-dashboard

Public

Performance dashboard for vLLM

Python11

bart-plugin

Public

vLLM Model plugin for the encoder-decoder BART model

Python94

FlashMLA

Public Fork
C++1117

vllm-openvino

Public
Python4111

media-kit

Public

vLLM Logo Assets

65

vLLM-in-PyTorch-Conference-2025

Public
111