6 results for “topic:continuous-batching”
A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
OpenAI-compatible server with continuous batching for MLX on Apple Silicon
PagedAttention + Continuous Batching Inference Engine Prototype (Rust): Paged KV Cache & Dynamic Scheduling
Adaptive LLM inference scheduler simulation — continuous batching, priority preemption, KV-cache routing, and speculative decoding in Python/asyncio.
No description provided.
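
For context on the technique all six results share: continuous batching admits and retires requests at every decode step rather than once per batch. Below is a minimal Python sketch; Request, decode_step, and serve are hypothetical names, and decode_step merely stands in for a real model forward pass.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list = field(default_factory=list)

def decode_step(batch):
    # Stand-in for one model forward pass: append one token per running request.
    for req in batch:
        req.tokens.append("<tok>")

def serve(waiting, max_batch=8):
    running = []
    while waiting or running:
        # Continuous batching: admit new requests at every step,
        # not only after the whole batch has finished.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        decode_step(running)
        # Retire finished requests immediately, freeing their batch slots.
        running = [r for r in running if len(r.tokens) < r.max_new_tokens]

serve(deque(Request(f"prompt {i}", max_new_tokens=3 + i) for i in range(3)))

The step-level admission is what distinguishes this from static batching, where one long request holds every batch slot until the whole batch finishes.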
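
The "token throttling" in the gLLM result refers to bounding how much work enters the pipeline per step. gLLM's exact mechanism is defined by that project; a loosely inspired sketch is a per-step token budget that caps admitted prefill work (PrefillRequest and throttle are illustrative names, not gLLM's API):

from collections import deque
from dataclasses import dataclass

@dataclass
class PrefillRequest:
    num_tokens: int  # prompt length

def throttle(queue, token_budget):
    # Admit prefill requests until the per-step token budget is spent,
    # keeping long prompts from swamping the pipeline stages.
    admitted, used = [], 0
    while queue and used + queue[0].num_tokens <= token_budget:
        req = queue.popleft()
        admitted.append(req)
        used += req.num_tokens
    return admitted

q = deque([PrefillRequest(900), PrefillRequest(300), PrefillRequest(500)])
print([r.num_tokens for r in throttle(q, token_budget=1024)])  # -> [900]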
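
The Rust prototype in the list pairs continuous batching with a paged KV cache. The bookkeeping behind PagedAttention-style allocators is a per-sequence block table; the sketch below shows only that part (PagedKVCache is a made-up class with a fixed block size and no block sharing or copy-on-write):

class PagedKVCache:
    # Toy allocator: each sequence maps logical blocks to physical block ids.
    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free-list of physical blocks
        self.tables = {}                      # seq_id -> list of block ids

    def append_token(self, seq_id, pos):
        # Return (block_id, offset) for token `pos`, allocating a new block
        # whenever the sequence crosses a block boundary.
        table = self.tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):
            if not self.free:
                raise MemoryError("cache exhausted; a real scheduler preempts")
            table.append(self.free.pop())
        return table[pos // self.block_size], pos % self.block_size

    def release(self, seq_id):
        # Finished sequences return their blocks to the free-list.
        self.free.extend(self.tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=4)
for pos in range(20):                 # 20 tokens span two 16-token blocks
    cache.append_token(seq_id=0, pos=pos)
cache.release(0)

Because blocks are fixed-size and non-contiguous, memory is reclaimed at block granularity instead of reserving a max-length buffer per request.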