2,090 results for “topic:inference”
A high-throughput and memory-efficient inference and serving engine for LLMs
Port of OpenAI's Whisper model in C/C++
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
Cross-platform, customizable ML solutions for live and streaming media.
SGLang is a high-performance serving framework for large language models and multimodal models.
ncnn is a high-performance neural network inference framework optimized for mobile platforms
Faster Whisper transcription with CTranslate2
Machine Learning Engineering Open Book
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Nano vLLM
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Large Language Model Text Generation Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Production ready toolkit to run AI locally
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
Supercharge Your LLM with the Fastest KV Cache Layer
💎 1MB lightweight face detection model
Runtime type system for IO decoding/encoding
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
On-device Speech Recognition for Apple Silicon
CSGHub is an open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS deployments, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with a Python SDK compatible with Hugging Face. Join us! ⭐️
Superduper: End-to-end framework for building custom AI applications and agents.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
cube studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. Covers the full MLOps pipeline: compute rental, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, an annotation platform with automated labeling, SFT fine-tuning / reward modeling / reinforcement learning for large models such as DeepSeek, multi-node large-model inference via vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. Supports domestic CPUs/GPUs/NPUs (Ascend ecosystem), RDMA, and distributed frameworks including pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.