245 results for “topic:model-serving”
A high-throughput and memory-efficient inference and serving engine for LLMs
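This tagline matches vLLM, which exposes an offline Python API alongside its OpenAI-compatible server. A minimal sketch, assuming that match and using facebook/opt-125m as an arbitrary stand-in checkpoint:

```python
# Minimal offline-inference sketch with vLLM; the checkpoint is a stand-in.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")   # loads weights and allocates the KV cache
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.outputs[0].text)         # first completion for each prompt
```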
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
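This description matches BentoML; a hedged sketch of a service in its 1.2+ Python API, with placeholder names and logic, served locally via the `bentoml serve` CLI:

```python
# Hedged sketch of a BentoML (1.2+) service; names and logic are placeholders.
import bentoml

@bentoml.service(resources={"cpu": "1"})
class Echo:
    @bentoml.api
    def predict(self, text: str) -> str:
        # A real service would invoke a model here; this stub uppercases input.
        return text.upper()
```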
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
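This is KServe's description; besides its Kubernetes CRDs, the project's Python SDK lets you wrap arbitrary code as a predictor. A hedged sketch (the predict() signature has shifted across SDK versions, so treat this as illustrative):

```python
# Hedged sketch of a custom KServe predictor via the `kserve` Python SDK.
from typing import Dict

import kserve

class EchoModel(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = True                   # report the model as ready to serve

    def predict(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
        # A real predictor runs inference here; this stub echoes the inputs.
        return {"predictions": payload.get("instances", [])}

if __name__ == "__main__":
    kserve.ModelServer().start([EchoModel("echo-model")])
```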
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
Olares: An Open-Source Personal Cloud to Reclaim Your Data
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premises cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A framework for efficient model inference with omni-modality models
🏕️ Reproducible development environment for humans and agents
AICI: Prompts as (Wasm) Programs
Community maintained hardware plugin for vLLM on Ascend
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Hopsworks - Data-Intensive AI platform with a Feature Store
The simplest way to serve AI/ML models in production
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
A throughput-oriented high-performance serving framework for LLMs
A highly optimized LLM inference acceleration engine for Llama and its variants.
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your machine's compute
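The dynamic-batching idea itself is easy to sketch independently of this repo's API: requests arriving within a short window are grouped so the model runs once per batch. An illustrative asyncio sketch with made-up limits:

```python
# Illustrative dynamic batching: group requests that arrive within MAX_WAIT_S,
# up to MAX_BATCH, and run one batched "model" call per group.
import asyncio

MAX_BATCH = 8
MAX_WAIT_S = 0.01
queue: asyncio.Queue = asyncio.Queue()

def model_forward(xs):
    return [x * 2 for x in xs]             # stand-in for one batched model call

async def batcher():
    while True:
        batch = [await queue.get()]        # block until the first request
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and (left := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), left))
            except asyncio.TimeoutError:
                break
        inputs, futures = zip(*batch)
        for fut, out in zip(futures, model_forward(list(inputs))):
            fut.set_result(out)            # wake each waiting caller

async def infer(x):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut

async def main():
    worker = asyncio.create_task(batcher())
    print(await asyncio.gather(*(infer(i) for i in range(20))))
    worker.cancel()

asyncio.run(main())
```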
A scalable inference server for models optimized with OpenVINO™
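OpenVINO Model Server also speaks the KServe v2 REST protocol, so a request can be a plain HTTP POST. A hedged sketch in which the model name, port, and tensor shape are assumptions:

```python
# Hedged KServe-v2 REST call; endpoint, model name, and shape are assumptions.
import requests

payload = {
    "inputs": [{
        "name": "input_0",                 # must match the model's input tensor
        "shape": [1, 4],
        "datatype": "FP32",
        "data": [5.1, 3.5, 1.4, 0.2],
    }]
}

resp = requests.post(
    "http://localhost:8000/v2/models/my_model/infer",
    json=payload,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["outputs"])              # v2 responses carry an "outputs" list
```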
Model Deployment at Scale on Kubernetes 🦄️
Serverless LLM Serving for Everyone.
Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference code. One command to load, one command to serve. 336x faster than Python inference.
A production-ready FastAPI skeleton app for serving machine learning models.
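The shape of such a skeleton is worth a sketch: a typed /predict endpoint over a stubbed model, with illustrative field names:

```python
# Minimal FastAPI serving skeleton; payload fields and the stub are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="model-server")

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # A real app loads a trained model at startup and calls it here;
    # this stub returns the mean of the input features.
    mean = sum(req.features) / max(len(req.features), 1)
    return PredictResponse(prediction=mean)
```

Run locally with `uvicorn main:app --reload` and POST JSON to /predict.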
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).