"topic:inference-engine" — Search

356 results for “topic:inference-engine”

FEDML: the unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is a generative AI platform at scale.

Python · 4.0k stars · 762 forks · Updated 10 hours ago

Topics: ai-agent, deep-learning, distributed-training, edge-ai, federated-learning, inference-engine, machine-learning, mlops, model-deployment, model-serving, on-device-training

zjhellofss/KuiperInfer

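A great project for campus recruitment (autumn and spring hiring) and internship applications! Implement a high-performance deep learning inference library step by step, from scratch, with support for inference on models such as the Llama 2 LLM, UNet, YOLOv5, and ResNet.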

C++ · 3.3k stars · 357 forks · Updated just now

Topics: caffe, convolution, deep-learning, deep-neural-networks, diy, graph-algorithms, inference, inference-engine, maxpooling, ncnn, pnnx, pytorch, relu, resnet, sigmoid, yolo, yolov5

hyperjumptech/grule-rule-engine

A rule engine implementation in Golang.

Go · 2.5k stars · 372 forks · Updated 4 hours ago

Topics: golang, hacktoberfest, hacktoberfest2021, inference-engine, rule, rule-based, rule-based-engine, rule-engine

siliconflow/onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Jupyter Notebook · 2.0k stars · 127 forks · Updated 9 hours ago

Topics: aigc-serving, comfyui, comfyui-workflow, cuda, diffusers, diffusion-models, inference-engine, lcm, lcm-lora, lora, performance-optimization, pytorch, sd-webui, sdxl, sdxl-turbo, stable-diffusion, stable-video-diffusion

aphrodite-engine/aphrodite-engine

Large-scale LLM inference engine

C++ · 1.7k stars · 187 forks · Updated 13 hours ago

Topics: api-rest, cuda, inference-engine, inferentia, intel, lora, machine-learning, rocm, speculative-decoding, tpu

Tencent/FeatherCNN

FeatherCNN is a high performance inference engine for convolutional neural networks.

C++ · 1.2k stars · 279 forks · Updated 2 days ago

Topics: android, arm-neon, caffe, convolutional-neural-networks, inference-engine, ios

PaddlePaddle/Paddle.js

Paddle.js is a web project of Baidu PaddlePaddle; it is an open-source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or convert a model from paddle-hub using the model conversion tools provided by Paddle.js. It runs in any browser that supports WebGL, WebGPU, or WebAssembly, and also in Baidu Smart Programs and WeChat (WX) mini programs.

JavaScript · 1.1k stars · 152 forks · Updated 20 hours ago

Topics: deep-learning, inference-engine, model, ocr, paddlepaddle, webassembly, webgl, webgpu

jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ · 1.1k stars · 145 forks · Updated 1 hour ago

Topics: deepseek, glm, inference, inference-engine, large-language-models, llm-inference, qwen

qualcomm/ai-hub-models

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

Python · 937 stars · 162 forks · Updated 6 hours ago

Topics: deeplearning, demos, inference, inference-api, inference-engine, machine-learning, machinelearning, onnx, pytorch, qnn, tensorflow-lite

zhihu/ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 905 stars · 102 forks · Updated 2 days ago

Topics: cuda, deepseek-r1, gpt, inference-engine, llama, llm, llm-inference, llm-serving, model-serving, pytorch

Adlik/Adlik

Adlik: Toolkit for Accelerating Deep Learning Inference

C++ · 807 stars · 82 forks · Updated 3 days ago

Topics: compiler, deep-learning, docker-images, inference, inference-engine, model-optimizer, openvino, tensorflow-serving, tensorrt

ovg-project/kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 801 stars · 89 forks · Updated 1 hour ago

Topics: elastic-kvcache, gpu-mutiplexing, gpu-sharing, inference-engine, kvcache, kvcache-optimization, kvcached, llm, llm-framework, llm-inference, llm-serving, ollama, online-offline-coserve, serverless, sglang, vllm

insight-platform/Savant

Python Computer Vision & Video Analytics Framework With Batteries Included

Python · 787 stars · 72 forks · Updated 5 hours ago

Topics: computer-vision, cuda, deep-learning, deepstream, edge-computing, inference-engine, instance-segmentation, machine-learning, nvidia, nvidia-deepstream-sdk, object-detection, opencv, peoplenet, tensorrt, video, yolo, yolov5-face, yolov8, yolov8-face

msnh2012/Msnhnet

🔥 (yolov3 yolov4 yolov5 unet ...) A mini PyTorch inference framework inspired by Darknet.

C++ · 738 stars · 145 forks · Updated 2 weeks ago

Topics: darknet, inference-engine, jetson-nx, mobilenetv2, mobilenetyolo, pytorch, yolov3, yolov4, yolov5

nobodywho-ooo/nobodywho

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Rust · 719 stars · 39 forks · Updated 2 hours ago

Topics: flutter, flutter-ai, godot, godot-engine, godot-plugin, godot4, inference-engine, llm, python, python-llm, slm

pylint-dev/astroid

A common base representation of Python source code for pylint and other projects.

Python · 573 stars · 316 forks · Updated 1 day ago

Topics: ast, closember, hacktoberfest, inference-engine, parser, static-analysis, static-code-analysis

Tencent/Forward

A library for high performance deep learning inference on NVIDIA GPUs.

C++ · 555 stars · 63 forks · Updated 1 month ago

Topics: cuda, deep-learning, forward, gpu, inference, inference-engine, keras, neural-network, onnx, pytorch, tensorflow, tensorrt

andrewkchan/yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ · 555 stars · 56 forks · Updated 2 days ago

Topics: cpp, cuda, inference-engine, llama, llamacpp, llm, llm-inference, machine-learning, mistral

PaddlePaddle/Anakin (Archived)

A high-performance, cross-platform inference engine: Anakin runs on x86 CPUs, ARM, NVIDIA GPUs, AMD GPUs, Bitmain, and Cambricon devices.

C++ · 537 stars · 135 forks · Updated 3 weeks ago

Topics: ai, amd, arm, bitmain, cambricon, cross-platform, high-performance, inference-engine, intel, nvidia

HoloClean/holoclean

A Machine Learning System for Data Enrichment.

Python · 533 stars · 131 forks · Updated 1 week ago

Topics: data-enrichment, data-science, inference-engine, machine-learning, pytorch

zjhellofss/KuiperLLama

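A great project for campus recruitment (autumn and spring hiring) and internship applications: implement, hands-on and from scratch, an LLM inference framework that supports Llama 2/3 and Qwen2.5.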

C++ · 509 stars · 129 forks · Updated 1 hour ago

Topics: cpp, cuda, inference-engine, llama2, llama3, llm, llm-inference, qwen, qwen2

buguroo/pyknow

PyKnow: Expert Systems for Python

Python · 497 stars · 152 forks · Updated 4 days ago

Topics: expert-system, inference-engine, python3

ulfurinn/wongi-engine

A rule engine written in Ruby.

Ruby · 490 stars · 41 forks · Updated 2 months ago

Topics: inference-engine, rete, ruby, rule-engine

chengzeyi/ParaAttention

https://wavespeed.ai/ Context-parallel attention that accelerates DiT model inference with dynamic caching.

Python · 425 stars · 45 forks · Updated 4 days ago

Topics: attention, diffusers, flux, hunyuan-video, inference, inference-engine, parallel-computing, transformers

ReactiveBayes/RxInfer.jl

Julia package for automated Bayesian inference on a factor graph with reactive message passing

Jupyter Notebook · 386 stars · 34 forks · Updated 5 days ago

Topics: bayesian-inference, inference-engine, julia-language, machine-learning, message-passing, probabilistic-programming, variational-inference

qualcomm/ai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

Java · 384 stars · 93 forks · Updated 14 hours ago

Topics: deeplearning, demos, inference, inference-api, inference-engine, machine-learning, machinelearning, onnx, pytorch, qnn, tensorflow-lite

SearchSavior/OpenArc

Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.

Python · 335 stars · 18 forks · Updated 16 hours ago

Topics: agentic-ai, fastapi, inference-engine, openvino-genai, openvino-toolkit, optimum-intel, transformers

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python · 316 stars · 35 forks · Updated 3 days ago

Topics: cuda, gpt, inference, inference-engine, llama, llm, llm-inference, llm-serving, llmops, mlops, model-serving, pytorch, transformer, transformers

EfficientMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python · 286 stars · 25 forks · Updated 2 days ago

Topics: huggingface, inference-engine, large-language-models, llm-inference, mixture-of-experts, pytorch

gottingen/kumo-search

Docs for search systems and AI infrastructure.

222 stars · 22 forks · Updated 1 month ago

Topics: ai, deep-learning, inference-engine, neural-network, performance, python, search-engine, tensorflow, tensorflow2

Page 1 of 12