34 results for “topic:blackwell”
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
Prebuilt DeepSpeed wheels for Windows with NVIDIA GPU support. Supports GTX 10 through RTX 50 series. Compiled with PyTorch 2.7/2.8 and CUDA 12.8
Pre-built wheels for llama-cpp-python across platforms and CUDA versions
Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode
Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs"
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)
GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible
Multi-model LLM serving for NVIDIA DGX Spark with vLLM, web UI, and tool calling
Optimized vLLM deployment for NVIDIA Blackwell (RTX 5090) on Linux Kernel 6.14. Resolves SM_120 kernel incompatibilities, P2P deadlocks, and memory fragmentation for high-performance LLM inference.
An empirical study of benchmarking LLM inference with KV cache offloading using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.
📦 A fully automated method for installing Nvidia drivers on Arch Linux
Minimal GPU runtime for Python - high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies
PyTorch operation for distributed GEMM on NVIDIA Blackwell GPUs
Blackwell-ready pure Zig (0.15.2) bindings to the NVIDIA CUDA Driver API – dynamic loading, clean wrappers, no toolkit required at runtime.
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.
Run Qwen3.5-35B-A3B on NVIDIA DGX Spark (GB10) with SGLang - Ready-to-use Docker image + complete guide
🧭 Enhance navigation with VLN-YuanNav, a visual-language model using advanced memory and decision-making for effective exploration.
🚀 Build and explore OpenAI's GPT-OSS model from scratch in Python, unlocking the mechanics of large language models.
A fast API booty-licious back-end for running GGUF models with Llama.cpp
Pre-built onnxruntime-gpu 1.24.1 with Blackwell sm_120 CUDA kernels (RTX 5090/5080/5070)
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.
LLM inference setup for NVIDIA Blackwell GPUs with FP4 quantization
Production LLM deployment specs for NVIDIA Blackwell GPUs (RTX Pro 6000, DGX Spark). Includes vLLM configurations, benchmarks, load balancer, and throughput calculators for NVFP4/FP8/MoE models.