266 results for “topic:rocm”
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
Open Machine Learning Compiler Framework
NumPy & SciPy for GPU
Supercharge Your LLM with the Fastest KV Cache Layer
Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk
A deep learning package for many-body potential energy representation and molecular dynamics
Large-scale LLM inference engine
stdgpu: Efficient STL-like Data Structures on the GPU
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (including OpenAI-compatible ones), predefined voices, voice cloning, and large, audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA) and AMD (ROCm) GPUs, or on CPU.
Open-source continuous inference benchmarking of Qwen3.5, DeepSeek, and GPTOSS: GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100, and soon™ TPUv6e/v7/Trainium2/3
Dockerfiles for the various software layers defined in the ROCm software platform
Abstraction Library for Parallel Kernel Acceleration 🦙
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance-portable GPU support
Kubernetes (k8s) device plugin to enable registration of AMD GPUs with a container cluster
Go with your own intelligence: Go applications that directly integrate llama.cpp for local inference using hardware acceleration.
Agenium Scale vectorization library for CPUs and GPUs
Exascale multiphase flow solver; 2025 Gordon Bell Prize finalist; 200T grid points on 43K+ GPUs
AMD GPU (ROCm) programming in Julia
AOMP is an open-source, Clang/LLVM-based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.
HPC solver for nonlinear optimization problems
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
Zero-knowledge template library
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, as well as GPUs via Intel oneAPI, AMD ROCm, Apple Metal, and Nvidia CUDA.
Simple yet fancy GPU architecture fetching tool
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming