Repositories
63 repositories
No description provided.
A lightweight pure C++ Text-to-Speech (TTS) pipeline with OpenVINO, supporting multiple languages.
AI Tensor Engine for ROCm
Awesome resources for GPUs
A Python interface for the ROCm HIP language
AMDGPU example code in HIP/ASM
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
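The idea behind Tritonbench, measuring a custom operator against example inputs, can be sketched in plain Python. The `bench` helper and the naive `dot` "operator" below are illustrative assumptions, not part of Tritonbench itself, which targets PyTorch custom operators on GPUs:

```python
import time

def bench(fn, *args, warmup=3, iters=20):
    """Return the median wall-clock time (seconds) of fn(*args)."""
    for _ in range(warmup):              # warm up caches/JITs before timing
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]        # median is robust to outliers

# Stand-in "operator" with example inputs of growing size (hypothetical).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for n in (1_000, 10_000):
    v = list(range(n))
    print(f"n={n}: {bench(dot, v, v):.2e} s")
```

Warmup iterations and a median over many runs are the usual defenses against one-off timing noise; real operator benchmarks on GPUs additionally need device synchronization before reading the clock.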
LLVM Code Generation, published by Packt
Playground for Domain-Specific Languages (DSL)
Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
No description provided.
Fast and Furious AMD Kernels
LeetGPU Challenges
Triton kernel profiling and debugging tool for vLLM and AITER, for internal use
ASTER 💫: Assembly Tooling and Representations
Hands-on examples for optimizing and deploying AI models with OpenVINO.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
No description provided.
FlashInfer: Kernel Library for LLM Serving
Development repository for the Triton language and compiler
SGLang is a fast serving framework for large language models and vision language models.
No description provided.
Fast and memory-efficient exact attention
A high-throughput and memory-efficient inference and serving engine for LLMs
No description provided.
No description provided.
TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for building Python and C++ runtimes that orchestrate inference execution performantly.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Sample code for my CUDA programming book
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
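The roofline analysis mentioned above boils down to a few lines of arithmetic: a kernel's attainable throughput is bounded by min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch for a matmul, using illustrative hardware numbers (100 TFLOP/s peak, 2 TB/s bandwidth, assumed, not any specific GPU):

```python
def gemm_roofline(m, n, k, bytes_per_elem, peak_flops, mem_bw):
    """Roofline bound for an (m x k) @ (k x n) matmul.

    FLOPs: 2*m*n*k (one multiply + one add per term).
    Bytes: read A and B, write C, each once (ideal caching assumed).
    Returns (arithmetic intensity in FLOP/byte, attainable FLOP/s).
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    intensity = flops / bytes_moved                   # FLOPs per byte
    attainable = min(peak_flops, mem_bw * intensity)  # roofline bound
    return intensity, attainable

# fp16 GEMM (2 bytes/element) on assumed hardware: 100 TFLOP/s, 2 TB/s.
ai, bound = gemm_roofline(4096, 4096, 4096, 2, 100e12, 2e12)
print(f"intensity = {ai:.0f} FLOP/B, bound = {bound / 1e12:.0f} TFLOP/s")
```

A large square GEMM like this lands well past the roofline's knee, so it is compute-bound; the decode phase of LLM inference, dominated by matrix-vector products with intensity near 1 FLOP/byte, is the classic memory-bound counterexample.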