LLM_Kernels
A simple implementation and verification toolkit for LLM kernels.
Quantization
- FP8 blockwise GEMM
- INT8 GEMM
- W4A8 GEMM (Triton)
- INT4 weight pack/unpack
- W4A16 GEMM (CUDA, simplified Marlin)
- W4A8 GEMM (CUDA, simplified QServe)
- FP4/FP6/FP8 fake-quantization functions
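INT4 weight packing stores two 4-bit two's-complement values in each byte. As a reference for what pack/unpack must round-trip, here is a minimal NumPy sketch; the function names are illustrative, not this repo's API:

```python
import numpy as np

def pack_int4(w):
    """Pack signed int4 values in [-8, 7] pairwise into uint8 bytes."""
    assert w.size % 2 == 0
    nib = (w.astype(np.int16) & 0xF).astype(np.uint8).reshape(-1, 2)
    # low nibble = even-index value, high nibble = odd-index value
    return (nib[:, 0] | (nib[:, 1] << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed int4 values."""
    lo = (packed & 0xF).astype(np.int8)
    hi = ((packed >> 4) & 0xF).astype(np.int8)
    # sign-extend the 4-bit two's-complement nibbles
    lo = np.where(lo >= 8, lo - 16, lo)
    hi = np.where(hi >= 8, hi - 16, hi)
    return np.stack([lo, hi], axis=1).reshape(-1)
```

Real W4A16/W4A8 kernels additionally permute the packed layout so the unpacking maps efficiently onto tensor-core fragment ordering; the sketch above shows only the plain interleaved layout.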
MoE
- Multiple communication strategies (All-to-All, AllGather)
- Group GEMM acceleration
- Quantized Group GEMM
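Group GEMM batches the per-expert matrix multiplies of an MoE layer, where each expert applies its own weight matrix to the tokens routed to it. A minimal NumPy reference of the computation a Group GEMM kernel fuses (names and shapes are illustrative, not this repo's API):

```python
import numpy as np

def grouped_gemm_reference(tokens, expert_ids, weights):
    """tokens: (T, d_in); expert_ids: (T,) top-1 routing decision per token;
    weights: (E, d_in, d_out) one weight matrix per expert."""
    T, d_in = tokens.shape
    E, _, d_out = weights.shape
    out = np.zeros((T, d_out), dtype=tokens.dtype)
    for e in range(E):
        idx = np.nonzero(expert_ids == e)[0]
        if idx.size:
            # one GEMM per expert over its group of routed tokens;
            # a fused Group GEMM kernel launches these together
            out[idx] = tokens[idx] @ weights[e]
    return out
```

A quantized Group GEMM applies the same grouping with INT8/FP8 operands and per-group scales; the All-to-All vs. AllGather strategies differ only in how the routed tokens are exchanged across ranks before this step.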
Attention
- SageAttention
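SageAttention's core idea is to quantize Q and K to INT8 after smoothing K by subtracting its per-channel mean over tokens; that shift adds the same constant to every logit in a query row, so softmax cancels it. A hedged NumPy sketch of the idea (single head, no causal mask; a simplified illustration, not this repo's kernel):

```python
import numpy as np

def quant_int8_rowwise(x):
    """Symmetric per-row INT8 quantization; returns (int8 values, scales)."""
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def sage_like_attention(q, k, v):
    """q: (Tq, d), k, v: (Tk, d). INT8 QK^T, float softmax and PV."""
    # smooth K: subtracting the token-mean shifts each row of logits by a
    # constant (q . mean_k), which softmax is invariant to
    k = k - k.mean(axis=0, keepdims=True)
    qq, sq = quant_int8_rowwise(q)
    kq, sk = quant_int8_rowwise(k)
    # integer QK^T, then dequantize with the outer product of row scales
    logits = (qq.astype(np.int32) @ kq.astype(np.int32).T).astype(np.float32)
    logits *= (sq * sk.T).astype(np.float32)
    logits /= np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

The published kernel also quantizes the PV product (to FP8 on supporting hardware) and tiles everything FlashAttention-style; the sketch keeps softmax and PV in full precision to isolate the INT8 QK^T step.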