
Jzz24/LLM_Kernels

MoE, Group GEMM, MHA, Quantization

LLM_Kernels

A simple implementation and verification toolkit for LLM kernels.

Quantization

  • fp8 blockwise GEMM
  • int8 GEMM
  • w4a8 GEMM (Triton)
  • int4 weight pack/unpack
  • w4a16 GEMM (CUDA, simple Marlin)
  • w4a8 GEMM (CUDA, simple QServe)
  • fp4/6/8 fake-quantize functions
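The int4 weight pack/unpack step can be sketched in plain NumPy: two signed 4-bit values share one byte, so a packed weight tensor is half the size of its int8 form. This is an illustrative reference, not the repo's kernel; the nibble order (even index in the low nibble) is an assumption.

```python
import numpy as np

def pack_int4(w):
    """Pack signed int4 values (range [-8, 7]) two-per-byte into uint8.

    Even-indexed values occupy the low nibble, odd-indexed the high nibble
    (nibble order is a convention chosen for this sketch).
    """
    assert w.size % 2 == 0
    u = (w.astype(np.int8) & 0xF).astype(np.uint8)  # two's-complement nibbles
    lo, hi = u[0::2], u[1::2]
    return lo | (hi << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: recover signed int4 values from packed bytes."""
    lo = (packed & 0xF).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    # sign-extend the 4-bit values back to int8
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out
```

A w4a16 or w4a8 kernel would unpack (and dequantize) these nibbles on the fly inside the GEMM; here the round trip just verifies the layout.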

MoE

  • Multiple communication strategies (All-to-All, AllGather)
  • Group GEMM acceleration
  • Quantized Group GEMM
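What a grouped GEMM computes for MoE can be stated as a small NumPy reference: each token is routed to one expert, and all tokens assigned to the same expert are multiplied by that expert's weight matrix in a single matmul. This is a correctness baseline, not the accelerated kernel; the function and argument names are illustrative.

```python
import numpy as np

def grouped_gemm(tokens, expert_weights, expert_ids):
    """Reference grouped GEMM for MoE routing.

    tokens:         (num_tokens, d_in) activations
    expert_weights: (num_experts, d_in, d_out) one weight matrix per expert
    expert_ids:     (num_tokens,) expert assignment per token

    Runs one matmul per expert over its group of tokens, which is the
    computation a fused group-GEMM kernel performs in a single launch.
    """
    num_tokens, _ = tokens.shape
    d_out = expert_weights.shape[2]
    out = np.zeros((num_tokens, d_out), dtype=tokens.dtype)
    for e in range(expert_weights.shape[0]):
        idx = np.nonzero(expert_ids == e)[0]  # tokens routed to expert e
        if idx.size:
            out[idx] = tokens[idx] @ expert_weights[e]
    return out
```

A quantized variant would replace the inner `@` with an int8/fp8 GEMM plus per-group dequantization scales.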

Attention

  • SageAttention (quantized attention)
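SageAttention's core idea can be sketched in NumPy: smooth K by subtracting its mean over the key dimension (softmax is invariant to a per-row constant shift in the scores, so this is lossless while shrinking K's quantization error), then compute QK^T in int8 and dequantize before the softmax. This minimal per-tensor sketch is an assumption-laden simplification; the actual method uses per-block quantization and fused kernels.

```python
import numpy as np

def quantize_int8(x):
    """Per-tensor symmetric int8 quantization: returns (q, scale)."""
    scale = np.abs(x).max() / 127.0 + 1e-8
    return np.round(x / scale).astype(np.int8), scale

def sage_attention(q, k, v):
    """Minimal single-head sketch of int8-quantized attention.

    Subtracting K's mean shifts every score row by a constant (q_i . mean),
    which softmax cancels, so the smoothing changes nothing mathematically
    but centers K before quantization.
    """
    k_s = k - k.mean(axis=0, keepdims=True)
    qi, sq = quantize_int8(q)
    ki, sk = quantize_int8(k_s)
    # int32 accumulation of the int8 QK^T, then dequantize with both scales
    scores = (qi.astype(np.int32) @ ki.astype(np.int32).T) * (sq * sk)
    scores = scores / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

The output should track full-precision attention closely; the verification toolkit's role is exactly to bound that quantization error.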

Languages

Python 81.9% · Cuda 16.5% · C++ 1.5%

Created March 19, 2025
Updated September 4, 2025