LLM_Kernels
A simple implementation and verification toolkit for LLM kernels.
Quantization
- FP8 blockwise GEMM
- INT8 GEMM
- W4A8 GEMM (Triton)
- INT4 weight pack/unpack
- W4A16 GEMM (CUDA, simplified Marlin)
- W4A8 GEMM (CUDA, simplified QServe)
- FP4/FP6/FP8 fake-quantization functions
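INT4 weight packing stores two 4-bit two's-complement values in each byte. As a reference for what pack/unpack must round-trip, here is a minimal NumPy sketch; the function names are illustrative, not this repo's API:

```python
import numpy as np

def pack_int4(w):
    """Pack signed int4 values in [-8, 7] pairwise into uint8 bytes."""
    assert w.size % 2 == 0
    nib = (w.astype(np.int16) & 0xF).astype(np.uint8).reshape(-1, 2)
    # low nibble = even-index value, high nibble = odd-index value
    return (nib[:, 0] | (nib[:, 1] << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed int4 values."""
    lo = (packed & 0xF).astype(np.int8)
    hi = ((packed >> 4) & 0xF).astype(np.int8)
    # sign-extend the 4-bit two's-complement nibbles
    lo = np.where(lo >= 8, lo - 16, lo)
    hi = np.where(hi >= 8, hi - 16, hi)
    return np.stack([lo, hi], axis=1).reshape(-1)
```

Real W4A16/W4A8 kernels additionally permute the packed layout so the unpacking maps efficiently onto tensor-core fragment ordering; the sketch above shows only the plain interleaved layout.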
MoE
- Multiple communication strategies (All-to-All, AllGather)
- Group GEMM acceleration
- Quantized Group GEMM
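Group GEMM batches the per-expert matrix multiplies of an MoE layer, where each expert applies its own weight matrix to the tokens routed to it. A minimal NumPy reference of the computation a Group GEMM kernel fuses (names and shapes are illustrative, not this repo's API):

```python
import numpy as np

def grouped_gemm_reference(tokens, expert_ids, weights):
    """tokens: (T, d_in); expert_ids: (T,) top-1 routing decision per token;
    weights: (E, d_in, d_out) one weight matrix per expert."""
    T, d_in = tokens.shape
    E, _, d_out = weights.shape
    out = np.zeros((T, d_out), dtype=tokens.dtype)
    for e in range(E):
        idx = np.nonzero(expert_ids == e)[0]
        if idx.size:
            # one GEMM per expert over its group of routed tokens;
            # a fused Group GEMM kernel launches these together
            out[idx] = tokens[idx] @ weights[e]
    return out
```

A quantized Group GEMM applies the same grouping with INT8/FP8 operands and per-group scales; the All-to-All vs. AllGather strategies differ only in how the routed tokens are exchanged across ranks before this step.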
Attention
- SageAttention
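SageAttention's core idea is to quantize Q and K to INT8 after smoothing K by subtracting its per-channel mean over tokens; that shift adds the same constant to every logit in a query row, so softmax cancels it. A hedged NumPy sketch of the idea (single head, no causal mask; a simplified illustration, not this repo's kernel):

```python
import numpy as np

def quant_int8_rowwise(x):
    """Symmetric per-row INT8 quantization; returns (int8 values, scales)."""
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def sage_like_attention(q, k, v):
    """q: (Tq, d), k, v: (Tk, d). INT8 QK^T, float softmax and PV."""
    # smooth K: subtracting the token-mean shifts each row of logits by a
    # constant (q . mean_k), which softmax is invariant to
    k = k - k.mean(axis=0, keepdims=True)
    qq, sq = quant_int8_rowwise(q)
    kq, sk = quant_int8_rowwise(k)
    # integer QK^T, then dequantize with the outer product of row scales
    logits = (qq.astype(np.int32) @ kq.astype(np.int32).T).astype(np.float32)
    logits *= (sq * sk.T).astype(np.float32)
    logits /= np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

The published kernel also quantizes the PV product (to FP8 on supporting hardware) and tiles everything FlashAttention-style; the sketch keeps softmax and PV in full precision to isolate the INT8 QK^T step.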