Top Repositories
Repositories (11)
Jzz24/triton_tutorials
No description provided.
Python · 5 · 0 · Updated 4 months ago
Jzz24/gemlite (fork)
Fast low-bit matmul kernels in Triton
0 · 0 · Updated 4 months ago
Jzz24/LLM_Kernels
MoE, Group GEMM, MHA, Quantization
Python · 5 · 0 · Updated 6 months ago
Jzz24/fp8_quantization
No description provided.
Python · 1 · 0 · Updated 8 months ago
Jzz24/CUDA-Learn-Notes (fork)
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Cuda · 0 · 0 · Updated 9 months ago
Jzz24/lectures (fork)
Material for gpu-mode lectures
Jupyter Notebook · 1 · 0 · Updated 10 months ago
Jzz24/torch_distributed_demos
No description provided.
Python · 0 · 0 · Updated 11 months ago
Jzz24/pytorch_quantization
A PyTorch implementation of DoReFa quantization
Python · 11311 · Updated 1 year ago
bn-fold, dorefa, imagenet, nvidia-dali, quantization, resnet
Jzz24/ppq (fork)
No description provided.
Python · 1 · 0 · Updated 1 year ago
Jzz24/ppl.pmx (fork)
No description provided.
Python · 1 · 0 · Updated 1 year ago
Jzz24/microxcaling (fork)
PyTorch emulation library for Microscaling (MX)-compatible data formats
Python · 1 · 0 · Updated 1 year ago