Repos
17
Stars
10
Forks
0
Top Language
Python
Loading contributions...
Top Repositories
A tool to classify and statistic GPU kernel information.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A machine learning compiler for GPUs, CPUs, and ML accelerators
A simple, performant and scalable Jax LLM!
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
Repositories
17Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A machine learning compiler for GPUs, CPUs, and ML accelerators
A tool to classify and statistic GPU kernel information.
A simple, performant and scalable Jax LLM!
No description provided.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
JAX-Toolbox
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
Documentations for PaddlePaddle
paddle_allreduce_issues_reproduce
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.
Pre-trained and Reproduced Deep Learning Models (『飞桨』官方模型库,包含多种学术前沿和工业场景验证的深度学习模型)
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
No description provided.
No description provided.
No description provided.