Perkz Zheng
PerkzZheng
Currently working as an AI Technology Developer Engineer @ NVIDIA
Repos: 10 · Stars: 0 · Forks: 1 · Top Language: Python
Repositories
FlashInfer: Kernel Library for LLM Serving
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
CUDA Templates for Linear Algebra Subroutines
No description provided.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Transformer related optimization, including BERT, GPT
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Run MNIST inference in Apache Flink
This repository contains a compilation of selected topics.