Yao Matrix
yao-matrix
A realistic idealist.
Repos: 66
Stars: 60
Forks: 58
Top Language: Python
Top Repositories
End-to-end speech recognition using TensorFlow
TF rnn ops w/ MKL-DNN kernel
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
A high-throughput and memory-efficient inference and serving engine for LLMs
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Repositories (66)
A high-throughput and memory-efficient inference and serving engine for LLMs
End-to-end speech recognition using TensorFlow
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
No description provided.
Offline optimization of your disaggregated Dynamo graph
A Datacenter Scale Distributed Inference Serving Framework
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Train transformer language models with reinforcement learning.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Achieve state of the art inference performance with modern accelerators on Kubernetes
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A pytorch quantization backend for optimum
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
No description provided.
LOMO: LOw-Memory Optimization
Accessible large language models via k-bit quantization for PyTorch.
Public repo for HF blog posts
TF rnn ops w/ MKL-DNN kernel
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Notebooks using the Hugging Face libraries 🤗
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Tools for merging pretrained large language models.
Efficient Triton Kernels for LLM Training
Large Language Model Text Generation Inference
Inference server benchmarking tool
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
LLM inference in C/C++
Everything about the Hugging Face (抱抱脸) localization volunteer collaboration team.
🤗 Optimum Intel: Accelerate inference with Intel optimization tools