Top Repositories
Open Neural Network Exchange
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
Benchmark SGLang on SLURM
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Fast and memory-efficient exact attention
Transformer-related optimization, including BERT and GPT
Repositories
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
Benchmark SGLang on SLURM
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Fast and memory-efficient exact attention
Transformer-related optimization, including BERT and GPT
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Fully open reproduction of DeepSeek-R1
CUDA Templates for Linear Algebra Subroutines
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
8-bit CUDA functions for PyTorch
Development repository for the Triton language and compiler
Inference code for LLaMA models
An innovation library for efficient LLM inference via low-bit quantization and sparsity
A high-throughput and memory-efficient inference and serving engine for LLMs
Robust Speech Recognition via Large-Scale Weak Supervision
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
ONNX Runtime: cross-platform, high performance scoring engine for ML models
Open Neural Network Exchange
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Tutorials for creating and using ONNX models
Samples for Windows ML.