Repos: 93
Stars: 2
Forks: 0
Top Language: Python
Top Repositories
An RLHF framework that enhances OpenRLHF.
Puzzles for learning Triton
Infrastructure for building a supervised, self-improving agent organization. Run Claude Code from Feishu & Telegram with shared memory, agent factory, task scheduling, and an agent communication bus.
JAX - A curated list of resources https://github.com/google/jax
A curated list of resources related to linear attention mechanisms.
Repositories
Puzzles for learning Triton
Infrastructure for building a supervised, self-improving agent organization. Run Claude Code from Feishu & Telegram with shared memory, agent factory, task scheduling, and an agent communication bus.
JAX - A curated list of resources https://github.com/google/jax
A curated list of resources related to linear attention mechanisms.
Efficient Long-context Language Model Training by Core Attention Disaggregation
🚀 Efficient implementations of state-of-the-art linear attention models
A PyTorch native platform for training generative AI models
An AI-native PPT generation app based on nano banana pro 🍌, moving toward a true "Vibe PPT": upload any template image; upload any assets with intelligent parsing; auto-generate a PPT from a one-sentence prompt, an outline, or per-page descriptions; request edits to a specific region verbally; export with one click.
An interface library for RL post training with environments.
微舆 (WeiYu): a multi-agent public-opinion analysis assistant anyone can use. It breaks information cocoons, reconstructs the full picture of public sentiment, forecasts how it will evolve, and supports decision-making. Built from scratch, with no framework dependencies.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Minimalistic 4D-parallelism distributed training framework for education purpose
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
An RLHF framework that enhances OpenRLHF.
MLIR-based partitioning system
A PPO implementation for base-station traffic decision-making.
AISystem covers AI systems, spanning full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.
Flax is a neural network library for JAX that is designed for flexibility.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
Distributed Triton for Parallel Systems
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Tensor library for machine learning
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
Official Repo for Open-Reasoner-Zero
My learning notes and code for ML systems (MLSys).
A PyTorch Native LLM Training Framework
MLIR For Beginners tutorial