Costa Huang
vwxyzjn
Exploiting physical rewards @periodiclabs. Prev: RL @allenai @huggingface.
Top Repositories
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Create Encrypted Backups of Your Bitwarden Vault with Attachments
RLHF implementation details of OAI's 2019 codebase
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
Repositories (209)
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Create Encrypted Backups of Your Bitwarden Vault with Attachments
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
verl: Volcano Engine Reinforcement Learning for LLMs
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
The source code for the gym-microrts paper.
No description provided.
RLHF implementation details of OAI's 2019 codebase
No description provided.
Training library for Megatron-based models
slime is an LLM post-training framework for RL Scaling.
Pretrained universal neural network potential for charge-informed atomistic modeling https://chgnet.lbl.gov
NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions.
wheels for TransformerEngine
No description provided.
No description provided.
A PyTorch native platform for training generative AI models
Ongoing research training transformer models at scale
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
No description provided.
A high-throughput and memory-efficient inference and serving engine for LLMs
Parallel S3 and local filesystem execution tool.
Scalable toolkit for efficient model reinforcement
LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using torch.compile and cudagraphs.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Official Repo for Open-Reasoner-Zero
No description provided.
A2C is a special case of PPO!
No description provided.