GitHunt

Costa Huang

vwxyzjn

Exploiting physical rewards @periodiclabs. Prev: RL @allenai @huggingface.

@huggingface
Philadelphia, PA

Languages

Python 89%, Go 5%, Shell 5%


Top Repositories

Repositories: 209
vwxyzjn/cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python · 9.3k · 1.0k · Updated 1 hour ago
a2c, actor-critic, advantage-actor-critic, ale, atari, deep-learning, deep-reinforcement-learning, gym, machine-learning, phasic-policy-gradient, ppo, proximal-policy-optimization, python, pytorch, reinforcement-learning, wandb
vwxyzjn/portwarden

Create Encrypted Backups of Your Bitwarden Vault with Attachments

Go63236Updated 1 week ago
bitwarden, docker, encryption, k8s
vwxyzjn/invalid-action-masking

Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Python16722Updated 1 week ago
vwxyzjn/ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Python921122Updated 1 week ago
vwxyzjn/verl (fork)

verl: Volcano Engine Reinforcement Learning for LLMs

1 · 0 · Updated 1 month ago
vwxyzjn/cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Python12211Updated 1 month ago
vwxyzjn/gym-microrts-paper

The source code for the gym-microrts paper.

Python424Updated 2 months ago
vwxyzjn/summarize_from_feedback_details

No description provided.

Python16021Updated 2 months ago
vwxyzjn/lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Python19712Updated 2 months ago
vwxyzjn/ppo-atari-metrics

No description provided.

Python · 7 · 0 · Updated 2 months ago
vwxyzjn/Megatron-Bridge (fork)

Training library for Megatron-based models

Python · 0 · 0 · Updated 3 months ago
vwxyzjn/slime (fork)

slime is an LLM post-training framework for RL Scaling.

0 · 0 · Updated 3 months ago
vwxyzjn/chgnet (fork)

Pretrained universal neural network potential for charge-informed atomistic modeling https://chgnet.lbl.gov

0 · 0 · Updated 3 months ago
vwxyzjn/nvidia-resiliency-ext (fork)

NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions.

0 · 0 · Updated 4 months ago
vwxyzjn/TransformerEngine_wheels (fork)

wheels for TransformerEngine

Python · 0 · 0 · Updated 4 months ago
vwxyzjn/minimal-uv-deep-ep-gemm-installation

No description provided.

Python · 2 · 4 · Updated 4 months ago
vwxyzjn/envpool-cleanrl

No description provided.

Python · 9 · 1 · Updated 5 months ago
vwxyzjn/torchtitan (fork)

A PyTorch native platform for training generative AI models

Python · 0 · 0 · Updated 5 months ago
vwxyzjn/Megatron-LM (fork)

Ongoing research training transformer models at scale

Python · 0 · 0 · Updated 5 months ago
vwxyzjn/Megatron-MoE-ModelZoo (fork)

Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.

Shell · 0 · 0 · Updated 5 months ago
vwxyzjn/apex_wheels (fork)

No description provided.

0 · 0 · Updated 6 months ago
vwxyzjn/vllm (fork)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 0 · 0 · Updated 6 months ago
vwxyzjn/s5cmd (fork)

Parallel S3 and local filesystem execution tool.

0 · 0 · Updated 6 months ago
vwxyzjn/RL (fork)

Scalable toolkit for efficient model reinforcement

0 · 0 · Updated 6 months ago
vwxyzjn/LeanRL (fork)

LeanRL is a fork of CleanRL, where selected PyTorch scripts optimized for performance using compile and cudagraphs.

8 · 0 · Updated 8 months ago
vwxyzjn/transformers (fork)

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

2 · 0 · Updated 8 months ago
vwxyzjn/Open-Reasoner-Zero (fork)

Official Repo for Open-Reasoner-Zero

2 · 0 · Updated 8 months ago
vwxyzjn/ream (fork)

No description provided.

0 · 0 · Updated 8 months ago
vwxyzjn/a2c_is_a_special_case_of_ppo

A2C is a special case of PPO!

Python222Updated 9 months ago
vwxyzjn/costa-utils

No description provided.

Python · 10 · 0 · Updated 11 months ago
