Reinforcement Learning From Human Feedback

A conceptual and hands-on introduction to tuning and evaluating large language models (LLMs) using Reinforcement Learning from Human Feedback.

Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF), as well as the datasets needed for this technique
Fine-tune the Llama 2 model using RLHF with the open-source Google Cloud Pipeline Components library
Evaluate the tuned model's performance against the base model
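RLHF tuning usually draws on two kinds of data. The first is a preference dataset, where each prompt is paired with two candidate responses and a record of which one a human preferred; it is used to train a reward model. The second is a prompt-only dataset, used during the reinforcement-learning step. The sketch below illustrates both record shapes as JSON Lines, along with the pairwise (Bradley-Terry style) loss commonly used to train a reward model on preference pairs. It is a minimal illustration under stated assumptions, not the pipeline's actual schema: the field names (`input_text`, `candidate_0`, `candidate_1`, `choice`), the example texts, the reward scores, and the helpers `to_jsonl` and `pairwise_preference_loss` are all hypothetical.

```python
import json
import math

# Hypothetical record shapes; the real schema depends on the tuning pipeline you use.
preference_example = {
    "input_text": "Summarize: The meeting was moved from Monday to Wednesday.",
    "candidate_0": "The meeting is now on Wednesday.",
    "candidate_1": "There was a meeting.",
    "choice": 0,  # index of the candidate the human labeler preferred
}

prompt_example = {
    # Prompt-only record: the policy model generates a response during the RL step.
    "input_text": "Summarize: The library will close early on Friday.",
}


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Reward-model loss for one preference pair: -log(sigmoid(r_chosen - r_rejected)).

    The loss shrinks as the reward model scores the preferred response
    further above the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)) for any sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    print(to_jsonl([preference_example, prompt_example]), end="")
    # Illustrative reward scores, not outputs of any real model.
    print(pairwise_preference_loss(2.0, 0.5))
```

A loss of log 2 means the reward model cannot distinguish the two candidates. Training pushes it toward zero by scoring the human-preferred response higher.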

ArmaanSethi/Reinforcement-Learning-From-Human-Feedback