# GPT-OSS-20B Finetuning with Unsloth (LoRA + MoE)
This repository contains a complete, VS Code–ready pipeline for fine-tuning the GPT-OSS-20B Mixture-of-Experts (MoE) model with Unsloth, using LoRA for parameter-efficient fine-tuning.
## Architecture Overview

*Architecture diagram credit: Xiaoli Shen, Senior AI/ML Specialist at Microsoft.*
The GPT-OSS-20B is a Mixture-of-Experts Transformer model that integrates:
- Embedding Layer — Converts input tokens into dense vector embeddings.
- 24 Transformer Blocks, each containing:
  - Attention Block
    - RMSNorm normalization
    - QKV linear projection
    - Rotary Position Embeddings (RoPE)
    - GQA / sliding-window attention (even layers)
    - Output projection + residual connection
  - MoE Block
    - RMSNorm
    - Expert router gate (top-k = 4)
    - SwiGLU-activated feed-forward layers (two MLPs per expert)
    - Weighted sum over selected experts
    - Residual connection
- Final RMSNorm
- Output Linear Projection — Produces logits for the vocabulary.
This design enables high scalability while reducing compute usage via sparse activation in MoE layers.
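To make the sparse MoE computation concrete, here is a minimal PyTorch sketch of top-k routing with a weighted sum over the selected SwiGLU experts. It is a toy illustration, not the actual GPT-OSS implementation; the sizes, naive loops, and module names are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Toy MoE layer: router gate -> top-k experts -> weighted sum + residual."""
    def __init__(self, d_model=64, n_experts=8, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # expert router gate
        # Each expert is a SwiGLU MLP: gate/up projections, then a down projection.
        self.gate = nn.ModuleList(nn.Linear(d_model, 4 * d_model) for _ in range(n_experts))
        self.up   = nn.ModuleList(nn.Linear(d_model, 4 * d_model) for _ in range(n_experts))
        self.down = nn.ModuleList(nn.Linear(4 * d_model, d_model) for _ in range(n_experts))

    def forward(self, x):                                # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # normalize over the chosen k
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                       # naive per-token loop for clarity
            for k in range(self.top_k):
                e = idx[t, k].item()
                h = F.silu(self.gate[e](x[t])) * self.up[e](x[t])  # SwiGLU activation
                out[t] += weights[t, k] * self.down[e](h)          # weighted sum
        return out + x                                   # residual connection

print(ToyMoE()(torch.randn(3, 64)).shape)  # -> torch.Size([3, 64])
```

Only the top-k experts run per token, which is the sparse activation that keeps compute low relative to total parameter count.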
## Training

You can fine-tune GPT-OSS-20B with LoRA adapters in 4-bit precision:

```bash
python scripts/train.py --config configs/train.yaml
```

Training configuration is stored in `configs/train.yaml`, including LoRA parameters, dataset settings, and `SFTTrainer` arguments.
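For orientation, the core of the training flow is roughly the following Unsloth + TRL sketch. The hyperparameter values here are illustrative assumptions; the authoritative settings live in `configs/train.yaml`. (Depending on your `trl` version, `tokenizer=` may instead be named `processing_class=`.)

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the base model in 4-bit (bitsandbytes) with Unsloth optimizations.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=1024,   # illustrative; see configs/train.yaml
    load_in_4bit=True,
)

# Attach LoRA adapters (parameter-efficient fine-tuning).
model = FastLanguageModel.get_peft_model(
    model,
    r=8,                   # LoRA rank (assumed value)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
# (Dataset formatting / chat-template mapping omitted for brevity.)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(per_device_train_batch_size=1,
                   gradient_accumulation_steps=4,
                   max_steps=60,     # illustrative
                   output_dir="outputs"),
)
trainer.train()
model.save_pretrained("outputs")      # saves the LoRA adapter only
tokenizer.save_pretrained("outputs")
```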
## Example Dataset
We use the `HuggingFaceH4/Multilingual-Thinking` dataset, which contains reasoning chain-of-thought examples in multiple languages.
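A quick way to peek at the data before training (no column names are assumed here; inspect them yourself):

```python
from datasets import load_dataset

# Download the dataset and inspect its size and columns before training.
ds = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
print(ds)            # row count and column names
print(ds[0].keys())  # fields available in a single example
```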
## 🔍 Inference Example
Once trained, run inference with:
```bash
python scripts/infer.py --base unsloth/gpt-oss-20b --adapter outputs \
  --user "Solve x^5 + 3x^4 - 10 = 3." --reasoning_effort medium --max_new_tokens 128
```

Sample output:

```text
User: Solve x^5 + 3x^4 - 10 = 3

We are given the equation:
x^5 + 3x^4 - 10 = 3.

Rewriting:
x^5 + 3x^4 - 13 = 0

This is a fifth-degree polynomial, which generally does not have a simple
closed-form solution. Numerical methods such as Newton-Raphson can
approximate the root...
```
## ⚙️ Features

- 4-bit Quantization with `bitsandbytes` for a lower memory footprint.
- LoRA Fine-tuning with Unsloth optimizations.
- Modularized data, model, and training scripts for maintainability.
- Reasoning Effort Control: Low / Medium / High, affecting thinking depth before output (see the sketch below).
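As a sketch of how the reasoning-effort flag can be wired up: the flag is forwarded to `tokenizer.apply_chat_template`, with a plain fallback when the chat template does not accept it. The exact fallback logic in `scripts/infer.py` may differ; the try/except shape below is an assumption.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/gpt-oss-20b", load_in_4bit=True)
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode

messages = [{"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."}]
try:
    # GPT-OSS chat templates accept a reasoning_effort kwarg (low / medium / high).
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True,
        reasoning_effort="medium", return_tensors="pt",
    )
except TypeError:
    # Assumed fallback: render the prompt without the extra kwarg.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt",
    )

output = model.generate(inputs.to(model.device), max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```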
## 📂 Project Structure

```text
gpt-oss-finetune/
├── configs/                # Training config YAML
├── scripts/                # Train / inference entry points
├── src/gpt_oss_finetune/   # Core training logic
├── outputs/                # Saved LoRA adapters + tokenizer
├── requirements.txt
├── README.md
└── .gitignore
```
## 📜 License
This project is licensed under the MIT License.
## Installation and Setup

```bash
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

If you need a specific CUDA build of PyTorch, install it per https://pytorch.org/get-started/locally/ and then edit or remove `torch` in `requirements.txt`.
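After installing, a quick sanity check that the CUDA build of PyTorch is actually active:

```python
import torch

# Confirm the installed PyTorch build can see the GPU before training.
print(torch.__version__)
print(torch.cuda.is_available())          # should print True on a CUDA machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the device you will train on
```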
## Train

```bash
python scripts/train.py --config configs/train.yaml
```

Outputs (LoRA adapter + tokenizer) land in `outputs/`.
## Inference (with adapter)

```bash
python scripts/infer.py --base unsloth/gpt-oss-20b --adapter outputs \
  --user "Solve x^5 + 3x^4 - 10 = 3." --reasoning_effort medium --max_new_tokens 128
```

Notes:

- Uses 4-bit quantization via `bitsandbytes` and Unsloth memory optimizations.
- `reasoning_effort` is passed to `tokenizer.apply_chat_template`; we fall back if unsupported.
- To deploy merged weights, modify `infer.py` to call `merge_and_unload()` after loading the adapter (see the sketch below).
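For the merged-weights path, here is a minimal sketch using the standard PEFT call, assuming the adapter was saved to `outputs/`; the output directory name is illustrative.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in full precision, attach the LoRA adapter, then fold
# the adapter deltas into the base weights so deployment needs no PEFT.
base = AutoModelForCausalLM.from_pretrained("unsloth/gpt-oss-20b", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "outputs")
model = model.merge_and_unload()

model.save_pretrained("merged-gpt-oss-20b")   # illustrative output directory
AutoTokenizer.from_pretrained("outputs").save_pretrained("merged-gpt-oss-20b")
```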
## 🙌 Acknowledgements

- Unsloth AI for the optimization library.
- Hugging Face for datasets and model hosting.
## 👨‍💼 Author

**Elias Hossain**
Machine Learning Researcher | PhD Student | AI x Reasoning Enthusiast
