# finetune-json-extractor
Fine-tuned Qwen2.5-7B on Fireworks AI to extract structured JSON from unstructured job postings — using LoRA SFT + DPO. Includes a FastAPI inference endpoint, custom evaluation suite, and full benchmark results.
## What it does
Job postings are messy. They're written in a dozen different formats, bury salary info in paragraphs, and mix requirements with marketing copy. This project fine-tunes a 7B model to reliably extract structured fields from any job posting — title, company, location, remote status, salary range, required skills, deadline, and benefits — and return clean JSON every time.
The fine-tuned model significantly outperforms a stronger general-purpose baseline (Qwen3-8B) on this specific extraction task, which is the whole point of fine-tuning.
## Demo
Extracting a Stripe job posting — all fields populated:

![Stripe posting extraction in Swagger UI](assets/swagger_stripe.png)

Handling a minimal posting — refusal accuracy in action:

![Minimal posting extraction in Swagger UI](assets/swagger_berlin.png)
## Results
Evaluated on 100 holdout examples never seen during training:
| Metric | Baseline (Qwen3-8B) | SFT | DPO | SFT Δ | DPO Δ |
|---|---|---|---|---|---|
| JSON Validity | 53.0% | 100.0% | 100.0% | +47.0% | +47.0% |
| Field Exact Match | 27.3% | 74.3% | 74.3% | +47.0% | +47.0% |
| Refusal Accuracy | 45.0% | 95.0% | 94.0% | +50.0% | +49.0% |
- **JSON Validity** — did the model output parseable JSON? Baseline models often wrap output in markdown or reasoning blocks. The fine-tuned model hits 100%.
- **Field Exact Match** — for each field present in the ground truth, did the model extract the correct value? String comparison is case-insensitive; lists use order-insensitive set comparison.
- **Refusal Accuracy** — for fields absent from the posting, did the model return `null` instead of hallucinating a value? This is the hardest metric to get right and the most important one for production use.
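All three metrics are cheap to compute. A minimal sketch of how they might be implemented — these are hypothetical helpers; the repo's `evaluation/metrics.py` is the authoritative version:

```python
import json

def json_validity(output: str) -> bool:
    """Did the model emit parseable JSON (no markdown wrapper, no prose)?"""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def values_match(predicted, truth) -> bool:
    """Case-insensitive for strings; order-insensitive set comparison for lists."""
    if isinstance(truth, str) and isinstance(predicted, str):
        return predicted.strip().lower() == truth.strip().lower()
    if isinstance(truth, list) and isinstance(predicted, list):
        return {str(v).lower() for v in predicted} == {str(v).lower() for v in truth}
    return predicted == truth

def field_exact_match(predicted: dict, truth: dict) -> float:
    """Fraction of non-null ground-truth fields extracted correctly."""
    present = [k for k, v in truth.items() if v is not None]
    if not present:
        return 1.0
    hits = sum(values_match(predicted.get(k), truth[k]) for k in present)
    return hits / len(present)

def refusal_accuracy(predicted: dict, truth: dict) -> float:
    """Fraction of absent (null) ground-truth fields the model also left null."""
    absent = [k for k, v in truth.items() if v is None]
    if not absent:
        return 1.0
    return sum(predicted.get(k) is None for k in absent) / len(absent)
```

Per-example scores are averaged over the 100-example holdout set to produce the table above.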
The baseline uses Qwen3-8B (a newer, stronger model than the fine-tuned Qwen2.5-7B) because Qwen2.5-7B is not available on Fireworks serverless inference. Using a stronger baseline makes the fine-tuning results more conservative and honest — the fine-tuned smaller model still outperformed it by +47% on extraction accuracy.
## Training curves
SFT training run:

![SFT training curves in Weights & Biases](assets/wandb_sft.png)

DPO training run:

![DPO training curves in Weights & Biases](assets/wandb_dpo.png)
## Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                      DATA PIPELINE                          │
│                                                             │
│  Claude Haiku ──► 1,200 synthetic job postings              │
│                   + ground truth JSON labels                │
│                                                             │
│  Split: 800 SFT train │ 100 SFT val │ 100 holdout           │
│         160 DPO train │ 40 DPO val                          │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                 TRAINING (Fireworks AI)                     │
│                                                             │
│  Qwen2.5-7B-Instruct                                        │
│        │                                                    │
│        ▼ LoRA SFT (800 examples, 3 epochs)                  │
│  json-extractor-sft-v2                                      │
│        │                                                    │
│        ▼ DPO (160 preference pairs)                         │
│  json-extractor-dpo-v2 ◄── final model                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                 INFERENCE (Fireworks AI)                    │
│                                                             │
│  Base deployment (Qwen2.5-7B) + Multi-LoRA addon            │
│        │                                                    │
│        ▼                                                    │
│  FastAPI ──► POST /extract ──► structured JSON              │
└─────────────────────────────────────────────────────────────┘
```
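The split in the pipeline above is a flat partition of the 1,200 synthetic examples (800 + 100 + 100 + 160 + 40 = 1,200). A sketch of how such a split could be done deterministically — a hypothetical helper, the real logic lives in `data/generate_dataset.py`:

```python
import random

def split_dataset(examples: list, seed: int = 42) -> dict:
    """Partition 1,200 examples into disjoint SFT / holdout / DPO splits."""
    assert len(examples) == 1200, "pipeline assumes 1,200 synthetic postings"
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    return {
        "sft_train": shuffled[:800],
        "sft_val":   shuffled[800:900],
        "holdout":   shuffled[900:1000],
        "dpo_train": shuffled[1000:1160],
        "dpo_val":   shuffled[1160:1200],
    }
```

A fixed seed matters here: the 100-example holdout must stay unseen across the SFT run, the DPO run, and every evaluation pass.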
## Stack
| Component | Tool |
|---|---|
| Dataset generation | Claude Haiku (claude-haiku-4-5) |
| Base model | Qwen2.5-7B-Instruct |
| Training platform | Fireworks AI |
| Training method | LoRA SFT → DPO |
| Experiment tracking | Weights & Biases |
| Inference | Fireworks AI Multi-LoRA deployment |
| API | FastAPI + uvicorn |
| Evaluation | Custom Python suite |
## Project structure
```text
finetune-json-extractor/
├── app/
│   ├── config.py            # pydantic-settings config
│   ├── extractor.py         # core inference function
│   └── api.py               # FastAPI endpoints
├── data/
│   ├── generate_dataset.py  # synthetic data generation via Claude Haiku
│   ├── convert_dpo.py       # converts SFT format to Fireworks DPO format
│   └── *.jsonl              # generated datasets (gitignored)
├── training/
│   ├── launch_sft.py        # launches SFT job on Fireworks
│   └── launch_dpo.py        # launches DPO job on Fireworks
├── evaluation/
│   ├── metrics.py           # json_validity, field_exact_match, refusal_accuracy
│   ├── eval_baseline.py     # evaluates Qwen3-8B (no fine-tuning)
│   ├── eval_sft.py          # evaluates SFT model
│   ├── eval_dpo.py          # evaluates DPO model
│   ├── generate_report.py   # produces benchmark_report.md
│   └── results/
│       ├── baseline_scores.json
│       ├── sft_scores.json
│       ├── dpo_scores.json
│       └── benchmark_report.md
├── assets/
│   ├── swagger_stripe.png
│   ├── swagger_berlin.png
│   ├── wandb_sft.png
│   └── wandb_dpo.png
├── .env.example
├── requirements.txt
└── README.md
```
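`data/convert_dpo.py` reshapes SFT records into preference pairs. A rough sketch of that transformation — the field names here are illustrative assumptions (check Fireworks' dataset docs for their exact DPO schema), and it assumes SFT records are chat-style `{"messages": [...]}` objects:

```python
import json

def sft_to_dpo(sft_record: dict, rejected_output: str) -> dict:
    """Pair the ground-truth completion with a worse one.

    The chosen response is the final assistant turn; the rejected one would
    come from a weaker model or a deliberately corrupted label.
    """
    *context, assistant = sft_record["messages"]
    assert assistant["role"] == "assistant"
    return {
        "input": {"messages": context},
        "preferred_output": assistant["content"],
        "non_preferred_output": rejected_output,
    }

def convert_file(src: str, dst: str, rejected: dict) -> None:
    """Stream src JSONL -> dst JSONL, pairing each line with its rejected output."""
    with open(src) as fin, open(dst, "w") as fout:
        for i, line in enumerate(fin):
            pair = sft_to_dpo(json.loads(line), rejected[i])
            fout.write(json.dumps(pair) + "\n")
```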
## API
### GET /health
Returns model config and server status.

```json
{
  "status": "ok",
  "dpo_model": "accounts/.../models/json-extractor-dpo-v2#accounts/.../deployments/...",
  "sft_model": "accounts/.../models/json-extractor-sft-v2#accounts/.../deployments/..."
}
```

### POST /extract
Takes a raw job posting, returns structured JSON.
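The `extracted` object follows a fixed schema in which every field is nullable. A quick stdlib-only validator sketch — hypothetical, not the repo's implementation (the API itself validates with Pydantic models in `app/`):

```python
# Expected type per field; None is always allowed (refusal for absent fields).
EXTRACTION_SCHEMA = {
    "title": str, "company": str, "location": str, "remote": bool,
    "experience_years": int, "salary_min": int, "salary_max": int,
    "skills": list, "deadline": str, "benefits": list,
}

def validate_extraction(extracted: dict) -> list:
    """Return a list of schema violations; an empty list means valid."""
    errors = []
    for field, expected in EXTRACTION_SCHEMA.items():
        if field not in extracted:
            errors.append(f"missing field: {field}")
        elif extracted[field] is not None and not isinstance(extracted[field], expected):
            errors.append(f"{field}: expected {expected.__name__} or null")
    for field in extracted:
        if field not in EXTRACTION_SCHEMA:
            errors.append(f"unexpected field: {field}")
    return errors
```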
Request:

```json
{
  "posting": "Senior Data Engineer at Stripe, Remote US. 6+ years experience with Python, Apache Spark, dbt, Snowflake, Airflow. Salary $180,000-$220,000. Apply before May 1 2026."
}
```

Response:
```json
{
  "success": true,
  "extracted": {
    "title": "Senior Data Engineer",
    "company": "Stripe",
    "location": null,
    "remote": true,
    "experience_years": 6,
    "salary_min": 180000,
    "salary_max": 220000,
    "skills": ["Python", "Apache Spark", "dbt", "Snowflake", "Airflow"],
    "deadline": "2026-05-01",
    "benefits": ["Medical coverage", "Dental coverage", "Vision coverage", "401k", "Unlimited PTO"]
  },
  "error": null,
  "latency_ms": 3199,
  "model_used": "accounts/.../models/json-extractor-dpo-v2#accounts/.../deployments/..."
}
```

## Setup
### 1. Clone and install
```shell
git clone https://github.com/ShubhammS18/finetune-json-extractor
cd finetune-json-extractor
python -m venv myenv
myenv\Scripts\activate       # Windows
# source myenv/bin/activate  # macOS / Linux
pip install -r requirements.txt
```

### 2. Configure environment
```shell
cp .env.example .env
# Fill in your API keys
```

Required keys in `.env`:
```text
FIREWORKS_API_KEY=fw_...
ANTHROPIC_API_KEY=sk-ant-...
WANDB_API_KEY=...
FIREWORKS_ACCOUNT_ID=your-account-id
```
### 3. Deploy the model on Fireworks
The fine-tuned models are LoRA adapters — they need a base model deployment to run inference:
1. Go to [app.fireworks.ai](https://app.fireworks.ai) → Deployments → Create Deployment
2. Select Qwen2.5-7B-Instruct → A100 instance → Deploy
3. Once Ready, click Edit Deployment → enable Multi-LoRA
4. Register the LoRA addon via API:
```python
import requests

requests.post(
    'https://api.fireworks.ai/v1/accounts/YOUR_ACCOUNT/deployedModels',
    headers={'Authorization': 'Bearer YOUR_FIREWORKS_API_KEY'},
    json={
        'model': 'accounts/YOUR_ACCOUNT/models/json-extractor-dpo-v2',
        'deployment': 'accounts/YOUR_ACCOUNT/deployments/YOUR_DEPLOYMENT_ID',
        'displayName': 'dpo-eval',
    },
)
```

5. Update `.env` with the combined model + deployment ID:
```text
DPO_MODEL_ID=accounts/YOUR_ACCOUNT/models/json-extractor-dpo-v2#accounts/YOUR_ACCOUNT/deployments/YOUR_DEPLOYMENT_ID
```
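The combined ID is simply the model resource name and the deployment resource name joined with `#`. A trivial, illustrative-only helper for building it:

```python
def combined_model_id(account: str, model: str, deployment: str) -> str:
    """Build the 'model#deployment' ID used for Multi-LoRA inference."""
    return (
        f"accounts/{account}/models/{model}"
        f"#accounts/{account}/deployments/{deployment}"
    )
```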
### 4. Run the API
```shell
uvicorn app.api:app --reload
```

Open http://localhost:8000/docs for the Swagger UI.
⚠️ Remember to delete the deployment when done — it bills per hour.
## Reproducing the training
⚠️ Running these scripts incurs real GPU costs on Fireworks AI. SFT ~$0.70, DPO ~$0.54. Do not run twice.
```shell
# Step 1 — Generate dataset (~35 min, ~$4 Anthropic credits)
python data/generate_dataset.py

# Step 2 — Upload datasets to Fireworks via UI
#   Go to app.fireworks.ai → Datasets → Create Dataset
#   Upload each of the 4 files: sft_train.jsonl, sft_val.jsonl, dpo_train.jsonl, dpo_val.jsonl
#   Copy the dataset IDs into .env as SFT_TRAIN_DATASET_ID, SFT_VAL_DATASET_ID, etc.

# Step 3 — Launch SFT training
python training/launch_sft.py

# Step 4 — Launch DPO training (after SFT completes)
python training/launch_dpo.py

# Step 5 — Run evaluation
python evaluation/eval_baseline.py
python evaluation/eval_sft.py
python evaluation/eval_dpo.py
python evaluation/generate_report.py
```

## Cost breakdown
| Phase | Cost |
|---|---|
| Dataset generation (Claude Haiku) | ~$4.00 |
| SFT training (Fireworks) | ~$0.70 |
| DPO training (Fireworks) | ~$0.54 |
| Evaluation + inference testing | ~$2.00 |
| Total | ~$7.24 |
Built by Shubham Suradkar as part of a portfolio of production-grade AI engineering projects.



