
yolocc

YOLO training toolkit — dataset validation, HP tuning, intelligent experiment
automation, CVAT active learning, ONNX export. 12 CLI commands + 14 Claude Code skills. Supports YOLO11 & YOLO26.

Python 3.10+
YOLO
Claude Code
License MIT
Tests


What It Does

yolocc manages the full YOLO training lifecycle — from dataset preparation through deployment. It combines standard CLI tools for every step of the pipeline with an AI-driven experiment loop that diagnoses bottlenecks and optimizes hyperparameters autonomously.

Dataset prep → Training → HP optimization → Analysis → Active learning → Export

Quickstart

1. Install

pip install -e .

2. Prepare Your Dataset

You need a YOLO-format dataset:

your_dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
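A minimal data.yaml for the layout above might look like this (the class names are placeholders; use your own):

```yaml
# your_dataset/data.yaml (paths resolve relative to this file)
path: .
train: images/train
val: images/val
names:
  0: cat
  1: dog
```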

3. Initialize Project

In Claude Code:

/setup

Or manually — copy yolo-project.example.yaml to yolo-project.yaml and edit.

4. Configure Your Training Plan

Edit training-plan.md — it defines the goals, constraints, and allowed actions for the experiment loop:

## Goal
Maximize mAP50-95.

## Hard Constraints
- Budget: 10 experiments, 50 epochs each
- Model: yolo11n
- Don't modify the dataset

## Allowed Actions
### HP Optimization (via model.tune)
- Presets: lr, augmentation, loss, optimizer, all

5. Run Experiments

In Claude Code:

/experiment

Or via CLI:

yolo-experiment baseline --budget 5
yolo-experiment tune --space lr --iterations 20 --epochs 10
yolo-experiment run --strategy learning_rate --budget 10
yolo-experiment summary

6. Review Results

cat experiments/summary.md

Or in Claude Code:

/analyze

CLI Commands

Command Purpose
yolo-train Train a model
yolo-finetune Fine-tune with transfer learning
yolo-validate Validate dataset integrity
yolo-experiment Run experiments + HP tuning
yolo-analyze Active learning analysis
yolo-export Export to ONNX
yolo-split Stratified train/val/test split
yolo-clean Remove duplicates and corrupted files
yolo-merge Merge annotation files
yolo-autolabel Auto-annotate with trained model
yolo-cvat CVAT integration (pull/push/deploy)
yolocc-doctor Preflight health check (env, deps, config)

All commands support --help.

Claude Code Skills

Skill Purpose
/experiment Experiment loop (assess → tune → report)
/analyze Training analysis + recommendations
/setup Project initialization wizard
/review-dataset Dataset quality audit
/train Managed training with reporting
/review-annotations AI-assisted annotation review
/annotate Claude vision annotation correction
/cvat-pull Pull annotations from CVAT
/cvat-push Push uncertain images to CVAT for review
/cvat-deploy Deploy trained model to CVAT via Nuclio
/compare-models Compare 2+ models side-by-side
/benchmark Profile model speed, FPS, and size
/explain-results Plain-English training report
/active-learning Full active learning loop

Intelligent Experiment Automation

The /experiment skill is an AI-driven optimization loop. It reads your training plan, diagnoses what's limiting model performance, and acts:

  1. Reads context: your training plan (goals + constraints), experiment history, dataset profile
  2. Assesses bottleneck: data quality issue? architecture mismatch? unoptimized HPs?
  3. Acts: delegates HP search to model.tune(), swaps architecture configs, runs strategic experiments
  4. Reports: session report with before/after metrics, per-class AP deltas, and next-step recommendations

Guardrails: checkpoint backup before architecture changes, immutable original data, hard constraint enforcement (budget, epochs, regression limits).
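The hard-constraint enforcement can be sketched in a few lines of Python. This is an illustrative model of the guardrails, not the toolkit's actual implementation: `HardConstraints`, `within_budget`, and the regression threshold are all assumed names and values.

```python
from dataclasses import dataclass


@dataclass
class HardConstraints:
    # Values mirror the example training-plan.md; the regression limit is assumed.
    max_experiments: int = 10
    max_epochs: int = 50
    max_map_regression: float = 0.02  # allowed drop in mAP50-95 vs. the best run


def within_budget(history: list[dict], c: HardConstraints, proposed_epochs: int) -> bool:
    """Return True only if a new experiment respects every hard constraint.

    history is a list of past runs, each a dict with a "map50_95" metric.
    """
    if len(history) >= c.max_experiments:
        return False  # experiment budget exhausted
    if proposed_epochs > c.max_epochs:
        return False  # per-run epoch cap exceeded
    if history:
        best = max(r["map50_95"] for r in history)
        if best - history[-1]["map50_95"] > c.max_map_regression:
            return False  # last run regressed too far; stop and report
    return True
```

A loop built on a check like this refuses to launch a run that would blow the budget, and halts for review when metrics regress past the limit.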

Full walkthrough: see WORKFLOW.md.

CVAT Integration

yolocc integrates with CVAT for the full active learning loop: annotate, train, find uncertain predictions, get human review, and retrain.

Prerequisites

  • Self-hosted CVAT with Nuclio (see CVAT setup guide)
  • CVAT_ACCESS_TOKEN environment variable (create a Personal Access Token in CVAT UI)
  • Install with CVAT extras: pip install -e ".[cvat]"

Active Learning Loop

┌─────────────────────────────────────────────────────┐
│                                                     │
│   Annotate in CVAT                                  │
│        ↓                                            │
│   yolo-cvat pull         (pull annotations)         │
│        ↓                                            │
│   yolo-train / /experiment  (train model)           │
│        ↓                                            │
│   yolo-analyze           (find uncertain images)    │
│        ↓                                            │
│   yolo-cvat push         (push to CVAT for review)  │
│        ↓                                            │
│   Human reviews in CVAT → repeat                    │
│                                                     │
└─────────────────────────────────────────────────────┘
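The "find uncertain images" step amounts to ranking images by how many detections land in a low-confidence band. A minimal sketch, assuming per-image box confidences are already collected (`select_uncertain` and the thresholds are illustrative, not yolo-analyze's actual heuristics):

```python
def select_uncertain(
    predictions: dict[str, list[float]],
    low: float = 0.3,
    high: float = 0.6,
    k: int = 100,
) -> list[str]:
    """Rank images whose detections fall in the low-confidence band.

    predictions maps image path -> confidence scores of its predicted boxes.
    Returns up to k paths, most uncertain first.
    """
    scored = []
    for path, confs in predictions.items():
        uncertain = [conf for conf in confs if low <= conf <= high]
        if uncertain:
            scored.append((len(uncertain), path))
    scored.sort(reverse=True)  # most borderline detections first
    return [path for _, path in scored[:k]]
```

The selected paths are what would then be pushed to CVAT for human review.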

Deploy a Trained Model to CVAT

yolo-cvat deploy --model best.pt

This packages your model as a Nuclio serverless function and deploys it to CVAT, enabling auto-annotation directly in the CVAT UI.

Configuration

yolo-project.yaml

The project config file controls training defaults, dataset paths, and integrations:

project:
  name: my-project
  description: "Custom detection project"

classes:
  0: cat
  1: dog

defaults:
  base_model: yolo11n.pt        # Ultralytics model
  epochs: 100                   # Max training epochs
  imgsz: 640                    # Input resolution
  dataset: datasets/my_data     # Path to YOLO-format dataset

# Named variants for fine-tuning (optional)
variants:
  indoor:
    dataset: datasets/indoor
    epochs: 30

# CVAT integration (optional — pip install "yolocc[cvat]")
cvat:
  url: http://localhost:8080
  project_id: 1

Environment Variables

Variable Purpose
YOLO_WORKSPACE_PATH Override workspace directory (default: current directory)
CVAT_ACCESS_TOKEN Personal access token for CVAT API
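The workspace override resolves as you would expect: use the environment variable if set, else the current directory. A sketch of the lookup (`resolve_workspace` is an illustrative helper, not part of the yolocc API):

```python
import os
from pathlib import Path


def resolve_workspace() -> Path:
    """Return the workspace root: YOLO_WORKSPACE_PATH if set, else CWD."""
    return Path(os.environ.get("YOLO_WORKSPACE_PATH", ".")).resolve()
```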

Ecosystem

yolocc is part of a three-repo toolkit for object detection workflows:

Repo Purpose
yolocc Training, experimentation, active learning
CVAT Setup Self-hosted annotation platform with Nuclio auto-annotation
Dataset Converter Convert YOLO datasets for CVAT/Roboflow import

What You Need

Requirement Why
YOLO dataset (images + labels + data.yaml) Data to train on
GPU (NVIDIA, 4GB+ VRAM) Training requires a GPU
Python 3.10+ with torch + ultralytics Dependencies
Claude Code (optional) For guided workflows via skills

File Map

File Who Purpose
training-plan.md You edit Training goals + constraints
yolo-project.yaml You edit Project config
experiments/summary.md Generated Experiment dashboard
experiments/session_*.md Generated Session reports
experiments/analysis.md Generated Recommendations

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.