autoresearch-local-llm

Run Karpathy's autoresearch with a local LLM instead of Claude Code. Zero API cost. Single GPU. Fully autonomous.

What This Is

Autoresearch is an experiment where an LLM autonomously modifies a GPT training script, runs 5-minute experiments, keeps what improves val_bpb, and discards what doesn't. The original uses Claude Code (cloud API) as the researcher.

This fork replaces Claude Code with Qwen 3.5 9B running locally via ollama. The LLM and training share the same GPU. No API keys, no cloud dependencies, no per-experiment cost.

What Changed

Component	Original	This Fork
AI Researcher	Claude Code (cloud API)	Qwen 3.5 9B via ollama (local)
Cost per experiment	API tokens	$0
Depth	8 layers	4 layers
Device batch size	128	64
Total batch tokens	524K	65K
Window pattern	SSSL	L

Model size is reduced because the LLM agent (~12GB VRAM) and training share the same GPU. The agent compensates by running more experiments.

Files

File	Purpose
`agent.py`	Local LLM agent — replaces Claude Code in the autoresearch loop
`train.py`	GPT training script (modified hyperparameters for shared VRAM)
`prepare.py`	Data preparation (unchanged from original)
`program.md`	Experiment instructions for the agent
`run_pipeline.sh`	Orchestrator: prepare data, create branch, start agent
`nosana_setup.sh`	Container bootstrap for Nosana GPU deployment
`job.json`	Nosana job definition

How It Works

ollama serves Qwen 3.5 9B locally on the GPU (~12GB VRAM)
agent.py reads train.py and experiment history, asks Qwen to propose a modification
Qwen outputs a modified train.py
Agent validates syntax, git commits, runs uv run train.py (5-min experiment)
If val_bpb improved — keep. If not — git reset.
Loop forever.

GPU (48GB VRAM)
├── Qwen 3.5 9B via ollama (~12GB)
└── GPT training via train.py (~35GB)
    ├── Propose modification
    ├── Validate syntax
    ├── Run 5-min experiment
    ├── Keep if val_bpb improved
    └── Discard if not → loop

Deploy on Nosana

Option 1: Dashboard

Go to nosana.io dashboard
Create a new deployment, select NVIDIA Pro 6000 (SOC2)
Click Configure and paste the contents of job.json
Create Deployment

Option 2: CLI

nosana job post --file job.json --market nvidia-pro6000 --timeout 480 --wait

Run Locally (if you have a GPU)

# Install ollama and pull the model
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull qwen3.5:9b

# Clone and setup
git clone https://github.com/SohniSwatantra/autoresearch-local-llm.git
cd autoresearch-local-llm
pip install uv
uv sync

# Run
bash run_pipeline.sh

Requires a GPU with at least 24GB VRAM (48GB recommended for full-size experiments).

Cost

Setup	Cost per experiment	100 experiments
Original (Claude Code API)	~$0.05-0.20	$5-20
This fork (Nosana Pro 6000)	$0.08 (5min at $1/hr)	~$8 total
This fork (own GPU)	$0	$0

Configuration

Edit agent.py to change the local LLM:

MODEL = "qwen3.5:9b"  # Any ollama model works

Edit train.py hyperparameters to adjust for your GPU's available VRAM:

DEPTH = 4              # Increase if you have more VRAM
DEVICE_BATCH_SIZE = 64 # Increase if you have more VRAM
TOTAL_BATCH_SIZE = 2**16

Credits

karpathy/autoresearch — original framework
Qwen 3.5 — local LLM
ollama — local LLM serving
Nosana — decentralized GPU compute

msitarzewski/autoresearch-local-llm