GitHunt
CH

chanind/claude-auto-research-synthsaebench

Letting Claude do Autonomous Research on Improving SAEs

This repository contains the code accompanying the post "Letting Claude do Autonomous Research on Improving SAEs".

We pointed Claude at the SynthSAEBench-16k synthetic SAE benchmark, told it to improve SAE performance (see TASK.md), and left it running in a Ralph Wiggum loop. By the next morning it had boosted F1 score from 0.88 to 0.95, and within another day — with occasional hints from us — it matched the logistic regression probe ceiling of 0.97 F1.

The SAE architecture that resulted is in the autoresearch directory along with tests written by Claude, and (one version) of the original task specification is in TASK.md. We ran claude code with the simple prompt "Follow the instructions in TASK.md." There's a sample sprint report that Claude generated from a sprint where it explored decreasing K during training in sample_sprint_report.pdf

Results

LISTA-Matryoshka vs Published Baselines on SynthSAEBench-16k

The LISTA-Matryoshka SAE that resulted from this Claude-driven autoresearch substantially outperforms all published baselines (BatchTopK, JumpReLU, Standard, Matryoshka, MatryoshkaTopK) on both F1 score and MCC across L0 values on SynthSAEBench-16k.

It's still not clear if this will work for LLM SAEs, but it did an amazing job at the task we set out for it!

Architecture

The final architecture is a LISTA-Matryoshka SAE with Decreasing K, combining several improvements Claude discovered or refined:

Improvement Description
LISTA encoder A single iteration of LISTA (neural approximation to sparse coding) as the SAE encoder. Claude found this paper autonomously.
Linearly decrease K Anneal K from a higher initial value down to the target during training.
Detach inner Matryoshka levels Detach gradients between Matryoshka levels except the outermost.
TERM loss Tilted ERM to up-weight high-loss samples (tilt ~2e-3). Also found by Claude.
Dynamic Matryoshka by frequency Sort latents by firing frequency before applying Matryoshka losses.

Repository structure

├── autoresearch/
│   ├── sae.py       # LISTA-Matryoshka SAE implementation (training + inference)
│   └── train.py     # Training script with predefined experiments and ablations
├── tests/
│   └── test_sae.py  # Test suite (~40 tests)
├── assets/          # Result plots
├── TASK.md          # Task specification used to guide Claude's research sprints
├── main.py          # Entry point
└── pyproject.toml   # Project config (requires sae-lens>=6.37.6)

Usage

Install dependencies:

uv sync

Run the default LISTA-Matryoshka recipe:

python autoresearch/train.py default

Run ablations (removes one component at a time):

python autoresearch/train.py ablations

Run an L0 sweep:

python autoresearch/train.py l0_sweep

Run tests:

uv run pytest tests/

Languages

Python100.0%

Contributors

MIT License
Created March 9, 2026
Updated March 16, 2026