Inside Tiny Aya: Cross-Lingual Concept Representations in Multilingual Variants
Mechanistic interpretability analysis of how regional fine-tuning affects cross-lingual concept representations in Tiny Aya (3.35B) model variants.
Key Finding
Tiny Aya builds shared cross-lingual concepts mid-network (layers 18-20), then destroys them at the output layers. All variants follow this rise-peak-collapse trajectory, but regional fine-tuning (Fire, Earth) determines how much alignment the model builds: +15% for Hindi, +40% for Amharic. For most Latin-script languages (French, Spanish, Swahili), alignment is near-maximal at the embedding layer — a tokenizer artifact, not a learned capability. Yoruba is a notable exception, suggesting script alone does not guarantee embedding-level alignment.
Cross-lingual alignment curves for three languages. Swahili (Latin script) shows 1.0 from layer 0 (tokenizer artifact). Hindi shows Fire/Earth building ~15% more alignment than Base. Amharic shows the largest gains (+40%), with Earth slightly ahead of Fire.
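The curves above are mean cosine similarities between English and target-language residual-stream activations at each layer. A minimal sketch of that computation (the array shapes and concept-averaging are assumptions; the actual logic lives in concept_alignment.py):

```python
import numpy as np

def alignment_curve(en_acts: np.ndarray, xx_acts: np.ndarray) -> np.ndarray:
    """Mean cosine similarity between English and target-language
    activations at each layer.

    en_acts, xx_acts: (num_layers, num_concepts, hidden_dim) arrays of
    residual-stream activations for the same concept set.
    Returns a (num_layers,) alignment curve.
    """
    # Unit-normalize along the hidden dimension
    en = en_acts / np.linalg.norm(en_acts, axis=-1, keepdims=True)
    xx = xx_acts / np.linalg.norm(xx_acts, axis=-1, keepdims=True)
    # Cosine similarity per (layer, concept), then average over concepts
    return (en * xx).sum(axis=-1).mean(axis=-1)
```

A curve of 1.0 at layer 0 (as for Swahili) means English and target-language tokens already map to near-identical embeddings, which is why the report flags it as a tokenizer artifact rather than learned alignment.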
Quick Start
# 1. Install dependencies
uv sync
# 2. Build stimulus manifest (combines concept probes + FLORES-200)
uv run build_stimuli.py
# 3. Run activation extraction (requires GPU — use Modal for A10G)
modal run batch_runner.py
# OR for local CPU/MPS execution (slower):
uv run batch_runner.py --local
# 4. Compute cross-lingual alignment curves and commitment matrix
uv run concept_alignment.py
# 5. Classify failure cases
uv run failure_classifier.py
# 6. Generate visualizations
uv run viz/heatmap.py
uv run viz/alignment_curves.py
# 7. Launch interactive demo (demo mode — no GPU required)
uv run app.py --demo
Project Structure
aya-cross-lingual-probes/
├── data/
│ ├── concept_probes.json # 20 medical concepts x 10 languages (hand-verified)
│ └── stimulus_manifest.json # Combined stimulus set (generated by build_stimuli.py)
├── activations/ # Residual stream activations (float16, not in git)
│ ├── base/
│ ├── fire/
│ └── earth/
├── results/
│ ├── alignment_curves.json # Cosine similarity by layer (primary output)
│ ├── commitment_matrix.csv # Commitment layer per (language x model)
│ └── failure_cases.csv # Failure taxonomy instances
├── assets/
│ ├── tweet_three_panel.png # Primary visualization (three-panel line chart)
│ ├── alignment_curve_avg_hi.png
│ ├── alignment_curve_avg_am.png
│ ├── annotated_rise_peak_collapse.png
│ └── concept_alignment_heatmap.png
├── viz/
│ ├── heatmap.py # Commitment heatmap visualization
│ ├── alignment_curves.py # Alignment curve line plots
│ └── annotated_curve.py # Annotated rise-peak-collapse diagram
├── docs/
│ ├── adr/
│ │ ├── 001-framework.md # HF Transformers activation extraction
│ │ ├── 002-model-loading.md # Sequential over parallel loading
│ │ ├── 003-stimuli.md # FLORES-200 over machine translation
│ │ ├── 004-commitment-def.md # Commitment layer definition
│ │ └── 005-storage.md # float16 over float32
│ └── failure_taxonomy.md # 5 failure categories documented
├── model_loader.py # HF Transformers model loading + activation extraction
├── build_stimuli.py # Combines concept probes + FLORES into manifest
├── batch_runner.py # Sequential activation extraction (Modal/local)
├── concept_alignment.py # Primary analysis: alignment curves + commitment
├── failure_classifier.py # Failure taxonomy classification
├── modal_config.py # Modal compute configuration
├── app.py # Gradio interactive demo
├── p1_edge_cases.json # 32 edge cases for language routing test suite
├── REPORT.md # Research report (arXiv-style)
├── PRODUCTION_DELTA.md # Research-to-production gap analysis
├── pyproject.toml # Dependencies (managed by uv)
└── README.md
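model_loader.py's internals are not shown here, but the standard way to capture the residual stream with HF Transformers is a forward hook on each transformer block, downcast to float16 before storage (per ADR 005). A self-contained sketch of that pattern — the `capture_residual_stream` helper is hypothetical, and for Tiny Aya the block list would come from the loaded HF model:

```python
import torch
from torch import nn

def capture_residual_stream(model_layers, hidden: torch.Tensor):
    """Forward `hidden` through a stack of transformer blocks, saving each
    block's output (the residual stream) as float16 on CPU via hooks."""
    captured = []

    def hook(_module, _inputs, output):
        # HF decoder blocks return tuples; plain modules return tensors
        out = output[0] if isinstance(output, tuple) else output
        captured.append(out.detach().to(torch.float16).cpu())

    handles = [layer.register_forward_hook(hook) for layer in model_layers]
    try:
        for layer in model_layers:
            hidden = layer(hidden)
    finally:
        # Always remove hooks so repeated extractions don't stack
        for h in handles:
            h.remove()
    return captured
```

Hooks rather than `output_hidden_states=True` keep peak memory low: each layer's activations are moved to CPU as float16 immediately, which matters when three 3.35B variants are extracted sequentially on 24GB of RAM.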
Research Question
Primary: Does Tiny Aya develop language-agnostic concept representations mid-network, or does it just translate at the final layers? Does regional fine-tuning (Fire, Earth) increase or decrease cross-lingual concept sharing compared to Base?
Secondary: Where does Base outperform Fire/Earth and why? This produces a failure taxonomy for language routing edge cases.
Models
| Variant | Region | Languages Emphasized |
|---|---|---|
| Base | Global | Broad multilingual coverage |
| Fire | South Asia | Hindi, Bengali, Tamil |
| Earth | Sub-Saharan Africa | Swahili, Amharic, Yoruba |
Languages
English (en), Hindi (hi), Bengali (bn), Swahili (sw), Amharic (am), French (fr), Spanish (es), Arabic (ar), Yoruba (yo), Tamil (ta)
Outputs
| Artifact | Description |
|---|---|
| REPORT.md | Full research report with methodology, results, and limitations |
| PRODUCTION_DELTA.md | Research-to-production gap analysis |
| docs/failure_taxonomy.md | 5 failure categories for routing edge cases |
| p1_edge_cases.json | 32 edge cases formatted for the language router test suite |
| results/alignment_curves.json | Cross-lingual alignment data (3 variants x 20 concepts x 9 languages x 37 layers) |
| results/commitment_matrix.csv | Commitment layer per (concept, language, variant) |
| assets/tweet_three_panel.png | Primary visualization: three-panel alignment curve line chart |
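As a consumer-side example, once results/alignment_curves.json is parsed and averaged over concepts, finding each language's peak-alignment layer is a one-liner. The per-language list-of-floats schema assumed here is an illustration, not the file's documented format:

```python
def peak_layer(curves: dict[str, list[float]]) -> dict[str, int]:
    """Return the layer index of maximum alignment per language.

    `curves` maps a language code to per-layer alignment scores
    (one float per layer, already averaged over concepts).
    """
    return {lang: max(range(len(vals)), key=vals.__getitem__)
            for lang, vals in curves.items()}
```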
How to Reproduce
Requirements
- Python 3.11+
- uv package manager
- 24GB RAM minimum (for sequential model loading)
- GPU recommended for batch extraction (Modal A10G or local CUDA/MPS)
Full Pipeline
# Clone and setup
git clone <repo-url>
cd aya-cross-lingual-probes
uv sync
# Run batch extraction (20 min on Modal A10G, 4-6 hrs on CPU)
modal run batch_runner.py
# Run analysis
uv run concept_alignment.py
uv run failure_classifier.py
# Generate visualizations
uv run viz/heatmap.py
uv run viz/alignment_curves.py
# Launch demo
uv run app.py --demo
Compute Requirements
| Stage | Hardware | Time |
|---|---|---|
| Data prep | Any CPU | < 5 min |
| Activation extraction | Modal A10G GPU | ~20 min |
| Activation extraction | Apple Silicon / CPU | 4-6 hrs |
| Analysis | Any CPU | < 10 min |
| Visualization | Any CPU | < 5 min |
Connection to Language Routing
This analysis feeds directly into a language routing system. Specifically:
- p1_edge_cases.json provides test cases for the router's edge case handling
- The failure taxonomy informs routing rules (when to fall back to Base)
- Alignment curve analysis shows that routing adds value only for non-Latin-script languages
- Final-layer collapse means routing decisions should not rely on output embeddings
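Sketched as a routing rule (the language sets and variant names below are illustrative assumptions drawn from the findings above, not the router's actual configuration):

```python
# Hypothetical routing rule: regional variants add value mainly for
# non-Latin-script languages, so Latin-script inputs fall back to Base.
FIRE_LANGS = {"hi", "bn", "ta"}   # South Asia: non-Latin scripts
EARTH_LANGS = {"am", "yo"}        # Amharic (+40%); Yoruba, the Latin-script
                                  # exception noted in the key finding

def route(lang: str) -> str:
    """Pick a model variant from an ISO 639-1 language code."""
    if lang in FIRE_LANGS:
        return "fire"
    if lang in EARTH_LANGS:
        return "earth"
    return "base"  # Latin-script alignment is a tokenizer artifact
```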
License
MIT
Author
Saumil Srivastava
