Polyarray Metacompiler Specification
Status: Normative
Version: 0.9-candidate
Document precedence: When documents disagree, Normative specs take precedence over Informational/Conceptual drafts. entities.md is conceptual (structural intent); the canonical field/derivation contracts live in annotation-fields.md, derivation-registry.md, and pipeline.md. Version numbers in those normative docs SHOULD remain aligned; mismatches MUST be treated as defects and reconciled.
Overview
The Polyarray Metacompiler generates native-feeling code across programming paradigms from semantic specifications. It separates what computation means (semantic forms) from how that translates to syntax (stencils).
The system currently targets three computational domains—BLAS, LAPACK, and SLEEF—but the architecture is domain-agnostic. The same pipeline handles vector reductions, matrix factorizations, and elementwise transcendentals.
Core separations:
| Concern | Mechanism |
|---|---|
| What a method does | Facts — language-agnostic behavioral statements |
| How it maps to syntax | Stencils — paradigm-specific renderers |
| Where implementations come from | Providers — backend resolution with fallback chains |
| Which representations to use | Substrates — language idioms and type mappings |
The metacompiler supports Algol-family languages (Python, Rust, Go, C) and Lisp-family languages from the same semantic forms. A sequence_form lowers to a statement block in Python or a let chain in Lisp: same IR, different renderings.
Domains
| Domain | Operations | Provider Options |
|---|---|---|
| BLAS | Level 1-3 linear algebra (dot, gemm, gemv, ...) | OpenBLAS, MKL, BLIS, Accelerate |
| LAPACK | Matrix factorization and solvers (gesv, getrf, potrf, ...) | OpenBLAS, MKL, LAPACKE |
| SLEEF | Vectorized transcendentals (exp, log, sin, tanh, sqrt, ...) | SLEEF, libmvec, compiler builtins |
Planned: FFT (FFTW, MKL FFT), RNG (PCG, xoshiro).
Each domain uses the same pipeline: YAML annotations → derivation enrichment → semantic forms → stencil rendering. Domain-specific knowledge lives in annotations, not codegen.
Specification Documents
entities.md - Entity Model (Conceptual)
Defines the fourteen core entities and their relationships:
- Library, Operation, Variant, DType, Interface, InterfaceBinding, Protocol, Method (what exists)
- Provider, NativeLibrary, Substrate (who provides it, how it's expressed)
- Language, Binding, Platform (target environment)
Key concept: GenerationTarget = (PackageSpec, Language, Binding, Platform)
Critical FFI correctness: The NativeLibrary entity includes the Integer Model (ILP64 vs LP64) section, which is essential for correct FFI code generation. Mismatched integer models cause silent data corruption.
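To make the risk concrete, here is a minimal Python sketch of choosing the C integer type for an FFI signature and converting into it with a range check. It assumes the usual BLAS/LAPACK convention that LP64 builds take 32-bit integer arguments and ILP64 builds take 64-bit ones. The names blas_int_ctype and to_blas_int are hypothetical and are not defined by this specification.

```python
import ctypes

# Assumption: conventional BLAS/LAPACK meaning of the two integer models.
_INT_CTYPES = {"lp64": ctypes.c_int32, "ilp64": ctypes.c_int64}


def blas_int_ctype(integer_model: str):
    """Return the ctypes integer type for a native library's integer model."""
    try:
        return _INT_CTYPES[integer_model]
    except KeyError:
        raise ValueError(f"unknown integer model: {integer_model!r}") from None


def to_blas_int(value: int, integer_model: str):
    """Checked conversion: refuse values that do not fit instead of truncating."""
    ctype = blas_int_ctype(integer_model)
    bits = ctypes.sizeof(ctype) * 8
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in a {bits}-bit BLAS integer")
    return ctype(value)
```

Without the range check, constructing a ctypes.c_int32 from an out-of-range Python int wraps silently, which is the kind of corruption a mismatched integer model produces.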
facts.md - Facts System (Normative)
The heart of the metacompiler. Facts are language-agnostic statements about method behavior:
- Precondition facts (what must hold)
- Parameter facts (what inputs)
- Operation facts (what to call)
- Result facts (what to return)
- Call pattern facts (how to call — computed from binding)
Key insight: Templates don't compute. They render facts that enrichment computed.
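As an illustration of that split, the following Python sketch models one precondition fact and renders it for two languages. The LengthsEqual class, the RENDERERS table, and the render function are hypothetical names chosen for this example; facts.md defines the actual fact schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthsEqual:
    """Precondition fact: two array parameters must have equal length."""
    left: str
    right: str


# Hypothetical per-language renderers: pure formatting, no computation.
RENDERERS = {
    "rust": lambda f: f"assert_eq!({f.left}.len(), {f.right}.len());",
    "python": lambda f: f"assert len({f.left}) == len({f.right})",
}


def render(fact: LengthsEqual, language: str) -> str:
    """Format an already-computed fact for the requested language."""
    return RENDERERS[language](fact)
```

The fact carries everything the renderer needs, so each renderer is a one-line format string with no branching.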
composition.md - Composition Model (Normative)
Defines how layers compose:
- FFI Layer (raw bindings to C ABI)
- Wrapper Layer (ergonomic dispatch functions)
- Array Layer (high-level API using facts)
Key insight: Layers are independent artifacts. You can generate just FFI, or FFI + wrappers, or all three.
pipeline.md - Generation Pipeline (Normative)
Defines the transformation stages:
- Loading - YAML → internal representation (mechanical)
- Enrichment - Derive computable fields (mechanical)
- Availability - What can be generated for this target (mechanical)
- Re-Enrichment - Refine derived fields based on availability (mechanical)
- Form Lowering - Transform paradigm-neutral forms to paradigm-specific forms (mechanical)
- Rendering - Stencils + Doc algebra → source code (mechanical)
Runtime Forms note: Algorithms in schema/runtime.yml are rendered in Stage 6 (Rendering) via runtime stencils and are not subject to Form Lowering.
Runtime index type: int in runtime forms is the runtime index type (int64). Arrays store shape, strides, and offsets in this type; conversions to language-native indexing or BLAS types happen only at boundaries via checked helpers.
Key insight: ALL stages are mechanical. Intelligence lives in specs, not codegen.
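One way to picture "all stages are mechanical" is as a chain of pure functions over a state dictionary. The sketch below is a simplification under stated assumptions: it models only three of the six stages, the function names are hypothetical, and the loading stage uses inline data instead of parsing YAML.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[dict], dict]


def loading(state: dict) -> dict:
    # Stand-in for parsing YAML: the primitives are given inline here.
    return {**state, "op": {"name": "dot", "level": 1, "pattern": "reduction"}}


def enrichment(state: dict) -> dict:
    # Derive a composite from a primitive (level 1 -> vector), never from the name.
    # Only the level-1 rule comes from the spec; this sketch handles nothing else.
    op = state["op"]
    tier = {1: "vector"}[op["level"]]
    return {**state, "op": {**op, "tier": tier}}


def rendering(state: dict) -> dict:
    # Format what enrichment computed; no decisions are made here.
    op = state["op"]
    line = f"# {op['name']}: tier={op['tier']}, pattern={op['pattern']}"
    return {**state, "source": line}


def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Apply each stage in order, threading the state through."""
    return reduce(lambda acc, stage: stage(acc), stages, state)
```

Calling run_pipeline([loading, enrichment, rendering], {}) yields a state whose "source" entry was produced purely by formatting fields that earlier stages computed.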
annotation-fields.md - Primitives and Derivation (Normative)
Defines what must be stated vs derived:
- Primitive fields (level, pattern, dtype, role)
- Principled derivations (tier from level, test_shape from pattern+return_type)
- The derivation registry
Key insight: Derivation is allowed if it works from primitives via documented rules, not from operation names.
type-system.md - Type System (Normative)
Formalizes the vocabulary of valid values:
- 57 enum types across all domains (Operation, Variant, Parameter, Protocol, Method, Interface, etc.)
- Refinement types (Identifier, CType, JMESPath queries)
- Aliases and normalization (float32 -> f32)
- Validation rules and timing
Key insight: Derivations are total functions over well-defined domains. Schema validation ensures inputs are valid before derivations run.
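A minimal sketch of alias normalization with fail-fast validation follows. Only the float32 -> f32 alias comes from the text above; the float64 entry, the two-element vocabulary, and the function name normalize_dtype are assumptions made for illustration.

```python
# Hypothetical slice of the dtype vocabulary; the real enums live in type-system.md.
CANONICAL_DTYPES = {"f32", "f64"}
DTYPE_ALIASES = {"float32": "f32", "float64": "f64"}  # float64 entry is assumed


def normalize_dtype(name: str) -> str:
    """Map an alias to its canonical name, failing fast on unknown values."""
    canonical = DTYPE_ALIASES.get(name, name)
    if canonical not in CANONICAL_DTYPES:
        raise ValueError(f"unknown dtype: {name!r}")
    return canonical
```

Because invalid values are rejected here, a derivation that later receives a dtype can be written as a total function over the canonical set.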
interfaces.md - Interfaces and Interface Bindings (Normative)
Defines the interface abstraction for decoupling operations from FFI signatures:
- Interface = ABI contract for a family of operations (symbol naming, calling conventions, parameters)
- InterfaceBinding = FFI signatures for an operation on a specific interface
- Interface matching: operation declares required interface, backend declares provided interface
- Supports multiple interfaces per library (LAPACKE vs Fortran LAPACK, CBLAS vs Fortran BLAS, FFTW vs vDSP)
Key insight: Same operation, different interfaces. Enables Accelerate (Fortran LAPACK) and OpenBLAS (LAPACKE) to provide the same logical operations.
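Interface matching can be sketched as a membership test between what an operation requires and what each backend provides. The Backend class, the matching_backends function, and the lowercase interface identifiers below are hypothetical; interfaces.md defines the real contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    provides: frozenset  # interface names this backend implements


def matching_backends(required: str, backends: list) -> list:
    """Return names of backends whose provided interfaces include the required one."""
    return [b.name for b in backends if required in b.provides]


# Illustrative data only; the identifier spellings are assumptions.
BACKENDS = [
    Backend("openblas", frozenset({"lapacke", "cblas"})),
    Backend("accelerate", frozenset({"fortran_lapack", "cblas"})),
]
```

With this data, an operation requiring "lapacke" matches only openblas, while one requiring "cblas" matches both backends.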
substrates.md - Language Substrates (Normative)
Defines what "idiomatic" means per language:
- Type mappings (float32 → f32, np.float32, float32)
- Error handling (Result, exceptions, error returns)
- Naming conventions (snake_case, camelCase, PascalCase)
- Memory patterns (ownership, GC, manual, arena)
- Generics strategy (monomorphization, type erasure, runtime dispatch)
Key insight: Same fact, different substrate → native-feeling code.
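The sketch below shows how a substrate might be consulted for a type mapping and a naming convention. Assigning the three spellings listed above to Rust, Python, and Go is an assumption, as are the names TYPE_MAPPINGS, native_type, and to_camel_case.

```python
# Assumed assignment of the three spellings above to languages.
TYPE_MAPPINGS = {
    "rust": {"float32": "f32"},
    "python": {"float32": "np.float32"},
    "go": {"float32": "float32"},
}


def native_type(language: str, dtype: str) -> str:
    """Look up the idiomatic spelling of a dtype; raises KeyError if unmapped."""
    return TYPE_MAPPINGS[language][dtype]


def to_camel_case(snake: str) -> str:
    """Convert a lowercase snake_case identifier to camelCase."""
    head, *tail = snake.split("_")
    return head + "".join(part.capitalize() for part in tail)
```

Both functions are table lookups or string transforms, so a new language needs new data, not new logic.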
stencils.md - Stencils and Semantic Forms (Normative)
Defines the semantic form model and stencil system for paradigm-neutral code generation:
- Semantic forms (bind, invoke, yield, sequence, guard, etc.)
- Paradigms (Algol, Lisp, Stack) and paradigm-aware lowering
- Stencil dispatch mechanism and Doc algebra
- Overlay system for language-specific customization
Key insight: Semantic forms capture computational intent; stencils translate intent to syntax.
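To show one sequence form rendered for two paradigms, here is a small Python sketch. The Bind, Yield, and Sequence classes and both render functions are hypothetical simplifications: expressions are treated as opaque, already-rendered strings, and the real system lowers forms before handing them to stencils and the Doc algebra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bind:
    name: str
    expr: str


@dataclass(frozen=True)
class Yield:
    expr: str


@dataclass(frozen=True)
class Sequence:
    steps: tuple  # zero or more Bind forms followed by exactly one Yield


def render_algol(seq: Sequence) -> str:
    """Render as a Python-style statement block."""
    lines = []
    for step in seq.steps:
        if isinstance(step, Bind):
            lines.append(f"{step.name} = {step.expr}")
        else:
            lines.append(f"return {step.expr}")
    return "\n".join(lines)


def render_lisp(seq: Sequence) -> str:
    """Render as nested let forms whose innermost body is the yielded value."""
    *binds, last = seq.steps
    body = last.expr
    for b in reversed(binds):
        body = f"(let (({b.name} {b.expr})) {body})"
    return body
```

For Sequence((Bind("a", "1"), Bind("b", "a"), Yield("b"))), the Algol renderer emits three statements and the Lisp renderer emits (let ((a 1)) (let ((b a)) b)).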
Design Principles
Mandatory Primitives, No Code Maps
- Every operation and variant states pattern in YAML, which carries computational metadata. Benchmark configuration (bench_complexity, bench_dimensions, bench_size_progression, bench_flops_formula) is derived from pattern metadata, never hard-coded in code.
- Every method states array_method_pattern. No ARRAY_METHOD_PATTERNS map exists in codegen.
- The derivation registry is the only source of derived fields (tier, test_shape, supported, wrapper_signature, etc.). Templates render what the registry computed; they never recompute or override.
- Schema-backed validation is fail-fast: missing primitives or unresolved references halt generation.
Where Domain Knowledge Lives
Domain knowledge appears in two places, both in specs—never scattered through codegen:
- Per-operation primitives in YAML annotations (level, pattern, dtype, role)
- Generic derivation rules that operate only on primitives, never on operation names
This is the key distinction: derive_tier(op) that reads op.level is acceptable (rule over primitives). TEST_SHAPE_MAP["dot"] is not (hardcoded per-op knowledge).
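The contrast can be sketched in a few lines of Python. Only the level 1 -> vector rule is stated in this document; the tier names for levels 2 and 3 are placeholders, and representing an operation as a dict is an assumption of the example.

```python
# Only level 1 -> "vector" appears in the text; the other two tier names
# are placeholders for illustration.
TIER_BY_LEVEL = {1: "vector", 2: "matrix_vector", 3: "matrix"}


def derive_tier(op: dict) -> str:
    """Acceptable: a rule over the stated primitive `level`."""
    return TIER_BY_LEVEL[op["level"]]


# Not acceptable: a table keyed by operation name, such as
# TEST_SHAPE_MAP = {"dot": ...}, because it hides per-operation
# knowledge inside codegen.
```

Renaming an operation leaves derive_tier unchanged, since the rule never reads the name.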
Core Principles
- Specs are the source of truth. Domain knowledge lives in YAML primitives and documented derivation rules.
- Codegen is rule-based. All transformations work from field values, never from operation names. No if op_name == 'gemm'. Templates MAY branch on derived fields (e.g., protocol_shape, call_style) but MUST NOT derive new semantic classifications from raw primitives. All such derivations must live in the registry.
- Facts separate semantics from syntax. "Lengths must be equal" is a fact. assert_eq! is rendering.
- Primitives are stated, composites are derived. State level: 1. Derive tier: vector via rule level == 1 → vector.
- Layers compose independently. Changing the binding doesn't require changing the Array layer.
- Each output should look hand-written. If a native speaker of the language would say "that's weird," it's wrong.
Escape Hatches
Some operations don't fit nice derivation patterns. Rather than pretending otherwise:
| generation_mode | Meaning |
|---|---|
| normal | Standard templates, all derivations apply |
| special | Generated with operation-specific template overrides |
| manual | No generation; hand-written code in languages/{lang}/manual/ |
Operations that are special or manual are explicitly marked in annotations. The goal is to minimize them, not pretend they don't exist.
The Litmus Test
Can someone who knows nothing about the target domain read the codegen and understand what it does? If yes, you've succeeded. If no, domain knowledge has leaked into code.
How It Fits Together
┌─────────────────────────────────────────────────────────────────┐
│ SPECIFICATIONS (where all knowledge lives) │
│ annotations/*.yml spec/array.yml backends/*.yml │
│ │
│ level: 1 # stated primitive │
│ pattern: reduction # stated primitive │
│ fixture_expr: "np.dot(x, y)" # oracle expression │
└────────────────────────────┬────────────────────────────────────┘
│ Loading (mechanical: parse YAML)
▼
┌─────────────────────────────────────────────────────────────────┐
│ ENRICHMENT (rule-based transformation over primitives) │
│ - Derive composites from primitives (tier from level) │
│ - Produce semantic forms (method_body: sequence_form) │
│ - Evaluate fixture expressions (run numpy) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FORM LOWERING (paradigm-aware transformation) │
│ - sequence_form → block_form (Algol) │
│ - sequence_form → let_form (Lisp) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STENCILS + OVERLAYS + DOC ALGEBRA (mechanical: format forms) │
│ paradigms/algol/stencils/*.stencil.j2 │
│ languages/{lang}/overlays/*.yml │
│ No conditionals on operation names or types │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ GENERATED CODE │
│ FFI Layer + Wrapper Layer + Array Layer │
│ Feels native to each language │
└─────────────────────────────────────────────────────────────────┘
The test: If you need domain expertise to understand the codegen, you've failed. All domain knowledge lives in annotations.
Adding Domains
The metacompiler is domain-agnostic. BLAS, LAPACK, and SLEEF are the current domains, not special cases.
To add a new domain:
| Step | What | Where |
|---|---|---|
| Define operations | YAML annotations with patterns, parameters, variants | annotations/<domain>.yml |
| Add providers | Module implementation units with native deps | packaging/providers.yml |
| (Optional) Add protocols | Lifecycle patterns if stateful | annotations/protocols/<domain>.yml |
If the new operations fit existing patterns (elementwise_unary, reduction, matmul, solve), no template changes are needed. The derivation registry computes everything from primitives.
Extension Points
| Extension Point | How to Extend | Documentation |
|---|---|---|
| Providers | Add entry to packaging/providers.yml with module and native deps | entities.md, module-composition.md |
| Languages | Create languages/<lang>/config.yml with substrate definition | substrates.md |
| Modules | Define module in spec/modules/<name>.yml | module-composition.md |
| Operations | Add to annotations/<library>.yml | annotation-fields.md |
| Platforms | Define platform constraints | entities.md (Platform entity) |
| Derivations | Register in derivation registry | derivation-registry.md |
| Paradigms | Add paradigm stencils in paradigms/<name>/ | stencils.md |
Stability: Extension points are documented but not formally versioned. Extensions may require updates if core entities change.
Additional Documents
protocol-algebra.md - Protocol Algebra (Normative)
Formal model of stateful API protocols and lifecycle patterns. Defines phase types (Create, Use, Configure, Consume, Query, MutateGlobal) and protocol shapes (query, global_state, raii_type, session_with_children, etc.).
derivation-registry.md - Derivation Registry (Normative)
The DAG-based derivation mechanism. Defines scopes (OPERATION, VARIANT, PROTOCOL, METHOD), JMESPath queries, and topological execution.
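A minimal sketch of topological execution, using the standard-library graphlib module, is shown below. The REGISTRY layout, the summary field, and the string format of test_shape are hypothetical; derivation-registry.md defines the actual scopes and JMESPath-based queries.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: derived field -> (dependencies, rule).
REGISTRY = {
    "tier": (("level",), lambda f: {1: "vector"}[f["level"]]),
    "test_shape": (
        ("pattern", "return_type"),
        lambda f: f"{f['pattern']}:{f['return_type']}",
    ),
    "summary": (("tier", "test_shape"), lambda f: f"{f['tier']}/{f['test_shape']}"),
}


def run_derivations(primitives: dict) -> dict:
    """Evaluate every registered derivation in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in REGISTRY.items()}
    fields = dict(primitives)
    for name in TopologicalSorter(graph).static_order():
        if name in REGISTRY:  # primitives are graph nodes without rules
            fields[name] = REGISTRY[name][1](fields)
    return fields
```

TopologicalSorter raises graphlib.CycleError when the dependencies form a cycle, which gives a natural fail-fast check on the registry itself.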
error-model.md - Error Model (Normative)
Error categories (precondition, creation, execution, destruction), propagation patterns, and LAPACK info code handling.
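As a sketch of info code handling, the function below follows the standard LAPACK convention: 0 means success, a negative value -i means the i-th argument was illegal, and a positive value signals a routine-specific failure. Mapping those cases onto the precondition and execution categories is this example's assumption, and classify_lapack_info is a hypothetical name.

```python
def classify_lapack_info(info: int) -> tuple:
    """Map a LAPACK `info` return value to a (category, message) pair.

    The category assignment is an assumption of this sketch; error-model.md
    is the authority on how info codes map to error categories.
    """
    if info == 0:
        return ("ok", "success")
    if info < 0:
        return ("precondition", f"argument {-info} had an illegal value")
    return ("execution", f"routine-specific failure, info={info}")
```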
module-composition.md - Module Composition (Normative)
Multi-provider bindings, fallback chains, and module resolution. Defines how modules from different domains (BLAS, SLEEF, FFT) compose with provider selection and dependency resolution.
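Fallback chain resolution can be pictured as picking the first available provider in declared order. The function name resolve_provider and the lowercase provider identifiers are assumptions of this sketch; the actual resolution rules live in module-composition.md.

```python
def resolve_provider(chain: list, available: set) -> str:
    """Return the first provider in the fallback chain that is available."""
    for provider in chain:
        if provider in available:
            return provider
    raise LookupError(f"no provider available from chain {chain}")
```

For example, the chain ["mkl", "openblas", "blis"] resolves to "openblas" on a system where only openblas and blis are installed.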
testing-strategy.md - Testing Strategy (Normative)
How to test the metacompiler itself: unit tests, property tests, and integration validation.
debug-tooling.md - Debug Tooling and CLI (Normative)
Debug and inspection commands for the metacompiler. Defines polyarray inspect, polyarray trace, polyarray validate, and other CLI tools for understanding derivation logic and debugging the pipeline.
testing-tiers.md - Testing Tier Matrix (Normative)
Defines which (Provider × Language × Binding × Platform) combinations are guaranteed to work. Three tiers: Tier 1 (tested every commit), Tier 2 (tested on release), Tier 3 (community-supported).
walkthrough.md - End-to-End Walkthrough (Normative)
Concrete trace of the dot operation through the pipeline stages, showing the exact data transformations at each step.
glossary.md - Glossary (Reference)
Quick reference for metacompiler terminology with links to defining documents.
Cross-References
- Memory Model: See docs/memory-model/
- Array API: See spec/array.yml