
ctoth/polyarray-spec

Polyarray Metacompiler Specification

Status: Normative
Version: 0.9-candidate


Document precedence: When documents disagree, Normative specs take precedence over Informational/Conceptual drafts. entities.md is conceptual (structural intent); the canonical field/derivation contracts live in annotation-fields.md, derivation-registry.md, and pipeline.md. Version numbers in those normative docs SHOULD remain aligned; mismatches must be treated as defects to reconcile.

Overview

The Polyarray Metacompiler generates native-feeling code across programming paradigms from semantic specifications. It separates what computation means (semantic forms) from how that translates to syntax (stencils).

The system currently targets three computational domains—BLAS, LAPACK, and SLEEF—but the architecture is domain-agnostic. The same pipeline handles vector reductions, matrix factorizations, and elementwise transcendentals.

Core separations:

Concern                          Mechanism
What a method does               Facts: language-agnostic behavioral statements
How it maps to syntax            Stencils: paradigm-specific renderers
Where implementations come from  Providers: backend resolution with fallback chains
Which representations to use     Substrates: language idioms and type mappings

It supports Algol-family languages (Python, Rust, Go, C) and Lisp-family languages from the same semantic forms. A sequence_form lowers to a statement block in Python or a let chain in Lisp: same IR, different renderings.


Domains

Domain  Operations                                                   Provider Options
BLAS    Level 1-3 linear algebra (dot, gemm, gemv, ...)              OpenBLAS, MKL, BLIS, Accelerate
LAPACK  Matrix factorization and solvers (gesv, getrf, potrf, ...)   OpenBLAS, MKL, LAPACKE
SLEEF   Vectorized transcendentals (exp, log, sin, tanh, sqrt, ...)  SLEEF, libmvec, compiler builtins

Planned: FFT (FFTW, MKL FFT), RNG (PCG, xoshiro).

Each domain uses the same pipeline: YAML annotations → derivation enrichment → semantic forms → stencil rendering. Domain-specific knowledge lives in annotations, not codegen.


Specification Documents

entities.md - Entity Model (Normative)

Defines the fourteen core entities and their relationships:

  • Library, Operation, Variant, DType, Interface, InterfaceBinding, Protocol, Method (what exists)
  • Provider, NativeLibrary, Substrate (who provides it, how it's expressed)
  • Language, Binding, Platform (target environment)

Key concept: GenerationTarget = (PackageSpec, Language, Binding, Platform)

Critical for FFI correctness: the NativeLibrary entity includes an Integer Model section (ILP64 vs LP64), which fixes the width of integer arguments (dimensions, increments, leading dimensions) in generated FFI code. Mismatched integer models cause silent data corruption.
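A minimal sketch of how an integer model could drive FFI declarations follows. The `IntegerModel` enum, the `NativeLibrary` fields, and `render_ddot_decl` are hypothetical illustrations, not the schema in entities.md; the prototype is modeled on CBLAS `cblas_ddot`, whose integer arguments are 32-bit in LP64 builds and 64-bit in ILP64 builds.

```python
from dataclasses import dataclass
from enum import Enum


class IntegerModel(Enum):
    """Hypothetical encoding of a native library's integer model."""
    LP64 = "lp64"    # BLAS integer arguments are 32-bit
    ILP64 = "ilp64"  # BLAS integer arguments are 64-bit


@dataclass(frozen=True)
class NativeLibrary:
    name: str
    integer_model: IntegerModel


def ffi_int_type(lib: NativeLibrary) -> str:
    """C type for size/increment arguments, chosen from the integer model alone."""
    return {IntegerModel.LP64: "int32_t", IntegerModel.ILP64: "int64_t"}[lib.integer_model]


def render_ddot_decl(lib: NativeLibrary) -> str:
    """Render a cblas_ddot-style prototype with the library's integer width.

    Declaring int32_t for an ILP64 library could let the callee read
    undefined upper bits: exactly the silent corruption described above.
    """
    i = ffi_int_type(lib)
    return f"double cblas_ddot({i} n, const double *x, {i} incx, const double *y, {i} incy);"
```

For an ILP64 library this renders `int64_t` for `n`, `incx`, and `incy`; for LP64 it renders `int32_t`. The integer model is the only input, so no operation name is consulted.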

facts.md - Facts System (Normative)

The heart of the metacompiler. Facts are language-agnostic statements about method behavior:

  • Precondition facts (what must hold)
  • Parameter facts (what inputs)
  • Operation facts (what to call)
  • Result facts (what to return)
  • Call pattern facts (how to call — computed from binding)

Key insight: Templates don't compute. They render facts that enrichment computed.
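A minimal sketch of that split, assuming a single hypothetical fact type: the fact is plain data, and each language contributes only a renderer. `EqualLengths` and `RENDERERS` are illustrative names; the actual fact schema is defined in facts.md.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EqualLengths:
    """Precondition fact: two parameters must have the same length."""
    left: str
    right: str


# Renderers turn an already-computed fact into syntax; they decide nothing themselves.
RENDERERS: dict[str, Callable[[EqualLengths], str]] = {
    "python": lambda f: (
        f"if len({f.left}) != len({f.right}):\n"
        f"    raise ValueError('{f.left} and {f.right} must have equal length')"
    ),
    "rust": lambda f: f"assert_eq!({f.left}.len(), {f.right}.len());",
}


def render(fact: EqualLengths, lang: str) -> str:
    return RENDERERS[lang](fact)
```

`render(EqualLengths("x", "y"), "rust")` yields `assert_eq!(x.len(), y.len());`, while the Python renderer emits an `if`/`raise` block for the same fact.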

composition.md - Composition Model (Normative)

Defines how layers compose:

  • FFI Layer (raw bindings to C ABI)
  • Wrapper Layer (ergonomic dispatch functions)
  • Array Layer (high-level API using facts)

Key insight: Layers are independent artifacts. You can generate just FFI, or FFI + wrappers, or all three.
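One way to express that independence is to treat each layer as a unit with explicit dependencies and expand a request to its closure. The sketch assumes the stacking implied above (wrappers call FFI, the Array layer calls wrappers); `LAYER_DEPS` and `layers_to_generate` are hypothetical.

```python
LAYER_ORDER = ("ffi", "wrapper", "array")
# Assumed stacking: each layer is built on the one directly below it.
LAYER_DEPS = {"ffi": set(), "wrapper": {"ffi"}, "array": {"wrapper"}}


def layers_to_generate(requested: set[str]) -> list[str]:
    """Expand requested layers to include their dependencies, returned bottom-up."""
    needed: set[str] = set()
    stack = list(requested)
    while stack:
        layer = stack.pop()
        if layer not in LAYER_DEPS:
            raise ValueError(f"unknown layer: {layer!r}")
        if layer not in needed:
            needed.add(layer)
            stack.extend(LAYER_DEPS[layer])
    return [name for name in LAYER_ORDER if name in needed]
```

Requesting `{"wrapper"}` yields `["ffi", "wrapper"]`, the "FFI + wrappers" case.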

pipeline.md - Generation Pipeline (Normative)

Defines the transformation stages:

  1. Loading - YAML → internal representation (mechanical)
  2. Enrichment - Derive computable fields (mechanical)
  3. Availability - What can be generated for this target (mechanical)
  4. Re-Enrichment - Refine derived fields based on availability (mechanical)
  5. Form Lowering - Transform paradigm-neutral forms to paradigm-specific forms (mechanical)
  6. Rendering - Stencils + Doc algebra → source code (mechanical)

Runtime Forms note: Algorithms in schema/runtime.yml are rendered in Stage 6 via runtime stencils and are not subject to Form Lowering.

Runtime index type: in runtime forms, int denotes the runtime index type (int64). Arrays store shape, strides, and offsets in this type; conversions to language-native indexing or BLAS integer types happen only at boundaries, via checked helpers.
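A checked boundary helper could look like this sketch, which narrows an int64 runtime index to the BLAS integer width and fails instead of wrapping. `to_blas_int` and its `ilp64` flag are hypothetical names.

```python
INT32_RANGE = (-(2**31), 2**31 - 1)
INT64_RANGE = (-(2**63), 2**63 - 1)


def to_blas_int(value: int, ilp64: bool) -> int:
    """Convert a runtime index (int64) to the BLAS integer type, raising on overflow."""
    lo, hi = INT64_RANGE if ilp64 else INT32_RANGE
    if not lo <= value <= hi:
        model = "ILP64" if ilp64 else "LP64"
        raise OverflowError(f"index {value} does not fit the {model} BLAS integer type")
    return value
```

A length of 3,000,000,000 passes through for ILP64 but raises `OverflowError` for LP64, rather than silently becoming a negative 32-bit value.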

Key insight: ALL stages are mechanical. Intelligence lives in specs, not codegen.
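Because every stage is mechanical, the pipeline can be modeled as an ordered composition of pure functions. This sketch uses placeholder stages that only record their names; the `Artifact` shape and `run_pipeline` are hypothetical, and real stages would transform the loaded specs.

```python
from functools import reduce
from typing import Any, Callable

Artifact = dict[str, Any]
Stage = Callable[[Artifact], Artifact]

STAGE_NAMES = ["loading", "enrichment", "availability", "re_enrichment", "form_lowering", "rendering"]


def make_stage(name: str) -> Stage:
    """Placeholder stage: a pure function that records itself in the artifact's trace."""
    def stage(artifact: Artifact) -> Artifact:
        return {**artifact, "trace": [*artifact.get("trace", []), name]}
    return stage


PIPELINE: list[Stage] = [make_stage(n) for n in STAGE_NAMES]


def run_pipeline(spec: Artifact) -> Artifact:
    """Thread the spec through every stage in order; no stage inspects operation names."""
    return reduce(lambda acc, stage: stage(acc), PIPELINE, spec)
```

Since each stage returns a new artifact, the input spec is never mutated, which keeps stages independently testable.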

annotation-fields.md - Primitives and Derivation (Normative)

Defines what must be stated vs derived:

  • Primitive fields (level, pattern, dtype, role)
  • Principled derivations (tier from level, test_shape from pattern+return_type)
  • The derivation registry

Key insight: Derivation is allowed if it works from primitives via documented rules, not from operation names.
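A minimal sketch of derivation as rules over primitives, assuming dict-shaped operation records. Only `level == 1 → vector` comes from this document; the other table entries, the test-shape labels, and the function names are hypothetical placeholders.

```python
# Rule tables keyed by primitive values, never by operation names.
TIER_BY_LEVEL = {1: "vector", 2: "matrix_vector", 3: "matrix_matrix"}  # only level 1 is from the spec
TEST_SHAPE_BY_PATTERN = {  # hypothetical labels
    ("reduction", "scalar"): "reduces_to_scalar",
    ("elementwise_unary", "array"): "same_shape_as_input",
}


def derive_tier(op: dict) -> str:
    return TIER_BY_LEVEL[op["level"]]


def derive_test_shape(op: dict) -> str:
    return TEST_SHAPE_BY_PATTERN[(op["pattern"], op["return_type"])]


# Two operations with identical primitives get identical derived fields, whatever their names.
dot = {"name": "dot", "level": 1, "pattern": "reduction", "return_type": "scalar"}
other = {**dot, "name": "some_new_reduction"}
assert derive_tier(dot) == derive_tier(other) == "vector"
assert derive_test_shape(dot) == derive_test_shape(other)
```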

type-system.md - Type System (Normative)

Formalizes the vocabulary of valid values:

  • 57 enum types across all domains (Operation, Variant, Parameter, Protocol, Method, Interface, etc.)
  • Refinement types (Identifier, CType, JMESPath queries)
  • Aliases and normalization (float32 → f32)
  • Validation rules and timing

Key insight: Derivations are total functions over well-defined domains. Schema validation ensures inputs are valid before derivations run.
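The sketch below shows alias normalization followed by enum validation, so that downstream derivations only ever see canonical values. Apart from `float32 → f32`, the aliases and the `DType` members are assumptions for illustration.

```python
from enum import Enum


class DType(Enum):
    F32 = "f32"
    F64 = "f64"
    C64 = "c64"
    C128 = "c128"


# Hypothetical alias table; the spec gives float32 -> f32 as one example.
ALIASES = {"float32": "f32", "float64": "f64", "complex64": "c64", "complex128": "c128"}


def normalize_dtype(raw: str) -> DType:
    """Map an alias to its canonical value, then validate; unknown values fail fast."""
    canonical = ALIASES.get(raw, raw)
    try:
        return DType(canonical)
    except ValueError:
        raise ValueError(f"invalid dtype {raw!r}") from None
```

A derivation that takes a `DType` is then total: every value it can receive has already passed validation.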

interfaces.md - Interfaces and Interface Bindings (Normative)

Defines the interface abstraction for decoupling operations from FFI signatures:

  • Interface = ABI contract for a family of operations (symbol naming, calling conventions, parameters)
  • InterfaceBinding = FFI signatures for an operation on a specific interface
  • Interface matching: operation declares required interface, backend declares provided interface
  • Supports multiple interfaces per library (LAPACKE vs Fortran LAPACK, CBLAS vs Fortran BLAS, FFTW vs vDSP)

Key insight: Same operation, different interfaces. Enables Accelerate (Fortran LAPACK) and OpenBLAS (LAPACKE) to provide the same logical operations.
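A sketch of interface matching, assuming each operation carries one InterfaceBinding per interface (reduced here to a symbol name) and each backend names the interface it provides. `LAPACKE_dgesv` is the LAPACKE C entry point and `dgesv_` the conventional Fortran-ABI symbol; the interface labels and `resolve_symbol` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    interface: str  # interface this backend provides (hypothetical label)


# InterfaceBindings for one logical operation, keyed by interface.
GESV_BINDINGS = {
    "lapacke": "LAPACKE_dgesv",
    "fortran_lapack": "dgesv_",
}


def resolve_symbol(bindings: dict[str, str], backend: Backend) -> str:
    """Match the backend's provided interface against the operation's bindings."""
    try:
        return bindings[backend.interface]
    except KeyError:
        raise LookupError(f"{backend.name} provides no interface this operation is bound to") from None
```

With this shape, `Backend("accelerate", "fortran_lapack")` resolves to `dgesv_` and `Backend("openblas", "lapacke")` to `LAPACKE_dgesv`: one logical operation, two interfaces.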

substrates.md - Language Substrates (Normative)

Defines what "idiomatic" means per language:

  • Type mappings (float32 → f32 in Rust, np.float32 in Python, float32 in Go)
  • Error handling (Result, exceptions, error returns)
  • Naming conventions (snake_case, camelCase, PascalCase)
  • Memory patterns (ownership, GC, manual, arena)
  • Generics strategy (monomorphization, type erasure, runtime dispatch)

Key insight: Same fact, different substrate → native-feeling code.
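A minimal sketch of a substrate as data: the same method name and dtype are rendered through per-language type maps and naming conventions. The `Substrate` fields and the three sample entries are assumptions, not the schema in substrates.md.

```python
from dataclasses import dataclass


@dataclass
class Substrate:
    type_map: dict[str, str]  # canonical dtype -> language type
    naming: str               # "snake_case", "camelCase", or "PascalCase"


def apply_naming(words: list[str], naming: str) -> str:
    if naming == "snake_case":
        return "_".join(w.lower() for w in words)
    if naming == "camelCase":
        return words[0].lower() + "".join(w.capitalize() for w in words[1:])
    if naming == "PascalCase":
        return "".join(w.capitalize() for w in words)
    raise ValueError(f"unknown naming convention: {naming!r}")


SUBSTRATES = {
    "python": Substrate({"float32": "np.float32"}, "snake_case"),
    "rust": Substrate({"float32": "f32"}, "snake_case"),
    "go": Substrate({"float32": "float32"}, "PascalCase"),  # exported Go identifiers
}


def render_method(lang: str, words: list[str], dtype: str) -> tuple[str, str]:
    """Same fact (method name + element type), different substrate."""
    sub = SUBSTRATES[lang]
    return apply_naming(words, sub.naming), sub.type_map[dtype]
```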

stencils.md - Stencils and Semantic Forms (Normative)

Defines the semantic form model and stencil system for paradigm-neutral code generation:

  • Semantic forms (bind, invoke, yield, sequence, guard, etc.)
  • Paradigms (Algol, Lisp, Stack) and paradigm-aware lowering
  • Stencil dispatch mechanism and Doc algebra
  • Overlay system for language-specific customization

Key insight: Semantic forms capture computational intent; stencils translate intent to syntax.
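To illustrate, here is a sketch with two hypothetical forms, `InvokeForm` and `SequenceForm` (named bindings followed by a result), rendered for the Algol paradigm (as Python) and the Lisp paradigm (as a `let*` chain). It collapses lowering and stencil rendering into one function per paradigm for brevity; the form classes are not the spec's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvokeForm:
    fn: str
    args: tuple[str, ...]


@dataclass(frozen=True)
class SequenceForm:
    binds: tuple[tuple[str, InvokeForm], ...]  # (name, value) pairs, evaluated in order
    result: str


def render_algol(seq: SequenceForm) -> str:
    """Algol paradigm: a statement block (rendered as Python)."""
    lines = [f"{name} = {inv.fn}({', '.join(inv.args)})" for name, inv in seq.binds]
    lines.append(f"return {seq.result}")
    return "\n".join(lines)


def render_lisp(seq: SequenceForm) -> str:
    """Lisp paradigm: a let* chain, so later bindings can see earlier ones."""
    binds = " ".join(f"({name} ({inv.fn} {' '.join(inv.args)}))" for name, inv in seq.binds)
    return f"(let* ({binds}) {seq.result})"


body = SequenceForm(
    binds=(("n", InvokeForm("length", ("x",))), ("s", InvokeForm("dot", ("n", "x", "y")))),
    result="s",
)
```

The same `body` renders as three Python statements ending in `return s`, or as `(let* ((n (length x)) (s (dot n x y))) s)`.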


Design Principles

Mandatory Primitives, No Code Maps

  • Every operation and variant states its pattern in YAML; the pattern carries computational metadata. Benchmark configuration (bench_complexity, bench_dimensions, bench_size_progression, bench_flops_formula) is derived from that metadata, never hard-coded in code.
  • Every method states array_method_pattern. No ARRAY_METHOD_PATTERNS map exists in codegen.
  • The derivation registry is the only source of derived fields (tier, test_shape, supported, wrapper_signature, etc.). Templates render what the registry computed; they never recompute or override.
  • Schema-backed validation is fail-fast: missing primitives or unresolved references halt generation.
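A fail-fast check of this kind can be sketched as a pass that runs before any derivation and refuses to continue if a primitive is missing or a reference does not resolve. The record shapes (operations keyed by name, variants pointing at an `operation`) and `SpecError` are assumptions.

```python
PRIMITIVES = ("level", "pattern", "dtype", "role")


class SpecError(Exception):
    """Raised before generation starts; nothing is emitted for an invalid spec."""


def validate(operations: dict[str, dict], variants: list[dict]) -> None:
    errors = []
    for name, op in operations.items():
        errors += [f"operation {name!r}: missing primitive {f!r}" for f in PRIMITIVES if f not in op]
    for var in variants:
        if var["operation"] not in operations:
            errors.append(f"variant {var['name']!r}: unresolved operation {var['operation']!r}")
    if errors:
        raise SpecError("\n".join(errors))
```

Collecting every error before raising reports all defects in one run while still halting before any code is generated.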

Where Domain Knowledge Lives

Domain knowledge appears in two places, both in specs—never scattered through codegen:

  1. Per-operation primitives in YAML annotations (level, pattern, dtype, role)
  2. Generic derivation rules that operate only on primitives, never on operation names

This is the key distinction: derive_tier(op) that reads op.level is acceptable (rule over primitives). TEST_SHAPE_MAP["dot"] is not (hardcoded per-op knowledge).

Core Principles

  1. Specs are the source of truth. Domain knowledge lives in YAML primitives and documented derivation rules.

  2. Codegen is rule-based. All transformations work from field values, never from operation names. No if op_name == 'gemm'. Templates MAY branch on derived fields (e.g., protocol_shape, call_style) but MUST NOT derive new semantic classifications from raw primitives. All such derivations must live in the registry.

  3. Facts separate semantics from syntax. "Lengths must be equal" is a fact. assert_eq! is rendering.

  4. Primitives are stated, composites are derived. State level: 1, then derive tier: vector via the rule level == 1 → vector.

  5. Layers compose independently. Changing the binding doesn't require changing the Array layer.

  6. Each output should look hand-written. If a native speaker of the language would say "that's weird," it's wrong.

Escape Hatches

Some operations don't fit the standard derivation patterns. Rather than pretending otherwise, annotations can set a generation_mode:

generation_mode  Meaning
normal           Standard templates, all derivations apply
special          Generated with operation-specific template overrides
manual           No generation; hand-written code in languages/{lang}/manual/

Operations that are special or manual are explicitly marked in annotations. The goal is to minimize them, not pretend they don't exist.
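A sketch of how a generator might act on this annotation, assuming that an operation without `generation_mode` is treated as `normal` and that overrides are looked up by a per-operation key. `generation_plan` and the returned action labels are hypothetical.

```python
from typing import Optional


def generation_plan(op: dict, lang: str) -> tuple[str, Optional[str]]:
    """Decide how an operation is produced from its generation_mode annotation."""
    mode = op.get("generation_mode", "normal")
    if mode == "normal":
        return ("templates", None)
    if mode == "special":
        # The escape hatch: identity selects behavior only because the annotation opted in.
        return ("templates+overrides", op["name"])
    if mode == "manual":
        return ("skip", f"languages/{lang}/manual/")
    raise ValueError(f"unknown generation_mode {mode!r}")
```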

The Litmus Test

Can someone who knows nothing about the target domain read the codegen and understand what it does? If yes, you've succeeded. If no, domain knowledge has leaked into code.


How It Fits Together

┌─────────────────────────────────────────────────────────────────┐
│  SPECIFICATIONS (where all knowledge lives)                      │
│  annotations/*.yml  spec/array.yml  backends/*.yml               │
│                                                                  │
│  level: 1              # stated primitive                        │
│  pattern: reduction    # stated primitive                        │
│  fixture_expr: "np.dot(x, y)"   # oracle expression              │
└────────────────────────────┬────────────────────────────────────┘
                             │ Loading (mechanical: parse YAML)
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  ENRICHMENT (rule-based transformation over primitives)          │
│  - Derive composites from primitives (tier from level)           │
│  - Produce semantic forms (method_body: sequence_form)           │
│  - Evaluate fixture expressions (run numpy)                      │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  FORM LOWERING (paradigm-aware transformation)                   │
│  - sequence_form → block_form (Algol)                            │
│  - sequence_form → let_form (Lisp)                               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  STENCILS + OVERLAYS + DOC ALGEBRA (mechanical: format forms)    │
│  paradigms/algol/stencils/*.stencil.j2                           │
│  languages/{lang}/overlays/*.yml                                 │
│  No conditionals on operation names or types                     │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  GENERATED CODE                                                  │
│  FFI Layer + Wrapper Layer + Array Layer                         │
│  Feels native to each language                                   │
└─────────────────────────────────────────────────────────────────┘

The test: If you need domain expertise to understand the codegen, you've failed. All domain knowledge lives in annotations.


Adding Domains

The metacompiler is domain-agnostic. BLAS, LAPACK, and SLEEF are the current domains, not special cases.

To add a new domain:

Step                      What                                                  Where
Define operations         YAML annotations with patterns, parameters, variants  annotations/<domain>.yml
Add providers             Module implementation units with native deps          packaging/providers.yml
(Optional) Add protocols  Lifecycle patterns if stateful                        annotations/protocols/<domain>.yml

If operations fit existing patterns (elementwise_unary, reduction, matmul, solve), no template changes are needed: the derivation registry computes everything from primitives.

Extension Points

Extension Point  How to Extend                                                     Documentation
Providers        Add entry to packaging/providers.yml with module and native deps  entities.md, module-composition.md
Languages        Create languages/<lang>/config.yml with substrate definition      substrates.md
Modules          Define module in spec/modules/<name>.yml                          module-composition.md
Operations       Add to annotations/<library>.yml                                  annotation-fields.md
Platforms        Define platform constraints                                       entities.md (Platform entity)
Derivations      Register in derivation registry                                   derivation-registry.md
Paradigms        Add paradigm stencils in paradigms/<name>/                        stencils.md

Stability: Extension points are documented but not formally versioned. Extensions may require updates if core entities change.


Additional Documents

protocol-algebra.md - Protocol Algebra (Normative)

Formal model of stateful API protocols and lifecycle patterns. Defines phase types (Create, Use, Configure, Consume, Query, MutateGlobal) and protocol shapes (query, global_state, raii_type, session_with_children, etc.).

derivation-registry.md - Derivation Registry (Normative)

The DAG-based derivation mechanism. Defines scopes (OPERATION, VARIANT, PROTOCOL, METHOD), JMESPath queries, and topological execution.
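The topological execution can be sketched with Python's standard `graphlib.TopologicalSorter`: each derived field lists the fields it reads, and rules run in dependency order regardless of registration order. The registry shape and both rules are hypothetical (only `level == 1 → vector` comes from this document), and scopes and JMESPath queries are omitted.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# derived field -> (fields it reads, rule). Registration order does not matter.
Registry = dict[str, tuple[tuple[str, ...], Callable[[dict], Any]]]

REGISTRY: Registry = {
    "wrapper_name": (("tier", "pattern"), lambda r: f"{r['tier']}_{r['pattern']}"),
    "tier": (("level",), lambda r: "vector" if r["level"] == 1 else "non_vector"),
}


def run_derivations(record: dict, registry: Registry) -> dict:
    """Evaluate every derived field after its inputs; raises graphlib.CycleError on cycles."""
    graph = {field: set(deps) for field, (deps, _) in registry.items()}
    out = dict(record)
    for field in TopologicalSorter(graph).static_order():
        if field in registry:  # primitives appear as nodes but have no rule
            out[field] = registry[field][1](out)
    return out
```

Although `wrapper_name` is registered first, it is evaluated after `tier`; a cyclic registry is rejected by `graphlib.CycleError` (a `ValueError` subclass) before any rule runs.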

error-model.md - Error Model (Normative)

Error categories (precondition, creation, execution, destruction), propagation patterns, and LAPACK info code handling.
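LAPACK reports status through an `info` argument: 0 means success, a negative value `-i` means the i-th argument had an illegal value, and a positive value signals a routine-specific computational failure (for `dgetrf` and `dgesv`, `info = i > 0` means `U(i,i)` is exactly zero, so the matrix is singular). A minimal sketch of mapping that convention to exceptions follows; the exception names are hypothetical, and error-model.md defines the actual propagation patterns.

```python
class LapackArgumentError(ValueError):
    """info < 0: the routine rejected one of its arguments."""


class LapackComputationError(ArithmeticError):
    """info > 0: the routine ran but could not complete (meaning is routine-specific)."""


def check_info(routine: str, info: int) -> None:
    """Translate a LAPACK info code into an exception; info == 0 returns normally."""
    if info < 0:
        raise LapackArgumentError(f"{routine}: argument {-info} had an illegal value")
    if info > 0:
        raise LapackComputationError(f"{routine}: failed with info={info}")
```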

module-composition.md - Module Composition (Normative)

Multi-provider bindings, fallback chains, and module resolution. Defines how modules from different domains (BLAS, SLEEF, FFT) compose with provider selection and dependency resolution.
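A fallback chain can be sketched as an ordered preference list per module, resolved against the providers available on the target. The provider names come from the Domains table, but the chain orderings and `resolve_modules` are assumptions.

```python
# Hypothetical preference orders; the first available provider wins.
FALLBACK_CHAINS = {
    "blas": ["mkl", "openblas", "blis", "accelerate"],
    "sleef": ["sleef", "libmvec", "compiler_builtins"],
}


def resolve_provider(module: str, available: set[str]) -> str:
    for provider in FALLBACK_CHAINS[module]:
        if provider in available:
            return provider
    raise LookupError(f"no provider for module {module!r} among {sorted(available)}")


def resolve_modules(modules: list[str], available: set[str]) -> dict[str, str]:
    """Resolve each module independently so different domains can mix providers."""
    return {m: resolve_provider(m, available) for m in modules}
```

On a target where only OpenBLAS and libmvec are installed, BLAS resolves to `openblas` and SLEEF falls back to `libmvec`.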

testing-strategy.md - Testing Strategy (Normative)

How to test the metacompiler itself: unit tests, property tests, and integration validation.

debug-tooling.md - Debug Tooling and CLI (Normative)

Debug and inspection commands for the metacompiler. Defines polyarray inspect, polyarray trace, polyarray validate, and other CLI tools for understanding derivation logic and debugging the pipeline.

testing-tiers.md - Testing Tier Matrix (Normative)

Defines which (Provider × Language × Binding × Platform) combinations are guaranteed to work. Three tiers: Tier 1 (tested every commit), Tier 2 (tested on release), Tier 3 (community-supported).

walkthrough.md - End-to-End Walkthrough (Normative)

Concrete trace of the dot operation through the pipeline stages, showing the exact data transformations at each step.

glossary.md - Glossary (Reference)

Quick reference for metacompiler terminology with links to defining documents.


Cross-References

  • Memory Model: See docs/memory-model/
  • Array API: See spec/array.yml