jgarzik/vibe99-old

C99 Compiler

A compiler for the C99 programming language implemented in Rust, focused on standards compliance, cross-platform support, and educational clarity.

Overview

This project implements a compiler for the C99 standard that translates C source code to assembly for multiple target architectures. The compiler is designed both as a practical tool and as a reference implementation that demonstrates modern compiler construction techniques.

Current Capabilities

The compiler supports many core C99 language features:

  • Language Support:

    • Basic types (char, int, float, double, etc.)
    • Control flow (if/else, for, while, switch)
    • Functions with proper calling conventions
    • Pointers and arrays
    • Structures and unions
    • Typedefs and type qualifiers
  • Preprocessing:

    • Macro definition and expansion
    • Conditional compilation (#if, #ifdef, etc.)
    • File inclusion
    • Token concatenation
    • Standard predefined macros
  • Target Support:

    • x86-64 assembly generation (primary target)
    • AArch64 (ARM64) support
    • Target-specific optimizations
  • Architecture:

    • Modular design with clear separation between stages
    • SSA-based Intermediate Representation
    • Well-defined interfaces between compilation phases
    • Comprehensive error reporting

Project Structure

c99_compiler/
├── src/
│   ├── lexer/          # Lexical analysis
│   ├── preprocessor/   # C preprocessor
│   ├── parser/         # Syntax analysis
│   ├── ir/             # Intermediate Representation
│   ├── arch/           # Architecture-specific code generation
│   │   ├── x86_64/     # x86-64 backend
│   │   ├── aarch64/    # AArch64 backend
│   │   └── common.rs   # Shared code generation utilities
│   ├── os/             # OS-specific macros and functionality
│   │   ├── linux.rs    # Linux-specific macros
│   │   └── darwin.rs   # Darwin-specific macros
│   ├── error.rs        # Error handling
│   ├── symbol.rs       # Symbol table management
│   ├── target.rs       # Target architecture specification
│   ├── lib.rs          # Library interface
│   └── main.rs         # Command-line interface
├── tests/              # Test suite
│   ├── files/          # Test C files
│   ├── tmp/            # Generated assembly output
│   ├── aarch64_tests.rs # AArch64-specific tests
│   └── integration_tests.rs # Integration tests
├── docs/               # Documentation
│   └── IR_INSTRUCTIONS.md # IR instruction reference
├── scripts/            # Utility scripts
│   ├── run_integer_tests.sh # Script for running integer tests
│   └── run_new_integer_tests.sh # Script for running bit-width specific tests

Usage

Building the Compiler

cargo build --release

Compiling a C File

# Compile to x86-64 assembly
./target/release/c99 input.c -o output.s

# Compile to AArch64 assembly
./target/release/c99 input.c -o output.s --target=aarch64

Running Tests

# Run all tests
cargo test

# Run a specific test suite (each file under tests/ is its own test target)
cargo test --test integration_tests
cargo test --test aarch64_tests

Technical Implementation

The compiler follows established compiler design principles while emphasizing educational clarity and practical functionality.

Compilation Pipeline

  1. Lexical Analysis: Tokenizes input using a custom lexer implemented with Rust's pattern matching
  2. Preprocessing: Performs macro expansion, conditional compilation, and include file resolution
  3. Syntax Analysis: Builds a comprehensive AST (Abstract Syntax Tree) using recursive descent parsing
  4. Semantic Analysis: Validates types and checks semantic correctness
  5. IR Generation: Translates the AST to an SSA-based Intermediate Representation
  6. Code Generation: Translates IR to target-specific assembly code
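
The staging above can be sketched as a chain of functions, each consuming the previous stage's output. This is a toy illustration, not the project's code: it handles only sums of integers and object-like macros, a loop stands in for the recursive descent parser, and the final stage prints the IR as text instead of real assembly. Every type and function name here (Token, Ast, Ir, lex, preprocess, parse, check, lower, emit, compile) is an assumption made for the example.

```rust
use std::collections::HashMap;

// All names are hypothetical; the README does not show the compiler's real types.
#[derive(Debug, Clone, PartialEq)]
enum Token { Num(i64), Ident(String), Plus }

#[derive(Debug, PartialEq)]
enum Ast { Num(i64), Add(Box<Ast>, Box<Ast>) }

#[derive(Debug, PartialEq)]
enum Ir {
    Const { dst: usize, value: i64 },
    Add { dst: usize, lhs: usize, rhs: usize },
}

// 1. Lexical analysis: whitespace-separated words become tokens.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    src.split_whitespace().map(|w| match w {
        "+" => Ok(Token::Plus),
        _ if w.chars().all(|c| c.is_ascii_digit()) =>
            w.parse::<i64>().map(Token::Num).map_err(|e| e.to_string()),
        _ if w.chars().all(|c| c.is_ascii_alphabetic() || c == '_') =>
            Ok(Token::Ident(w.to_string())),
        _ => Err(format!("unexpected token `{}`", w)),
    }).collect()
}

// 2. Preprocessing: replace each identifier with its macro definition.
fn preprocess(tokens: Vec<Token>, macros: &HashMap<String, i64>) -> Result<Vec<Token>, String> {
    tokens.into_iter().map(|t| match t {
        Token::Ident(name) => macros.get(&name).map(|v| Token::Num(*v))
            .ok_or_else(|| format!("undefined identifier `{}`", name)),
        other => Ok(other),
    }).collect()
}

// 3. Syntax analysis: parse `num (+ num)*` into a left-leaning tree.
fn parse(tokens: &[Token]) -> Result<Ast, String> {
    let mut iter = tokens.iter();
    let mut ast = match iter.next() {
        Some(Token::Num(n)) => Ast::Num(*n),
        other => return Err(format!("expected number, found {:?}", other)),
    };
    while let Some(tok) = iter.next() {
        if *tok != Token::Plus {
            return Err(format!("expected `+`, found {:?}", tok));
        }
        match iter.next() {
            Some(Token::Num(n)) => ast = Ast::Add(Box::new(ast), Box::new(Ast::Num(*n))),
            other => return Err(format!("expected number, found {:?}", other)),
        }
    }
    Ok(ast)
}

// 4. Semantic analysis: reject literals that do not fit a 32-bit int.
fn check(ast: &Ast) -> Result<(), String> {
    match ast {
        Ast::Num(n) if *n > i32::MAX as i64 => Err(format!("literal {} does not fit in int", n)),
        Ast::Num(_) => Ok(()),
        Ast::Add(l, r) => { check(l)?; check(r) }
    }
}

// 5. IR generation: every instruction defines one fresh value (its index).
fn lower(ast: &Ast, out: &mut Vec<Ir>) -> usize {
    match ast {
        Ast::Num(n) => { let dst = out.len(); out.push(Ir::Const { dst, value: *n }); dst }
        Ast::Add(l, r) => {
            let lhs = lower(l, out);
            let rhs = lower(r, out);
            let dst = out.len();
            out.push(Ir::Add { dst, lhs, rhs });
            dst
        }
    }
}

// 6. Code generation: stand-in that prints the IR as text, not real assembly.
fn emit(ir: &[Ir]) -> String {
    ir.iter().map(|inst| match inst {
        Ir::Const { dst, value } => format!("v{} = const {}\n", dst, value),
        Ir::Add { dst, lhs, rhs } => format!("v{} = add v{}, v{}\n", dst, lhs, rhs),
    }).collect()
}

// Driver: run the stages in order and stop at the first error.
fn compile(src: &str, macros: &HashMap<String, i64>) -> Result<String, String> {
    let tokens = preprocess(lex(src)?, macros)?;
    let ast = parse(&tokens)?;
    check(&ast)?;
    let mut ir = Vec::new();
    lower(&ast, &mut ir);
    Ok(emit(&ir))
}
```

With a macro table mapping TWO to 2, compile("1 + TWO", &macros) yields three lines: two constants and one add. Returning a Result from every stage lets the driver stop at the first failing phase, which is one common way to keep error reporting separate from stage logic.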

Intermediate Representation (IR)

The compiler employs a Static Single Assignment (SSA) form IR that:

  • Acts as the central interface between front-end and back-end
  • Provides explicit control flow through basic blocks
  • Uses Phi nodes to manage data flow at control flow merge points
  • Enables architecture-independent optimizations
  • Preserves type information throughout compilation
  • Simplifies code generation through linearized form
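
To make the basic-block and Phi-node vocabulary concrete, here is a minimal sketch of how such an IR could be represented. It is not taken from the project (docs/IR_INSTRUCTIONS.md is the actual instruction reference); Value, BlockId, Inst, Terminator, Block, Function, is_ssa, select_incoming, and diamond are all hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

// Hypothetical IR types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Value(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockId(usize);

#[derive(Debug)]
enum Inst {
    Const { dst: Value, value: i64 },
    // A phi picks one incoming value depending on which predecessor ran.
    Phi { dst: Value, incoming: Vec<(BlockId, Value)> },
}

// Every block ends in exactly one terminator, making control flow explicit.
#[derive(Debug)]
enum Terminator {
    Jump(BlockId),
    Branch { cond: Value, then_to: BlockId, else_to: BlockId },
    Return(Value),
}

#[derive(Debug)]
struct Block { insts: Vec<Inst>, term: Terminator }

#[derive(Debug)]
struct Function { blocks: Vec<Block> }

impl Function {
    /// True when every value is defined by exactly one instruction.
    fn is_ssa(&self) -> bool {
        let mut seen = HashSet::new();
        self.blocks.iter().flat_map(|b| &b.insts).all(|inst| {
            let dst = match inst {
                Inst::Const { dst, .. } | Inst::Phi { dst, .. } => *dst,
            };
            seen.insert(dst) // false if `dst` was already defined
        })
    }
}

/// The value a phi yields when control arrives from `pred`.
fn select_incoming(incoming: &[(BlockId, Value)], pred: BlockId) -> Option<Value> {
    incoming.iter().find(|(b, _)| *b == pred).map(|(_, v)| *v)
}

/// Diamond-shaped CFG for something like `x = cond ? 10 : 20; return x;`.
fn diamond() -> Function {
    let (cond, a, b, merged) = (Value(0), Value(1), Value(2), Value(3));
    Function { blocks: vec![
        Block { // block 0: entry
            insts: vec![Inst::Const { dst: cond, value: 1 }],
            term: Terminator::Branch { cond, then_to: BlockId(1), else_to: BlockId(2) },
        },
        Block { // block 1: then
            insts: vec![Inst::Const { dst: a, value: 10 }],
            term: Terminator::Jump(BlockId(3)),
        },
        Block { // block 2: else
            insts: vec![Inst::Const { dst: b, value: 20 }],
            term: Terminator::Jump(BlockId(3)),
        },
        Block { // block 3: merge point, where the phi joins the two definitions
            insts: vec![Inst::Phi { dst: merged, incoming: vec![(BlockId(1), a), (BlockId(2), b)] }],
            term: Terminator::Return(merged),
        },
    ]}
}
```

In the diamond, the two branches define different values, and the phi in the merge block is the only place that names both. That is what lets each value keep a single definition even though the C-level variable is assigned on two paths.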

Type System

The type system models C99 type semantics:

  • Primitive types with architecture-specific sizes and alignments
  • Derived types (pointers, arrays, structs, unions)
  • Type qualifiers (const, volatile)
  • Type compatibility rules
  • Implicit and explicit conversion rules
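
As an illustration of how primitive and derived types can carry target-dependent sizes and alignments, the sketch below computes a (size, alignment) pair for a small subset of C types. The names CType, DataModel, LP64, ILP32, align_up, and layout are assumptions made for this example and are not taken from the project. The LP64 numbers match the usual 64-bit Linux and Darwin data model; qualifiers, unions, bit-fields, and conversions are left out.

```rust
// Hypothetical, simplified model of C types; not the project's actual definitions.
#[derive(Debug, Clone, PartialEq)]
enum CType {
    Char,
    Int,
    Long,
    Double,
    Pointer(Box<CType>),
    Array(Box<CType>, usize),
    Struct(Vec<CType>), // field types in declaration order
}

/// Sizes that differ between targets (assumed parameterization).
struct DataModel { long_size: usize, pointer_size: usize }

const LP64: DataModel = DataModel { long_size: 8, pointer_size: 8 };
const ILP32: DataModel = DataModel { long_size: 4, pointer_size: 4 };

/// Round `offset` up to the next multiple of `align`.
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

impl CType {
    /// Returns (size, alignment) in bytes under the given data model.
    fn layout(&self, dm: &DataModel) -> (usize, usize) {
        match self {
            CType::Char => (1, 1),
            CType::Int => (4, 4),
            CType::Long => (dm.long_size, dm.long_size),
            CType::Double => (8, 8),
            CType::Pointer(_) => (dm.pointer_size, dm.pointer_size),
            CType::Array(elem, n) => {
                let (size, align) = elem.layout(dm);
                (size * *n, align)
            }
            CType::Struct(fields) => {
                let (mut offset, mut align) = (0usize, 1usize);
                for field in fields {
                    let (fsize, falign) = field.layout(dm);
                    // Pad so the field starts at a multiple of its alignment.
                    offset = align_up(offset, falign) + fsize;
                    align = align.max(falign);
                }
                // Tail padding keeps array elements of this struct aligned.
                (align_up(offset, align), align)
            }
        }
    }
}
```

For example, under LP64 a struct holding a char followed by an int has layout (8, 4): three padding bytes follow the char so the int starts at offset 4. A pointer has layout (8, 8) under LP64 and (4, 4) under ILP32, which is why the data model is passed in instead of being hard-coded.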

Cross-platform Support

Architecture-specific code is isolated in dedicated modules:

  • Abstract interfaces for code generation via traits
  • Separate ABISpec implementations for each target
  • Architecture-dependent type information
  • Platform-specific predefined macros
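
One way such a trait-based split could look is sketched below. The README names ABISpec but does not show its contents, so the methods here (name, int_arg_registers, int_return_register) and the helpers ArgLoc, assign_int_args, and abi_for are assumptions made for illustration. The register lists follow the System V AMD64 and AAPCS64 calling conventions for integer arguments; floating-point and aggregate arguments are ignored.

```rust
/// Hypothetical shape of a per-target ABI description.
trait ABISpec {
    fn name(&self) -> &'static str;
    /// Registers for the leading integer or pointer arguments, in order.
    fn int_arg_registers(&self) -> &'static [&'static str];
    fn int_return_register(&self) -> &'static str;
}

struct X86_64;
struct AArch64;

impl ABISpec for X86_64 {
    fn name(&self) -> &'static str { "x86_64" }
    fn int_arg_registers(&self) -> &'static [&'static str] {
        &["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
    }
    fn int_return_register(&self) -> &'static str { "rax" }
}

impl ABISpec for AArch64 {
    fn name(&self) -> &'static str { "aarch64" }
    fn int_arg_registers(&self) -> &'static [&'static str] {
        &["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"]
    }
    fn int_return_register(&self) -> &'static str { "x0" }
}

/// Where an argument lives at a call site (simplified: stack slots are indices).
#[derive(Debug, PartialEq)]
enum ArgLoc {
    Reg(&'static str),
    Stack(usize),
}

/// Target-independent logic: registers first, then consecutive stack slots.
fn assign_int_args(abi: &dyn ABISpec, count: usize) -> Vec<ArgLoc> {
    let regs = abi.int_arg_registers();
    (0..count)
        .map(|i| match regs.get(i) {
            Some(&reg) => ArgLoc::Reg(reg),
            None => ArgLoc::Stack(i - regs.len()),
        })
        .collect()
}

/// Map a target name (as passed to --target) to its ABI description.
fn abi_for(target: &str) -> Option<Box<dyn ABISpec>> {
    match target {
        "x86_64" => Some(Box::new(X86_64)),
        "aarch64" => Some(Box::new(AArch64)),
        _ => None,
    }
}
```

With this split, assign_int_args never mentions a concrete architecture. On x86_64 the seventh integer argument lands in stack slot 0, while on aarch64 it still fits in register x6.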

Documentation

Project documentation lives in the docs/ directory, including IR_INSTRUCTIONS.md, the IR instruction reference.

Future Work

The compiler continues to evolve toward full C99 compliance. Current areas of development include:

  • Complete Standard Library Support: Integration with the standard C library
  • Optimization Passes: Implementation of IR-level optimizations
  • Extended Preprocessor Functionality: Support for additional GNU extensions
  • Full AArch64 Support: Broader ARM64 coverage, including more complex parameter passing
  • Inline Assembly: Support for embedded assembly in C code
  • Full Floating-Point Support: Complete implementation of all floating-point operations
  • Additional Warning Levels: Enhanced diagnostic capabilities

For those interested in contributing, the best starting points are:

  1. Adding test cases for existing functionality
  2. Improving error diagnostics
  3. Implementing missing C99 features