C99 Compiler
A C99 compiler implemented in Rust, focused on standards compliance, cross-platform support, and educational clarity.
Overview
This project implements a compiler for the C99 standard that translates C source code to assembly for multiple target architectures. The compiler is designed both as a practical tool and as a reference implementation that demonstrates modern compiler construction techniques.
Current Capabilities
The compiler supports many core C99 language features:
- Language Support:
  - Basic types (char, int, float, double, etc.)
  - Control flow (if/else, for, while, switch)
  - Functions with proper calling conventions
  - Pointers and arrays
  - Structures and unions
  - Typedefs and type qualifiers
- Preprocessing:
  - Macro definition and expansion
  - Conditional compilation (#if, #ifdef, etc.)
  - File inclusion
  - Token concatenation
  - Standard predefined macros
- Target Support:
  - x86-64 assembly generation (primary target)
  - AArch64 (ARM64) support
  - Target-specific optimizations
- Architecture:
  - Modular design with clear separation between stages
  - SSA-based Intermediate Representation
  - Well-defined interfaces between compilation phases
  - Comprehensive error reporting
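To make "macro definition and expansion" and "token concatenation" concrete, here is a minimal sketch of object-like macro expansion. It assumes tokens are plain `String`s and that macros have no parameters; the function names `expand` and `paste` are hypothetical and not taken from the crate. The `active` set implements the C99 rule (6.10.3.4) that a macro's own name is not replaced again while its replacement list is being rescanned, which is what stops `#define A B` / `#define B A` from looping forever.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified preprocessing token (hypothetical; a real preprocessor tracks kinds and locations).
type Token = String;

/// Expands object-like macros in `tokens`.
/// `active` holds the names of macros currently being expanded; a name found in
/// `active` is emitted unchanged instead of being expanded again.
fn expand(
    tokens: &[Token],
    macros: &HashMap<String, Vec<Token>>,
    active: &mut HashSet<String>,
) -> Vec<Token> {
    let mut out = Vec::new();
    for tok in tokens {
        match macros.get(tok) {
            Some(body) if !active.contains(tok) => {
                // Rescan the replacement list with this macro marked as active.
                active.insert(tok.clone());
                out.extend(expand(body, macros, active));
                active.remove(tok);
            }
            // Not a macro, or a recursive reference that must not be expanded.
            _ => out.push(tok.clone()),
        }
    }
    out
}

/// The `##` operator: pastes two tokens into one.
/// This sketch does not check that the result is a valid preprocessing token.
fn paste(lhs: &str, rhs: &str) -> Token {
    format!("{lhs}{rhs}")
}
```

Function-like macros, argument pre-expansion, and `#` stringizing are left out; they need a fuller rescanning algorithm than this recursive walk.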
Project Structure
c99_compiler/
├── src/
│ ├── lexer/ # Lexical analysis
│ ├── preprocessor/ # C preprocessor
│ ├── parser/ # Syntax analysis
│ ├── ir/ # Intermediate Representation
│ ├── arch/ # Architecture-specific code generation
│ │ ├── x86_64/ # x86-64 backend
│ │ ├── aarch64/ # AArch64 backend
│ │ └── common.rs # Shared code generation utilities
│ ├── os/ # OS-specific macros and functionality
│ │ ├── linux.rs # Linux-specific macros
│ │ └── darwin.rs # Darwin-specific macros
│ ├── error.rs # Error handling
│ ├── symbol.rs # Symbol table management
│ ├── target.rs # Target architecture specification
│ ├── lib.rs # Library interface
│ └── main.rs # Command-line interface
├── tests/ # Test suite
│ ├── files/ # Test C files
│ ├── tmp/ # Generated assembly output
│ ├── aarch64_tests.rs # AArch64-specific tests
│ └── integration_tests.rs # Integration tests
├── docs/ # Documentation
│ └── IR_INSTRUCTIONS.md # IR instruction reference
├── scripts/ # Utility scripts
│ ├── run_integer_tests.sh # Script for running integer tests
│ └── run_new_integer_tests.sh # Script for running bit-width specific tests
Usage
Building the Compiler
cargo build --release
Compiling a C File
# Compile to x86-64 assembly
./target/release/c99 input.c -o output.s
# Compile to AArch64 assembly
./target/release/c99 input.c -o output.s --target=aarch64
Running Tests
# Run all tests
cargo test
# Run specific test suite
cargo test --test integration_tests
cargo test --test aarch64_tests
Technical Implementation
The compiler follows established compiler design principles while emphasizing educational clarity and practical functionality:
Compilation Pipeline
- Lexical Analysis: Tokenizes input using a custom lexer implemented with Rust's pattern matching
- Preprocessing: Performs macro expansion, conditional compilation, and include file resolution
- Syntax Analysis: Builds an abstract syntax tree (AST) using a recursive descent parser
- Semantic Analysis: Validates types and checks semantic correctness
- IR Generation: Translates the AST to an SSA-based Intermediate Representation
- Code Generation: Translates IR to target-specific assembly code
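The lexical-analysis stage is described as a custom lexer built on Rust's pattern matching. The sketch below shows what that style can look like for a tiny subset of C; the `Token` enum and `lex` function are hypothetical and cover only identifiers, decimal integers, and a few punctuators. Guarded `match` arms check the two-character operator before the one-character one, so `++` and `==` are lexed by maximal munch.

```rust
/// Hypothetical token type for this sketch; the real lexer covers the full C99 token set.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    IntLit(u64),
    Punct(&'static str),
}

/// Tokenizes a small subset of C by matching on the current character.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            c if c.is_whitespace() => i += 1,
            // Identifiers: a letter or underscore, then letters, digits, underscores.
            c if c.is_ascii_alphabetic() || c == '_' => {
                let start = i;
                while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                    i += 1;
                }
                tokens.push(Token::Ident(chars[start..i].iter().collect()));
            }
            // Decimal integer literals only (no suffixes, hex, or octal here).
            c if c.is_ascii_digit() => {
                let start = i;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
                let text: String = chars[start..i].iter().collect();
                let value = text
                    .parse::<u64>()
                    .map_err(|e| format!("bad integer {text}: {e}"))?;
                tokens.push(Token::IntLit(value));
            }
            // Maximal munch: try the two-character operator first.
            '+' if chars.get(i + 1) == Some(&'+') => {
                tokens.push(Token::Punct("++"));
                i += 2;
            }
            '+' => {
                tokens.push(Token::Punct("+"));
                i += 1;
            }
            '=' if chars.get(i + 1) == Some(&'=') => {
                tokens.push(Token::Punct("=="));
                i += 2;
            }
            '=' => {
                tokens.push(Token::Punct("="));
                i += 1;
            }
            ';' => {
                tokens.push(Token::Punct(";"));
                i += 1;
            }
            other => return Err(format!("unexpected character {other:?} at index {i}")),
        }
    }
    Ok(tokens)
}
```

Returning a `Result` rather than panicking lets later stages surface lexer errors through the same error-reporting path as parse and type errors.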
Intermediate Representation (IR)
The compiler employs a Static Single Assignment (SSA) form IR that:
- Acts as the central interface between front-end and back-end
- Provides explicit control flow through basic blocks
- Uses Phi nodes to manage data flow at control flow merge points
- Enables architecture-independent optimizations
- Preserves type information throughout compilation
- Simplifies code generation through linearized form
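As an illustration of basic blocks and Phi nodes, the sketch below hand-builds SSA for `int max(int a, int b) { return a > b ? a : b; }`. The types (`Instr`, `Terminator`, `BasicBlock`) and the checker `is_single_assignment` are hypothetical stand-ins, not the project's IR; docs/IR_INSTRUCTIONS.md is the reference for the real instruction set. The Phi in the merge block selects `a` or `b` depending on which predecessor control arrived from, and the checker verifies the defining SSA property that each value is assigned exactly once.

```rust
use std::collections::HashSet;

/// SSA value number (printed as %N) and block index into the function's block list.
type Value = u32;
type BlockId = usize;

#[derive(Debug)]
enum Instr {
    Param { dest: Value, index: usize },
    CmpGt { dest: Value, lhs: Value, rhs: Value },
    /// Picks the incoming value paired with the predecessor block control came from.
    Phi { dest: Value, incoming: Vec<(BlockId, Value)> },
}

/// Every basic block ends in exactly one terminator, making control flow explicit.
#[derive(Debug)]
enum Terminator {
    Branch { cond: Value, then_bb: BlockId, else_bb: BlockId },
    Jump(BlockId),
    Return(Value),
}

#[derive(Debug)]
struct BasicBlock {
    instrs: Vec<Instr>,
    term: Terminator,
}

impl Instr {
    fn dest(&self) -> Value {
        match self {
            Instr::Param { dest, .. } | Instr::CmpGt { dest, .. } | Instr::Phi { dest, .. } => *dest,
        }
    }
}

/// Returns true if no value is defined more than once across all blocks.
fn is_single_assignment(blocks: &[BasicBlock]) -> bool {
    let mut defined = HashSet::new();
    blocks.iter().flat_map(|b| &b.instrs).all(|i| defined.insert(i.dest()))
}

/// Hand-built IR for `int max(int a, int b) { return a > b ? a : b; }`.
fn build_max() -> Vec<BasicBlock> {
    vec![
        // bb0 (entry): %0 = a, %1 = b, %2 = a > b
        BasicBlock {
            instrs: vec![
                Instr::Param { dest: 0, index: 0 },
                Instr::Param { dest: 1, index: 1 },
                Instr::CmpGt { dest: 2, lhs: 0, rhs: 1 },
            ],
            term: Terminator::Branch { cond: 2, then_bb: 1, else_bb: 2 },
        },
        // bb1: the a > b path; bb2: the a <= b path
        BasicBlock { instrs: vec![], term: Terminator::Jump(3) },
        BasicBlock { instrs: vec![], term: Terminator::Jump(3) },
        // bb3 (merge): %3 = phi [bb1: %0], [bb2: %1]
        BasicBlock {
            instrs: vec![Instr::Phi { dest: 3, incoming: vec![(1, 0), (2, 1)] }],
            term: Terminator::Return(3),
        },
    ]
}
```

Because every value has a single definition, a backend can map each `%N` to one register or stack slot without tracking reassignments.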
Type System
The type system models C99 type semantics, including:
- Primitive types with architecture-specific sizes and alignments
- Derived types (pointers, arrays, structs, unions)
- Type qualifiers (const, volatile)
- Type compatibility rules
- Implicit and explicit conversion rules
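A sketch of how sizes, alignments, and one conversion rule might be computed is below. It assumes the LP64 data model, which both supported architectures use on Linux and Darwin (int is 4 bytes; long and pointers are 8). The `CType` enum and its methods are hypothetical, and unsigned types, `long double`, bit-fields, and unions are omitted. Struct layout places each field at the next multiple of its alignment and pads the total size to the struct's own alignment.

```rust
/// Hypothetical C type representation for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum CType {
    Char,
    Short,
    Int,
    Long,
    LongLong,
    Float,
    Double,
    Pointer(Box<CType>),
    Array(Box<CType>, usize),
    Struct(Vec<CType>),
}

/// Rounds `offset` up to the next multiple of `align`.
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

impl CType {
    /// Returns (size, alignment) in bytes, assuming LP64.
    fn layout(&self) -> (usize, usize) {
        match self {
            CType::Char => (1, 1),
            CType::Short => (2, 2),
            CType::Int | CType::Float => (4, 4),
            CType::Long | CType::LongLong | CType::Double | CType::Pointer(_) => (8, 8),
            CType::Array(elem, n) => {
                let (size, align) = elem.layout();
                (size * *n, align)
            }
            CType::Struct(fields) => {
                let mut offset: usize = 0;
                let mut max_align: usize = 1;
                for field in fields {
                    let (size, align) = field.layout();
                    // Insert padding so the field starts at a multiple of its alignment.
                    offset = align_up(offset, align) + size;
                    max_align = max_align.max(align);
                }
                // Tail padding keeps every element of an array of this struct aligned.
                (align_up(offset, max_align), max_align)
            }
        }
    }

    /// Integer promotion (C99 6.3.1.1): char and short operands become int.
    /// Other types are returned unchanged in this simplified model.
    fn promote(&self) -> CType {
        match self {
            CType::Char | CType::Short => CType::Int,
            other => other.clone(),
        }
    }
}
```

For example, `struct { char c; int i; char d; }` lays out its fields at offsets 0, 4, and 8 and has size 12 with alignment 4.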
Cross-platform Support
Architecture-specific code is isolated in dedicated modules:
- Abstract interfaces for code generation via traits
- Separate ABISpec implementations for each target
- Architecture-dependent type information
- Platform-specific predefined macros
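The trait-based separation can be sketched as follows. The trait `TargetAbi` and its methods are hypothetical, loosely modeled on the idea behind the project's ABISpec implementations rather than their actual signatures. The register lists follow the System V AMD64 ABI (six integer argument registers, return in `rax`) and AAPCS64 (`x0` to `x7`, return in `x0`). Stack placement of extra arguments is deliberately simplified, and platform variations such as Apple's arm64 handling of variadic and stack arguments are ignored.

```rust
/// Hypothetical per-target ABI description used by target-independent code.
trait TargetAbi {
    /// Registers for the first integer/pointer arguments, in order.
    fn int_arg_regs(&self) -> &'static [&'static str];
    /// Register that holds an integer return value.
    fn int_return_reg(&self) -> &'static str;
    /// Architecture macros the preprocessor predefines for this target.
    fn predefined_macros(&self) -> &'static [&'static str];
}

struct X86_64SysV;
struct Aarch64Aapcs64;

impl TargetAbi for X86_64SysV {
    fn int_arg_regs(&self) -> &'static [&'static str] {
        &["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
    }
    fn int_return_reg(&self) -> &'static str {
        "rax"
    }
    fn predefined_macros(&self) -> &'static [&'static str] {
        &["__x86_64__", "__LP64__"]
    }
}

impl TargetAbi for Aarch64Aapcs64 {
    fn int_arg_regs(&self) -> &'static [&'static str] {
        &["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"]
    }
    fn int_return_reg(&self) -> &'static str {
        "x0"
    }
    fn predefined_macros(&self) -> &'static [&'static str] {
        &["__aarch64__", "__LP64__"]
    }
}

/// Target-independent helper: where does the `index`-th integer argument live?
fn arg_location(abi: &dyn TargetAbi, index: usize) -> String {
    match abi.int_arg_regs().get(index) {
        Some(reg) => reg.to_string(),
        // Arguments beyond the register set go on the stack (slot numbering simplified).
        None => format!("stack[{}]", index - abi.int_arg_regs().len()),
    }
}
```

Code generation that only talks to `&dyn TargetAbi` stays the same for both backends; adding a target means adding one more implementation.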
Documentation
Project documentation lives in the docs/ directory:
- IR Instructions Reference (docs/IR_INSTRUCTIONS.md): reference for the IR instruction set
Future Work
The compiler continues to evolve toward full C99 compliance. Current areas of development include:
- Complete Standard Library Support: Integration with the C standard library
- Optimization Passes: Implementation of IR-level optimizations
- Extended Preprocessor Functionality: Support for additional GNU extensions
- Full AArch64 Support: Handling of more complex parameter-passing cases on ARM64 targets
- Inline Assembly: Support for embedded assembly in C code
- Full Floating-Point Support: Complete implementation of all floating-point operations
- Additional Warning Levels: Enhanced diagnostic capabilities
For those interested in contributing, the best starting points are:
- Adding test cases for existing functionality
- Improving error diagnostics
- Implementing missing C99 features