Cuckoo Hashing Performance Benchmarks
This repository contains a comparative study of various Cuckoo Hashing implementations. The project uses a dataset of 200 model names fetched from Hugging Face to evaluate the load factor, timing, and displacement (collision) characteristics of different hashing variants.
Overview of Variants
| Variant | Number of Tables | Number of Hash Functions | Key Features | Theoretical Load Factor |
|---|---|---|---|---|
| Original Cuckoo Hashing | 2 | 2 | Basic displacement of keys, constant-time lookups | ~50% |
| d-ary Cuckoo Hashing | 3 | 3 | Uses multiple tables and hash functions, higher stability | ~91% |
| Cuckoo Hashing with a Stash | 2 | 2 | Uses a small auxiliary stash for overflow keys | ~99% |
| Blocked Cuckoo Hashing | 2 | 2 | Buckets hold multiple keys, improves cache locality | ~95% |
Benchmark Results
Testing was performed by inserting 200 items into a system with a total capacity of approximately 400 slots, targeting a 50% load factor.
GCC Performance Summary (-O3)
| Variant | Avg Time (s) | Avg Load Factor | Avg Displacements |
|---|---|---|---|
| Blocked Cuckoo (b=4) | 0.00000690 | 0.5000 | 41.00 |
| d-ary Cuckoo (d=3) | 0.00000820 | 0.4975 | 143.00 |
| Cuckoo Hashing with Stash | 0.00001350 | 0.5000 | 346.00 |
| Original Cuckoo Hashing | 0.00001440 | 0.4975 | 308.00 |
Compiler Comparison: GCC vs. CCC
A comparison was conducted between the standard GCC and the Claude C Compiler (CCC).
| Variant | GCC Avg Time (s) | CCC Avg Time (s) | Performance Gap |
|---|---|---|---|
| Original | 0.00001440 | 0.00004230 | ~2.9x (GCC faster) |
| d-ary | 0.00000820 | 0.00003150 | ~3.8x (GCC faster) |
| Stash | 0.00001350 | 0.00004120 | ~3.0x (GCC faster) |
| Blocked | 0.00000690 | 0.00001900 | ~2.7x (GCC faster) |
Requirements
To run the full benchmark and automation suite, you need the following:
- GCC: The standard GNU C Compiler.
- CCC: The Claude C Compiler (optional, script will auto-detect if missing).
- Python 3.x: Used for data generation and report formatting.
- Python
requestslibrary: Required forgenerate_csv.pyto fetch model names.
pip install requestsGetting Started
The easiest way to regenerate the data, rebuild the binaries, and run the benchmarks is via the master automation script:
./run.shOptions
--no-ccc: Force skip the CCC compiler benchmarks even ifcccis found in your path.
Repository Structure
run.sh: Master automation script that manages data generation, compilation, and reporting.generate_csv.py: Python script used to fetch 200 model names from Hugging Face and generate test data.models_test_data.csv: The generated dataset used for testing.variants/:original.c,dary.c,stash.c,blocked.c: C implementations of each variant.report.py: Format the raw execution results into a side-by-side compiler comparison.asm/: Assembly outputs generated by both GCC and CCC.compare_compilers.sh: Legacy shell-based benchmark script.
cuckoo_testbed.c: Primary C testbed for initial variant benchmarking.LICENSE: MIT License.
Assembly Analysis
Assembly outputs are included in variants/asm/ to allow for close inspection of how each compiler optimizes the tight hashing and eviction loops. GCC generally produces more compact code (7-9KB) compared to CCC (14-16KB), reflecting its more aggressive optimization strategies.
License
This project is licensed under the MIT License - see the LICENSE file for details.