go-false-sharing-bench
A small Go micro-benchmark to demonstrate and measure false sharing vs cache-line padding at varying numbers of goroutines and OS threads.
Project Layout
.
├── main.go # Go benchmark driver
├── plot_fs.py # Python script to plot results
├── results.csv # Generated benchmark data
├── README.md # This documentation
└── .gitignore
Prerequisites
- Go 1.21+
- Python 3 with pandas & matplotlib
- bash (Linux/macOS)
Build
go build -o bench-fs
Collect Benchmark Data
- Determine your logical CPU count (the command below is macOS-specific; on Linux, use nproc):
cpu_count=$(sysctl -n hw.logicalcpu)
- Create (or clear) results.csv with the header row:
echo "writers,procs,unpadded_ms,padded_ms" > results.csv
- Sweep over writer counts (2, 4, 8) and procs from 1 to cpu_count:
for w in 2 4 8; do
  for p in $(seq 1 $cpu_count); do
    ./bench-fs -writers=$w -procs=$p >> results.csv
  done
done
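The sysctl invocation only works on macOS. A small portable sketch that also covers Linux (where nproc is the usual tool) might look like this:

```shell
# Pick whichever logical-CPU counter exists on this OS:
# nproc on Linux, sysctl -n hw.logicalcpu on macOS.
if command -v nproc >/dev/null 2>&1; then
  cpu_count=$(nproc)
else
  cpu_count=$(sysctl -n hw.logicalcpu)
fi
echo "detected $cpu_count logical CPUs"
```

The resulting cpu_count can then feed the seq loop in the sweep above.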
This produces lines like:
writers,procs,unpadded_ms,padded_ms
2,1,47.890,36.026
2,2,38.120,28.514
...
8,10,80.123,30.456
Plotting
python plot_fs.py
A chart will pop up showing runtime vs GOMAXPROCS for 2, 4, and 8 writers: solid lines for unpadded, dashed for padded.
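The plotting logic of plot_fs.py is not shown here; a minimal sketch, assuming only the CSV columns listed above (the function name plot_results is illustrative), could be:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_results(df):
    """One solid (unpadded) and one dashed (padded) line per writer count,
    plotted against GOMAXPROCS."""
    fig, ax = plt.subplots()
    for writers, grp in df.groupby("writers"):
        ax.plot(grp["procs"], grp["unpadded_ms"], linestyle="-",
                label=f"{writers} writers, unpadded")
        ax.plot(grp["procs"], grp["padded_ms"], linestyle="--",
                label=f"{writers} writers, padded")
    ax.set_xlabel("GOMAXPROCS")
    ax.set_ylabel("runtime (ms)")
    ax.legend()
    return fig
```

The real script presumably loads results.csv with pd.read_csv and calls plt.show() to pop up the window.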
What You’ll Learn
- False sharing: multiple cores writing adjacent fields on the same cache line causes costly coherence traffic.
- Padding: inserting 56 bytes ([7]uint64) between fields places each on its own 64-byte cache line, restoring full parallel throughput.
- How parallel efficiency scales (or stalls) as you crank up GOMAXPROCS.
License
MIT © Your Name
Created April 25, 2025
Updated April 25, 2025