go-false-sharing-bench
A small Go micro-benchmark to demonstrate and measure false sharing vs cache-line padding at varying numbers of goroutines and OS threads.
Project Layout
.
├── main.go # Go benchmark driver
├── plot_fs.py # Python script to plot results
├── results.csv # Generated benchmark data
├── README.md # This documentation
└── .gitignore
Prerequisites
- Go 1.21+
- Python 3 with pandas & matplotlib
- bash (Linux/macOS)
Build
go build -o bench-fs
Collect Benchmark Data
- Determine your logical CPU count (the command below is macOS-specific; on Linux, use nproc):
cpu_count=$(sysctl -n hw.logicalcpu)
- Create (or clear) results.csv with the header row:
echo "writers,procs,unpadded_ms,padded_ms" > results.csv
- Sweep over writer counts (2, 4, 8) and procs from 1 to cpu_count:
for w in 2 4 8; do
  for p in $(seq 1 $cpu_count); do
    ./bench-fs -writers=$w -procs=$p >> results.csv
  done
done
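The sysctl invocation only works on macOS. A small portable sketch that also covers Linux (where nproc is the usual tool) might look like this:

```shell
# Pick whichever logical-CPU counter exists on this OS:
# nproc on Linux, sysctl -n hw.logicalcpu on macOS.
if command -v nproc >/dev/null 2>&1; then
  cpu_count=$(nproc)
else
  cpu_count=$(sysctl -n hw.logicalcpu)
fi
echo "detected $cpu_count logical CPUs"
```

The resulting cpu_count can then feed the seq loop in the sweep above.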
This produces lines like:
writers,procs,unpadded_ms,padded_ms
2,1,47.890,36.026
2,2,38.120,28.514
...
8,10,80.123,30.456
Plotting
python plot_fs.py
A chart will pop up showing runtime vs GOMAXPROCS for 2, 4, and 8 writers: solid lines for unpadded, dashed for padded.
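The plotting logic of plot_fs.py is not shown here; a minimal sketch, assuming only the CSV columns listed above (the function name plot_results is illustrative), could be:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_results(df):
    """One solid (unpadded) and one dashed (padded) line per writer count,
    plotted against GOMAXPROCS."""
    fig, ax = plt.subplots()
    for writers, grp in df.groupby("writers"):
        ax.plot(grp["procs"], grp["unpadded_ms"], linestyle="-",
                label=f"{writers} writers, unpadded")
        ax.plot(grp["procs"], grp["padded_ms"], linestyle="--",
                label=f"{writers} writers, padded")
    ax.set_xlabel("GOMAXPROCS")
    ax.set_ylabel("runtime (ms)")
    ax.legend()
    return fig
```

The real script presumably loads results.csv with pd.read_csv and calls plt.show() to pop up the window.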
What You’ll Learn
- False sharing: multiple cores writing adjacent fields on the same cache line causes costly coherence traffic.
- Padding: inserting 56 bytes ([7]uint64) between fields places each on its own 64-byte cache line, restoring full parallel throughput.
- How parallel efficiency scales (or stalls) as you crank up GOMAXPROCS.
License
MIT © Your Name
Created April 25, 2025
Updated April 25, 2025