tombelieber/go-false-sharing-bench

go-false-sharing-bench

A small Go micro-benchmark to demonstrate and measure false sharing vs cache-line padding at varying numbers of goroutines and OS threads.

Project Layout

.
├── main.go         # Go benchmark driver
├── plot_fs.py      # Python script to plot results
├── results.csv     # Generated benchmark data
├── README.md       # This documentation
└── .gitignore

Prerequisites

  • Go 1.21+
  • Python 3 with pandas & matplotlib
  • bash (Linux/macOS)

Build

go build -o bench-fs

Collect Benchmark Data

  1. Determine your logical CPU count (the command below is for macOS; on Linux use cpu_count=$(nproc) instead):

    cpu_count=$(sysctl -n hw.logicalcpu)
  2. Create (or clear) results.csv with header:

    echo "writers,procs,unpadded_ms,padded_ms" > results.csv
  3. Sweep over writer counts (2, 4, 8) and procs from 1 to cpu_count:

    for w in 2 4 8; do
      for p in $(seq 1 $cpu_count); do
        ./bench-fs -writers=$w -procs=$p >> results.csv
      done
    done

This produces lines like:

writers,procs,unpadded_ms,padded_ms
2,1,47.890,36.026
2,2,38.120,28.514
...
8,10,80.123,30.456
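
To sanity-check the collected rows without plotting, the ratio unpadded_ms / padded_ms can be computed for each row. The Go sketch below is not part of the repository: row, parseResults, and speedup are hypothetical names, and the sample data simply reuses the three example rows shown above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// row mirrors one data line of results.csv (hypothetical type).
type row struct {
	Writers, Procs       int
	UnpaddedMs, PaddedMs float64
}

// parseResults parses CSV text whose first line is the header
// writers,procs,unpadded_ms,padded_ms.
func parseResults(data string) ([]row, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	var rows []row
	for i, rec := range records {
		if i == 0 {
			continue // skip the header line
		}
		if len(rec) != 4 {
			return nil, fmt.Errorf("line %d: want 4 fields, got %d", i+1, len(rec))
		}
		var r row
		if r.Writers, err = strconv.Atoi(rec[0]); err != nil {
			return nil, err
		}
		if r.Procs, err = strconv.Atoi(rec[1]); err != nil {
			return nil, err
		}
		if r.UnpaddedMs, err = strconv.ParseFloat(rec[2], 64); err != nil {
			return nil, err
		}
		if r.PaddedMs, err = strconv.ParseFloat(rec[3], 64); err != nil {
			return nil, err
		}
		rows = append(rows, r)
	}
	return rows, nil
}

// speedup reports how many times faster the padded run was than the unpadded one.
func speedup(r row) float64 { return r.UnpaddedMs / r.PaddedMs }

// sample reuses the example rows from this README.
const sample = `writers,procs,unpadded_ms,padded_ms
2,1,47.890,36.026
2,2,38.120,28.514
8,10,80.123,30.456
`

func main() {
	rows, err := parseResults(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range rows {
		fmt.Printf("writers=%d procs=%d speedup=%.2fx\n", r.Writers, r.Procs, speedup(r))
	}
}
```

For real data, the contents of results.csv could be read with os.ReadFile and passed to parseResults as a string.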

Plotting

python3 plot_fs.py

A chart will pop up showing runtime (ms) vs GOMAXPROCS for 2, 4, and 8 writers, with solid lines for the unpadded layout and dashed lines for the padded one.

What You’ll Learn

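  • False sharing: when multiple cores write to different fields that sit on the same cache line, each write invalidates that line in the other cores' caches, causing costly coherence traffic.
  • Padding: inserting 56 bytes ([7]uint64) between fields places neighboring fields on separate cache lines (assuming 64-byte lines), so writers stop invalidating each other's lines.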
  • How parallel efficiency scales (or stalls) as you crank up GOMAXPROCS.
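
The two layouts described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the repository's main.go: the slot types, hammer, and the -iters flag are hypothetical, while -writers and -procs mirror the flags used in the sweep. It assumes 64-byte cache lines, so on hardware with wider lines the padded layout may still share a line.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
	"unsafe"
)

// unpaddedSlot holds only a counter, so consecutive slice elements sit 8 bytes apart.
type unpaddedSlot struct{ n uint64 }

// paddedSlot follows the counter with 56 bytes of padding,
// so consecutive slice elements sit 64 bytes apart.
type paddedSlot struct {
	n uint64
	_ [7]uint64
}

// Element strides, evaluated at compile time.
const (
	unpaddedStride = unsafe.Sizeof(unpaddedSlot{}) // 8 bytes
	paddedStride   = unsafe.Sizeof(paddedSlot{})   // 64 bytes
)

// hammer starts one goroutine per counter; each goroutine increments only its
// own counter iters times. Atomic adds keep the compiler from holding the
// counter in a register, so every increment touches memory.
func hammer(counters []*uint64, iters int) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for _, c := range counters {
		wg.Add(1)
		go func(c *uint64) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				atomic.AddUint64(c, 1)
			}
		}(c)
	}
	wg.Wait()
	return time.Since(start)
}

// ms converts a duration to fractional milliseconds.
func ms(d time.Duration) float64 { return float64(d) / float64(time.Millisecond) }

func main() {
	writers := flag.Int("writers", 2, "number of writer goroutines")
	procs := flag.Int("procs", runtime.NumCPU(), "value passed to runtime.GOMAXPROCS")
	iters := flag.Int("iters", 5_000_000, "increments per writer (hypothetical flag)")
	flag.Parse()
	runtime.GOMAXPROCS(*procs)

	unpadded := make([]unpaddedSlot, *writers)
	padded := make([]paddedSlot, *writers)
	uPtrs := make([]*uint64, *writers)
	pPtrs := make([]*uint64, *writers)
	for i := 0; i < *writers; i++ {
		uPtrs[i] = &unpadded[i].n
		pPtrs[i] = &padded[i].n
	}

	u := hammer(uPtrs, *iters)
	p := hammer(pPtrs, *iters)

	// One CSV row in the results.csv format: writers,procs,unpadded_ms,padded_ms.
	fmt.Printf("%d,%d,%.3f,%.3f\n", *writers, *procs, ms(u), ms(p))
}
```

With -procs=1 only one goroutine runs at a time, so the two layouts would be expected to perform similarly; any gap should appear once writers actually run on different cores.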

License

MIT © Your Name