Cross-GPU IPC Test Results on Intel Arc Pro B60
Test 1: Bypass-Cache LSC (Atomic Flag Replacement)
Binary: test_ipc_bypass_lsc
Description: GPU A stores a 64-bit value with a bypass-cache store (lsc_store.ugm.uc.uc). GPU B spins on the same IPC-mapped address with a bypass-cache load (lsc_load.ugm.uc.uc) until the value appears.
$ mpirun -np 2 ./test_ipc_bypass_lsc
[Rank 0] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 1] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 0] IPC mapping established.
[Rank 1] IPC mapping established.
[Rank 1] Submitting while-load kernel (spinning on remote memory)...
[Rank 0] Bypass-cache storing value: 956397711170
[Rank 0] Uncached store done.
[Rank 1] While-load completed!
[Rank 1] Spin count: 765 iterations
[Rank 1] Loaded value: 956397711170 (expected: 956397711170)
[Rank 1] PASSED!
=== Test completed ===
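The handoff in Test 1 can be illustrated with a host-side analogy. The sketch below is not the test's GPU kernel: it assumes two CPU threads standing in for the two GPUs, a `std::atomic` standing in for the IPC-mapped 64-bit slot, and relaxed atomic accesses as a loose stand-in for the uncached LSC store and load. The names `SpinResult` and `run_flag_handoff` are hypothetical, and the payload is assumed to be nonzero so that 0 can mean "not written yet".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical host-side analog of Test 1 (not the actual GPU kernels).
struct SpinResult {
    std::uint64_t value;  // value the reader finally observed
    std::uint64_t spins;  // number of loads that still returned 0
};

// Assumption: payload != 0, because 0 marks the slot as "not written yet".
inline SpinResult run_flag_handoff(std::uint64_t payload) {
    std::atomic<std::uint64_t> slot{0};  // stands in for the IPC-mapped address
    SpinResult result{0, 0};

    // Reader ("GPU B"): poll the slot until a nonzero value appears.
    std::thread reader([&] {
        std::uint64_t v;
        while ((v = slot.load(std::memory_order_relaxed)) == 0) {
            ++result.spins;
        }
        result.value = v;
    });

    // Writer ("GPU A"): a single store of the payload.
    std::thread writer([&] {
        slot.store(payload, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();  // join makes the reader's writes to result visible here
    return result;
}
```

Calling `run_flag_handoff(956397711170ULL)` should return that same value in `value`. The spin count depends on thread scheduling, just as the 765 iterations in the log depend on kernel launch timing.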
Test 2: Acquire-Release Pattern (Data + Flag with Fence)
Binary: test_ipc_rel_acq
Description: GPU A writes 1024 data elements, issues a system-scope release fence (lsc_fence.ugm.evict.sysrel), then sets a flag via a bypass-cache store. GPU B spins on the flag via a bypass-cache load, issues a system-scope acquire fence (lsc_fence.ugm.invalidate.sysacq), then reads and verifies all 1024 data elements.
$ mpirun -np 2 ./test_ipc_rel_acq
[Rank 0] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 1] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 0] IPC mapping established.
[Rank 1] IPC mapping established.
[Rank 1] Reader kernel submitted, spinning on flag...
[Rank 0] Writing 1024 elements + setting flag...
[Rank 0] Writer done (data + fence + flag).
[Rank 1] Flag detected after 0 spins
[Rank 1] Data verification: 1024/1024 correct
[Rank 1] PASSED!
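The ordering that Test 2 relies on (data, release fence, flag on the writer; flag, acquire fence, data on the reader) can be sketched on the host with C++ fences. This is an analogy under stated assumptions, not the test's kernel code: CPU threads replace the GPUs, `std::atomic_thread_fence` replaces the system-scope LSC fences, and the data pattern `i + 1` is invented for illustration because the log does not say what the 1024 elements contain. The name `run_release_acquire` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical host-side analog of Test 2 (not the actual GPU kernels).
// Returns how many of the n elements the reader observed correctly.
inline std::size_t run_release_acquire(std::size_t n) {
    std::vector<std::uint32_t> data(n, 0);  // plain, non-atomic payload
    std::atomic<std::uint32_t> flag{0};     // 0 = not ready, 1 = ready
    std::size_t correct = 0;

    // Reader ("GPU B"): spin on the flag, then fence, then read the data.
    std::thread reader([&] {
        while (flag.load(std::memory_order_relaxed) == 0) {
        }
        // Acquire fence: data reads below cannot be ordered before the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        for (std::size_t i = 0; i < n; ++i) {
            if (data[i] == static_cast<std::uint32_t>(i + 1)) {
                ++correct;
            }
        }
    });

    // Writer ("GPU A"): write the data, then fence, then set the flag.
    std::thread writer([&] {
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = static_cast<std::uint32_t>(i + 1);  // assumed pattern
        }
        // Release fence: data writes above cannot be ordered after the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return correct;
}
```

With both fences in place, `run_release_acquire(1024)` is expected to return 1024, matching the "1024/1024 correct" line in the log. Removing either fence makes the non-atomic data accesses a data race in the C++ model, which is the host-side counterpart of the reader seeing stale data.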
Summary
| Test | Pattern | Result |
|---|---|---|
| test_ipc_bypass_lsc | Bypass-cache LSC store + while-load | PASSED (765 spins) |
| test_ipc_rel_acq | Data write + system fence + flag store / flag load + acquire fence + data read | PASSED (0 spins, 1024/1024 correct) |