
leizhenyuan/gpu_cas_pattern

Cross-GPU IPC Test Results on Intel Arc Pro B60

Test 1: Bypass-Cache LSC (Atomic Flag Replacement)

Binary: test_ipc_bypass_lsc

Description: GPU A (rank 0) stores a 64-bit value with a bypass-cache store (lsc_store.ugm.uc.uc). GPU B (rank 1) spins on the same IPC-mapped address with bypass-cache loads (lsc_load.ugm.uc.uc) until the value appears.

$ mpirun -np 2 ./test_ipc_bypass_lsc

[Rank 0] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 1] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 0] IPC mapping established.
[Rank 1] IPC mapping established.
[Rank 1] Submitting while-load kernel (spinning on remote memory)...
[Rank 0] Bypass-cache storing value: 956397711170
[Rank 0] Uncached store done.
[Rank 1] While-load completed!
[Rank 1] Spin count: 765 iterations
[Rank 1] Loaded value: 956397711170 (expected: 956397711170)
[Rank 1] PASSED!

=== Test completed ===
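The Test 1 handshake can be modeled on the CPU. What follows is a hedged C++ analogue, not the repository's GPU code. Two std::thread workers stand in for the two ranks. A std::atomic<std::uint64_t> stands in for the IPC-mapped address. Relaxed atomic loads and stores stand in for the uncached LSC accesses. The names run_bypass_spin and SpinResult, and the use of 0 as an "empty" sentinel, are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// CPU analogue of Test 1 (a sketch, not the GPU kernels): one thread
// publishes a single 64-bit value and another spins until it appears.
// std::atomic loads/stores stand in for the uncached
// lsc_store.ugm.uc.uc / lsc_load.ugm.uc.uc accesses.

struct SpinResult {
    std::uint64_t value;  // value the reader finally observed
    std::uint64_t spins;  // polls that still saw the "empty" sentinel
};

// Assumption: 0 is reserved as the "not yet written" sentinel.
SpinResult run_bypass_spin(std::uint64_t payload) {
    assert(payload != 0 && "0 is reserved as the empty sentinel");
    std::atomic<std::uint64_t> slot{0};
    SpinResult result{0, 0};

    std::thread reader([&] {
        std::uint64_t v;
        // Re-load until the writer's value becomes visible.
        while ((v = slot.load(std::memory_order_relaxed)) == 0) {
            ++result.spins;
        }
        result.value = v;
    });

    std::thread writer([&] {
        // A single atomic store; no other data is published, so no
        // ordering stronger than relaxed is needed.
        slot.store(payload, std::memory_order_relaxed);
    });

    // join() makes the reader's writes to result visible to this thread.
    writer.join();
    reader.join();
    return result;
}
```

Calling run_bypass_spin(956397711170) should return that value in .value. The spin count depends on thread scheduling, so it cannot be compared with the 765 iterations reported on the B60.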

Test 2: Acquire-Release Pattern (Data + Flag with Fence)

Binary: test_ipc_rel_acq

Description: GPU A (rank 0) writes 1024 data elements, issues a system-scope release fence (lsc_fence.ugm.evict.sysrel), then sets a flag with a bypass-cache store. GPU B (rank 1) spins on the flag with bypass-cache loads, issues a system-scope acquire fence (lsc_fence.ugm.invalidate.sysacq), then reads and verifies all 1024 data elements.

$ mpirun -np 2 ./test_ipc_rel_acq

[Rank 0] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 1] Device: Intel(R) Arc(TM) Pro B60 Graphics
[Rank 0] IPC mapping established.
[Rank 1] IPC mapping established.
[Rank 1] Reader kernel submitted, spinning on flag...
[Rank 0] Writing 1024 elements + setting flag...
[Rank 0] Writer done (data + fence + flag).
[Rank 1] Flag detected after 0 spins
[Rank 1] Data verification: 1024/1024 correct
[Rank 1] PASSED!
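The fence placement in Test 2 maps fairly directly onto C++ standalone fences. In this hedged CPU sketch, std::atomic_thread_fence(std::memory_order_release) and std::atomic_thread_fence(std::memory_order_acquire) play the roles of lsc_fence.ugm.evict.sysrel and lsc_fence.ugm.invalidate.sysacq. Relaxed operations on an atomic flag play the role of the bypass-cache flag store and load. The names run_rel_acq and RelAcqResult, and the i * 3 + 1 data pattern, are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// CPU analogue of Test 2 (a sketch, not the GPU kernels): the writer fills a
// plain data buffer, issues a release fence, then sets a flag; the reader
// spins on the flag, issues an acquire fence, then reads the data.

constexpr std::size_t kNumElems = 1024;  // same element count as the test

// Hypothetical data pattern: element i holds i * 3 + 1.
inline std::uint32_t expected_value(std::size_t i) {
    return static_cast<std::uint32_t>(i * 3 + 1);
}

struct RelAcqResult {
    std::size_t correct;  // number of elements that matched
    std::uint64_t spins;  // flag polls before the flag was seen
};

RelAcqResult run_rel_acq() {
    std::vector<std::uint32_t> data(kNumElems, 0);  // non-atomic payload
    std::atomic<std::uint32_t> flag{0};
    RelAcqResult result{0, 0};

    std::thread reader([&] {
        // Spin on the flag with relaxed loads (stand-in for bypass-cache loads).
        while (flag.load(std::memory_order_relaxed) == 0) {
            ++result.spins;
        }
        // Acquire fence: pairs with the writer's release fence, so the data
        // writes made before that fence are visible to the reads below.
        std::atomic_thread_fence(std::memory_order_acquire);
        for (std::size_t i = 0; i < kNumElems; ++i) {
            if (data[i] == expected_value(i)) ++result.correct;
        }
    });

    std::thread writer([&] {
        for (std::size_t i = 0; i < kNumElems; ++i) {
            data[i] = expected_value(i);
        }
        // Release fence: orders the data writes before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return result;
}
```

The data buffer is an ordinary, non-atomic vector. The fence pair is what makes reading it after the flag race-free. This matches the test description, which mentions bypass-cache accesses only for the flag. A correct run reports correct == 1024. As in the sketch above, the spin count is scheduling-dependent.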

Summary

| Test | Pattern | Result |
|---|---|---|
| test_ipc_bypass_lsc | Bypass-cache LSC store + while-load | PASSED (765 spins) |
| test_ipc_rel_acq | Data write + system fence + flag store / flag load + acquire fence + data read | PASSED (0 spins, 1024/1024 correct) |