WE

Wenox/fast-fw

Optimized implementation of Floyd Warshall algorithm using modern AVX2.

all-pairs-shortest-path asm cpp floyd-warshall-algorithm graph simd

Fast Floyd Warshall

Optimized implementation of Floyd Warshall algorithm using modern AVX2 extended instruction set.

SIMD vector instructions allow efficient parallelization on the CPU level.

The results become even better when the graph weights are limited to the range of 8 or 16 bits.

Results

Assembly implementation completely outperforms C++ implementation even when -O3 compiler flag is specified.

This was tested on both sparse and dense graphs of large sizes.

Comparison including C++

Comparison excluding C++

November, 2020.

On this page

Languages

C++64.8%Assembly30.5%C4.8%

Contributors

MIT License

Created July 25, 2021

Updated April 15, 2024