This repo provides complementary demonstrations for a web article by Christos Argyropoulos:
The Quest for Performance Part II : Perl vs Python
The inspiration came from reading Comparison of various GPU acceleration frameworks using matrix-vector multiplication, by Thomas Germer. If you try Mr. Germer's repo, comment out the line ti.loop_config(block_dim=N) for better performance on desktop GPUs. In addition, set block_size = 32. My runs used m, n = 8192, 8192.
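For readers who want a CPU reference to check GPU output against, here is a minimal sketch of the matrix-vector product in plain Python. The names matvec and make_problem are hypothetical helpers made up for this illustration; they do not come from this repo or from Mr. Germer's. The demo uses a small size so it finishes quickly, whereas the runs above used m, n = 8192, 8192.

```python
import random


def matvec(a, x):
    """Return y = A x for a matrix stored as a list of rows."""
    # Each output element is the dot product of one row of A with x.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def make_problem(m, n, seed=0):
    """Build a random m-by-n matrix and a length-n vector (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    a = [[rng.random() for _ in range(n)] for _ in range(m)]
    x = [rng.random() for _ in range(n)]
    return a, x


if __name__ == "__main__":
    # Small size for a quick check; pure Python is far too slow for 8192 x 8192.
    m, n = 64, 64
    a, x = make_problem(m, n)
    y = matvec(a, x)
    print(len(y))
```

Comparing a few elements of a GPU result against matvec on a small problem is a cheap sanity check before timing the large case.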
Contributors
MIT License
Created August 1, 2024
Updated May 21, 2025