53 results for “topic:ptx”
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
ILGPU JIT Compiler for high-performance .Net GPU programs
row-major matmul optimization
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
Free software file format parser for Avid ProTools sessions
TeraXLang - Triton Extension for LLM. As fast as FlashAttention FlashMLA, etc.
Energinets Model Testbench. Automate gridcompliance studies in PSCAD and Powerfactory.
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Binary Ninja plugin for reverse engineering PTX -- the virtual instruction set architecture of CUDA-based GPUs.
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.
Web knowledge is fragmented — duplicated across fonts, embeddings, metadata, and renderings. Humans see pixels, AI sees tokens, neither shares the source. Knowledge3D: a sovereign GPU-native reference implementation for W3C PM-KR, where humans and AI consume the same procedural knowledge from one source.
CUDA kernels in any language supported by LLVM
Speed boost using: Assembly, GPU and WASM
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
Set of examples written for hardware acceleration via TornadoVM
Compile Rust into PTX
Inline PTX Assembly in CUDA example
Compile MLIR to PTX and execute it on NVIDIA GPUs
Optimizing GPU compiler and database system for NVIDIA hardware
VeriBlock CUDA PoW Miner
Bloch's equations and Optimal Control for MRI and NMR applications
FastPtx: a python pTx pulse design tool for freely optimizing RF and gradient pulses with autodifferentiation
Visual Studio Code extension with PTX assembly syntax support
Rust bindings to the NVIDIA NVBIT binary instrumentation API
公共運輸整合資訊流通服務平臺(Public Transport Data eXchange,PTX)的非官方 Golang 用戶端程式庫
🎉持续更新:CUDA 12.2 PTX-ISA-8.2学习笔记,部分中文翻译 + 个人理解 + 内联汇编示例,讲解CUDA 12.2 PTX-ISA-8.2 汇编指令;进行中.....
zCUDA: 🔗 CUDA Binding Layer | ⚡ CUDA Kernel DSL. Write CUDA kernels in pure Zig. Or keep your C++ kernels. Ship them all to GPU.
PTX Inject and Stack PTX for Python
compile linux to PTX / nvidia ISA and run on a single GPU core sequentially or parallelize the code and run on multiple; check feasibility