279 results for “topic:triton”
Efficient Triton Kernels for LLM Training
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A service for autodiscovery and configuration of applications running in containers
FlagGems is an operator library for large language models implemented in the Triton Language.
Experiments with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges, with automatic deobfuscation using symbolic execution, taint analysis, and LLVM.
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI-Generated Content (AIGC), and related datasets and applications.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
LLVM based static binary analysis framework
Automatic ROPChain Generation
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
SymGDB - symbolic execution plugin for gdb
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
NVIDIA-accelerated, deep learned model support for image space object detection
FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton. Streaming Sinkhorn with O(nd) memory.
A performance library for machine learning applications.
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Ozoz dotfiles for bspwm, i3WM
Triton implementation of FlashAttention2 that adds Custom Masks.
nanoRLHF: a from-scratch journey into how LLMs and RLHF really work.
ClearML - Model-Serving Orchestration and Repository Solution
Training-free, post-training efficient sub-quadratic-complexity attention, implemented with OpenAI Triton.
Resources About Dynamic Binary Instrumentation and Dynamic Binary Analysis
(WIP) A deployment framework that aims to provide simple, lightweight, fast-to-integrate, pipelined deployment for algorithm services, ensuring reliability, high concurrency, and scalability.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU