279 results for “topic:triton”
Efficient Triton Kernels for LLM Training
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A service for autodiscovery and configuration of applications running in containers
FlagGems is an operator library for large language models implemented in the Triton Language.
Experiments with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges, with automatic deobfuscation using symbolic execution, taint analysis, and LLVM.
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI-Generated Content (AIGC), and related datasets and applications.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
LLVM based static binary analysis framework
Automatic ROPChain Generation
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
SymGDB - symbolic execution plugin for gdb
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
NVIDIA-accelerated, deep learned model support for image space object detection
FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton. Streaming Sinkhorn with O(nd) memory.
A performance library for machine learning applications.
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Ozoz dotfiles for bspwm, i3WM
Triton implementation of FlashAttention2 that adds Custom Masks.
nanoRLHF: a from-scratch journey into how LLMs and RLHF really work.
ClearML - Model-Serving Orchestration and Repository Solution
Training-free, post-training efficient sub-quadratic-complexity attention, implemented with OpenAI Triton.
Resources About Dynamic Binary Instrumentation and Dynamic Binary Analysis
(WIP) A deployment framework that aims to provide simple, lightweight, fast-to-integrate, pipelined deployment for algorithm services, ensuring reliability, high concurrency, and scalability.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU