GitHunt — Discover GitHub Repositories

Dynamic Attention Mask (DAM) generate adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead without fine-tuning.

Python120Updated 1 month ago

efficient-aiinference-optimizationsparse-attention

wenhao728/VORTA

The code implementation of paper "VORTA: Efficient Video Diffusion via Routing Sparse Attention"

Python110Updated 1 month ago

diffusion-modelssparse-attentionvideo-diffusion-model

zhenyi4/ssa

Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"

Python101Updated 6 days ago

efficiencyefficient-attentionllmpre-trainingsparse-attention

Superposition09m/Awesome-LM-Architecture

A Curated Collection of Frontier Language Model Architectures

80Updated 1 month ago

diffusion-llmlinear-attentionmoesparse-attentiontest-time-learning

Lanerra/DWARF

O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet field + content-gated Q·K gather at dyadic offsets.

Python30Updated 17 hours ago

ablationablation-studyattention-mechanismdeep-learningefficient-attention+13

sidcraftscode/Hydra

Toy Hydra prototypes: SSM + sparse attention + MoE + memory; synthetic benchmarks. Paper: https://arxiv.org/abs/2508.15099

Python20Updated 3 months ago

benchmarkinglanguage-modellong-contextmemorymixture-of-experts+4

Iron-Bound/native-sparse-attention

Building Native Sparse Attention

Python10Updated 1 year ago

deep-learningflash-attentionsparse-attention

NetBr3ak/HSPMN

HSPMN: Hybrid Sparse-Predictive Matter Network - LLM architecture optimized for Blackwell GPUs bridging O(N) and O(N^2) routing via ALF-LB

Python10Updated 11 hours ago

artificial-intelligencedeep-learningllm-architecturemachine-learningneural-networks+6

wesleyscholl/drex

🦀 a rust project for reinventing current llm architecture to be highly efficient, more scalable and better performing ✨

Python10Updated 4 hours ago

beyond-transformercandlecatastrophic-forgettingcognitive-architecturecontinual-learning+15

li-guohao/asam-attention

Adaptive Sparse Attention Module with Flash Attention - 5.45x speedup on consumer GPUs

Python10Updated 1 month ago

attention-mechanismcudadeep-learningefficient-inferenceflash-attention+4

TokyozxcSpedy/benchmark_moe

🔧 Optimize MoE model inference performance with automated Triton kernel tuning in the vLLM framework for various architectures and hardware setups.

Python10Updated 1 hour ago

benchmarkbenchmarkinglanguage-modellong-contextmemory+7

RaphaelMouravieff/TabStruct

Code for ACL 2025 paper: "Structural Deep Encoding for Table Question Answering"

Jupyter Notebook00Updated 7 months ago

acl2025benchmarkdeep-learningnatural-language-processingnlp+10

chaowei312/HyperGraph-Sparse-Attention

Sparse attention via hypergraph partitioning for efficient long-context transformers

Python00Updated 3 weeks ago

attention-mechanismdeep-learningpytorchsparse-attentiontransformers

Page 1 of 2