2 results for “topic:attention-sink”
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
🐙 Implements Flash Attention with an attention sink for gpt-oss-20b; includes test.py. Work in progress: backward pass, varlen support, and syncing with the community convention of returning only softmax_lse.
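For context on what "attention with sink" means in the gpt-oss-style formulation, here is a minimal NumPy sketch. It assumes a per-head learnable sink logit that joins the softmax but contributes no value vector, so attention mass can drain into the sink instead of being forced onto the keys; the function name and shapes are illustrative, not the repository's API.

```python
import numpy as np

def attention_with_sink(q, k, v, sink):
    """Single-query attention with a sink logit.

    q: (d,) query; k, v: (n, d) keys/values; sink: scalar sink logit.
    """
    scores = k @ q / np.sqrt(q.shape[0])
    # Append the sink logit before the softmax; it competes for
    # probability mass but has no associated value row.
    logits = np.concatenate([scores, [sink]])
    m = logits.max()
    exp = np.exp(logits - m)
    probs = exp / exp.sum()
    # Drop the sink slot when mixing values: the rows of v receive
    # only the mass not absorbed by the sink.
    return probs[:-1] @ v
```

With a very negative sink logit this reduces to standard softmax attention; a large positive sink absorbs most of the probability mass and shrinks the output, which is the behavior the sink mechanism exploits.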