TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]
Researchers open-source a sigmoid attention kernel that beats FlashAttention-2 by 43%.
Researchers have open-sourced TritonSigmoid, a fast, padding-aware sigmoid attention kernel for GPUs, optimized for single-cell foundation models. In such models, each cell is represented as a sequence of genes, and a single gene may be regulated by multiple transcription factors simultaneously. Softmax attention normalizes the attention weights to sum to one, forcing these factors to compete with each other; sigmoid attention gates each query-key pair independently, so the model can attend strongly to many genes at once. Because cells express anywhere from 200 to over 16,000 genes, the kernel natively handles variable-length sequences, avoiding wasted compute on padded positions.
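For intuition, here is a minimal, unfused PyTorch sketch of the computation described above. The real kernel fuses and tiles these steps in Triton; the scalar `bias` term (often set to roughly -log(seq_len) in the sigmoid-attention literature) and the exact masking convention are assumptions, since the post does not spell them out.

```python
import math
import torch

def sigmoid_attention_reference(q, k, v, key_padding_mask=None, bias=None):
    """Unfused reference sigmoid attention (illustrative only).

    q, k, v: (batch, heads, seq, head_dim)
    key_padding_mask: (batch, seq), True where a token is real.
    bias: optional scalar; its exact value in TritonSigmoid is an assumption.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # (B, H, S, S)
    if bias is not None:
        scores = scores + bias
    # Sigmoid gates each query-key pair independently: weights need not sum
    # to 1, so many genes can receive high attention simultaneously.
    weights = torch.sigmoid(scores)
    if key_padding_mask is not None:
        # Zero out padded key positions so they contribute nothing.
        weights = weights * key_padding_mask[:, None, None, :].to(weights.dtype)
    return weights @ v
```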
In experiments on an H100 GPU, TritonSigmoid achieved up to 515 TFLOPS, significantly outperforming FlashAttention-2 (361 TFLOPS) and FlashSigmoid (440 TFLOPS). It also delivered lower validation loss across six held-out datasets, 25% better cell-type separation in learned representations, and stable training in cases where softmax attention catastrophically diverged. The open-source release (paper and code available) promises more efficient and accurate attention for genomics and other domains that require attending to many tokens at once.
- Up to 515 TFLOPS on H100, beating FlashAttention-2 (361 TFLOPS) and FlashSigmoid (440 TFLOPS).
- 25% improvement in cell-type separation and lower validation loss across six datasets.
- Stable training where softmax attention diverges, with native variable-length padding support (a packing sketch follows this list).
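Variable-length attention kernels commonly avoid padded compute by packing ragged sequences into one flat token buffer plus cumulative-length offsets, as in FlashAttention's varlen interface. The sketch below shows that packing idea; whether TritonSigmoid exposes the same `cu_seqlens`-style convention is an assumption.

```python
import torch

def pack_varlen(batch_of_seqs):
    """Pack a ragged batch (list of (seq_len_i, dim) tensors) into a single
    (total_tokens, dim) tensor plus cumulative sequence-length offsets, so no
    padding tokens are ever materialized or computed over."""
    lengths = torch.tensor([t.shape[0] for t in batch_of_seqs])
    cu_seqlens = torch.zeros(len(batch_of_seqs) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)
    packed = torch.cat(batch_of_seqs, dim=0)
    return packed, cu_seqlens

# Example: three cells expressing 200, 4,000, and 16,000 genes.
cells = [torch.randn(n, 64) for n in (200, 4_000, 16_000)]
packed, cu_seqlens = pack_varlen(cells)
print(packed.shape)   # torch.Size([20200, 64])
print(cu_seqlens)     # tensor([    0,   200,  4200, 20200], dtype=torch.int32)
```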
Why It Matters
Enables efficient, accurate attention for single-cell genomics and other variable-length sequence tasks.