Pinokio

SageAttention

https://github.com/thu-ml/sageattention
Updated 1/17/2026, 5:43:28 PM · Indexed 4/14/2026, 2:14:06 PM

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
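The core idea behind quantized attention is to run the expensive matrix multiplies in low precision while keeping the numerically sensitive softmax in full precision. The sketch below is a simplified NumPy illustration of that idea, not SageAttention's actual kernel (which uses smoothing and per-block quantization on GPU): Q and K are symmetrically quantized to INT8, the QK^T product is accumulated in INT32 and dequantized, and the result stays close to the FP32 reference. All function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization: map max |x| to 127.
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def attention_fp32(q, k, v):
    # Standard scaled dot-product attention in full precision.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def attention_int8_qk(q, k, v):
    # Quantize only Q and K; the QK^T matmul is accumulated in INT32,
    # then dequantized before the softmax, which stays in FP32.
    d = q.shape[-1]
    qq, sq = quantize_int8(q)
    kq, sk = quantize_int8(k)
    scores = (qq.astype(np.int32) @ kq.astype(np.int32).T) * (sq * sk) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64)).astype(np.float32)
k = rng.standard_normal((8, 64)).astype(np.float32)
v = rng.standard_normal((8, 64)).astype(np.float32)

ref = attention_fp32(q, k, v)
approx = attention_int8_qk(q, k, v)
err = float(np.abs(ref - approx).max())
print(err)  # small quantization error relative to the FP32 reference
```

On GPU, the INT8 matmul maps onto much faster tensor-core instructions than FP16/FP32, which is where the reported speedup comes from; the accuracy claim is that this quantization error is small enough not to move end-to-end model metrics.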
