(1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on...
37,506 views • 2 months ago • via X (Twitter)

