timelapse #89 (12.5 hrs):
- got a single-GPU nvfp4 gemm running reliably at 5.2 PFLOPS (sm100)
- solved the issues in the from-scratch ampere/hopper gemm kernels
- split the kernel optimization chapter into:
  - gemv, softmax, layernorm, topK, gemm (fp32, CUDA cores only)
  - gemm (tf32, fp16, bf16, fp8, fp4)
- cutting...
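A GEMM throughput figure like 5.2 PFLOPS is usually derived from the standard operation count: computing C = A @ B with A of shape (m, k) and B of shape (k, n) takes 2*m*n*k floating-point operations (one multiply and one add per term of each dot product), divided by the measured kernel time. The post does not state the problem size or timing, so the 8192 x 8192 x 8192 shape below is a hypothetical choice used only to show the arithmetic; the helper names are illustrative.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k), B (k x n): one mul + one add per MAC."""
    return 2 * m * n * k


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS (1 PFLOPS = 1000 TFLOPS)."""
    return gemm_flops(m, n, k) / seconds / 1e12


# Hypothetical square problem; not taken from the post.
m = n = k = 8192

# Kernel time a 5.2 PFLOPS GEMM would need for this shape.
seconds = gemm_flops(m, n, k) / 5.2e15
print(f"time: {seconds * 1e6:.1f} us")                         # time: 211.4 us
print(f"throughput: {gemm_tflops(m, n, k, seconds):.1f} TFLOPS")  # throughput: 5200.0 TFLOPS
```

At this size a 5.2 PFLOPS kernel finishes in roughly 211 microseconds, which is why timings for such kernels are usually averaged over many launches.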
60,596 views • 9 months ago • via X (Twitter)

