I implemented Google Research's TurboQuant as a CUDA-native compression engine on Blackwell B200: 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, and live generation directly from compressed memory. 5 custom cuTile CUDA kernels, featuring:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA...
806,592 views • 2 months ago • via X (Twitter)
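"Online softmax" in the kernel list is the standard single-pass trick used by FlashAttention-style kernels: the attention-weighted sum over values is computed in one streaming pass by keeping a running maximum and a running normalizer, and rescaling earlier partial sums whenever a larger score appears. The sketch below is a minimal pure-Python illustration of that general technique under those assumptions, not the author's cuTile kernel. The names `online_softmax_weighted_sum` and `two_pass_reference` are hypothetical, and no quantization or QJL correction is modeled.

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Illustrative sketch (hypothetical name): softmax(scores) @ values in one pass.

    Tracks a running max `m`, a running normalizer `l` = sum(exp(s - m)),
    and a running accumulator `acc`; old partial sums are rescaled by
    exp(m_old - m_new) whenever a new maximum score is seen.
    """
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(values[0])
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first step
        p = math.exp(s - m_new)      # weight of the current score
        l = l * scale + p
        acc = [a * scale + p * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]

def two_pass_reference(scores, values):
    """Conventional two-pass softmax-weighted sum, for comparison."""
    mx = max(scores)
    w = [math.exp(s - mx) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]

scores = [0.5, 2.0, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(online_softmax_weighted_sum(scores, values))
print(two_pass_reference(scores, values))
```

Both functions should agree up to floating-point rounding. The single-pass form matters on a GPU because keys and values can be streamed (and, in a compressed cache, decompressed) tile by tile without materializing the full score row.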
