Two days ago, DeepSeek surprised everyone with an "undefined-behavior" PTX optimization that speeds up GPU kernels for particular ML workloads on NVIDIA's Hopper architecture. Let's reverse engineer the hack, implement it ourselves, and benchmark the speedup on an H100.
11 comments

LaurieWired · 1 year ago
Full Video:

LaurieWired · 1 year ago
My test code:

numanumabruh · 1 year ago
You'd never know she's 6'5"

Jason Ho · 1 year ago
laurie supremacy

Bob (Moderna #7) Kerns · 1 year ago
Until recently, I'd only seen your tweets; the first video I encountered was the 2025 predictions one. Assumptions violated: higher voice, younger. Always good to have one's assumptions flagged, but especially the age. I was struck by the maturity of your analysis!

Dave 🚀 · 1 year ago
LFG!

KnowledgeisMostValuable · 1 year ago
I'd fight off a bear for you

nisten - e/acc · 1 year ago
lfg

🥀shiVam🥀 · 1 year ago
Wait, did you film this at Google HQ? (Must appreciate the audio recording and editing.)

Calcs · 1 year ago
Fantastic video, more please, lol 😂


