Two days ago, DeepSeek surprised everyone with an "undefined-behavior" PTX optimization that speeds up particular ML workloads in GPU kernels on NVIDIA Hopper. Let's reverse engineer the hack, implement it ourselves, and benchmark the speedup on an H100.

228,591 views • 1 year ago • via X (Twitter)

11 comments

LaurieWired • 1 year ago

Full Video:

LaurieWired • 1 year ago

My test code:

numanumabruh • 1 year ago

You'd never know she's 6'5"

Jason Ho • 1 year ago

laurie supremacy

Bob (Moderna #7) Kerns • 1 year ago

Until recently, I'd only seen your tweets; the first video I encountered was the 2025 prediction ones. Assumptions violated: higher voice, younger. Always good to have one's assumptions flagged, but especially the age. I was struck by the maturity of your analysis!

Dave 🚀 • 1 year ago

LFG!

KnowledgeisMostValuable • 1 year ago

I'd fight off a bear for you

nisten - e/acc • 1 year ago

lfg

🥀shiVam🥀 • 1 year ago

wait, did you film this at Google HQ? (must appreciate the audio recording and editing)

Calcs • 1 year ago

Fantastic video, more please, lol 😂
