
h100envy
@h100envy • 1,125 subscribers
you're literally copium, i'm your opium
Videos

A PyTorch core engineer at Meta explains in 13 minutes how CUDA kernel writing turned into a sport - better than a $1,500 GPU programming bootcamp. The loop: profile the kernel -> find the bottleneck -> rewrite -> benchmark -> merge the winning code into PyTorch. That loop is how the open community now beats hand-tuned vendor kernels. GPU MODE community + KernelBot competition + winning kernel merged into the framework - that's the stack. Watch it, then steal the loop below.
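Here is a rough sketch of the benchmark-and-pick-the-winner part of that loop. It is not KernelBot's actual harness and not how PyTorch evaluates kernels: plain Python functions stand in for CUDA kernels, the standard-library `timeit` stands in for GPU timing, and every name in it (`reference_sum_squares`, `candidate_generator`, `candidate_map`, `benchmark`, `pick_winner`) is made up for illustration. The one idea it keeps from the loop is that a candidate only counts if it matches the reference output, and the fastest correct candidate wins.

```python
import timeit


def reference_sum_squares(xs):
    """Baseline 'kernel' (hypothetical): the slow version to beat."""
    total = 0
    for x in xs:
        total += x * x
    return total


def candidate_generator(xs):
    """Rewrite attempt 1 (hypothetical): generator expression."""
    return sum(x * x for x in xs)


def candidate_map(xs):
    """Rewrite attempt 2 (hypothetical): map with a lambda."""
    return sum(map(lambda x: x * x, xs))


def benchmark(fn, data, repeats=5, number=20):
    """Return the best wall time in seconds for `number` calls of fn(data)."""
    return min(timeit.repeat(lambda: fn(data), repeat=repeats, number=number))


def pick_winner(reference, candidates, data):
    """Time the reference and every correct candidate; return (winner, timings)."""
    expected = reference(data)
    timings = {reference.__name__: benchmark(reference, data)}
    for fn in candidates:
        if fn(data) != expected:
            continue  # wrong answer: disqualified, never timed
        timings[fn.__name__] = benchmark(fn, data)
    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    data = list(range(10_000))
    winner, timings = pick_winner(
        reference_sum_squares, [candidate_generator, candidate_map], data
    )
    # Print fastest first; the actual numbers depend on the machine.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f}s")
    print("winner:", winner)
```

On real GPU code the timing step would need a GPU-aware timer, because kernel launches are asynchronous and a wall-clock timer around the launch alone can under-report the cost.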
h100envy • 35,148 views • 4 days ago