Video wird geladen...

Video konnte nicht geladen werden

Zur Startseite

Writing a CUDA kernel requires a shift in mental model. Instead of one fast processor, you manage thousands of tiny threads. Here is the code and the logic explained for Matrix Multiplication.

189,009 Aufrufe • vor 6 Monaten •via X (Twitter)

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

Introducing The AI CUDA Engineer: An agentic AI system that automates the production of highly optimized CUDA kernels. The AI CUDA Engineer can produce highly optimized CUDA kernels, reaching 10-100x speedup over common machine learning operations in PyTorch. Our system is also able to produce highly optimized CUDA kernels that are much faster than existing CUDA kernels commonly used in production. We believe that fundamentally, AI systems can and should be as resource-efficient as the human brain, and that the best path to achieve this efficiency is to use AI to make AI more efficient! We are excited to publish our paper, The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition. We also release a dataset of over 17,000 verified CUDA kernels produced by The AI CUDA Engineer. Paper: Kernel Archive Webpage: HuggingFace Dataset: The AI CUDA Engineer utilizes evolutionary LLM-driven code optimization to autonomously improve the runtime of machine learning operations. Our system is not only able to convert PyTorch code into CUDA kernels, but through the use of evolution, it can also optimize the runtime performance of CUDA kernels, fuse multiple operations, and even discover novel solutions for writing efficient CUDA operations by learning from past innovations! We believe The AI CUDA Engineer opens a new era of AI-driven acceleration of AI and automated inference time optimization. We (Robert Lange, Aaditya Prasad 🇺🇸, Suuun, Maxence Faldor, Yujin Tang, hardmaru) are excited to continue Sakana AI's mission of leveraging AI to improve AI.

Sakana AI

1,149,339 Aufrufe • vor 1 Jahr