Writing a CUDA kernel requires a shift in mental model: instead of one fast processor, you manage thousands of lightweight threads. Here are the code and the logic for matrix multiplication, explained.
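The video's own code is not reproduced on this page, so the following is only a minimal sketch of the idea, not the author's kernel. It emulates the CUDA thread grid on the CPU in plain Python: each call to the hypothetical `matmul_kernel` plays the role of one thread and computes exactly one element of C, using the same index arithmetic a naive CUDA kernel would typically use (`blockIdx * blockDim + threadIdx`). The names `matmul_kernel` and `launch_matmul`, the row-major flat layout, and the 2x2 block size are all assumptions made for illustration.

```python
def matmul_kernel(block_idx, thread_idx, block_dim, a, b, c, m, n, k):
    """Work done by ONE simulated thread: compute a single element of C."""
    # Global coordinates of this thread, as in CUDA:
    #   row = blockIdx.y * blockDim.y + threadIdx.y
    #   col = blockIdx.x * blockDim.x + threadIdx.x
    row = block_idx[1] * block_dim[1] + thread_idx[1]
    col = block_idx[0] * block_dim[0] + thread_idx[0]

    # Bounds guard: the grid is rounded up, so some threads fall outside C.
    if row >= m or col >= n:
        return

    # Dot product of row `row` of A (m x k) with column `col` of B (k x n).
    acc = 0.0
    for i in range(k):
        acc += a[row * k + i] * b[i * n + col]
    c[row * n + col] = acc


def launch_matmul(a, b, m, n, k, block_dim=(2, 2)):
    """Emulate a kernel launch: run the kernel once per (block, thread) pair.

    On a GPU these iterations would run in parallel; here they run in
    sequence, which is fine because every thread writes a distinct element.
    """
    # Round the grid up so that it covers every element of C.
    grid_dim = ((n + block_dim[0] - 1) // block_dim[0],
                (m + block_dim[1] - 1) // block_dim[1])
    c = [0.0] * (m * n)
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    matmul_kernel((bx, by), (tx, ty), block_dim,
                                  a, b, c, m, n, k)
    return c


if __name__ == "__main__":
    a = [1, 2, 3,
         4, 5, 6]          # 2 x 3, row-major
    b = [7, 8,
         9, 10,
         11, 12]           # 3 x 2, row-major
    print(launch_matmul(a, b, m=2, n=2, k=3))  # [58.0, 64.0, 139.0, 154.0]
```

The point of the sketch is the change in perspective: the kernel contains no loop over rows or columns of C. That loop is replaced by the launch grid, and each thread only works out which element it owns.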

Ashutosh Maheshwari

35,305 subscribers

189,009 views • 6 months ago • via X (Twitter)

Related videos

Fast matrix multiplication on GPUs has traditionally meant wrestling with threads, shared memory, and low-level hardware details. This webinar explores how NVIDIA’s CUDA Tile model—and its Julia port, cuTile.jl—makes high-performance GPU programming more accessible. Join Dr. Andy Terrel of NVIDIA and Dr. Tim Besard of JuliaHub to see real examples across linear algebra, AI inference, and HPC. Register here - #JuliaLang #GPUProgramming #CUDA #HPC #AIInfrastructure

JuliaHub

10,255 views • 1 month ago

My dear software developers (and anyone who’s interested in the future of code search): I have crawled through depths of hell to bring you, one of the more important foundational piece of programming: fast, the most accurate, index free, and correct code search Here is a real time code search on leaked claude code sources, linux kernel 100k files, and chromium repo 500k files

Dmitriy Kovalenko

190,044 views • 2 months ago

timelapse attempt #2 >day 42 of unemployment >writing the naive cuda flashattention kernel >private sidequest progress >starting a blogpost >still haven't book the housing for asia in 3days

alexine 🏴‍☠️

558,960 views • 6 months ago

The Matrix (1999) had one of the smartest mystery-driven releases ever. The marketing barely explained anything, trailers asked a single question, and pushed curiosity instead of answers.

cinesthetic.

20,244 views • 5 months ago

📁Mo Gawdat, former Google X executive, says AI is no longer just writing code, it is correcting human mathematics. After 56 years using the same matrix multiplication method, AI realized the approach was flawed. It did not optimize software. It invented new math. The result was a 23% performance boost and the removal of hundreds of millions of dollars in costs and energy use.

Jon Hernandez

136,394 views • 5 months ago

Introducing The AI CUDA Engineer: An agentic AI system that automates the production of highly optimized CUDA kernels. The AI CUDA Engineer can produce highly optimized CUDA kernels, reaching 10-100x speedup over common machine learning operations in PyTorch. Our system is also able to produce highly optimized CUDA kernels that are much faster than existing CUDA kernels commonly used in production. We believe that fundamentally, AI systems can and should be as resource-efficient as the human brain, and that the best path to achieve this efficiency is to use AI to make AI more efficient! We are excited to publish our paper, The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition. We also release a dataset of over 17,000 verified CUDA kernels produced by The AI CUDA Engineer. Paper: Kernel Archive Webpage: HuggingFace Dataset: The AI CUDA Engineer utilizes evolutionary LLM-driven code optimization to autonomously improve the runtime of machine learning operations. Our system is not only able to convert PyTorch code into CUDA kernels, but through the use of evolution, it can also optimize the runtime performance of CUDA kernels, fuse multiple operations, and even discover novel solutions for writing efficient CUDA operations by learning from past innovations! We believe The AI CUDA Engineer opens a new era of AI-driven acceleration of AI and automated inference time optimization. We (Robert Lange, Aaditya Prasad 🇺🇸, Suuun, Maxence Faldor, Yujin Tang, hardmaru) are excited to continue Sakana AI's mission of leveraging AI to improve AI.

Sakana AI

1,149,339 views • 1 year ago

This is my favorite clip of the new Elon pod. He opens up saying xAI struggles with memory usage/bandwidth and CUDA kernel optimization (matmul, attention, MoE, etc). If you are good kernel or performance engineering in general, you should apply. Steer the world in a better direction.

Elliot Arledge

158,971 views • 5 months ago

Cursor CEO Michael Truell on the future of writing code: “ Our goal with Cursor is to invent a new type of programming.” “It looks like a world where you have a representation of the logic of your software that does look more like English.” “You can imagine kind of an evolution of programming language towards pseudocode. You have written down the logic of the software, and you can edit that at a high level.” “It won't be the impenetrable millions of lines of code, it'll instead be something that's much terser and easier to understand and easier to navigate.” Source: Michael Truell (CEO Cursor) with Lenny Rachitsky on Lenny's Podcast

a16z

812,505 views • 7 months ago

The largest advancement of the CUDA platform since its creation in 2006 is here 👀 Introducing CUDA Tile, a tile-based programming model that provides the ability to write algorithms at a higher level and abstract away the details of specialized hardware, such as tensor cores. Read the technical blog 👉

NVIDIA AI Developer

244,885 views • 6 months ago

Luminal is creating PyTorch for Production – an ML compiler that generates blazingly fast CUDA kernels and makes deploying to production one line of code. Congrats on the launch, Jake Stevens, Joe Fioti, and Matthew Gunton!

Y Combinator

98,496 views • 11 months ago

Neo in The Matrix (1999) becoming The One is one of the hardest aura shifts ever put on screen. The second Neo starts seeing the code and casually stops bullets, the entire movie suddenly starts moving at his pace instead.

cinesthetic.

229,774 views • 1 month ago

Problem-solving is one of the most important skills that you need as a web dev, but most videos don't cover it. I wanted to show you what troubleshooting is like with a Frontend Mentor challenge, instead of writing perfect code the first time around.

Jess Chan | Coder Coder

10,871 views • 10 months ago

Stop writing utility classes. Utility classes are a sign that you are writing procedural code. You should avoid them. Instead, write a real object model. For example, an email is not a String... It's an Email. Gautier - 🤘

Gautier 💙

19,509 views • 1 year ago

At Anthropic AI is writing code, even designing the next versions of itself, Dario Amodei says. This loop is closing fast, and the speed of progress is both exciting and a little unsettling. Its accelerating.

Chubby♨️

20,504 views • 4 months ago

A new mechanism to manage starvation instead of ending it, and Israeli helicopters open fire on thousands of starving people crushed in the chaos. This is how the first day of allowed aid entry into Gaza looked like..

Euro-Med Monitor

29,933 views • 1 year ago

i wanted to start a consistency challenge, then I came across the Sui Ghana Content Challenge and here I am. 20 days of dropping content: Videos Threads etc… A bit late today, but this is Day 1 I explained Web3 and why Sui. See you in the next one. #SuiContentChallenge

Creative Bee🐝

14,814 views • 8 months ago

The "Falling Green Code" in The Matrix is actually just a bunch of Japanese sushi recipes scanned from a cookbook.

Every Movie Plug

19,622 views • 22 days ago

Class Act: Newly retired Patriots TE Rob Gronkowski on a possible statue outside the stadium. "If you put one, just put me in the endzone, and be tiny. Because Tom is one of a kind. The greatest. He played for 20+ years here. I think that the only statue to be deserved here is Tom Brady." Gronk is a true role model on and off the field ❤️

Dov Kleiman

739,751 views • 7 months ago

India’s highway boom is bigger than you think 60% growth in just a decade and thousands of km of new expressways changing the game. This massive shift in infrastructure momentum is being driven under Nitin Gadkari's leadership. The road revolution is real and it’s accelerating fast. 🚀 #Highway #RoadInfrastructure #NitinGadkari

The Bharat Post

64,145 views • 3 months ago