
Cloud GPU training is a scam. A single M4 MacBook does 2.9 TFLOPS. Seven friends with MacBooks match an NVIDIA A100. Alexander Hayes just open-sourced a tool that makes this work over Wi-Fi. It's called AirTrain. Here's how it works: Traditional distributed training (DDP) syncs gradients after every single...
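The "seven MacBooks" figure above can be checked with a few lines of arithmetic. This is a rough sketch, not part of AirTrain: the 2.9 TFLOPS number comes from the post, while the 19.5 TFLOPS A100 number is an assumption taken from NVIDIA's published peak FP32 spec, and the helper name `laptops_to_match` is made up for illustration. It compares peak throughput only and does not account for the cost of syncing over Wi-Fi.

```python
import math

# Figure quoted in the post: peak throughput of one M4 MacBook, in TFLOPS.
M4_TFLOPS = 2.9
# Assumption (not from the post): NVIDIA's published peak FP32 figure for the A100.
A100_FP32_TFLOPS = 19.5


def laptops_to_match(target_tflops: float, per_laptop_tflops: float) -> int:
    """Smallest number of laptops whose summed peak throughput reaches the target."""
    return math.ceil(target_tflops / per_laptop_tflops)


n = laptops_to_match(A100_FP32_TFLOPS, M4_TFLOPS)
print(n, round(n * M4_TFLOPS, 1))  # prints: 7 20.3
```

Under these assumptions, seven laptops add up to about 20.3 TFLOPS of peak compute, just above the A100 figure, which is where the post's headline number appears to come from.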

160,201 views • 1 month ago • via X (Twitter)


Related Videos

The bottleneck in AI has quietly shifted.

- It's not the models. They are capable.
- It's not the frameworks. They are mature.
- It's not even the data, in many cases.

When you want to train a model today, the first question isn't "what architecture should I use?" Instead, it's: "Where am I going to get infrastructure that actually works?" Not just GPUs but the entire stack: compute, deployment, scaling, storage.

The traditional path is major cloud providers or specialized GPU clouds. Both have the same problem: they're built for enterprises with committed workloads, and they come with minimum spend requirements, contract negotiations, and quota approvals that take days. Even the "on-demand" options require you to piece together training, deployment, and scaling across different services. By the time you're actually training, hours, if not days, have passed. And there's a subtler cost: part of your brain is always managing infrastructure instead of thinking about the actual problem.

I've been using Runpod for a while now, and it's the closest I've found to infrastructure that just disappears. I pay for the serverless solution by the second, and stop when I'm done. This sounds like it should be the default across all providers, but it isn't.

For instance, when I'm prototyping, I don't need an H100. Instead, I need the flexibility to use cheaper GPUs that are actually available, where I can iterate fast and not worry about cost. An A40 at a few cents per hour is perfect for this. Then, when the approach is validated, I scale up. This matches how good engineering actually works.

Running distributed multi-GPU training across multiple nodes usually requires significant infra work. Runpod abstracts most of this away.

A lot of the advantage in AI comes from iteration speed. Infra that adds days of latency to that loop is a real cost, even if it's hard to measure. But good infra gets out of your way. It's available when you need it, invisible when you don't.

In the video below, I show a simple model training workflow using PyTorch in JupyterLab. It runs in a dedicated PyTorch Pod hosted on Runpod, and I worked with the team to put this together for you. Find a link to start using Runpod in the replies!

Avi Chawla

13,696 views • 5 months ago