I trained models across MacBooks using Apple's AirDrop protocol. grove is a distributed training library for Apple Silicon. Devices discover each other over AWDL, a direct radio link. If there's a shared WiFi network it upgrades to that for speed, otherwise everything goes over the direct link. No router,...

666,255 views • 2 months ago • via X (Twitter)

Related videos

Cloud GPU training is a scam. A single M4 MacBook does 2.9 TFLOPS. Seven friends with MacBooks match an NVIDIA A100. Alexander Hayes just open-sourced a tool that makes this work over Wi-Fi. It's called AirTrain.

Here's how it works: Traditional distributed training (DDP) syncs gradients after every single step. For a 124M parameter model, that's ~500MB exchanged per step. You need 50 GB/s of sustained bandwidth. Impossible over Wi-Fi.

AirTrain uses the DiLoCo algorithm. Each Mac trains independently for 500 steps, then syncs only the difference. One sync per 500 steps instead of one per step. 500x less network communication. Wi-Fi actually works. The entire sync takes ~2 seconds.

Here's what makes it wild:

→ Zero-config discovery. Devices find each other automatically via mDNS/Bonjour. Same protocol as AirDrop.
→ Fault tolerant. Nodes can join and leave mid-training without killing the run.
→ Checkpoint relay. Train for a few hours, export a checkpoint, hand it off to someone else to continue. Like a relay race for ML training.
→ Built on Apple's MLX framework. Native to M1/M2/M3/M4/M5 unified memory. No host-to-device copy overhead.
→ Local dashboard. Real-time loss curves, peer monitoring, throughput metrics in your browser.

Here's the wildest part: An M4 Max with 128GB unified memory can train a 70B parameter model without offloading. An NVIDIA RTX 4090 has 24GB VRAM. Apple Silicon gets ~245-460 GFLOPS per watt. Training on MacBooks costs almost nothing in electricity compared to cloud GPUs. And there are hundreds of millions of Apple Silicon Macs in the world.

The math:

Traditional DDP: 1 sync per step = 50 GB/s required
AirTrain (DiLoCo): 1 sync per 500 steps = 0.1 GB/s required

Wi-Fi handles 0.1 GB/s. That's it. That's the breakthrough.

They even built a community platform with live session browsing, checkpoint sharing, and a contributor leaderboard.

Training a 124M parameter GPT-2? Instead of renting cloud GPUs at $3/hr, pool three MacBooks in a coffee shop and train for free.

MIT licensed. Built in Python. 1 contributor. Early stage but the idea is insane. 100% Open Source. (Link in the comments)

Guri Singh

160,201 views • 1 month ago
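
The bandwidth figures in the AirTrain post can be checked with a few lines of Python, and the "train for 500 steps, then sync only the difference" pattern can be shown on a toy problem. This is a rough sketch, not AirTrain's code. It assumes 4 bytes per parameter (fp32) and about 100 training steps per second; the step rate is an inference, since the post does not state one, but it is the rate at which ~500MB per step works out to ~50 GB/s. The function names are hypothetical. The toy sync uses plain SGD for both the inner and outer steps on a one-dimensional quadratic loss, whereas the published DiLoCo recipe uses AdamW inside and Nesterov momentum outside.

```python
def sync_payload_bytes(n_params, bytes_per_param=4):
    """Bytes exchanged in one full sync (assumes fp32 weights or gradients)."""
    return n_params * bytes_per_param


def required_bandwidth(payload_bytes, steps_per_second, steps_per_sync):
    """Average bytes/second needed if one sync happens every `steps_per_sync` steps."""
    return payload_bytes * steps_per_second / steps_per_sync


def local_training(w, target, steps, lr):
    """Inner loop: plain SGD on the toy loss 0.5 * (w - target)**2."""
    for _ in range(steps):
        grad = w - target
        w = w - lr * grad
    return w


def diloco_round(global_w, targets, inner_steps, inner_lr, outer_lr=1.0):
    """One outer round: each worker trains alone, then only the deltas are averaged.

    Simplification: the outer update is plain SGD. With outer_lr=1.0 this is
    the same as averaging the workers' weights.
    """
    deltas = [
        global_w - local_training(global_w, t, inner_steps, inner_lr)
        for t in targets
    ]
    avg_delta = sum(deltas) / len(deltas)  # the only value that crosses the network
    return global_w - outer_lr * avg_delta


if __name__ == "__main__":
    payload = sync_payload_bytes(124_000_000)        # 124M-parameter model
    ddp = required_bandwidth(payload, 100, 1)        # sync every step
    diloco = required_bandwidth(payload, 100, 500)   # sync every 500 steps
    print(f"payload per sync: {payload / 1e6:.0f} MB")
    print(f"DDP:    {ddp / 1e9:.2f} GB/s")
    print(f"DiLoCo: {diloco / 1e9:.4f} GB/s")
    print("global weight after one round:", diloco_round(0.0, [1.0, 3.0], 500, 0.1))
```

Under these assumptions the per-sync payload is 496 MB, and syncing every 500 steps instead of every step cuts the average bandwidth by exactly 500x, which matches the post's "50 GB/s" versus "0.1 GB/s" comparison. The numbers say nothing about whether the resulting model is as good; that depends on how well the averaged deltas approximate step-by-step gradient sharing.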