Decentralized RL as fast as centralized RL for LLMs. Bittensor SN81 grail has shattered the bandwidth barrier with PULSE (Patch Updates via Lossless Sparse Encoding). "By identifying the 99% weight sparsity inherent in Adam-bounded updates, we've achieved a 100x reduction in weight synchronization - dropping 14GB transfers to just...
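The idea in the post, sending only the weights that changed and reconstructing them exactly, can be illustrated with a small sketch. This is not the PULSE implementation: the function names `make_patch` and `apply_patch`, the float32 dtype, the (index, value) patch layout, and the simulated 1% update are all assumptions made for illustration.

```python
import numpy as np

def make_patch(old, new):
    """Hypothetical helper: return (indices, values) of entries whose bits differ."""
    old_flat = np.ascontiguousarray(old, dtype=np.float32).ravel()
    new_flat = np.ascontiguousarray(new, dtype=np.float32).ravel()
    # Compare raw bit patterns so the patch stays lossless (NaN, -0.0 included).
    changed = old_flat.view(np.uint32) != new_flat.view(np.uint32)
    # uint32 indices assume fewer than 2**32 entries per tensor.
    idx = np.flatnonzero(changed).astype(np.uint32)
    return idx, new_flat[idx]

def apply_patch(old, idx, values):
    """Hypothetical helper: rebuild the new tensor from the old one plus a patch."""
    out = np.array(old, dtype=np.float32, copy=True).ravel()
    out[idx] = values
    return out.reshape(np.shape(old))

# Simulate a step that touches 1% of a 1M-parameter tensor (assumed sparsity).
rng = np.random.default_rng(0)
old = rng.standard_normal(1_000_000).astype(np.float32)
new = old.copy()
touched = rng.choice(old.size, size=old.size // 100, replace=False)
new[touched] += np.float32(1e-3)

idx, values = make_patch(old, new)
restored = apply_patch(old, idx, values)

full_bytes = new.nbytes                      # cost of sending every weight
patch_bytes = idx.nbytes + values.nbytes     # cost of sending only the patch
print(f"changed entries: {idx.size}")
print(f"full sync: {full_bytes} bytes, patch: {patch_bytes} bytes")
print(f"bit-exact reconstruction: {np.array_equal(restored.view(np.uint32), new.view(np.uint32))}")
```

In this naive layout each changed float32 weight costs 8 bytes (a 4-byte index plus a 4-byte value), so 1% density gives about a 50x reduction rather than 100x. Reaching the figure quoted in the post would presumably need a more compact encoding of the indices, which the post does not describe.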

13,074 views • 2 months ago • via X (Twitter)

Similar Videos

Covenant Labs just did a 90-minute AMA breaking down their 3 Bittensor subnets. templar. basilica. grail. Pre-training, compute, and post-training under one roof. Most people missed it. Here's everything they said.

Covenant is building what they call the "end to end intelligence continuum." Three subnets. Three layers of the AI stack. All permissionless. Templar (SN3) handles decentralized pre-training. Basilica (SN39) handles compute. Grail (SN81) handles RL post-training.

Sam Dare, the lead, put it bluntly. Decentralized training is "humanity's last dance." Not about beating OpenAI head to head. About creating optionality. About making it cheap enough for anyone to train models.

The gap between academia and frontier labs is growing exponentially. Researchers can't afford to experiment. The actual training run costs 5% of the reported budget. The other 95% is experimentation. If Covenant cracks cheap training, that entire surface area opens up.

On Templar specifically:
• Hit 39% emission on Bittensor. Highest since Apex was the only subnet on the network
• Covenant-72B trained permissionlessly with 70+ contributors on commodity internet
• 1.1 trillion tokens processed. No centralized data center
• Performance competitive with LLaMA-2-70B

On Grail, something flew under the radar. They built Pulse. A weight synchronization method that compresses model updates by 100x.
• In RL post-training, only ~1% of weights update per step
• Pulse exploits that sparsity. Lossless compression
• Prime Intellect's comparable system took 14 minutes to sync a 30B model
• Pulse makes decentralized RL training actually feasible at scale
• Already used by Cursor

The lead researcher on Grail said they've trained on math, code, and GPU kernels. Got 40-60% improvement on benchmarks. Working toward agentic training with 100K+ token context and 30B+ parameter models.

On Basilica, the compute subnet: The team was blunt. Just reselling GPU hours is a 5-10% margin game. Traditional compute providers already do that. Their play is value-added services.
• "GPU as code." No dashboard. No UI. Agents interact via SDK
• Custom scheduler that places workloads across heterogeneous hardware
• Verification checks for GPU, CPU, bandwidth, memory, storage, and OS security
• Partnerships with providers like Mass Compute for 10-20% below market pricing
• Miners compete on useful infrastructure, not just GPU hours

Sam then went on a rant about the miner burn debate. His take: Bittensor had to grow up. dTAO introduced investors. The old "miners are God" philosophy doesn't hold.
• Subnet owners have a duty to protect token value
• Miners are a resource optimization exercise, not a cost reduction exercise
• 100% miner emissions on compute subnets = immediate sell pressure
• The 41% miner allocation is arbitrary. Different business models need different splits
• Fish (who started burns) agreed. Burns usually mean the validation isn't mature enough

The bigger point. You can't police burns. Subnets just send to their own keys instead of the burn address. Subnet 28 does exactly that. Sam's position: judge subnets on outcomes, not process. Const has changed the protocol 9-10 times in 2 years. That iteration speed is Bittensor's actual moat.

The whole Covenant thesis is playing out in real time. TAO is up 100%+ in a month. Jensen Huang name-dropped the network. Grayscale has an ETF filing. But the real story is three subnets quietly building every layer of decentralized AI.

Jesus Martinez

26,642 views • 2 months ago