Introducing OpenDiLoCo, an open-source implementation and scaling of DeepMind’s Distributed Low-Communication (DiLoCo) method, enabling globally distributed AI model training.
260,002 views • 1 year ago • via X (Twitter)

Last week, we released the first step in our masterplan by launching the PI Compute Exchange. Today, we are thrilled to announce a major step forward on the second part by open-sourcing our framework to enable collaborative model development across globally distributed GPUs.

We reproduced DeepMind's DiLoCo experiments in a scalable, decentralized training framework. We trained a model across 3 countries with 90–95% compute utilization and scaled the method to 3x the size of the largest model in the original work, showing that it remains effective for billion-parameter models.

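DiLoCo
Recent work by @GoogleDeepMind introduced an approach that enables training of language models on poorly connected devices. The method is a form of data-parallel training, but instead of synchronizing gradients at every step, each worker trains locally and the workers synchronize only their pseudo-gradients (the difference between the parameters before and after the local steps), once every 500 steps.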
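To make the synchronization pattern concrete, here is a minimal Python sketch; it is not the DeepMind or OpenDiLoCo code. It assumes a toy quadratic loss per worker (a hypothetical stand-in for each worker's data shard) and uses plain SGD for the inner steps, whereas DiLoCo uses AdamW as its inner optimizer. The outer step applies Nesterov momentum to the averaged pseudo-gradient, as DiLoCo does. The function names and hyperparameter values are illustrative.

```python
from typing import List

Vector = List[float]


def local_sgd(theta: Vector, target: Vector, steps: int, lr: float) -> Vector:
    """Hypothetical inner loop: plain SGD on the toy loss 0.5 * ||theta - target||^2."""
    theta = list(theta)
    for _ in range(steps):
        grad = [p - t for p, t in zip(theta, target)]
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta


def diloco(
    targets: List[Vector],   # one toy "data shard" (loss minimum) per worker
    outer_rounds: int,
    inner_steps: int = 500,  # H: local steps between synchronizations
    inner_lr: float = 0.01,
    outer_lr: float = 0.7,
    momentum: float = 0.9,
) -> Vector:
    dim = len(targets[0])
    theta = [0.0] * dim      # global parameters, replicated on every worker
    velocity = [0.0] * dim   # momentum buffer of the outer optimizer
    for _ in range(outer_rounds):
        # Inner phase: every worker trains on its own shard with no communication.
        local = [local_sgd(theta, t, inner_steps, inner_lr) for t in targets]
        # Pseudo-gradient of each worker: global parameters minus its local ones.
        deltas = [[g - l for g, l in zip(theta, params)] for params in local]
        # The only communication: average the pseudo-gradients once per round.
        avg = [sum(col) / len(deltas) for col in zip(*deltas)]
        # Outer step: SGD with Nesterov momentum on the averaged pseudo-gradient.
        velocity = [momentum * v + d for v, d in zip(velocity, avg)]
        theta = [p - outer_lr * (d + momentum * v)
                 for p, d, v in zip(theta, avg, velocity)]
    return theta


# Two workers whose shards pull toward different points; 20 rounds of 500 steps.
theta = diloco([[1.0, -2.0], [3.0, 0.0]], outer_rounds=20)
print(theta)  # close to the average of the shard minima, [2.0, -1.0]
```

Here the workers exchange data once per 500 local steps instead of once per step, which is where the communication savings come from.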

OpenDiLoCo
To foster collaboration in this promising research direction and help democratize AI, we have released our OpenDiLoCo code under an open-source license. Our implementation is built on top of the Hivemind library and enables a real-world decentralized training setup for DiLoCo, including:
- On/off-ramping of compute resources: workers can join or leave during training
- Fault-tolerant training
- Peer-to-peer communication: there is no master node
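The peer-to-peer and fault-tolerance properties above can be illustrated with a single-process simulation. This is a hypothetical sketch, not Hivemind's API: the function `average_round` and the peer names are made up, and the real library additionally handles peer discovery and the networked averaging. It assumes each synchronization round averages the pseudo-gradients of whichever peers are still present, so a peer that drops out is simply left out and a newly joined peer contributes from its first round onward.

```python
from typing import Dict, List, Optional

Vector = List[float]


def average_round(contributions: Dict[str, Optional[Vector]]) -> Dict[str, Vector]:
    """Hypothetical sync round; a value of None marks a peer that has dropped out.

    No peer acts as coordinator: every surviving peer ends up with the same
    average, computed over the contributions that actually arrived.
    """
    alive = {peer: vec for peer, vec in contributions.items() if vec is not None}
    if not alive:
        raise RuntimeError("no peers available in this round")
    vectors = list(alive.values())
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    # Each surviving peer holds its own identical copy of the result.
    return {peer: list(avg) for peer in alive}


# Round 1: three peers contribute.
round1 = average_round({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
# Round 2: peer "b" has left (off-ramp) and peer "d" has joined (on-ramp).
round2 = average_round({"a": [2.0, 0.0], "b": None, "c": [4.0, 0.0], "d": [0.0, 3.0]})
print(round1["a"], sorted(round2))  # [3.0, 4.0] ['a', 'c', 'd']
```

A joining peer would also need to fetch the current global parameters from an existing peer before its first contribution; that step is omitted here.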

We replicate the main experimental results and show that DiLoCo with 8 replicas significantly outperforms a baseline without replicas and matches the performance of a stronger baseline with the same compute budget, while communicating 500x less.
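The 500x figure follows from the synchronization interval: standard data-parallel training all-reduces gradients after every step, while DiLoCo averages pseudo-gradients once every 500 steps, and each exchange is the size of the model either way. A back-of-the-envelope sketch (the run length is a hypothetical value):

```python
def sync_rounds(total_steps: int, sync_every: int) -> int:
    """Number of all-reduce rounds when workers synchronize every `sync_every` steps."""
    return total_steps // sync_every


total_steps = 88_000                           # hypothetical run length
ddp_rounds = sync_rounds(total_steps, 1)       # data parallel: every step
diloco_rounds = sync_rounds(total_steps, 500)  # DiLoCo: every 500 steps
print(ddp_rounds, diloco_rounds, ddp_rounds / diloco_rounds)  # 88000 176 500.0
```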

Scaling DiLoCo to Billion-Parameter Models
The original DiLoCo work only experimented with models of up to 400 million parameters. We scale the method to a 1.1 billion parameter model. While we demonstrate that DiLoCo works at the billion-parameter scale, we believe further work is needed to make it effective with larger batch sizes and more local steps.
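Because every synchronization exchanges a pseudo-gradient with one value per parameter, the size of each exchange grows linearly with the model. A quick estimate, assuming values are sent uncompressed in 32-bit or 16-bit floating point (the precision OpenDiLoCo actually uses is not stated here):

```python
def payload_gb(num_params: float, bytes_per_value: int) -> float:
    """Size in GB (10**9 bytes) of one full-model pseudo-gradient."""
    return num_params * bytes_per_value / 1e9


for name, params in [("400M (original DiLoCo)", 400e6), ("1.1B (OpenDiLoCo)", 1.1e9)]:
    fp32 = payload_gb(params, 4)
    fp16 = payload_gb(params, 2)
    print(f"{name}: {fp32:.1f} GB in fp32, {fp16:.1f} GB in fp16")
```

Under these assumptions one exchange grows from about 1.6 GB to about 4.4 GB in fp32 when moving from 400M to 1.1B parameters, which is why infrequent synchronization matters more as models grow.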

Globally Distributed Training Setting
We train a billion-parameter model across three countries. Because DiLoCo reduces communication time, the all-reduce took up only 6.9% of the training time, with minimal impact on overall training speed.
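The 6.9% figure can be related to the synchronization interval with a simple model, assuming the all-reduce blocks training (no overlap with compute) and happens once per H inner steps. The sketch inverts the reported fraction to estimate how long the all-reduce took relative to one training step; H = 500 and a unit step time are assumptions, not numbers reported for this run.

```python
def allreduce_fraction(inner_steps: int, step_time: float, allreduce_time: float) -> float:
    """Share of wall-clock time spent in a blocking all-reduce done once per round."""
    compute = inner_steps * step_time
    return allreduce_time / (compute + allreduce_time)


def allreduce_time_for(fraction: float, inner_steps: int, step_time: float) -> float:
    """Inverse of allreduce_fraction: all-reduce duration that yields `fraction`."""
    return fraction / (1.0 - fraction) * inner_steps * step_time


t_ar = allreduce_time_for(0.069, inner_steps=500, step_time=1.0)  # about 37 steps' worth
print(round(t_ar, 1))                                    # 37.1
print(round(1 - allreduce_fraction(500, 1.0, t_ar), 3))  # 0.931 of time spent computing
print(round(allreduce_fraction(1, 1.0, t_ar), 3))        # 0.974 if it ran every step
```

Under these assumptions, the same all-reduce performed after every step would dominate training, while amortizing it over 500 steps leaves roughly 93% of the time for compute, in line with the 90–95% utilization quoted above.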

Blog: Code: Paper:

We are excited about OpenDiLoCo's practical applications and look forward to building on it for the third part of our masterplan: to collaboratively train and contribute to open AI models in high-impact domains like language, agents, code, and science for collective ownership.

Join us in building the open future of decentralized AI!
- Apply for open roles
- Collaborate on AI initiatives
- Contribute compute & earn ownership

We want to thank @m_ryabinin for his guidance and help with the Hivemind library, and @Ar_Douillard for his work on DiLoCo and for helping us figure out the details of reproducing the original experiments!

