Introducing OpenDiLoCo, an open-source implementation and scaling of DeepMind’s Distributed Low-Communication (DiLoCo) method, enabling globally distributed AI model training.

260,002 views • 1 year ago • via X (Twitter)

Comments: 11

Prime Intellect · 1 year ago

Last week, we released the first step in our masterplan by launching the PI Compute Exchange. Today, we are thrilled to announce a major step forward on the second part by open-sourcing our framework to enable collaborative model development across globally distributed GPUs.

We reproduced DeepMind's DiLoCo experiments in a scalable, decentralized training framework. We trained a model across 3 countries with 90-95% compute utilization and scaled it to 3x the size of the original work, proving its effectiveness for billion-parameter models.

DiLoCo
Recent work by @GoogleDeepMind introduced an approach for training language models on poorly connected devices. The method is data parallel, but workers synchronize only every 500 steps, exchanging pseudo-gradients (the change in each worker's weights since the last sync) instead of per-step gradients.
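The two-level loop described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the DiLoCo or OpenDiLoCo code: the quadratic loss, the shifted per-worker targets, and the names `toy_grad` and `diloco` are hypothetical. The inner optimizer here is plain SGD (swapped in for simplicity; the DiLoCo paper uses AdamW), and the outer optimizer is SGD with Nesterov momentum, as reported for DiLoCo; the specific learning rates and momentum value are assumptions.

```python
import random


def toy_grad(params, target):
    # Gradient of the toy loss 0.5 * sum((p_i - t_i)^2) for one worker's shard
    return [p - t for p, t in zip(params, target)]


def diloco(num_workers=8, inner_steps=500, outer_rounds=20, dim=4,
           inner_lr=0.01, outer_lr=0.7, momentum=0.9, seed=0):
    rng = random.Random(seed)
    # Hypothetical data shards: each worker's optimum is shifted slightly
    targets = [[1.0 + rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for _ in range(num_workers)]
    global_params = [0.0] * dim
    velocity = [0.0] * dim
    syncs = 0
    for _ in range(outer_rounds):
        pseudo_grads = []
        for target in targets:
            local = list(global_params)      # every worker starts from the shared weights
            for _ in range(inner_steps):     # H inner steps with no communication
                g = toy_grad(local, target)
                local = [p - inner_lr * gi for p, gi in zip(local, g)]
            # Pseudo-gradient: global weights minus this worker's updated weights
            pseudo_grads.append([gp - lp for gp, lp in zip(global_params, local)])
        # The only communication step: average pseudo-gradients (an all-reduce)
        outer_grad = [sum(pg[i] for pg in pseudo_grads) / num_workers
                      for i in range(dim)]
        syncs += 1
        # Outer step: SGD with Nesterov momentum applied to the averaged pseudo-gradient
        velocity = [momentum * v + g for v, g in zip(velocity, outer_grad)]
        global_params = [p - outer_lr * (g + momentum * v)
                         for p, g, v in zip(global_params, outer_grad, velocity)]
    return global_params, targets, syncs


params, targets, syncs = diloco()
```

Each worker runs 500 local steps per round, so `syncs` equals `outer_rounds` (one all-reduce per 500 steps), and the shared weights converge toward the average of the workers' optima.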

OpenDiLoCo
To foster collaboration in this promising research direction toward democratizing AI, we have released our code for OpenDiLoCo under an open-source license:
Our implementation is built on top of the Hivemind library, enabling a real-world decentralized training setup for DiLoCo, including:
- On/off ramping of compute resources
- Fault-tolerant training
- Peer-to-peer: there is no master node
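Fault tolerance and on/off ramping mean an averaging round has to cope with a changing set of peers. A minimal sketch of that idea, assuming each round simply averages over whichever peers reported; `average_available` is hypothetical and is not Hivemind's API:

```python
from typing import List, Optional, Sequence


def average_available(pseudo_grads: Sequence[Optional[List[float]]]) -> List[float]:
    """Average the pseudo-gradients of the peers that finished this round.

    A None entry stands for a peer that dropped out (or has not joined yet);
    it is skipped instead of stalling the round.
    """
    alive = [g for g in pseudo_grads if g is not None]
    if not alive:
        raise RuntimeError("no peer reported a pseudo-gradient this round")
    dim = len(alive[0])
    return [sum(g[i] for g in alive) / len(alive) for i in range(dim)]


# Peer 2 dropped out; the round still completes with the remaining peers
print(average_available([[1.0, 2.0], None, [3.0, 4.0]]))  # [2.0, 3.0]
```

Because every peer can run the same averaging locally on the values it receives, this picture needs no master node to coordinate the result.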

We replicate the main experimental results and show that DiLoCo with 8 replicas significantly outperforms the baseline without any replicas and matches the performance of a stronger baseline with the same compute budget, despite communicating 500x less.
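The 500x figure follows from the sync interval: a standard data-parallel baseline all-reduces gradients every step, while DiLoCo all-reduces pseudo-gradients of the same size every 500 steps. A back-of-the-envelope check; the parameter count, step count, and fp32 payload are hypothetical:

```python
def num_syncs(total_steps: int, sync_every: int) -> int:
    # Number of all-reduce rounds in a run that syncs every `sync_every` steps
    return total_steps // sync_every


def comm_bytes(num_params: int, total_steps: int, sync_every: int,
               bytes_per_value: int = 4) -> int:
    # Payload of one full-size tensor per sync; ignores the all-reduce
    # algorithm's constant factor, which is the same in both settings
    return num_syncs(total_steps, sync_every) * num_params * bytes_per_value


# Hypothetical run: 1M parameters, 50,000 steps, fp32 values
per_step = comm_bytes(1_000_000, 50_000, sync_every=1)
per_500 = comm_bytes(1_000_000, 50_000, sync_every=500)
print(per_step // per_500)  # 500
```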

Scaling DiLoCo to Billion-Parameter Models
The original DiLoCo work only experimented with models of up to 400 million parameters. We scale the method to a 1.1-billion-parameter model. While we demonstrate that DiLoCo works at the billion-parameter scale, we believe further work is needed to make it effective with larger batch sizes and more local steps.

Globally Distributed Training Setting
We train a billion-parameter model across three countries. Because DiLoCo reduces communication time, the all-reduce bottleneck took up only 6.9% of the training time, with minimal impact on overall training speed.
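A share like 6.9% can be related to the length of each all-reduce with a little arithmetic. The sketch below assumes a blocking all-reduce once per outer round and a hypothetical 1 s per inner step; the function names are illustrative. The last line is a deliberately naive comparison that assumes a per-step all-reduce would take just as long and could not overlap with compute (real data-parallel setups usually overlap communication with the backward pass):

```python
def comm_fraction(step_time: float, inner_steps: int, allreduce_time: float) -> float:
    # Share of one outer round spent in the blocking all-reduce
    compute_time = step_time * inner_steps
    return allreduce_time / (compute_time + allreduce_time)


def allreduce_time_from_fraction(fraction: float, step_time: float, inner_steps: int) -> float:
    # Invert comm_fraction: all-reduce duration implied by an observed share
    return fraction / (1.0 - fraction) * step_time * inner_steps


# Hypothetical: 1 s per inner step, 500 inner steps, 6.9% of time in all-reduce
a = allreduce_time_from_fraction(0.069, step_time=1.0, inner_steps=500)
print(round(a, 2))                         # 37.06
print(round(comm_fraction(1.0, 1, a), 3))  # 0.974 if the same all-reduce ran every step
```

Under these assumptions each sync lasts about 7.4% as long as the 500-step inner phase, which is why amortizing it over many local steps keeps its share of wall-clock time small.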

Blog: Code: Paper:

We are excited about OpenDiLoCo's practical applications and look forward to building on it for the third part of our masterplan: to collaboratively train and contribute to open AI models in high-impact domains like language, agents, code, and science for collective ownership.

Join us in building the open future of decentralized AI!
- Apply for open roles:
- Collaborate on AI initiatives:
- Contribute compute & earn ownership

We want to thank @m_ryabinin for his guidance and help with the Hivemind library, and @Ar_Douillard for his work on DiLoCo and helping us figure out the details of reproducing the original experiments!
