There's no point in doing decentralized training without efficient communication.
>> DiLoCo (H=15) ships ~480 MB/merge with 163 syncs.
>> SparseLoCo (H=15) ships ~5.5–17 MB/merge at 0.78–3.12% density with 163 syncs.
Top-K compression + 2-bit comms: ~28–89× smaller per sync than DiLoCo.
Subnet 3 :: Luis el grande
If you...
17,767 views • 9 months ago • via X (Twitter)
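The per-sync saving quoted in the post can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the rounded figures from the post (~480 MB, ~5.5–17 MB, 163 syncs), and the function and constant names are made up for this example. Because the inputs are rounded, the upper ratio comes out slightly below the quoted ~89×.

```python
def compression_ratio(dense_mb: float, sparse_mb: float) -> float:
    """Return how many times smaller the sparse payload is than the dense one."""
    return dense_mb / sparse_mb


# Rounded figures as quoted in the post (assumed to be in megabytes).
DILOCO_MB = 480.0                    # DiLoCo payload per merge
SPARSELOCO_MB_LOW = 5.5              # SparseLoCo payload at the lowest density
SPARSELOCO_MB_HIGH = 17.0            # SparseLoCo payload at the highest density
SYNCS = 163                          # number of syncs quoted for both methods

# Per-sync ratio at both ends of the quoted range.
ratio_low = compression_ratio(DILOCO_MB, SPARSELOCO_MB_HIGH)
ratio_high = compression_ratio(DILOCO_MB, SPARSELOCO_MB_LOW)

# Total traffic over the whole run, per participant, in megabytes.
total_diloco_mb = DILOCO_MB * SYNCS
total_sparse_low_mb = SPARSELOCO_MB_LOW * SYNCS
total_sparse_high_mb = SPARSELOCO_MB_HIGH * SYNCS

print(f"per sync: {ratio_low:.1f}x to {ratio_high:.1f}x smaller")
print(f"DiLoCo total: {total_diloco_mb:.0f} MB")
print(f"SparseLoCo total: {total_sparse_low_mb:.1f} to {total_sparse_high_mb:.1f} MB")
```

Since both methods sync the same number of times, the total-traffic ratio over the run is the same as the per-sync ratio.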
