There's no point in doing decentralized training without efficient communication.

>> DiLoCo (H=15) ships ~480 MB/merge with 163 syncs.
>> SparseLoCo (H=15) ships ~5.5–17 MB/merge at 0.78–3.12% density with 163 syncs.

Top-K compression + 2-bit comms: ~28–89× smaller per sync than DiLoCo.

Subnet 3 :: Luis el grande

If you...
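For a rough feel of what "Top-K compression + 2-bit comms" means, here is a minimal pure-Python sketch. It is an illustration only, not SparseLoCo's actual implementation: the function names, the uniform 4-level quantization grid, and the toy update vector are all assumptions made for this example.

```python
import heapq

def top_k_sparsify(values, density):
    """Keep only the largest-magnitude entries (hypothetical helper).

    Returns the sorted indices that were kept and their values.
    """
    k = max(1, int(len(values) * density))
    idx = heapq.nlargest(k, range(len(values)), key=lambda i: abs(values[i]))
    idx.sort()
    return idx, [values[i] for i in idx]

def quantize_2bit(kept):
    """Map each kept value to one of 4 codes (2 bits) on a uniform grid.

    The uniform grid is an assumption; real systems may choose levels differently.
    """
    lo, hi = min(kept), max(kept)
    step = (hi - lo) / 3 or 1.0  # fall back to 1.0 when all values are equal
    codes = [round((v - lo) / step) for v in kept]
    return codes, lo, step

def dequantize_2bit(codes, lo, step):
    """Recover approximate values from the 2-bit codes."""
    return [lo + c * step for c in codes]

# Toy "update" vector; at 25% density only 2 of the 8 entries are shipped.
update = [0.1, -2.0, 0.05, 3.0, -0.2, 0.0, 1.5, -0.01]
indices, kept = top_k_sparsify(update, density=0.25)
codes, lo, step = quantize_2bit(kept)
restored = dequantize_2bit(codes, lo, step)
```

Only `indices`, `codes`, and the two scalars `lo` and `step` would need to be sent in this sketch, which is where the savings come from. As a sanity check on the quoted numbers, 480 / 17 is about 28 and 480 / 5.5 is about 87, in line with the ~28–89× claim given that every figure is approximate.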
17,767 views • 9 months ago • via X (Twitter)
