Decentralized Diffusion Models (DDMs) make it possible to train stronger models on more accessible infrastructure. They mitigate the networking bottleneck that locks training into expensive, power-hungry centralized clusters. DDMs scale gracefully to billions of parameters and generate photorealistic images after just a week of training on eight independent GPU nodes. They’re easy to implement, adopt DiT hyperparameters directly, and outperform standard diffusion models FLOP-for-FLOP.