正在加载视频...

视频加载失败

High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵

163,139 次观看 • 2 个月前 •via X (Twitter)

0 条评论

暂无评论

原始帖子的评论将显示在这里

相关视频

Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process. It looks like, however, most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of masking tokens, they replace them with different tokens (either with a random token or a semantically similar token). I wondered how they were getting such high throughput with the latter noising process, since I believed that optimizing inference with KVCache approximation would be more difficult (for various reasons). I visualized this noising process with tiny-diffusion and compared it to normal unmasking, and was very surprised to see how fast the generation “settles” into a reasonable output, and then only slightly refines afterwards, requiring much fewer steps in total. Unmasking (where tokens are never remasked, the typical implementation) is inherently limited in generation speed by the fact that an increase in tokens decoded per step leads to more errors due to the mismatch between individual and marginal token probability distributions we sample from. The token replacement noising process seems to have a much different set of characteristics. Because we sample each token per step, every token makes “progress” towards the final output each iteration (in addition to *potentially* giving other tokens more information in future steps). Generally, masking has outperformed other noising processes, which is probably why most research focused on it (using smaller models). But the paper referred to in the retweet shows that random replacement as a noising process may scale better as model size increases. Big labs might have noticed these results much earlier (due to having drastically more training resources and being able to test larger models), which may explain the discrepancy in the choice of noising process. I’m gonna test this with larger models, since tiny-diffusion only has 10M parameters.

Nathan Barry

40,331 次观看 • 5 个月前

Google dropped a new AI paper called LUMIERE. It's remarkably flexible, supporting video inpainting, image-to-video, AND stylized video generation tasks. Say hello to “space-time diffusion” for video generation! Now what the heck does that mean exactly?! 🌐⏳ → TL;DR it utilizes a “Space-Time UNet” architecture that generates the full duration of the video in one pass, rather than generating distant keyframes and interpolating between them like prior works. Because the computation is done in this “compressed space-time representation” to generate the full clip at once, it's far more temporally consistent. → Another benefit of generating the full video at once is that you can “direct” the video generation, making it easier to hand off to other models/tasks without having to stitch together partial solutions. You can condition generations on additional inputs, meaning you get the full stack of AI video capabilities – from video inpainting to image-to-video and beyond. → New SOTA for AI video generation? User study results in the paper suggest human evaluators preferred Lumiere over Runway Gen-2, Pika Labs, and Stable Video Diffusion in terms of quality, text alignment AND motion. But as always, we need to get hands-on with this tech when Google *actually* decides to ship it. → Could this end up inside YouTube? Y’all know i’m obsessed with blending reality and imagination – so it’s the video inpainting tech I'm most excited about. I really hope this model finds its way into YouTube's Generative AI efforts, and based on their prior announcements and the list of acknowledgments in the paper I think it might! 🤞🏽 Links: 🔗Paper: 🔗Project:

Bilawal Sidhu

44,816 次观看 • 2 年前