Scaling up GANs for Text-to-Image Synthesis: presents the 1B-parameter GigaGAN, which achieves lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs in 0.13s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. abs: project page:

AK

499,728 subscribers

278,115 views • 3 years ago • via X (Twitter)

10 Comments

Daniel Losey 🔀 • 3 years ago

amazing

David Marx (@digthatdata.bsky.social) • 3 years ago

GANs are back baybee

Nicolay Mausz • 3 years ago

Adobe research - I guess this will be part of CC

Draz ⚛️ • 3 years ago

The upscaling is quite insane on how it accurately fills in details

Nerdy Rodent 🐀🤓💻 • 3 years ago

It’s been hours now, why isn’t it showing up? 😉

Asriel H • 3 years ago

It has the same schema of injecting latent vector into every scaling layer as StyleGAN has

okaris • 3 years ago

The examples provided don’t look as good as diffusion models. Some details obscured or looking weird.

Adhik Joshi • 3 years ago

Weights aren't open-source

Julien Genoud • 3 years ago

The 4k upsampler 🤯

Clarence Hu • 3 years ago

paging @gwern

Related Videos

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048 abs: project page:

AK

718,746 views • 3 years ago

"MeshFlow: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer" TL;DR: learns a continuous mesh latent space and generates vertices and connectivity in parallel with flow matching, producing quality 3D meshes up to 18× faster than autoregressive.

Alexandre Morgand

42,638 views • 11 days ago

Dreamix: Video Diffusion Models are General Video Editors abs: project page: present diffusion-based method that is able to perform text-based motion and appearance editing of general videos

AK

398,160 views • 3 years ago

"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.

Daniel Israel

38,699 views • 8 months ago

Generative Novel View Synthesis with 3D-Aware Diffusion Models abs: project page:

AK

304,708 views • 3 years ago

We're moving beyond autoregressive LLMs! Autoregressive LLMs generate text word-by-word, which can be slow and affect quality, while diffusion models refine noise step-by-step, allowing for faster iterations and error correction. Here's Gemini Diffusion running at 857 tokens/s:

Akshay 🚀

34,524 views • 1 year ago

Block Diffusion Interpolating Between Autoregressive and Diffusion Language Models

AK

160,553 views • 1 year ago

MinerU-Diffusion A 2.5B diffusion-based OCR model that replaces slow autoregressive decoding with parallel block-wise diffusion, achieving up to 3.2x faster inference while improving robustness on complex documents with tables, formulas, and layouts.

DailyPapers

15,304 views • 3 months ago

Continuous diffusion had a good run—now it’s time for Discrete diffusion! Introducing Anchored Posterior Sampling (APS) APS outperforms discrete and continuous baselines in terms of performance & scaling on inverse problems, stylization, and text-guided editing.

Litu Rout

39,922 views • 8 months ago

I'm thrilled to announce the launch of ⚡️Flash Diffusion from Jasper! Earlier this year, with our acquisition of Clipdrop, we launched the Jasper AI Research Lab in Paris. Today, we are excited to release our first piece of groundbreaking research: the open-source distillation method, "Flash Diffusion". Flash Diffusion accelerates inference by 500%, reduces computing costs, and produces higher-quality image outputs. Dive into the details and discover how Flash Diffusion is set to revolutionize the field of AI and image synthesis. Read all about it here: Try a demo on Hugging Face:

Timothy Young

10,062 views • 2 years ago

Struggling with slow inference of diffusion and flow models? Check out the video below—I’ve been using our new FastGen library to achieve 7-28x acceleration for text-2-image and {text,image,video}-2-video generation without sacrificing visual fidelity!

Julius Berner

13,623 views • 4 months ago

Today, we’re thrilled to announce the open weights for Stable Diffusion 3 Medium, the latest and most advanced text-to-image AI model in our Stable Diffusion 3 series! This new release represents a major milestone in the evolution of generative AI and continues our commitment to democratising this powerful technology. 🎉 Learn more and get started here:

Stability AI

307,838 views • 2 years ago

👀 Pixel perfect 💎✨ 🖼️ Edify Image from #NVIDIAResearch is a family of diffusion models that supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360° HDR panorama generation, and finetuning for image customization. 🧵 1/2

NVIDIA AI Developer

14,747 views • 1 year ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models paper page: github: Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 2 years ago

Nvidia's PID (Pixel diffusion Decoder) It decodes latent representations into high-resolution images, replacing the decode–then–super-resolve cascade while achieving lower latency and higher visual quality page: HF repo:

Stable Diffusion Tutorials

37,247 views • 29 days ago

Stable Diffusion generates beautiful images, but can it be used for open-world recognition? Try Demo! Our #CVPR2023 paper shows that the pre-trained diffusion model indeed is a good image parser, allows for open-vocabulary segmentation and detection.

Xiaolong Wang

241,225 views • 3 years ago

We’re excited to announce new editing features in Stable Assistant, our user-friendly chatbot. Leveraging the power of Stable Diffusion 3, these advanced text-to-image capabilities produce higher quality outputs. Explore all the features here:

Stability AI

84,125 views • 2 years ago

🚀 Introducing InterDyn — our newly accepted CVPR work that explores controllable synthesis of interactive dynamics! Building upon powerful video diffusion models, InterDyn infers future motion and interactions directly from an input image and a dynamic control signal (e.g., a moving hand mask). Check out how we push the boundaries of intuitive physics with video generative models. Project page: Arxiv: #GenAI #AIGC #VideoGen #ML #ComputerVision #CVPR2025 🧵1/6

Haven (Haiwen) Feng

44,898 views • 1 year ago

TurboEdit Instant text-based image editing discuss: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction based editing driven by an LLM), resulting in the generation of a new image similar to the input image with only one attribute changed. It can further control the editing strength and accept instructive text prompt. Our approach facilitates realistic text-guided image edits in real-time, requiring only 8 number of functional evaluations (NFEs) in inversion (one-time cost) and 4 NFEs per edit. Our method is not only fast, but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.

AK

16,062 views • 1 year ago

Trying out real-time Stable Diffusion and it's FAST! (link below) I'm struggling to type faster than the AI can generate the images😂

Chase Lean

140,336 views • 2 years ago