Introducing StyleDrop, a model that allows a significantly higher level of stylized text-to-image synthesis by using a few style reference images that describe the style for text-to-image generation, bypassing the burden of text prompt engineering.

Google AI

2,427,164 subscribers

80,377 views • 2 years ago • via X (Twitter)

Arts Science & Technology

10 Comments

kache • 2 years ago

model? code?

Alex Volkov (Thursd/AI) 🔜 AIENG summit NY • 2 years ago

Shoutout @natanielruizg 👏

Max (e/acc) • 2 years ago

Could you please introduce Gemini (at least pro) to EU, and Gemini Ultra to the world?🙂

Digital Adam • 2 years ago

@natanielruizg IPAdapter?

Nader Ale Ebrahim • 2 years ago

Impressive work, @GoogleAI! This innovation promises to simplify the process and enhance the quality of text-to-image generation. Keep pushing the boundaries of AI research! 🌟 #AI #Research #Innovation

takeyourmeds • 2 years ago

watchumean introducing that's old stuff in stable diffusion like a year old

Rom_AI • 2 years ago

Where is our access, Mr. Google?

Sumone . • 2 years ago

Just looking like a wow !!!

Mr.D • 2 years ago

That sounds like an exciting advancement in text-to-image synthesis! StyleDrop seems to offer a more efficient approach by utilizing style reference images instead of relying solely on text prompts. This could potentially lead to more accurate and diverse image generation. I'm curious to learn more about how this model works and the results it can produce. @mira_hurley @TimeForPlanX

xiutai • 2 years ago

@PublicAI_ #AI

Related Videos

StyleDrop: Text-to-Image Generation in Any Style. The paper introduces StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse convincingly outperforms other methods, including DreamBooth and textual inversion on Imagen or Stable Diffusion. paper page:

AK

56,376 views • 3 years ago

Can a small academic team build a strong text-to-image model using only public datasets? Introducing i1: a simple, fully open recipe for strong text-to-image models

Zhuang Liu

67,476 views • 16 days ago

CosmicMan: A Text-to-Image Foundation Model for Humans. We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and

AK

46,778 views • 2 years ago

The winner of Lovable's weekend competition: Kolbo ai - A powerful tool to help make all sorts of social media content with AI
Features of the winning app:
- Supabase for backend
- Project-based organization system
- OpenAI for text & image generation
- Anthropic for text generation
- Google Gemini for text generation
- Midjourney for image generation
- for image generation
- Text-to-speech
- Speech-to-text
- Stripe for payments
- mu for music generation
Built by Zohar Vanunu 👇

Lovable

35,841 views • 1 year ago

Make a few more of these ... Make a LOT more of these ... Gemini 2.0 native image output is enabling a new way to prompt: instructing with image and text together. Subtle shifts in how I draw change how Gemini interprets the same text prompt.

Alexander Chen

23,601 views • 1 year ago

Introducing Ideogram 1.0: the most advanced text-to-image model, now available on This offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting.

Ideogram

256,006 views • 2 years ago

ByteDance announced SeedEdit! A new image model that can edit images with text prompts. It allows for high-resolution editing and supports various changes like local replacements, geometric transformations, and style adjustments. Links ⬇️

Dreaming Tulpa 🥓👑

46,540 views • 1 year ago

Run InstantStyle Locally with 1 Click InstantStyle lets you generate images with a style of ANY other image, instantly. No LoRA required. Both text-to-image/image-to-image. I wrote a 1 click launcher for the gradio app from Frank (Haofan) Wang (The author of InstantStyle/InstantId!).

cocktail peanut

39,104 views • 2 years ago

Ideogram AI presents Ideogram 1.0 text-to-image model offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting

AK

96,079 views • 2 years ago

We think text-to-image AI is pretty interesting, so here's text-to-BIM! It won’t “create a museum in the style of Zaha Hadid.” (Yet.) But you can describe your building and get an editable 3D model in return. Coming soon from Hypar !

Hypar

28,752 views • 3 years ago

TurboEdit: Instant text-based image editing discuss: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction-based editing driven by an LLM), resulting in the generation of a new image similar to the input image with only one attribute changed. It can further control the editing strength and accept instructive text prompts. Our approach facilitates realistic text-guided image edits in real time, requiring only 8 function evaluations (NFEs) in inversion (a one-time cost) and 4 NFEs per edit. Our method is not only fast, but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.

AK

16,062 views • 1 year ago

Text is often the hardest part of image generation to get right. MAI-Image-2 improves consistency and legibility for in-image text across infographics, diagrams, and slides — reducing the gap between prompt and output. Try it for yourself.

Microsoft AI

30,778 views • 2 months ago

👀 Pixel perfect 💎✨ 🖼️ Edify Image from #NVIDIAResearch is a family of diffusion models that supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360° HDR panorama generation, and finetuning for image customization. 🧵 1/2

NVIDIA AI Developer

14,747 views • 1 year ago

Introducing SDXL Turbo: A real-time text-to-image generation model. SDXL Turbo achieves state-of-the-art performance with a new distillation technology, enabling single-step image generation with unprecedented quality, reducing the required step count from 50 to just one. The code, research paper, and weights for non-commercial use are now available on our website. You can test SDXL Turbo on Stability AI’s image editing platform Clipdrop, with a beta demonstration of the real-time text-to-image generation capabilities. Learn more:

Stability AI

976,344 views • 2 years ago

🤯 OneDiffusion: A versatile, large-scale diffusion model that seamlessly supports bidirectional image synthesis and understanding across diverse tasks. ✅ Text to Image ✅ Image to Depth ✅ Image to Segmentation ✅ Image to Pose ✅ FaceID ✅ Image to Multiview How to use & more👇

Gradio

11,820 views • 1 year ago

Using the new WordPress Command Palette to call an assistant that adds LLM generated text to a 3D world using natural language commands! "add text: write a short poem about the metaverse" This extends to image, audio and 3D objects in the future. WebXR holodeck style editing!

XR Publisher

17,891 views • 2 years ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨 ✨ New in 2.1: 🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image. 🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields. 🔹Rich Styles and High Aesthetic: Capable of generating images in various styles (including photorealistic portraits, comics, and vinyl figures), it delivers outstanding visual appeal and artistic quality. 🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image. HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters. We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation. Now, creators turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon. 🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 10 months ago

Higgsfield Mod for Minecraft is live. > prompt any building or city, even the Statue of Liberty > create paintings with text-to-image > snap a view and restyle it with image-to-image > make videos from a prompt with text-to-video. > animate in-game photos with image-to-video

Higgsfield AI 🧩

454,704 views • 1 month ago