Deep Dive Video: Complex image editing used to take hours. Now Google's Gemini 2.0 turns advanced ComfyUI & Photoshop workflows into simple text prompts. Here's exactly how to try it (completely free).

Chapters:
00:00 Conversational Editing with Google's Multimodal AI
00:53 Image Generation w/ LLM World Knowledge
02:12... Easy Image Editing & Colorization
02:46 Advanced Conversational Edits (Chaining Prompts Together)
03:37 Long Text Generation (Google Beats OpenAI To The Punch)
04:25 Making Spicy Memes (Google AI Studio Safety Settings)
05:48 Advanced Prompting (One Shot ComfyUI Workflows)
07:19 Re-posing Characters (While Keeping Likeness Intact)
08:27 Spatial 3D Understanding (NO ControlNet)
10:42 Semantic Editing & In/Out Painting
13:46 Sprite Sheets & Animation Keyframes
14:40 Using Gemini To Build Image Editing Apps
16:37 Making Videos w/ Conversational Editing

Bilawal Sidhu

66,504 subscribers

34,755 views • 1 year ago • via X (Twitter)

Science & Technology Arts

16 Comments

Bilawal Sidhu • 1 year ago

For those who prefer YT (w/ chapters):

TacticalRNDR ⭕️ • 1 year ago

Keep up the great content. You are my most valued follow this year.

Bilawal Sidhu • 1 year ago

Appreciate it!

Bilal • 1 year ago

Love it! Thanks for featuring Hacky Experiments! 🙏

Bilawal Sidhu • 1 year ago

My pleasure! Keep hacking, and lean into some wildness; the failure cases were almost more fun than the utilitarian ones lol

John Nack • 1 year ago

Nice, I look forward to checking it out! Meanwhile, in case you and @oliver_wang2 don’t yet know one another, let’s fix that. 😌

Bilawal Sidhu • 1 year ago

@oliver_wang2 Thanks dude. We’re mutuals on X but we should def chat sometime Oliver!

VentureMind AI • 1 year ago

Thanks for this breakdown!

Neville Medhora • 1 year ago

Sweet!

Dexter | FeelDesign AI, Comfy UI, Interior Design • 1 year ago

how to show all the x accounts you mentioned in the videos?

Bilawal Sidhu • 1 year ago

Check out the video on YouTube — links to the x posts are in the description:

A T Wilkinson • 1 year ago

I’ve noticed the output quality to not be ideal, so a few other things would have to happen in post to fix this unless Google begins to natively output hq images. They are able in their other models but this one is not based on Imagen 3, or so it has told me.

BowtiedWhitebat + Read Pinned Tweet or NGMI • 1 year ago

bilaw imagine just WHAT DEY HAVE HIDDEN

Bilawal Sidhu • 1 year ago

Dude I bet there’s some really advanced tech in a few narrow domains but I legit think as far as gen ai goes we’re all on the same roller coaster together

Bill Platt • 1 year ago

Thank you for this @bilawalsidhu !!

Related Videos

Watch this Gemini 2.5 Flash Image (aka Nano Banana 🍌) tutorial from Google DeepMind DevRel Engineer Patrick Loeber, and start integrating the model into your apps. Key moments: 00:00 Introduction 00:32 AI Studio 01:25 Project Setup 03:16 Image creation 05:47 Image editing 06:58 Multiple input images 07:35 Photo restoration 08:12 Conversational image editing 09:32 Best practices and effective prompting

Google AI Developers

75,621 views • 10 months ago

Tutorial + Breakdown, as promised. 00:00 Intro 05:00 Referencing 08:40 Scripting 12:30 Voice-Over generation 17:25 Storyboarding 22:30 Prompt-to-Image Generation 27:42 Image-to-Video Generation 32:29 Editing & Sound Design 40:08 Recap

MIDΞ (❖,❖)

194,453 views • 3 months ago

Introducing Nano Banana Pro (Gemini 3 Pro Image), our new state-of-the-art image generation and editing model from Google DeepMind. It improves on the original model while adding new advanced capabilities, enhanced world knowledge and text rendering, allowing you to create and edit studio-quality, production-ready visuals.

Google

1,897,124 views • 7 months ago

Grok Imagine is now LIVE in Hedra DAY 0 ! 🚀 Stunning photorealistic images & videos. Lightning-fast generation Easy edits for elements, styles & more Text to Image. Image Editing. Text to Video. Image to Video All inside Hedra. Try it now.

Hedra

749,171 views • 5 months ago

Bytedance drops an open-source Gemini Omni!!! Bernini is a new AI video generation + editing framework. > Edit videos with text prompts > Image/video references > Code available

⚡AI Search⚡

43,567 views • 1 month ago

FLUX [dev] Kontext is insanely modular for image editing. We found a Reddit goldmine of prompts that unlock its full power that we had to share. Here's how to crush AI image generation with Glif. 00:00 Introduction 00:10 Goldmine of Kontext Prompts 00:50 Remix a Flux Kontext [dev] Workflow 01:30 How to prompt engineer inside of Glif 01:50 ComfyUI inside of Glif 02:30 Workflow Execution and Test Result

GLIF

24,132 views • 1 year ago

1. Precise Edits Made Easy Image editing has never been simpler. No need to type out prompts, just talk to the AI and describe exactly how you want the image edited.

el.cine

98,422 views • 1 year ago

Qwen-Image-Edit is out in anycoder for image editing in your vibe coded apps Built on 20B Qwen-Image, it brings precise bilingual text editing (Chinese & English) while preserving style, and supports both semantic and appearance-level editing.

AK

119,012 views • 10 months ago

Image editing in the Google Gemini just got a major upgrade.

Google

151,271 views • 10 months ago

Grok Imagine API just released A world-class video generation + video editing model Text-to-Video: Turn simple prompts into rich video clips with audio Image Generation + Editing: Bring ideas to life with visuals from scratch Video Editing Tools: Restyle scenes, add/remove props, control motion Best-in-Class Quality + Low Latency: Designed to deliver fast, cost-efficient results API pricing: Image input: $0.002 Video input : $0.01 Video output : $0.05

X Freeze

15,078 views • 5 months ago

Introducing Qwen Edit. This new model offers incredible image editing capabilities from text prompts. Try it now for free!

Krea

76,683 views • 10 months ago

Step into the future of AI image generation with Qwen-Image! From superior text rendering to consistent image editing across multiple languages, it sets a new benchmark! 💡What will you create with Qwen-Image?

Alibaba Group

203,475 views • 11 months ago

🚀 Excited to introduce Qwen-Image-Edit! Built on 20B Qwen-Image, it brings precise bilingual text editing (Chinese & English) while preserving style, and supports both semantic and appearance-level editing. ✨ Key Features ✅ Accurate text editing with bilingual support ✅ High-level semantic editing (e.g. object rotation, IP creation) ✅ Low-level appearance editing (e.g. addition/delete/insert) Try it now: Hugging Face: ModelScope: Blog: Github: API:

Qwen

658,347 views • 10 months ago

InstantDrag: Improving Interactivity in Drag-based Image Editing

Drag-based image editing has recently gained popularity for its interactivity and precision. However, despite the ability of text-to-image models to generate samples within a second, drag editing still lags behind due to the challenge of accurately reflecting user interaction while maintaining image content. Some existing approaches rely on computationally intensive per-image optimization or intricate guidance-based methods, requiring additional inputs such as masks for movable regions and text prompts, thereby compromising the interactivity of the editing process. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed, requiring only an image and a drag instruction as input. InstantDrag consists of two carefully designed networks: a drag-conditioned optical flow generator (FlowGen) and an optical flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion dynamics for drag-based image editing in real-world video datasets by decomposing the task into motion generation and motion-conditioned image generation. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts through experiments on facial video datasets and general scenes. These results highlight the efficiency of our approach in handling drag-based image editing, making it a promising solution for interactive, real-time applications.

AK

71,232 views • 1 year ago

Seedance 2.0 - Advanced Workflows Series 10. Fixing AI Artifacts with Video Editing Use the multi-reference Vid2Vid capabilities of Seedance 2.0 to fix small defects and artifacts in AI-generated shots. Take a screenshot of the video frame containing the defect, upload it to GPT Image 2, and modify the image so the defect disappears. Flawed Video + Corrected Image: Seedance 2.0 will regenerate the video without the defect. GPT Image 2 and Seedance 2.0 are now available on insMind (link at the end of the thread). Workflow + Prompts 👇

VoxelPlot

22,566 views • 2 months ago

GoogleAI just released "Muse", a text-to-image generation/editing model via Masked Generative Transformers: - Achieves new SOTA - Zero-shot, Mask-free editing - Zero-shot Inpainting/Outpainting - 900M params 📄 Paper: ⚙️ Project:

Lior Alexander

630,453 views • 3 years ago

Wan2.1-VACE 14B & 1.3B are now natively supported in ComfyUI! This model from Wan brings all-in-one editing capability to your video generation: 🔹Text-to-Video & Image-to-Video 🔹 Video-to-video (Pose & depth control) 🔹 Inpainting & Outpainting 🔹 Character + object reference

ComfyUI

21,517 views • 1 year ago

Today we launched Nano Banana Pro (Gemini 3 Pro Image). This state-of-the-art image generation and editing model turns your vision into functional reality with unprecedented control, improved text rendering and factual accuracy.

News from Google

215,740 views • 7 months ago