Deep Dive Video: Complex image editing used to take hours; now Google's Gemini 2.0 turns advanced ComfyUI & Photoshop workflows into simple text prompts. Here's exactly how to try it (completely free).

Chapters:
00:00 Conversational Editing with Google's Multimodal AI
00:53 Image Generation w/ LLM World Knowledge
02:12...

34,755 views • 1 year ago • via X (Twitter)

16 Comments

Bilawal Sidhu • 1 year ago

For those who prefer YT (w/ chapters):

TacticalRNDR ⭕️ • 1 year ago

Keep up the great content. You are my most valued follow this year.

Bilawal Sidhu • 1 year ago

Appreciate it!

Bilal • 1 year ago

Love it! Thanks for featuring Hacky Experiments! 🙏

Bilawal Sidhu • 1 year ago

My pleasure! Keep hacking, and lean into some wildness; the failure cases were almost more fun than the utilitarian ones lol

John Nack • 1 year ago

Nice, I look forward to checking it out! Meanwhile, in case you and @oliver_wang2 don’t yet know one another, let’s fix that. 😌

Bilawal Sidhu • 1 year ago

@oliver_wang2 Thanks dude. We’re mutuals on X but we should def chat sometime Oliver!

VentureMind AI • 1 year ago

Thanks for this breakdown!

Neville Medhora • 1 year ago

Sweet!

Dexter | FeelDesign AI, Comfy UI, Interior Design • 1 year ago

how to show all the x accounts you mentioned in the videos?

Bilawal Sidhu • 1 year ago

Check out the video on YouTube — links to the x posts are in the description:

A T Wilkinson • 1 year ago

I’ve noticed the output quality isn’t ideal, so a few other things would have to happen in post to fix this unless Google begins to natively output HQ images. Their other models can do that, but this one is not based on Imagen 3, or so it has told me.

BowtiedWhitebat + Read Pinned Tweet or NGMI • 1 year ago

bilaw imagine just WHAT DEY HAVE HIDDEN

Bilawal Sidhu • 1 year ago

Dude I bet there’s some really advanced tech in a few narrow domains but I legit think as far as gen ai goes we’re all on the same roller coaster together

Bill Platt • 1 year ago

Thank you for this @bilawalsidhu !!

Related Videos

InstantDrag: Improving Interactivity in Drag-based Image Editing

Drag-based image editing has recently gained popularity for its interactivity and precision. However, despite the ability of text-to-image models to generate samples within a second, drag editing still lags behind due to the challenge of accurately reflecting user interaction while maintaining image content. Some existing approaches rely on computationally intensive per-image optimization or intricate guidance-based methods, requiring additional inputs such as masks for movable regions and text prompts, thereby compromising the interactivity of the editing process. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed, requiring only an image and a drag instruction as input. InstantDrag consists of two carefully designed networks: a drag-conditioned optical flow generator (FlowGen) and an optical flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion dynamics for drag-based image editing in real-world video datasets by decomposing the task into motion generation and motion-conditioned image generation. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts through experiments on facial video datasets and general scenes. These results highlight the efficiency of our approach in handling drag-based image editing, making it a promising solution for interactive, real-time applications.

AK

71,232 views • 1 year ago