As I promised yesterday, I'll briefly explain LoRA training and share a workflow I made so you can do it quickly. First, let me answer a very common question: 'Why train LoRAs when we have such advanced models?' Even though we have incredibly advanced models now (like NBP), we...

15,133 views • 5 months ago • via X (Twitter)

Related Videos

📢📢 𝐀𝐯𝐚𝐭𝟑𝐫 📢📢 Avat3r creates high-quality 3D head avatars from just a few input images in a single forward pass with a new dynamic 3DGS reconstruction model. Our core idea is to make Gaussian Reconstruction Models animatable. We find that a simple cross-attention to an expression code sequence is already sufficient to model complex facial expressions. We then incorporate position maps from DUSt3R and feature maps from Sapiens to facilitate the prediction task. While DUSt3R's position maps act as a pixel-aligned initialization for the Gaussians' positions, the Sapiens feature maps help the cross-view transformer to match corresponding image tokens in the 4 input images. One major challenge in creating a 3D head avatar from smartphone images comes from inconsistent facial expressions when the subject cannot remain perfectly static during the capture. We eliminate this static requirement by simply showing our model input images with different facial expressions during training. This technique makes our model robust to inconsistent input images later on. Finally, we show that although the model was trained with 4 input images, one can even create a 3D head avatar when only a single image is available. To achieve this, we employ a pre-trained 3D GAN to lift the single image to 3D and then render the 4 input images for our model. This allows us to create 3D head avatars from single images and even highly out-of-distribution examples like AI-generated faces, paintings or statues. Great work by Tobias Kirschstein from his internship at Meta with Javier Romero, Artem Sevastopolsky, and Shunsuke Saito.

Matthias Niessner

74,698 views • 1 year ago

All these demo videos make HEAD SWAPPING with Nano Banana look so easy, but then you give it a try and you're like... uh... what? Why didn't that work? Here's what I've found. Nano Banana reads your image, almost literally, so if you write on the image, it reads the text. This is how Higgsfield AI 🧩 has capitalized on the tech: "Write on the image" and give it direction, right? Totally true, but you don't need Higgi to write on your image. Nano Banana will understand your direction regardless of where you write on your image. On one hand, Higgi is really smart, because they're harnessing the tech in a unique way, but the whole "Higgsfield's Banana Placement" is a bit of a misnomer. It's more of a "Banana Placement" and Higgi is just giving you a sort of basic Photoshop-type tool to work with (again, pretty smart), but the real tech is the Banana. 🍌 This is how I swapped heads in Runway, but Nano Banana maintains the aesthetic qualities of your image almost perfectly, whereas Runway Reference spits out a very Gen-4 looking image. I like using Nano in Freepik (now Magnific), mainly because it's fast and I can get 4 gens at a time, and you need to gen a dozen times or so before you get a winner (most of the time). I was pumped when I saw Freepik introduce the @ reference feature, just like Runway has, but it doesn't seem to work for head swapping. My guess is that's not really how Nano Banana tech works... ideally. Marco is the person I saw using this "A" and "B" method, back when Nano was on LM Arena, and man-oh-man, it just works... like a charm. You need to experiment with how much of the face you blot out, and with the angle and facial expression of your new head, if you want the blend to be perfect. All of the results in this video are 100% Nano Banana. I did not do any Photoshop work to the images after the fact. I really hope this helps. Let me know if you have any questions. I'm happy to help.
And I'll keep posting videos like this if you guys find them useful. Let me know! And if you want more serious, one-on-one AI consultation you can throw something on the books here:

Jordan Daniel Chesney

61,803 views • 9 months ago

This is probably the most complex workflow I’ve ever built, only with open-source tools. It took me 4 days. It takes four inputs: author, title, and style; and generates a full visual animated story in one click in ComfyUI. There are still some bugs, but here’s the first preview. Here’s a quick breakdown:
- The four inputs are sent to LLMs with precise instructions to generate: first, prompts for images and image modifications; second, prompts for animations; third, prompts for generating music.
- All voices are generated from the text and timed precisely, as they determine the length of each animation segment.
- The first image and video are generated to serve as the title, but also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed a lot of custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part to build in this workflow, so it can process either a 20-second video or a 2-minute video with the same input.
- There are multiple combinations of LLMs that try to understand the text in the best way to provide the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated based on the LLM output and matches the exact timing of the full animation.
- Done!

For reference, this workflow uses a lot of models and only works on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans. As I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists! My aim was to create a tool that can animate text in one go, allowing the AI some freedom while keeping a strict flow.
I don’t know yet how I’ll share this workflow with people. I still need to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)

Lovis Odin

58,571 views • 9 months ago
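
Lovis Odin's post above says the generated voices determine the length of each animation segment, and that custom nodes handle the frame calculations that match audio to video. The Python sketch below is one way such a calculation might look. It is not the author's code and does not use any ComfyUI API: the function names, the 24 fps default, and the sample durations are all hypothetical.

```python
import math


def frames_for_audio(duration_s: float, fps: int = 24) -> int:
    """Return the number of video frames needed to fully cover an audio clip.

    Hypothetical helper: rounds up so the video never ends before the voice does.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    # round() guards against float noise (e.g. 76.80000000000001) before ceil
    return math.ceil(round(duration_s * fps, 6))


def plan_segments(voice_durations_s, fps: int = 24):
    """Return a (start_frame, frame_count) pair for each line of text.

    Each segment's length comes from its voice clip, and segments are
    laid end to end, mirroring the per-line loop described in the post.
    """
    plan = []
    start = 0
    for duration in voice_durations_s:
        count = frames_for_audio(duration, fps)
        plan.append((start, count))
        start += count
    return plan


if __name__ == "__main__":
    # Assumed example: three voiced lines of 2.5 s, 1.0 s and 3.2 s
    for start, count in plan_segments([2.5, 1.0, 3.2], fps=24):
        print(f"segment starts at frame {start}, lasts {count} frames")
```

Because every segment length is derived from its audio clip, the same loop handles a short or a long text without changes, which is consistent with the post's claim that one input setup covers both a 20-second and a 2-minute video.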