Testing LCM LORAs in an AnimateDiff & multi-controlnet workflow in ComfyUI. I was able to process this entire Black Pink music video as a single .mp4 input. The LCM lets me render at 6 steps (vs 20+) on my 4090 and uses up only 10.5 GB of VRAM. Here's...

182,419 views • 2 years ago • via X (Twitter)

Comments: 10

CoffeeVectors, 2 years ago

Entire thing took 81 minutes to render 2,467 frames, so about 2 seconds per frame. This isn't including the time to extract the img sequence from video and gen the ControlNet maps. Used Zoe Depth and Canny ControlNets in SD 1.5 at 910 x 512. [2/11]
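The per-frame figure above can be checked with a quick calculation. This is only a sketch of the arithmetic from the numbers quoted in the post (81 minutes, 2,467 frames); the function name is made up for illustration.

```python
def seconds_per_frame(total_minutes: float, frame_count: int) -> float:
    """Average render time per frame, in seconds."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return total_minutes * 60 / frame_count


# Numbers taken from the post: 81 minutes of rendering for 2,467 frames.
spf = seconds_per_frame(81, 2467)
print(f"{spf:.2f} seconds per frame")
```

This comes out just under 2 seconds per frame, which matches the rounded figure in the post.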

CoffeeVectors, 2 years ago

Giving the output a stronger style, more detail, and a less rotoscope-ish feel will require adjusting individual shots. But doing the entire video in one go lays down a rough draft for you to iterate on: build on fun surprises, troubleshoot problem areas. [3/11]

CoffeeVectors, 2 years ago

For the input video I used every other frame in order to target 12 fps. [4/11]
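Taking every other frame amounts to a stride-2 slice over the extracted image sequence. Here is a minimal sketch, assuming the frames are already on disk as an ordered list of file names. The helper names and the 24 fps source rate are assumptions for illustration; the thread only states the 12 fps target.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take_every_nth(frames: Sequence[T], step: int = 2) -> List[T]:
    """Keep frames 0, step, 2*step, ... from an ordered sequence."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return list(frames[::step])


def effective_fps(source_fps: float, step: int = 2) -> float:
    """Frame rate that preserves playback speed after subsampling."""
    return source_fps / step


# Hypothetical file names; a real sequence would come from the extracted video.
frame_names = [f"frame_{i:05d}.png" for i in range(8)]
kept = take_every_nth(frame_names)
print(kept)

# Assumed 24 fps source: dropping every other frame gives the 12 fps target.
print(effective_fps(24.0))
```

Halving the frame count also halves the number of frames to render, so the cost scales down with the step.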

CoffeeVectors, 2 years ago

Here's a screenshot of how I added the LCM LORA. I went with the baked-in VAE from the checkpoint. [5/11]

CoffeeVectors, 2 years ago

Kept the prompt pretty generic to see how it would apply to all the various shots. [6/11]

CoffeeVectors, 2 years ago

In the KSampler, I used the LCM sampler. You need to update to the latest version of ComfyUI to access it. [7/11]

CoffeeVectors, 2 years ago

And here's how I arranged the nodes for multi-ControlNet. [8/11]

CoffeeVectors, 2 years ago

If you want to learn more about LCM LORAs, I mainly referred to @NerdyRodent’s tutorial. Go check it out! It speeds up all rendering in SD. It's not just for videos! [9/11]

CoffeeVectors, 2 years ago

If you want to learn more about AnimateDiff, go check @PurzBeats’ live stream videos! [10/11]

CoffeeVectors, 2 years ago

Lastly, shout out to @rainisto for giving me the idea to try this on a full music video, and @PurzBeats again for answering some of my questions about AnimateDiff! [11/11]

Related videos

This is probably the most complex workflow I’ve ever built, only with open-source tools. It took me 4 days. It takes four inputs: author, title, and style; and generates a full visual animated story in one click in ComfyUI. I worked on it for four days. There are still some bugs, but here’s the first preview. Here’s a quick breakdown:

- The four inputs are sent to LLMs with precise instructions to generate: first, prompts for images and image modifications; second, prompts for animations; third, prompts for generating music.
- All voices are generated from the text and timed precisely, as they determine the length of each animation segment.
- The first image and video are generated to serve as the title, but also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed a lot of custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part to build in this workflow, so it can process either a 20-second video or a 2-minute video with the same input.
- There are multiple combinations of LLMs that try to understand the text in the best way to provide the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated based on the LLM output and matches the exact timing of the full animation.
- Done!

For reference, this workflow uses a lot of models and only works on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans. As I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists! My aim was to create a tool that can animate text in one go, allowing the AI some freedom while keeping a strict flow.

I don’t know yet how I’ll share this workflow with people. I still need to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)

Lovis Odin

58,571 views • 9 months ago