Загрузка видео...

Не удалось загрузить видео

На главную

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model with Gradio demo local demo: This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the...

810,024 просмотров • 2 лет назад •via X (Twitter)

Комментарии: 10

Фото профиля Laurent
Laurent2 лет назад

@Gradio It seems to be a war between MagicAnimate and AnimateAnyone. Who wins?

Фото профиля Riad
Riad2 лет назад

@Gradio That was fast

Фото профиля patsan e/acc
patsan e/acc2 лет назад

@Gradio

Фото профиля Chromox
Chromox2 лет назад

@Gradio MagicAnimate, is now available for free on Chromox. Welcome to have fun. #MagicAnimate

Фото профиля Gas in aja lah
Gas in aja lah2 лет назад

@Gradio There is no single purpose of this technology other than blackmail, porn revenge and exploitation

Фото профиля richard nd
richard nd2 лет назад

@Gradio can someone test ? #MagicAnimate

Фото профиля zast
zast2 лет назад

@Gradio Coming soon in @cocktailpeanut ? 😁

Фото профиля Vlad
Vlad2 лет назад

@Gradio @cocktailpeanut did you see this?

Фото профиля Tritech 🔫
Tritech 🔫2 лет назад

@Gradio Attention Onlyfan hoes, your days are numbered.

Фото профиля 💧🌱🍉The Plautine Chapel🖤🇺🇦
💧🌱🍉The Plautine Chapel🖤🇺🇦2 лет назад

@Gradio The legends were true.

Похожие видео

Google presents Still-Moving Customized Video Generation without Customized Video Data Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a text-to-image (T2I) model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth or StyleDrop). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.

AK

40,467 просмотров • 1 год назад

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing abs: paper page: present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis.Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline.We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video.With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field.We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training.More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog.

AK

153,241 просмотров • 2 лет назад

InstantDrag Improving Interactivity in Drag-based Image Editing discuss: Drag-based image editing has recently gained popularity for its interactivity and precision. However, despite the ability of text-to-image models to generate samples within a second, drag editing still lags behind due to the challenge of accurately reflecting user interaction while maintaining image content. Some existing approaches rely on computationally intensive per-image optimization or intricate guidance-based methods, requiring additional inputs such as masks for movable regions and text prompts, thereby compromising the interactivity of the editing process. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed, requiring only an image and a drag instruction as input. InstantDrag consists of two carefully designed networks: a drag-conditioned optical flow generator (FlowGen) and an optical flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion dynamics for drag-based image editing in real-world video datasets by decomposing the task into motion generation and motion-conditioned image generation. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts through experiments on facial video datasets and general scenes. These results highlight the efficiency of our approach in handling drag-based image editing, making it a promising solution for interactive, real-time applications.

AK

71,201 просмотров • 1 год назад

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior paper page: present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,400 просмотров • 2 лет назад