Dreamix: Video Diffusion Models are General Video Editors abs: project page: present diffusion-based method that is able to perform text-based motion and appearance editing of general videos

AK

500,969 subscribers

398,132 views • 3 years ago • via X (Twitter)

Education • Health & Wellness • Science & Technology

10 comments

Abi • 3 years ago

pre: "You won't believe your eyes!" post: "You can't believe your eyes!"

Rekt Adult (🩸,🩸) • 3 years ago

where/when/how can someone use it? they didn't include the code with the github page

Minute Movies • 3 years ago

excuse me what

William Lamkin • 3 years ago

Nice find. 3D + Motion and editing tools seem to be the subject of the next wave of papers behind audio-based GenAI models

Stern - uɹǝʇS • 3 years ago

@CoffeeVectors @TomLikesRobots @GanWeaving as soon as models of this fidelity and coherence arrive publicly …… wow

Jason Murphy • 3 years ago

Amazing!

Vic J • 3 years ago

@OliverLaufer

Bernard Bontemps • 3 years ago

@Francescu

Leo Enin 🍰 • 3 years ago

Absolutely astonishing! Imagine graphic tools 10 years from now. Now imagine video games and movies in 20 years, given the progress we got over the last 20. And I'm not even talking about other industries here. The world is rapidly changing, and it's beautiful.

Guilherme • 3 years ago

@BaixaEssaPorra

Related videos

Single Motion Diffusion abs: project page:

AK

72,767 views • 3 years ago

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048 abs: project page:

AK

718,746 views • 3 years ago

We present TeSMo ( ), a text-controlled scene-aware motion generation method based on denoising diffusion models. It’s an exciting collaboration with Justus Thies, Michael Black, Jason Peng, Davis Rempe. (1/7)

Hongwei Yi

45,724 views • 2 years ago

AccVideo just dropped on Hugging Face Accelerating Video Diffusion Model with Synthetic Dataset present an efficient distillation method to accelerate video diffusion models with synthetic dataset method is 8.5x faster than HunyuanVideo

AK

20,633 views • 1 year ago

🏃Motion Tokenizer Bridging Semantic and Kinematic Conditions🏃‍♀️ #MoTok is a diffusion-based tokenizer that unifies *perception-planning-control*, combining the strengths of continuous diffusion & discrete tokens - Project: - Code:

Ziwei Liu

19,228 views • 2 months ago

Generative Novel View Synthesis with 3D-Aware Diffusion Models abs: project page:

AK

304,708 views • 3 years ago

Scaling up GANs for Text-to-Image Synthesis present our 1B-parameter GigaGAN, achieving lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs at 0.13s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs abs: project page:

AK

278,115 views • 3 years ago

Today we’re sharing two new advances in our generative AI research: Emu Video & Emu Edit. Details ➡️ These new models deliver exciting results in high quality, diffusion-based text-to-video generation & controlled image editing w/ text instructions. 🧵

AI at Meta

798,183 views • 2 years ago

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models Gradio demo is out on Hugging Face Spaces demo:

AK

87,679 views • 3 years ago

I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: Github:

Yuan Liu

33,363 views • 1 year ago

I finally released my new video on YouTube about Diffusion Models / Score-Based Generative Models. Literally planned this for a year and put so much work in. I think this approach to diffusion models is so intuitive and highly recommend giving that a go! Video is 38min long, so you will need some time to watch that haha.

dome | Outlier

54,540 views • 1 year ago

Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model abs: project page:

AK

90,072 views • 3 years ago

A new frontier. Project Starlight—the first and only diffusion-based AI model for enhancing video. Launching soon.

Topaz Labs

22,375 views • 1 year ago

We present Neural Riemannian Motion Fields (NRMF) as a promising paradigm for generative modeling of articulated motion. Unlike various diffusion based models, NRMF is useful both in generation and in deployment as a prior across tasks: #CVPR #AI #CVML👇

Tolga Birdal

16,624 views • 9 months ago

InstantDrag Improving Interactivity in Drag-based Image Editing discuss: Drag-based image editing has recently gained popularity for its interactivity and precision. However, despite the ability of text-to-image models to generate samples within a second, drag editing still lags behind due to the challenge of accurately reflecting user interaction while maintaining image content. Some existing approaches rely on computationally intensive per-image optimization or intricate guidance-based methods, requiring additional inputs such as masks for movable regions and text prompts, thereby compromising the interactivity of the editing process. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed, requiring only an image and a drag instruction as input. InstantDrag consists of two carefully designed networks: a drag-conditioned optical flow generator (FlowGen) and an optical flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion dynamics for drag-based image editing in real-world video datasets by decomposing the task into motion generation and motion-conditioned image generation. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts through experiments on facial video datasets and general scenes. These results highlight the efficiency of our approach in handling drag-based image editing, making it a promising solution for interactive, real-time applications.

AK

71,201 views • 1 year ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models paper page: github: Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 2 years ago

video editing is stuck in 2005, time for something new introducing diffusion the first infinite canvas for video and motion graphics like figma, but for editing

konstantinpaulus

128,691 views • 1 month ago

Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: (1/7)

Boyuan Chen

175,996 views • 1 year ago

Researchers from RAI Institute present Diffuse-CLoC, a new control policy that fuses kinematic motion diffusion models with physics-based control to produce motions that are both physically realistic and precisely controllable. This breakthrough moves us closer to developing generalist policies that enable humanoid robots to perform diverse tasks, including dynamic locomotion and contact-rich manipulation, in a natural-looking and robust way. Learn more at

RAI Institute

13,426 views • 10 months ago

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution paper page: Text-based diffusion models have exhibited remarkable success in generation and editing, showing great promise for enhancing visual content with their generative prior. However, applying these models to video super-resolution remains challenging due to the high demands for output fidelity and temporal consistency, which is complicated by the inherent randomness in diffusion models. Our study introduces Upscale-A-Video, a text-guided latent diffusion framework for video upscaling. This framework ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences; globally, without training, a flow-guided recurrent latent propagation module is introduced to enhance overall video stability by propagating and fusing latent across the entire sequences. Thanks to the diffusion paradigm, our model also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation, enabling a trade-off between fidelity and quality. Extensive experiments show that Upscale-A-Video surpasses existing methods in both synthetic and real-world benchmarks, as well as in AI-generated videos, showcasing impressive visual realism and temporal consistency.

AK

32,840 views • 2 years ago