
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

AK

508,646 subscribers

29,278 views • 2 years ago • via X (Twitter)

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward, and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications.

We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that uses an epipolar attention mechanism to promote consistency between corresponding frames of the same video rendered from different camera poses. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments.
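The abstract names an epipolar attention mechanism but gives no implementation details. The sketch below is only a rough illustration of the generic idea behind epipolar-masked attention: a query location in view A may attend only to locations in view B that lie close to its epipolar line. It relies on several assumptions. The cameras are calibrated pinhole cameras. Points are given in normalized image coordinates with intrinsics already removed. The relative pose is known, with X_b = R X_a + t. The distance threshold is chosen by hand. All function names are hypothetical, and this is not CVD's code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def skew(t: Sequence[float]) -> Matrix:
    """Cross-product matrix [t]_x, so that matvec(skew(t), v) equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def essential_matrix(r: Matrix, t: Sequence[float]) -> Matrix:
    """E = [t]_x R for a relative pose X_b = R X_a + t (normalized coordinates assumed)."""
    return matmul(skew(t), r)


def epipolar_mask(e: Matrix, query_xy: Sequence[float],
                  key_xys: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """True for each key location in view B within `threshold` of the query's epipolar line."""
    # Epipolar line in view B: a*x + b*y + c = 0, with (a, b, c) = E @ (x_a, y_a, 1).
    a, b, c = matvec(e, (query_xy[0], query_xy[1], 1.0))
    norm = math.hypot(a, b)
    if norm == 0.0:
        # Degenerate line: zero translation, or the query sits at the epipole.
        raise ValueError("degenerate epipolar line")
    return [abs(a * x + b * y + c) / norm <= threshold for x, y in key_xys]


def masked_attention(q: Sequence[float], keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]], mask: Sequence[bool]) -> List[float]:
    """Scaled dot-product attention for one query, restricted to keys where mask is True."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) * scale
              for k, keep in zip(keys, mask) if keep]
    kept_values = [v for v, keep in zip(values, mask) if keep]
    if not scores:
        # No key lies near the epipolar line, so this view contributes nothing.
        return [0.0] * len(values[0])
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, kept_values)) / total
            for j in range(len(kept_values[0]))]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    E = essential_matrix(identity, (1.0, 0.0, 0.0))  # camera B shifted sideways along x
    mask = epipolar_mask(E, (0.2, 0.3), [(0.0, 0.3), (0.5, 0.31), (0.1, -0.4)], threshold=0.05)
    print(mask)  # [True, True, False]
    print(masked_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]], mask))
```

With a purely sideways baseline, the epipolar line is horizontal, so only keys at roughly the query's height pass the mask. The third key is excluded even though it has the largest dot product with the query. A real model would build this mask for every query/key pair across two feature maps and apply it inside batched attention. Plain Python lists are used here only to keep the example dependency-free.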


paper page:


daily papers:
