We propose Long Context Tuning (LCT) for scene-level video generation to bridge the gap between current single-shot generation and real-world narrative video productions. Homepage: Report:

Ceyuan Yang

2,177 subscribers

46,813 views • 1 year ago • via X (Twitter)

Education, Science & Technology

9 comments

Ceyuan Yang • 1 year ago

Guided by the belief that too much inductive bias might compromise scalability, we expand the attention context window to cover multiple shots. Combining interleaved 3D RoPE, asynchronous timesteps, and context-causal attention with a KV-cache, LCT supports efficient auto-regressive sampling.
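The thread does not spell out what the context-causal attention mask looks like. A minimal sketch of one plausible reading, assuming tokens attend bidirectionally inside their own shot and causally to all earlier shots, might look like this. The function name `context_causal_mask` and the toy shot lengths are hypothetical, not taken from the LCT report.

```python
from typing import List


def context_causal_mask(shot_lengths: List[int]) -> List[List[bool]]:
    """Build a shot-level causal attention mask.

    mask[q][k] is True when query token q may attend to key token k.
    Assumption (not stated in the thread): attention is bidirectional
    inside a shot and causal across shots.
    """
    # Label every token position with the index of its shot.
    shot_of = [shot for shot, n in enumerate(shot_lengths) for _ in range(n)]
    total = len(shot_of)
    # A key is visible if it belongs to the same shot or an earlier one.
    return [[shot_of[k] <= shot_of[q] for k in range(total)]
            for q in range(total)]


# Two shots with 2 and 3 tokens (toy sizes chosen for illustration).
mask = context_causal_mask([2, 3])
for row in mask:
    print("".join("1" if visible else "0" for visible in row))
```

Under this mask, tokens of an earlier shot never see later shots, so their keys and values stay the same when a new shot is appended. That property is what would let a KV-cache reuse them during auto-regressive sampling.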

Ceyuan Yang • 1 year ago

Thanks to auto-regressive sampling, LCT also exhibits several emergent abilities without explicit training objectives, such as interactive generation. For example, we can feed in a Sora-generated video as the starting shot and continue producing videos that follow text prompts.

Ceyuan Yang • 1 year ago

In addition, through joint training on short single-shot and long multi-shot videos, LCT also supports interactive single-shot extension.

Ceyuan Yang • 1 year ago

Remarkably, despite no extra explicit training objective, our model enables compositional generation by accepting separate identity and environment images to synthesize coherent videos that integrate these distinct elements.

Ceyuan Yang • 1 year ago

Our bidirectional model accepts visual conditions in arbitrary order and location, supporting "scene interpolation" applications. As shown below, given the first and last shots, we can generate intermediate scenes with semantic coherence.

Ceyuan Yang • 1 year ago

This is the longest video I've generated so far. So is this thread, lol. Many thanks to Yuwei, Ziyan, Zhibei, Zhijie, Zhenheng, Dahua and Lu.

Gan Jing World • 1 year ago

🎥Darkest Before Dawn Limited-Time Free Viewing on GJW+ Belgian climber Siebe Vanhee tackles Yosemite’s Dawn Wall in Darkest Before Dawn, a stunning film blending raw storytelling and cinematic beauty. Award-winning and festival favorite worldwide.

Synthical • 1 year ago

Dark mode for this paper for those who read at night 🌚

ZurdaMierda • 1 year ago

Cool

Related videos

Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance given limited resources, achieving comparable quality against other industry-level models. To unleash the power of the foundation model, Seaweed-7B further enables a wide range of downstream applications including image-to-video generation, human video generation, subject-consistent video generation, video-audio joint generation, long video generation and storytelling, real-time generation, super-resolution generation, camera controlled generation. Check out our webpage and report for more details: Webpage: Paper: It's a wonderful journey of the last year. Thanks to all teammates for their contributions, sincerely.

Ceyuan Yang

77,423 views • 1 year ago

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

AK

15,840 views • 1 year ago

🔥1-min Interactive Video Generation with Multimodal Control🔥 Towards *long-context world model*, #LongVie is an end-to-end autoregressive framework for controllable ultra-long video generation - Page: - Paper: . Thanks AK

Ziwei Liu

14,185 views • 11 months ago

This is a step forward for long-context video tuning, which has already unlocked impressive multi-shot video generation pipelines from . Generating minutes- or even hour-long videos without drifting or forgetting is not "coming soon" -- it is already here. 3/4

Gordon Wetzstein

27,473 views • 10 months ago

StreamDiT: Real-Time Streaming Text-to-Video Generation. StreamDiT enables real-time text-to-video generation at 16 FPS on a single GPU (H100).

AK

43,659 views • 1 year ago

LongLive: Real-time Interactive Long Video Generation

AK

17,214 views • 9 months ago

Helios: Real Real-Time Long Video Generation Model paper:

AK

25,951 views • 4 months ago

Long video generation usually results in context increasing/scaling during chunk/frame-wise rollout. Considering context scaling may require context selection, we thus introduce the idea of MoE into long context modelling and propose Mixture of Contexts. All previous context/memory is considered while the chosen ones are computed in a data-driven manner. You can easily enjoy 7x compute savings.

Ceyuan Yang

22,141 views • 10 months ago
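The Mixture of Contexts post above describes choosing which past context to compute on, but gives no mechanism. A minimal sketch of one way such selection could work is top-k routing: score each past chunk by the dot product between the current query and the chunk's mean key, then keep the best k. This is an illustrative assumption, not the authors' method. The helper names (`select_context_chunks`, `mean_vector`) and the toy vectors are hypothetical, and the sketch uses no learned parameters, whereas the post says selection is data-driven.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Average a non-empty list of key vectors into one chunk descriptor."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def select_context_chunks(query: Vector,
                          chunks: Sequence[Sequence[Vector]],
                          top_k: int) -> List[int]:
    """Return indices of the top_k chunks most similar to the query.

    Hypothetical routing rule: similarity is the dot product between
    the query and each chunk's mean key.
    """
    scores = [dot(query, mean_vector(chunk)) for chunk in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    # Re-sort the winners so they keep their original temporal order.
    return sorted(ranked[:top_k])


# Toy example: three past chunks of 2-D keys and one query.
query = [1.0, 0.0]
chunks = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0]],
    [[0.5, 0.5], [0.9, 0.1]],
]
selected = select_context_chunks(query, chunks, top_k=2)
print(selected)
```

Attention would then run only over the keys and values of the selected chunks. That is where a compute saving of the kind the post mentions could come from.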

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

AK

17,434 views • 8 months ago

Introducing The Matrix --- a foundation world model for generating infinite-length, hyper-realistic videos with real-time, frame-level control: - Infinite-length video generation - 720p high-quality rendering - Real-time, frame-level control at 16 FPS - Generalization to real-world video control 🔗Blog: 📄Paper: 💻Code & Playable Demo: Coming soon! Key Innovation: A brand new technique called the shift-window denoise process model, enabling auto-regressive generation for diffusion and consistency models in real-time. Special thanks to project leader Ruili Feng and the entire Matrix team for their dedication and hard work over the year-long project.

Hongyang Zhang

178,322 views • 1 year ago

🔥🔥We propose #VideoBooth to enable **customized video generation** with image prompts, which provide more accurate and direct content control beyond the text prompts. - Project: - Code: - Video:

Ziwei Liu

26,329 views • 2 years ago

It’s great to see the gap between reconstruction and generation narrowing so quickly this year

Lucky Iyinbor

16,899 views • 1 month ago

Scaling Zero-Shot Reference-to-Video Generation

AK

21,507 views • 7 months ago

🚀 We open-sourced LongLive — interactive, real-time long-video generation. 👥Generates video in real time as users enter text prompts. ⚡️20.7 FPS on a single H100,⏱️up to 240s per clip. 🎬Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators. 🌍One step closer to World Models. All code for training & inference, model weights, demo page, and videos released! Paper: Code: Model: Demo Page: Introduction Video:

Yukang Chen

11,835 views • 9 months ago

What if your story didn’t reset every scene? I tested Pai — the first long-form video generation model from Utopai — and built an animated short that flows like a real narrative. Intentional. Cinematic. Continuous. Experience it here: #UtopaiPartners

Nova Reid

10,861 views • 4 months ago

Mixture of Contexts for Long Video Generation

AK

16,313 views • 10 months ago

I wanted storytelling — not just output. So I tried Pai by Utopai — the first long-form video generation model built for narrative continuity. I created an original animated short where the story actually flows from scene to scene. See it here: #UtopaiPartners

Erina | AI Tools & News

11,511 views • 4 months ago

Long video generation is a systems problem. Introducing LongLive-2.0 from NVIDIA Research: an end-to-end NVFP4 training and inference system for long video generation. Low-precision deployment often relies on post-training quantization, creating a gap between how models are trained and how they run. LongLive-2.0 aligns NVFP4-aware training, distillation, and W4A4 inference, maintaining strong benchmark quality while improving speed and memory efficiency.

NVIDIA AI

60,445 views • 1 month ago

BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

AK

19,949 views • 7 months ago