Video wird geladen...

Video konnte nicht geladen werden

Zur Startseite

NVIDIA AI Released DiffusionRenderer: An AI Model for Editable, Photorealistic 3D Scenes from a Single Video In a groundbreaking new paper, researchers at NVIDIA, University of Toronto, Vector Institute and the University of Illinois Urbana-Champaign have unveiled a framework that directly tackles this challenge. DiffusionRenderer represents a revolutionary leap...

104,741 Aufrufe • vor 11 Monaten •via X (Twitter)

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF

52,801 Aufrufe • vor 1 Jahr

Google dropped a new AI paper called LUMIERE. It's remarkably flexible, supporting video inpainting, image-to-video, AND stylized video generation tasks. Say hello to “space-time diffusion” for video generation! Now what the heck does that mean exactly?! 🌐⏳ → TL;DR it utilizes a “Space-Time UNet” architecture that generates the full duration of the video in one pass, rather than generating distant keyframes and interpolating between them like prior works. Because the computation is done in this “compressed space-time representation” to generate the full clip at once, it's far more temporally consistent. → Another benefit of generating the full video at once is that you can “direct” the video generation, making it easier to hand off to other models/tasks without having to stitch together partial solutions. You can condition generations on additional inputs, meaning you get the full stack of AI video capabilities – from video inpainting to image-to-video and beyond. → New SOTA for AI video generation? User study results in the paper suggest human evaluators preferred Lumiere over Runway Gen-2, Pika Labs, and Stable Video Diffusion in terms of quality, text alignment AND motion. But as always, we need to get hands-on with this tech when Google *actually* decides to ship it. → Could this end up inside YouTube? Y’all know i’m obsessed with blending reality and imagination – so it’s the video inpainting tech I'm most excited about. I really hope this model finds its way into YouTube's Generative AI efforts, and based on their prior announcements and the list of acknowledgments in the paper I think it might! 🤞🏽 Links: 🔗Paper: 🔗Project:

Bilawal Sidhu

44,800 Aufrufe • vor 2 Jahren