📢 3D world models from video diffusion suffer from inconsistent frames -> blurry output. Our fix: instead of naïve 3D reconstruction, we non-rigidly align each frame into a globally-consistent 3DGS representation. -> sharp visuals on top of any VDM!

39,718 views • 2 months ago • via X (Twitter)

Related videos

📢📢 PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing 📢📢

PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text. At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly-used methods such as LPIPS.

Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions for a coherent, high-fidelity output.

In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we can control geometry through - potentially hand-drawn - segmentation maps, and condition style via image or text prompt. We also provide an interactive GUI to enable the exploration of our editing pipeline.

Great work by Antonio Oroz and Tobias Kirschstein

Matthias Niessner

18,808 views • 7 months ago