AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
TL;DR: for 3D camera control in generative video, it really helps to know *which* part of your model you should mess with.
Internship project by Sherwin Bahmani at Snap.
23,040 views • 1 year ago • via X (Twitter)
5 Comments

TL;DR (expanded):
1) "when" in the diffusion process you condition for camera matters (i.e. noise scheduler)
2) "how" in the diffusion process you condition for camera matters (i.e. architecture)
3) "what data" you give to your diffusion model to condition camera matters

Why, you ask?
1) camera motion is low-frequency, and early denoising iterations deal with low-frequency content
2) fine-tuning only the early DiT blocks is enough for camera control; touch more and you lose quality
3) the model needs to know what a static view of the dynamic world looks like
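To make points 1) and 2) concrete, here is a minimal sketch of a gate that decides where camera conditioning gets injected: only at high noise levels (early denoising iterations) and only in the first few DiT blocks. Everything in it is an assumption for illustration: the `CameraGate` class, the `schedule` helper, the convention that `t = 1.0` means pure noise, and the numeric values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CameraGate:
    """Hypothetical gate for camera conditioning (illustrative only)."""
    num_blocks: int          # total number of DiT blocks
    conditioned_blocks: int  # only the first k blocks receive the camera signal
    noise_cutoff: float      # condition only while noise level t >= cutoff

    def active(self, t: float, block_idx: int) -> bool:
        """Return True if camera conditioning is applied at (t, block_idx).

        t is a noise level in [0, 1]; 1.0 is assumed to mean pure noise,
        so early denoising iterations have t close to 1.
        """
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        if not 0 <= block_idx < self.num_blocks:
            raise IndexError("block_idx out of range")
        early_step = t >= self.noise_cutoff
        early_block = block_idx < self.conditioned_blocks
        return early_step and early_block


def schedule(gate: CameraGate, timesteps: List[float]) -> List[List[bool]]:
    """Build a (timestep x block) mask of where conditioning is active."""
    return [
        [gate.active(t, b) for b in range(gate.num_blocks)]
        for t in timesteps
    ]
```

With `CameraGate(num_blocks=8, conditioned_blocks=3, noise_cutoff=0.6)`, a step at `t = 0.9` conditions blocks 0 to 2 only, and a step at `t = 0.2` conditions nothing. In a real model the mask would decide whether a block adds its camera embedding or skips it.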

A shout-out to the collaborators @isskoro @guocheng_qian A. Siarohin @willimenapace @SergeyTulyakov at Snap and @DaveLindell at UofT.

@sherwinbahmani Congrats @sherwinbahmani !!

@sherwinbahmani Congrats to the team! Amazing work
