📢Face Anything: 4D Face Reconstruction from Any Image Sequence

Transformer model for 4D face reconstruction and dense tracking:
- predict canonical facial coordinates per pixel
- tracking as reconstruction in canonical space
- geometry + correspondences in one forward pass

Key idea: a shared canonical space across frames -...

Matthias Niessner

47,672 subscribers

61,427 views • 2 months ago • via X (Twitter)

Related videos

📢Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction📢 -> highly accurate face reconstruction by training powerful VITs via surface normals and UV-coordinates estimation. The geometric cues from our 2D foundation model backbone constrain the 3DMM parameters, which allows us to achieve remarkable reconstruction accuracy - works for both single image and videos! In addition, we introduce a new 3D face reconstruction benchmark that evaluates both neutral and posed face geometry. 🌍 📷 Great work by Simon Giebenhain Tobias Kirschstein Martin Rünz Lourdes Agapito

Matthias Niessner

62,104 views • 1 year ago

SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes abs: paper page: Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. It then reconstructs the deformations of an estimated canonical model of the geometry and appearance in an online fashion. Since this canonical model is time-invariant, we obtain correspondences even for long-term, long-range motions. We employ neural scene representations to parametrize the components of our method. Like prior dynamic-NeRF methods, we use a backwards deformation model. We find non-trivial adaptations of this model necessary to handle larger motions: We decompose the deformations into a strongly regularized coarse component and a weakly regularized fine component, where the coarse component also extends the deformation field into the space surrounding the object, which enables tracking over time. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.

AK

76,380 views • 2 years ago

Introducing St4RTrack!🖖 Simultaneous 4D Reconstruction and Tracking in the world coordinate feed-forwardly, just by changing the meaning of two pointmaps!

Junyi Zhang

52,426 views • 1 year ago

GauS-SLAM: Dense RGB-D SLAM with Gaussian Surfels • We propose a 2D Gaussian-based incremental reconstruction strategy and a Surface-aware Depth Rendering mechanism. This approach effectively mitigates geometry distortions and improves tracking accuracy. • Our dense SLAM system features a front-end/back-end architecture and incorporates a local map design, ensuring tracking accuracy and efficiency. • We conduct extensive experiments demonstrating the superiority of our approach in both tracking accuracy and reconstruction quality compared to SOTA methods.

MrNeRF

11,004 views • 1 year ago

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense Point Cloud Reconstruction ✅ Point Tracking Project Page: Code & Weights:

Jianyuan

203,078 views • 1 year ago

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from multi-view tokens (no motion vectors, no HexPlane); 🔹 Uses a clean, minimal Transformer backbone; 🔹 Generalizes with fast, high-quality feedforward rendering at any view and infinite frame rate. Check out more interactive demos and scaling behaviors on our homepage/paper. 👉Website: 👉Paper:

Martin Ziqiao Ma

21,787 views • 1 year ago

📢 New Paper PointSt3R: Point Tracking through 3D Grounded Correspondence Can point tracking be re-formulated as pairwise frame correspondence solely? We fine-tune MASt3R with dynamic correspondences and a visibility loss and achieve competitive point tracking results 1/3

Dima Damen

10,458 views • 7 months ago

Alibaba just released LHM on Hugging Face Large Animatable Human Reconstruction Model from a Single Image in Seconds

AK

170,372 views • 1 year ago

Excited to share MonST3R! -- a simple way to estimate geometry from unposed videos of dynamic scenes. We achieve competitive results on several downstream tasks (video depth, camera pose) and believe this is a promising step toward feed-forward 4D reconstruction

Junyi Zhang

131,541 views • 1 year ago

SpatialTrackerV2: unified, end-to-end 3D point tracking model which simultaneously estimates Camera Motion, Consistent Geometry and Pixel-wise 3D Trajectories.

Bilawal Sidhu

20,346 views • 11 months ago

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We introduced a novel density control strategy in training, which allows our 4DGT to handle longer space-time input while maintaining efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can significantly outperform prior Gaussian-based networks in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos.

MrNeRF

34,782 views • 1 year ago

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing abs: paper page: present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog.

AK

153,241 views • 2 years ago

📢📢GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction📢📢 Reconstructing high-fidelity 3D scenes from sparse RGB input is hard. It needs a strong 3D prior! We reformulate multi-view scene reconstruction as conditional 3D generation over overlapping spatial chunks, lifting posed image features into a generative shape prior via 3D conditioning. As an example prior, we build on Trellis2, and train it such that its reconstruction is pixel-aligned and matches from all views. GenRecon achieves unprecedented reconstruction quality from any sparse RGB input sequence, even from a phone capture. The reconstruction also includes PBR materials, which facilitate relighting and virtual object insertion. Amazing work by Katharina Schmid, Nicolas von Lützow, Jozef, Angela Dai

Matthias Niessner

17,572 views • 27 days ago

Check out Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from imperfect observations of sparse, noisy, or partial point clouds. Main idea: we represent time-varying shapes via a 4D neural representation with latent vector sets, and then explicitly learn the shape and motion distribution of non-rigid objects through an iterative denoising process of latent vector sets. Great work by Wei Cao るちゃん Biao Zhang Jiapeng Tang

Matthias Niessner

29,058 views • 2 years ago

New AI research from Meta – CoTracker3 Simpler and Better Point Tracking by Pseudo-Labelling Real Videos. More details ➡️ Demo on Hugging Face ➡️ Building on our previous work on CoTracker, this new model demonstrates impressive tracking results where points can be tracked for a long time even when they're occluded or leave the field of view. CoTracker3 achieves state-of-the-art, outperforming all recent point tracking approaches on standard benchmarks — often by a substantial margin. We've released the research paper, code and a demo on Hugging Face — along with models available under an A-NC license to support further research in this space.

AI at Meta

218,910 views • 1 year ago

My face tracking addon for Mayo by Chocolate Rice is available now! Features: -Full UE face tracking using Adjerry's template -Eyes bounce when you blink! (Toggleable) -Tail wags when you smile! :D -Reactive ears that respond to your face tracking! Custom face expressions that are triggered by your face tracking! -Frown and Mayo will cry :( -Puff your cheeks like you're holding your breath and her face will slowly turn blue! (These can be disabled in the menu) #Mayo_3D

Threevee

126,160 views • 2 months ago

Zhou et al., "PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception" VGGT extended to dynamic scenes with a dynamic mask predictor.

Kwang Moo Yi

11,492 views • 7 months ago

Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes Contributions: • We propose STORM, the first feed-forward, self-supervised method for fast and accurate reconstruction of dynamic 3D scenes from sparse, multi-timestep, posed camera images. • Our bottom-up framework aggregates and transforms per-frame 3D Gaussian Splats into a cohesive scene representation, enabling self-supervised motion estimation. Furthermore, we introduce motion tokens that capture common motion primitives and regularize motion predictions, facilitating dynamic motion group segmentation without explicit motion or correspondence supervision. • We present several enhancements for in-the-wild scenarios, including sky modeling, camera exposure inconsistency handling, large novel-view extrapolation, and fine-grained human motions reconstruction, making STORM well-suited for real-world applications.

MrNeRF

53,292 views • 1 year ago

movin' and shmoovin', ARKIT face tracking work for [@]WaterDarkE25 !!! 💙

MADS ☀💙

11,417 views • 1 year ago

My new Free VRChat base! Full Face Tracking, and a little extra for those who may not have Face Tracking! Get it in the replies:

omatsu

105,942 views • 1 year ago