Introducing HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction. We built a multi-camera system and a semi-automatic method for annotating the shape and pose of hands and objects. Project page:

Yu Xiang

5,993 subscribers

57,228 views • 1 year ago • via X (Twitter)

Comments: 8

Yu Xiang • 1 year ago

The capture system consists of eight Intel RealSense D455 cameras and one Microsoft Azure Kinect positioned above a table. All cameras are calibrated. Users wear a Microsoft HoloLens AR headset during data collection.
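Because every camera in the rig is calibrated, each one has known intrinsics K and extrinsics (R, t), so any 3D point on a hand or object can be mapped to a pixel in every view. The sketch below illustrates that standard pinhole projection; the camera values are made-up placeholders rather than HO-Cap's actual calibration, and the helper names are hypothetical.

```python
# Minimal pinhole projection sketch for a calibrated multi-camera rig.
# K, R, t values are hypothetical placeholders, not HO-Cap's calibration.

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(K, R, t, X):
    """Project world point X to pixel coordinates: x ~ K (R X + t)."""
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]  # world -> camera frame
    if Xc[2] <= 0:
        return None  # point lies behind the camera
    u, v, w = mat_vec(K, Xc)
    return (u / w, v / w)  # perspective division

# Hypothetical rig: two cameras with shared intrinsics, the second offset 0.5 m along x.
K = [[600.0, 0.0, 320.0],
     [0.0, 600.0, 240.0],
     [0.0, 0.0, 1.0]]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = {"cam0": (K, I, [0.0, 0.0, 0.0]),
           "cam1": (K, I, [-0.5, 0.0, 0.0])}

point = [0.0, 0.0, 2.0]  # a point 2 m in front of cam0
for name, (K_, R_, t_) in cameras.items():
    print(name, project(K_, R_, t_, point))
```

The same 3D point lands at different pixels in each view; keeping these reprojections consistent across all cameras is what makes multi-view annotation possible.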

Yu Xiang • 1 year ago

First, we use BundleSDF to reconstruct the textured meshes of objects. To prepare the input data, we manually move and rotate an object in front of the Azure Kinect camera, ensuring exhaustive coverage of its surfaces for high-fidelity reconstruction.

Yu Xiang • 1 year ago

The HO-Cap dataset includes 64 reconstructed objects.

Yu Xiang • 1 year ago

In our semi-automatic annotation pipeline, we use FoundationPose for initial object pose estimation and MediaPipe for hand pose estimation, followed by joint SDF-based optimization of hands and objects.
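The thread does not spell out the joint optimization objective, so the following is only a hedged illustration of one common SDF-based term: penalizing hand points that fall inside the object (negative signed distance) and removing that penetration by gradient descent on a rigid hand translation. The sphere SDF, the point coordinates, and all function names are hypothetical stand-ins for an object-mesh SDF and MediaPipe-derived hand points.

```python
import math

# Hypothetical toy object: a sphere SDF standing in for a reconstructed mesh's SDF.
# Negative inside the object, zero on its surface, positive outside.
CENTER = (0.0, 0.0, 0.0)
RADIUS = 0.05  # 5 cm, an arbitrary illustrative size

def sdf(p):
    return math.dist(p, CENTER) - RADIUS

def sdf_normal(p):
    """Gradient of the sphere SDF: unit vector pointing away from the center."""
    d = math.dist(p, CENTER)
    if d == 0.0:
        return (1.0, 0.0, 0.0)  # arbitrary direction at the singular center
    return tuple((a - c) / d for a, c in zip(p, CENTER))

def penetration_loss(points):
    """Sum of squared penetration depths of hand points inside the object."""
    return sum(min(sdf(p), 0.0) ** 2 for p in points)

def optimize_hand_translation(points, lr=0.5, iters=20):
    """Gradient descent on a rigid hand translation t to remove penetration."""
    t = [0.0, 0.0, 0.0]
    for _ in range(iters):
        moved = [tuple(a + b for a, b in zip(p, t)) for p in points]
        grad = [0.0, 0.0, 0.0]
        for p in moved:
            d = min(sdf(p), 0.0)
            if d < 0.0:
                n = sdf_normal(p)
                for k in range(3):
                    grad[k] += 2.0 * d * n[k]  # derivative of d^2 w.r.t. t
        t = [ti - lr * gi for ti, gi in zip(t, grad)]
    return t

# Hypothetical hand points (meters); two of them start inside the object.
hand = [(0.03, 0.0, 0.0), (0.06, 0.0, 0.0), (0.0, 0.04, 0.0)]
print("before:", penetration_loss(hand))
t = optimize_hand_translation(hand)
moved = [tuple(a + b for a, b in zip(p, t)) for p in hand]
print("translation:", t, "after:", penetration_loss(moved))
```

A full pipeline would also optimize object pose and hand articulation and add data terms tying both to the images; this sketch isolates only the penetration penalty.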

Yu Xiang • 1 year ago

Finally, the HO-Cap dataset provides segmentation masks and poses of hands and objects across the 64 collected videos, including first-person videos from the HoloLens.

Yu Xiang • 1 year ago

This annotation pipeline has limitations: 1) BundleSDF cannot reconstruct some objects well, 2) MediaPipe occasionally fails to detect hand joints, and 3) object pose estimation may fail when small objects are held within the hand. Failed videos are excluded from the dataset.

Yu Xiang • 1 year ago

This project is led by Jikai Wang from IRVL at UT Dallas, in collaboration with Yu-Wei Chao @yu_wei_chao and Bowen Wen @bowenwen_me at NVIDIA. A toolbox to use the dataset is available at

Yu Xiang • 1 year ago

Beyond using this dataset to study vision problems, another motivation is to use the data as human demonstrations for dexterous robot manipulation. The manipulation actions in HO-Cap are very difficult for current robots, posing a challenge for future robotic systems.

Related videos

Introducing SAM 3D, the newest addition to the SAM collection, bringing common sense 3D understanding of everyday images. SAM 3D includes two models: 🛋️ SAM 3D Objects for object and scene reconstruction 🧑‍🤝‍🧑 SAM 3D Body for human pose and shape estimation Both models achieve state-of-the-art performance transforming static 2D images into vivid, accurate reconstructions. 🔗 Learn more:

AI at Meta

857,258 views • 7 months ago

Introducing the Digital Twin Catalog: We believe it’s the world’s highest quality dataset for object reconstruction research. It consists of over 2,400 physical objects scanned with state-of-the-art hardware and hand-tuned to achieve sub-millimeter-level geometric accuracy and photorealism. Project Aria @Meta

Reality Labs at Meta

27,757 views • 1 year ago

🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose estimation vision models, LLM, and the professional refining works of multiple animation studios. HUMOTO features: ✅ Over 700 diverse daily activities ✅ Interactions with 60+ objects, 70+ articulated parts. ✅ Fine-grained text annotations ✅ Detailed hand and finger movements We hope HUMOTO will fuel your Humanoid AI research and drive new advancements! For research or commercial license inquiries, please contact yizho@adobe.com. Explore the dataset: 👉 HUMOTO Dataset Website Learn more: 👉 HUMOTO Project Page Jiaxin Lu Qixing Huang #AI #HumanObjectInteraction #HumanoidAI #MotionCapture #HUMOTO #AdobeResearch

Yi Zhou

71,548 views • 1 year ago

In their latest video, Boston Dynamics’s AI team explains how they make the Atlas humanoid perceive and interact with the world. Atlas uses an agile perception system to understand both the shape and context of objects in complex environments. Atlas combines 2D and 3D awareness, keypoint-based localization, and an object pose tracking system that fuses vision, kinematics, and object knowledge to handle occlusion and uncertainty. Accurate calibration ensures precise hand–eye coordination for reliable manipulation. The team is now working toward a unified model that merges perception and control – pushing beyond spatial AI toward physical intelligence.

The Humanoid Hub

49,655 views • 1 year ago

ViPE: Video Pose Engine for 3D Geometric Perception Contributions: • A robust and efficient framework, ViPE, for estimating camera parameters and dense depth from diverse, in-the-wild videos. • A system design that integrates the strengths of classical SLAM (efficiency, scalability) and learned models (robustness), with key improvements in efficiency, dynamic object handling, and depth quality over prior work. • A large-scale dataset of annotated videos, created using ViPE, to facilitate future research in 3D computer vision.

MrNeRF

42,534 views • 10 months ago

MAGS-SLAM: Monocular Multi-Agent Gaussian Splatting SLAM for Geometrically and Photometrically Consistent Reconstruction TL;DR: The first RGB-only multi-agent 3D Gaussian Splatting SLAM for collaborative photorealistic scene reconstruction. Contributions: (1) We propose the first monocular RGB-only multi-agent 3D Gaussian Splatting SLAM system. It integrates Gaussian front-ends, compact submap summaries, inter-agent verification, Sim(3) submap pose graph, and occupancy-aware fusion into a unified framework, achieving accurate tracking and photorealistic reconstruction without depth sensors. (2) We propose a Pose-Graph Bundle Adjustment (PGBA)-consistent Sim(3) loop closure mechanism for multi-agent systems, which jointly resolves intra- and inter-agent scale drift through a submap-level Sim(3) pose graph coupling geometric and photometric residuals. Robustness is ensured by a spatial-extent gate that rejects degenerate loops and an adaptive edge invalidation scheme consistent with evolving PGBA corrections. (3) We propose an occupancy-aware fusion framework for coherent multi-agent Gaussian maps. It combines occupancy-grid deduplication, decoupled coordinator, and joint pose-Gaussian photometric refinement to eliminate duplicated Gaussians, residual misalignment, and photometric seams across agents. (4) We introduce ReplicaMultiagent Plus dataset. While existing multi-agent datasets are typically limited to 2-3 agents with short trajectories, our dataset scales to 4 agents with long-horizon trajectories. In addition, we provide ground-truth geometry and semantic annotations, supporting the evaluation of monocular, RGB-D, and semantic multi-agent SLAM for collaborative dense reconstruction.

MrNeRF

19,223 views • 1 month ago

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense Point Cloud Reconstruction ✅ Point Tracking Project Page: Code & Weights:

Jianyuan

203,078 views • 1 year ago

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Contributions: • An incremental joint optimization approach for simultaneous camera pose and 3DGS reconstruction, reducing local minima and ensuring global consistency. • A robust pose estimation module leveraging learned 3D priors for accurate camera pose estimation. • An adaptive Octree Anchor Formation strategy that significantly reduces memory usage while preserving reconstruction quality.

MrNeRF

35,969 views • 10 months ago

This seemingly obvious prediction didn't take long to become reality. MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors Contributions: • The first real-time SLAM system using the two-view 3D reconstruction prior MASt3R [20] as a foundation. • Efficient techniques for pointmap matching, tracking and local fusion, graph construction and loop closure, and second-order global optimization. • A state-of-the-art dense SLAM system capable of handling generic, time-varying camera models. Abstract: We present a real-time monocular dense SLAM system, designed from the ground up using MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system remains robust on in-the-wild video sequences, making no assumptions on a fixed or parametric camera model beyond a unique camera center. Key features include: - Efficient methods for pointmap matching, camera tracking, and local fusion - Graph construction and loop closure - Second-order global optimization With known calibration, a simple modification achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.

MrNeRF

29,935 views • 1 year ago

Ropedia Xperience-10M is out on Hugging Face: a large-scale egocentric multimodal dataset of human experience for embodied AI, robotics, world models, and spatial intelligence. It contains 10 million experiences (interactions) and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations. Dataset:

AK

10,461 views • 3 months ago

Code and data are now online for CameraHMR, our state-of-the-art parametric 3D human pose and shape (HPS) estimation method that will appear at #3DV2025. There are 4 key contributions that make it so accurate and robust: 1. To get accurate 3D shape and pose as well as good alignment to image features, you need to know the focal length of the camera. To solve this, we train HumanFOV to compute the field of view. 2. We introduce CameraHMR, which integrates HumanFOV into HMR2.0 to exploit the estimated focal length. 3. To get accurate pseudo ground truth (pGT) training data, we compute the focal length for images in the 4DHumans dataset and modify SMPLify to take this into account. 4. But SMPLify only uses sparse 2D keypoints, which do not capture body shape. So we train a dense surface keypoint detector, DenseKP, on BEDLAM and run it on 4DHumans, resulting in improved body shape. The resulting method is CamSMPLify. We iterate between training CameraHMR and running CamSMPLify on the training set, initialized with CameraHMR. This results in much improved pGT for 4DHumans and a SOTA single-image HMR method.
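Item 1 rests on the pinhole relation between field of view and focal length, f = (W/2) / tan(FOV/2). A minimal sketch, assuming a horizontal FOV in degrees and a focal length in pixels; the post does not say how HumanFOV parameterizes its output, and the function names here are hypothetical.

```python
import math

def focal_from_fov(image_width_px, fov_deg):
    """Pinhole focal length (pixels) from a horizontal field of view (degrees)."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def fov_from_focal(image_width_px, focal_px):
    """Inverse: horizontal field of view (degrees) from a focal length (pixels)."""
    return math.degrees(2.0 * math.atan((image_width_px / 2.0) / focal_px))

f = focal_from_fov(1920, 90.0)
print(f)  # tan(45°) = 1, so this is 960 px up to floating-point rounding
print(fov_from_focal(1920, f))
```

Wider fields of view give shorter focal lengths, so the same body appears with stronger perspective effects; the inverse function recovers the FOV from a known focal length.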

Michael Black

21,647 views • 1 year ago

📢Introducing 360Anything, our method for lifting any perspective image or video to gravity-aligned 360° panoramas without using any camera or 3D information. This enables consistent novel view synthesis and 3D scene reconstruction. Project page: 🧵

Ziyi Wu

46,914 views • 4 months ago

A totally new level of pose control in ComfyUI - VNCCS. - full 3D Pose Studio for character posing & lighting, - multi-pose, body generator, pose gallery; - vision-guided QWEN Detailer + camera controls.

Wildminder

163,548 views • 4 months ago

Multi-modal #LLMs understand a lot about humans. But do they understand our 3D pose? We train #PoseGPT to estimate, generate, and reason about 3D human pose (#SMPL) in images and text. This is the first true foundation model for understanding 3D humans.

Michael Black

81,365 views • 2 years ago

NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI ViPE integrates bundle adjustment with dense optical flow, sparse keypoint tracking, and metric depth priors to estimate camera intrinsics, poses, and dense depth maps at 3–5 FPS on a single GPU. It significantly improves over prior uncalibrated pose estimation methods, achieving 18% and 50% error reduction on TUM and KITTI benchmarks, respectively, and shows robustness to dynamic scenes and diverse camera models. Beyond the method, the NVIDIA team also released a large-scale dataset comprising ~100K real-world internet videos, 1M AI-generated videos, and 2K panoramic videos (≈96M frames) annotated with metric depth and poses. This dataset and engine aim to accelerate training for spatial AI tasks such as 3D reconstruction, video generation, and robotics.... full analysis: paper: codes: NVIDIA NVIDIA AI NVIDIAnewsroom NVIDIA Robotics NVIDIA AIDev NVIDIAdeveloper

Marktechpost AI Dev News ⚡

217,453 views • 9 months ago

GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering Contributions: • We reformulate video stabilization as a novel 3D-grounded scheme of local reconstruction and rendering. This approach is naturally robust to diverse camera motions and scene dynamics, is temporally consistent, and is capable of full-frame stabilization. • We propose a novel test-time optimization for each unstable video. It leverages multi-view dynamics-aware photometric supervision and cross-frame regularization to achieve temporally consistent reconstructions. To avoid frame cropping, we introduce a scene extrapolation module based on video completion. • We provide a 3D-grounded dataset for our task by re-purposing an existing one, and introduce new metrics on sparse and dense reconstruction to evaluate 3D scene consistency. Extensive experiments (quantitative, qualitative, user study) versus image-based and gyro-based methods demonstrate the merits of our method.

MrNeRF

11,638 views • 11 months ago

Introducing ShapeR, a method for robust conditional 3D shape generation from casually captured sequences. ShapeR leverages a rectified flow transformer conditioned on per-object multimodal data to turn casual image sequences into full metric scene reconstructions. Project Page: Paper: Links to code and huggingface below ⬇️

Yawar Siddiqui

70,391 views • 4 months ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch on our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,525 views • 2 years ago

Fire and smoke VFX with GeoTracker for Blender! Here’s how: Blender’s built-in tracker for camera tracking, GeoTracker for precise 3D object tracking, and Blender for simulations — done fully in Blender. Get GeoTracker: #b3d

KeenTools

16,461 views • 1 year ago

"near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence [...] a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation"

Fabien Benetou

222,906 views • 3 years ago