Introducing HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction. We built a multi-camera system and a semi-automatic method for annotating the shape and pose of hands and objects. Project page:

57,228 views • 1 year ago • via X (Twitter)

8 comments

Yu Xiang • 1 year ago

The capture system consists of 8 Intel RealSense D455 cameras and 1 Microsoft Azure Kinect positioned above a table. All the cameras are calibrated. Users wear a Microsoft HoloLens AR headset during data collection
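Calibration is what lets observations from all nine cameras be expressed in one shared frame. As a minimal sketch of that idea, assuming each camera's calibration is stored as a 4x4 camera-to-world transform (the matrix values and the name `apply_extrinsic` below are hypothetical and do not come from the HO-Cap toolbox):

```python
def apply_extrinsic(T_world_cam, point_cam):
    """Map a 3D point from a camera frame into the shared world frame.

    T_world_cam is a 4x4 row-major homogeneous transform. Its layout is an
    assumption made for this sketch, not the dataset's actual file format.
    """
    x, y, z = point_cam
    homo = (x, y, z, 1.0)
    # Only the first three rows are needed; the last row is [0, 0, 0, 1].
    return tuple(
        sum(T_world_cam[r][c] * homo[c] for c in range(4)) for r in range(3)
    )


# Hypothetical extrinsic: camera rotated 90 degrees about Z and shifted
# 0.5 m along the world X axis.
T_WORLD_CAM = [
    [0.0, -1.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

point_world = apply_extrinsic(T_WORLD_CAM, (1.0, 0.0, 2.0))
print(point_world)
```

Applying each camera's own transform to its depth points would place them all in the same world frame, which is the precondition for multi-view annotation.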

Yu Xiang • 1 year ago

First, we use BundleSDF to reconstruct the textured meshes of objects. To prepare the input data, we manually move and rotate an object in front of the Azure Kinect camera, ensuring exhaustive coverage of the surfaces for high fidelity reconstruction.

Yu Xiang • 1 year ago

64 objects are reconstructed in the HO-Cap dataset

Yu Xiang • 1 year ago

In our semi-automatic annotation pipeline, we use FoundationPose for initial object pose estimation and MediaPipe for hand pose estimation, followed by a joint SDF-based optimization of the hand and object poses.
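To give a rough feel for SDF-based refinement, here is a toy sketch. It is not the HO-Cap pipeline: it assumes the object is a sphere with an analytic signed distance function, refines translation only, and uses plain gradient descent. The initial guess stands in for a coarse pose estimate; all names and values are hypothetical.

```python
import math


def refine_translation(points, radius, t_init, lr=0.5, steps=200):
    """Refine a sphere's center so observed surface points have SDF near 0.

    Minimizes the mean squared signed distance sdf(p - t)^2, where
    sdf(q) = |q| - radius, by gradient descent on the translation t.
    """
    t = list(t_init)
    n = len(points)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for p in points:
            q = [p[i] - t[i] for i in range(3)]
            norm = math.sqrt(sum(c * c for c in q))
            d = norm - radius  # signed distance of the point to the surface
            for i in range(3):
                # d(sdf)/dt = -q / |q|, so d(sdf^2)/dt = 2 * d * (-q / |q|)
                grad[i] += 2.0 * d * (-q[i] / norm) / n
        t = [t[i] - lr * grad[i] for i in range(3)]
    return tuple(t)


# Hypothetical data: six surface points of a 0.1 m sphere centered at TRUE_T.
TRUE_T = (0.3, 0.0, 0.2)
RADIUS = 0.1
OBSERVED = [
    tuple(TRUE_T[i] + (sign * RADIUS if i == axis else 0.0) for i in range(3))
    for axis in range(3)
    for sign in (1.0, -1.0)
]

# Coarse initial guess, standing in for an initial pose estimate.
refined = refine_translation(OBSERVED, RADIUS, t_init=(0.28, 0.02, 0.21))
print(refined)
```

A real pipeline would query the SDF of the reconstructed mesh, optimize full 6-DoF object poses together with hand parameters, and fuse depth from several views; the sketch only shows the core idea of driving observed surface points toward the zero level set.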

Yu Xiang • 1 year ago

Finally, the HO-Cap dataset provides segmentation masks and poses of hands and objects for the 64 collected videos, including first-person-view videos from the HoloLens.

Yu Xiang • 1 year ago

This annotation pipeline has limitations: 1) BundleSDF cannot reconstruct some objects well, 2) MediaPipe occasionally fails to detect hand joints, and 3) object pose estimation may fail when small objects are held within the hand. Failed videos are excluded from the dataset.

Yu Xiang • 1 year ago

This project is led by Jikai Wang from IRVL at UT Dallas, in collaboration with Yu-Wei Chao @yu_wei_chao and Bowen Wen @bowenwen_me at NVIDIA. A toolbox to use the dataset is available at

Yu Xiang • 1 year ago

In addition to using this dataset to study vision problems, another motivation is to use the data as human demonstrations for dexterous robot manipulation. The manipulation actions in HO-Cap are very difficult for current robots, which makes them a challenge for future robotic systems.
