At #CVPR2024: Tactile-augmented Radiance Fields! We probe a scene with a touch sensor and localize each sample within a NeRF. We use diffusion to estimate the tactile signals for the points we didn't touch. w/ Yiming Dou, Antonio Loquercio, Fengyu Yang, Yi Liu

38,611 views • 2 years ago • via X (Twitter)

Comments: 8

Andrew Owens, 2 years ago

One fun technical detail: we mount the touch sensor to an RGB-D camera using a selfie stick. Since "vision-based" touch sensors (like DIGIT, GelSight) are based on ordinary cameras, you can use multi-view geometry to estimate the relative pose between both sensors!
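The thread does not say how the two sensor poses are combined, so the following is only a minimal sketch under stated assumptions: both sensors are rigidly mounted, and a multi-view pipeline (for example, structure from motion) has already produced a 4x4 pose for each of them in a shared world frame. All names and pose values here are hypothetical. The relative pose is computed once and then reused to place a touch sample whenever only the camera pose is known.

```python
import numpy as np

def make_pose(rotation_z_deg, translation):
    """Build a 4x4 sensor-to-world pose from a rotation about z and a translation."""
    theta = np.deg2rad(rotation_z_deg)
    c, s = np.cos(theta), np.sin(theta)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose

def relative_pose(world_from_camera, world_from_touch):
    """Return camera_from_touch, which maps touch-sensor coordinates into the camera frame."""
    return np.linalg.inv(world_from_camera) @ world_from_touch

# Hypothetical calibration capture: both poses estimated in the same world frame.
world_from_camera = make_pose(30.0, [0.1, 0.2, 0.5])
world_from_touch = make_pose(45.0, [0.3, 0.2, 0.4])
camera_from_touch = relative_pose(world_from_camera, world_from_touch)

# Later capture: only the camera pose is known; the rigid mount gives the touch pose.
new_world_from_camera = make_pose(90.0, [1.0, 0.0, 0.5])
new_world_from_touch = new_world_from_camera @ camera_from_touch
```

Because the mount is assumed rigid, `camera_from_touch` stays constant across captures, which is what lets each touch sample be localized in the scene from the camera pose alone.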

Andrew Owens, 2 years ago

Here's what the capturing procedure looks like.

Andrew Owens, 2 years ago

Project page: Paper:

Andrew Owens, 2 years ago

The idea of filling in a touch signal using a generative model is similar to recent work by @ShaohongZhong et al.: which uses robotic proprioception and GANs for object-scale reconstructions.

Yiming Dou, 2 years ago

@antoniloq Thanks Andrew! This project wouldn’t be possible without you advising on every single detail!😀

Igor Gilitschenski, 2 years ago

@_YimingDou @antoniloq That is amazing work! Congrats everyone!

Andrew Owens, 2 years ago

@_YimingDou @antoniloq Thanks so much, Igor!

Mustafa, 2 years ago

@_YimingDou @antoniloq Let's chat. Unfortunately can't dm you

Related videos

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago

I was really impressed by the UMI gripper (Cheng Chi et al.), but a key limitation is that **force-related data wasn’t captured**: humans feel haptic feedback through the mechanical springs, but the robot couldn’t leverage that info, limiting the data’s value for fine-grained manipulation tasks.

Led by my amazing students Yolanda Zhu and Binghao Huang, we designed a **portable visuo-tactile gripper** by integrating our dense, flexible tactile arrays with the UMI gripper to enable large-scale in-the-wild data collection. 🔗

We demonstrate **cross-modal representation learning** and **downstream policy learning** on tasks requiring in-hand state estimation (e.g., test tube reorientation) and fine-grained force sensing (e.g., pipette fluid transfer).

Key takeaways:
- Our flexible tactile arrays store the rich haptic information humans perceive as dense tactile signals.
- Portability and robustness are key for in-the-wild data collection; our portable gripper is compact, lightweight, and durable.
- Touch provides precise, robust measurements of in-hand object pose, invariant to lighting and viewpoint.
- Cross-modal pretraining on large-scale in-the-wild data significantly improves policy robustness and sample efficiency (as shown many times before, and verified again here!).

Also check out our previous investigations of dense, flexible tactile grids for understanding human-robot-environment interactions:
- Dense tactile glove (Nature ’19):
- 3D-ViTac (CoRL ’24):

Yunzhu Li

13,188 views • 11 months ago