Humans grasp objects with a purpose! Web2Grasp enables such functional grasping for dexterous robot hands via hand-object reconstruction from web images - without *any* robot teleop data collection 1/n

28,408 views • 1 year ago • via X (Twitter)

4 Comments

Homanga @ CVPR • 1 year ago

To train a functional grasp prediction policy:
-> we create a dataset of robot-object grasps by re-targeting human hand-object interactions from web images, followed by object mesh refinement with a text-to-3D model
-> perform imitation learning on the resulting dataset! 2/n
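
A rough illustration of the two steps in this post, and not the Web2Grasp implementation: the sketch below retargets toy human fingertip offsets to a robot hand with a single scale factor, then fits a linear policy to the retargeted grasps by behavior cloning (plain gradient descent on squared error). Every name, feature, and number here (`retarget`, `behavior_cloning`, the 2-D object features, the 1.5 scale) is a hypothetical stand-in. The real pipeline reconstructs 3D hand-object meshes from web images, and the posts do not describe its policy architecture.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, Vector]  # (object features, robot grasp target)


def retarget(human_fingertips: Vector, scale: float) -> Vector:
    """Toy retargeting (assumption): scale wrist-relative fingertip offsets."""
    return [scale * p for p in human_fingertips]


def predict(weights: List[Vector], bias: Vector, obs: Vector) -> Vector:
    """Linear policy: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, obs)) + b
            for row, b in zip(weights, bias)]


def mean_squared_error(weights: List[Vector], bias: Vector,
                       data: List[Sample]) -> float:
    total, count = 0.0, 0
    for obs, target in data:
        pred = predict(weights, bias, obs)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(target)
    return total / count


def behavior_cloning(data: List[Sample], lr: float = 0.5,
                     epochs: int = 5000) -> Tuple[List[Vector], Vector]:
    """Fit the linear policy to demonstrations with batch gradient descent."""
    obs_dim, act_dim = len(data[0][0]), len(data[0][1])
    weights = [[0.0] * obs_dim for _ in range(act_dim)]
    bias = [0.0] * act_dim
    for _ in range(epochs):
        grad_w = [[0.0] * obs_dim for _ in range(act_dim)]
        grad_b = [0.0] * act_dim
        for obs, target in data:
            pred = predict(weights, bias, obs)
            for i in range(act_dim):
                err = (pred[i] - target[i]) / len(data)
                grad_b[i] += err
                for j in range(obs_dim):
                    grad_w[i][j] += err * obs[j]
        for i in range(act_dim):
            bias[i] -= lr * grad_b[i]
            for j in range(obs_dim):
                weights[i][j] -= lr * grad_w[i][j]
    return weights, bias


# Made-up "human demonstrations": (object features, human fingertip offsets).
human_demos = [
    ([0.2, 0.1], [0.04, 0.02]),
    ([0.4, 0.3], [0.08, 0.06]),
    ([0.6, 0.2], [0.12, 0.04]),
    ([0.8, 0.5], [0.16, 0.10]),
]
# Step 1: retarget each human grasp to the (hypothetical) robot hand.
robot_data = [(obs, retarget(tips, 1.5)) for obs, tips in human_demos]
# Step 2: imitation learning on the retargeted dataset.
policy_weights, policy_bias = behavior_cloning(robot_data)
```

Calling `predict(policy_weights, policy_bias, [0.5, 0.4])` then returns a grasp target for an object that was not in the toy dataset. The point of the sketch is only the data flow: no robot teleoperation appears anywhere, because the supervision comes entirely from the retargeted human grasps.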

Homanga @ CVPR • 1 year ago

Web2Grasp was led by amazing @CMU_Robotics collaborators @chen_hongyi_ and @YaoYunchao. Check out Hongyi's thread below and the website for more details n/n

AirFranz • 1 year ago

Hey Homanga! Great work. Just DM’ed you. Franz

Craig ⚔️ • 1 year ago

Is this 1994 tech?
