Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI): an open-source $400 system from Stanford University designed to democratize robot data collection. 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual).

438,741 views • 2 years ago • via X (Twitter)

11 comments

Cheng Chi • 2 years ago

With UMI, you can go to any home, any restaurant and start data collection within 2 minutes. With a diverse in-the-wild cup manipulation dataset, we can train a diffusion policy that generalizes to the top of a water fountain – clearly unseen environments and objects. 2/9

Cheng Chi • 2 years ago

UMI data is robot agnostic. Here we can deploy the same policy on both UR5e and Franka robots. In fact, you can deploy it on any robot with a parallel jaw stroke > 85mm. 3/9

Cheng Chi • 2 years ago

Enabled by our unique wrist-only camera configuration and camera-centric action representation, our robot systems are calibration-free (works even with base movement) and robust against distractors and lighting changes. 4/9
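A minimal sketch of why an action expressed relative to the gripper/camera frame does not depend on where the robot base sits. This is an illustration only, not the UMI code: it uses planar poses `(x, y, theta)` for brevity, and the helper names `compose`, `inverse`, and `relative_action` are hypothetical.

```python
import math

def compose(a, b):
    """Apply pose b (expressed in a's frame) on top of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,  # angles are not wrapped, to keep the sketch short
    )

def inverse(p):
    """Inverse of a planar rigid transform (x, y, theta)."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)

def relative_action(current, target):
    """Target pose expressed in the current gripper frame."""
    return compose(inverse(current), target)

# The same motion, seen from two different (unknown) base placements.
current, target = (1.0, 2.0, 0.5), (1.3, 2.1, 0.7)
base_shift = (5.0, -3.0, 1.2)  # hypothetical base movement

a = relative_action(current, target)
b = relative_action(compose(base_shift, current), compose(base_shift, target))
print(all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(a, b)))
```

Because the base transform cancels out, the policy never needs to know it, which is one plausible reading of "calibration-free" here.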

Cheng Chi • 2 years ago

Please check out our website for code, CAD models, tutorials and even more videos! 5/9

Cheng Chi • 2 years ago

Please also check out our epic fails compilation! We achieve a 70-90% success rate on most tasks, which still doesn’t hit the bar for commercial deployment. However, we think getting a larger in-the-wild dataset will get us a lot closer! 6/9

Cheng Chi • 2 years ago

This project would have been impossible without the hard work from co-authors: @Zhenjia_Xu @chuer_pan @eacousineau @Ben_Burchfiel Siyuan Feng @RussTedrake @SongShuran 7/9

Cheng Chi • 2 years ago

It was a blast working with @tonyzzhao and @zipengfu in the Stanford Robotics Center! 8/9

Cheng Chi • 2 years ago

technologies: GPMF, QR control, voice control, Media Mod, Max Lens … have been indispensable for this project. Shout out to @David_Newman, who personally responded to my questions about timecodes, which are critical for bimanual UMI. 9/9
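Timecodes presumably matter for bimanual use because the two grippers' recordings have to be aligned in time. A rough sketch of nearest-timestamp pairing between two streams follows. It assumes per-frame timestamps in seconds have already been extracted; `match_frames` and `max_gap` are hypothetical names, not part of the UMI pipeline or any GoPro tool.

```python
from bisect import bisect_left

def match_frames(left_ts, right_ts, max_gap):
    """Pair each left frame with the nearest right frame by timestamp.

    Both inputs are sorted lists of seconds. Pairs further apart than
    max_gap are dropped rather than forced together.
    """
    pairs = []
    for i, t in enumerate(left_ts):
        j = bisect_left(right_ts, t)
        # Only the neighbours on either side of t can be the nearest.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(right_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(right_ts[k] - t))
        if abs(right_ts[best] - t) <= max_gap:
            pairs.append((i, best))
    return pairs

left = [0.000, 0.033, 0.067, 0.100]
right = [0.010, 0.043, 0.077, 0.500]
print(match_frames(left, right, max_gap=0.02))
```

The last left frame has no right frame within 20 ms, so it is left unpaired instead of being matched to a stale frame.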

Advait • 2 years ago

@Stanford really cool! reminds me of this - will have to dive into the paper

Keerthana Gopalakrishnan • 2 years ago

@Stanford I love this, but do you think a wrist-cam-only viewpoint is enough?

Cheng Chi • 2 years ago

@Stanford I think wrist fisheye cams are sufficient for a surprisingly wide range of tasks. I do think there are tasks that could benefit from more views. For those cases, the UMI data pipeline supports an unlimited number of non-gripper GoPros (e.g. head-mounted).
