How can robots learn generalizable manipulation skills for diverse objects? Going beyond pick-and-place, our recent work “HACMan” enables complex interactions for unseen objects, such as flipping, pushing, or tilting, using spatial action maps + RL with point clouds. (w/ @MetaAI)

Wenxuan Zhou

3,112 subscribers

49,846 views • 3 years ago • via X (Twitter)

10 comments

Wenxuan Zhou • 3 years ago

We find that defining the right action space is crucial for learning a manipulation task. We explore an object-centric action representation in RL that consists of selecting a contact location on the object and a set of parameters describing the robot's movement after contact.
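As a rough illustration of this action representation (a sketch under assumptions, not the authors' code), an action can be stored as an index into the observed object point cloud plus a motion vector. The class name `ContactAction`, the choice of a 3D end-effector displacement as the motion parameters, and the helper `waypoints` are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContactAction:
    """Hypothetical object-centric action: where to touch, then how to move."""
    point_index: int      # index of the chosen contact point in the point cloud
    motion: np.ndarray    # (3,) assumed end-effector displacement after contact


def waypoints(points: np.ndarray, action: ContactAction):
    """Return the contact position and the position after the motion is applied."""
    contact = points[action.point_index]
    return contact, contact + action.motion
```

Because the contact is an index into the observed points, the action is always anchored to the object's visible surface rather than to free space.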

Wenxuan Zhou • 3 years ago

Our object-centric action representation has two benefits. It is… 1. Spatially-grounded: because the learned contact location is selected from the observed object points. 2. Temporally-abstracted: because we focus only on learning the contact-rich portions of the action.

Wenxuan Zhou • 3 years ago

With off-policy RL, given a point cloud, the actor outputs per-point motion parameters (Actor Map) while the critic outputs per-point Q-values (Critic Map). The Critic Map is not only used to update the actor but also serves as the scores for selecting the contact location.
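The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `select_action`, the optional softmax temperature, and the array shapes are assumptions, and in practice the two maps would come from trained actor and critic networks.

```python
import numpy as np


def select_action(points, actor_map, critic_map, temperature=0.0, rng=None):
    """Pick a contact point and its motion parameters from per-point maps.

    points:     (N, 3) observed object points
    actor_map:  (N, D) motion parameters proposed at each point
    critic_map: (N,)   Q-value of contacting each point with its proposed motion
    """
    if temperature <= 0.0:
        # Greedy choice: contact the point with the highest Q-value.
        idx = int(np.argmax(critic_map))
    else:
        # Assumed exploration scheme: sample from a softmax over the Q-values.
        if rng is None:
            rng = np.random.default_rng()
        logits = (critic_map - critic_map.max()) / temperature
        probs = np.exp(logits)
        probs /= probs.sum()
        idx = int(rng.choice(len(points), p=probs))
    return idx, points[idx], actor_map[idx]
```

The same per-point Q-values thus do double duty: they score candidate contact locations here, and they provide the learning signal for the actor's motion parameters during training.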

Wenxuan Zhou • 3 years ago

We evaluate our method on a 6D object pose alignment task with randomized initial poses, randomized 6D goals, and diverse unseen objects, both in simulation and in the real world.
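One generic way to measure how far an object is from a 6D goal pose is to report translation and rotation error separately. The sketch below uses the standard geodesic angle between two rotation matrices; it is an illustrative metric only, and the thread does not say which success criterion the paper uses. The names `pose_error` and `rot_z` are hypothetical.

```python
import numpy as np


def pose_error(R_cur, t_cur, R_goal, t_goal):
    """Return (translation error, rotation error in radians) between two poses."""
    trans_err = float(np.linalg.norm(t_goal - t_cur))
    # Relative rotation taking the current orientation to the goal orientation.
    R_rel = R_goal @ R_cur.T
    # Rotation angle from the trace; clip guards against round-off outside [-1, 1].
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err, float(np.arccos(cos_angle))


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For example, `pose_error(np.eye(3), np.zeros(3), rot_z(np.pi / 2), np.array([0.3, 0.4, 0.0]))` gives a translation error of 0.5 and a rotation error of about π/2.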

Wenxuan Zhou • 3 years ago

HACMan outperforms the baselines, with a larger margin for more challenging tasks. Success rates for simple tasks - pushing a single object to an in-plane goal - are high for all methods, but only HACMan achieves high success rates for 6D alignment of diverse objects.

Wenxuan Zhou • 3 years ago

Check out the paper and the website for more information and video results showing HACMan generalizing to different objects and goals! w/@bwww08, Fan Yang, @chris_j_paxton, @davheld

Brett Adcock • 3 years ago

@MetaAI Congrats, thanks for sharing.

Arnav Wadhwa • 3 years ago

@MetaAI Amazing work! I’m wondering about the challenges/improvements tradeoff when using a human-hand like end effector with 5 fingers. Curious to know what you think

Wenxuan Zhou • 3 years ago

@MetaAI Multi-fingered hands may allow a wider variety of motions and have more tolerance (picking an object with a multi-fingered hand can be less sensitive to object shapes than a simple gripper). However, they are more expensive, easier to break, and have a bigger sim2real gap.

Sasha Salter • 2 years ago

@MetaAI Great use of temporal abstraction to simplify learning!

Related videos

How to learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of dexterous hands, articulated objects, and complex motions.

Mandi Zhao

120,257 views • 1 year ago

🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? long-horizon execution, deformable object dexterity, and unseen object generalization — meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model! GR-3 is a generalizable Vision-Language-Action (VLA) model with strong capabilities in complex long-horizon tasks. It understands unseen abstract concepts, manipulates deformable objects robustly, and adapts to novel settings with minimal human data. ✨ Generalization: Generalizes well to unseen objects, environments, and even instructions with abstract concepts. ✨ Long-Horizon Manipulation: Completes long-horizon tasks with strong instruction-following capabilities. ✨ Deformable Object Manipulation: Manipulate deformable objects robustly. Project Page: Arxiv: #ByteDance #ByteDanceSeed #GR3 #VLA #Robotics #FoundationModels

Xiao Ma

46,260 views • 11 months ago

In just ~3 months, as a solo founder with no prior robotics experience, General Trajectory trained a foundation model for dexterous manipulation that lets humanoid robots pick up unseen objects and perform real-world work. It generalizes to novel objects and scenes, including cases where prior SoTA models achieve 0% success. Congrats on the launch Joshua!

Y Combinator

68,512 views • 5 months ago

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects abs: paper page:

AK

17,532 views • 3 years ago

A paper in Nature Communications describes a detachable robotic hand that can crawl and grab objects. The design enables tasks such as retrieving objects beyond normal reach and performing multi-object handling.

Nature Portfolio

28,358 views • 4 months ago

Where can linearized dynamics be trusted w/contact? MIT's "Contact Trust Region" answers this using manipulation theory, differentiable simulation & control. It enables manipulation w/less compute than RL, powering robots to be more energy-efficient:

MIT CSAIL

20,180 views • 2 months ago

Drop the camera, and bang -- the robots do the job without camera calibration or additional data collection. We want robots to manipulate anywhere! Our new work, Maniwhere, enables robots to manipulate objects from any camera view with any background. Isn't this the generalization every roboticist is looking for? How do we achieve this with sim2real? You can check this thread from Zhecheng Yuan ! And the website is here:

Huazhe Harry Xu

26,214 views • 1 year ago

Training a Whole-Body Control Foundation Model -- new work from my team at Agility Robotics A neural network for controlling our humanoid robots which is robust to disturbances, can handle heavy objects, and is a powerful platform for learning new whole-body skills learn more in our blog post ->

Chris Paxton

45,068 views • 9 months ago

Robots need to be able to apply pressure and make contact with objects as needed in order to accomplish their tasks. From compliance to working safely around humans to whole-body manipulation of heavy objects, combining force and position control can dramatically expand the capabilities of robots. This is especially true for legged robots, which have so much ability to exert forces on the world around them. But how do we train robots which can do this? Baoxiong Jia tells us more in our discussion of his team’s recent, Best Paper Award winning work on learning a unified policy for position and force control, called UniFP. To learn more, watch Episode #49 of RoboPapers, hosted by Michael Cho - Rbt/Acc and Chris Paxton.

RoboPapers

44,774 views • 6 months ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago

Current robot policies often face a tradeoff: they're either precise (but brittle) or generalizable (but imprecise). We present ViTaL, a framework that lets robots generalize precise, contact-rich manipulation skills across unseen environments with millimeter-level precision. 🧵

Siddhant Haldar

145,945 views • 11 months ago

We have seen a lot of legged robots doing navigation in the wild. But how about mobile manipulation in the wild? I have been pushing the direction of learning a unified, efficient, and dynamic 3D representation of scenes (for navigation) and objects (for manipulation) for the past two years. And now we have GeFF --- our large-scale, generalizable feature field, that combines the speed of a feed-forward neural network with the rich semantics from Foundation Models, to handle dynamically changing scenes, and enable open-ended, language-grounded scene and object understanding.

Xiaolong Wang

42,767 views • 2 years ago

Introduce HumanPlus - Autonomous Skills part Humanoids are born for using human data. Imitating humans, our humanoid learns: - fold sweatshirts - unload objects from warehouse racks - diverse locomotion skills (squatting, jumping, standing) - greet another robot Open-sourced!

Zipeng Fu

158,140 views • 2 years ago

It's been doing this for like two hours. RL whole-body mobile manipulation, objects bought on Monday, never seen before

Chris Paxton

43,535 views • 1 year ago

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

Siddhant Haldar

69,055 views • 1 year ago

Can we bring human-like Touch to robots🤖? Introducing our CoRL work on 3D-ViTac. Humans rely on both vision 👁️ and touch 🫳 for complex tasks. With combined visual-tactile sensing, robots can now tackle challenging tasks, like precise in-hand reorientation, fragile objects grasping. Website: #Robotics #CoRL2024 #Touch #tactile #AI #ML

Binghao Huang

49,500 views • 1 year ago

Introducing Omnigrasp: Grasping Diverse Objects with Simulated Humanoids. With Omnigrasp, we show that we can control a humanoid equipped with dexterous hands to grasp diverse objects (>1200) and follow diverse trajectories, with one policy! 🌐: 📜:

Zhengyi “Zen” Luo

92,104 views • 1 year ago

Five years ago, OpenAI trained a 5-fingered humanoid hand to solve a Rubik’s Cube using RL and domain randomization, pushing the boundaries of sim-to-real for fine-motor tasks. It underscored the value of creating sufficiently complex simulated worlds in which robots can learn.

The Humanoid Hub

66,761 views • 1 year ago

Work smarter with Clip Studio Paint's 3D models 💪 You can also use them as references for light sources, or to see how different objects bend and move. Have you tried using a 3D model in your artwork? Thanks for letting us share this, @Shei_babu!

CLIP STUDIO PAINT

15,656 views • 1 year ago

Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐 📰 🤗 👩‍💻

Manling Li

40,959 views • 11 months ago