How can robots learn generalizable manipulation skills for diverse objects? Going beyond pick-and-place, our recent work “HACMan” enables complex interactions with unseen objects, such as flipping, pushing, or tilting, using spatial action maps + RL with point clouds. (w/ @MetaAI)

Wenxuan Zhou

3,112 subscribers

49,846 views • 3 years ago • via X (Twitter)

Science & Technology

10 comments

Wenxuan Zhou • 3 years ago

We find that defining the right action space is crucial for learning a manipulation task. We explore an object-centric action representation in RL that consists of selecting a contact location on the object and a set of parameters describing the robot's movement after contact.

Wenxuan Zhou • 3 years ago

Our object-centric action representation has two benefits. It is:
1. Spatially grounded, because the learned contact location is selected from the observed object points.
2. Temporally abstracted, because we focus only on learning the contact-rich portions of the action.
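
To make the two properties concrete, here is a minimal Python sketch, assuming NumPy. The names `ObjectCentricAction`, `contact_location`, and `expand_to_waypoints` are hypothetical, and treating the motion parameters as a 3D post-contact displacement with a fixed +z approach offset is an illustrative assumption, not HACMan's actual motion primitive. Spatial grounding appears as indexing into the observed points; temporal abstraction appears as one RL action expanding into several low-level waypoints.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObjectCentricAction:
    """Hypothetical hybrid action: a discrete contact choice plus continuous motion parameters."""
    contact_index: int         # index into the observed object point cloud (discrete part)
    motion_params: np.ndarray  # assumed here: 3D end-effector displacement after contact


def contact_location(object_points: np.ndarray, action: ObjectCentricAction) -> np.ndarray:
    """Spatial grounding: the contact location is always one of the observed object points."""
    return object_points[action.contact_index]


def expand_to_waypoints(object_points: np.ndarray, action: ObjectCentricAction,
                        pre_contact_offset: float = 0.05) -> np.ndarray:
    """Temporal abstraction: one RL action expands into pre-contact, contact, post-contact waypoints.

    The +z approach direction and the offset value are illustrative assumptions.
    """
    contact = contact_location(object_points, action)
    pre_contact = contact + np.array([0.0, 0.0, pre_contact_offset])
    post_contact = contact + action.motion_params
    return np.stack([pre_contact, contact, post_contact])


if __name__ == "__main__":
    points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.02], [0.0, 0.1, 0.04]])
    action = ObjectCentricAction(contact_index=1, motion_params=np.array([0.0, 0.05, 0.0]))
    print(expand_to_waypoints(points, action))
```

Under this framing, the policy only decides where to touch and how to move afterwards; the approach motion is handled by a fixed, non-learned part of the primitive.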

Wenxuan Zhou • 3 years ago

With off-policy RL, given a point cloud, the actor outputs per-point motion parameters (Actor Map) while the critic outputs per-point Q-values (Critic Map). The Critic Map is used not only to update the actor but also as the score for selecting the contact location.
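
The Actor Map / Critic Map selection step can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions: random linear layers (`w_actor`, `w_critic`) stand in for the point-cloud network, and the softmax-with-temperature exploration rule is a hypothetical choice. Only the structure (per-point motion parameters, per-point Q-values, contact chosen by critic score) comes from the thread.

```python
import numpy as np


def actor_map(point_features, w_actor):
    """Actor Map: per-point motion parameters squashed to [-1, 1], shape (N, action_dim)."""
    return np.tanh(point_features @ w_actor)


def critic_map(point_features, motion_params, w_critic):
    """Critic Map: one Q-value per point, Q(s, contact=i, params_i), shape (N,)."""
    joint = np.concatenate([point_features, motion_params], axis=1)
    return joint @ w_critic


def select_action(point_features, w_actor, w_critic, temperature=None, rng=None):
    """Pick a contact point using the Critic Map as scores; return its index, params, and all Q-values."""
    params = actor_map(point_features, w_actor)
    q = critic_map(point_features, params, w_critic)
    if temperature is None:
        idx = int(np.argmax(q))  # greedy: highest-scoring contact point
    else:
        rng = np.random.default_rng() if rng is None else rng
        logits = (q - q.max()) / temperature  # subtract max for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        idx = int(rng.choice(len(q), p=probs))  # exploration: sample from softmax over Q-values
    return idx, params[idx], q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_points, feat_dim, action_dim = 5, 4, 3
    features = rng.normal(size=(n_points, feat_dim))
    w_actor = rng.normal(size=(feat_dim, action_dim))
    w_critic = rng.normal(size=(feat_dim + action_dim,))
    idx, params, q = select_action(features, w_actor, w_critic)
    print(idx, params, q)
```

In an off-policy actor-critic setup (for example DDPG/TD3-style training), the same per-point Q-values would also provide the learning signal for the actor; that gradient update is omitted here.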

Wenxuan Zhou • 3 years ago

We evaluate our method on a 6D object pose alignment task with randomized initial poses, randomized 6D goals, and diverse unseen objects, both in simulation and in the real world.
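
One way to score a 6D pose alignment outcome is to compare the object's final pose with the goal pose in both translation and rotation. The sketch below assumes poses given as a rotation matrix plus a translation vector; the function names and the 3 cm / 15° success thresholds are hypothetical and are not taken from the HACMan paper.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_error(R_cur, t_cur, R_goal, t_goal):
    """Return (translation error, rotation error in radians) between current and goal poses."""
    trans_err = float(np.linalg.norm(np.asarray(t_cur) - np.asarray(t_goal)))
    R_rel = R_goal.T @ R_cur  # relative rotation between the two orientations
    # Angle of a rotation matrix: cos(angle) = (trace(R) - 1) / 2; clip guards against round-off.
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err, float(np.arccos(cos_angle))


def is_aligned(R_cur, t_cur, R_goal, t_goal, max_trans=0.03, max_rot=np.deg2rad(15.0)):
    """Hypothetical success test: both errors must be within their (assumed) thresholds."""
    trans_err, rot_err = pose_error(R_cur, t_cur, R_goal, t_goal)
    return bool(trans_err <= max_trans and rot_err <= max_rot)


if __name__ == "__main__":
    zero = np.zeros(3)
    print(pose_error(rot_z(np.pi / 2), zero, rot_z(0.0), zero))
    print(is_aligned(rot_z(0.1), [0.01, 0.0, 0.0], rot_z(0.0), zero))
```

Because goals are full 6D poses, a success check of this kind fails if either the position or the orientation is off, which is what makes it harder than in-plane pushing.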

Wenxuan Zhou • 3 years ago

HACMan outperforms the baselines, with a larger margin on more challenging tasks. Success rates on simple tasks (pushing a single object to an in-plane goal) are high for all methods, but only HACMan achieves high success rates for 6D alignment of diverse objects.

Wenxuan Zhou • 3 years ago

Check out the paper and the website for more information and video results showing HACMan generalizing to different objects and goals! w/ @bwww08, Fan Yang, @chris_j_paxton, @davheld

Brett Adcock • 3 years ago

@MetaAI Congrats, thanks for sharing.

Arnav Wadhwa • 3 years ago

@MetaAI Amazing work! I’m wondering about the tradeoff between challenges and improvements when using a human-hand-like end effector with 5 fingers. Curious to know what you think.

Wenxuan Zhou • 3 years ago

@MetaAI Multi-fingered hands may allow a wider variety of motions and have more tolerance (picking an object with a multi-fingered hand can be less sensitive to object shapes than a simple gripper). However, they are more expensive, easier to break, and have a bigger sim2real gap.

Sasha Salter • 2 years ago

@MetaAI Great use of temporal abstraction to simplify learning!

Related videos

How to learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of dexterous hands, articulated objects, and complex motions.

Mandi Zhao

120,188 views • 1 year ago

🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? Long-horizon execution, deformable object dexterity, and unseen object generalization. Meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model! GR-3 is a generalizable VLA model with strong capabilities in complex long-horizon tasks. It understands unseen abstract concepts, manipulates deformable objects robustly, and adapts to novel settings with minimal human data. ✨ Generalization: Generalizes well to unseen objects, environments, and even instructions with abstract concepts. ✨ Long-Horizon Manipulation: Completes long-horizon tasks with strong instruction-following capabilities. ✨ Deformable Object Manipulation: Manipulates deformable objects robustly. Project Page: Arxiv: #ByteDance #ByteDanceSeed #GR3 #VLA #Robotics #FoundationModels

Xiao Ma

46,260 views • 10 months ago

In just ~3 months, as a solo founder with no prior robotics experience, General Trajectory trained a foundation model for dexterous manipulation that lets humanoid robots pick up unseen objects and perform real-world work. It generalizes to novel objects and scenes, including cases where prior SoTA models achieve 0% success. Congrats on the launch Joshua!

Y Combinator

68,505 views • 5 months ago

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects abs: paper page:

AK

17,532 views • 3 years ago

A paper in Nature Communications describes a detachable robotic hand that can crawl and grab objects. The design enables tasks such as retrieving objects beyond normal reach and performing multi-object handling.

Nature Portfolio

28,358 views • 4 months ago

Where can linearized dynamics be trusted w/contact? MIT's "Contact Trust Region" answers this using manipulation theory, differentiable simulation & control. It enables manipulation w/less compute than RL, powering robots to be more energy-efficient:

MIT CSAIL

20,180 views • 2 months ago

Training a Whole-Body Control Foundation Model -- new work from my team at Agility Robotics. A neural network for controlling our humanoid robots that is robust to disturbances, can handle heavy objects, and is a powerful platform for learning new whole-body skills. Learn more in our blog post ->

Chris Paxton

45,068 views • 9 months ago

Drop the camera, and bang -- the robots do the job without camera calibration or additional data collection. We want robots to manipulate anywhere! Our new work, Maniwhere, enables robots to manipulate objects from any camera view with any background. Isn't this the generalization every roboticist is looking for? How do we achieve this with sim2real? You can check this thread from Zhecheng Yuan! And the website is here:

Huazhe Harry Xu

26,214 views • 1 year ago

Robots need to be able to apply pressure and make contact with objects as needed in order to accomplish their tasks. From compliance to working safely around humans to whole-body manipulation of heavy objects, combining force and position control can dramatically expand the capabilities of robots. This is especially true for legged robots, which have so much ability to exert forces on the world around them. But how do we train robots which can do this? Baoxiong Jia tells us more in our discussion of his team’s recent, Best Paper Award winning work on learning a unified policy for position and force control, called UniFP. To learn more, watch Episode #49 of RoboPapers, hosted by Michael Cho - Rbt/Acc and Chris Paxton.

RoboPapers

44,774 views • 6 months ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago

Current robot policies often face a tradeoff: they're either precise (but brittle) or generalizable (but imprecise). We present ViTaL, a framework that lets robots generalize precise, contact-rich manipulation skills across unseen environments with millimeter-level precision. 🧵

Siddhant Haldar

145,945 views • 11 months ago

We have seen a lot of legged robots doing navigation in the wild. But how about mobile manipulation in the wild? I have been pushing the direction of learning a unified, efficient, and dynamic 3D representation of scenes (for navigation) and objects (for manipulation) for the past two years. And now we have GeFF --- our large-scale, generalizable feature field that combines the speed of a feed-forward neural network with the rich semantics from Foundation Models to handle dynamically changing scenes and enable open-ended, language-grounded scene and object understanding.

Xiaolong Wang

42,767 views • 2 years ago

Introducing HumanPlus (Autonomous Skills part). Humanoids are born to use human data. By imitating humans, our humanoid learns to fold sweatshirts, unload objects from warehouse racks, perform diverse locomotion skills (squatting, jumping, standing), and greet another robot. Open-sourced!

Zipeng Fu

158,140 views • 2 years ago

It's been doing this for like two hours. RL whole-body mobile manipulation, objects bought on Monday, never seen before.

Chris Paxton

43,535 views • 1 year ago

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

Siddhant Haldar

69,031 views • 1 year ago

Can we bring human-like touch to robots🤖? Introducing our CoRL work on 3D-ViTac. Humans rely on both vision 👁️ and touch 🫳 for complex tasks. With combined visual-tactile sensing, robots can now tackle challenging tasks such as precise in-hand reorientation and fragile object grasping. Website: #Robotics #CoRL2024 #Touch #tactile #AI #ML

Binghao Huang

49,500 views • 1 year ago

Introducing Omnigrasp: Grasping Diverse Objects with Simulated Humanoids. With Omnigrasp, we show that we can control a humanoid equipped with dexterous hands to grasp diverse objects (>1200) and follow diverse trajectories, with one policy! 🌐: 📜:

Zhengyi “Zen” Luo

92,104 views • 1 year ago

Five years ago, OpenAI trained a 5-fingered humanoid hand to solve a Rubik’s Cube using RL and domain randomization, pushing the boundaries of sim-to-real for fine-motor tasks. It underscored the value of creating sufficiently complex simulated worlds in which robots can learn.

The Humanoid Hub

66,761 views • 1 year ago

Work smarter with Clip Studio Paint's 3D models 💪 You can also use them as references for light sources, or to see how different objects bend and move. Have you tried using a 3D model in your artwork? Thanks for letting us share this, @Shei_babu!

CLIP STUDIO PAINT

15,656 views • 1 year ago

Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐 📰 🤗 👩‍💻

Manling Li

40,959 views • 11 months ago