How can robots learn generalizable manipulation skills for diverse objects? Going beyond pick-and-place, our recent work “HACMan” enables complex interactions for unseen objects, such as flipping, pushing, or tilting, using spatial action maps + RL with point clouds. (w/ @MetaAI)

Wenxuan Zhou

3,112 subscribers

49,846 views • 3 years ago • via X (Twitter)

Science & Technology

Comments: 10

Wenxuan Zhou • 3 years ago

We find that defining the right action space is crucial for learning a manipulation task. We explore an object-centric action representation in RL that consists of selecting a contact location on the object and a set of parameters describing the robot's movement after contact.
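As a rough illustration of the action representation described above, the sketch below encodes an action as an index into the observed object points plus a small motion vector. The class name `ContactAction`, the 3-D motion vector, and the `max_step` scaling are all hypothetical choices made for this example, not details taken from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContactAction:
    """Hypothetical object-centric action: where to touch, then how to move."""
    point_index: int    # index of the chosen contact point in the object point cloud
    motion: np.ndarray  # assumed 3-vector in [-1, 1] describing the post-contact motion


def decode_action(object_points, action, max_step=0.05):
    """Map an action to (contact location, end-effector displacement in metres).

    max_step is an assumed scale factor, not a value from the source.
    """
    contact = object_points[action.point_index]
    displacement = np.clip(action.motion, -1.0, 1.0) * max_step
    return contact, displacement


# Three toy object points (x, y, z); the third one is selected as the contact.
points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.1, 0.1, 0.05]])
contact, delta = decode_action(points, ContactAction(2, np.array([1.0, 0.0, -2.0])))
```

Because the contact location is an index into the observed points, the action can only ever refer to a place that is actually on the object.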

Wenxuan Zhou • 3 years ago

Our object-centric action representation has two benefits. It is… 1. Spatially-grounded: because the learned contact location is selected from the observed object points. 2. Temporally-abstracted: because we focus only on learning the contact-rich portions of the action.

Wenxuan Zhou • 3 years ago

With off-policy RL, given a point cloud, the actor outputs per-point motion parameters (Actor Map) while the critic outputs per-point Q-values (Critic Map). The Critic Map is not only used to update the actor but also serves as the scores for selecting the contact location.
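A minimal sketch of how the two maps could combine at action-selection time: the Critic Map scores every point, one point is chosen, and the Actor Map row at that point supplies the motion parameters. The function name `select_contact` is hypothetical, and the softmax sampling branch with a `temperature` is an assumed exploration variant; the thread only says the Q-values serve as scores.

```python
import numpy as np


def select_contact(actor_map, critic_map, temperature=None, rng=None):
    """Pick a contact point from per-point Q-values and return its motion parameters.

    actor_map:  (N, D) per-point motion parameters (Actor Map)
    critic_map: (N,)   per-point Q-values (Critic Map)
    temperature: None for greedy argmax; a positive float for softmax sampling
                 (the sampling variant is an assumption of this sketch)
    """
    if temperature is None:
        idx = int(np.argmax(critic_map))
    else:
        if rng is None:
            rng = np.random.default_rng()
        logits = (critic_map - critic_map.max()) / temperature  # shift for stability
        probs = np.exp(logits)
        probs /= probs.sum()
        idx = int(rng.choice(len(critic_map), p=probs))
    return idx, actor_map[idx]


# Toy maps for a 3-point cloud: the second point has the highest Q-value.
actor_map = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]])
critic_map = np.array([0.2, 0.9, 0.5])
idx, motion = select_contact(actor_map, critic_map)
sampled_idx, _ = select_contact(actor_map, critic_map, temperature=0.1,
                                rng=np.random.default_rng(0))
```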

Wenxuan Zhou • 3 years ago

We evaluate our method with a 6D object pose alignment task with randomized initial poses, randomized 6D goals, and diverse unseen objects in both simulation and in the real world.
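The thread does not say how 6D alignment is scored. One common way to quantify it, shown here purely as an assumption, is to report the translation distance and the geodesic rotation angle between the current and goal poses, each given as a 4x4 homogeneous transform. The helper names `pose_error` and `rot_z` are hypothetical.

```python
import numpy as np


def pose_error(T_current, T_goal):
    """Return (translation error, rotation error in radians) between two 4x4 poses."""
    t_err = float(np.linalg.norm(T_current[:3, 3] - T_goal[:3, 3]))
    # Relative rotation; its angle follows from the trace identity tr(R) = 1 + 2 cos(angle).
    R_rel = T_current[:3, :3].T @ T_goal[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, float(np.arccos(cos_angle))


def rot_z(theta):
    """Rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Current pose: rotated 90 degrees about z and offset by (3 cm, 4 cm, 0) from the goal.
T_goal = np.eye(4)
T_current = np.eye(4)
T_current[:3, :3] = rot_z(np.pi / 2)
T_current[:3, 3] = [0.03, 0.04, 0.0]
t_err, r_err = pose_error(T_current, T_goal)
```

A success check would then compare both errors against thresholds; any such thresholds would be a further assumption.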

Wenxuan Zhou • 3 years ago

HACMan outperforms the baselines, with a larger margin for more challenging tasks. Success rates for simple tasks - pushing a single object to an in-plane goal - are high for all methods, but only HACMan achieves high success rates for 6D alignment of diverse objects.

Wenxuan Zhou • 3 years ago

Check out the paper and the website for more information and video results showing HACMan generalizing to different objects and goals! w/@bwww08, Fan Yang, @chris_j_paxton, @davheld

Brett Adcock • 3 years ago

@MetaAI Congrats, thanks for sharing.

Arnav Wadhwa • 3 years ago

@MetaAI Amazing work! I’m wondering about the challenges/improvements tradeoff when using a human-hand like end effector with 5 fingers. Curious to know what you think

Wenxuan Zhou • 3 years ago

@MetaAI Multi-fingered hands may allow a wider variety of motions and have more tolerance (picking an object with a multi-fingered hand can be less sensitive to object shapes than a simple gripper). However, they are more expensive, easier to break, and have a bigger sim2real gap.

Sasha Salter • 2 years ago

@MetaAI Great use of temporal abstraction to simplify learning!

Related videos

How to learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of dexterous hands, articulated objects, and complex motions.

Mandi Zhao

120,226 views • 1 year ago

🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? long-horizon execution, deformable object dexterity, and unseen object generalization — meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model! GR-3 is a generalizable Vision-Language-Action (VLA) model with strong capabilities in complex long-horizon tasks. It understands unseen abstract concepts, manipulates deformable objects robustly, and adapts to novel settings with minimal human data. ✨ Generalization: Generalizes well to unseen objects, environments, and even instructions with abstract concepts. ✨ Long-Horizon Manipulation: Completes long-horizon tasks with strong instruction-following capabilities. ✨ Deformable Object Manipulation: Manipulate deformable objects robustly. Project Page: Arxiv: #ByteDance #ByteDanceSeed #GR3 #VLA #Robotics #FoundationModels

Xiao Ma

46,260 views • 11 months ago

In just ~3 months, as a solo founder with no prior robotics experience, General Trajectory trained a foundation model for dexterous manipulation that lets humanoid robots pick up unseen objects and perform real-world work. It generalizes to novel objects and scenes, including cases where prior SoTA models achieve 0% success. Congrats on the launch Joshua!

Y Combinator

68,512 views • 5 months ago

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects abs: paper page:

AK

17,532 views • 3 years ago

A paper in Nature Communications describes a detachable robotic hand that can crawl and grab objects. The design enables tasks such as retrieving objects beyond normal reach and performing multi-object handling.

Nature Portfolio

28,358 views • 4 months ago

Where can linearized dynamics be trusted w/contact? MIT's "Contact Trust Region" answers this using manipulation theory, differentiable simulation & control. It enables manipulation w/less compute than RL, powering robots to be more energy-efficient:

MIT CSAIL

20,180 views • 2 months ago

Drop the camera, and bang -- the robots do the job without camera calibration or additional data collection. We want robots to manipulate anywhere! Our new work, Maniwhere, enables robots to manipulate objects from any camera view with any background. Isn't this the generalization every roboticist is looking for? How do we achieve this with sim2real? You can check this thread from Zhecheng Yuan! And the website is here:

Huazhe Harry Xu

26,214 views • 1 year ago

Training a Whole-Body Control Foundation Model -- new work from my team at Agility Robotics A neural network for controlling our humanoid robots which is robust to disturbances, can handle heavy objects, and is a powerful platform for learning new whole-body skills learn more in our blog post ->

Chris Paxton

45,068 views • 9 months ago

Robots need to be able to apply pressure and make contact with objects as needed in order to accomplish their tasks. From compliance to working safely around humans to whole-body manipulation of heavy objects, combining force and position control can dramatically expand the capabilities of robots. This is especially true for legged robots, which have so much ability to exert forces on the world around them. But how do we train robots which can do this? Baoxiong Jia tells us more in our discussion of his team’s recent, Best Paper Award winning work on learning a unified policy for position and force control, called UniFP. To learn more, watch Episode #49 of RoboPapers, hosted by Michael Cho - Rbt/Acc and Chris Paxton.

RoboPapers

44,774 views • 6 months ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago

Current robot policies often face a tradeoff: they're either precise (but brittle) or generalizable (but imprecise). We present ViTaL, a framework that lets robots generalize precise, contact-rich manipulation skills across unseen environments with millimeter-level precision. 🧵

Siddhant Haldar

145,945 views • 11 months ago

We have seen a lot of legged robots doing navigation in the wild. But how about mobile manipulation in the wild? I have been pushing the direction of learning a unified, efficient, and dynamic 3D representation of scenes (for navigation) and objects (for manipulation) for the past two years. And now we have GeFF --- our large-scale, generalizable feature field, that combines the speed of a feed-forward neural network with the rich semantics from Foundation Models, to handle dynamically changing scenes, and enable open-ended, language-grounded scene and object understanding.

Xiaolong Wang

42,767 views • 2 years ago

Introduce HumanPlus - Autonomous Skills part Humanoids are born for using human data. Imitating humans, our humanoid learns: - fold sweatshirts - unload objects from warehouse racks - diverse locomotion skills (squatting, jumping, standing) - greet another robot Open-sourced!

Zipeng Fu

158,140 views • 2 years ago

It's been doing this for like two hours. RL whole-body mobile manipulation, objects bought on Monday, never seen before

Chris Paxton

43,535 views • 1 year ago

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

Siddhant Haldar

69,031 views • 1 year ago

Can we bring human-like Touch to robots🤖? Introducing our CoRL work on 3D-ViTac. Humans rely on both vision 👁️ and touch 🫳 for complex tasks. With combined visual-tactile sensing, robots can now tackle challenging tasks, like precise in-hand reorientation, fragile objects grasping. Website: #Robotics #CoRL2024 #Touch #tactile #AI #ML

Binghao Huang

49,500 views • 1 year ago

Introducing Omnigrasp: Grasping Diverse Objects with Simulated Humanoids. With Omnigrasp, we show that we can control a humanoid equipped with dexterous hands to grasp diverse objects (>1200) and follow diverse trajectories, with one policy! 🌐: 📜:

Zhengyi “Zen” Luo

92,104 views • 1 year ago

Five years ago, OpenAI trained a 5-fingered humanoid hand to solve a Rubik’s Cube using RL and domain randomization, pushing the boundaries of sim-to-real for fine-motor tasks. It underscored the value of creating sufficiently complex simulated worlds in which robots can learn.

The Humanoid Hub

66,761 views • 1 year ago

Work smarter with Clip Studio Paint's 3D models 💪 You can also use them as references for light sources, or to see how different objects bend and move. Have you tried using a 3D model in your artwork? Thanks for letting us share this, @Shei_babu!

CLIP STUDIO PAINT

15,656 views • 1 year ago

Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐 📰 🤗 👩‍💻

Manling Li

40,959 views • 11 months ago