Learning H-Infinity Locomotion Control

Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to

AK

514,714 subscribers

38,641 views • 2 years ago • via X (Twitter)

10 comments

AK • 2 years ago

improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced

AK • 2 years ago

disturber and ensure their optimization with an $H_{\infty}$ constraint. In contrast to the actor that maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward

AK • 2 years ago

and its oracle, i.e., the "cost", in each iteration. To keep the joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint bounds the ratio of the cost to the intensity of the external forces. Through reciprocal interaction

AK • 2 years ago

throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree

AK • 2 years ago

A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it were a bipedal robot. The simulated quantitative results show improvement over baselines, demonstrating the effectiveness of the method and each design choice. On the

AK • 2 years ago

other hand, real-robot experiments qualitatively exhibit how robust the policy is when subjected to various disturbances on various terrains, including stairs, high platforms,

AK • 2 years ago

slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.
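The constraint described in the abstract above, a bound on the ratio of the cost to the intensity of the external forces, can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' implementation: the per-step cost is taken as the absolute gap between the oracle reward and the task reward, the intensity as the summed squared force norm, and the names `cost_to_intensity_ratio`, `satisfies_bound`, and the bound `eta` are hypothetical.

```python
from typing import Sequence


def disturbance_intensity(forces: Sequence[Sequence[float]]) -> float:
    """Summed squared Euclidean norm of the per-step external forces (assumed measure)."""
    return sum(sum(c * c for c in force) for force in forces)


def cost_to_intensity_ratio(
    task_rewards: Sequence[float],
    oracle_rewards: Sequence[float],
    forces: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    """Ratio of accumulated cost to disturbance intensity over one rollout."""
    # Assumption: cost is the absolute gap between the oracle and the task reward.
    total_cost = sum(abs(o - r) for r, o in zip(task_rewards, oracle_rewards))
    # eps avoids division by zero when no force was applied.
    return total_cost / (disturbance_intensity(forces) + eps)


def satisfies_bound(ratio: float, eta: float) -> bool:
    """True when the ratio stays within the hypothetical bound eta."""
    return ratio <= eta


# Two-step rollout with 3D pushes applied to the robot base.
rewards = [0.8, 0.5]
oracle = [1.0, 1.0]
pushes = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]

ratio = cost_to_intensity_ratio(rewards, oracle, pushes)
within_loose_bound = satisfies_bound(ratio, eta=0.02)
within_tight_bound = satisfies_bound(ratio, eta=0.01)
```

Under this reading, a disturber that causes a large drop in reward with weak forces drives the ratio up, and the bound limits how far that can go during joint training.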

AK • 2 years ago

paper page:

AK • 2 years ago

daily papers:

UserInterface • 2 years ago

Unveiling the Future of Prompt Engineering for Better AI Interactions #tech

Related videos

We are excited to share our #CORL2024 paper (oral) on "Learning Quadruped Locomotion Using Differentiable Simulation" done in collaboration with Sangbae Kim Massachusetts Institute of Technology (MIT). We present a new way to learn to walk in minutes without parallelization, outperforming PPO in sample efficiency! PDF: Video: We present a new framework for learning quadruped locomotion. By leveraging differentiable simulation for policy optimization, our approach achieves fast convergence and stable training, significantly outperforming model-free #ReinforcementLearning methods like PPO in sample efficiency. The key enabler is to combine a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. This work highlights one of the first successful real-world applications of differentiable simulation for quadruped robots, offering a compelling alternative to traditional RL methods. Kudos to Yunlong Song! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) Massachusetts Institute of Technology (MIT)MechE

Davide Scaramuzza

15,533 views • 1 year ago
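The key idea in the post above, running a non-differentiable simulator for the forward pass while taking gradients from a simplified surrogate model, can be shown on a toy problem. This is a minimal sketch, not the paper's method or code: the 1D dynamics, the 10% actuator loss, and the names `simulate`, `surrogate_sensitivity`, and `optimize` are all assumptions made for illustration.

```python
def simulate(u: float, steps: int = 10, dt: float = 0.1) -> float:
    """Stand-in for a high-fidelity simulator, treated as a black box."""
    x = 0.0
    for _ in range(steps):
        # Assumed unmodeled effect: the actuator delivers only 90% of the command.
        x += dt * 0.9 * u
    return x


def surrogate_sensitivity(steps: int = 10, dt: float = 0.1) -> float:
    """d x_T / d u under the simplified surrogate model x_{t+1} = x_t + dt * u."""
    return steps * dt


def optimize(target: float, lr: float = 0.1, iters: int = 200) -> float:
    """Gradient descent on a constant command u to reach the target position."""
    u = 0.0
    for _ in range(iters):
        # Forward pass: the final state comes from the black-box simulator.
        x_final = simulate(u)
        # Backward pass: chain rule for the loss (x_T - target)^2,
        # using the surrogate's sensitivity instead of the simulator's.
        grad = 2.0 * (x_final - target) * surrogate_sensitivity()
        u -= lr * grad
    return u


u_star = optimize(target=1.0)
final_x = simulate(u_star)
```

Even though the surrogate gradient is slightly wrong, the loss is always evaluated on the simulator's state, so in this toy case the command still converges to one that reaches the target.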

Quadrupeds are fast. Agile. Great at locomotion. But can they manipulate? A new approach from Carnegie Mellon University, Google DeepMind, and Bosch is teaching quadrupedal robots to do more than walk, they’re learning to interact. It’s called Human2LocoMan: a system that uses human data to pretrain robot policies before finetuning on real hardware. The result? A four-legged robot that can walk, carry, organize, scoop, and sort with both single and dual-arm control. By pretraining on human motion, they cut the amount of robot data in half—while improving success rates by over 80% in unfamiliar environments. Their Modularized Cross-Embodiment Transformer (MXT) learns from both human and robot demonstrations, then generalizes those skills to physical tasks—no hardcoded behaviors required. It’s locomotion and manipulation. A quadruped that can walk and clean up after itself?

Lukas Ziegler

60,546 views • 1 year ago

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training, discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach that utilizes pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares what goes on behind the scenes in humanoid robot demos, his take on reinforcement learning in simulation versus the real world, the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today.

🗒️ For the full list of resources for this episode, visit the show notes page:

📖 CHAPTERS
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla

The TWIML AI Podcast

22,592 views • 6 months ago

See Spot perform dynamic whole-body manipulation. Using a combination of reinforcement learning (RL) and sampling-based control, the robot is able to autonomously drag, roll, and stack tires weighing 15 kg (33 lb), well above its maximum arm lift capacity. Learn more about coordinating locomotion and manipulation processes:

RAI Institute

87,583 views • 9 months ago

Excited to share our latest work: “Bio-Inspired Plastic Neural Networks for Zero-Shot Out-of-Distribution Generalization in Complex Animal-Inspired Robots” 🪲🦎 We show that Hebbian learning outperforms LSTM-based adaptation for real-world transfer. It even works without domain randomization! It can handle: ✅ Uneven terrain ✅ Morphological damage ✅ Sim-to-real gaps

Sebastian Risi

14,390 views • 1 year ago

New mobility for robots! 🛹 The Collinear Mecanum Drive (CMD) is a novel robot locomotion system that combines omnidirectional motion with dynamic balancing. The robot in the video was developed at The University of Sheffield by Matthew Watson. By using three or more collinear Mecanum wheels, the CMD enables robots to move in any direction while maintaining a narrow ground footprint, limited only by wheel diameter. ⚙️ This allows robots to navigate through tighter spaces compared to traditional omnidirectional systems by moving directly along the axis of wheel rotation. Additionally, the CMD’s dynamic balancing capability permits taller robot designs without needing a wider base to prevent tipping during movement or external disturbances. ~~~ ♻️ RT to help 1 robot find a new workplace.

Lukas Ziegler

32,471 views • 1 year ago

The quadruped you can assemble yourself. Stella is an educational quadruped robot developed by Ahead. Designed for hands-on learning and self-assembly, the robot uses durable 3D-printed components. Ideal for all education levels, it is easy to repair, infinitely extendable, and equipped with advanced hardware and software. Despite its compact design, Stella is versatile, capable of operating in various environments and even carrying payloads like an optional robotic arm. A powerful tool for exploring robotics.

Circuit

23,215 views • 1 year ago

Multi legged robots built for tough terrain! 🐛 Unlike wheeled or long-legged machines that stall in cluttered fields, their multi legged design provides stable, low-profile locomotion through confined and unstructured environments. The robots from Ground Control Robotics move close to the ground, giving direct access to crops while carrying precision sensors, from hyperspectral cameras to soil moisture and VOC (Volatile Organic Compound) detectors, without disturbing fragile gases. Applications range from agriculture to defense, search & rescue, and pest control, but the team is first targeting the multi-billion dollar specialty agriculture market, where traditional robots fail to operate effectively. Developed in collaboration with Georgia Institute of Technology, the platform is positioned as a versatile sensor and mobility system for environments that defeat conventional designs.

Lukas Ziegler

44,288 views • 11 months ago

Legged Locomotion… meets Skateboarding [Paper ⬇️] Most robot movement models either rely on fixed patterns or struggle to handle complex changes. DHAL (Discrete-time Hybrid Automata Learning) takes a different approach: using reinforcement learning to teach robots when and how to switch movements in real-time: ✅ Learns when to switch between different motions without pre-labeled data ✅ Handles complex, high-dimensional movements like a quadrupedal robot on a skateboard ✅ Uses a multi-critic architecture to improve contact-based motion control ✅ Works in both simulation and real-world environments with strong results It proves that robots can learn movement transitions on their own, without predefined rules. Paper: Thanks to Hang Liu for bringing this to my attention!

Ilir Aliu - eu/acc

41,894 views • 1 year ago

Robots are evolving from basic tools into a real-world workforce. Meet Kengo, the latest embodied AI bipedal robot platform from Galaxea. Driven by an advanced locomotion cerebellum and embodied AI brain, Kengo boasts elite mobility, high adaptability, and the power to evolve in real-world scenarios. The future of next-gen productivity is here.

Galaxea Dynamics

19,364 views • 2 months ago

🚨🇺🇸FIGURE ROBOT LEARNS TO WALK IN HOURS Turns out you can teach a robot to walk—if you’ve got a few years to burn in a physics simulator (or a few hours of GPU time). Figure’s team trained a neural network using reinforcement learning to mimic natural human locomotion, and the wild part is it worked on the first try with real hardware. No tuning. No fiddling. Just plug and strut. Thanks to sim-to-real transfer and some sneaky domain randomization, the robot didn’t just wobble out of the lab—it walked like it had somewhere to be. Next up: running. Not terrifying at all. Source: Figure

Mario Nawfal

117,866 views • 1 year ago

This robodog aims to remove humans from hazardous work environments The Robot Teleoperativo 2 project out of the Dynamic Legged Systems lab has developed an advanced teleoperation system designed to operate in high-risk environments, minimizing dangers to human workers. By integrating cutting-edge technologies in tele-locomotion, tele-manipulation, and remote human-robot interaction, the system enables precise, dual-arm control for complex tasks in hazardous conditions.

Circuit

22,827 views • 1 year ago

This research from ETH Zurich examines advancements in quadruped robots for industrial settings, specifically addressing their ability to climb ladders, a previously overlooked but essential capability. While legged robots excel over wheeled robots on rough terrain, their inability to scale ladders has limited their effectiveness in inspecting dangerous areas, impacting both safety and efficiency. Credit: ETH Zurich #engineering #technology #robotics #robots

Wevolver

10,615 views • 1 year ago

Multi legged robots built for tough terrain! 🐛 Unlike wheeled or long-legged machines that stall in cluttered fields, their multi legged design provides stable, low-profile locomotion through confined and unstructured environments. The robots from Ground Control Robotics move close to the ground, giving direct access to crops while carrying precision sensors, from hyperspectral cameras to soil moisture and VOC (Volatile Organic Compound) detectors, without disturbing fragile gases. Applications range from agriculture to defense, search & rescue, and pest control, but the team is first targeting the multi-billion dollar specialty agriculture market, where traditional robots fail to operate effectively. Developed in collaboration with Georgia Institute of Technology. ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

10,965 views • 6 months ago

The SOTA in humanoid locomotion is already insanely good. The control stack is almost fully leveraging what the hardware can do. For real-world use, active safety still remains unsolved: common-sense intelligence to move safely around people, and graceful handling of possible hardware failures in the wild.

The Humanoid Hub

57,576 views • 3 months ago

Robots Using Their Feet to Manipulate Objects ETH Zurich researchers have advanced "pedipulation," where legged robots use their feet for manipulation instead of robotic arms. Their new approach trains robots to navigate and manipulate objects while avoiding static and dynamic obstacles. Tested on the ANYmal quadruped, the method enables precise foot control in complex, unseen environments after minimal training in simulation. Will this research open up new possibilities for efficient, obstacle-aware robot manipulation?

Circuit

11,802 views • 1 year ago

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision paper page: We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation.

AK

49,920 views • 2 years ago

Over the last few months, we’ve been thinking about how to learn from “off-domain” data - data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

Abhishek Gupta

11,994 views • 1 year ago

Robots need to be able to apply pressure and make contact with objects as needed in order to accomplish their tasks. From compliance to working safely around humans to whole-body manipulation of heavy objects, combining force and position control can dramatically expand the capabilities of robots. This is especially true for legged robots, which have so much ability to exert forces on the world around them. But how do we train robots which can do this? Baoxiong Jia tells us more in our discussion of his team’s recent, Best Paper Award winning work on learning a unified policy for position and force control, called UniFP. To learn more, watch Episode #49 of RoboPapers, hosted by Michael Cho - Rbt/Acc and Chris Paxton.

RoboPapers

44,803 views • 8 months ago

Experiments in progress. The one on the right has been learning for ~3 hours, the one in the middle for ~1 hour, and the one on the left just started a few minutes ago. The initial motivation for making the physical Atari was just to commit ourselves to a subset of algorithms that can make progress in this setup. This commitment rules out algorithms that require billions of samples to learn (or worse, require multiple environments running in parallel). Atari games are simple enough that we should be able to show learning on them in a short amount of time with no prior knowledge. Since then, I've realized that this setup is also a good way to compare different paradigms in robotics in a principled way. These paradigms are sim2real, learning from tele-operated data, and learning directly on the robots. So far, I have observed that getting sim2real to work reliably is hard. It requires tweaks that don't scale. Policies that can play perfectly in simulation fall apart because of latencies and the messiness of the real world. These aspects could be modeled to improve the simulation, but not without sinking significant human engineering hours. I have higher hopes for learning from tele-operated data, but that requires a human to learn the task first. These experiments are on my to-do list. I have to learn to play some of the games well through the robot. I’m half-decent at playing Pong and Ms Pacman now. Learning directly on robots is looking like the most promising approach. This approach takes away pesky distribution shifts and makes it possible to have algorithms that continually improve with more data and time without any human intervention. It feels great to let experiments run overnight and wake up to find improved policies. With learning on robots, I should, in principle, be able to go on a long vacation and come back to find better policies for complex tasks beyond Atari games. Whether that is possible with current learning algorithms is a different question.

Khurram Javed

52,110 views • 8 months ago