
Foundation models are enough to solve robotics! Unfortunately, that is not true. We keep hearing that Vision-Language-Action (VLA) models struggle because of the gap between static training and the dynamic real world. A German startup, Sereact, has just released an approach aimed at bridging that gap. They are introducing a new paradigm called Interactive RL Policy Patching: a distributed framework that lets robots learn from human corrections without full retraining. When a robot fails, a human operator provides a brief "patch", or demonstration. The system then uses online off-policy reinforcement learning to update the behavior right away. This is powered by a massive foundation model trained on hundreds of millions of interactions from over 100 deployed robot stations. The best part is the distributed parameter synchronization: when one robot learns a fix, the update is published fleet-wide, so the entire swarm gets smarter from a single human intervention. They are already demonstrating this on complex manipulation tasks like shoe unboxing and screw sorting, drastically reducing the data needed to handle edge cases. Real-world environments are unforgiving, and I love seeing systems that can actually adapt on the fly! 📍 More info:

Ilir Aliu

50,022 subscribers

18,210 views • 5 months ago • via X (Twitter)
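The fleet-wide synchronization described in the Sereact post can be pictured as a versioned parameter store that every station polls. The sketch below is a minimal, single-process illustration under our own assumptions: `ParameterStore`, `RobotStation`, `sync_fleet`, and the version-counter scheme are hypothetical names, not Sereact's actual API, and the "patch" is reduced to adding a parameter delta. A real deployment would add networking, conflict handling, and rollback.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParameterStore:
    """Hypothetical central store holding the latest fleet-wide policy parameters."""
    version: int = 0
    params: Dict[str, float] = field(default_factory=dict)

    def publish(self, new_params: Dict[str, float]) -> int:
        # Every accepted patch bumps the version so stations can detect stale copies.
        self.version += 1
        self.params = dict(new_params)
        return self.version


@dataclass
class RobotStation:
    """Hypothetical robot station holding a local copy of the policy parameters."""
    name: str
    version: int = 0
    params: Dict[str, float] = field(default_factory=dict)

    def sync(self, store: ParameterStore) -> bool:
        # Pull only when the store holds a newer version than the local copy.
        if store.version > self.version:
            self.params = dict(store.params)
            self.version = store.version
            return True
        return False

    def apply_local_patch(self, store: ParameterStore, delta: Dict[str, float]) -> None:
        # `delta` stands in for the parameter change produced by an off-policy RL
        # update on a human correction; here it is simply added to the weights.
        self.sync(store)  # start from the newest fleet parameters
        keys = set(self.params) | set(delta)
        patched = {k: self.params.get(k, 0.0) + delta.get(k, 0.0) for k in keys}
        self.version = store.publish(patched)
        self.params = patched


def sync_fleet(store: ParameterStore, fleet: List[RobotStation]) -> int:
    """Return how many stations picked up a newer parameter version."""
    return sum(station.sync(store) for station in fleet)


# One human correction on station-0 propagates to the other two stations.
store = ParameterStore(params={"w": 1.0})
fleet = [RobotStation(f"station-{i}", params={"w": 1.0}) for i in range(3)]
fleet[0].apply_local_patch(store, {"w": 0.5})
updated = sync_fleet(store, fleet)
```

The version counter keeps polling cheap: stations that already hold the newest parameters skip the copy, which is why only the two untouched stations report an update.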


Similar Videos

Qwen-VLA feels like one of the first real robotics foundation models. A single system trained across robot manipulation, navigation, egocentric human video, simulation, and vision-language reasoning instead of isolated robot policies.

Robots Digest 🤖

14,601 views • 1 month ago

Happy to share what I’ve been working on since joining Genesis! GENE-26.5 is a one-of-a-kind, robotics-native multimodal foundation model that learns from diverse, in-the-wild data across modalities and outputs actions enabling a 54-DoF robot system to perform the most dexterous, long-horizon manipulation tasks to date—approaching human-level capability. This is the result of innovations across the full stack—data collection and processing, robot systems, model architecture, training strategies, and scalable evaluation infrastructure.

Zu Wang

18,402 views • 1 month ago

Robots are getting smarter, but most still fail the same way: they don’t learn from their own mistakes. A new paper proposes something different: a way for robots to self-improve directly from their failures in the real world. It’s called PLD (Probe, Learn, Distill). The idea: instead of collecting endless human demos, let the robot figure out where it fails, learn how to recover, and then distill that knowledge back into its main model. Key takeaways from the research: ✅ Uses residual reinforcement learning to recover from policy failures ✅ Achieves 99% success on LIBERO and 100% on real Franka and YAM arms ✅ Runs hour-long manipulation tasks without human resets ✅ Builds a feedback loop between real-world data and model adaptation Unlike supervised fine-tuning, which relies on human demonstrations, PLD learns from the robot’s own experience. By training on its own failure distribution, the model becomes both more efficient and more aligned with the real world. It’s not just a technical shift; it’s a step toward robots that improve themselves through real-world practice. Thanks for sharing, Wenli Xiao! Paper and demos: --- Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

22,812 views • 7 months ago
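The residual RL step mentioned in the PLD summary composes two policies: a frozen base policy proposes an action, and a small learned policy adds a correction on top. A minimal sketch of that composition, assuming a continuous action vector, a fixed residual scale, and symmetric action limits; `residual_action`, `base_policy`, `residual_policy`, and the default values are illustrative stand-ins, not the paper's code.

```python
from typing import Callable, List, Sequence

Action = List[float]
Policy = Callable[[Sequence[float]], Action]


def residual_action(
    obs: Sequence[float],
    base_policy: Policy,
    residual_policy: Policy,
    residual_scale: float = 0.1,
    action_limit: float = 1.0,
) -> Action:
    """Compose a frozen base action with a scaled, learned correction.

    Only residual_policy would be trained by RL; base_policy stays fixed.
    Each dimension is clipped to [-action_limit, action_limit].
    """
    base = base_policy(obs)
    correction = residual_policy(obs)
    if len(base) != len(correction):
        raise ValueError("base and residual actions must have the same size")
    return [
        max(-action_limit, min(action_limit, b + residual_scale * c))
        for b, c in zip(base, correction)
    ]


# Toy stand-ins for the two policies (hypothetical, for illustration only).
def base_policy(obs: Sequence[float]) -> Action:
    return [0.5, -0.95]


def residual_policy(obs: Sequence[float]) -> Action:
    return [1.0, -1.0]


action = residual_action([0.0], base_policy, residual_policy)
```

Keeping the residual small and bounded is the usual design choice here: the base policy's competent behavior is preserved, and RL only has to learn local corrections near the failure states.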

In order for robots to be deployed in the real world, performing tasks of real value, they must be reliable. Unfortunately, most robotic demos work maybe 70-80% of the time at best. The way to get better reliability is real-world reinforcement learning: having the robot teach itself to perform the task up to a high success rate. The key is to start with a core of expert human data, use it to train a policy, then iteratively improve that policy, finishing with on-policy reinforcement learning. Kun Lei talks through a unified framework for imitation and reinforcement learning based on PPO, which enables this improvement process. In this episode, Kun Lei explains the theory behind his reinforcement learning method and how it allowed his robot to run in a shopping mall juicing oranges for seven hours at a time, alongside experiments on a wide variety of tasks and embodiments. Watch episode 58 of RoboPapers now, hosted by Michael Cho - Rbt/Acc and Chris Paxton!

RoboPapers

18,813 views • 5 months ago
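One common way to unify imitation and RL in a single PPO objective, which we assume here and which may differ from Kun Lei's exact formulation, is to add a behavior-cloning term to PPO's clipped surrogate loss and anneal its weight toward zero as training shifts to on-policy RL. In the sketch below, `combined_loss`, `bc_weight`, and the linear schedule are hypothetical; per-sample probability ratios, advantages, and expert log-likelihoods are plain lists to keep it dependency-free.

```python
from typing import Sequence


def ppo_clip_loss(ratios: Sequence[float], advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate, averaged over samples (to be minimized)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        total += min(r * a, clipped_r * a)
    return -total / len(ratios)


def bc_loss(expert_log_probs: Sequence[float]) -> float:
    """Negative mean log-likelihood of expert actions under the current policy."""
    return -sum(expert_log_probs) / len(expert_log_probs)


def combined_loss(ratios: Sequence[float], advantages: Sequence[float],
                  expert_log_probs: Sequence[float], bc_weight: float,
                  clip_eps: float = 0.2) -> float:
    # A bc_weight near 1.0 keeps the policy close to the demonstrations early on;
    # annealing it to 0.0 leaves pure PPO (an assumed schedule, not the paper's).
    return ppo_clip_loss(ratios, advantages, clip_eps) + bc_weight * bc_loss(expert_log_probs)


def linear_bc_schedule(step: int, total_steps: int) -> float:
    """Hypothetical linear annealing of the imitation weight from 1.0 to 0.0."""
    return max(0.0, 1.0 - step / total_steps)
```

The clipping caps how far a single update can move the policy when the advantage is positive, and the `min` keeps the pessimistic estimate when it is negative, which is what makes fine-tuning a demo-initialized policy on real hardware comparatively stable.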

Quadrupeds are fast. Agile. Great at locomotion. But can they manipulate? A new approach from Carnegie Mellon University, Google DeepMind, and Bosch is teaching quadrupedal robots to do more than walk: they’re learning to interact. It’s called Human2LocoMan: a system that uses human data to pretrain robot policies before finetuning on real hardware. The result? A four-legged robot that can walk, carry, organize, scoop, and sort with both single- and dual-arm control. By pretraining on human motion, they cut the amount of robot data in half while improving success rates by over 80% in unfamiliar environments. Their Modularized Cross-Embodiment Transformer (MXT) learns from both human and robot demonstrations, then generalizes those skills to physical tasks, with no hardcoded behaviors required. It’s locomotion and manipulation. A quadruped that can walk and clean up after itself?

Lukas Ziegler

60,546 views • 1 year ago

Over the last few months, we’ve been thinking about how to learn from “off-domain” data: data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

Abhishek Gupta

11,994 views • 1 year ago

Let's reverse engineer Disney's adorable, lifelike robot! I couldn't find a whitepaper, but this is how I think it's trained: 1. The emotional behaviors are curated by Disney animation artists, keyframe by keyframe. But the animation cannot be "rendered" directly on the robot because it doesn't account for complex real-world physics. 2. Reinforcement learning (RL) is a great tool for training low-level robot controllers. RL needs a reward function to optimize, and it's typically a task reward (e.g. walk in a straight line as fast as possible). The problem is that RL doesn't know what counts as "natural behavior", and often produces weird-looking body postures that somehow still maximize the reward. This is a human alignment problem, just like with ChatGPT. 3. Enter Adversarial Motion Prior (AMP): a technique that learns the human preference by training a classifier on what we consider "emotional & cute". In GAN literature, this is called a discriminator. Disney artists are good at creating such a dataset. You can then add AMP as an auxiliary reward in simulation to nudge the robot towards desired behaviors. AMP was developed by Peng et al. 2021 and Escontrela et al. 2022. 4. Add lots of data augmentation to make the controller robust to physical disturbances. In RL, it's called "domain randomization". This is a very powerful technique that bridges the gap between simulator and reality. Previously, OpenAI used domain randomization to train a 5-finger robot hand to manipulate a Rubik's Cube: An IEEE news article gave hints about the pipeline: Finally, praying for world peace 🙏. I hope robots like this will bring more joy to the world.

Jim Fan

314,611 views • 2 years ago
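Steps 3 and 4 of the Disney post can be made concrete. In AMP as described by Peng et al. (2021), a least-squares discriminator D(s, s′) is trained to output +1 on reference-motion transitions and −1 on policy transitions, and its score becomes a style reward r_S = max(0, 1 − 0.25·(D − 1)²) that is mixed with the task reward. The sketch below shows that reward shaping plus a per-episode domain-randomization sampler. The weights `w_task`/`w_style`, the `PhysicsParams` fields, and their ranges are illustrative assumptions, and nothing here reflects Disney's undisclosed pipeline.

```python
import random
from dataclasses import dataclass


def amp_style_reward(disc_score: float) -> float:
    """AMP style reward from a least-squares discriminator score.

    A score of +1 (looks like the reference motion) gives reward 1.0;
    scores far from +1 are clamped to 0.0.
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def total_reward(task_reward: float, disc_score: float,
                 w_task: float = 0.5, w_style: float = 0.5) -> float:
    # Task reward (e.g. forward velocity) plus the learned "natural motion" prior.
    return w_task * task_reward + w_style * amp_style_reward(disc_score)


@dataclass
class PhysicsParams:
    friction: float
    mass_scale: float
    motor_latency_s: float


def sample_randomized_physics(rng: random.Random) -> PhysicsParams:
    """Domain randomization: draw fresh simulator parameters per episode.

    The ranges below are placeholders, not values from any published robot.
    """
    return PhysicsParams(
        friction=rng.uniform(0.4, 1.2),
        mass_scale=rng.uniform(0.8, 1.2),
        motor_latency_s=rng.uniform(0.0, 0.03),
    )


rng = random.Random(0)
episode_physics = [sample_randomized_physics(rng) for _ in range(3)]
```

The two pieces address different gaps: the style reward steers the policy toward motions that resemble the artists' reference clips, while randomized physics keeps that policy from overfitting to one idealized simulator.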

NVIDIA just announced EgoScale 🤖🧠 NVIDIA Research has uncovered a log-linear scaling law for robot dexterity by pretraining VLA models on over 20,000 hours of egocentric human video. This massive dataset is 20 times larger than previous efforts and proves that robot intelligence follows a predictable path: the more human data, the lower the loss. The secret is a simple recipe combining large-scale human pretraining with a small amount of aligned human-robot mid-training to bridge the gap. In testing, this method boosted the average success rate by 54% on a 22-DoF robotic hand compared to policies built without pretraining. EgoScale also enables one-shot task adaptation and works across different hardware, suggesting that human motion is a universal motor prior for robots. Website: Paper: Source: NVIDIA Research #Robot #Humanoid #Robotics #AI #EmbodiedAI #PhysicalAI #NVIDIA #EgoScale #GR00T

RoboHub🤖

43,752 views • 4 months ago
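A log-linear scaling law means validation loss falls roughly linearly in the logarithm of the amount of pretraining data, L(D) ≈ a + b·ln D with b < 0. Given (hours, loss) pairs, the coefficients can be recovered with ordinary least squares on ln D. The sketch below assumes exactly that functional form. The data points in the usage example are synthetic, generated from known coefficients, and are not EgoScale's results.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(hours: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of loss ≈ a + b * ln(hours); returns (a, b).

    A negative b means loss decreases as pretraining data grows.
    """
    if len(hours) != len(losses) or len(hours) < 2:
        raise ValueError("need at least two (hours, loss) pairs")
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_loss(a: float, b: float, hours: float) -> float:
    return a + b * math.log(hours)


# Synthetic points generated from loss = 2.0 - 0.1 * ln(hours) (illustrative only).
hours = [1_000, 5_000, 10_000, 20_000]
losses = [2.0 - 0.1 * math.log(h) for h in hours]
a, b = fit_log_linear(hours, losses)
```

Because the fit is linear in ln D, each doubling of data buys the same fixed drop in loss (|b|·ln 2), which is what makes such a law useful for predicting the payoff of collecting more human video.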

A new robot policy just cleaned up a kitchen it had never seen before [watch what happens. paper included ⬇️] Pi-0.5 builds on top of Pi-0 and shows how smart co-training with diverse data can unlock real generalization in the home. It doesn’t just learn from one setup but adapts to many, including homes it’s never seen. What it does ✅ Handles new homes without training in them ✅ Follows complex language instructions ✅ Cleans, places dishes, handles spills ✅ Matches in-home training models using cross-embodiment and web data Robots that understand tasks and adapt to new spaces are finally within reach. More in the blog: Read the paper ⬇️ Physical Intelligence, co-founded by UC Berkeley professor Sergey Levine, is a robotics startup developing general-purpose AI foundation models that enable robots to perform a wide variety of real-world tasks with human-like adaptability, recently raising $400 million to advance this vision.

Ilir Aliu - eu/acc

18,120 views • 1 year ago
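Co-training on heterogeneous data largely comes down to how each training batch is drawn from the different sources. A minimal sketch of weighted source sampling, assuming fixed mixture weights. The source names echo the Pi-0.5 post (cross-embodiment robot data, web data, in-home data), but the weights and the `sample_batch_sources` helper are hypothetical, not Physical Intelligence's recipe.

```python
import random
from collections import Counter
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def sample_batch_sources(weights: Dict[str, float], batch_size: int,
                         rng: random.Random) -> List[str]:
    """Pick which data source each example in a batch is drawn from."""
    probs = normalize(weights)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=batch_size)


# Hypothetical mixture: the weights are placeholders for illustration.
mixture = {"cross_embodiment_robot": 0.4, "web_vision_language": 0.3, "in_home_mobile": 0.3}
rng = random.Random(42)
counts = Counter(sample_batch_sources(mixture, batch_size=10_000, rng=rng))
```

Sampling sources per example, rather than concatenating datasets, keeps the mixture ratio fixed regardless of how large each dataset is, so a small in-home dataset is not drowned out by web data.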

I don’t know if we live in a Matrix, but I know for sure that robots will spend most of their lives in simulation. Let machines train machines. I’m excited to introduce DexMimicGen, a massive-scale synthetic data generator that enables a humanoid robot to learn complex skills from only a handful of human demonstrations. Yes, as few as 5! DexMimicGen addresses the biggest pain point in robotics: where do we get data? Unlike with LLMs, where vast amounts of text are readily available, you cannot simply download motor control signals from the internet. So researchers teleoperate the robots to collect motion data via XR headsets. They have to repeat the same skill over and over and over again, because neural nets are data hungry. This is a very slow and uncomfortable process. At NVIDIA, we believe the majority of high-quality tokens for robot foundation models will come from simulation. What DexMimicGen does is trade GPU compute time for human time. It takes one motion trajectory from a human and multiplies it into 1000s of new trajectories. A robot brain trained on this augmented dataset will generalize far better in the real world. Think of DexMimicGen as a learning signal amplifier. It maps a small dataset to a large (de facto infinite) dataset, using physics simulation in the loop. In this way, we free humans from babysitting the bots all day. The future of robot data is generative. The future of the entire robot learning pipeline will also be generative. 🧵

Jim Fan

165,246 views • 1 year ago
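The core trick behind MimicGen-style generators, which DexMimicGen builds on, is object-centric retargeting: a source end-effector trajectory is expressed relative to the object it manipulates, then replayed relative to a newly sampled object pose. The sketch below shows that transform in 2-D (x, y, heading) to stay short. Real systems work with full 6-DoF poses and, as the post notes, run each candidate in physics simulation and keep only the successful ones; that filter is omitted here. The helper names and the sampling ranges are hypothetical.

```python
import math
import random
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, heading in radians)


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Return a ∘ b: pose b, given in a's frame, mapped to the world frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(p: Pose2D) -> Pose2D:
    """Inverse rigid transform, so that compose(inverse(p), p) is the identity."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-c * x - s * y, s * x - c * y, -t)


def retarget_trajectory(source_traj: List[Pose2D], source_obj: Pose2D,
                        new_obj: Pose2D) -> List[Pose2D]:
    """Replay end-effector waypoints relative to a new object pose."""
    to_object = inverse(source_obj)
    return [compose(new_obj, compose(to_object, w)) for w in source_traj]


def generate_variants(source_traj: List[Pose2D], source_obj: Pose2D, n: int,
                      rng: random.Random) -> List[List[Pose2D]]:
    """Multiply one demo into n candidates by sampling new object poses.

    A real pipeline would execute each candidate in simulation and discard
    failures; that filtering step is omitted here.
    """
    variants = []
    for _ in range(n):
        new_obj = (rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2),
                   rng.uniform(-math.pi, math.pi))
        variants.append(retarget_trajectory(source_traj, source_obj, new_obj))
    return variants


source_obj = (1.0, 0.0, 0.0)
source_traj = [(0.5, 0.0, 0.0), (0.9, 0.0, 0.0)]  # gripper approaching the object
shifted = retarget_trajectory(source_traj, source_obj, (1.0, 1.0, 0.0))
rotated = retarget_trajectory(source_traj, source_obj, (1.0, 0.0, math.pi / 2))
variants = generate_variants(source_traj, source_obj, n=5, rng=random.Random(0))
```

Expressing waypoints in the object's frame is what lets one demonstration cover many object placements: the approach "0.5 m behind the object" stays 0.5 m behind it no matter where the object is moved or how it is rotated.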

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

Siddhant Haldar

69,056 views • 1 year ago

Helix is now learning directly from human video data. We have already trained on data collected in the real world, including Brookfield residential units. To our knowledge, this is the first instance of a humanoid robot learning navigation end-to-end using only human video.

Figure

46,041 views • 9 months ago

The future of home cleaning just landed in Shenzhen, and it is walking right into your living room. 🤖🏠 X Square Robot and officially launched China’s first robot home service, moving embodied AI from the lab to your front door. When you book a cleaning on the app, a professional cleaner now shows up with an X Square robot partner to tag-team the house. The human handles the tricky tasks that need real judgment, while the robot takes over repetitive tasks like wiping tables and tidying up surfaces. X Square is using an end-to-end foundation model, which means the robot perceives and plans its own moves instead of just following a script. By testing in the messy reality of a real home, they aim to show that if a robot can master a living room, it can handle almost any physical space. This pilot is part of a massive push to turn these machines into reliable partners that can actually assist in our daily lives.

RoboHub🤖

101,350 views • 3 months ago

We are back. After one year of quiet building. Introducing GENE-26.5, our first robotic brain that takes a major step toward human-level capability. For years, robotics has struggled to learn from the world’s largest and most valuable data source: Humans. Solving it means rethinking the whole stack from the ground up: - A robotics-native foundation model. - A 1:1 human-like robotic hand. - A noninvasive data collection glove for motion, force, and touch. - A simulator that turns weeks of experiments into minutes. GENE-26.5 is trained across language, vision, proprioception, tactile, and action. We designed a set of tasks to test how far we can go with this new paradigm. Fully autonomous, 1x speed, one model, same weights. (Enjoy with sound on) We are approaching the endgame for robotics. And this is just the beginning.

Genesis AI

2,703,018 views • 1 month ago

A simple idea. Let robots collect the data that current foundation models are missing. A robot that gets better by doing real work in the real world. For two weeks in the Stanford East Asia Library, Scanford scanned shelves, helped librarians, and improved the vision-language model it depends on. The idea is very simple: Robots do useful work. They gather the real-world data that foundation models never see online. They fine-tune their own model. They go back out stronger. A full loop. What they found in deployment: ✅ 2,103 shelves scanned with multilingual, faded, occluded book spines ✅ 18.7 hours of librarian time saved ✅ Book ID accuracy jumped from 32.0 percent to 71.8 percent ✅ English OCR improved from 24.8 percent to 46.6 percent ✅ Chinese OCR improved from 30.8 percent to 38.0 percent The most interesting part is the shift. Robots do not only consume foundation models. They create the data these models are missing. A clean robot-powered data flywheel. Work. Collect. Fine-tune. Repeat. Thanks for sharing, Jenn Grannen! If you want the full write-up: 📍Website: Paper: --- Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

44,660 views • 7 months ago

Experiments in progress. The one on the right has been learning for ~3 hours, the one in the middle for ~1 hour, and the one on the left just started a few minutes ago. The initial motivation for making the physical Atari was just to commit ourselves to a subset of algorithms that can make progress in this setup. This commitment rules out algorithms that require billions of samples to learn (or worse, require multiple environments running in parallel). Atari games are simple enough that we should be able to show learning on them in a short amount of time with no prior knowledge. Since then, I've realized that this setup is also a good way to compare different paradigms in robotics in a principled way. These paradigms are sim2real, learning from tele-operated data, and learning directly on the robots. So far, I have observed that getting sim2real to work reliably is hard. It requires tweaks that don't scale. Policies that can play perfectly in simulation fall apart because of latencies and the messiness of the real world. These aspects could be modeled to improve the simulation, but not without sinking significant human engineering hours. I have higher hopes for learning from tele-operated data, but that requires a human to learn the task first. These experiments are on my to-do list. I have to learn to play some of the games well through the robot. I’m half-decent at playing Pong and Ms Pacman now. Learning directly on robots is looking like the most promising approach. This approach takes away pesky distribution shifts and makes it possible to have algorithms that continually improve with more data and time without any human intervention. It feels great to let experiments run overnight and wake up to find improved policies. With learning on robots, I should, in principle, be able to go on a long vacation and come back to find better policies for complex tasks beyond Atari games. Whether that is possible with current learning algorithms is a different question.

Khurram Javed

52,110 views • 6 months ago

State-of-the-art robot policies often need hundreds of hours of data. What if we needed none? Introducing TiPToP: a manipulation system that zero-shots open-world tasks from pixels and language using vision foundation models and GPU-parallelized Task and Motion Planning (TAMP).

Nishanth Kumar

77,488 views • 3 months ago

Just collecting manipulation data isn’t enough for robots: they also need to move around in the world, which poses a whole different set of challenges from pure manipulation. Bringing navigation and manipulation together in a single framework is harder still. Enter HERMES, from Zhecheng Yuan and Tianming Wei. It is a four-stage process in which human videos are used to set up a sim-to-real RL training pipeline that overcomes the differences between human and robot kinematics, and the system pairs this with a navigation foundation model to move around in a variety of environments. To learn more, join us as Zhecheng Yuan and Tianming Wei explain how they built a system that performs mobile dexterous manipulation from human videos across a variety of environments. Watch Episode #45 of RoboPapers today, hosted by Michael Cho - Rbt/Acc and Chris Paxton!

RoboPapers

15,045 views • 7 months ago

This work makes a humanoid robot perform simple parkour moves by looking with a depth camera and choosing the right move on the fly. The big deal is that it turns many short human moves into long, real-time robot behavior, without hand-coding every transition or retraining for each new course. A humanoid robot is usually good at steady walking, but it often fails when it has to do fast moves like jumping up, vaulting, or rolling, and then keep going to the next obstacle. The hard part is that you cannot easily collect training data for every possible obstacle shape, distance, and mistake, so robots end up learning a few moves that only work in a narrow setup. This work starts from short clips of real human parkour moves, such as stepping over, vaulting, climbing, and rolling. It uses motion matching, which is basically a smart “pick the next clip that fits best right now” search, to stitch those short clips into a long, smooth plan that looks like a human running a whole course. Then it trains a controller with reinforcement learning (RL), meaning the robot learns by trial and error to follow that plan while staying balanced and not falling. After training separate expert controllers for different moves, it compresses them into a single controller that uses only onboard depth sensing and a simple “go this fast in this direction” command. In real tests on a Unitree G1 humanoid, it can clear multiple obstacles in a row, adapt when obstacles are moved, and climb a wall up to 1.25 m high.

Rohan Paul

37,121 views • 4 months ago
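The parkour post above describes motion matching as a “pick the next clip that fits best right now” search. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it does a brute-force nearest-neighbor lookup over a tiny hypothetical clip database, using a made-up two-number feature vector (forward speed, obstacle height) and a hypothetical `transition_cost` that favors continuing the current clip. Practical motion-matching systems typically use richer, normalized features (for example future root trajectory and joint velocities) and a faster search structure, and this sketch leaves out the RL tracking controller entirely.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    clip: str                     # hypothetical clip name, e.g. "walk" or "vault"
    index: int                    # frame index within that clip
    features: tuple[float, ...]   # hypothetical features: (forward speed, obstacle height)


def weighted_sq_dist(a, b, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def match(database, query, weights, current=None, transition_cost=0.0):
    """Brute-force search for the frame whose features best fit `query`.

    Continuing to the next frame of the current clip is free; any other
    frame pays `transition_cost`, which discourages jittery clip switching.
    """
    best, best_cost = None, math.inf
    for frame in database:
        cost = weighted_sq_dist(frame.features, query, weights)
        if current is not None:
            continues_clip = (frame.clip == current.clip
                              and frame.index == current.index + 1)
            if not continues_clip:
                cost += transition_cost
        if cost < best_cost:  # ties keep the earlier frame in the database
            best, best_cost = frame, cost
    return best


def stitch(database, queries, weights, transition_cost=0.0):
    """Run one motion-matching query per step and collect the chosen frames."""
    plan, current = [], None
    for query in queries:
        current = match(database, query, weights, current, transition_cost)
        plan.append(current)
    return plan


# Tiny hypothetical database: a flat-ground walk clip and a vault clip.
DATABASE = [Frame("walk", i, (1.0, 0.0)) for i in range(3)] + [
    Frame("vault", 0, (1.2, 0.5)),
    Frame("vault", 1, (1.2, 0.5)),
    Frame("vault", 2, (1.0, 0.0)),
]

# Desired features per step: walk, walk, then an obstacle appears.
QUERIES = [(1.0, 0.0), (1.0, 0.0), (1.2, 0.5), (1.2, 0.5)]

if __name__ == "__main__":
    plan = stitch(DATABASE, QUERIES, weights=(1.0, 1.0), transition_cost=0.1)
    print([(f.clip, f.index) for f in plan])
    # → [('walk', 0), ('walk', 1), ('vault', 0), ('vault', 1)]
```

With `transition_cost=0.1`, the plan follows the walk clip and jumps to the vault clip as soon as the query calls for it. Raising the cost (say, to 10.0) makes the search stick with the current clip one step longer, trading responsiveness for smoother stitching.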