The best way to get robust, high-quality robot performance is through reinforcement learning; but RL in either the real world or a traditional simulation has lots of limitations. Instead, Jiazhi Yang in RISE does RL in a compositional world model. Learn more ->

Chris Paxton

33,462 subscribers

33,766 views • 23 days ago • via X (Twitter)

Similar Videos

Robot policies must be both reliable and highly capable to be useful; the best way to achieve this level of performance is with reinforcement learning. However, for reinforcement learning you are usually stuck between two difficult options: reinforcement in the real world is often risky and expensive, while reinforcement learning in a traditional simulator takes a lot of engineering work and has a persistent sim-to-real gap. What if instead you could train your robot purely in a world model? RISE by Jiazhi Yang et al. uses a compositional world model to predict the future and evaluate progress. This allows for a self-improving pipeline, which learns a world model from real data and then learns how the robot should perform different tasks. This pipeline results in a data-driven way to improve policy performance from real data but without real-world reinforcement learning. Watch Episode #86 of RoboPapers, with Chris Paxton and Jiafei Duan, to learn more!

RoboPapers

38,334 views • 23 days ago

OpenDriveLab just dropped RISE — a new paradigm for robot learning. Instead of expensive real-world trials, robots learn in imagination. A compositional world model simulates futures → evaluates outcomes → updates policy.

Robots Digest 🤖

23,754 views • 2 months ago

RL-100 Performant Robotic Manipulation with Real-World Reinforcement Learning

AK

15,364 views • 8 months ago

🤔Want a principled way to RL your diffusion model? Check Data-regularized Reinforcement Learning (DDRL)! Post-train NVIDIA #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑 Details: #DDRL #Diffusion #RL #NVIDIA #Cosmos

Haotian Ye

77,657 views • 7 months ago

RL in the real world presents some big challenges, but also some really big opportunities. In our new work, HIL-SERL, Charles Xu, Jeffrey Wu, Jianlan Luo show that real-world RL can learn a huge range of precise and robust tasks, and perform them much faster than imitation.

Sergey Levine

36,489 views • 1 year ago

So we did a bunch of projects with real-world reinforcement learning, but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT), a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real-world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)

Abhishek Gupta

13,631 views • 1 year ago
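
The SGFT post above summarizes its idea as using value functions from simulation to shape rewards for real-world RL. One standard way to do that kind of shaping is potential-based reward shaping, where a state value V acts as a potential and the shaped reward is r + γV(s') − V(s). The sketch below illustrates that general technique only; whether SGFT uses exactly this form is an assumption, and `shaped_reward`, `sim_value`, and the `SIM_VALUES` table are hypothetical names and numbers made up for illustration.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    sim_value: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: r + gamma * V(s') - V(s).

    `sim_value` stands in for a value function learned in simulation
    (an assumption for this sketch, not the SGFT implementation).
    """
    # At a terminal transition the next-state potential is taken as zero.
    next_potential = 0.0 if done else sim_value(next_state)
    return reward + gamma * next_potential - sim_value(state)


# Hypothetical value estimates, as if obtained from a simulator.
SIM_VALUES = {"far": 0.0, "near": 0.5, "goal": 1.0}


def sim_value(state: State) -> float:
    return SIM_VALUES[state]


# A sparse real-world reward of 0 still yields a learning signal when
# the robot moves to a state the simulator rates as more valuable.
step_bonus = shaped_reward(0.0, "far", "near", sim_value, gamma=0.9)
final_step = shaped_reward(1.0, "near", "goal", sim_value, gamma=0.9, done=True)
print(step_bonus, final_step)
```

The shaping term only needs the simulator's value estimates to rank states roughly correctly, which fits the post's point that an inaccurate simulator can still be a useful guide.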

We asked Sholto Douglas from Anthropic about the costs of RL (Reinforcement Learning) runs. "In Dario Amodei's essay, he said that RL runs cost only $1M back in December." "RL is more naively parallelizable and scalable than pre-training." "With pre-training, you need everything in one big data center ideally. For RL, in theory, you could scale all over the world."

TBPN

76,634 views • 1 year ago

Full episode dropping soon! Geeking out with Jiazhi Yang on RISE: Self-Improving Robot Policy with Compositional World Model. Co-hosted by Chris Paxton and Jiafei Duan.

RoboPapers

13,597 views • 27 days ago

Using our brain simulator, we’ve trained a reinforcement learning agent to maximize bits per second. Here is the RL policy converting brain data to cursor control in simulation:

Neuralink

18,410 views • 1 year ago

Watch this robot dog learn to walk from scratch in real time! Our new method, APRL, dynamically adjusts exploration constraints to enable fast and performant RL directly in the real world. APRL can also adapt to changes in the terrain. No simulation, no demos. A thread 👇

Sergey Levine

105,568 views • 2 years ago

Meet XGO-Duck —our latest bipedal desktop robot trained via Reinforcement Learning (RL)! Through millions of trials in simulation, it has developed biological-like muscle memory. Even if it trips or gets pushed over, it automatically flaps its head and recovers seamlessly!

XGO Robot

14,386 views • 29 days ago

Introducing RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning. 7 real robot tasks, 900/900 successes. Up to 250 consecutive trials in one task, running 2 hours nonstop without failure. High success rate against physical disturbances, plus zero-shot and few-shot adaptation. Our first step toward a deployable robot learning system.

Kun Lei

90,789 views • 8 months ago

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics, to discuss the gap between current robotic capabilities and what's required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from "solved." We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training, discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach that utilizes pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares the behind-the-scenes of humanoid robot demos, his take on reinforcement learning in simulation versus the real world, the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today.

🗒️ For the full list of resources for this episode, visit the show notes page:

📖 CHAPTERS
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla

The TWIML AI Podcast

22,264 views • 5 months ago

Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.

Physical Intelligence

704,798 views • 7 months ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

431,700 views • 3 months ago

Our work, "A Primer on SO(3) Action Representations in Deep Reinforcement Learning," was accepted to #ICLR2026! We provide a systematic study of action representation choices in RL, showing that they fundamentally impact training stability and performance. #Robotics #AI #RL

Learning Systems and Robotics Lab (is hiring!)

49,655 views • 4 months ago

These are not CGI. Reinforcement learning is so back. When operating on strings, it gives us o3. When operating on physical motors, it gives us a perfect humanoid backflip and a robot creature that out-maneuvers almost every animal on earth. RL is one of the only learning algorithms that can master both the world of bits and the world of atoms. Give me a reward function, and I shall move the world. 2025, Year of RL.

Jim Fan

356,732 views • 1 year ago

In order for robots to be deployed in the real world, performing tasks of real value, they must be reliable. Unfortunately, most robotic demos work maybe 70-80% of the time at best. The way to get better reliability is to do real-world reinforcement learning: having the robot teach itself how to perform the task up to a high level of success. The key to doing this is to start with a core of expert human data, use that to train a policy, then iteratively improve it, finishing with on-policy reinforcement learning. Kun Lei talks through a unified framework for imitation and reinforcement learning based on PPO, which enables this improvement process. In this episode, Kun Lei explains the theory behind his reinforcement learning method and how it allowed his robot to run in a shopping mall juicing oranges for seven hours at a time, among experiments on a wide variety of tasks and embodiments. Watch episode 58 of RoboPapers now, hosted by Michael Cho - Rbt/Acc and Chris Paxton!

RoboPapers

18,813 views • 5 months ago

Let’s do 🍒 Cherry Picking with Reinforcement Learning - 🥢 Dynamic fine manipulation with chopsticks - 🤖 Only 30 minutes of real world interactions - ⛔️ Too lazy for parameter tuning = off-the-shelf RL algo + default params + 3 seeds in real world

Kay - Liyiming Ke

51,182 views • 3 years ago

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

Abhishek Gupta

10,802 views • 1 year ago