正在加载视频...

视频加载失败

Check out our #ICRA2024 paper "Contrastive Initial State Buffer for Reinforcement Learning," which tackles the sample inefficiency in #ReinforcementLearning head-on. Code released! We introduce an approach agnostic to the underlying RL algorithm: the Contrastive Initial State Buffer. This tool strategically selects states from past experiences and uses them to...

13,846 次观看 • 2 年前 •via X (Twitter)

1 条评论

Stone Tao 的头像
Stone Tao2 年前

Interesting work! We recently also explored the angle of modifying the initial state distribution but in a learning from demos context: Same outcome: better initial state distribution in sim = far more sample efficient

相关视频

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance, but with superior generalizability, as our previous model-free deep RL! Code released! PDF: Code: Full Video: Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration. We conduct a comprehensive study that highlights AC-MPC’s key advantages: - Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics - Improved sample efficiency - A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay. We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior. Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight IEEE Transactions on Robotics (T-RO), 2025 PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

26,960 次观看 • 4 个月前

We are thrilled to share our breakthrough research on "Agile Flight from Pixels without State Estimation," to be presented and live-demonstrated at #RSS2024 next week! You heard well: no state estimation means no explicit visual localization, no SLAM, no VIO, and no IMU! Paper: Video (Narrated): Last year, we demonstrated that #ReinforcementLearning (RL) policies could outperform world-champion drone-racing pilots using the same quadrotor hardware; however, unlike human pilots, these policies continuously estimated an explicit state from known gate positions, the camera feed, and inertial measurements (IMU). In this new work, we tackle the challenge of learning vision-based drone racing using an end-to-end reinforcement learning approach that eliminates the need for IMU data or explicit state estimation. Like professional pilots, we go directly from images to control commands. The training is facilitated by an asymmetric actor-critic with access to privileged information. To overcome the computational complexity during image-based RL training, we use an appropriate sensor representation, which can be efficiently simulated during training without rendering images. We achieve agile flight at speeds up to 40 km/h with accelerations up to 2 g's. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments. Besides the paper presentation, we will also give a live demo next Tuesday and Wednesday between and hrs at TU Delft: Reference: Ismail Geles*, Leonard Bauersfeld*, Angel Romero, Jiaxu Xing, Davide Scaramuzza "Demonstrating Agile Flight from Pixels without State Estimation" Robotics: Science and Systems (RSS), 2024. Kudos to Ismail Geles Leonard Bauersfeld Ángel Romero Jiaxu Xing! University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

27,891 次观看 • 1 年前

Can an inexpensive, off-the-shelf IMU be the only sensor to estimate the full state (position, velocity, orientation) of a quadrotor flying through a track at high speed and even be on-pair with vision-based localization? The answer is yes, within certain limitations! In this #RAL2023 paper, we propose a learning-based odometry algorithm that couples a model-based filter driven by the inertial measurements with a learning-based module with access to the control commands. Our system outperforms by a large margin the state-of-the-art visual-inertial odometry (#VIO) algorithms and the state-of-the-art learned-inertial odometry algorithm, #TLIO, for the task of drone racing. Additionally, we show that our system is as accurate as a VIO algorithm that uses a camera to localize to a known map of the racing track. The main limitation of our approach is that it cannot generalize to trajectories that have not been seen at training time. However, in drone racing competitions, the track is known beforehand. Human pilots spend hours or even days of practice on the race track before the competition. Similarly, our system can be trained with the data collected during practice time and deployed during the competition. Future work will investigate how to generalize to trajectories not seen at training time. The code is released! Paper: Video: Code: Kudos to Giovanni Cioffi Leonard Bauersfeld Elia Kaufmann European Research Council (ERC) University of Zurich UZH Science UZH Space Hub NCCR Robotics Aerial Core #RAL2023 #IROS2023 #SLAM

Davide Scaramuzza

37,049 次观看 • 2 年前

We are excited to share our work “Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones” published in IEEE Transactions on Robotics IEEE Transactions on Robotics (T-RO), which tackles sharp radiance field reconstruction under agile drone motion, where RGB frames are heavily motion-blurred and pose priors become unreliable! 4 years in the making! Code & dataset released! PDF: Code & Dataset: Full Narrated Video: High-speed flight is essential for time- and battery-constrained missions (e.g., inspection, exploration, search & rescue). However, fast motion corrupts visual data with severe motion blur and introduces drift/noise in visual-inertial odometry, making NeRF-based 3D reconstruction particularly brittle. We propose a unified framework that leverages asynchronous #EventCamera streams together with motion-blurred frames to reconstruct high-fidelity radiance fields from agile drone flights. Our key idea is to embed event-image fusion directly into radiance field optimization while jointly refining a shared, continuous-time camera trajectory initialized from event-based VIO. This enables us to recover sharp radiance fields and accurate trajectories without ground-truth supervision during training. We validate our method on synthetic data and on real sequences captured by a drone flying up to 2 m/s. Despite severe blur and noisy pose priors, our method preserves fine scene details and achieves a performance gain of over 50% on real-world data compared to state-of-the-art methods. Kudos to Rong Zou and Marco Cannici! Marco Cannici Reference: Rong Zou*, Marco Cannici*, Davide Scaramuzza Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones IEEE Transactions on Robotics (T-RO), 2026 NCCR Robotics European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub

Davide Scaramuzza

10,839 次观看 • 3 个月前

We are excited to share our latest work, "Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning," done in collaboration with Google DeepMind . Autonomous drones have reached superhuman speed in isolation, but what happens when multiple agents share the same airspace? Paper: Website: Video: Using league-based self-play, we train #ReinforcementLearning agents that race against a diverse, evolving population of opponents. Through this competitive training, sophisticated behaviors emerge without explicit programming: strategic overtaking, proactive collision avoidance, and even awareness of aerodynamic downwash from nearby drones. In real-world multi-player races at speeds exceeding 80kph (50 mph) and accelerations up to 7g, our agents outperform a five-time Swiss national drone racing champion while reducing collision rates by 50% compared to single-agent baselines. Crucially, training against diverse artificial opponents enables zero-shot generalization to human pilots, achieving over 90% race completion in mixed human-AI races with up to four competitors. A key insight: human pilots adopt riskier strategies when trailing, leading to more crashes under competitive pressure. Our learned policies, by contrast, maintain consistent safety margins regardless of race standing, a property essential for deploying autonomous systems alongside humans. Also, the multi-agent self-play policies are more robust than those trained independently, suggesting that training in competitive environments is not only key to winning races but also to learning safer, more reliable autonomy for real-world multi-robot systems. Kudos to Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier! Ismail Geles Leonard Bauersfeld Markus Wulfmeier European Research Council (ERC) UZH IfI University of Zurich UZH Science UZH Space Hub Swiss Robotics NCCR Robotics

Davide Scaramuzza

14,315 次观看 • 20 天前

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion—and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher fidelity training, discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach that utilizes pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares the behind-the-scenes in humanoid robot demos, his take on reinforcement learning in simulation versus the real world, the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today. 🗒️ For the full list of resources for this episode, visit the show notes page: 📖 CHAPTERS =============================== 00:00 - Introduction 04:07 - Is robot locomotion solved? 06:04 - Sim-to-real gap 08:58 - Adding semantics to policies 09:42 - Modular vs end-to-end architectures 10:29 - Planner model 12:21 - Adapting RL techniques from quadrupeds to humanoids 15:39 - Behind robot demos 18:09 - Humanoid robots in home environments 22:03 - Training approach 23:56 - VLA models 27:59 - Closing the sim-to-real gap 32:55 - Task orchestration using VLMs 36:38 - Tool use 38:10 - Model hierarchy 43:37 - Simulator versus simulation environment 44:57 - Combining imitation learning and reinforcement learning 46:42 - RL in real world versus RL in simulation 52:58 - Reward tuning and value functions in robotics 56:38 - Predictions 1:00:10 - Humanoids, quadropeds, and wheeled platforms 1:02:45 - Advice, recommended robot kits, and community pla

The TWIML AI Podcast

22,264 次观看 • 5 个月前