Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called...

4 comments

Kaustubh Sridhar, 2 years ago

Hi Davide, we wrote a paper with a very similar idea but focused on discrete optimization a year ago: I believe it’s highly relevant to your interesting ICRA paper.

Davide Scaramuzza, 2 years ago

Thanks for the pointer! We will read it!

Peter Soetens⚡, 2 years ago

Davide, big fan of your work and publications, but the music in this video makes me nervous. I'm trying to focus on the contents and I get the constant feeling that something will explode any second. [/2cents]

Yura Kriachko, 2 years ago

Do you plan to open-source the code or code structure?

Related videos

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance, but with superior generalizability, as our previous model-free deep RL! Code released!

PDF: Code: Full Video:

Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities.

In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration.

We conduct a comprehensive study that highlights AC-MPC’s key advantages:
- Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics
- Improved sample efficiency
- A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay.

We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior.

Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight, IEEE Transactions on Robotics (T-RO), 2025
PDF: Full Video: Code:

Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai
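
To make the "MPC-based actor" idea a little more concrete, here is a minimal sketch of the forward pass only. It is not the authors' implementation. It assumes a scalar linear system and a quadratic cost (neither comes from the post), and every name in it (`mpc_actor`, `rollout`, `q`, `r`, `a`, `b`) is hypothetical. In a setup like AC-MPC the cost parameters would presumably be learned and gradients would flow through the solver; here they are set by hand, and the critic and the gradient computation are omitted.

```python
def mpc_actor(x, q, r, a=1.0, b=0.1, horizon=10):
    """Return the first control of a finite-horizon scalar LQR problem.

    Assumed (hypothetical) model: x_next = a * x + b * u.
    Assumed stage cost: q * x**2 + r * u**2, with q as terminal weight.
    q and r stand in for cost parameters that a learner could adjust.
    """
    p = q      # terminal cost-to-go weight
    k = 0.0    # feedback gain, overwritten at every backward step
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # gain for the current stage
        p = q + a * p * a - a * p * b * k   # Riccati backward step
    # After the backward pass, k is the gain for the first stage.
    return -k * x


def rollout(x0, q, r, steps=50, a=1.0, b=0.1):
    """Apply the actor in closed loop, replanning at every step.

    The reward -x**2 is a made-up example. Its sum is the kind of
    long-horizon signal a critic would be trained to predict.
    """
    x, total_reward = x0, 0.0
    for _ in range(steps):
        u = mpc_actor(x, q, r, a, b)   # solve the short-horizon problem
        total_reward += -(x * x)
        x = a * x + b * u              # step the assumed dynamics
    return x, total_reward
```

Calling `rollout(1.0, q=1.0, r=0.1)` drives the state toward zero, and raising `q` makes the actor push harder. That dependence of the action on the cost weights is the hook through which a learning signal could shape the controller.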

Davide Scaramuzza

26,960 views • 5 months ago

We are thrilled to share our breakthrough research on "Agile Flight from Pixels without State Estimation," to be presented and live-demonstrated at #RSS2024 next week! You heard that right: no state estimation means no explicit visual localization, no SLAM, no VIO, and no IMU!

Paper: Video (Narrated):

Last year, we demonstrated that #ReinforcementLearning (RL) policies could outperform world-champion drone-racing pilots using the same quadrotor hardware; however, unlike human pilots, these policies continuously estimated an explicit state from known gate positions, the camera feed, and inertial measurements (IMU).

In this new work, we tackle the challenge of learning vision-based drone racing using an end-to-end reinforcement learning approach that eliminates the need for IMU data or explicit state estimation. Like professional pilots, we go directly from images to control commands. The training is facilitated by an asymmetric actor-critic with access to privileged information. To overcome the computational complexity of image-based RL training, we use an appropriate sensor representation, which can be efficiently simulated during training without rendering images. We achieve agile flight at speeds up to 40 km/h with accelerations up to 2 g.

Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments.

Besides the paper presentation, we will also give a live demo next Tuesday and Wednesday between and hrs at TU Delft:

Reference: Ismail Geles*, Leonard Bauersfeld*, Angel Romero, Jiaxu Xing, Davide Scaramuzza, "Demonstrating Agile Flight from Pixels without State Estimation," Robotics: Science and Systems (RSS), 2024.

Kudos to Ismail Geles, Leonard Bauersfeld, Ángel Romero, Jiaxu Xing! University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

27,891 views • 2 years ago

Can an inexpensive, off-the-shelf IMU be the only sensor used to estimate the full state (position, velocity, orientation) of a quadrotor flying through a track at high speed, and even be on par with vision-based localization? The answer is yes, within certain limitations!

In this #RAL2023 paper, we propose a learning-based odometry algorithm that couples a model-based filter driven by the inertial measurements with a learning-based module that has access to the control commands. For the task of drone racing, our system outperforms the state-of-the-art visual-inertial odometry (#VIO) algorithms and the state-of-the-art learned-inertial odometry algorithm, #TLIO, by a large margin. Additionally, we show that our system is as accurate as a VIO algorithm that uses a camera to localize to a known map of the racing track.

The main limitation of our approach is that it cannot generalize to trajectories that have not been seen at training time. However, in drone racing competitions, the track is known beforehand. Human pilots spend hours or even days practicing on the race track before the competition. Similarly, our system can be trained with the data collected during practice time and deployed during the competition. Future work will investigate how to generalize to trajectories not seen at training time.

The code is released!

Paper: Video: Code:

Kudos to Giovanni Cioffi, Leonard Bauersfeld, Elia Kaufmann! European Research Council (ERC) University of Zurich UZH Science UZH Space Hub NCCR Robotics Aerial Core #RAL2023 #IROS2023 #SLAM

Davide Scaramuzza

37,061 views • 2 years ago

Physics-based Motion Retargeting from Sparse Inputs

paper page:

Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work we address both of these challenges.

We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time.

We demonstrate the feasibility of our approach on three characters with different skeleton structures: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action rewards, as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.

AK

106,519 views • 3 years ago