Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

18,192 subscribers

34,874 views • 2 years ago • via X (Twitter)

Education, Science & Technology, News & Politics #ICRA2024 #reinforcementlearning #ModelPredictiveControl
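To make the "MPC as actor" idea in the post above more concrete, here is a toy sketch. It is not the authors' implementation: the scalar dynamics, the quadratic cost, and the names `mpc_action`, `action_grad_wrt_q`, and `actor_update` are all assumptions chosen for illustration. The cost weight `q` stands in for a learnable MPC parameter, and a central difference stands in for analytic differentiation through the solver.

```python
def mpc_action(x0, q, r=1.0, a=1.0, b=0.1, horizon=10):
    """Return the first control of a finite-horizon scalar LQR problem.

    Dynamics x' = a*x + b*u and stage cost q*x^2 + r*u^2 are illustrative
    assumptions; q plays the role of a learnable cost parameter.
    """
    p = q          # terminal cost weight
    gain = 0.0
    for _ in range(horizon):
        # Backward Riccati recursion for the scalar case.
        gain = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return -gain * x0


def action_grad_wrt_q(x0, q, eps=1e-5):
    """Central-difference stand-in for differentiating through the solver."""
    return (mpc_action(x0, q + eps) - mpc_action(x0, q - eps)) / (2.0 * eps)


def actor_update(x0, q, dvalue_du, lr=0.1):
    """One gradient-ascent step on q via the chain rule.

    dvalue_du is a hypothetical critic-supplied derivative of the value
    with respect to the action.
    """
    return q + lr * dvalue_du * action_grad_wrt_q(x0, q)
```

In this sketch the short-horizon optimization lives entirely inside `mpc_action`, while the long-horizon signal enters only through `dvalue_du`, which mirrors the division of labor the post describes.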

4 comments

Kaustubh Sridhar • 2 years ago

Hi Davide, we wrote a paper with a very similar idea but focused on discrete optimization a year ago: I believe it’s highly relevant to your interesting ICRA paper.

Davide Scaramuzza • 2 years ago

Thanks for the pointer! We will read it!

Peter Soetens⚡ • 2 years ago

Davide, big fan of your work and publications, but the music in this video makes me nervous. I'm trying to focus on the contents and I get the constant feeling that something will explode any second. [/2cents]

Yura Kriachko • 2 years ago

Do you plan to open-source the code or code structure?

Related videos

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same superhuman performance as our previous model-free deep RL, but with superior generalizability! Code released! PDF: Code: Full Video:

Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration.

We conduct a comprehensive study that highlights AC-MPC’s key advantages:
- Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics
- Improved sample efficiency
- A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay.

We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior.

Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight, IEEE Transactions on Robotics (T-RO), 2025. PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

26,960 views • 5 months ago

Check out our #RSS2024 paper "#MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints." Model Predictive Contouring Control (MPCC) has shown promising results for agile robotics applications, including car and drone racing. Existing approaches struggle to introduce safety considerations, often resulting in crashes. What does it take to drive or fly fast and safely? We enhance our former MPCC by incorporating spatial constraints that reliably prevent obstacle collisions, which allows planning the fastest trajectory within these safety limits. To improve performance, we leverage real-world data to refine the dynamic model. Our approach is the first to achieve a 100% success rate in real-world experiments. This safety benefit comes without compromising performance, as our method achieves lap times comparable to the best-performing state-based #ReinforcementLearning (RL) policies. Reference: M. Krinner, A. Romero, L. Bauersfeld, M. Zeilinger, A. Carron, D. Scaramuzza, "MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints," Robotics: Science and Systems, 2024. PDF: Video: Kudos to Maria Krinner, Angel Romero Aguilar, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) #MPC #ModelPredictiveControl

Davide Scaramuzza

17,888 views • 2 years ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

431,552 views • 3 months ago

Check out our #ICRA2024 paper "Contrastive Initial State Buffer for Reinforcement Learning," which tackles the sample inefficiency in #ReinforcementLearning head-on. Code released! We introduce an approach agnostic to the underlying RL algorithm: the Contrastive Initial State Buffer. This tool strategically selects states from past experiences and uses them to initialize the agent in the environment to guide it toward more informative states. Our experiments on drone racing and legged locomotion show that our method achieves higher task performance while also speeding up training convergence. Reference: Nico Messikommer, Yunlong Song, Davide Scaramuzza Contrastive Initial State Buffer for Reinforcement Learning IEEE International Conference on Robotics and Automation (ICRA), 2024. PDF: Code: Video: Kudos to Messikommer Yunlong Song Aerial Core European Research Council (ERC) University of Zurich UZH Space Hub IEEE ICRA UZH Science

Davide Scaramuzza

13,846 views • 2 years ago
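The initial-state-buffer idea in the post above (reusing visited states as episode starting points) can be sketched roughly as follows. This is only an illustration under assumptions: the class name `InitialStateBuffer`, the scalar `score` attached to each state, and the score-proportional sampling are hypothetical, and the contrastive representation the paper's title refers to is not modeled here.

```python
import random


class InitialStateBuffer:
    """Toy buffer of previously visited states used to reset an environment."""

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.items = []                  # list of (state, score) pairs
        self.rng = random.Random(seed)

    def add(self, state, score):
        """Store a visited state with a hypothetical 'informativeness' score."""
        self.items.append((state, float(score)))
        if len(self.items) > self.capacity:
            self.items.pop(0)            # drop the oldest entry

    def sample(self, default_state):
        """Pick an initial state, favoring higher scores.

        Falls back to the environment's default start when the buffer is empty.
        """
        if not self.items:
            return default_state
        states, scores = zip(*self.items)
        low = min(scores)
        weights = [s - low + 1e-6 for s in scores]   # shift so weights are positive
        return self.rng.choices(states, weights=weights, k=1)[0]
```

Because the buffer only decides where an episode starts, a sketch like this stays independent of the RL algorithm that consumes the episodes, which matches the algorithm-agnostic framing in the post.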

We are excited to share our #CORL2024 paper (oral) on "Learning Quadruped Locomotion Using Differentiable Simulation" done in collaboration with Sangbae Kim Massachusetts Institute of Technology (MIT). We present a new way to learn to walk in minutes without parallelization, outperforming PPO in sample efficiency! PDF: Video: We present a new framework for learning quadruped locomotion. By leveraging differentiable simulation for policy optimization, our approach achieves fast convergence and stable training, significantly outperforming model-free #ReinforcementLearning methods like PPO in sample efficiency. The key enabler is to combine a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. This work highlights one of the first successful real-world applications of differentiable simulation for quadruped robots, offering a compelling alternative to traditional RL methods. Kudos to Yunlong Song! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) Massachusetts Institute of Technology (MIT)MechE

Davide Scaramuzza

15,533 views • 1 year ago
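The "key enabler" described in the post above, running a non-differentiable simulator forward while taking gradients from a simplified surrogate, can be illustrated with a one-dimensional toy. Everything here is an assumption for illustration: the black-box `true_step`, the surrogate model x' = x + dt*u, the linear policy u = -k*x, and the use of forward-mode sensitivities instead of reverse-mode backpropagation.

```python
def true_step(x, u, dt=0.05):
    """Black-box 'high-fidelity' step (hypothetical): saturation plus drag."""
    u = max(-1.0, min(1.0, u))
    return x + dt * u - dt * 0.1 * x


def rollout_loss_and_grad(k, x0=1.0, steps=40, dt=0.05):
    """Loss from the black-box rollout; gradient from the surrogate x' = x + dt*u."""
    x, s = x0, 0.0                       # s tracks dx/dk under the surrogate model
    loss, grad = 0.0, 0.0
    for _ in range(steps):
        u = -k * x
        du_dk = -x - k * s               # policy u = -k*x differentiated w.r.t. k
        x_next = true_step(x, u, dt)     # forward pass uses the black-box simulator
        s = s + dt * du_dk               # sensitivity uses surrogate Jacobians (1, dt)
        x = x_next
        loss += x * x
        grad += 2.0 * x * s
    return loss, grad


def train(k=0.5, lr=0.01, iters=20):
    """Plain gradient descent on the feedback gain k."""
    for _ in range(iters):
        _, grad = rollout_loss_and_grad(k)
        k -= lr * grad
    return k
```

The states that enter the loss always come from `true_step`, so the surrogate only has to get the direction of improvement roughly right, not the trajectory itself.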

We are excited to share our latest work, "Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation", where a policy learns to adapt in the real world to unknown disturbances within 5 seconds, both with and without explicit state estimation, directly from visual features. Code released! PDF: Project Page: Starting from a simple analytical dynamics model, the system continuously learns residual dynamics from real-world data and embeds the refined model into a differentiable simulator. This enables fast, gradient-based policy updates that are far more sample-efficient than classical #ReinforcementLearning. We demonstrate rapid adaptation in <5 seconds in agile quadrotor control under challenging conditions, including added payloads, wind disturbances, and large sim-to-real gaps. In real-world experiments, our method reduces hovering error by up to 81% compared to L1-MPC and 55% compared to PPO-based adaptive methods. It also operates directly from visual features without explicit state estimation. Reference: “Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation” IEEE Robotics and Automation Letters, 2026 PDF: Video: Code: Website: Kudos to Michael Pan, Jiaxu Xing, Rudolf Reiter, Yifan Zhai, Elie Aljalbout! UZH Space Hub UZH IfI European Research Council (ERC) AUTOASSESS UZH Science University of Zurich

Davide Scaramuzza

19,144 views • 5 months ago
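As a rough illustration of the residual-dynamics step in the post above (a simple analytical model corrected with data from the real system), the sketch below fits a linear residual online. The "real" system, the residual form w0 + w1*u, and all function names are hypothetical choices for this example, not the method from the paper.

```python
import random

DT = 0.05


def analytic_step(x, u):
    """Simple analytical model: a pure integrator."""
    return x + DT * u


def real_step(x, u):
    """Hypothetical 'real' system: 20% actuator gain loss plus constant wind."""
    return x + DT * (0.8 * u + 0.5)


def fit_residual(samples=2000, lr=0.1, seed=0):
    """Fit r(u) = w0 + w1*u to (real - analytic) with online least-mean-squares."""
    rng = random.Random(seed)
    w0, w1 = 0.0, 0.0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        target = real_step(x, u) - analytic_step(x, u)
        err = (w0 + w1 * u) - target
        w0 -= lr * err                   # gradient of 0.5*err^2 w.r.t. w0
        w1 -= lr * err * u               # gradient of 0.5*err^2 w.r.t. w1
    return w0, w1


def refined_step(x, u, w):
    """Analytical model plus the learned residual."""
    return analytic_step(x, u) + w[0] + w[1] * u
```

A model like `refined_step` stays differentiable in `u`, which is what would let it sit inside a differentiable simulator for gradient-based policy updates.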

🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

Haoru Xue

172,483 views • 1 year ago
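The combination named in the post above, sampling-based MPC with an annealed noise level, can be sketched on a toy problem. This is a simplified stand-in and not the released DIAL-MPC algorithm: the 1D integrator, the cost, the softmax weighting, and the geometric decay of `sigma` across iterations are all assumptions made for illustration.

```python
import math
import random


def rollout_cost(controls, x0=0.0, target=1.0, dt=0.1):
    """Tracking cost of a control sequence on a 1D integrator (illustrative)."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + dt * u
        cost += (x - target) ** 2 + 1e-3 * u * u
    return cost


def annealed_sampling_mpc(horizon=10, samples=64, iters=10,
                          sigma0=2.0, decay=0.7, temperature=0.5, seed=0):
    """Refine a control sequence by weighted sampling with shrinking noise."""
    rng = random.Random(seed)
    nominal = [0.0] * horizon
    sigma = sigma0
    for _ in range(iters):
        # Candidate 0 is the unperturbed nominal plan; the rest are noisy copies.
        candidates = [list(nominal)]
        for _sample in range(samples - 1):
            candidates.append([u + rng.gauss(0.0, sigma) for u in nominal])
        costs = [rollout_cost(c) for c in candidates]
        best = min(costs)
        weights = [math.exp(-(c - best) / temperature) for c in costs]
        total = sum(weights)
        # Softmax-weighted average of the candidates becomes the new plan.
        nominal = [sum(w * c[t] for w, c in zip(weights, candidates)) / total
                   for t in range(horizon)]
        sigma *= decay                   # anneal: explore widely first, then refine
    return nominal
```

No parameters are trained anywhere in this loop; the plan is recomputed from the cost alone, which is the sense in which such methods are training-free.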

We are excited to share our #ICRA2026 paper "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight"! Paper: Video: Can we use Model-Based #ReinforcementLearning (MBRL) to fly a drone from pixels to commands? In this work, we train quadrotor navigation policies from scratch using #WorldModels, mapping raw onboard camera pixels directly to control commands, much like a human pilot! While model-free methods like PPO are sample-inefficient and struggle in this setting, we leverage #MBRL to train visuomotor policies capable of agile flight through a racetrack using only raw pixel observations, no explicit state estimation needed. A key finding: because our policies are trained end-to-end directly from pixels, we no longer need the perception-aware reward term used in previous methods. Instead, this behavior emerges naturally! The policies learn to guide the camera toward feature-rich areas of the observation space on their own. Kudos to Ángel Romero Ashwin Shenai Ismail Geles Elie Aljalbout Reference: "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight" Angel Romero*, Ashwin Shenai*, Ismail Geles, Elie Aljalbout, Davide Scaramuzza IEEE International Conference on Robotics and Automation (ICRA), Vienna, 2026. European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub Swiss Robotics NCCR Robotics

Davide Scaramuzza

15,812 views • 2 months ago

We are excited to share that our paper “Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe” has been published in the IEEE Transactions on Robotics! One year in the making! PDF: Video: We focus on autonomous quadrotor flight in narrow pipes, where self-induced, unsteady airflow can significantly affect stability and control. Our approach closes the loop using real-time flow field measurements: an #EventCamera-based smoke velocimetry method provides low-latency airflow estimates, which are used by a learning-based disturbance estimator and integrated into a reinforcement-learning controller. The results are improved hovering and lateral translation performance, helping make flight in confined spaces more stable and safer! Kudos to Leonard Bauersfeld! Reference: Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe, IEEE Transactions on Robotics (T-RO), 2026 European Research Council (ERC) UZH IfI UZH Space Hub University of Zurich Swiss Robotics UZH Science UZHai AUTOASSESS IEEE Transactions on Robotics (T-RO) #DVS Prophesee

Davide Scaramuzza

50,131 views • 4 months ago

We are thrilled to share our breakthrough research on "Agile Flight from Pixels without State Estimation," to be presented and live-demonstrated at #RSS2024 next week! You heard well: no state estimation means no explicit visual localization, no SLAM, no VIO, and no IMU! Paper: Video (Narrated): Last year, we demonstrated that #ReinforcementLearning (RL) policies could outperform world-champion drone-racing pilots using the same quadrotor hardware; however, unlike human pilots, these policies continuously estimated an explicit state from known gate positions, the camera feed, and inertial measurements (IMU). In this new work, we tackle the challenge of learning vision-based drone racing using an end-to-end reinforcement learning approach that eliminates the need for IMU data or explicit state estimation. Like professional pilots, we go directly from images to control commands. The training is facilitated by an asymmetric actor-critic with access to privileged information. To overcome the computational complexity during image-based RL training, we use an appropriate sensor representation, which can be efficiently simulated during training without rendering images. We achieve agile flight at speeds up to 40 km/h with accelerations up to 2 g's. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments. Besides the paper presentation, we will also give a live demo next Tuesday and Wednesday between and hrs at TU Delft: Reference: Ismail Geles*, Leonard Bauersfeld*, Angel Romero, Jiaxu Xing, Davide Scaramuzza "Demonstrating Agile Flight from Pixels without State Estimation" Robotics: Science and Systems (RSS), 2024. Kudos to Ismail Geles Leonard Bauersfeld Ángel Romero Jiaxu Xing! University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

27,891 views • 2 years ago

Today, we’re introducing KinetIQ, our own AI framework for end-to-end orchestration of humanoid robot fleets. One system, multiple robot embodiments. Industrial, service and home environments coordinated in real time. The framework consists of 4 cognitive layers: from high-level task allocation and workflow optimisation down to VLA-based task execution and RL-trained whole-body control. Watch how KinetIQ runs both our wheeled and bipedal robots. Read more on our blog:

Humanoid

23,887 views • 4 months ago

Can an inexpensive, off-the-shelf IMU be the only sensor to estimate the full state (position, velocity, orientation) of a quadrotor flying through a track at high speed and even be on par with vision-based localization? The answer is yes, within certain limitations! In this #RAL2023 paper, we propose a learning-based odometry algorithm that couples a model-based filter driven by the inertial measurements with a learning-based module with access to the control commands. Our system outperforms by a large margin the state-of-the-art visual-inertial odometry (#VIO) algorithms and the state-of-the-art learned-inertial odometry algorithm, #TLIO, for the task of drone racing. Additionally, we show that our system is as accurate as a VIO algorithm that uses a camera to localize to a known map of the racing track. The main limitation of our approach is that it cannot generalize to trajectories that have not been seen at training time. However, in drone racing competitions, the track is known beforehand. Human pilots spend hours or even days of practice on the race track before the competition. Similarly, our system can be trained with the data collected during practice time and deployed during the competition. Future work will investigate how to generalize to trajectories not seen at training time. The code is released! Paper: Video: Code: Kudos to Giovanni Cioffi Leonard Bauersfeld Elia Kaufmann European Research Council (ERC) University of Zurich UZH Science UZH Space Hub NCCR Robotics Aerial Core #RAL2023 #IROS2023 #SLAM

Davide Scaramuzza

37,061 views • 2 years ago

Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack Achieving state-of-the-art performance for its size across math, code and reasoning Built using the same tools we put in your hands, from environments & evals, RL frameworks, sandboxes & more

Prime Intellect

1,137,660 views • 7 months ago

Glad that our work “Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling”, led by Han Qi, has been accepted to IEEE Robotics and Automation Letters! 🎉 We propose Generative Predictive Control (GPC): sample action proposals from a pretrained diffusion policy (“look back”), roll them out with a diffusion-based action-conditioned video world model (“look forward”), then rank or optimize the actions using either a learned reward model or VLM preferences. Conceptually, this is trajectory optimization / MPC with hybrid sampling + gradient optimization, interpreted through modern diffusion priors and video world models. Interestingly, we first posted the paper on arXiv in Feb 2025, when action-conditioned video world models for planning were still rare—now this direction is rapidly gaining traction. Still many open questions, e.g., • how to avoid local minima in planning • what representations work best for world models • how to balance physics priors vs. data-driven learning Paper:
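
The sample, roll out, and rank loop described in the post can be illustrated with a minimal sketch. Everything here is a toy stand-in and an assumption, not the authors' implementation: `sample_proposals` replaces the pretrained diffusion policy with Gaussian noise around a mean action, `rollout` replaces the video world model with a 1-D integrator, and `reward` replaces the learned reward model with negative distance to a goal. Only the ranking variant is shown; the gradient-based optimization step is omitted.

```python
import random


def sample_proposals(policy_mean, num_proposals, horizon, noise, rng):
    """Hypothetical stand-in for a pretrained policy: noisy action sequences."""
    return [
        [policy_mean + rng.gauss(0.0, noise) for _ in range(horizon)]
        for _ in range(num_proposals)
    ]


def rollout(state, actions):
    """Hypothetical stand-in for a world model: x_{t+1} = x_t + a_t."""
    for action in actions:
        state = state + action
    return state


def reward(final_state, goal):
    """Hypothetical stand-in for a reward model: closer to the goal is better."""
    return -abs(final_state - goal)


def rank_proposals(state, goal, proposals):
    """Score every proposal by its predicted outcome, best first."""
    scored = [(reward(rollout(state, p), goal), p) for p in proposals]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


rng = random.Random(0)  # fixed seed so the sketch is reproducible
proposals = sample_proposals(policy_mean=0.1, num_proposals=16,
                             horizon=5, noise=0.2, rng=rng)
ranked = rank_proposals(state=0.0, goal=1.0, proposals=proposals)
best_reward, best_actions = ranked[0]
```

In a receding-horizon setting, only the first action of `best_actions` would typically be executed before sampling and ranking again from the new state.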

Heng Yang

18,994 views • 3 months ago

We are excited to share our #CORL2024 paper on learning quadrotor obstacle avoidance from the visual stream of a single #eventcamera! Trained entirely in simulation! We demonstrate obstacle avoidance both in the dark and in a forest at up to 5 m/s. PDF: Video: Project page: Event cameras are sensors that output per-pixel intensity changes with microsecond latency; they feature nearly zero motion blur and high dynamic range, but they produce a very large volume of events under significant ego-motion and lack a high-fidelity continuous-time sensor model in simulation, which makes direct #sim2real transfer impossible. By leveraging depth prediction as a pretext task, we pre-train a reactive obstacle avoidance policy with “approximated” simulated events and then fine-tune the perception component with limited events-and-depth real-world data. This technique bridges the sim2real gap for #eventcameras! Since there is currently no continuous-time sensor model for event cameras, we hope that this work can spur future research that leverages simulation to train event-vision-based policies and create faster, more agile robots! Kudos to Anish Bhattacharya, @marcocannic, Vijay Kumar, Nikolai Matni UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) GRASP Laboratory Penn Engineering

Davide Scaramuzza

17,219 views • 1 year ago

New release from Meta FAIR — Meta Motivo is a first-of-its-kind behavioral foundation model for controlling virtual physics-based humanoid agents for a wide range of complex whole-body tasks. The model is capable of expressing human-like behaviors and achieves performance competitive with task-specific methods and outperforms state-of-the-art unsupervised RL and model-based baselines. Try the demo ➡️ Get the model and code ➡️ We’re excited about how this research could pave the way for fully embodied agents, leading to more lifelike NPCs, democratization of character animation and new types of immersive experiences.

AI at Meta

129,055 views • 1 year ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency paper page: With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Because audio signals offer only limited control over human motion, existing methods often add auxiliary spatial signals to stabilize movements, which may compromise the naturalness and freedom of motion. In this paper, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and improve the correlation between audio and portrait movement. This method removes the need for the manually specified spatial motion templates that existing methods use to constrain motion during inference. Extensive experiments show that Loopy outperforms recent audio-driven portrait diffusion models, delivering more lifelike and high-quality results across various scenarios.

AK

128,803 views • 1 year ago

New research: Quadruped robots climbing ladders! 🦾🚀 Using RL-based control and a hooked end-effector, we're expanding robot capabilities in industrial environments. More info in the paper: Video:

Robotic Systems Lab

16,390 views • 1 year ago

Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing discuss: We present a novel approach to synthesize dexterous motions for physically simulated hands in tasks that require coordination between the control of two hands with high temporal precision. Instead of directly learning a joint policy to control two hands, our approach performs bimanual control through cooperative learning where each hand is treated as an individual agent. The individual policies for each hand are first trained separately, and then synchronized through latent space manipulation in a centralized environment to serve as a joint policy for two-hand control. By doing so, we avoid directly performing policy learning in the joint state-action space of two hands with higher dimensions, greatly improving the overall training efficiency. We demonstrate the effectiveness of our proposed approach in the challenging guitar-playing task. The virtual guitarist trained by our approach can synthesize motions from unstructured reference data of general guitar-playing practice motions, and accurately play diverse rhythms with complex chord pressing and string picking patterns based on the input guitar tabs that do not exist in the references. Along with this paper, we provide the motion capture data that we collected as the reference for policy training.

AK

26,855 views • 1 year ago

In case you missed it, we recently launched "Post-training of LLMs," a short course where you'll: ✅ Understand when and why to use post-training methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning. ✅ Learn the concepts underlying the three post-training methods of SFT, DPO, and Online RL, their common use-cases, and how to curate high-quality data to effectively train a model using each method. ✅ Download a pre-trained model and implement post-training pipelines to turn a base model into an instruct model, change the identity of a chat assistant, and improve a model’s math capabilities. Learn more and enroll for free:

DeepLearning.AI

16,771 views • 11 months ago