Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance as our previous model-free deep RL, but with superior generalizability! Code released!

PDF: Code:... Full Video:

Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified.

We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration.

We conduct a comprehensive study that highlights AC-MPC’s key advantages:
- Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics
- Improved sample efficiency
- A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay

We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance while retaining the predictive structure of MPC for more reliable out-of-distribution behavior.

Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight, IEEE Transactions on Robotics (T-RO), 2025
PDF: Full Video: Code:

Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

18,362 subscribers

27,090 views • 6 months ago • via X (Twitter)

#OptimalControl #ReinforcementLearning

Related videos

Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

34,889 views • 2 years ago

Check out our #RSS2024 paper "#MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints." Model Predictive Contouring Control (MPCC) has shown promising results for agile robotics applications, including car and drone racing. Existing approaches struggle to introduce safety considerations, often resulting in crashes. What does it take to drive or fly fast and safe? We enhance our former MPCC by incorporating spatial constraints that reliably prevent obstacle collisions, allowing planning the fastest trajectory within these safety limits. To improve performance, we leverage real-world data to refine the dynamic model. Our approach is the first to achieve a 100% success rate in real-world experiments. This safety benefit comes without compromising performance, as our method achieves lap times comparable to the best-performing state-based #ReinforcementLearning (RL) policies. Reference M. Krinner, A. Romero, L. Bauersfeld, M. Zeilinger, A. Carron, D. Scaramuzza, "MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints" Robotics, Science and Systems, 2024 PDF: Video: Kudos to Maria Krinner, Angel Romero Aguilar, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron! Ángel Romero Leonard Bauersfeld University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) #MPC #ModelPredictiveControl

Davide Scaramuzza

17,903 views • 2 years ago

We are excited to share that our paper “Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe” has been published in the IEEE Transactions on Robotics! One year in the making! PDF: Video: We focus on autonomous quadrotor flight in narrow pipes, where self-induced, unsteady airflow can significantly affect stability and control. Our approach closes the loop using real-time flow field measurements: an #EventCamera-based smoke velocimetry method provides low-latency airflow estimates, which are used by a learning-based disturbance estimator and integrated into a reinforcement-learning controller. The result is improved hovering and lateral translation performance, helping make flight in confined spaces more stable and safer! Kudos to Leonard Bauersfeld! Reference: Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe, IEEE Transactions on Robotics (T-RO), 2026 European Research Council (ERC) UZH IfI UZH Space Hub University of Zurich Swiss Robotics UZH Science UZHai AUTOASSESS IEEE Transactions on Robotics (T-RO) #DVS Prophesee

Davide Scaramuzza

50,192 views • 5 months ago

We are excited to share our latest work, "Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation", where a policy learns to adapt in the real world to unknown disturbances within 5 seconds, both with and without explicit state estimation, directly from visual features. Code released! PDF: Project Page: Starting from a simple analytical dynamics model, the system continuously learns residual dynamics from real-world data and embeds the refined model into a differentiable simulator. This enables fast, gradient-based policy updates that are far more sample-efficient than classical #ReinforcementLearning. We demonstrate rapid adaptation in <5 seconds in agile quadrotor control under challenging conditions, including added payloads, wind disturbances, and large sim-to-real gaps. In real-world experiments, our method reduces hovering error by up to 81% compared to L1-MPC and 55% compared to PPO-based adaptive methods. It also operates directly from visual features without explicit state estimation. Reference: “Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation” IEEE Robotics and Automation Letters, 2026 PDF: Video: Code: Website: Kudos to Michael Pan, Jiaxu Xing, Rudolf Reiter, Yifan Zhai, Elie Aljalbout! UZH Space Hub UZH IfI European Research Council (ERC) AUTOASSESS UZH Science University of Zurich
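The residual-learning step described in this post can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the authors' method: it uses a hypothetical 1-D velocity model, a hypothetical linear drag term as the unmodeled effect, and a closed-form least-squares fit in place of a learned residual network. It does not include the differentiable simulator or the policy update.

```python
def analytical_step(v, u, dt):
    """Nominal (hypothetical) model: velocity integrates the commanded acceleration."""
    return v + dt * u


def true_step(v, u, dt, drag=0.5):
    """Stand-in for the real system: adds linear drag that the nominal model ignores."""
    return v + dt * (u - drag * v)


def fit_residual(samples, dt):
    """Least-squares fit of a linear residual acceleration r(v) = k * v.

    Each sample is (v, u, v_next). The residual target is the acceleration
    that the analytical model fails to explain.
    """
    num = 0.0
    den = 0.0
    for v, u, v_next in samples:
        unexplained = (v_next - analytical_step(v, u, dt)) / dt
        num += unexplained * v
        den += v * v
    return num / den


def refined_step(v, u, dt, k):
    """Analytical model plus the fitted residual term."""
    return analytical_step(v, u, dt) + dt * k * v


if __name__ == "__main__":
    dt = 0.01
    data = [(v, 1.0, true_step(v, 1.0, dt)) for v in (0.5, 1.0, 1.5, 2.0)]
    k = fit_residual(data, dt)
    print("fitted residual coefficient:", k)
```

In this toy setting the fitted coefficient recovers the negative of the drag constant, so `refined_step` closely tracks `true_step`; a real system would need a richer residual model and noisy data handling.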

Davide Scaramuzza

19,184 views • 6 months ago

Check out our #ICRA2024 paper "Contrastive Initial State Buffer for Reinforcement Learning," which tackles the sample inefficiency in #ReinforcementLearning head-on. Code released! We introduce an approach agnostic to the underlying RL algorithm: the Contrastive Initial State Buffer. This tool strategically selects states from past experiences and uses them to initialize the agent in the environment to guide it toward more informative states. Our experiments on drone racing and legged locomotion show that our method achieves higher task performance while also speeding up training convergence. Reference: Nico Messikommer, Yunlong Song, Davide Scaramuzza Contrastive Initial State Buffer for Reinforcement Learning IEEE International Conference on Robotics and Automation (ICRA), 2024. PDF: Code: Video: Kudos to Messikommer Yunlong Song Aerial Core European Research Council (ERC) University of Zurich UZH Space Hub IEEE ICRA UZH Science

Davide Scaramuzza

13,846 views • 2 years ago

We are excited to share our #CORL2024 paper (oral) on "Learning Quadruped Locomotion Using Differentiable Simulation" done in collaboration with Sangbae Kim Massachusetts Institute of Technology (MIT). We present a new way to learn to walk in minutes without parallelization, outperforming PPO in sample efficiency! PDF: Video: We present a new framework for learning quadruped locomotion. By leveraging differentiable simulation for policy optimization, our approach achieves fast convergence and stable training, significantly outperforming model-free #ReinforcementLearning methods like PPO in sample efficiency. The key enabler is to combine a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. This work highlights one of the first successful real-world applications of differentiable simulation for quadruped robots, offering a compelling alternative to traditional RL methods. Kudos to Yunlong Song! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) Massachusetts Institute of Technology (MIT)MechE

Davide Scaramuzza

15,533 views • 1 year ago

We are excited to share our #ICRA2026 paper "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight"! Paper: Video: Can we use Model-Based #ReinforcementLearning (MBRL) to fly a drone from pixels to commands? In this work, we train quadrotor navigation policies from scratch using #WorldModels, mapping raw onboard camera pixels directly to control commands, much like a human pilot! While model-free methods like PPO are sample-inefficient and struggle in this setting, we leverage #MBRL to train visuomotor policies capable of agile flight through a racetrack using only raw pixel observations, no explicit state estimation needed. A key finding: because our policies are trained end-to-end directly from pixels, we no longer need the perception-aware reward term used in previous methods. Instead, this behavior emerges naturally! The policies learn to guide the camera toward feature-rich areas of the observation space on their own. Kudos to Ángel Romero Ashwin Shenai Ismail Geles Elie Aljalbout Reference: "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight" Angel Romero*, Ashwin Shenai*, Ismail Geles, Elie Aljalbout, Davide Scaramuzza IEEE International Conference on Robotics and Automation (ICRA), Vienna, 2026. European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub Swiss Robotics NCCR Robotics

Davide Scaramuzza

15,965 views • 3 months ago

We are thrilled to share our breakthrough research on "Agile Flight from Pixels without State Estimation," to be presented and live-demonstrated at #RSS2024 next week! You heard that right: no state estimation means no explicit visual localization, no SLAM, no VIO, and no IMU! Paper: Video (Narrated): Last year, we demonstrated that #ReinforcementLearning (RL) policies could outperform world-champion drone-racing pilots using the same quadrotor hardware; however, unlike human pilots, these policies continuously estimated an explicit state from known gate positions, the camera feed, and inertial measurements (IMU). In this new work, we tackle the challenge of learning vision-based drone racing using an end-to-end reinforcement learning approach that eliminates the need for IMU data or explicit state estimation. Like professional pilots, we go directly from images to control commands. The training is facilitated by an asymmetric actor-critic with access to privileged information. To overcome the computational complexity of image-based RL training, we use an appropriate sensor representation, which can be efficiently simulated during training without rendering images. We achieve agile flight at speeds up to 40 km/h with accelerations up to 2 g. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments. Besides the paper presentation, we will also give a live demo next Tuesday and Wednesday between and hrs at TU Delft: Reference: Ismail Geles*, Leonard Bauersfeld*, Angel Romero, Jiaxu Xing, Davide Scaramuzza "Demonstrating Agile Flight from Pixels without State Estimation" Robotics: Science and Systems (RSS), 2024. Kudos to Ismail Geles, Leonard Bauersfeld, Ángel Romero, Jiaxu Xing! University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

27,917 views • 2 years ago

Glad that our work “Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling”, led by Han Qi, has been accepted to IEEE Robotics and Automation Letters! 🎉 We propose Generative Predictive Control (GPC): sample action proposals from a pretrained diffusion policy (“look back”), roll them out with a diffusion-based action-conditioned video world model (“look forward”), then rank or optimize the actions using either a learned reward model or VLM preferences. Conceptually, this is trajectory optimization / MPC with hybrid sampling + gradient optimization, interpreted through modern diffusion priors and video world models. Interestingly, we first posted the paper on arXiv in Feb 2025, when action-conditioned video world models for planning were still rare—now this direction is rapidly gaining traction. Still many open questions, e.g., • how to avoid local minima in planning • what representations work best for world models • how to balance physics priors vs. data-driven learning Paper:
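The sample, roll out, and rank loop described in this post can be sketched in a few lines. Everything below is a hypothetical stand-in chosen for illustration: random action sequences replace the pretrained diffusion policy, a 1-D integrator replaces the video world model, and distance to a goal replaces the learned reward model or VLM preferences. It shows only the ranking variant, not the gradient-based optimization.

```python
import random


def sample_proposals(rng, num_proposals, horizon):
    """Stand-in for the policy ("look back"): random action sequences in [-1, 1]."""
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(num_proposals)
    ]


def rollout(state, actions):
    """Stand-in for the world model ("look forward"): x_{t+1} = x_t + a_t."""
    trajectory = []
    for action in actions:
        state = state + action
        trajectory.append(state)
    return trajectory


def reward(trajectory, goal):
    """Stand-in for the reward model: negative distance of the final state to the goal."""
    return -abs(trajectory[-1] - goal)


def plan(state, goal, num_proposals=64, horizon=5, seed=0):
    """Score every proposal under the model and return the best one with its reward."""
    rng = random.Random(seed)
    proposals = sample_proposals(rng, num_proposals, horizon)
    scored = [(reward(rollout(state, p), goal), p) for p in proposals]
    best_reward, best_actions = max(scored, key=lambda item: item[0])
    return best_actions, best_reward


if __name__ == "__main__":
    actions, score = plan(state=0.0, goal=2.0)
    print("best action sequence:", actions)
    print("predicted reward:", score)
```

In a receding-horizon setting, only the first action of the best sequence would typically be executed before replanning from the new observation.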

Heng Yang

18,994 views • 4 months ago

We are excited to share our work “Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones,” published in IEEE Transactions on Robotics (T-RO), which tackles sharp radiance field reconstruction under agile drone motion, where RGB frames are heavily motion-blurred and pose priors become unreliable! 4 years in the making! Code & dataset released! PDF: Code & Dataset: Full Narrated Video: High-speed flight is essential for time- and battery-constrained missions (e.g., inspection, exploration, search & rescue). However, fast motion corrupts visual data with severe motion blur and introduces drift/noise in visual-inertial odometry, making NeRF-based 3D reconstruction particularly brittle. We propose a unified framework that leverages asynchronous #EventCamera streams together with motion-blurred frames to reconstruct high-fidelity radiance fields from agile drone flights. Our key idea is to embed event-image fusion directly into radiance field optimization while jointly refining a shared, continuous-time camera trajectory initialized from event-based VIO. This enables us to recover sharp radiance fields and accurate trajectories without ground-truth supervision during training. We validate our method on synthetic data and on real sequences captured by a drone flying up to 2 m/s. Despite severe blur and noisy pose priors, our method preserves fine scene details and achieves a performance gain of over 50% on real-world data compared to state-of-the-art methods. Kudos to Rong Zou and Marco Cannici! Reference: Rong Zou*, Marco Cannici*, Davide Scaramuzza Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones IEEE Transactions on Robotics (T-RO), 2026 NCCR Robotics European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub

Davide Scaramuzza

11,989 views • 4 months ago

We are excited to share our #CORL2024 paper on learning quadrotor obstacle avoidance from the visual stream of a single #eventcamera! Trained entirely in simulation! We demonstrate obstacle avoidance both in the dark and in a forest at up to 5 m/s. PDF: Video: Project page: Event cameras are sensors that output per-pixel intensity changes with microsecond latency and resolution; they feature nearly zero motion blur and high dynamic range but produce a very large volume of events under significant ego-motion and lack a high-fidelity continuous-time sensor model in simulation, making direct #sim2real transfer infeasible. By leveraging depth prediction as a pretext task, we pre-train a reactive obstacle avoidance policy with “approximated” simulated events and then fine-tune the perception component with limited events-and-depth real-world data. This technique bridges the sim2real gap for #eventcameras! Since there is currently no continuous-time sensor model for event cameras, we hope that this work can spur future research leveraging simulation for training event-vision-based policies to create faster, more agile robots! Kudos to Anish Bhattacharya, @marcocannic, Vijay Kumar, Nikolai Matni UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) GRASP Laboratory Penn Engineering

Davide Scaramuzza

17,219 views • 1 year ago

Can an inexpensive, off-the-shelf IMU be the only sensor used to estimate the full state (position, velocity, orientation) of a quadrotor flying through a track at high speed, and even be on par with vision-based localization? The answer is yes, within certain limitations! In this #RAL2023 paper, we propose a learning-based odometry algorithm that couples a model-based filter driven by the inertial measurements with a learning-based module that has access to the control commands. For the task of drone racing, our system outperforms the state-of-the-art visual-inertial odometry (#VIO) algorithms and the state-of-the-art learned-inertial odometry algorithm, #TLIO, by a large margin. Additionally, we show that our system is as accurate as a VIO algorithm that uses a camera to localize to a known map of the racing track. The main limitation of our approach is that it cannot generalize to trajectories that have not been seen at training time. However, in drone racing competitions, the track is known beforehand. Human pilots spend hours or even days practicing on the race track before the competition. Similarly, our system can be trained with the data collected during practice time and deployed during the competition. Future work will investigate how to generalize to trajectories not seen at training time. The code is released! Paper: Video: Code: Kudos to Giovanni Cioffi, Leonard Bauersfeld, Elia Kaufmann European Research Council (ERC) University of Zurich UZH Science UZH Space Hub NCCR Robotics Aerial Core #RAL2023 #IROS2023 #SLAM

Davide Scaramuzza

37,061 views • 2 years ago

Physics-based Motion Retargeting from Sparse Inputs paper page: Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.

AK

106,527 views • 3 years ago

The DeepSeek-R1 paper is a gem! I highly encourage everyone to read it. It's clear that LLM reasoning capabilities can be learned in different ways. RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties. There is more to RL than meets the eye! Here is my breakdown of the paper along with a few tests: The multi-stage training might not make sense initially, but it provides clues about optimizations that we can continue to tap into. Data quality is still very important for enhancing the usability of the LLM. Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open, so we can build on top of it. This opens up exciting research opportunities. About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math.

elvis

140,692 views • 1 year ago

Introducing VL-JEPA: Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks with better performance and higher efficiency than large VLMs. • VL-JEPA is the first non-generative model that can perform general-domain vision-language tasks in real time, built on a joint embedding predictive architecture. • We demonstrate in controlled experiments that VL-JEPA, trained with latent-space embedding prediction, outperforms VLMs that rely on data-space token prediction. • We show that VL-JEPA delivers significant efficiency gains over VLMs for online video streaming applications, thanks to its non-autoregressive design and native support for selective decoding. • We highlight that our VL-JEPA model, with a unified model architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time. by Delong Chen (陈德龙), Mustafa Shukor, Théo Moutakanni, Willy Jade, Lei Yu, Tejaswi Kasarla, Allen Bolourchi, Yann LeCun, Pascale Fung

Pascale Fung

90,144 views • 7 months ago

Agile Continuous Jumping in Discontinuous Terrains discuss: We focus on agile, continuous, and terrain-adaptive jumping of quadrupedal robots in discontinuous terrains such as stairs and stepping stones. Unlike single-step jumping, continuous jumping requires accurately executing highly dynamic motions over long horizons, which is challenging for existing approaches. To accomplish this task, we design a hierarchical learning and control framework, which consists of a learned heightmap predictor for robust terrain perception, a reinforcement-learning-based centroidal-level motion policy for versatile and terrain-adaptive planning, and a low-level model-based leg controller for accurate motion tracking. In addition, we minimize the sim-to-real gap by accurately modeling the hardware characteristics. Our framework enables a Unitree Go1 robot to perform agile and continuous jumps on human-sized stairs and sparse stepping stones, for the first time to the best of our knowledge. In particular, the robot can cross two stair steps in each jump and completes a 3.5m long, 2.8m high, 14-step staircase in 4.5 seconds. Moreover, the same policy outperforms baselines in various other parkour tasks, such as jumping over single horizontal or vertical discontinuities.

AK

35,794 views • 1 year ago

Israel-based Mentee Robotics has demonstrated a logistics workflow: two MenteeBot V3 humanoids work autonomously to pick and place totes. A Modular Agent System is preferred because it favors real-world robustness and lower compute needs over the End-to-End VLA model. Its architecture is composed of three components: - LLM Planner: Converts instructions into executable Robotic API Language code for reliable task decomposition and error handling. - Perception Stack: Uses pre-trained models (NeRF/3DGS, distilled vision) for scene understanding and navigation. - Control Policies: Reinforcement Learning (RL) models, trained at scale via Sim2Real, generate motor commands, enabling high-accuracy mobile manipulation. Crucially, the robot learns new tasks from a single demonstration in hours. Object tracking uses 3D geometry (STL/URDF) tracked in the video to define the RL reward function. Training is optimized using 'Automatic Curriculum Learning', which autonomously adjusts task difficulty based on robot performance, eliminating manual engineering. All computation runs onboard.

The Humanoid Hub

15,729 views • 8 months ago

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives paper page: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of primitives. We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points, while providing amodal shape completions of unseen object regions. We compare our approach to the state of the art on diverse scenes from DTU, and demonstrate its robustness on real-life captures from BlendedMVS and Nerfstudio. We also showcase how our results can be used to effortlessly edit a scene or perform physical simulations.

AK

38,571 views • 3 years ago

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model with Gradio demo local demo: This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. Despite achieving reasonable results, these approaches face challenges in maintaining temporal consistency throughout the animation due to the lack of temporal modeling and poor preservation of reference identity. In this work, we introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity. To achieve this, we first develop a video diffusion model to encode temporal information. Second, to maintain the appearance coherence across frames, we introduce a novel appearance encoder to retain the intricate details of the reference image. Leveraging these two innovations, we further employ a simple video fusion technique to encourage smooth transitions for long video animation. Empirical results demonstrate the superiority of our method over baseline approaches on two benchmarks. Notably, our approach outperforms the strongest baseline by over 38% in terms of video fidelity on the challenging TikTok dancing dataset. Code and model will be made available.

AK

810,578 views • 2 years ago