Check out our #ICRA2024 paper "Contrastive Initial State Buffer for Reinforcement Learning," which tackles sample inefficiency in #ReinforcementLearning head-on. Code released! We introduce the Contrastive Initial State Buffer, an approach that is agnostic to the underlying RL algorithm. It strategically selects states from past experience and uses them to initialize the agent in the environment, guiding it toward more informative states. Our experiments on drone racing and legged locomotion show that our method achieves higher task performance while also speeding up training convergence. Reference: Nico Messikommer, Yunlong Song, Davide Scaramuzza, "Contrastive Initial State Buffer for Reinforcement Learning," IEEE International Conference on Robotics and Automation (ICRA), 2024. PDF: Code: Video: Kudos to Nico Messikommer and Yunlong Song! Aerial Core European Research Council (ERC) University of Zurich UZH Space Hub IEEE ICRA UZH Science
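To make the idea of "strategically selecting states from past experience" concrete, here is a minimal sketch of picking a diverse set of initial states from a buffer. It assumes each visited state already has an embedding vector (in the paper this role is played by a contrastively learned representation) and uses greedy farthest-point sampling as a stand-in selection rule; `farthest_point_selection` and the toy embeddings are hypothetical and not the paper's actual algorithm.

```python
import math
from typing import Sequence


def farthest_point_selection(embeddings: Sequence[Sequence[float]], k: int, start: int = 0) -> list[int]:
    """Greedily pick k indices whose embeddings are maximally spread out.

    Stand-in selection rule (assumption): each new pick is the state whose
    distance to its nearest already-selected state is largest.
    """
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of embeddings")
    selected = [start]
    # Distance from every candidate to its closest selected embedding so far.
    nearest = [math.dist(e, embeddings[start]) for e in embeddings]
    while len(selected) < k:
        # max() returns the first index on ties, which keeps the result deterministic.
        nxt = max(range(len(embeddings)), key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, math.dist(e, embeddings[nxt])) for d, e in zip(nearest, embeddings)]
    return selected


# Hypothetical 2-D embeddings of visited states; state 1 is a near-duplicate of state 0.
visited = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
print(farthest_point_selection(visited, k=3))  # [0, 3, 2]
```

The near-duplicate state is skipped, so episodes would be reset into regions that are genuinely different from one another, which is the intuition behind guiding the agent toward more informative states.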

Davide Scaramuzza

18,213 subscribers

13,846 views • 2 years ago • via X (Twitter)

#ICRA2024 #ReinforcementLearning

1 comment

Stone Tao • 2 years ago

Interesting work! We recently also explored the angle of modifying the initial state distribution but in a learning from demos context: Same outcome: better initial state distribution in sim = far more sample efficient

Related videos

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning: the approach matches the super-human performance of our previous model-free deep RL policy while generalizing better! Code released! PDF: Code: Full Video: Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. Actor-Critic Model Predictive Control (AC-MPC) is a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration. We conduct a comprehensive study that highlights AC-MPC’s key advantages:
- Better out-of-distribution generalization, both against unknown disturbances and against changes in the quadrotor dynamics
- Improved sample efficiency
- A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay
We validate our method in simulation and in the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance while retaining the predictive structure of MPC for more reliable out-of-distribution behavior.
Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight IEEE Transactions on Robotics (T-RO), 2025 PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai
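Why does embedding an MPC inside an actor require it to be *differentiable*? The RL gradient must flow through the optimizer's solution into the MPC's learnable parameters. A minimal sketch, assuming a toy scalar linear system x' = a·x + b·u, a one-step horizon, and a single learnable cost weight `q` (the paper uses a longer horizon, a quadrotor model, and a general differentiable solver, none of which are reproduced here). For this toy problem the optimal action has a closed form, so its derivative with respect to `q` can be written analytically and checked numerically; all function names are hypothetical.

```python
def mpc_action(x: float, q: float, r: float, a: float, b: float) -> float:
    """One-step MPC for x' = a*x + b*u: minimize q*(a*x + b*u)**2 + r*u**2 over u."""
    # Setting d/du [q*(a*x + b*u)**2 + r*u**2] = 2*q*b*(a*x + b*u) + 2*r*u to zero.
    return -q * a * b * x / (q * b * b + r)


def mpc_action_grad_q(x: float, q: float, r: float, a: float, b: float) -> float:
    """Analytic d(u*)/dq: how the MPC action moves when the learned cost weight q moves."""
    return -a * b * x * r / (q * b * b + r) ** 2


def mpc_cost(u: float, x: float, q: float, r: float, a: float, b: float) -> float:
    """The objective the toy MPC minimizes."""
    return q * (a * x + b * u) ** 2 + r * u * u


# Check the analytic gradient against central finite differences.
x, q, r, a, b = 2.0, 5.0, 0.01, 1.0, 0.1
h = 1e-6
fd = (mpc_action(x, q + h, r, a, b) - mpc_action(x, q - h, r, a, b)) / (2 * h)
print(fd, mpc_action_grad_q(x, q, r, a, b))
```

In an actor-critic setting, a policy-gradient signal on the action u would be multiplied by this derivative to update `q`; general differentiable MPC layers obtain the same kind of derivative by differentiating the optimality conditions instead of a closed form.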

Davide Scaramuzza

27,090 views • 6 months ago

We are excited to share that our paper “Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe” has been published in the IEEE Transactions on Robotics! One year in the making! PDF: Video: We focus on autonomous quadrotor flight in narrow pipes, where self-induced, unsteady airflow can significantly affect stability and control. Our approach closes the loop using real-time flow-field measurements: an #EventCamera-based smoke velocimetry method provides low-latency airflow estimates, which feed a learning-based disturbance estimator integrated into a reinforcement-learning controller. The result is improved hovering and lateral-translation performance, making flight in confined spaces more stable and safer! Kudos to Leonard Bauersfeld! Reference: Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe, IEEE Transactions on Robotics (T-RO), 2026. European Research Council (ERC) UZH IfI UZH Space Hub University of Zurich Swiss Robotics UZH Science UZHai AUTOASSESS IEEE Transactions on Robotics (T-RO) #DVS Prophesee
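Velocimetry means recovering flow velocity from the motion of tracers (here, smoke) between observations. As a hedged illustration of that principle only, the sketch below uses classic frame-based particle image velocimetry (PIV) cross-correlation on two 1-D intensity profiles; this is a swapped-in textbook technique, not the paper's event-camera pipeline, and the function names, frames, time step, and pixel size are all hypothetical.

```python
def best_shift(frame_a: list[float], frame_b: list[float], max_shift: int) -> int:
    """Integer displacement s maximizing sum_i frame_a[i] * frame_b[i + s]."""
    n = len(frame_a)
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Cross-correlation restricted to the overlapping part of the two profiles.
        score = sum(frame_a[i] * frame_b[i + s] for i in range(n) if 0 <= i + s < n)
        if score > best_score:
            best, best_score = s, score
    return best


def flow_velocity(frame_a, frame_b, dt_s: float, pixel_size_m: float, max_shift: int = 5) -> float:
    """Convert the best pixel shift between two observations into a velocity in m/s."""
    return best_shift(frame_a, frame_b, max_shift) * pixel_size_m / dt_s


# Hypothetical smoke intensity profiles: the blob moves 3 pixels to the right.
frame_a = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
frame_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
print(best_shift(frame_a, frame_b, max_shift=5))  # 3
```

The latency of such an estimate is bounded by the time between observations; an event camera reports brightness changes asynchronously, which is the motivation the post gives for low-latency airflow estimates.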

Davide Scaramuzza

50,192 views • 5 months ago

We are excited to share our #ICRA2026 paper "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight"! Paper: Video: Can we use model-based #ReinforcementLearning (MBRL) to fly a drone from pixels to commands? In this work, we train quadrotor navigation policies from scratch using #WorldModels, mapping raw onboard camera pixels directly to control commands, much like a human pilot! While model-free methods like PPO are sample-inefficient and struggle in this setting, we leverage #MBRL to train visuomotor policies capable of agile flight through a racetrack using only raw pixel observations, with no explicit state estimation. A key finding: because our policies are trained end-to-end directly from pixels, we no longer need the perception-aware reward term used in previous methods. Instead, this behavior emerges naturally: the policies learn on their own to point the camera toward feature-rich areas of the environment. Kudos to Ángel Romero, Ashwin Shenai, Ismail Geles, Elie Aljalbout! Reference: "Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight" Angel Romero*, Ashwin Shenai*, Ismail Geles, Elie Aljalbout, Davide Scaramuzza, IEEE International Conference on Robotics and Automation (ICRA), Vienna, 2026. European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub Swiss Robotics NCCR Robotics
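The core loop of model-based RL is: fit a dynamics ("world") model from collected experience, then improve the policy using rollouts imagined inside that model instead of the real environment. A deliberately tiny sketch under strong assumptions: the "world model" is a scalar linear model x' = a·x + b·u fit by least squares (the paper learns a model over camera pixels), and "policy improvement" is choosing the best gain `k` for u = -k·x from a hypothetical candidate list. The true system coefficients are made up for illustration.

```python
import random


def fit_linear_model(data):
    """Least-squares fit of x' = a*x + b*u from (x, u, x_next) tuples via 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    # Cramer's rule for [[sxx, sxu], [sxu, suu]] @ [a, b] = [sxy, suy].
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def imagined_cost(model, gain, x0=1.0, horizon=20, r=0.1):
    """Roll out u = -gain*x entirely inside the learned model; no real environment steps."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x * x + r * u * u
        x = a * x + b * u
    return cost


# Collect random-action transitions from a "real" system unknown to the learner (toy: a=0.9, b=0.5).
rng = random.Random(0)
data, x = [], 0.0
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    x_next = 0.9 * x + 0.5 * u
    data.append((x, u, x_next))
    x = x_next

model = fit_linear_model(data)
best_gain = min([0.0, 0.5, 1.0, 1.5, 2.0], key=lambda k: imagined_cost(model, k))
print(model, best_gain)
```

All policy evaluation happens in imagination, so real-world samples are spent only on improving the model; this separation is where model-based methods get their sample-efficiency advantage over model-free methods such as PPO.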

Davide Scaramuzza

15,986 views • 3 months ago

Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

34,889 views • 2 years ago

We are excited to share our #CORL2024 paper (oral) "Learning Quadruped Locomotion Using Differentiable Simulation," done in collaboration with Sangbae Kim at the Massachusetts Institute of Technology (MIT). We present a new way to learn to walk in minutes without parallelization, outperforming PPO in sample efficiency! PDF: Video: By leveraging differentiable simulation for policy optimization, our framework for learning quadruped locomotion achieves fast convergence and stable training, significantly outperforming model-free #ReinforcementLearning methods like PPO in sample efficiency. The key enabler is combining a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. Our framework learns quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, it allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. This work is one of the first successful real-world applications of differentiable simulation to quadruped robots, offering a compelling alternative to traditional RL methods. Kudos to Yunlong Song! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) Massachusetts Institute of Technology (MIT) MechE
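The "key enabler" above, a non-differentiable simulator for the forward pass paired with a simple surrogate model for gradients, can be illustrated on a toy problem. A hedged sketch, assuming a 1-D system whose "high-fidelity" simulator includes actuator saturation and is treated as a black box, a linear surrogate x' = x + dt·u used only for Jacobians, and a linear policy u = -k·x. The gradient dL/dk is accumulated with forward-mode sensitivities, evaluated along the trajectory produced by the true simulator. Everything here (names, dynamics, learning rate) is hypothetical, not the paper's quadruped setup.

```python
def true_step(x: float, u: float, dt: float = 0.1, u_max: float = 1.0) -> float:
    """'High-fidelity' simulator treated as a black box: includes actuator saturation."""
    u_applied = max(-u_max, min(u_max, u))
    return x + dt * u_applied


def rollout_loss_and_surrogate_grad(k: float, x0: float = 1.0, steps: int = 50, dt: float = 0.1):
    """Loss L = sum of x_t**2 from true_step; dL/dk from the surrogate x' = x + dt*u."""
    x, sens = x0, 0.0  # sens tracks dx_t/dk using the surrogate Jacobians
    loss, grad = 0.0, 0.0
    for _ in range(steps):
        u = -k * x
        # Surrogate Jacobians: df/dx = 1, df/du = dt; policy: du/dx = -k, du/dk = -x.
        sens = (1.0 - dt * k) * sens + dt * (-x)
        x = true_step(x, u, dt)  # the state itself always comes from the true simulator
        loss += x * x
        grad += 2.0 * x * sens
    return loss, grad


k, lr = 0.5, 0.01
initial_loss, _ = rollout_loss_and_surrogate_grad(k)
for _ in range(30):
    _, g = rollout_loss_and_surrogate_grad(k)
    k -= lr * g  # plain gradient descent on the policy gain
final_loss, _ = rollout_loss_and_surrogate_grad(k)
print(initial_loss, final_loss, k)
```

The surrogate ignores saturation, so its gradient is only approximate, yet it still points in a useful descent direction while the loss is always measured on the true dynamics.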

Davide Scaramuzza

15,533 views • 1 year ago

We are thrilled to share our breakthrough research on "Agile Flight from Pixels without State Estimation," to be presented and live-demonstrated at #RSS2024 next week! You heard right: no state estimation means no explicit visual localization, no SLAM, no VIO, and no IMU! Paper: Video (Narrated): Last year, we demonstrated that #ReinforcementLearning (RL) policies could outperform world-champion drone-racing pilots using the same quadrotor hardware; however, unlike human pilots, these policies continuously estimated an explicit state from known gate positions, the camera feed, and inertial measurements (IMU). In this new work, we tackle vision-based drone racing with an end-to-end reinforcement learning approach that eliminates the need for IMU data or explicit state estimation. Like professional pilots, we go directly from images to control commands. Training is facilitated by an asymmetric actor-critic in which the critic has access to privileged information. To reduce the computational cost of image-based RL training, we use a sensor representation that can be simulated efficiently without rendering images. We achieve agile flight at speeds of up to 40 km/h with accelerations of up to 2 g. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond it and can serve as a foundation for future research into real-world applications in structured environments. Besides the paper presentation, we will also give a live demo next Tuesday and Wednesday between and hrs at TU Delft: Reference: Ismail Geles*, Leonard Bauersfeld*, Angel Romero, Jiaxu Xing, Davide Scaramuzza, "Demonstrating Agile Flight from Pixels without State Estimation," Robotics: Science and Systems (RSS), 2024. Kudos to Ismail Geles, Leonard Bauersfeld, Ángel Romero, Jiaxu Xing! University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)
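In an asymmetric actor-critic, only the critic sees privileged simulator state; the actor sees just what will be available at deployment. The sketch below shows that split with a linear actor, a linear critic trained by TD(0), and a one-step advantage. It is a hypothetical minimal structure (names, features, and update rule are all assumptions), not the paper's training code.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    obs: list[float]              # what the actor sees at deployment (e.g., image features)
    priv_state: list[float]       # privileged simulator state, used only by the critic
    reward: float
    next_priv_state: list[float]
    done: bool


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def actor(theta: list[float], obs: list[float]) -> float:
    """Deterministic linear policy: depends on observations only, never on privileged state."""
    return dot(theta, obs)


def critic_td0_update(w: list[float], batch: list[Transition], gamma: float = 0.99, lr: float = 0.1) -> list[float]:
    """TD(0) update of a linear critic V(s) = w . s over privileged states."""
    for t in batch:
        v = dot(w, t.priv_state)
        v_next = 0.0 if t.done else dot(w, t.next_priv_state)
        td_error = t.reward + gamma * v_next - v
        w = [wi + lr * td_error * si for wi, si in zip(w, t.priv_state)]
    return w


def advantage(w: list[float], t: Transition, gamma: float = 0.99) -> float:
    """One-step advantage used to update the actor, computed by the privileged critic."""
    v_next = 0.0 if t.done else dot(w, t.next_priv_state)
    return t.reward + gamma * v_next - dot(w, t.priv_state)


# Toy usage: a single terminal transition with reward 1.
t = Transition(obs=[0.2, -0.1], priv_state=[1.0, 0.0], reward=1.0, next_priv_state=[0.0, 0.0], done=True)
w = [0.0, 0.0]
for _ in range(100):
    w = critic_td0_update(w, [t])
print(w, advantage(w, t))
```

Because the critic is discarded after training, it can use any simulator quantity that makes value estimation easier, while the deployed actor stays pixels-only.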

Davide Scaramuzza

27,917 views • 2 years ago

We are excited to share our latest work, "Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation", where a policy learns to adapt in the real world to unknown disturbances within 5 seconds, both with and without explicit state estimation, directly from visual features. Code released! PDF: Project Page: Starting from a simple analytical dynamics model, the system continuously learns residual dynamics from real-world data and embeds the refined model into a differentiable simulator. This enables fast, gradient-based policy updates that are far more sample-efficient than classical #ReinforcementLearning. We demonstrate rapid adaptation in <5 seconds in agile quadrotor control under challenging conditions, including added payloads, wind disturbances, and large sim-to-real gaps. In real-world experiments, our method reduces hovering error by up to 81% compared to L1-MPC and 55% compared to PPO-based adaptive methods. It also operates directly from visual features without explicit state estimation. Reference: “Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation” IEEE Robotics and Automation Letters, 2026 PDF: Video: Code: Website: Kudos to Michael Pan, Jiaxu Xing, Rudolf Reiter, Yifan Zhai, Elie Aljalbout! UZH Space Hub UZH IfI European Research Council (ERC) AUTOASSESS UZH Science University of Zurich
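The first step described above, learning residual dynamics on top of a simple analytical model, can be sketched in a few lines. Assumptions (all hypothetical, not the paper's model): a 1-D vertical quadrotor whose nominal model is acceleration = u - g, and a residual that is linear in vertical velocity (an unknown payload plus drag), fit by ordinary least squares. The refined model is what would then be embedded in a differentiable simulator for policy updates; that part is not shown.

```python
def fit_residual(velocities, measured_acc, nominal_acc):
    """Fit residual(v) = theta0 + theta1 * v by ordinary least squares."""
    res = [m - n for m, n in zip(measured_acc, nominal_acc)]
    n = len(res)
    v_mean = sum(velocities) / n
    r_mean = sum(res) / n
    cov = sum((v - v_mean) * (r - r_mean) for v, r in zip(velocities, res))
    var = sum((v - v_mean) ** 2 for v in velocities)
    theta1 = cov / var
    theta0 = r_mean - theta1 * v_mean
    return theta0, theta1


def refined_acc(u: float, v: float, theta, g: float = 9.81) -> float:
    """Nominal model (thrust acceleration minus gravity) plus the learned residual."""
    return (u - g) + theta[0] + theta[1] * v


# Synthetic "real-world" data: true residual is -1.5 - 0.3 * v (payload plus drag, made up).
velocities = [-2.0, -1.0, 0.0, 1.0, 2.0]
u_cmd = 9.81
nominal = [u_cmd - 9.81 for _ in velocities]
measured = [-1.5 - 0.3 * v for v in velocities]

theta = fit_residual(velocities, measured, nominal)
u_hover = 9.81 - theta[0]  # thrust that zeroes the refined acceleration at v = 0
print(theta, u_hover)
```

A residual on top of a physics prior needs far less data than learning the dynamics from scratch, which is consistent with the few-second adaptation the post reports.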

Davide Scaramuzza

19,184 views • 6 months ago

Check out our #RSS2024 paper "#MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints." Model Predictive Contouring Control (MPCC) has shown promising results for agile robotics applications, including car and drone racing. However, existing approaches struggle to incorporate safety considerations, often resulting in crashes. What does it take to drive or fly both fast and safely? We enhance our former MPCC by incorporating spatial constraints that reliably prevent obstacle collisions, allowing us to plan the fastest trajectory within these safety limits. To improve performance, we leverage real-world data to refine the dynamics model. Our approach is the first to achieve a 100% success rate in real-world experiments. This safety benefit comes without compromising performance: our method achieves lap times comparable to the best-performing state-based #ReinforcementLearning (RL) policies. Reference: M. Krinner, A. Romero, L. Bauersfeld, M. Zeilinger, A. Carron, D. Scaramuzza, "MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints," Robotics: Science and Systems (RSS), 2024. PDF: Video: Kudos to Maria Krinner, Angel Romero Aguilar, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) #MPC #ModelPredictiveControl
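MPCC tracks a reference path rather than a timed trajectory: it decomposes the position error relative to a reference point on the path into a lag error (along the path) and a contouring error (across it), and trades these off against progress along the path. A minimal 2-D sketch of that decomposition, with a hypothetical tube-style check standing in for a spatial safety constraint; the sign conventions and the constraint form are assumptions, and the paper's 3-D formulation and constraints are not reproduced.

```python
import math


def contouring_lag_errors(p, p_ref, tangent):
    """Split the position error p - p_ref into lag (along path) and contouring (across path) parts.

    Sign convention (an assumption; papers differ): lag error is positive when the
    vehicle is ahead of the reference point, contouring error is positive to the left.
    """
    tx, ty = tangent
    norm = math.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm
    ex, ey = p[0] - p_ref[0], p[1] - p_ref[1]
    lag = ex * tx + ey * ty          # projection onto the unit tangent
    contour = -ex * ty + ey * tx     # projection onto the left normal (-ty, tx)
    return lag, contour


def inside_tube(p, p_ref, tangent, radius: float) -> bool:
    """Hypothetical spatial safety constraint: keep |contouring error| <= radius."""
    _, contour = contouring_lag_errors(p, p_ref, tangent)
    return abs(contour) <= radius


# Vehicle at (1, 1), reference point at the origin, path heading along +x.
print(contouring_lag_errors((1.0, 1.0), (0.0, 0.0), (1.0, 0.0)))  # (1.0, 1.0)
print(inside_tube((1.0, 1.0), (0.0, 0.0), (1.0, 0.0), radius=0.5))  # False
```

In an MPC, a constraint like `inside_tube` would appear as an inequality on the predicted states over the horizon, so the optimizer can still push for progress but only inside the safe region.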

Davide Scaramuzza

17,903 views • 2 years ago

Can an inexpensive, off-the-shelf IMU be the only sensor used to estimate the full state (position, velocity, orientation) of a quadrotor flying through a track at high speed, and even be on par with vision-based localization? The answer is yes, within certain limitations! In this #RAL2023 paper, we propose a learning-based odometry algorithm that couples a model-based filter driven by the inertial measurements with a learning-based module that has access to the control commands. For the task of drone racing, our system outperforms by a large margin both the state-of-the-art visual-inertial odometry (#VIO) algorithms and the state-of-the-art learned-inertial odometry algorithm, #TLIO. Additionally, we show that our system is as accurate as a VIO algorithm that uses a camera to localize against a known map of the racing track. The main limitation of our approach is that it cannot generalize to trajectories not seen at training time. However, in drone racing competitions, the track is known beforehand: human pilots spend hours or even days practicing on the race track before the competition. Similarly, our system can be trained with data collected during practice time and deployed during the competition. Future work will investigate how to generalize to trajectories not seen at training time. The code is released! Paper: Video: Code: Kudos to Giovanni Cioffi, Leonard Bauersfeld, Elia Kaufmann! European Research Council (ERC) University of Zurich UZH Science UZH Space Hub NCCR Robotics Aerial Core #RAL2023 #IROS2023 #SLAM
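To illustrate how a model-based filter driven by inertial measurements can be corrected by a learned module, here is a 1-D Kalman filter sketch. Assumptions: the state is [position, velocity], IMU acceleration drives the prediction, and the learned module is represented only by a placeholder measurement `z` of position with variance `r`. What the paper's network actually predicts and how its filter is structured are not reproduced; the noise values are made up.

```python
def predict(x, P, acc: float, dt: float, q: float):
    """Propagate [position, velocity] with an IMU acceleration sample (constant-acceleration model)."""
    p, v = x
    x_new = [p + v * dt + 0.5 * acc * dt * dt, v + acc * dt]
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q) (simplified, assumed).
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return x_new, [[n00, n01], [n10, n11]]


def update_position(x, P, z: float, r: float):
    """Kalman update with a position measurement z of variance r (H = [1, 0])."""
    s = P[0][0] + r                          # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s        # Kalman gain
    innov = z - x[0]
    x_new = [x[0] + k0 * innov, x[1] + k1 * innov]
    # P' = (I - K H) P
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


x0, P0 = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
x_pred, P_pred = predict(x0, P0, acc=1.0, dt=0.1, q=0.01)
# z stands in for the learned module's output (hypothetical value).
x_upd, P_upd = update_position(x_pred, P_pred, z=0.5, r=0.1)
print(x_pred, x_upd, P_upd[0][0])
```

Pure IMU integration drifts because errors accumulate in the prediction step; the periodic update is what bounds that drift, and in the post's setting the correction comes from a learned module rather than from a camera.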

Davide Scaramuzza

37,061 views • 2 years ago

We are excited to share our work “Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones,” published in the IEEE Transactions on Robotics (T-RO), which tackles sharp radiance field reconstruction under agile drone motion, where RGB frames are heavily motion-blurred and pose priors become unreliable! 4 years in the making! Code & dataset released! PDF: Code & Dataset: Full Narrated Video: High-speed flight is essential for time- and battery-constrained missions (e.g., inspection, exploration, search & rescue). However, fast motion corrupts visual data with severe motion blur and introduces drift and noise in visual-inertial odometry, making NeRF-based 3D reconstruction particularly brittle. We propose a unified framework that leverages asynchronous #EventCamera streams together with motion-blurred frames to reconstruct high-fidelity radiance fields from agile drone flights. Our key idea is to embed event-image fusion directly into radiance field optimization while jointly refining a shared, continuous-time camera trajectory initialized from event-based VIO. This enables us to recover sharp radiance fields and accurate trajectories without ground-truth supervision during training. We validate our method on synthetic data and on real sequences captured by a drone flying at up to 2 m/s. Despite severe blur and noisy pose priors, our method preserves fine scene details and achieves a performance gain of over 50% on real-world data compared to state-of-the-art methods. Kudos to Rong Zou and Marco Cannici! Reference: Rong Zou*, Marco Cannici*, Davide Scaramuzza, "Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones," IEEE Transactions on Robotics (T-RO), 2026. NCCR Robotics European Research Council (ERC) AUTOASSESS UZH IfI University of Zurich UZH Science Prophesee SynSense UZH Space Hub
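Why do events help with blur? A blurred pixel is the average of the latent sharp intensity over the exposure, and events record how that intensity changed (in log space) during the exposure. The single-pixel sketch below shows this relation, often called the event-based double integral (EDI) model. It assumes an ideal contrast threshold `c`, noise-free events, and a uniformly sampled exposure. It illustrates only the event-image relation, not the paper's radiance-field optimization; the function name and numbers are hypothetical.

```python
import math


def sharp_from_blur(blurred: float, event_sums: list[float], c: float) -> float:
    """Recover the sharp intensity at exposure start from a blurred pixel value.

    blurred:    pixel value averaged over the exposure
    event_sums: summed event polarities E(t_i) since exposure start, at sample times t_i
    c:          contrast threshold (log-intensity change per event)
    Model (assumed): L(t_i) = L(t_0) * exp(c * E(t_i)) and blurred = mean_i L(t_i).
    """
    gain = sum(math.exp(c * e) for e in event_sums) / len(event_sums)
    return blurred / gain


# Synthetic pixel: sharp value 2.0 brightens then dims during the exposure.
c = 0.2
event_sums = [0.0, 1.0, 2.0, 1.0]
blurred = sum(2.0 * math.exp(c * e) for e in event_sums) / len(event_sums)
print(sharp_from_blur(blurred, event_sums, c))
```

Real event data adds noise and an uncertain threshold, which is one reason to fuse events and frames inside a joint optimization rather than inverting this relation pixel by pixel.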

Davide Scaramuzza

12,006 views • 5 months ago

We are excited to share our latest work, "Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning," done in collaboration with Google DeepMind. Autonomous drones have reached superhuman speed in isolation, but what happens when multiple agents share the same airspace? Paper: Website: Video:

Using league-based self-play, we train #ReinforcementLearning agents that race against a diverse, evolving population of opponents. Through this competitive training, sophisticated behaviors emerge without explicit programming: strategic overtaking, proactive collision avoidance, and even awareness of aerodynamic downwash from nearby drones.

In real-world multi-player races at speeds exceeding 80 km/h (50 mph) and accelerations of up to 7 g, our agents outperform a five-time Swiss national drone racing champion while reducing collision rates by 50% compared to single-agent baselines. Crucially, training against diverse artificial opponents enables zero-shot generalization to human pilots, achieving over 90% race completion in mixed human-AI races with up to four competitors.

A key insight: human pilots adopt riskier strategies when trailing, leading to more crashes under competitive pressure. Our learned policies, by contrast, maintain consistent safety margins regardless of race standing, a property essential for deploying autonomous systems alongside humans. The multi-agent self-play policies are also more robust than those trained independently, suggesting that training in competitive environments is key not only to winning races but also to learning safer, more reliable autonomy for real-world multi-robot systems.

Kudos to Ismail Geles, Leonard Bauersfeld, and Markus Wulfmeier! European Research Council (ERC) UZH IfI University of Zurich UZH Science UZH Space Hub Swiss Robotics NCCR Robotics
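League-based self-play needs a rule for choosing which past opponent the learner races next. The paper's league design is not described here, so the sketch below is an assumption: a hypothetical `League` of frozen snapshots that tracks the learner's win rate against each one and samples opponents with weight (1 - win_rate)², a prioritization loosely inspired by the prioritized fictitious self-play used in AlphaStar.

```python
import random
from dataclasses import dataclass, field


@dataclass
class League:
    """Hypothetical league of frozen policy snapshots used as self-play opponents."""
    snapshots: list = field(default_factory=list)  # opponent identifiers
    win_rate: dict = field(default_factory=dict)   # learner's win rate vs. each snapshot

    def add_snapshot(self, name: str) -> None:
        # Freeze the current learner as a new opponent, starting at an uninformative 0.5.
        self.snapshots.append(name)
        self.win_rate[name] = 0.5

    def record(self, name: str, learner_won: bool, lr: float = 0.1) -> None:
        # Exponential moving average of the learner's win rate against this opponent.
        self.win_rate[name] += lr * (float(learner_won) - self.win_rate[name])

    def sample_opponent(self, rng: random.Random) -> str:
        # Prefer opponents the learner still struggles against; the small floor
        # keeps already-beaten opponents in rotation so they are not forgotten.
        weights = [(1.0 - self.win_rate[s]) ** 2 + 1e-3 for s in self.snapshots]
        return rng.choices(self.snapshots, weights=weights, k=1)[0]
```

After the learner repeatedly beats one snapshot, that snapshot is sampled far less often, which keeps training focused on a diverse set of still-challenging opponents.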

Davide Scaramuzza

14,612 views • 2 months ago

We are excited to share our #CORL2024 paper on learning quadrotor obstacle avoidance from the visual stream of a single #eventcamera! Trained entirely in simulation! We demonstrate obstacle avoidance both in the dark and in a forest at speeds of up to 5 m/s. PDF: Video: Project page:

Event cameras are sensors that output per-pixel intensity changes with microsecond-level latency. They feature nearly zero motion blur and high dynamic range, but they produce a very large volume of events under significant ego-motion and, furthermore, lack a high-fidelity continuous-time sensor model in simulation, which makes direct #sim2real transfer impossible. By leveraging depth prediction as a pretext task, we pre-train a reactive obstacle avoidance policy with “approximated” simulated events and then fine-tune the perception component with limited real-world event-and-depth data. This technique bridges the sim2real gap for #eventcameras! As there is currently no continuous-time sensor model for event cameras, we hope that this work can finally spur future research leveraging simulation to train event-vision-based policies for faster, more agile robots!

Kudos to Anish Bhattacharya, @marcocannic, Vijay Kumar, and Nikolai Matni! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) GRASP Laboratory Penn Engineering
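Why simulated events are only “approximated” can be seen from the idealized event generation model: a pixel fires an event each time its log intensity changes by a contrast threshold C. The sketch below applies that rule by differencing consecutive rendered frames; the function name, threshold, and epsilon are assumptions, and this is not the simulator used in the paper. Because every event inherits its frame's timestamp, the microsecond timing of a real sensor is lost, which is exactly the gap a continuous-time sensor model would close.

```python
import math


def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Approximate events from a sequence of intensity frames (hypothetical helper).

    frames: list of 2D lists (rows of pixels) of linear intensities in [0, 1]
    timestamps: frame times in seconds, same length as frames
    threshold: assumed contrast threshold C on the log-intensity change
    Returns a list of (t, x, y, polarity) tuples.
    """
    h, w = len(frames[0]), len(frames[0][0])
    # Per-pixel reference log intensity at which the last event fired.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(w)] for y in range(h)]
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(h):
            for x in range(w):
                delta = math.log(frame[y][x] + eps) - ref[y][x]
                # One event per full threshold crossing, all stamped with the frame time.
                n = int(abs(delta) // threshold)
                pol = 1 if delta > 0 else -1
                events.extend((t, x, y, pol) for _ in range(n))
                # Move the reference by the amount "explained" by the emitted events.
                ref[y][x] += pol * n * threshold
    return events
```

For example, a pixel that doubles in brightness between two frames produces three positive events with C = 0.2, since ln 2 ≈ 0.69.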

Davide Scaramuzza

17,219 views • 1 year ago

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim," which uses real-world data to refine simulation parameters for higher-fidelity training; discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots; and presents Flexion's hierarchical approach, which uses pre-trained Vision-Language Models (VLMs) for high-level task orchestration, together with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita takes us behind the scenes of humanoid robot demos, shares his take on reinforcement learning in simulation versus the real world and the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today.

🗒️ For the full list of resources for this episode, visit the show notes page:

📖 CHAPTERS
===============================
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla
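The "real-to-sim" idea described in this episode (using real-world data to refine simulation parameters) can be illustrated as a toy system-identification loop. This is a minimal sketch under loud assumptions, not Flexion's method: a unit-mass spring-damper stands in for the robot, the only unknown is a damping coefficient, the "real" trajectory is generated synthetically in place of logged robot data, and a plain grid search minimizes the mean squared trajectory error.

```python
def simulate(damping, x0=1.0, v0=0.0, k=4.0, dt=0.01, steps=200):
    """Toy simulator: unit-mass spring-damper integrated with semi-implicit Euler.

    Returns the list of positions after each step. All constants are illustrative.
    """
    x, v, xs = x0, v0, []
    for _ in range(steps):
        a = -k * x - damping * v  # spring force plus viscous damping
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs


def fit_damping(real_xs, candidates):
    """Pick the damping value whose simulated trajectory best matches real data (MSE)."""
    def mse(c):
        sim = simulate(c, steps=len(real_xs))
        return sum((s - r) ** 2 for s, r in zip(sim, real_xs)) / len(real_xs)
    return min(candidates, key=mse)
```

With real data, the same loop would compare simulated and measured joint trajectories; in practice, gradient-based or sampling-based optimizers usually replace the grid search once several parameters are identified at once.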

The TWIML AI Podcast

22,592 views • 6 months ago

Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing

We present a novel approach to synthesizing dexterous motions for physically simulated hands in tasks that require coordinating two hands with high temporal precision. Instead of directly learning a joint policy that controls both hands, our approach performs bimanual control through cooperative learning, treating each hand as an individual agent. The policy for each hand is first trained separately; the two policies are then synchronized through latent-space manipulation in a centralized environment so that together they serve as a joint policy for two-hand control. By doing so, we avoid performing policy learning directly in the higher-dimensional joint state-action space of both hands, which greatly improves overall training efficiency. We demonstrate the effectiveness of our approach on the challenging task of guitar playing. The virtual guitarist trained with our approach synthesizes motions from unstructured reference data of general guitar-practice motions and accurately plays diverse rhythms with complex chord-pressing and string-picking patterns, based on input guitar tabs that do not appear in the references. Along with this paper, we provide the motion capture data we collected as the reference for policy training.

AK

26,855 views • 1 year ago

We’re excited to share DiT4DiT, an end-to-end Video-Action Model for robot learning that unifies a video Diffusion Transformer and an action Diffusion Transformer in a single cascaded framework. By leveraging the rich spatiotemporal and physical dynamics learned through video generation, rather than static image-text priors, DiT4DiT achieves state-of-the-art results on LIBERO (98.6%) and RoboCasa GR1 (50.8%) with far less training data, delivering over 10× better sample efficiency and up to 7× faster convergence. Real-world deployment on a humanoid robot further shows robust generalization. We believe this is a step toward making video generation a powerful backbone for robot policy learning. This work builds upon the brilliant foundations laid by Nvidia's GR00T and Cosmos. Project: Paper: Code: Coming soon. In the meantime, you can ask your coding agent to reproduce the method based on GR00T/Cosmos.

Shuo Yang

31,596 views • 4 months ago

Experiments in progress. The one on the right has been learning for ~3 hours, the one in the middle for ~1 hour, and the one on the left started just a few minutes ago.

The initial motivation for building the physical Atari setup was simply to commit ourselves to the subset of algorithms that can make progress in it. This commitment rules out algorithms that require billions of samples to learn (or, worse, require multiple environments running in parallel). Atari games are simple enough that we should be able to show learning on them in a short amount of time with no prior knowledge.

Since then, I've realized that this setup is also a principled way to compare different paradigms in robotics: sim2real, learning from teleoperated data, and learning directly on the robots.

So far, I have observed that getting sim2real to work reliably is hard. It requires tweaks that don't scale. Policies that play perfectly in simulation fall apart because of latencies and the messiness of the real world. These aspects could be modeled to improve the simulation, but not without sinking significant human engineering hours into it.

I have higher hopes for learning from teleoperated data, but that requires a human to learn the task first. These experiments are on my to-do list: I have to learn to play some of the games well through the robot. I’m half-decent at Pong and Ms. Pac-Man now.

Learning directly on robots is looking like the most promising approach. It removes pesky distribution shifts and makes it possible to have algorithms that continually improve with more data and time, without any human intervention. It feels great to let experiments run overnight and wake up to find improved policies. With learning on robots, I should, in principle, be able to go on a long vacation and come back to find better policies for complex tasks beyond Atari games. Whether that is possible with current learning algorithms is a different question.

Khurram Javed

52,110 views • 7 months ago

Introducing LifeGPT, showing that LLMs can simulate complex, Turing-complete systems like Conway's Game of Life with near-perfect accuracy, with no prior topology needed. 🌐 This unlocks new potential for AI in modeling self-organizing systems in biology, materials science, & beyond. 🔬🤖 #AI #LifeGPT

Cellular automata (CA), like Conway's Game of Life ("Life"), are computationally irreducible, meaning their evolution is difficult to predict without an a priori understanding of the rules of the game, including the topology on which it is played. LifeGPT is a topology-agnostic generative model that learns the rules of Life from only a tiny number of game states, without prior knowledge of its grid structure or boundary conditions. The success in simulating Life suggests promising avenues for scientific discovery, particularly in bridging the gap between AI, artificial life, and real-world biological systems, for both forward and inverse problems. The potential for universal computation within generative AI, including LLMs, through approaches like LifeGPT represents an exciting area for future research, especially when combined with reinforcement learning.

Model Convergence: LifeGPT converges rapidly during training, achieving high accuracy in predicting next game states. We attribute the non-zero cross-entropy loss to the lack of causal relationships within randomly generated initial conditions (ICs).

Accuracy & Temperature: LifeGPT achieves near-perfect accuracy, particularly at lower sampling temperatures, but can be tuned towards higher creativity to discover patterns that the original ruleset would not produce. This finding highlights the trade-off between model creativity (higher temperature) and accuracy in deterministic predictions, which is highly relevant to modeling real-world dynamical systems for which no closed-form rulesets exist.

Zero/Few-Shot Learning: Trained on a small fraction of possible initial conditions, LifeGPT demonstrates strong zero/few-shot learning, accurately simulating Life for unseen initial conditions. However, rare prediction errors show that LifeGPT approximates, rather than perfectly replicates, the Life algorithm.

Autoregressive Autoregressor: A recursive implementation of LifeGPT demonstrates the model's ability to simulate Life over multiple timesteps.

LifeGPT is topology-agnostic with respect to its training data, and our results show that a GPT model can capture the deterministic rules of a Turing-complete system with near-perfect accuracy, given sufficiently diverse training data. The work showcases the possibility for future models to combine stochastic generative capabilities with deterministic computational capabilities. Link to code, paper, etc. below. Podcast generated using #NotebookLM. LAMM@MIT DMSE at MIT
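For reference, the deterministic rule that LifeGPT has to approximate is short. The sketch below implements one synchronous Life step (birth with exactly 3 live neighbors, survival with 2 or 3) plus a hypothetical serialization of a state into a token string of the kind a GPT-style model could be trained on. The toroidal (wrap-around) boundary and the row-major '0'/'1' encoding are assumptions; LifeGPT's actual grid size and tokenization are not specified here.

```python
def life_step(grid):
    """One synchronous Game of Life update (B3/S23) on a toroidal grid.

    grid: list of lists of 0/1. Wrap-around edges are an assumed topology.
    """
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Count the 8 neighbors, wrapping around the edges.
            n = sum(grid[(y + dy) % h][(x + dx) % w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
            # Birth with exactly 3 neighbors; survival with 2 or 3.
            nxt[y][x] = 1 if n == 3 or (grid[y][x] == 1 and n == 2) else 0
    return nxt


def to_tokens(grid):
    """Hypothetical serialization: row-major string of '0'/'1' characters."""
    return "".join(str(c) for row in grid for c in row)
```

A training pair for next-state prediction could then be `(to_tokens(s), to_tokens(life_step(s)))` for a random initial condition `s`; applying the model recursively to its own output corresponds to the autoregressive rollout described above.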

Markus J. Buehler

114,194 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM's performance in this short course, built in collaboration with Predibase by Rubrik and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks, such as solving math problems and debugging code, without needing pre-existing training examples as in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm that is rapidly gaining adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s well suited to tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost-effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required by other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions that discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:
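The four loss components listed above fit in a few lines. The sketch below follows the GRPO formulation published with DeepSeekMath as best understood: rewards are standardized within the group of responses sampled for one prompt, and each token gets a PPO-style clipped surrogate minus a KL penalty toward a frozen reference model. The defaults `clip_eps=0.2` and `beta=0.04` are assumptions, the population standard deviation is one choice (implementations differ), and these helper names are hypothetical, not Predibase's API.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage, clip_eps=0.2, beta=0.04):
    """Per-token GRPO loss (to be minimized) for one sampled token.

    logp_new / logp_old / logp_ref: log-probabilities of the token under the
    current policy, the policy that sampled the response, and the reference model.
    """
    ratio = math.exp(logp_new - logp_old)                   # token probability ratio
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)   # clip to [1-eps, 1+eps]
    surrogate = min(ratio * advantage, clipped * advantage) # pessimistic bound
    # KL estimator pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0.
    log_r = logp_ref - logp_new
    kl = math.exp(log_r) - log_r - 1
    return -(surrogate - beta * kl)
```

In a full training step, this per-token loss would be averaged over the tokens of each response and over all responses in the group. Note that if every response in a group receives the same reward, all advantages are zero and that group contributes no policy-gradient signal.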

Andrew Ng

86,457 views • 1 year ago

Physics-based Motion Retargeting from Sparse Inputs

Avatars are important for creating interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data about the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human, and the mapping between them is unclear. In this work, we address both of these challenges. We introduce a method to retarget motions in real time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy that controls characters in a physics simulator. We require only human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real, sparse data in real time. We demonstrate the feasibility of our approach on three characters with different skeleton structures: a dinosaur, a mouse-like creature, and a human. We show that the avatar poses often match the user surprisingly well, despite no sensor information being available for the lower body. We discuss and ablate the important components of our framework, specifically the kinematic retargeting step, the imitation, contact, and action rewards, and our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings, including unbalancing, dancing, and sports motions.
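The framework above combines imitation, contact, and action rewards. One common way to combine such terms, assumed here and not taken from the paper, is a weighted sum of exponentiated errors, where the reference pose comes from the kinematic retargeting step. In this hypothetical sketch, the weights, scale factors, and flat joint-angle representation are all illustrative choices.

```python
import math


def tracking_reward(q, q_ref, contacts, contacts_ref, action, prev_action,
                    w_imit=0.6, w_contact=0.2, w_action=0.2, k_imit=2.0, k_action=0.5):
    """Hypothetical weighted sum of imitation, contact, and action-smoothness terms.

    q, q_ref: simulated and retargeted reference joint angles (same length)
    contacts, contacts_ref: per-body-part contact flags (bool)
    action, prev_action: current and previous policy actions
    """
    pose_err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    r_imit = math.exp(-k_imit * pose_err)       # 1 when the pose matches the reference
    r_contact = sum(c == cr for c, cr in zip(contacts, contacts_ref)) / len(contacts)
    act_change = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    r_action = math.exp(-k_action * act_change)  # discourages jerky actions
    return w_imit * r_imit + w_contact * r_contact + w_action * r_action
```

Because each term lies in [0, 1] and the weights sum to 1, the total reward also lies in [0, 1], which keeps reward scales comparable across characters with different skeletons.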

AK

106,527 views • 3 years ago
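The retargeting description mentions asymmetric actor-critic observations and a reward made of imitation, contact, and action terms. The sketch below shows, under our own assumptions, what such a split generally looks like: the actor sees only the sparse headset and controller signals available on the device, while the critic additionally receives privileged simulator state during training. All field names, weights, and the exponential reward shaping are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimFrame:
    # Sparse signals a headset-and-controllers setup could provide (hypothetical layout).
    headset_pose: List[float]
    left_controller_pose: List[float]
    right_controller_pose: List[float]
    # Privileged simulator state, never available on the device (hypothetical).
    character_joint_angles: List[float]
    foot_contacts: List[float]


def actor_observation(frame: SimFrame) -> List[float]:
    """Policy input: only the sparse tracking signals, so it can run on real hardware."""
    return frame.headset_pose + frame.left_controller_pose + frame.right_controller_pose


def critic_observation(frame: SimFrame) -> List[float]:
    """Value-function input: sparse signals plus privileged state, used only in training."""
    return actor_observation(frame) + frame.character_joint_angles + frame.foot_contacts


def combined_reward(
    pose_error: Sequence[float],
    contact_agreement: float,
    action: Sequence[float],
    w_imitation: float = 0.6,  # hypothetical weights
    w_contact: float = 0.2,
    w_action: float = 0.2,
    k_pose: float = 2.0,
) -> float:
    """Weighted sum of imitation, contact, and action terms, each assumed to lie in [0, 1]."""
    r_imitation = math.exp(-k_pose * sum(e * e for e in pose_error))
    r_contact = contact_agreement  # assumed: fraction of contacts matching the reference
    r_action = math.exp(-sum(a * a for a in action))  # favors small actions
    return w_imitation * r_imitation + w_contact * r_contact + w_action * r_action
```

Because the critic is discarded after training, giving it privileged state costs nothing at deployment time while potentially making value estimates easier to learn.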

The ties between Cyprus and India are rooted in shared historical experiences, forged through common struggles and strengthened by our joint commitment to democracy and respect for international law. This State Visit, which comes less than a year after Prime Minister Modi’s visit to Cyprus and takes place during the Cyprus Presidency of the Council of the European Union, has not only confirmed the enormous potential of our relationship. It has also provided the space to advance it further. We build this strategic partnership on solid foundations and a clear vision, as encapsulated in the Joint Declaration on the Implementation of our Strategic Partnership we agreed last year, which is already delivering tangible results across key sectors. This growing partnership also builds on the leaps of progress over the past year in EU-India relations, which culminated in the first month of the Cyprus Presidency with the announcement of the political agreement concluding the milestone Free Trade Agreement. In fact, the Cyprus Presidency has placed EU-India relations, trade, and connectivity at the forefront of our work, driven by our vision for a European Union that is truly Open to the World. It is my firm conviction that, in an increasingly shifting geopolitical environment, the EU–India relationship will be one of the defining partnerships of the 21st century: a partnership that can help safeguard stability, resilience, connectivity, and the rules-based international order. 🇨🇾🇪🇺🇮🇳

NikosChristodoulides

14,787 views • 2 months ago