Israel-based Mentee Robotics has demonstrated a logistics workflow in which two MenteeBot V3 humanoids work autonomously to pick and place totes. A Modular Agent System is preferred over an end-to-end VLA model because it favors real-world robustness and lower compute needs. Its architecture is composed of three components:
- LLM Planner:... Converts instructions into executable Robotic API Language code for reliable task decomposition and error handling.
- Perception Stack: Uses pre-trained models (NeRF/3DGS, distilled vision) for scene understanding and navigation.
- Control Policies: Reinforcement Learning (RL) models, trained at scale via Sim2Real, generate motor commands, enabling high-accuracy mobile manipulation.

Crucially, the robot learns new tasks from a single demonstration in hours. The object's 3D geometry (STL/URDF) is tracked in the video, and that track defines the RL reward function. Training is optimized using 'Automatic Curriculum Learning', which autonomously adjusts task difficulty based on robot performance, eliminating manual engineering. All computation runs onboard.
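
The post does not say how 'Automatic Curriculum Learning' is implemented. As a rough illustration of the general idea only (difficulty that moves with measured performance), here is a minimal Python sketch. The class name `AutoCurriculum`, the success-rate thresholds, and the window size are all assumptions made for this example and are not Mentee Robotics' method.

```python
from collections import deque


class AutoCurriculum:
    """Hypothetical success-rate-driven difficulty scheduler (illustrative only)."""

    def __init__(self, levels=10, window=20, promote_at=0.8, demote_at=0.3):
        self.level = 0                    # current difficulty, 0 = easiest
        self.max_level = levels - 1
        self.window = window              # episodes per evaluation window
        self.promote_at = promote_at      # success rate that raises difficulty
        self.demote_at = demote_at        # success rate that lowers difficulty
        self.results = deque(maxlen=window)

    def report(self, success: bool) -> int:
        """Record one episode outcome and return the (possibly updated) level."""
        self.results.append(1.0 if success else 0.0)
        if len(self.results) == self.window:
            rate = sum(self.results) / self.window
            if rate >= self.promote_at and self.level < self.max_level:
                self.level += 1
                self.results.clear()      # start a fresh window at the new level
            elif rate <= self.demote_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level


curriculum = AutoCurriculum()
for _ in range(20):
    curriculum.report(True)               # a full window of successes
level_after_successes = curriculum.level
for _ in range(20):
    curriculum.report(False)              # a full window of failures
level_after_failures = curriculum.level
```

In a training loop, the returned level would be mapped to task parameters (for example tote weight or placement tolerance); that mapping is task-specific and is left out here.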

The Humanoid Hub

88,729 subscribers

15,729 views • 8 months ago •via X (Twitter)

Related Videos

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training. He discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach, which uses pre-trained Vision-Language Models (VLMs) for high-level task orchestration together with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares what goes on behind the scenes in humanoid robot demos, his take on reinforcement learning in simulation versus the real world, and the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today.

🗒️ For the full list of resources for this episode, visit the show notes page:

📖 CHAPTERS
===============================
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla
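
The "real-to-sim" idea mentioned in the description, using real-world data to refine simulation parameters, can be illustrated with a toy system-identification sketch in Python. Everything here is an assumption made for illustration: the one-parameter damping model, the function names `simulate_velocity` and `fit_damping`, and the grid search. The "real" trace is generated from the same toy model as a stand-in for logged hardware data, and this is not Flexion's pipeline.

```python
def simulate_velocity(damping, v0=1.0, dt=0.01, steps=100):
    """Toy simulator: a unit-mass body slowing under linear damping (explicit Euler)."""
    v = v0
    trace = []
    for _ in range(steps):
        v += -damping * v * dt
        trace.append(v)
    return trace


def fit_damping(real_trace, candidates):
    """Return the candidate damping whose simulated trace best matches the real one."""
    def squared_error(damping):
        sim = simulate_velocity(damping, steps=len(real_trace))
        return sum((s - r) ** 2 for s, r in zip(sim, real_trace))
    return min(candidates, key=squared_error)


# Stand-in for logged hardware data (assumption: generated from the toy model itself).
real = simulate_velocity(damping=0.7)
grid = [i / 10 for i in range(21)]        # candidate values 0.0, 0.1, ..., 2.0
best = fit_damping(real, grid)
```

A real simulator has many coupled parameters, so a gradient-based or sampling-based optimizer would normally replace the grid search; the structure (simulate, compare against real data, adjust parameters) stays the same.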

The TWIML AI Podcast

22,533 views • 6 months ago

A big part of scaling robot learning to solve real-world problems is that we somehow need to get enough diverse, high-quality data to train our robots to perform useful things. GPT and its fellow large language models were bootstrapped and proved out on a massive dataset of real-world language data. Unfortunately, despite our best efforts, similarly massive datasets don’t really exist for robotics, so in our unending pursuit of high-quality, useful data, we turn to simulation. I compared a couple of recent works on sim-to-real robot manipulation, which discuss how to train perception-driven manipulation policies in simulation in such a way that they’re useful in the real world.
- DextrAH-RGB, from NVIDIA
- Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation, also from NVIDIA (specifically the GEAR lab)
- Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, another GEAR lab paper
- Local Policies Enable Zero-shot Long-Horizon Manipulation, from CMU

(video from DextrAH-RGB)

Chris Paxton

20,486 views • 1 year ago

The robot is learning several novel tasks instantly, after just ONE demonstration each... Instant Policy makes it possible: no extra training, no weight updates, just pure in-context learning. It just got accepted at ICLR 2025, and it’s changing how robots learn. With just a single demo, a robot can pick up a new task and start performing it right away.

Why this is a big deal:
✅ Learns tasks instantly with just one or a few demonstrations
✅ Improves over time as more demonstrations are given
✅ Uses simulation-based training with “pseudo-demonstrations” for scalability
✅ Can transfer skills across different robots and even follow language-defined tasks

It brings in-context learning to robotics, opening up new possibilities for flexible, real-world automation. You can try it yourself: code and weights are available at • • • •

Thank you to Edward Johns, Director of the Robot Learning Lab at Imperial College, for sharing their work! 🙏

Ilir Aliu

46,285 views • 1 year ago

End-to-end neural networks racing drones in Abu Dhabi! 🚁 Check out the drone racing team from Delft University of Technology! A completely end-to-end neural network solution, from pixels to direct motor commands. No Kalman filters. No computer vision feature detectors. Just neurons flying the drone. The challenge is extreme. These drones fly at high speeds and need split-second decisions with minimal onboard resources: a single rolling-shutter camera and an IMU. Their approach is called SkyDreamer, based on the Dreamer-v3 reinforcement learning algorithm. First, a world model is trained in simulation. Then, the neural network learns how to fly in its dreams through reinforcement learning. The network's internal state can be read out to see where it thinks it is on the track or how fast it's going. Even better, the drone estimates some of its own body characteristics during flight, like the camera angle relative to the body, eliminating time-consuming manual calibration. The system uses only a single camera and the gyros from the IMU, ignoring the accelerometers, just like human FPV pilots do.

Lukas Ziegler

207,110 views • 5 months ago

New Course: Post-training of LLMs

Learn to post-train and customize an LLM in this short course, taught by Banghua Zhu, Assistant Professor at the University of Washington, and co-founder of @NexusflowX.

Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning. Post-training transforms a general-purpose token predictor (trained on trillions of unlabeled text tokens) into an assistant that follows instructions and performs specific tasks. Because it is much cheaper than pre-training, it is practical for many more teams to incorporate post-training methods into their workflows than pre-training.

In this course, you’ll learn three common post-training methods, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL), and how to use each one effectively. With SFT, you train the model on pairs of input and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and is updated to improve performance. You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior.

In detail, you’ll:
- Understand what post-training is, when to use it, and how it differs from pre-training.
- Build an SFT pipeline to turn a base model into an instruct model.
- Explore how DPO reshapes behavior by minimizing a contrastive loss that penalizes poor responses and reinforces preferred ones.
- Implement a DPO pipeline to change the identity of a chat assistant.
- Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions.
- Train a model with GRPO to improve its math capabilities using a verifiable reward.

Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today.

Please sign up here:
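
To make the DPO description above concrete, here is a small Python sketch of the standard per-example DPO loss, computed from summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The function name `dpo_loss`, the scalar inputs, and `beta=0.1` are assumptions for illustration; this is not taken from the course materials, and a real pipeline would compute these quantities batched, on tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more the policy likes each response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) = softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: no preference has been learned yet, loss = log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability toward the chosen response: the loss drops.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

The loss only depends on how the policy's preference between the two responses differs from the reference model's, which is why no separate reward model is needed.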

Andrew Ng

125,146 views • 1 year ago

Foundation models are enough to solve robotics! Unfortunately, this is not true. We keep hearing that Vision-Language-Action (VLA) models struggle because of the gap between static training and the dynamic real world. A German startup (Sereact) just released a solution that bridges this gap perfectly. They are introducing a new paradigm called Interactive RL Policy Patching. It's a distributed framework that allows robots to self-learn from human corrections without needing full retraining. When a robot fails, a human operator provides a brief "patch" or demonstration. The system then uses online off-policy reinforcement learning to update the behavior instantly. This is powered by a massive foundation model trained on hundreds of millions of interactions from over 100 deployed robot stations. The best part is the distributed parameter synchronization... When one robot learns a fix, the update is published fleet-wide... meaning the entire swarm gets smarter from a single human intervention. They are already proving this on complex manipulation tasks like shoe unboxing and screw sorting, drastically reducing the data needed to handle edge cases. Real-world environments are unforgiving, and I love seeing systems that can actually adapt on the fly! 📍 More info:

Ilir Aliu

18,210 views • 6 months ago

Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

34,889 views • 2 years ago

Introducing FLUX-mimic, a next-generation Video-Action Model for general purpose dexterity, developed in partnership with Black Forest Labs. Late last year we published mimic-video and introduced Video-Action Models (VAM): a new family of robotics foundation models built on top of video generation models. We showed that robot control reduces to visual prediction, and that robot capability is downstream of improvements in video modeling accuracy. The obvious implication was that advances in the video modeling frontier would directly translate to increased capabilities in end-to-end robot learning. FLUX-mimic is that thesis at frontier scale: We've applied our VAM architecture to the strongest video backbone available today, FLUX 3 from Black Forest Labs, and trained it on data from our own robots and wearables. General-purpose dexterity, running on a single GPU on premises. Because the model already understands world dynamics, it needs far fewer demonstrations to learn a new task. This is game-changing for our mission to deploy robots to factory floors, where industrial robot data is scarce and expensive to collect. We're now testing and deploying FLUX-mimic with manufacturing leaders like Audi USA, on complex, multi-step manipulation long considered impossible for conventional automation.

mimic

113,051 views • 4 days ago

The power of generative models, now embodied in humanoids. Announcing DreamControl: after a year-long research effort at General Robotics, we present a scalable framework for whole-body humanoid control that fuses diffusion priors with reinforcement learning to unlock real-world scene interaction. Diffusion + RL → natural whole-body skills on real robots.

DreamControl enables humanoids to move beyond locomotion demos → performing natural, human-like skills such as:
- Picking & lifting objects
- Opening drawers & doors
- Precise punching, kicking, and jumping
- Bimanual manipulation tasks

Our key innovation: a diffusion prior over human motion that guides RL, eliminating the need for massive teleoperation datasets, and producing motions that look human while transferring to real hardware. Trained purely in simulation and deployed on the Unitree G1 humanoid, DreamControl policies run in real time, bridging sim-to-real with unprecedented naturalness. We leverage a novel hybrid edge + cloud infrastructure that runs RL-trained policies on the edge, backed by powerful AI models running in the cloud. This is the next step in General Robotics’ journey toward general-purpose humanoid assistants that interact, adapt, and assist autonomously. Paper: Blog: 1/n

Ashish Kapoor

118,133 views • 10 months ago

"A parcel with snacks has been delivered for Flexion. Retrieve it using the stairs and come up using the elevator. Then unpack it and place the items into the empty drawer on the shelf in the snack area." One instruction. No human operator. Everything that follows is autonomous. Today we're introducing Reflect v1.0, our robotics intelligence platform for long-horizon work. From a single natural-language command, the robot understands the task, navigates a multi-floor building, calls elevators, handles doors, uses tools to unpack a box, and puts the items away. The biggest shift in v1.0 is that we use reinforcement learning across every layer, from low-level control to high-level reasoning. Long-horizon autonomy is unforgiving. The robot must recover on its own when things don't go to plan because in the real world, they never do. Combining reasoning, perception, physical execution and runtime robustness into a single mission-capable system is the foundation required to solve humanoid autonomy. Our team is just getting started. #HumanoidRobots #Flexion

Flexion

136,271 views • 28 days ago

Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, agents need a knowledge-guided review as a higher-dimensional feedback channel. Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes. RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training. I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition. You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: (don't forget to star it ⭐ ) Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering. I wrote a full walkthrough covering RL for LLM agents, from RLHF to GRPO to RULER, in the article below.
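
The remark that GRPO replaced the critic model refers to how it scores each sampled trajectory relative to the other samples for the same prompt instead of against a learned value estimate. Below is a minimal Python sketch of that group-relative normalization. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` constant are assumptions for illustration; this is not the ART or RULER API, and implementations differ in such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)               # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four trajectories sampled for one prompt, scored by any reward source
# (a binary verifier, or scores from an LLM judge).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# If every sample gets the same score, there is no learning signal for that group.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Because only relative scores within a group matter, a judge that ranks trajectories consistently is enough, even if its absolute scores are poorly calibrated.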

Avi Chawla

350,036 views • 2 months ago

China’s pretty humanoid robot stuns by opening a car door in a ‘world’s first’ | Jijo Malayil, Interesting Engineering Mornine used onboard sensors and full-body control to locate the handle, adjust posture, and open a car door, with no human input needed. AiMOGA Robotics has claimed to have reached a significant milestone in embodied AI with its humanoid robot, Mornine, autonomously opening a car door inside a functioning Chery dealership in China. Relying solely on onboard sensors, full-body motion control, and end-to-end reinforcement learning, Mornine performed the task without any human input. Unlike scripted or teleoperated robots, Mornine identified the door handle, adjusted its posture, and used coordinated force across its limbs and torso to complete the action, demonstrating advanced autonomy in a real-world setting. “The deployment marks one of the first instances of a service robot executing such a high-friction, physical interaction in a live commercial setting,” said the firm in a statement. In April, at the Shanghai Auto Show, automotive brands Omoda and Jaecoo, subsidiaries of Chery Automobile, introduced Mornine, designed for use in car dealerships. From sim to service Opening a car door may seem like a simple task, but AiMOGA Robotics views it as a pivotal moment in robotics, signaling a shift from simulation to real-world service, and from basic command execution to autonomous capability. Using only onboard sensors and full-body motion control, Mornine identified the door handle, adjusted her posture, and applied coordinated force across her limbs to open the door, entirely without human intervention. Mornine’s advanced sensor suite includes 3D LiDAR, depth and wide-angle cameras, and a visual-language model (VLM), enabling real-time perception of door position and opening status. Uniquely, Mornine wasn’t explicitly programmed to recognize door handles.
Instead, she learned through reinforcement learning, undergoing millions of simulated cycles to focus on the right region and perform the task independently. “We never explicitly told the robot what a door handle is. It learned to focus on that region by itself,” said the engineering team at AiMOGA Robotics in a statement. The learned model was transferred to the real world using Sim2Real methods. Mornine continuously gathers live sensor data during operation, which feeds into a cloud-based training loop, allowing her to improve through continuous learning in real-world settings, reports Robotics Tomorrow. Now active in multiple Chery 4S dealerships in China, Mornine not only opens car doors but also assists with customer greetings, vehicle introductions, and item delivery—marking a step forward in humanoid robotics for commercial retail environments. AI meets retail Originally introduced as the AiMOGA Robot, Mornine was developed to support dealership sales by performing tasks such as explaining vehicle specifications, leading showroom tours, serving refreshments, and engaging with customers in multiple languages. First conceived by Chery as a virtual character to appeal to Generation Z using metaverse and virtual human technologies, Mornine gradually evolved into a real-world interactive humanoid. After multiple iterations of character and model design, Mornine debuted as a digital persona in animations, livestreams, and promotional content, gaining brand recognition. Chery later expanded the concept beyond the virtual space, resulting in the creation of the AiMOGA humanoid robot. Leveraging Chery’s expertise in autonomous driving, environmental sensing, and control systems, AiMOGA features full-stack capabilities in perception, cognition, decision-making, and execution. It uses multimodal sensing—combining speech, vision, and environmental data—to interpret user gestures, commands, and showroom dynamics. 
A bionic motion system and automotive-grade hardware enable dexterous movement and upright mobility, while multi-robot collaboration allows for coordinated tasks like guided tours. At the decision-making layer, DeepSeek’s large language models enable natural language understanding and personalized interaction. In April 2025, Mornine officially began commercial service as an “Intelligent Sales Consultant” at the OMODA C5 JOYSTAR 4S dealership in Kuala Lumpur, Malaysia, marking her full transition from a virtual concept to a real-world humanoid sales assistant.

Owen Gregorian

67,975 views • 11 months ago

Not the flashiest demos, but what’s under the hood represents a foundational shift for general-purpose robotics. World models are the next-gen foundation of Physical AI, not the VLM backbones found in typical VLAs. DreamZero is a 14B-parameter World Action Model (WAM) by NVIDIA that treats robotics as a joint video-and-action prediction task. Unlike traditional Vision-Language-Action (VLA) models that map images directly to motor commands, DreamZero leverages a pretrained video diffusion backbone to predict future world states and actions simultaneously. - achieves 2× better zero-shot generalization to unseen tasks and environments compared to state-of-the-art VLAs. - learns effectively from heterogeneous, non-repetitive data (500 hours), breaking the need for thousands of repeated demonstrations. - adapts to new robot embodiments with just 30 minutes of play data. - enables 7Hz closed-loop control via system optimizations and "DreamZero-Flash," making high-capacity diffusion models viable for real-time use.

The Humanoid Hub

35,204 views • 5 months ago

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance, but with superior generalizability, as our previous model-free deep RL! Code released! PDF: Code: Full Video: Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration. We conduct a comprehensive study that highlights AC-MPC’s key advantages: - Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics - Improved sample efficiency - A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay. We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior. 
Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight IEEE Transactions on Robotics (T-RO), 2025 PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

27,090 views • 6 months ago

Video: World’s first humanoid robot labor that swaps its own batteries to work endlessly | Jijo Malayil, Interesting Engineering Walker S2 uses dual-battery balancing and standardized modules to boost efficiency and ensure uninterrupted, optimized performance. In a leap for robotics, China’s UBTech has unveiled the Walker S2, the world’s first humanoid robot capable of fully autonomous battery swapping. Designed for non-stop industrial operations, the Walker S2 can replace its own power pack in just three minutes—no human intervention required. Equipped with advanced anthropomorphic bipedal locomotion and a hot-swappable battery system, Walker S2 is built to operate 24/7 across dynamic industrial environments. According to UBTech, the next-generation humanoid robot marks a major milestone in automation, bringing continuous, hands-free performance to the factory floor. In May 2025, UBTech Robotics and Huawei Technologies inked a significant partnership to accelerate the adoption of humanoid robots across China’s factories and households. Uninterrupted robot operations A video posted by the robotics firm opens with the sleek UBTech Walker S2 humanoid robot working in an industrial setting. The highlight, however, is its autonomous battery swap. Walker S2 approaches the charging station, carefully detaches its depleted power pack, and seamlessly installs a fresh one—all within about three minutes—without any human assistance, according to CGTN. The camera captures close-ups of the robot’s articulated limbs and the intelligent battery-handling mechanism, conveying precision and reliability. As the swap completes, Walker S2 resumes its duties, reinforcing the promise of uninterrupted, 24/7 operations in dynamic factory environments. UBTech’s Walker S2 humanoid robot is equipped with advanced dual-battery power balancing technology and uses standardized battery modules to optimize performance, reports CNEVPOST. 
This dual-battery system allows the robot to automatically switch to a backup battery in case of a main battery failure, ensuring that critical tasks are carried out without interruption. In addition to battery swapping, the robot can intelligently choose between charging and swapping based on task urgency, allowing it to manage energy dynamically and adapt to real-time operational demands. UBTech highlights these features as a step forward in deploying humanoid robots for industrial and domestic applications, combining flexibility, reliability, and autonomy in one intelligent platform. Factory intelligence upgrade Earlier in the year, UBTech unveiled a major advancement in humanoid robot collaboration, claiming the world’s first deployment of multiple humanoids working together across varied industrial tasks. Demonstrated at Zeekr’s 5G-enabled smart factory, the breakthrough centers on UBTech’s “BrainNet” framework, which orchestrates cooperative behavior through a cloud-device intelligence system. BrainNet integrates a “super brain” for high-level decision-making with an “intelligent sub-brain” for distributed multi-robot control. The super brain, powered by a proprietary large-scale multimodal reasoning model, handles complex production-line scheduling and decision-making. Meanwhile, the sub-brain coordinates real-time tasks using cross-field perception and Transformer-based control for dynamic adaptability. Together, they enable the Walker S1 humanoid robots to move beyond isolated operations and perform coordinated tasks with high precision and speed. The system is built on DeepSeek-R1 reasoning technology and trained on real-world data from automotive factory settings. Leveraging Retrieval-Augmented Generation (RAG), the model adapts to specific job functions and improves scalability across workstations. At Zeekr’s facility, dozens of Walker S1s now collaborate on tasks like assembly, inspection, and part handling. 
Using semantic VSLAM and shared mapping, they coordinate seamlessly via vision-based navigation and agile manipulation. UBTech says this marks a transition to “Practical Training 2.0,” where humanoid robots operate as a swarm, maximizing efficiency and setting the stage for next-generation intelligent manufacturing.

Owen Gregorian

35,637 views • 1 year ago

Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers: • 2,500 realistic kitchen environments; • 365 everyday tasks (basic skills + long-horizon mobile manipulation); • Over 3,200 objects with many articulated fixtures/appliances. All are designed for fully controlled, reproducible benchmarking of robotic policies. Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity. This benchmark comes with 2,200+ hours of demonstrations and 500K+ trajectories to support studies of multi-task training, pretraining, and continual learning at scale. Check it out at

Yuke Zhu

23,977 views • 4 months ago

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception — and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️ 1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning. 2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities. 3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable. The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

AI at Meta

453,260 views • 1 year ago

Robots from Humanoid passed real factory test at Siemens! 😮‍💨 Humanoid from the UK just completed a successful proof of concept with Siemens, deploying their HMND 01 wheeled humanoid robot in a real working factory. The task was simple but important: pick totes from a storage stack, transport them to a conveyor, and place them at the pickup point for human operators. Repeat until the stack is empty. The robot achieved 60 tote moves per hour, handled two different tote sizes, ran autonomously for more than 30 minutes at a time, stayed operational for over eight hours, and maintained over 90% success rate on pick and place tasks. This matters because it's not a demo video. It's a real deployment in Siemens' Electronics Factory in Erlangen, handling actual production work, measured against real operational metrics. The test ran in two phases. First, Humanoid built a physical twin in-house to test and optimize the system. Then they deployed on-site at Siemens for two weeks, where the robot operated in the real production environment. We can see a wheeled humanoid robot in action! So, now the ultimate question: wheels or legs? 👀 ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

21,993 views • 5 months ago

🚨 BREAKING: Microsoft's first robotics foundation model! 🤯 Microsoft just announced Rho-alpha (ρα), their first robotics model derived from the Phi series of vision-language models. Rho-alpha translates natural language commands into control signals for robotic systems performing bimanual manipulation tasks. Commands like "push the green button with the right gripper," "pull out the red wire," "flip the top switch on," or "turn the knob to position 5" get executed directly by dual-arm robots. What makes this different from standard vision-language-action (VLA) models is the additional modalities. Rho-alpha is a VLA+ model that adds tactile sensing to the perceptual mix, with plans to incorporate force feedback. On the learning side, the model is designed to continually improve during deployment by learning from human feedback. The training approach combines trajectories from physical demonstrations and simulated tasks with web-scale visual question answering data. Since teleoperation data is scarce and expensive, Microsoft is using NVIDIA Isaac Sim on Azure to generate physically accurate synthetic datasets via reinforcement learning. These simulated trajectories get combined with commercial and open physical demonstration datasets. The model is currently under evaluation on dual-arm setups and humanoid robots. Microsoft is opening an Early Access Program for organizations interested in evaluating Rho-alpha. Robots that can adapt to dynamic situations and human preferences are more useful in real environments and more trusted by the people operating them. Read more here: ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

60,912 views • 6 months ago

NEWS: Humanoid robotics company Figure has released Helix 02, which they claim is their most capable humanoid model yet. "A single neural system that controls the full body directly from pixels, enabling dexterous, long horizon autonomy across an entire room: • Autonomous, long‑horizon loco-manipulation: Helix 02 unloads and reloads a dishwasher across a full-sized kitchen - a four-minute, end-to-end autonomous task that integrates walking, manipulation, and balance with no resets and no human intervention. We believe this is the longest horizon, most complex task completed autonomously by a humanoid robot to date. • All sensors in. All actuators out: Helix 02 connects every onboard sensor - vision, touch, and proprioception - directly to every actuator through a single unified visuomotor neural network. • Human-like whole body control from human data: All results are enabled by System 0, a learned whole‑body controller trained on over 1,000 hours of human motion data and sim‑to‑real reinforcement learning. System 0 replaces 109,504 lines of hand‑engineered C++ with a single neural prior for stable, natural motion. • New classes of dexterity: With Figure 03’s embedded tactile sensing and palm cameras, Helix 02 performs manipulation that was previously out of reach: extracting individual pills, dispensing precise syringe volumes, and singulating small, irregular objects from clutter despite self‑occlusion. Helix 02 is trained on over 1,000 hours of human motion data and integrates vision, touch, and proprioception."

Sawyer Merritt

624,689 views • 6 months ago