🚨 BREAKING: Microsoft's first robotics foundation model! 🤯 Microsoft just announced Rho-alpha (ρα), their first robotics model derived from the Phi series of vision-language models. Rho-alpha translates natural language commands into control signals for robotic systems performing bimanual manipulation tasks. Commands like "push the green button with the right gripper," "pull out the red wire," "flip the top switch on," or "turn the knob to position 5" get executed directly by dual-arm robots. What makes this different from standard vision-language-action (VLA) models is the additional modalities. Rho-alpha is a VLA+ model that adds tactile sensing to the perceptual mix, with plans to incorporate force feedback. On the learning side, the model is designed to continually improve during deployment by learning from human feedback. The training approach combines trajectories from physical demonstrations and simulated tasks with web-scale visual question answering data. Since teleoperation data is scarce and expensive, Microsoft is using NVIDIA Isaac Sim on Azure to generate physically accurate synthetic datasets via reinforcement learning. These simulated trajectories get combined with commercial and open physical demonstration datasets. The model is currently under evaluation on dual-arm setups and humanoid robots. Microsoft is opening an Early Access Program for organizations interested in evaluating Rho-alpha. Robots that can adapt to dynamic situations and human preferences are more useful in real environments and more trusted by the people operating them. Read more here: ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

57,945 subscribers

60,912 views • 6 months ago • via X (Twitter)

News and Politics, Education, Science and Technology

Related videos

The next evolution: VLA+ models Just yesterday Microsoft Research released Rho-alpha (ρα) – their first robotics model, built on the Phi family. While most Vision-Language-Action (VLA) models stop at vision and language, Rho-alpha adds: ▪️ Tactile sensing to feel objects during manipulation ▪️ Online learning that lets it improve from human corrections (via teleoperation, 3D mouse or other tools) in real time, even after deployment. Both additions make adaptability central rather than incidental. Microsoft calls it a VLA+ model, positioning it as an extension beyond what current VLA systems support. ➡️ Today Rho-alpha can control dual-arm robot setups to perform tasks such as: • Manipulating the BusyBox following natural-language instructions • Plug insertion • Toolbox packing and object arrangement with bimanual coordination But to understand why this "plus" matters, we need to understand what came before. Here, we'll take you through the entire landscape of VLA models – Gemini Robotics, π0, SmolVLA, Helix, ACoT-VLA and others:

Turing Post

62,362 views • 6 months ago

Robots might learn better from video than from language! 📼 Most Vision-Language-Action (VLA) models learn what to do from text, but still struggle with how things move in the real world. That makes them data-hungry and slow to train. mimic video takes a different route. Instead of grounding robot control in text, it grounds it in video, using large pre-trained video models that already capture physical motion and dynamics. The idea is straightforward: let the video model handle “what will happen next,” and let a smaller control model focus only on turning that visual plan into robot actions. The result is big gains in practice. Robots trained this way need 10× less data, converge twice as fast, and perform better on both simulated benchmarks and real bimanual manipulation tasks. If robots can “imagine” motion using video, control becomes a much simpler problem. Shoutout to Jonas Pai, Liam Achenbach, Oier Mees, Elvis Nava and the rest of the team! Here's the project page: ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

49,920 views • 7 months ago

Brain-controlled exoskeletons to train humanoid robots! 🧠 Fourier just presented human tele-operators using brain control interfaces and exoskeletal arms to train humanoid robots on home tasks. The brain control interface is the interesting part. Instead of using a controller or joystick to teleoperate, the operator's movements and intentions are captured more naturally through the exoskeleton and BCI. This means the demonstrations are more fluid, more human-like, and better suited for training robots to perform delicate home tasks. Multiple tele-operators are simultaneously generating training data across multiple robots. This is how you build the dataset needed for eventual full autonomy, without waiting years for it to arrive. This might be the bridge between "robots that work in controlled environments" and "robots that work in homes." Not full autonomy right away, but trusted human intelligence operating through a robot body, getting better with every task completed. ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

20,898 views • 5 months ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight: - Real humanoid teleoperation data. - Large-scale simulation data: we are open-sourcing 300K+ trajectories! - Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”! - Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos. GR00T N1 is a single end-to-end neural net, from photons to actions: - Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions. - Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2. We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right. Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵
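
The split described above, a slower reasoning model that sets a plan and a faster model that emits motor actions, can be pictured as a two-rate loop. What follows is a toy sketch, not NVIDIA's implementation: only the 120 Hz action rate comes from the post, while the 10 Hz planner rate, the function names, and the scalar "state" are assumptions made purely for illustration.

```python
# Toy sketch of a dual-rate control loop. Only the 120 Hz action rate comes
# from the post; the 10 Hz planner rate and every name here are assumptions.
ACTION_HZ = 120          # System 1 rate stated in the post
PLANNER_HZ = 10          # hypothetical System 2 rate, chosen for illustration
TICKS_PER_PLAN = ACTION_HZ // PLANNER_HZ


def slow_planner(observation: float) -> float:
    """Stand-in for System 2: returns a latent 'plan' (here, a target value)."""
    return observation + 1.0


def fast_controller(plan: float, state: float) -> float:
    """Stand-in for System 1: one small action step toward the plan."""
    return 0.1 * (plan - state)


def run(seconds: float, state: float = 0.0):
    """Simulate the loop; returns (final_state, planner_calls, action_calls)."""
    planner_calls = 0
    action_calls = 0
    plan = state
    for tick in range(int(seconds * ACTION_HZ)):
        if tick % TICKS_PER_PLAN == 0:
            # Refresh the plan at the slow rate.
            plan = slow_planner(state)
            planner_calls += 1
        # Act at the fast rate, always following the most recent plan.
        state += fast_controller(plan, state)
        action_calls += 1
    return state, planner_calls, action_calls
```

Under these assumed rates, one simulated second calls the planner 10 times and the controller 120 times, so the fast loop keeps acting on the latest plan between planner updates.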

Jim Fan

466,148 views • 1 year ago

VLAI Robotics Launches High-Dexterity Dual-Arm Robot Starting at ~$5,500 USD VLAI Robotics, utilizing the open-source design of Japan’s OpenArm team, has introduced a high-dexterity, low-cost bimanual robotic system. The product is priced from 39,900 RMB (approximately 5,500 USD), making high-level research tools accessible to institutions and developers. The robot features a human-scale, arm-hand integrated design with up to 16 DOF (8 DOF per arm, including the gripper). It can handle a dual-arm peak payload of 12 kg while maintaining high precision. A key breakthrough is its ability to replicate human upper limb movement trajectories, which enhances data quality for imitation learning and remote operation. VLAI Robotics handled the domestic engineering, manufacturing, and VLA (Vision-Language-Action) algorithm integration, with quality control organized according to the strict standards of the OpenArm R&D framework. The robot is highly adaptable for research, education, and Physical AI training, supporting easy expansion for features like VR teleoperation.

RoboHub🤖

12,961 views • 9 months ago

We might be solving the wrong problem in robotics. That’s what this makes clear. UMI → Universal Manipulation Interface A simple $400 gripper that lets you teach robots by demonstration. You hold it like a tool. Show the task. The robot learns. No teleoperation. No expensive hardware. No robot-specific data. Stanford open-sourced everything → hardware, code, datasets. What stands out to me is the bottleneck. Not algorithms. Data. Teleoperation → ~35 demos/hour UMI → ~111 demos/hour And the data transfers across robots → UR5, Franka, others. The design is surprisingly practical: → GoPro fisheye lens (155° FOV) + mirrors for depth → SLAM + IMU for precise 6DoF tracking → latency matching for dynamic tasks → diffusion policies for multimodal actions Then it scales. Cheng Chi takes this further with Sunday Robotics (with Tony Zhao). A $200 glove → deployed in 500+ homes → ~10 million real-world interactions. Not lab data. Real human behavior. Their robot learns dishes, laundry, espresso → with zero robot-specific data. This is where the shift becomes obvious. From training robots in controlled environments → to learning directly from humans at scale So here’s the real question: Will robotics be unlocked by better models… or by unlocking data? #ArtificialIntelligence #Robotics #AI #Innovation #FutureOfWork
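
The throughput figures quoted above can be turned into a quick back-of-the-envelope comparison. This is only a sketch: the two rates are the approximate numbers from the post, while the 1,000-demonstration target and the helper names are hypothetical.

```python
# Back-of-the-envelope comparison of demonstration collection rates.
TELEOP_DEMOS_PER_HOUR = 35   # approximate figure quoted in the post
UMI_DEMOS_PER_HOUR = 111     # approximate figure quoted in the post
TARGET_DEMOS = 1000          # hypothetical dataset size, for illustration only


def speedup(new_rate: float, old_rate: float) -> float:
    """How many times faster the new collection method is."""
    return new_rate / old_rate


def hours_needed(num_demos: int, demos_per_hour: float) -> float:
    """Collection time in hours for a given number of demonstrations."""
    return num_demos / demos_per_hour


if __name__ == "__main__":
    ratio = speedup(UMI_DEMOS_PER_HOUR, TELEOP_DEMOS_PER_HOUR)
    print(f"Speedup: {ratio:.2f}x")
    print(f"Teleoperation: {hours_needed(TARGET_DEMOS, TELEOP_DEMOS_PER_HOUR):.1f} h")
    print(f"UMI: {hours_needed(TARGET_DEMOS, UMI_DEMOS_PER_HOUR):.1f} h")
```

At the quoted rates the handheld gripper collects demonstrations roughly 3.2 times faster, so the hypothetical 1,000-demo dataset drops from about 28.6 hours of collection to about 9 hours.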

Pascal Bornet

186,223 views • 3 months ago

Foundation models are enough to solve robotics! Unfortunately, this is not true. We keep hearing that Vision-Language-Action (VLA) models struggle because of the gap between static training and the dynamic real world. A German startup (Sereact) just released a solution that bridges this gap perfectly. They are introducing a new paradigm called Interactive RL Policy Patching. It's a distributed framework that allows robots to self-learn from human corrections without needing full retraining. When a robot fails, a human operator provides a brief "patch" or demonstration. The system then uses online off-policy reinforcement learning to update the behavior instantly. This is powered by a massive foundation model trained on hundreds of millions of interactions from over 100 deployed robot stations. The best part is the distributed parameter synchronization... When one robot learns a fix, the update is published fleet-wide... meaning the entire swarm gets smarter from a single human intervention. They are already proving this on complex manipulation tasks like shoe unboxing and screw sorting, drastically reducing the data needed to handle edge cases. Real-world environments are unforgiving, and I love seeing systems that can actually adapt on the fly! 📍 More info:

Ilir Aliu

18,210 views • 5 months ago

🚨 BREAKING: ABB Robotics + NVIDIA close the sim-to-real gap with 99% accuracy! 👾 ABB Robotics is integrating NVIDIA Omniverse libraries into RobotStudio to deliver physical AI for industry, closing the gap from virtual training to real-world deployment with up to 99% accuracy. RobotStudio HyperReality, available in the second half of 2026, will fundamentally change how quickly manufacturers can scale production: reducing costs by up to 40%, accelerating time-to-market by 50%, and cutting setup and commissioning times by up to 80%. For decades, the gap between simulation accuracy and real-world lighting, materials, and environments has limited manufacturers' ability to design advanced manufacturing processes in the virtual world. ABB is the only robot manufacturer with a virtual controller running the same firmware as the hardware, which ensures near-perfect correlation between simulation and real-world performance. The system uses physically accurate simulations and foundation models continuously optimized with real-world data feedback. These models can train any number of ABB robots anywhere in the world with industrial-grade reliability. Foxconn is using RobotStudio HyperReality for consumer electronics assembly. Assembly robots are trained virtually using synthetic data to perfect multiple production processes across various scenarios, then moved to production lines with 99% accuracy. This eliminates physical training and tests, reducing setup times and costs. Workr is demonstrating AI-powered robotic systems at NVIDIA GTC 2026. They are built on ABB technology, trained with synthetic data using NVIDIA Omniverse, and deployed without operators needing programming knowledge. 🚨 I’ll be onsite in San Jose during GTC 2026, and will be showing all the cool stuff that ABB Robotics prepared this year! Can’t wait! 🫡 ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

22,482 views • 4 months ago

Experiments in progress. The one on the right has been learning for ~3 hours, the one in the middle for ~1 hour, and the one on the left just started a few minutes ago. The initial motivation for making the physical Atari was just to commit ourselves to a subset of algorithms that can make progress in this setup. This commitment rules out algorithms that require billions of samples to learn (or worse, require multiple environments running in parallel). Atari games are simple enough that we should be able to show learning on them in a short amount of time with no prior knowledge. Since then, I've realized that this setup is also a good way to compare different paradigms in robotics in a principled way. These paradigms are sim2real, learning from tele-operated data, and learning directly on the robots. So far, I have observed that getting sim2real to work reliably is hard. It requires tweaks that don't scale. Policies that can play perfectly in simulation fall apart because of latencies and the messiness of the real world. These aspects could be modeled to improve the simulation, but not without sinking significant human engineering hours. I have higher hopes for learning from tele-operated data, but that requires a human to learn the task first. These experiments are on my to-do list. I have to learn to play some of the games well through the robot. I’m half-decent at playing Pong and Ms Pacman now. Learning directly on robots is looking like the most promising approach. This approach takes away pesky distribution shifts and makes it possible to have algorithms that continually improve with more data and time without any human intervention. It feels great to let experiments run overnight and wake up to find improved policies. With learning on robots, I should, in principle, be able to go on a long vacation and come back to find better policies for complex tasks beyond Atari games. Whether that is possible with current learning algorithms is a different question.

Khurram Javed

52,110 views • 7 months ago

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & video tasks. 2️⃣ Meta Perception Language Model: A fully open & reproducible vision-language model designed to tackle visual recognition tasks. 3️⃣ Meta Locate 3D: An end-to-end model for accurate object localization in 3D environments. 4️⃣ Releasing model weights for our 8B-parameter Dynamic Byte Latent Transformer, an alternative to traditional tokenization methods with the potential to redefine the standards for language model efficiency and reliability. 5️⃣ Collaborative Reasoner: A framework for evaluating & improving collaborative reasoning skills in language models. Download the code, datasets, and research papers and learn more about how these artifacts are paving the way for more efficient and accurate AI systems. ➡️

AI at Meta

163,313 views • 1 year ago

🔥 JUST IN: Open-source robotics dataset from 100% real-world scenarios! 🤯 Chinese robotics company AGIBOT just released AGIBOT WORLD 2026, an open-source dataset systematically covering key embodied AI research directions. It is built entirely from real-world environments: commercial spaces and homes. The data is collected using AGIBOT G2 robots in free-form collection mode, providing structured, accurately annotated, high-quality data. Digital twin technology creates 1:1 scale replicas in simulation matching the real environments. Both real-world and simulation data are open-sourced. The AGIBOT G2 platform collects multiple data types simultaneously: RGB(D) cameras, tactile sensors, force sensors, LiDAR, IMU, and full-body joint states. Whole-body control coordinates arms, waist, and hands for complex tasks. First-person teleoperation lets operators control the robot from its perspective. The tasks covered are fine-grained manipulation, ultra-long-horizon tasks, spatial navigation, dual-arm coordination, and multi-agent/human-robot collaboration. The dataset includes error-recovery trajectories with annotations. Most datasets only show successful demonstrations. AGIBOT includes failures and how the robot recovers, teaching models how to handle mistakes. After collection, data is tested through policy training and real-robot deployment to ensure quality. It is then processed through industrial quality control with multiple screening and cleaning rounds. Making it open-source accelerates embodied AI research by giving researchers access to high-quality real-world robot data at scale. 🇨🇳 Learn more here: ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

40,583 views • 3 months ago

Teaching by showing! 👨🏼‍🏫 Nordbo Robotics is making robotics accessible to all skill levels. Mimic is an intuitive tool that lets users record and automate tasks without programming experience. By integrating Mimic with robotic systems, companies save time because operators can record movements directly, eliminating the need for extensive robot programming. They cover various industrial tasks like sanding, polishing, painting, and grinding, offering automation solutions for tasks traditionally requiring skilled human labor. 🧽 As you can see, the Universal Robots arm recreated the demonstrated path very accurately. Users can program Mimic-enabled robots with flexible options, including point-to-point movements, path generation, or demonstration. P.S. My first deburring :)

Lukas Ziegler

15,923 views • 3 months ago

Humanoid Robots Get a Soul: Inside AheadForm's Empathy Engine
Newcomer AheadForm just secured two funding rounds in three months, highlighting strong investor confidence in their unique approach to humanoid robots. Founded by Columbia University Ph.D. Hu Yuhang, the company is betting that the key to mainstream adoption isn't just function, but empathy. They've built their tech on three core pillars:
► Autonomous Learning: Robots can learn and adapt on their own through "mirror self-perception" and self-modeling.
► Emotional Foundation Model: A model that processes multimodal data to give robots the ability to understand and respond with natural empathy.
► Bionic Face Hardware: A self-developed, highly realistic face designed to cross the "uncanny valley" and spark genuine human connection.
Their core product, the Emo robot, is a bionic platform with real-time expression feedback. AheadForm believes this "Humanoid Empathy Value" is the key to commercial success, starting with applications in brand stores and theme parks. Internal testing is set to begin later this year. They also just announced a new humanoid model, the "Elf·Xuan," a new milestone for their "Elf Project," which began in 2024 to turn virtual inspiration into a physical bionic robot.

RoboHub🤖

66,450 views • 10 months ago

We are back. After one year of quiet building. Introducing GENE-26.5, our first robotic brain that takes a major step toward human-level capability. For years, robotics has struggled to learn from the world's largest and most valuable data source: Humans. Solving it means rethinking the whole stack from the ground up:
- A robotics-native foundation model.
- A 1:1 human-like robotic hand.
- A noninvasive data collection glove for motion, force, and touch.
- A simulator that turns weeks of experiments into minutes.
GENE-26.5 is trained across language, vision, proprioception, tactile, and action. We designed a set of tasks to test how far we can go with this new paradigm. Fully autonomous, 1x speed, one model, same weights. (Enjoy with sound on) We are approaching the endgame for robotics. And this is just the beginning.

Genesis AI

2,716,210 views • 2 months ago

Robots are getting smarter, but most still fail the same way. They don't learn from their own mistakes. A new paper proposes something different: a way for robots to self-improve directly from their failures in the real world. It's called PLD (Probe, Learn, Distill). The idea: instead of collecting endless human demos, let the robot figure out where it fails, learn how to recover, and then distill that knowledge back into its main model. Key takeaways from the research: ✅ Uses residual reinforcement learning to recover from policy failures ✅ Achieves 99% success on LIBERO and 100% on real Franka and YAM arms ✅ Runs hour-long manipulation tasks without human resets ✅ Builds a feedback loop between real-world data and model adaptation. Unlike supervised fine-tuning, which relies on humans, PLD learns from the robot's own experience. By training on its own failure distribution, the model becomes both more efficient and more aligned with the real world. It's not just a technical shift, it's a step toward robots that improve themselves through real-world practice. Thanks for sharing, Wenli Xiao! Paper and demos:
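The probe, learn, distill loop described in the post can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 1-D point task, the `base_policy`, `rollout`, `learn_residual`, and `distill` helpers, and the use of simple hill climbing in place of the residual reinforcement learning the paper reports. It is not the authors' implementation.

```python
import random

def base_policy(x, target):
    """Frozen base policy (hypothetical): heads to the target with too little gain."""
    return 0.5 * (target - x)

def rollout(residual_gain, steps=5, target=1.0):
    """Run base + residual on a 1-D point; reward is the negative final error."""
    x = 0.0
    for _ in range(steps):
        err = target - x
        x += base_policy(x, target) + residual_gain * err
    return -abs(target - x)

def learn_residual(iterations=200, step=0.05, seed=0):
    """Learn: hill-climb a residual gain (a stand-in for residual RL)."""
    rng = random.Random(seed)
    gain, best = 0.0, rollout(0.0)
    for _ in range(iterations):
        candidate = gain + rng.uniform(-step, step)
        score = rollout(candidate)
        if score > best:  # keep only corrections that reduce the failure
            gain, best = candidate, score
    return gain, best

def distill(residual_gain, steps=5, target=1.0):
    """Distill: fit one gain to the combined base + residual actions (least squares)."""
    x, num, den = 0.0, 0.0, 0.0
    for _ in range(steps):
        err = target - x
        action = base_policy(x, target) + residual_gain * err
        num += action * err
        den += err * err
        x += action
    return num / den

probe_reward = rollout(0.0)            # Probe: how badly does the base policy fail?
gain, learned_reward = learn_residual()
student_gain = distill(gain)
print(probe_reward, learned_reward, student_gain)
```

The base policy stays frozen throughout. Only the small residual is trained, and the distilled student ends up as a single policy again.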

Ilir Aliu

22,812 views • 8 months ago

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion, to discuss the gap between current robotic capabilities and what's required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from "solved." We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training, discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach that utilizes pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares what goes on behind the scenes of humanoid robot demos, his take on reinforcement learning in simulation versus the real world, the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today.
🗒️ For the full list of resources for this episode, visit the show notes page:
📖 CHAPTERS
===============================
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla
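The "real-to-sim" idea from the episode description, using real-world data to refine simulation parameters, can be sketched as a small parameter fit. This is a minimal illustration under stated assumptions: the one-parameter damped-velocity simulator, the `simulate` and `fit_damping` helpers, and the synthetic "real" log are all hypothetical and are not Flexion's pipeline.

```python
def simulate(damping, v0=1.0, dt=0.1, steps=20):
    """Toy simulator (assumed): velocity decays by a damping factor each step."""
    v, traj = v0, []
    for _ in range(steps):
        v = v * (1.0 - damping * dt)
        traj.append(v)
    return traj

def fit_damping(real_traj, candidates):
    """Pick the damping value whose simulated rollout best matches the real log."""
    def loss(d):
        sim = simulate(d, steps=len(real_traj))
        return sum((s - r) ** 2 for s, r in zip(sim, real_traj))
    return min(candidates, key=loss)

# Stand-in for logged robot data: generated here with a "true" damping of 0.8.
real_log = simulate(0.8)

# Grid search over candidate simulator parameters 0.0, 0.1, ..., 2.0.
candidates = [i / 10 for i in range(21)]
best_damping = fit_damping(real_log, candidates)
print(best_damping)
```

A real system would fit many parameters at once (friction, motor delays, masses) and would likely use a gradient-based or sampling-based optimizer instead of a grid.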

The TWIML AI Podcast

22,533 views • 6 months ago

Turning video into humanoid robot motion! 🤳🏼 Training humanoid robots needs huge amounts of motion data, but real-world capture doesn't scale. Mocap is expensive, dangerous edge cases are rare, and you can't ask humans to repeatedly fall or crash. Video2Robot tackles this by converting videos into physics-grounded humanoid simulations. Motion is generated to respect balance, inertia, ground contact, and joint limits, then directly retargeted to robot simulators. One prompt can generate a full humanoid motion sequence, including multi-agent interactions and failure cases like falls or collisions, scenarios that are hard or impossible to capture safely in the real world. The pipeline is model-agnostic and works with existing video generators, making it a practical way to scale data for robots. If robots are going to operate in the real world, they need to be trained on the failures too, not just the perfect demos. Here's the GitHub:

Lukas Ziegler

59,708 views • 7 months ago

Not the flashiest demos, but what's under the hood represents a foundational shift for general-purpose robotics. World models are the next-gen foundation of Physical AI, not the VLM backbones found in typical VLAs. DreamZero is a 14B-parameter World Action Model (WAM) by NVIDIA that treats robotics as a joint video-and-action prediction task. Unlike traditional Vision-Language-Action (VLA) models that map images directly to motor commands, DreamZero leverages a pretrained video diffusion backbone to predict future world states and actions simultaneously.
- achieves 2× better zero-shot generalization to unseen tasks and environments compared to state-of-the-art VLAs.
- learns effectively from heterogeneous, non-repetitive data (500 hours), breaking the need for thousands of repeated demonstrations.
- adapts to new robot embodiments with just 30 minutes of play data.
- enables 7Hz closed-loop control via system optimizations and "DreamZero-Flash," making high-capacity diffusion models viable for real-time use.

The Humanoid Hub

35,204 views • 5 months ago

VLA-JEPA just dropped in LeRobot 🤖 What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics. During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head. The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on NVIDIA Robotics DGX Spark! VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀 Thomas Wolf clem 🤗

LeRobot

318,321 views • 1 month ago