With the recent progress in large-scale multi-task robot training, how can we advance the real-world deployment of multi-task robot fleets? Introducing Sirius-Fleet✨, a multi-task interactive robot fleet learning framework with 𝗩𝗶𝘀𝘂𝗮𝗹 𝗪𝗼𝗿𝗹𝗱 𝗠𝗼𝗱𝗲𝗹𝘀! 🌍 #CoRL2024

Huihan Liu

3,990 subscribers

28,043 views • 1 year ago • via X (Twitter)

Science & Technology • Education • News & Politics • #CoRL2024

10 comments

Huihan Liu • 1 year ago

Our recipe: Stage 1: Pre-train a visual world model on diverse datasets to 𝙥𝙧𝙚𝙙𝙞𝙘𝙩 𝙛𝙪𝙩𝙪𝙧𝙚 𝙤𝙪𝙩𝙘𝙤𝙢𝙚𝙨 across many tasks. Stage 2: Deploy a multi-task policy on a robot fleet, with anomaly predictors monitoring the fleet deployment via the visual world model.
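As a rough illustration of Stage 2, the sketch below shows how anomaly predictors might gate a deployed policy on a predicted future embedding. The names `monitor_step` and `MonitorDecision`, the score thresholds, and the rule "ask a human if either score is high" are all assumptions made for illustration; the thread does not describe the actual decision logic.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Embedding = Sequence[float]


@dataclass
class MonitorDecision:
    failure_score: float
    ood_score: float
    ask_human: bool


def monitor_step(
    predicted_embedding: Embedding,
    failure_predictor: Callable[[Embedding], float],
    ood_predictor: Callable[[Embedding], float],
    failure_threshold: float = 0.5,  # hypothetical threshold
    ood_threshold: float = 0.5,      # hypothetical threshold
) -> MonitorDecision:
    """Score one predicted future embedding and decide whether to call a human."""
    failure_score = failure_predictor(predicted_embedding)
    ood_score = ood_predictor(predicted_embedding)
    # Assumed rule: either anomaly type alone is enough to request help.
    ask_human = failure_score > failure_threshold or ood_score > ood_threshold
    return MonitorDecision(failure_score, ood_score, ask_human)


# Toy stand-ins for trained predictors (purely illustrative).
def toy_failure_predictor(z: Embedding) -> float:
    return 0.9 if z[0] < 0 else 0.1


def toy_ood_predictor(z: Embedding) -> float:
    return 0.8 if max(abs(v) for v in z) > 10 else 0.2


nominal = monitor_step([1.0, 2.0], toy_failure_predictor, toy_ood_predictor)
risky = monitor_step([-1.0, 2.0], toy_failure_predictor, toy_ood_predictor)
unseen = monitor_step([1.0, 50.0], toy_failure_predictor, toy_ood_predictor)
```

In this toy setup only `risky` and `unseen` would trigger a request for human help; the policy keeps running autonomously on `nominal`.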

Huihan Liu • 1 year ago

The visual world model predicts future embeddings. It learns a latent space using image reconstruction and predicts future embeddings with a conditional VAE (cVAE). We train anomaly predictors, including failure and Out-of-Distribution (OOD) predictors, on the frozen future embeddings.
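One simple way to picture an OOD predictor over frozen embeddings is a nearest-neighbor distance score: an embedding far from everything seen in training gets a high score. This is a generic technique swapped in for illustration, not necessarily what Sirius-Fleet uses; the function name `knn_ood_score` and the toy 2-D embeddings are assumptions.

```python
import math
from typing import Sequence


def knn_ood_score(
    query: Sequence[float],
    reference: Sequence[Sequence[float]],
    k: int = 1,
) -> float:
    """Mean Euclidean distance from `query` to its k nearest reference embeddings."""
    if not reference:
        raise ValueError("reference set must not be empty")
    distances = sorted(math.dist(query, r) for r in reference)
    k = max(1, min(k, len(distances)))
    return sum(distances[:k]) / k


# Toy "training" embeddings; real embeddings would come from the world model.
train_embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

in_dist_score = knn_ood_score((0.1, 0.1), train_embeddings)
far_score = knn_ood_score((3.0, 4.0), train_embeddings)
```

A higher score means the query lies farther from the reference set; a deployment would still need a threshold chosen on held-out data to turn the score into an OOD flag.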

Huihan Liu • 1 year ago

The anomaly predictors and the multi-task policy are fine-tuned over time. Over deployment, Sirius-Fleet improves on 1️⃣Combined Policy Performance for human-robot teaming, 2️⃣Autonomous Policy Performance for policy success, and 3️⃣Return of Human Effort for human use efficiency.

Huihan Liu • 1 year ago

Check out our paper and the project website for more info! 📃 🌐 A huge thank you to the team @YZ_Franklin, Vaarij Betala, Evan Zhang, James Liu, Crystal Ding, and @yukez ! @texas_robotics

Appy Pie • 1 year ago

Impressive! Visual World Models could truly revolutionize multi-task robot fleets.

عرفان Erfan • 1 year ago

Cool

Ing Gan • 1 year ago

Cool work. Is the world model trained on observations from the same environment (i.e., trained separately for open door / turn on microwave)?

Huihan Liu • 1 year ago

Thanks! The world model is trained on a wide range of tasks, not specific to one particular environment.

Heeger • 1 year ago

Cool work! When will you present this work?

Huihan Liu • 1 year ago

Thanks! It will be Thursday afternoon, Session 3 at CoRL :)
