Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇

Jesse Zhang

1,673 subscribers

24,662 views • 3 years ago • via X (Twitter)

7 comments

Jesse Zhang • 3 years ago

Labeling demonstrations with natural language instructions in hindsight is standard, but it is tedious and expensive to scale. We propose automatically (1) **relabeling** language instructions and (2) **chaining** trajectories together to generate more training data.

Jesse Zhang • 3 years ago

(1) Relabeling: If we have two skills, "Put mug in coffee machine" and "Press brew button," we could call the two performed in sequence "Make Coffee." In SPRINT, we do this relabeling **automatically** by prompting an LLM to summarize nearby instructions. This gives us 2-2.5X more pre-training data!
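A minimal Python sketch of the relabeling idea, assuming each trajectory is already split into language-annotated skill segments. `Segment`, `build_prompt`, `relabel`, and the prompt wording are hypothetical illustrations rather than SPRINT's code, and the `summarize` callable stands in for whatever LLM is queried.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    instruction: str  # natural-language label for one skill
    start: int        # first timestep of the skill (inclusive)
    end: int          # last timestep of the skill (inclusive)


def build_prompt(instructions: List[str]) -> str:
    # Hypothetical prompt wording, not the one used in SPRINT.
    steps = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(instructions))
    return (
        "Summarize the following steps as one high-level instruction:\n"
        f"{steps}\nSummary:"
    )


def relabel(segments: List[Segment], summarize: Callable[[str], str]) -> List[Segment]:
    """Create one new segment for every run of two or more consecutive skills.

    The new segment spans the whole run and is labeled with the summary that
    `summarize` (a stand-in for an LLM call) returns for the run's prompt.
    """
    new_segments = []
    for i in range(len(segments)):
        for j in range(i + 2, len(segments) + 1):  # run = segments[i:j], length >= 2
            run = segments[i:j]
            label = summarize(build_prompt([s.instruction for s in run])).strip()
            new_segments.append(Segment(label, run[0].start, run[-1].end))
    return new_segments


if __name__ == "__main__":
    demo = [
        Segment("Put mug in coffee machine", 0, 9),
        Segment("Press brew button", 10, 14),
    ]
    # A fixed stub replaces the LLM so the example runs offline.
    print(relabel(demo, lambda prompt: "Make coffee"))
    # prints [Segment(instruction='Make coffee', start=0, end=14)]
```

With n annotated skills in one trajectory this sketch yields n(n-1)/2 extra segments, so a real pipeline would likely cap the run length or sample runs to bound the number of LLM calls.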

Jesse Zhang • 3 years ago

(2) Chaining: Offline RL can "stitch" trajectories to learn new behaviors. We carefully relabel rewards and modify language instructions so that offline RL can stitch trajectories even when the policy is language-conditioned!
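One way to make this reward relabeling concrete, as a minimal Python sketch: it assumes sparse per-instruction rewards and a learned value function passed in as `q_fn`. The names `Transition`, `chain_instruction`, and `chain`, and the instruction template, are hypothetical, and bootstrapping the terminal reward from `q_fn` is an assumption about how stitching can be enabled, not a verbatim account of SPRINT's rule.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Transition:
    obs: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    done: bool
    instruction: str


def chain_instruction(first: str, second: str) -> str:
    # Hypothetical template; an LLM summary could be used instead.
    return f"{first}, then {second[:1].lower()}{second[1:]}"


def chain(
    traj_a: List[Transition],
    instr_b: str,
    q_fn: Callable[[Tuple[float, ...], Tuple[float, ...], str], float],
) -> List[Transition]:
    """Relabel trajectory A for the chained instruction "A, then B".

    A never executes B, so there is no true reward for finishing B. Instead,
    the last transition of A receives q_fn's estimate of the value of doing
    B from A's final state and is marked terminal, so the offline RL target
    for that step is this estimate alone. Earlier steps get reward 0.
    """
    if not traj_a:
        raise ValueError("trajectory A must be non-empty")
    chained = chain_instruction(traj_a[0].instruction, instr_b)
    relabeled = [
        replace(t, instruction=chained, reward=0.0, done=False) for t in traj_a[:-1]
    ]
    last = traj_a[-1]
    relabeled.append(
        replace(
            last,
            instruction=chained,
            reward=q_fn(last.obs, last.action, instr_b),
            done=True,
        )
    )
    return relabeled
```

In a training loop, `q_fn` would presumably be the critic queried with B's original instruction; for a quick check any callable works, e.g. `lambda obs, act, z: 0.8`. The design choice here is that value from B flows back into A's steps through ordinary bootstrapping, even though A and B were never recorded back to back.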

Jesse Zhang • 3 years ago

Results: Overall, this lets us achieve 2-8X better zero-shot long-horizon task execution in ALFRED, a realistic household simulator, and on a real robot setup! SPRINT agents also fine-tune more efficiently to new tasks in unseen environments! ALFRED results:

Jesse Zhang • 3 years ago

Real robot results: With offline fine-tuning, SPRINT achieves superior performance on new, long-horizon manipulation tasks in previously unseen environments!

Jesse Zhang • 3 years ago

For more details about SPRINT and the experiment results, please see our paper or website. Paper: Website: Work done in collaboration with @KarlPertsch, @JiahuiZhang_32, @JosephLim_AI. @JiahuiZhang_32 is applying to PhD programs this year!

OliviaLi • 2 years ago

This sounds great: humans won't have to do such a large amount of work every day, we can just let the robot do it.

Related videos

How to teach robots to perform long-horizon tasks efficiently and robustly🦾? Introducing MimicPlay - an imitation learning algorithm that uses "cheap human play data". Our approach unlocks both real-time planning through raw perception and strong robustness to disturbances!🧵👇

Chen Wang

288,500 views • 3 years ago

Wouldn't it be great if we could train robots without any teleoperation? In our latest paper, we train robots to mimic a human video of the task by simply matching the object features using RL. We only need one video and under an hour of robot training.

Lerrel Pinto

46,197 views • 1 year ago

HDMI (HumanoiD iMitation for Interaction) is a framework enabling humanoid robots to learn whole-body object interaction skills from monocular RGB human videos. It extracts and retargets human poses and object trajectories using GVHMR and LocoMujoco, building reference datasets with contact annotations. The data is used to train an RL policy via robot-object co-tracking. HDMI achieved 67 consecutive door traversals.

The Humanoid Hub

17,395 views • 8 months ago

Amazon is training humanoids to move boxes. Makes sense! OmniRetarget is a data generation engine that enables complex loco-manipulation for humanoids. It uses offline retargeting from human MoCap datasets and augments data from single demos to produce 8 hours of trajectories that train RL policies in simulation, transferred zero-shot to real robots. Amazon FAR is the primary affiliation for the paper:

The Humanoid Hub

106,447 views • 8 months ago

One of the biggest bottlenecks in deploying visual AI and computer vision is annotation, which can be both costly and time-consuming. Today, we’re introducing Verified Auto Labeling, a new approach to AI-assisted annotation that achieves up to 95% of human-level performance while cutting labeling costs by up to 100,000x and time by 5,000x. Read the full paper:

Voxel51

12,621 views • 1 year ago

Introducing GRID: the General Robot Intelligence Development platform, designed for prototyping smart and safe robots rapidly using foundation models, LLMs, and simulation. Paper: Try now: GitHub: 🧵👇(1/N)

Sai Vemprala

277,281 views • 2 years ago

Our course recommendation of the day is “Post-training of LLMs,” where you’ll learn how to customize pre-trained language models using Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL). You'll learn when to use each method, how to curate training data, and how to implement them in code to shape model behavior effectively. Enroll here:

DeepLearning.AI

29,369 views • 8 months ago

A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)

Jesse Zhang

99,840 views • 3 months ago

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us? Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

Siddhant Haldar

69,031 views • 1 year ago

Imagine robots learning new skills—without any robot data. Today, we're excited to release EgoZero: our first steps in training robot policies that operate in unseen environments, solely from data collected through humans wearing Aria smart glasses. 🧵👇

Lerrel Pinto

42,538 views • 1 year ago

Cooking in kitchens is fun. BUT doing it collaboratively with two robots is even more satisfying! We introduce MOSAIC, a modular framework that coordinates multiple robots to closely collaborate and cook with humans via natural language interaction and a repository of skills.

Sanjiban Choudhury

26,373 views • 2 years ago

Brain-controlled exoskeletons to train humanoid robots! 🧠 Fourier just presented human tele-operators using brain control interfaces and exoskeletal arms to train humanoid robots on home tasks. The brain control interface is the interesting part. Instead of using a controller or joystick to teleoperate, the operator's movements and intentions are captured more naturally through the exoskeleton and BCI. This means the demonstrations are more fluid, more human-like, and better suited for training robots to perform delicate home tasks. Multiple tele-operators are simultaneously generating training data across multiple robots. This is how you build the dataset needed for eventual full autonomy, without waiting years for it to arrive. This might be the bridge between "robots that work in controlled environments" and "robots that work in homes." Not full autonomy right away, but trusted human intelligence operating through a robot body, getting better with every task completed. ~~ ♻️ Join the weekly robotics newsletter, and never miss any news →

Lukas Ziegler

20,874 views • 4 months ago

Train the AI behind real-world robots. We’re hiring Data Labelers to annotate images and videos to train Optimus & autonomous systems. You’ll work with real production data & directly impact how these systems learn and operate in the real world. Come join!

Tesla Recruiting

625,209 views • 1 month ago

I need to annotate some images for training a computer vision model. There are many powerful annotation platforms available, but I want to keep my images local. I added a new section to my CV Streamlit app to quickly annotate images and train a YOLO model in a few clicks.

Marco Franzon

29,530 views • 5 months ago

We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data! This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.

Physical Intelligence

1,181,601 views • 6 months ago

Introducing LDA, a latent world action foundation model that, for the first time, unifies the utilization of heterogeneous embodied data across simulation and reality, humans and robots, and varying levels of action quality and annotation. By breaking long-standing data silos in embodied intelligence, LDA enables the field, much like GPT did for language, to benefit continuously from scaling data, marking the transition into a new era of scalable learning. #Galbot #Robotics #Innovation #AI #Technology #Humanoid #WorldModel

Galbot

37,804 views • 1 month ago

🤖WE HUMANS ARE TRAINING OUR AI REPLACEMENTS Humanoid robots are learning by watching humans work. Workers wear devices that capture egocentric data, showing robots exactly how tasks look from a human perspective. We are not just building robots. We are handing them the playbook for our jobs.

Coin Bureau

20,144 views • 23 days ago

Let’s think about humanoid robots beyond carrying boxes. How about having the humanoid come out the door, interact with humans, and even dance? Introducing Expressive Whole-Body Control for Humanoid Robots: See how our robot performs rich, diverse, and expressive motions in the real world 👇🧵

Xiaolong Wang

309,009 views • 2 years ago

Goal-conditioned RL (GCRL) is great - unsupervised, can use data (in offline mode), flexibility to define tasks at test time. But can we run GCRL on *language data*?? In our new work we show that language GCRL enables sophisticated test-time reasoning for interactive tasks! 🧵👇

Sergey Levine

18,782 views • 1 year ago