Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇
24,662 views • 3 years ago • via X (Twitter)
7 comments

Labeling demonstrations with natural language instructions in hindsight is standard, but it is tedious and expensive to scale. We propose automatically (1) **relabeling** language instructions and (2) **chaining** trajectories together to generate more training data.

(1) Relabeling: If we have two skills, "Put mug in coffee machine" and "Press brew button," we could call this "Make Coffee." In SPRINT, we do this relabeling **automatically** by prompting an LLM to summarize nearby instructions. This gives us 2-2.5X more pre-training data!
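A minimal sketch of the relabeling idea, assuming each trajectory arrives as an ordered list of skill instructions. The helper names (`build_prompt`, `aggregate_adjacent`), the prompt wording, and the `fake_llm` stand-in are hypothetical and are not taken from the SPRINT codebase; a real setup would pass in a function that calls an actual LLM.

```python
from typing import Callable, List, Tuple


def build_prompt(instructions: List[str]) -> str:
    """Format consecutive skill instructions into a summarization prompt (assumed wording)."""
    steps = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(instructions))
    return f"Summarize the following steps as one short instruction:\n{steps}\nSummary:"


def aggregate_adjacent(
    instructions: List[str],
    summarize: Callable[[str], str],
) -> List[Tuple[int, int, str]]:
    """Return (start, end, summary) for every run of two or more adjacent skills.

    `end` is exclusive, so instructions[start:end] is the summarized span.
    """
    relabeled = []
    n = len(instructions)
    for start in range(n):
        for end in range(start + 2, n + 1):
            prompt = build_prompt(instructions[start:end])
            relabeled.append((start, end, summarize(prompt)))
    return relabeled


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline (assumption)."""
    return "Make coffee" if "brew" in prompt else "Do the listed steps"


skills = ["Put mug in coffee machine", "Press brew button"]
new_labels = aggregate_adjacent(skills, fake_llm)
```

Each returned span can be stored as an extra training example that covers the same transitions under the higher-level instruction.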

(2) Chaining: Offline RL can "stitch" trajectories to learn new behaviors. We carefully relabel rewards with offline RL and modified language instructions to allow stitching even with language conditioning!
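To make the data side of chaining concrete, here is a simplified sketch that concatenates two trajectory segments under one combined instruction. It does not reproduce SPRINT's reward relabeling, which the thread says relies on offline RL; as a loudly labeled assumption, it only places a sparse reward of 1 on the final transition. The `Transition` type and `chain_segments` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    obs: str
    action: str
    instruction: str
    reward: float
    done: bool


def chain_segments(
    seg_a: List[Transition],
    seg_b: List[Transition],
    instr_a: str,
    instr_b: str,
) -> List[Transition]:
    """Concatenate two segments and relabel them with one combined instruction.

    Assumption: sparse reward, 1.0 only on the last transition of the chain.
    This is a placeholder for the paper's offline-RL-based reward relabeling.
    """
    combined = f"{instr_a}, then {instr_b}"
    merged = list(seg_a) + list(seg_b)
    last = len(merged) - 1
    return [
        replace(
            t,
            instruction=combined,
            reward=1.0 if i == last else 0.0,
            done=(i == last),
        )
        for i, t in enumerate(merged)
    ]


seg_a = [Transition("s0", "pick mug", "Put mug in coffee machine", 1.0, True)]
seg_b = [
    Transition("s5", "reach button", "Press brew button", 0.0, False),
    Transition("s6", "press", "Press brew button", 1.0, True),
]
chained = chain_segments(seg_a, seg_b, "Put mug in coffee machine", "Press brew button")
```

Note that the two segments need not come from the same episode, which is why the intermediate reward and `done` flag at the end of the first segment are cleared here.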

Results: Overall, this allows us to achieve 2-8X better zero-shot long-horizon task execution in ALFRED, a realistic household simulator, and on a real robot setup! SPRINT agents also fine-tune more efficiently to new tasks in unseen environments! ALFRED results:

Real Robot Results: With offline fine-tuning, SPRINT achieves superior performance on new, long-horizon manipulation tasks in previously unseen environments!

For more details about SPRINT and the experimental results, please see our paper or website. Paper: Website: Work done in collaboration with @KarlPertsch, @JiahuiZhang_32, @JosephLim_AI. @JiahuiZhang_32 is applying for PhD programs this year!

It sounds great, so that humans don't have to complete such a large amount of work every day, just let the robot do it
