3 mo. ago we released the Open X-Embodiment dataset; today we’re taking the next step: Introducing Octo 🐙, a generalist robot policy trained on 800k robot trajectories. It's stronger than RT-1X, supports flexible observation + action spaces, and is fully open source! 💻: /🧵

Out of the box, Octo can control multiple robots and use 3rd-person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to new observation & action spaces, in <5 hours on a 24 GB VRAM GPU! 2/

If we want to build truly “foundational” models for robotics, we need to support the diversity of real robot setups! Despite the added flexibility, Octo performs strongly compared to RT-1X and even RT-2X, and it finetunes well! 3/

Octo is built to scale: it’s a big transformer with small encoders at the input and a small action head at the output. We use diffusion action decoding for max expressiveness 4/
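To make "diffusion action decoding" concrete, here is a minimal, hypothetical sketch of a DDPM-style reverse sampling loop that turns Gaussian noise into a single action vector. It is not Octo's code: the linear noise schedule, `make_schedule`, `sample_action`, and the toy `oracle_denoiser` (standing in for the small action head conditioned on the transformer's output) are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]

def make_schedule(num_steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.3) -> Tuple[Vector, Vector]:
    """Illustrative linear noise schedule (an assumption): returns (betas, alpha_bars)."""
    if num_steps == 1:
        betas = [beta_start]
    else:
        step = (beta_end - beta_start) / (num_steps - 1)
        betas = [beta_start + i * step for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b  # alpha_bar_t = product over s <= t of (1 - beta_s)
        alpha_bars.append(prod)
    return betas, alpha_bars

def sample_action(denoiser: Callable[[Vector, int, Vector], Vector],
                  context: Vector, action_dim: int,
                  betas: Vector, alpha_bars: Vector,
                  rng: Optional[random.Random] = None) -> Vector:
    """DDPM reverse process: start from Gaussian noise and denoise step by step.

    `denoiser(x_t, t, context)` predicts the noise eps added at step t;
    `context` stands in for the transformer's output embedding (hypothetical).
    """
    rng = rng or random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t, context)
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x

# Usage with a toy "oracle" denoiser that returns the exact noise for a known
# clean action; a trained action head would predict eps from `context` instead.
target = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0]  # hypothetical 7-D action
betas, alpha_bars = make_schedule(20)

def oracle_denoiser(x: Vector, t: int, context: Vector) -> Vector:
    ab = alpha_bars[t]
    return [(xi - math.sqrt(ab) * ai) / math.sqrt(1.0 - ab)
            for xi, ai in zip(x, target)]

action = sample_action(oracle_denoiser, context=[], action_dim=len(target),
                       betas=betas, alpha_bars=alpha_bars)
```

With the oracle, the final step recovers `target` up to floating-point error. The appeal of this style of decoder is that sampling starts from noise rather than regressing a single mean action, which lets a learned denoiser represent multimodal action distributions.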

We’re fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/

We’re releasing a tech report with lots of details on what worked and, importantly, what didn’t -- go check it out! 📜: 6/

Last but not least: Octo is your one-stop-shop for training on OpenX data! We’re releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/

Octo is only the first step towards building generalist robot policies, and we’re planning to improve the models over time (larger sizes, more robot morphologies, RL, etc.). Really excited to see how folks will use Octo! :) 8/

This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine

Adding the Twitter threads from all my amazing co-leads on the project! Truly inspiring to have so many people work so hard on a common goal! <3

led base model development & training, and implemented many of the features that make the Octo code easy to use!

led model evaluation, designed our internal eval bench for iterating on the model & ran many of the evals in the tech report.

led data & training infrastructure -- that sweet Octo OpenX data loader is in large part Kevin's baby -- loading 25 video datasets concurrently at high speed is no easy feat! Kevin also contributed a lot to making Octo easier to use!

ran many model ablations & evals for the tech report, integrated pre-trained language encoders & last but not least, kept spirits high during long nights "in the arena" ♥️
