3 mo. ago we released the Open X-Embodiment dataset, today we’re doing the next step: Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1X, flexible observation + action spaces, fully open source! 💻: /🧵

Karl Pertsch

3,801 subscribers

126,658 views • 2 years ago • via X (Twitter)

13 comments

Karl Pertsch • 2 years ago

Out of the box, Octo can control multiple robots, use 3rd person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to use new observation & action spaces! In <5 hours on a 24 GB VRAM GPU! 2/

Karl Pertsch • 2 years ago

If we want to build truly “foundational” models for robotics we need to support the diversity of real robot setups! Despite the added flexibility, we find Octo's performance to be strong compared to RT-1X and even RT-2X + great during finetuning! 3/

Karl Pertsch • 2 years ago

Octo is built to scale: it’s a big transformer with small encoders at the input and a small action head at the output. We use diffusion action decoding for max expressiveness 4/
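To make "diffusion action decoding" concrete, here is a minimal sketch of standard DDPM reverse sampling over an action vector. It is not Octo's actual action head: the linear noise schedule, the step count, the 7-dimensional action, and every function name below are assumptions for illustration, and `oracle_denoiser` is a hypothetical stand-in for the learned network that would be conditioned on the transformer's output.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice); returns betas, alphas, cumulative alphas."""
    span = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * i / span for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def oracle_denoiser(target, alpha_bars):
    """Hypothetical stand-in for the learned action head.

    Returns the exact noise prediction for the case where the data
    distribution is a single target action. A real policy would use a
    neural network conditioned on `context` instead.
    """
    def predict_noise(x, t, context):
        scale = math.sqrt(alpha_bars[t])
        denom = math.sqrt(1.0 - alpha_bars[t])
        return [(xi - scale * ai) / denom for xi, ai in zip(x, target)]
    return predict_noise


def sample_action(predict_noise, context, action_dim, num_steps=20, seed=0):
    """Start from Gaussian noise and denoise step by step into an action."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, context)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        # No noise is added on the final step.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


# Toy usage: with the oracle denoiser, sampling recovers the target action.
target = [0.1, -0.2, 0.3, 0.0, 0.0, 0.0, 1.0]
_, _, alpha_bars = make_schedule(20)
action = sample_action(oracle_denoiser(target, alpha_bars), context=None, action_dim=7)
```

Because the head samples from a distribution rather than regressing a single mean, it can represent multimodal action distributions, which is presumably what "max expressiveness" refers to.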

Karl Pertsch • 2 years ago

We’re fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/
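The code screenshot attached to this post did not survive extraction. As a rough sketch of what loading looks like, the snippet below follows the usage shown in the Octo repository's README as best recalled. Treat the checkpoint path `hf://rail-berkeley/octo-base` as an assumption, since it is not given in this thread, and note that `load_octo` is a hypothetical wrapper added here for illustration.

```python
def load_octo(checkpoint: str = "hf://rail-berkeley/octo-base"):
    """Load a pretrained Octo checkpoint.

    Requires the `octo` package to be installed and network access to
    download the weights. The default checkpoint path is an assumption
    and is not stated in the thread.
    """
    # Imported lazily so this snippet can be read or imported without Octo installed.
    from octo.model.octo_model import OctoModel

    return OctoModel.load_pretrained(checkpoint)


# Example (downloads weights, so it is left commented out):
# model = load_octo()
```

For the smaller model, swapping in the corresponding Octo-Small checkpoint name should be the only change needed.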

Karl Pertsch • 2 years ago

We’re releasing a tech report with lots of details on what worked and, importantly, what didn’t -- go check it out! 📜: 6/

Karl Pertsch • 2 years ago

Last but not least: Octo is your one-stop-shop for training on OpenX data! We’re releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/

Karl Pertsch • 2 years ago

Octo is only the first step towards building generalist robot policies and we’re planning to improve the models over time — larger sizes, more robot morphologies, RL etc etc — really excited to see how folks will use Octo! :) 8/

Karl Pertsch • 2 years ago

This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine

Karl Pertsch • 2 years ago

Adding the Twitter threads from all my amazing co-leads on the project! Truly inspiring to have so many people work so hard on a common goal! <3

Karl Pertsch • 2 years ago

led base model development & training, and implemented many of the features that make the Octo code easy to use!

Karl Pertsch • 2 years ago

led model evaluation, designed our internal eval bench for iterating on the model & ran many of the evals in the tech report.

Karl Pertsch • 2 years ago

led data & training infrastructure -- that sweet Octo OpenX data loader is in large part Kevin's baby -- loading 25 video datasets concurrently at high speed is no easy feat! Kevin also contributed a lot to making Octo easier to use!

Karl Pertsch • 2 years ago

ran many model ablations & evals for the tech report, integrated pre-trained language encoders & last but not least, kept the spirits high during long nights "in the arena" ♥️
