This is a single uncut video, showing a robot learning several tasks instantly, after just one demonstration each ... This is possible because we've now been able to achieve in-context learning for everyday robotics tasks, and I'm very excited to announce our latest paper: 🎆 Instant Policy: In-Context Imitation...

Edward Johns

3,809 subscribers

74,663 views • 1 year ago • via X (Twitter)

10 comments

Edward Johns • 1 year ago

In-context learning is where a trained model accepts examples of a new task (the "context") at its input, and can then make predictions for that same task given a novel instance of it, without any further training or weight updates. Achieving this in robotics is very exciting: with Instant Policy, we can now provide one or a few demonstrations (the "context"), and the robot instantly learns a closed-loop policy for that task, which it can then immediately perform. (2/6)
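To make the interface concrete, here is a minimal sketch of what "learning from context without weight updates" means. It is not the Instant Policy network: it swaps in a plain nearest-neighbour lookup as a stand-in, and the names `in_context_policy` and `demo`, as well as the 2-D observations and string actions, are hypothetical. The only point it illustrates is that the demonstration is passed in as an input at prediction time and nothing is trained.

```python
import math


def in_context_policy(context, observation):
    """Return the action paired with the demonstration observation closest to `observation`.

    `context` is a list of (observation, action) pairs taken from demonstrations.
    Nothing is trained or updated here: the demonstrations are only an input.
    This nearest-neighbour rule is a toy stand-in, not the method from the paper.
    """
    if not context:
        raise ValueError("context must contain at least one demonstration step")
    _, nearest_action = min(
        context, key=lambda pair: math.dist(pair[0], observation)
    )
    return nearest_action


# One hypothetical demonstration: 2-D observations paired with action labels.
demo = [((0.0, 0.0), "reach"), ((1.0, 0.0), "grasp"), ((1.0, 1.0), "lift")]

# A novel observation close to the second demonstration step.
print(in_context_policy(demo, (0.9, 0.1)))
```

Swapping `demo` for a demonstration of a different task changes the behaviour immediately, which is the property the post describes. A learned model would replace the distance rule with a network whose weights stay fixed at deployment.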

Edward Johns • 1 year ago

The figure below shows our network architecture, which jointly expresses the context (demonstrations, as sequences of observations and actions), the current observation, and the future actions. Observations are point clouds, and actions are relative gripper poses. During inference, actions are predicted using a learned diffusion process on the graph nodes representing the actions, conditioned on the demonstrations and the current observation. (3/6)
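As an illustration of what "relative gripper poses" can mean, the sketch below expresses a target gripper pose in the frame of the current gripper pose, using 4x4 homogeneous transforms. This is standard rigid-body algebra rather than code from the paper. The helper names (`make_pose`, `invert_pose`, `relative_pose`) and the example numbers are assumptions made for illustration, and the diffusion process itself is not modelled here.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z axis by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = [list(rotation[i]) + [translation[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_pose(pose):
    """Invert a rigid transform: R becomes R^T and t becomes -R^T t."""
    rot_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * pose[k][3] for k in range(3)) for i in range(3)]
    return make_pose(rot_t, trans)


def relative_pose(current, target):
    """Target gripper pose expressed in the current gripper frame."""
    return matmul(invert_pose(current), target)


# Hypothetical poses: the gripper is rotated 90 degrees about z and should
# move 0.5 m along the world y axis while keeping its orientation.
current = make_pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
target = make_pose(rot_z(math.pi / 2), [1.0, 0.5, 0.0])
action = relative_pose(current, target)

# Translation part of the action, in the gripper's own frame.
print([round(action[i][3], 6) for i in range(3)])
```

Because the action is expressed relative to the gripper, the same motion has the same representation wherever the gripper is in the workspace; composing `current` with `action` recovers `target`.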

Edward Johns • 1 year ago

One very exciting aspect of Instant Policy is that we don't need any real-world training data. The entire network can be trained with simulated "pseudo-demonstrations", which are arbitrary trajectories with random objects, all in simulation. And we found very promising scaling laws: we can continue to generate these pseudo-demonstrations in simulation, and the performance of the network continues to improve. (4/6)
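A rough sketch of the idea of a pseudo-demonstration, under loud assumptions: the post only says these are arbitrary trajectories with random objects in simulation, so everything below (random point clouds as "objects", linear interpolation between random waypoints, a random open/closed gripper state per segment, and the function names) is a hypothetical toy, not the generator used in the paper and not a physics simulation.

```python
import random


def random_object(rng, num_points=32, size=0.05):
    """Sample a toy 'object': a point cloud scattered around a random centre."""
    centre = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    return [[c + rng.uniform(-size, size) for c in centre]
            for _ in range(num_points)]


def random_trajectory(rng, num_waypoints=3, steps_per_segment=5):
    """Interpolate linearly between random 3-D waypoints.

    Each step is a (position, gripper_open) pair; the gripper state is drawn
    at random once per segment.
    """
    if num_waypoints < 2:
        raise ValueError("need at least two waypoints")
    waypoints = [[rng.uniform(-0.5, 0.5) for _ in range(3)]
                 for _ in range(num_waypoints)]
    trajectory = []
    for start, end in zip(waypoints, waypoints[1:]):
        gripper_open = rng.random() < 0.5
        for step in range(steps_per_segment):
            alpha = step / steps_per_segment
            position = [s + alpha * (e - s) for s, e in zip(start, end)]
            trajectory.append((position, gripper_open))
    # Close the trajectory at the final waypoint, keeping the last gripper state.
    trajectory.append((waypoints[-1], trajectory[-1][1]))
    return trajectory


def pseudo_demonstration(seed, num_objects=2):
    """Bundle random objects and a random trajectory into one toy sample."""
    rng = random.Random(seed)
    return {
        "objects": [random_object(rng) for _ in range(num_objects)],
        "trajectory": random_trajectory(rng),
    }


demo = pseudo_demonstration(seed=0)
print(len(demo["objects"]), len(demo["trajectory"]))
```

Since each sample depends only on its seed, generating more data is a matter of iterating over more seeds, which is the sense in which such data can keep being produced in simulation.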

Edward Johns • 1 year ago

Beyond just regular imitation learning, we also discovered two intriguing downstream applications: (1) Cross-embodiment transfer from human-hand demonstrations to robot policies. (2) Zero-shot transfer to language-defined tasks without needing large language-annotated datasets. (5/6)

Edward Johns • 1 year ago

This was led by my excellent student Vitalis Vosylius (@vitalisvos19), in the final project of his PhD. To read the paper and see more videos, please visit And we have code and weights available on the webpage, for you to teach your own robot with Instant Policy. Please try it out, and let us know how you get on! Thanks for reading 😀 (6/6)

You Jiacheng • 1 year ago

Great work! I have a small question: how did you prompt SAM in this video? Is there another person?

tOSUFever • 1 year ago

this is cool 😎

Ornias • 1 year ago

Feels like I'm watching an animal rather than a robot.

XXXin • 1 year ago

Seeing more and more work like this. I wonder how we can leverage the power of the community to collect data efficiently at scale, and how the system generalizes under different configurations.

Appy Pie • 1 year ago

Exciting breakthrough in robotics! With in-context learning, robots can now master tasks instantly after just one demonstration. This is a huge step forward in making robots more adaptable and efficient!

Related videos

I'm very excited to finally announce one of the most ambitious projects we've worked on — which makes the front cover of Science Robotics today: ☀️ Learning a Thousand Tasks in a Day ⭐️ Everyday tasks — like those below — can now be learned from a single demonstration each...

Edward Johns

109,323 views • 7 months ago

Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. [1/N]

Max Fu

40,435 views • 1 year ago

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Kaustubh Sridhar

52,158 views • 10 months ago

Very excited to announce: Keypoint Action Tokens! We found that LLMs can be repurposed as "imitation learning engines" for robots, by representing both observations & actions as 3D keypoints, and feeding into an LLM for in-context learning. See: More 👇

Edward Johns

32,569 views • 2 years ago

After two years in stealth mode, we're thrilled to unveil Mentee Robotics and our humanoid robot, Mentee Robotics! With AI integration at every layer, from Sim2Real machine learning to NeRF-based algorithms and LLMs, we've achieved a complete end-to-end cycle for tasks. This is just the beginning and we can’t wait to share more

Amnon Shashua

223,875 views • 2 years ago

Excited to finally share Generative Value Learning (GVL), my Google DeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini (Try out the demo on our website on your robot video today!) I worked a lot on leveraging foundation models as guidance for robots in my PhD, and to me, this result forges a new frontier in how we can use foundation models for robot learning, given its broad applicability independent of embodiment and task types. Quite excited about how we can build on this work as a community!

Jason Ma

98,090 views • 1 year ago

Want a robot that learns household tasks by watching you? EquiBot is a ✨ generalizable and 🚰 data-efficient method for visuomotor policy learning, robust to changes in object shapes, lighting, and scene makeup, even from just 5 mins of human videos. 🧵↓

Jingyun Yang

88,352 views • 2 years ago

We just released TAVI -- a robotics framework that combines touch and vision to solve challenging dexterous tasks in under 1 hour. The key? Use human demonstrations to initialize a policy, followed by tactile-based online learning with vision-based rewards. Details in🧵(1/7)

Lerrel Pinto

138,536 views • 2 years ago

R+X was accepted at ICRA 2025! Robots can now do in-context imitation learning, just by observing humans going about their daily lives... No more need to *label* and *train* - just *RETRIEVE* and *EXECUTE*! 🧵👇 (1/5)

Edward Johns

10,209 views • 1 year ago

Is this a 3D model? 2 minutes, one shot prompt after 5 prior prompts for context. Look it's learning and evolving...

Andrew Rulnick

27,915 views • 6 months ago

Learning about active perception with Haoyu Xiong -- your robot needs a head, and to be able to control where it's looking, in order to perform complex tasks!

Chris Paxton

22,784 views • 9 months ago

TidyBot++ is an open-source mobile manipulator optimized for household tasks. The robot can be teleoperated using a mobile phone interface, enabling data collection for imitation learning.

The Humanoid Hub

29,188 views • 1 year ago

✨ Introducing Keypoint Action Tokens. 🤖 We translate visual observations and robot actions into a "language" that off-the-shelf LLMs can ingest and output. This transforms LLMs into *in-context, low-level imitation learning machines*. 🚀 Let me explain. 👇🧵

Norman Di Palo

23,088 views • 2 years ago

World Model meets robot policy! Robbyant's LingBot-VA: unifies video world modeling and robotic policy learning. - A single model generates both future video and the actions to make it real. - Long-term memory enables long-horizon tasks. - Claims significant outperformance over π₀.₅ in real-world tasks. - It's open-source

The Humanoid Hub

17,721 views • 4 months ago

Introducing GEN-1. Our latest milestone in scaling robot learning. We believe it to be the first general-purpose AI model to master simple physical tasks. 99% success rates, 3x faster speeds, adapts in real time to unexpected scenarios, w/ only 1 hour of robot data. More🧵👇

Generalist

377,841 views • 2 months ago

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

Abhishek Gupta

10,777 views • 1 year ago

We are excited to announce CLIN 🤖: The first continually learning language agent that excels in both task adaptation and generalization to unseen tasks and environments in a pure zero-shot setup. Aristo Team at Ai2 Ai2 Website: Let's dive in 🧵 (1/n)

Bodhisattwa Majumder

35,052 views • 2 years ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 6 months ago

This is a valid crash out due to learning our government was run by a MASSIVE satan worshipping cannibalistic pedophile ring. We've ALL been here because HOW do you function in every day life after learning this?! How can anyone pretend ANYTHING is NORMAL?! What even IS normal?

Bridgett Fertig

24,736 views • 2 months ago