We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time? Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Jacky Liang

7,125 subscribers

86,549 views • 2 years ago • via X (Twitter)

Education, Science and Technology

13 comments

Jacky Liang • 2 years ago

TL;DR: We improve LLMs’ in-context teachability by fine-tuning them to model human-robot interactions. Here’s an example of robot teaching sessions before and after fine-tuning. Before, the LLM requires many corrections to do a high-five; after, the model gets there much faster.

Jacky Liang • 2 years ago

Given a dataset of these chat-based teaching sessions, how do we improve LLMs’ teachability (number of chat turns it takes to reach task success)? We want to improve teachability not just on train tasks, but also on test tasks and test embodiments.
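To make the metric concrete, here is a minimal sketch of how teachability could be measured from logged teaching sessions. The `ChatSession` record and both helper functions are hypothetical names chosen for illustration, not the paper's code; the sketch assumes each session logs its number of chat turns and whether the task ended in success.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class ChatSession:
    """One logged teaching session (hypothetical record layout)."""
    task: str
    num_turns: int   # chat turns used in the session
    success: bool    # whether the teacher marked the task as solved


def success_rate(sessions: List[ChatSession], max_turns: Optional[int] = None) -> float:
    """Fraction of sessions that succeeded, optionally within `max_turns` turns."""
    if not sessions:
        return 0.0
    hits = sum(
        1
        for s in sessions
        if s.success and (max_turns is None or s.num_turns <= max_turns)
    )
    return hits / len(sessions)


def mean_turns_to_success(sessions: List[ChatSession]) -> Optional[float]:
    """Average number of chat turns over successful sessions (lower is more teachable)."""
    turns = [s.num_turns for s in sessions if s.success]
    return mean(turns) if turns else None


sessions = [
    ChatSession("high-five", num_turns=4, success=True),
    ChatSession("high-five", num_turns=2, success=True),
    ChatSession("sit", num_turns=5, success=False),
]
print(success_rate(sessions), mean_turns_to_success(sessions))
```

Reporting both numbers side by side is a design choice of this sketch: fewer turns to success only means something when the success rate is also reported, since failed sessions are excluded from the average.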

Jacky Liang • 2 years ago

Instead of finetuning to directly output final answers, we propose Language Model Predictive Control:
1) Finetune the LLM to predict the entire chat session
2) During inference: sample rollouts of future chat sessions, return the first response of the shortest successful chat
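The inference step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `sample_rollout` stands in for a call to the fine-tuned LLM that returns a predicted future chat plus a predicted success flag, `make_canned_sampler` is a toy stand-in used only to make the example runnable, and returning `None` when no rollout succeeds is a fallback this sketch invents.

```python
from typing import Callable, List, Optional, Tuple

# A rollout is (predicted future messages, predicted success flag).
Rollout = Tuple[List[str], bool]


def lmpc_rollouts_step(
    history: List[str],
    sample_rollout: Callable[[List[str]], Rollout],
    num_samples: int = 8,
) -> Optional[str]:
    """Sample future chats and return the first response of the shortest successful one."""
    rollouts = [sample_rollout(history) for _ in range(num_samples)]
    # Keep only non-empty rollouts that the model predicts end in success.
    successful = [msgs for msgs, ok in rollouts if ok and msgs]
    if not successful:
        return None  # assumption: the caller decides how to fall back
    shortest = min(successful, key=len)
    return shortest[0]


def make_canned_sampler(canned: List[Rollout]) -> Callable[[List[str]], Rollout]:
    """Toy sampler that replays fixed rollouts instead of querying an LLM."""
    it = iter(canned)

    def sample(history: List[str]) -> Rollout:
        return next(it)

    return sample


canned = [
    (["code_v1", "User: raise the arm higher", "code_v2"], True),
    (["code_a"], False),
    (["code_b"], True),
]
sampler = make_canned_sampler(canned)
print(lmpc_rollouts_step(["User: give me a high-five"], sampler, num_samples=3))
```

Only the first response of the chosen rollout is executed; after the human replies, the step is repeated with the updated history, in the receding-horizon style that gives predictive control its name.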

Jacky Liang • 2 years ago

Compared to retrieval (RAG) and directly predicting the answer (LMPC-Skip), our method (LMPC-Rollouts) achieves the highest improvement in teachability over the base model (PaLM 2-S). With 1 chat turn, LMPC-Skip does the best, but LMPC-Rollouts improves more with corrective feedback.

Jacky Liang • 2 years ago

On test tasks, LMPC-Rollouts has the highest success rate when the chat has 2+ turns; it also has the highest good rating rate (how often human teachers rated a chat response positively).

Jacky Liang • 2 years ago

While we only fine-tune on 3 embodiments (robot dog, mobile manipulator, ALOHA), we also see improvements on test embodiments (bimanual KUKA, KUKA+hand), both of which have different robot APIs and tasks.

Jacky Liang • 2 years ago

Our method enables better teaching on real robots too! Here’s an example with the robot dog…

Jacky Liang • 2 years ago

…and here’s another example for our mobile manipulation robot

Jacky Liang • 2 years ago

See more details in our paper, and videos and demo on our website:

Jacky Liang • 2 years ago

This was a huge collaboration from many folks at Google DeepMind Robotics. I learned a ton from everyone and am super grateful for the amazing teamwork! Special thanks to @xf1280 @Stacormed @andyzeng_! Big shoutouts to our incredible collaborators: @montseglz @JMarakiii …

Jacky Liang • 2 years ago

… @bauzavillalonga Matthew Bennice @AlexBewleyAI @AdilDostmohamed @ChuyuanFu @NimTheCoder Marissa Giustina @keerthanpg @lqh20 Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, @SeanKirmani Edward Lee, @kuanghueilee, Assaf Hurwitz Michaely, Joss Moore, Ken Oslund, …

Jacky Liang • 2 years ago

…, Dushyant Rao, @allenzren Baruch Tabanpour @QuanVng @ayzwah @xiao_ted Ying Xu, Vincent Zhuang, as well as our incredible advising leads: Peng Xu, Erik Frey, Ken Caluwaerts, Tingnan Zhang, @brian_ichter @JonathanTompson @leilatakayama Vincent Vanhoucke @IzhakShafran …

Jacky Liang • 2 years ago

…Maja Mataric @DorsaSadigh Nicolas Heess @Kanishka_Rao Nik Stewart, Jie Tan, Carolina Parada Thanks everyone!!

Related videos

Vision-language models can control robots, but what if the prompt is too complex for the robot to follow directly? We developed a way to get robots to “think through” complex instructions, feedback, and interjections. We call it the Hierarchical Interactive Robot (Hi Robot).

Physical Intelligence

116,845 views • 1 year ago

Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over failures in robotic manipulation. Project page: 🧵Thread👇 Aha!

Jiafei Duan

48,739 views • 1 year ago

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Kaustubh Sridhar

52,158 views • 9 months ago

✨ Introducing Keypoint Action Tokens. 🤖 We translate visual observations and robot actions into a "language" that off-the-shelf LLMs can ingest and output. This transforms LLMs into *in-context, low-level imitation learning machines*. 🚀 Let me explain. 👇🧵

Norman Di Palo

23,088 views • 2 years ago

When robots do not work out of the box, we teach them, and they can improve over time. What is interesting is that their "teachability" also improves over time, meaning it takes fewer attempts to shape their behavior and achieve task success. Website:

Fei Xia

29,097 views • 2 years ago

Google Gemini: Talki – An AI language learning app that helps you learn a language where you can talk with an AI in different scenarios and you get real time feedback.

Lovable

86,378 views • 1 year ago

Introducing Yell At Your Robot (YAY Robot!) 🗣️- a fun collaboration b/w Stanford University and UC Berkeley 🤖 We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback. YAY Robot enables long-horizon, dexterous manipulation tasks like preparing trail-mix, packing a ziploc bag, and cleaning dishes:

Lucy Shi

122,774 views • 2 years ago

Yann LeCun says we're fooled by LLMs because they manipulate language well, and we associate that with intelligence But language fluency doesn't mean underlying intelligence Every generation since the 1950s claimed its technique was the ticket to human-level AI All were wrong. "this generation with LLMs is also wrong"

Haider.

625,361 views • 6 months ago

Geoffrey Hinton says people still think LLMs are different from us, but they’re not LLMs don’t just generate words; they reflect how we process meaning: "they’re our best model of how language works" old linguistic theories failed to explain language; neural nets using feature vectors finally do

Haider.

193,247 views • 1 year ago

I added the most requested feature to AI Code Translator… Natural language to/from code! You can now: - Use natural language to generate code. - Get natural language explanations from code. Try it: GitHub:

Mckay Wrigley

605,915 views • 3 years ago

Demis Hassabis says large language models (LLMs) can search and plan but they forget everything once the session ends. This is because today’s models don’t truly “learn” after deployment. He sees continual learning and better memory as key missing ingredients for AGI. Efficient memory, like the brain’s ability to store only what matters, will be crucial. LLMs are powerful, but without memory, they have a "goldfish brain."

Wes Roth

38,833 views • 4 months ago

Yann LeCun says LLMs are strongest in domains where language itself is the substrate of reasoning, like math and code They can solve problems, prove theorems, and write programs — but they are not creative mathematicians, software architects, or computer scientists "their role is to help humans build"

Haider.

346,805 views • 1 month ago

Francois Chollet says LLMs aren't enough for human-like continual learning They store skills as vector programs learned via gradient descent — not efficient, not adaptive True AGI needs to learn from experience and generalize fast "LLMs can be part of AGI, but not the solution"

Haider.

50,220 views • 7 months ago

It may seem counterintuitive to teach students with limited English proficiency to code. But research shows that learning a programming language has more in common with learning a natural language than you might think.💡

edutopia

34,196 views • 2 years ago

NEWS: Humanoid robot startup Figure announced Helix today, their "in-house AI that reasons like a human." "Our robots equipped with Helix can now pick up virtually any household object without any code or prior training. Helix uses a single set of neural network weights to learn all behaviors." "We're introducing Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics."

Sawyer Merritt

181,177 views • 1 year ago

Many AI-savvy programmers are now coding very differently than before, by using LLMs to help with their work. You’ll learn these emerging best practices in “Pair Programming with a Large Language Model,” by Google's Laurence Moroney 🇺🇸🇮🇪 🏴󠁧󠁢󠁷󠁬󠁳󠁿. This short course covers using LLMs to simplify and improve your code, assist with debugging, and minimize technical debt by having AI document your code. The use of LLMs as a programming companion is an important shift that's well worth every developer staying on top of. Please check this out!

Andrew Ng

373,418 views • 2 years ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,572 views • 2 years ago

New short course on Reinforcement Learning from Human Feedback, built in collaboration with @GoogleCloud! In this course, you’ll explore this key technique used to align LLMs with human values and make them more helpful, honest, and safe. Join now:

DeepLearning.AI

19,192 views • 2 years ago

Introducing ClickDiffusion! We developed a system for precise image manipulation and generation that combines natural language instructions with visual feedback provided by the user through a direct manipulation interface.

Alec Helbling

36,293 views • 2 years ago