Excited to finally share Generative Value Learning (GVL), my Google DeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini (Try out the...

Jason Ma

20,266 subscribers

98,090 views • 1 year ago • via X (Twitter)

Science & Technology • News & Politics • Education

10 Comments

Jason Ma • 1 year ago

First, check out our project website for the paper, interactive demos, and a way to get your own robot videos labeled by GVL today! You can even listen to an AI podcast about our paper, or ask Gemini questions about it too! We (especially @xf1280) put a lot of effort into getting these demos up. Let us know what you think of these new ways to engage with the paper!

Jason Ma • 1 year ago

The value function is a fundamental component of robotics; it can be used for search, planning, RL, success detection, and many more applications. However, learning a universal value function (UVF) for many robots and tasks has been extremely challenging, and traditional value learning algorithms have not been shown to scale. In this paper, we explore a totally new direction and ask: can SOTA VLMs, with all their world knowledge and capabilities, be repurposed as universal value functions for all robots and tasks?

Jason Ma • 1 year ago

The answer is yes, and the method is simple yet intriguing! We propose formulating value learning as an autoregressive prediction task over a *shuffled* sequence of the input video's frames. Why? Think about a standard video showing a task unfolding in chronological order. We empirically find that this ordering actually makes it harder for the VLM to estimate progress, because it may simply latch onto the order of the frames instead of the underlying changes that signify actual progress toward completing the task. By shuffling, we force the VLM to work "harder" to figure out the correct order from the visual cues of task progress, and doing so significantly improves the faithfulness of the value predictions! In a way, GVL poses value prediction as a "temporal unshuffling" puzzle to the VLM: it has all the pieces, but it has to figure out how they fit together in a way that makes sense in terms of progress toward a goal.
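
A minimal Python sketch of the shuffle-and-unshuffle bookkeeping described above. It assumes a hypothetical `score_shuffled` callable standing in for the VLM query (one progress value in [0, 1] per frame, returned in the order the frames were shown); the actual GVL prompt, and any special handling of particular frames, is not specified in this thread. Frames are plain string IDs here instead of images.

```python
import random
from typing import Callable, List, Sequence, Tuple


def shuffle_frames(frames: Sequence[str], seed: int = 0) -> Tuple[List[str], List[int]]:
    """Return the frames in random order plus the permutation used.

    perm[i] is the chronological index of the i-th shuffled frame.
    """
    rng = random.Random(seed)
    perm = list(range(len(frames)))
    rng.shuffle(perm)
    return [frames[i] for i in perm], perm


def unshuffle(values: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Map per-frame predictions from shuffled order back to chronological order."""
    out = [0.0] * len(perm)
    for shuffled_pos, chrono_idx in enumerate(perm):
        out[chrono_idx] = values[shuffled_pos]
    return out


def predict_values(
    frames: Sequence[str],
    score_shuffled: Callable[[List[str]], List[float]],  # hypothetical VLM stand-in
    seed: int = 0,
) -> List[float]:
    """Show the frames to the scorer in shuffled order, then restore time order."""
    shuffled, perm = shuffle_frames(frames, seed)
    preds = score_shuffled(shuffled)
    if len(preds) != len(frames):
        raise ValueError("scorer must return one value per frame")
    return unshuffle(preds, perm)


# Usage with a toy scorer (a real VLM would only see pixels, not frame IDs).
frames = [f"frame_{i:02d}" for i in range(5)]


def toy_scorer(shuffled: List[str]) -> List[float]:
    return [int(f.split("_")[1]) / 4 for f in shuffled]


values = predict_values(frames, toy_scorer, seed=42)
print(values)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

`toy_scorer` cheats by reading the index out of each frame ID, which a real VLM cannot do; it only demonstrates that predictions made on shuffled frames are mapped back to chronological order correctly.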

Jason Ma • 1 year ago

GVL can zero-shot generate dense values and captions for diverse robots, tasks, and viewpoints! Here, we show some examples of GVL on really long-horizon tasks and challenging viewpoints, including laundry folding from @physical_int, shirt hanging from ALOHA Unleashed (@tonyzzhao @ayzwah), and wrist-camera trajectories from DOBBE (@notmahi) and UMI (@chichengcc). Check out our project website for many additional results!

Jason Ma • 1 year ago

@xf1280 @physical_int @tonyzzhao More examples on some OXE datasets and even navigation videos! No modification to the algorithm or fine-tuning of the VLM needed!

Jason Ma • 1 year ago

What's very appealing about GVL is that it can leverage in-context learning to improve its value predictions! By simply prepending shuffled frame-value pairs to the VLM context, we find that value prediction quality steadily improves on a challenging set of 250 ALOHA tasks! The long context window lets us pack as many as 5 trajectories (>150 frames) in context, and we still see a performance boost!
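
A sketch of how shuffled frame-value pairs might be prepended to the context ahead of a query trajectory. It assumes a plain-text prompt in which string IDs stand in for interleaved images; the prompt wording and `build_context` are hypothetical illustrations, not the GVL prompt. Each context value stays attached to its frame while the pairs are shuffled, and the query frames are shuffled with their values left blank.

```python
import random
from typing import List, Sequence, Tuple

# (frame IDs, per-frame progress values in [0, 1]) for one context trajectory
Example = Tuple[Sequence[str], Sequence[float]]


def build_context(
    examples: Sequence[Example],
    query_frames: Sequence[str],
    task: str,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Build a text prompt with shuffled (frame, value) pairs before the query.

    Returns the prompt lines and the shuffled order of the query frames,
    which is needed to map the VLM's answers back to chronological order.
    """
    rng = random.Random(seed)
    lines = [f"Task: {task}. Predict task progress (0-100%) for each frame."]
    for k, (frames, values) in enumerate(examples):
        if len(frames) != len(values):
            raise ValueError("each example needs one value per frame")
        pairs = list(zip(frames, values))
        rng.shuffle(pairs)  # shuffle pairs so each value stays with its frame
        lines.append(f"Example {k + 1}:")
        lines += [f"{f}: {round(100 * v)}%" for f, v in pairs]
    query_order = list(query_frames)
    rng.shuffle(query_order)
    lines.append("Query:")
    lines += [f"{f}:" for f in query_order]  # blanks for the VLM to fill in
    return lines, query_order


example = (["a0", "a1", "a2"], [0.0, 0.5, 1.0])
prompt, order = build_context([example], ["q0", "q1"], "fold the shirt", seed=1)
print("\n".join(prompt))
```

Adding more trajectories to `examples` simply lengthens the context, which is where a long context window matters.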

Jason Ma • 1 year ago

GVL can even benefit from cross-embodiment and cross-task in-context learning! That is, we can feed shuffled frames of humans or robots performing other tasks, along with their values, as context, and we again see a performance improvement!

Jason Ma • 1 year ago

The generality of GVL enables many downstream applications, including dataset quality estimation, success detection, and policy learning! I am very excited about the dataset quality estimation results, because they show a new way of using value models that is very relevant to today's robot learning landscape, where models are trained on mixtures of datasets and practitioners need good ways to determine which datasets are high quality. Check out the paper for more details on these applications!

Jason Ma • 1 year ago

I'd like to thank all my collaborators for making this a super fun and rewarding project: @JoeyHejna @ayzwah @ChuyuanFu @shahdhruv_ @jackyliang42 @drzhuoxu @SeanKirmani @sippeyxp @DannyDriess @xiao_ted @JonathanTompson @obastani @dineshjayaraman @Stacormed @tingnan1986 @DorsaSadigh @xf1280. Many of them are currently at @corl_conf; make sure to talk to them about our paper! I am particularly grateful to @xf1280 for his mentorship and guidance throughout this project; I benefited a lot from his expertise and insights on frontier VLMs for robotics!

sonu • 1 year ago

@GoogleDeepMind Great work bro 👍

Similar Videos

Robot AI brains, aka Vision-Language-Action models, cannot adapt to new tasks as easily as LLMs like Gemini, ChatGPT, or Grok. LLMs can adapt quickly with their in-context learning (ICL) capabilities. But can we inject ICL abilities into a pre-trained VLA like pi0? Yes! Introducing RICL (Retraining for In-Context Learning), our Conference on Robot Learning (CoRL) 2025 paper. Our RICL-pi0 model can adapt to unseen objects, novel motions, and new scenes with just ICL and RAG (retrieval-augmented generation). RICL-pi0 also boosts performance on the long-tail of tasks. A quick 1 minute video summary:

Kaustubh Sridhar

52,158 views • 10 months ago

Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company 1X has a solution: world models. 1X Director of Evaluations Daniel Ho joins us on RoboPapers to talk about:
- why world models are the future for scaling robot learning
- how to use world models for robot control
- what world models unlock for evaluating robot model performance
- how we can hill-climb from here to general-purpose robots
Watch Episode #61 of RoboPapers, with Michael Cho - Rbt/Acc and Chris Paxton, now!

RoboPapers

27,567 views • 5 months ago

Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. [1/N]

Max Fu

40,446 views • 1 year ago

Over the last few months, we've been thinking about how to learn from "off-domain" data: data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work, which we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

Abhishek Gupta

11,994 views • 1 year ago

Excited to share a new humanoid robot platform we’ve been working on. Berkeley Humanoid is a reliable and low-cost mid-scale research platform for learning-based control. We demonstrate the robot walking on various terrains and dynamic hopping with a simple RL controller.

Qiayuan Liao

80,294 views • 1 year ago

Super excited for the release of Robot Utility Models (RUMs)! RUMs is a simple method to build zero-shot robot policies that can solve useful tasks in completely new homes without any additional training, often at a 90%+ success rate. 🧵👇

Lerrel Pinto

56,591 views • 1 year ago

Today, we present a step-change in robotic AI Sunday. Introducing ACT-1: A frontier robot foundation model trained on zero robot data.
- Ultra long-horizon tasks
- Zero-shot generalization
- Advanced dexterity
🧵->

Tony Zhao

2,045,774 views • 7 months ago

This is a single uncut video, showing a robot learning several tasks instantly, after just one demonstration each ... This is possible because we've now been able to achieve in-context learning for everyday robotics tasks, and I'm very excited to announce our latest paper: 🎆 Instant Policy: In-Context Imitation Learning via Graph Diffusion 🎆 (1/6) 🧵👇

Edward Johns

74,680 views • 1 year ago

Robot foundation models are limited by costly real data, while simulation data is plentiful but visually mismatched to reality. We present Point Bridge, a method that enables zero-shot sim-to-real transfer for robot learning with minimal visual alignment.

Siddhant Haldar

19,862 views • 4 months ago

The robot is learning several novel tasks instantly, after just ONE demonstration each... Instant Policy makes it possible: no extra training, no weight updates, just pure in-context learning. It just got accepted at ICLR 2025, and it's changing how robots learn. With just a single demo, a robot can pick up a new task and start performing it right away. Why this is a big deal:
✅ Learns tasks instantly with just one or a few demonstrations
✅ Improves over time as more demonstrations are given
✅ Uses simulation-based training with "pseudo-demonstrations" for scalability
✅ Can transfer skills across different robots and even follow language-defined tasks
It brings in-context learning to robotics, opening up new possibilities for flexible, real-world automation. You can try it yourself: code and weights are available at … Thank you to Edward Johns, Director of the Robot Learning Lab at Imperial College, for sharing their work! 🙏

Ilir Aliu

46,285 views • 1 year ago

🤖 How can robot policies zero-shot generalize to any new environment and any new object? Introducing our new project: 🚀Data Scaling Laws in Imitation Learning for Robotic Manipulation🚀—bringing us closer to the dream of having robots work as waiters in hot pot restaurants! 🍲

Yang Gao

124,995 views • 1 year ago

Today we are excited to open up Neuracore to the academic community! Neuracore is a new data foundation built to accelerate robot learning by removing one of the field’s biggest bottlenecks: capturing and working with high-fidelity multimodal robotics data. For the first time, researchers can store, view, and work with robotics data in a cloud-native system built specifically for large-scale learning, and we are making this core platform completely free for academia. The platform lets teams capture every sensor at its native rate, store and visualize data without loss, and then train and deploy models locally using our open-source code (Link in the comments). We are rolling out access to select academic institutions first. Anyone with an academic email can sign up, and if your institution is not part of the initial rollout, you will be able to join the waitlist directly. Beyond providing this infrastructure, we see an opportunity to build a global community where engineers and researchers can share, collaborate, and advance the frontier of robot learning together. Supported by our recent $3M pre-seed round led by Earlybird VC, we are excited to take this mission even further. Our long-term goal is for Neuracore to become the natural home for cutting-edge robot learning algorithms and real-world robotics experimentation, helping accelerate the next wave of Physical AI.

Neuracore

40,620 views • 7 months ago

New Gemini Robotics 1.5 models will enable robots to better reason, plan ahead, use digital tools like Search, and transfer learning from one kind of robot to another. Our next big step towards general-purpose robots that are truly helpful — you can see how the robot reasons as it sorts laundry in the video below.

Sundar Pichai

496,266 views • 9 months ago

Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal instructions, such as language, video, and demonstration, and perform a variety of useful tasks. We are collaborating with many leading humanoid companies around the world, so that GR00T may transfer across embodiments and help the ecosystem thrive. GR00T is born on NVIDIA’s deep technology stack. We simulate in Isaac Lab (new app on Omniverse Isaac Sim for humanoid learning), train on OSMO (new compute orchestration system to scale up models), and deploy to Jetson Thor (new edge GPU chip designed to power GR00T). Announced in Jensen's keynote, Project GR00T is a cornerstone for the “Foundation Agent” roadmap of the newly founded GEAR Lab. At GEAR, we are building generally capable agents that learn to act skillfully in many worlds, virtual and real. See if you can spot "GEAR" in the video ;) Join us on the journey to land on the moon.

Jim Fan

1,076,740 views • 2 years ago

Excited to share a Google DeepMind Gemini 2.0 Flash Image Generation and Editing Quickstart. We built a Next.js reference app showing how to use the new image editing feature of Gemini 2.0 Flash. Demo to test ⬇️
> Generate images from text prompts using Gemini 2.0 Flash
> Or upload an image and edit it using prompts

Philipp Schmid

52,406 views • 1 year ago

How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 🧵👇

Wenlong Huang

293,876 views • 3 years ago

We teamed up with Google for Developers to create a community partnership focused on using NVIDIA technologies on Google Cloud. With new learning pathways on generative AI, inference, and more — join to connect with developers and start learning 👉

NVIDIA AI Developer

13,032 views • 8 months ago

In order for robots to be deployed in the real world, performing tasks of real value, they must be reliable. Unfortunately, most robotic demos work maybe 70-80% of the time at best. The way to get better reliability is real-world reinforcement learning: having the robot teach itself to perform the task up to a high level of success. The key to doing this is to start with a core of expert human data, use it to train a policy, then iteratively improve that policy, finishing with on-policy reinforcement learning. Kun Lei talks through a unified framework for imitation and reinforcement learning based on PPO, which enables this improvement process. In this episode, Kun Lei explains the theory behind his reinforcement learning method and how it allowed his robot to run in a shopping mall juicing oranges for seven hours at a time, among experiments on a wide variety of tasks and embodiments. Watch episode 58 of RoboPapers now, hosted by Michael Cho - Rbt/Acc and Chris Paxton!

RoboPapers

18,813 views • 5 months ago

Robotics has changed dramatically over the last eight years. Ted Xiao has been involved in the cutting edge of robot learning through this period, spending those eight years at Google Brain/Google DeepMind. And he's identified three eras of robot learning. These eras are:
- The Era of Existence Proofs: trying different methods like QT-Opt, on-robot RL
- The Era of Foundation Models: transitioning to data collection and clean objectives (i.e. supervised learning)
- The Era of Scaling: orders of magnitude more data and larger models, enabling reasoning, long-horizon actions, and cross-embodiment transfer
Watch Episode 78 of RoboPapers, with Michael Cho - Rbt/Acc and Jiafei Duan, to learn more!

RoboPapers

36,520 views • 2 months ago

Evaluation is a critical bottleneck in building robot foundation models. Check out our latest work RoboLab, led by Xuning Yang, which addresses this exact challenge. It's a high-fidelity simulation environment for testing these models. A truly generalist policy should be able to complete these tasks zero-shot, and this benchmark highlights exactly how far we still have to go. More info 👇

Ankit Goyal

29,731 views • 2 months ago