Self-supervised representation learning looks a bit like RL. What if we literally use RL as a SSL method for visual representations? Turns out that it works quite well. In new work by Dibya Ghosh, we show how this can be done:

Sergey Levine

132,246 subscribers

48,751 views • 1 year ago • via X (Twitter)

Science, Technology & Education

5 comments

Sergey Levine • 1 year ago

Imagine an MDP where the state is the current crop of the image, an action is to pick a new crop, and rewards are matching textual captions or other (weak or strong) labels. Training a value function for this MDP instantiates a representation learning method.
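To make the crop MDP concrete, here is a minimal tabular sketch. It is not the method from the paper: the 4x4 grid of patch labels, the 2x2 crops, the one-step crop moves, the random policy, and all names (`IMAGE`, `reward`, `train_values`) are hypothetical choices for illustration. In a realistic setting the value function would be a neural network over pixels, and its learned features would serve as the representation; that part is not shown here.

```python
import random

# Hypothetical stand-in for an image: a 4x4 grid of patch labels.
IMAGE = [
    ["sky", "sky", "sky", "sky"],
    ["sky", "dog", "dog", "sky"],
    ["grass", "dog", "dog", "grass"],
    ["grass", "grass", "grass", "grass"],
]
CAPTION = "dog"  # hypothetical caption used as the reward signal

# State: a 2x2 crop, identified by its top-left corner (row, col).
CROPS = [(r, c) for r in range(3) for c in range(3)]
# Action: shift the crop by one patch in any direction that stays in bounds.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def reward(crop):
    """Fraction of patches in the crop whose label matches the caption."""
    r, c = crop
    patches = [IMAGE[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)]
    return sum(p == CAPTION for p in patches) / len(patches)


def neighbors(crop):
    """Crops reachable from `crop` with a single move."""
    r, c = crop
    return [(r + dr, c + dc) for dr, dc in MOVES if (r + dr, c + dc) in CROPS]


def train_values(steps=20000, gamma=0.5, lr=0.1, seed=0):
    """TD(0) value estimation under a uniformly random crop-moving policy."""
    rng = random.Random(seed)
    values = {crop: 0.0 for crop in CROPS}
    crop = rng.choice(CROPS)
    for _ in range(steps):
        next_crop = rng.choice(neighbors(crop))
        # Bootstrapped target: reward for this crop plus discounted next value.
        target = reward(crop) + gamma * values[next_crop]
        values[crop] += lr * (target - values[crop])
        crop = next_crop
    return values


if __name__ == "__main__":
    for crop, value in sorted(train_values().items()):
        print(crop, round(value, 2))
```

In this toy setup the crop centered on the dog ends up with the highest value, and crops that only touch the dog score lower. The exact fixed point for these settings is 1.5 for the center crop, 1.0 for edge crops, and 0.75 for corner crops; the TD estimates only approximate it.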

Sergey Levine • 1 year ago

Reward could come from matching a text label, or provided in a fully unsupervised way via crop consistency. The stronger the reward, the better it works, but even weak rewards like crop consistency lead to improvement. For more, check out the website:
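One way to picture the two reward sources is as the same similarity score computed against different targets. The sketch below is an assumption for illustration only: the post does not say how either reward is computed, and the embedding vectors and names (`cosine`, `crop_a`, `crop_b`, `caption`) are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical embeddings; in practice these would come from learned encoders.
crop_a = [0.9, 0.1, 0.0]   # current crop
crop_b = [0.8, 0.2, 0.1]   # another crop of the same image
caption = [1.0, 0.0, 0.0]  # text embedding of the image's caption

# Label-based reward: how well the crop matches the caption.
label_reward = cosine(crop_a, caption)

# Unsupervised reward: agreement between two crops, no label needed.
consistency_reward = cosine(crop_a, crop_b)
```

Under this framing, only the target of the comparison changes between the supervised and unsupervised cases; the rest of the MDP stays the same.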

Joanne Mercado • 1 year ago

@its_dibya *an SSL, but overall your grammar and punctuation are top-tier 💯

Ethan vs Machines • 1 year ago

@its_dibya RL for SSL using semantic rewards? Brilliant method. Scaling beyond COCO might be tough here though—Canada’s R&D can’t keep up with compute demands anymore.

ᐸGerardSans/ᐳ🚀🇬🇧 • 1 year ago

@its_dibya That’s just flattened patching which is something but not really.

Related videos

Our work, "A Primer on SO(3) Action Representations in Deep Reinforcement Learning," was accepted to #ICLR2026! We provide a systematic study of action representation choices in RL, showing that they fundamentally impact training stability and performance. #Robotics #AI #RL

Learning Systems and Robotics Lab (is hiring!)

49,655 views • 4 months ago

✨ICLR 2024 Spotlight Still hand-labeling phases or tuning GANs with your trajectories? We propose FLD, a self-supervised representation method that extracts spatial-temporal relationships in high-dimensional trajectories. RL policies informed by FLD show extended generality.🧵

Chenhao Li

81,933 views • 2 years ago

So we did a bunch of projects with real-world reinforcement learning, but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT), a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real-world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)

Abhishek Gupta

13,637 views • 1 year ago

Learned visuomotor policies are notoriously fragile: they break with changes in conditions like lighting, clutter, or object variations, amongst other things. In Yunchu @ CoRL2025's latest work, we asked whether we could get these policies to be robust and generalizable with a clever choice of visual representation! The argument we made was that we want a visual representation that specifically adapts to be sufficient, yet minimal, for the task at hand. We thought about it from the perspective of flexible, keypoint-based representations. The key question becomes: how do we choose a sufficient, task-specific, yet minimal set of keypoints as a representation for policy learning? Yunchu proposes a neat way of automatically selecting task-relevant keypoints using a standard supervised learning objective, and using this for robust policy learning. This is largely under the same assumptions as behavior cloning, but with huge gains in robustness. Let’s understand how 🧵 (1/8)

Abhishek Gupta

11,355 views • 1 year ago

What do we really need for social navigation?🤖 Deep RL? No❌ Imitation learning? No❌ LLMs? No❌ 📊Turns out for over 80% of the time, classical methods >> learning-based solutions! In new work, we rethink social navigation. Thread🧶

Rohan Chandra

14,066 views • 2 years ago

We can learn a model that provides shaped "process rewards" for robotic RL, that evolves automatically as the policy gets better. This improves performance on benchmarks, and works in the real world! Some fun new work with Raymond Tsao & Andrew Wagenmaker

Sergey Levine

36,440 views • 1 month ago

If you have a policy that uses diffusion/flow (e.g. diffusion VLA), you can run RL where the actor chooses the noise, which is then denoised by the policy to produce an action. This method, which we call diffusion steering (DSRL), leads to a remarkably efficient RL method! 🧵👇

Sergey Levine

152,824 views • 1 year ago

How can we get VLMs to move their eyes, and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Gabriel Sarch

76,548 views • 1 year ago

Ever wondered if we can model motion as a language? Can we tokenize this new language? Is it useful? Turns out, tremendously! 🚀 In our latest #NeurIPS2024 paper on QueST: Self-Supervised Skill Abstractions for Learning Continuous Control, we find that action tokenization matters a lot! We can learn skill encodings by representing temporal action abstractions with a discrete codebook. This enables 2 things: 1. Better Behaviour Cloning: we can better assimilate multi-task data (>9%) over best paper. This is currently the best-in-class BC method! 2. Generalization of this language to represent new tasks in 5-shot transfer to longer-horizon tasks! Check out the thread by Atharva Mete for more details. And check out more details at: Joint work with Atharva Mete, Albert Wilcox, Haotian Xue, Yongxin Chen, Georgia Tech School of Interactive Computing, Machine Learning at Georgia Tech, Robotics@GT, NVIDIA Robotics

Animesh Garg

26,218 views • 1 year ago

Goal-conditioned RL (GCRL) is great - unsupervised, can use data (in offline mode), flexibility to define tasks at test time. But can we run GCRL on *language data*?? In our new work we show that language GCRL enables sophisticated test-time reasoning for interactive tasks! 🧵👇

Sergey Levine

18,782 views • 1 year ago

(1/N) Will this be the BERT/GPT moment for 3D vision? Finally, unsupervised pre-training for 3D works. Led by Qitao Zhao, we present E-RayZer, a fully self-supervised 3D reconstruction model that: 🔥Matches or surpasses supervised methods like VGGT 👀Learns transferable 3D representations, outperforming CroCo, VideoMAE, and DINO 📈Scales with more unlabeled data. A new recipe for scalable 3D foundation models.

Hanwen Jiang

58,093 views • 7 months ago

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! The recipe to achieve this is incredibly simple. 🧵 1/N

Qiyang (Colin) Li

48,231 views • 1 year ago

Imma be honest. Kinda amazed we found a billboard company willing to do this. Chip in if you can - if we raise a bit more we can run it tomorrow as well.

Claude Taylor

63,940 views • 6 months ago

If you want a robot to do something well, you need to know how to talk to it. If you don't, you can learn, with Semantic Action RL! In our paper, Jagdeep Bhatia @ RSS 2026, Andrew Wagenmaker, Will Chen show how RL over VLA prompts enables new tasks and learns blazing fast in the real world!

Sergey Levine

29,139 views • 23 days ago

"I guess in times of adversity you build a bit of resilience. As tough as this is, I'm really optimistic what we can build post this." It looks like it will be a fifth season-ending injury for the Bombers 💔

7AFL

97,434 views • 1 year ago

"'It's a naive model, doesn't take into account much about what real neurons are like. Nevertheless, it is a model, and we can ask what we can do with it.' Turns out you can do quite a bit." MIT 6.034 Artificial Intelligence, Fall 2010 Instructor: Patrick Winston

tetsuo

29,774 views • 2 months ago

Excited to finally share Generative Value Learning (GVL), my Google DeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini (Try out the demo on our website on your robot video today!) I worked a lot on leveraging foundation models as guidance for robots in my PhD, and to me, this result forges a new frontier in how we can use foundation models for robot learning, given its broad applicability independent of embodiment and task types. Quite excited about how we can build on this work as a community!

Jason Ma

98,090 views • 1 year ago

🐧 how about something like “let’s be together forever”? 🐰 i’d like for us to have a chant that we can use for a long time 🐿️ if we’re going to use “let’s be together”, i’d prefer using our group name and going “let’s be together tomorrow”, rather than ‘forever’ 🦊 something like “let’s be together tomorrow as well” 🐰 “let’s be together tomorrow as well” i like that 🦊 if for example, we use it one day and go “let’s be together tomorrow as well!” then that means we’ll be together again tomorrow 🐰 of course 🦊 then tomorrow, we use “let’s be together tomorrow as well” again, that means we’ll be together the day after that as well 🦊 and then!! when we say “let’s be together tomorrow as well” on the next day, it means that we’ll be together again on the day after that 🐰 you’re right 🦊 then we can use it forever 🦊 let’s go with this one! 🐰 it’s good 👏👏👏 🐰 what do you guys think? i like it 🐿️ i like it too 🐰 tomorrow x together! 🐰🦊🧸🐿️🐧 let’s be together 🦊 tomorrow as well!

💬

115,256 views • 1 year ago

The network for machine intelligence Two years ago, we laid out our vision for a machine learning compute protocol. One that connects every device in the world into an open network for machine intelligence, with no gatekeepers or artificial boundaries. This week, we’ll be sharing some of our early progress, beginning with RL Swarm, a peer-to-peer system for collaborative reinforcement learning over the internet. Next month, we’ll open our Testnet, allowing anyone to contribute to the frontier of open machine intelligence. Introducing RL Swarm RL Swarm is a fully open source system for collaborative reinforcement learning over the internet. It is a live demo of our research findings, which show that models training with RL learn faster when they train as a collective swarm than they do on their own. Join our swarm now to see this in practice. You can participate with consumer hardware at home or a powerful GPU in the cloud. You can follow along with the swarm’s progress by following the links below.

gensyn

228,890 views • 1 year ago

Well well well it looks like our instincts were correct, for the massive number of us that sniffed this out I think we can take a well deserved bow.

𝓜𝓐𝓖𝓐 𝕏 𝓣𝓘𝓜𝓔𝓢 𝓓𝓐𝓘𝓛𝓨 𝓝𝓔𝓦𝓢🇺🇸

39,175 views • 5 months ago