Self-supervised representation learning looks a bit like RL. What if we literally use RL as a SSL method for visual representations? Turns out that it works quite well. In new work by Dibya Ghosh, we show how this can be done:

Sergey Levine

130,700 subscribers

48,715 views • 1 year ago • via X (Twitter)

Science & Technology, Education

5 comments

Sergey Levine • 1 year ago

Imagine an MDP where the state is the current crop of the image, an action is to pick a new crop, and rewards are matching textual captions or other (weak or strong) labels. Training a value function for this MDP instantiates a representation learning method.

Sergey Levine • 1 year ago

Reward could come from matching a text label, or be provided in a fully unsupervised way via crop consistency. The stronger the reward, the better it works, but even weak rewards like crop consistency lead to improvement. For more, check out the website:
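To make the crop MDP described in these comments concrete, here is a minimal tabular sketch in Python. It is not the method from the paper. Everything in it is an assumption made for illustration: the 4x4 binary toy image, the fixed 2x2 crop size, one-pixel shifts as the way to "pick a new crop", the `caption_reward` stand-in (the fraction of crop pixels marked as matching the caption), the discount factor, and the uniformly random policy evaluated with plain Bellman backups. The actual work presumably trains a neural network on real images, and its features would serve as the representation.

```python
# Toy 4x4 "image": 1 marks pixels that match the caption, 0 is background.
IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
CROP = 2      # side length of each square crop (hypothetical choice)
GAMMA = 0.5   # discount factor (hypothetical choice)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # actions: shift the crop by one pixel


def crop_positions(image, size):
    """All top-left corners where a size x size crop fits in a square image."""
    limit = len(image) - size + 1
    return [(r, c) for r in range(limit) for c in range(limit)]


def caption_reward(image, crop, size):
    """Stand-in for caption matching: fraction of crop pixels equal to 1."""
    r0, c0 = crop
    hits = sum(image[r][c]
               for r in range(r0, r0 + size)
               for c in range(c0, c0 + size))
    return hits / (size * size)


def step(image, crop, move, size):
    """Apply an action: shift the crop, clamped so it stays inside the image."""
    limit = len(image) - size
    r = min(max(crop[0] + move[0], 0), limit)
    c = min(max(crop[1] + move[1], 0), limit)
    return (r, c)


def evaluate_random_policy(image, size, sweeps=100):
    """Iterative policy evaluation for a uniformly random crop-shifting policy.

    The reward is collected for the crop the agent moves to.
    """
    values = {crop: 0.0 for crop in crop_positions(image, size)}
    for _ in range(sweeps):
        new_values = {}
        for crop in values:
            total = 0.0
            for move in MOVES:
                nxt = step(image, crop, move, size)
                total += caption_reward(image, nxt, size) + GAMMA * values[nxt]
            new_values[crop] = total / len(MOVES)
        values = new_values
    return values


values = evaluate_random_policy(IMAGE, CROP)
```

In this toy setting the crop that sits on the object, `(1, 1)`, gets the highest value, and the corner crops get the lowest. The value function therefore encodes where the caption-relevant content is, which is the sense in which training it yields a representation.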

Joanne Mercado • 1 year ago

@its_dibya *an SSL, but overall your grammar and punctuation are top-tier 💯

Ethan vs Machines • 1 year ago

@its_dibya RL for SSL using semantic rewards? Brilliant method. Scaling beyond COCO might be tough here though—Canada’s R&D can’t keep up with compute demands anymore.

ᐸGerardSans/ᐳ🚀🇬🇧 • 1 year ago

@its_dibya That’s just flattened patching which is something but not really.

Related videos

Our work, "A Primer on SO(3) Action Representations in Deep Reinforcement Learning," was accepted to #ICLR2026! We provide a systematic study of action representation choices in RL, showing that they fundamentally impact training stability and performance. #Robotics #AI #RL

Learning Systems and Robotics Lab (is hiring!)

49,655 views • 4 months ago

✨ICLR 2024 Spotlight Still hand-labeling phases or tuning GANs with your trajectories? We propose FLD, a self-supervised representation method that extracts spatial-temporal relationships in high-dimensional trajectories. RL policies informed by FLD show extended generality.🧵

Chenhao Li

81,933 views • 2 years ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 7 months ago

Scaling laws in deep RL? Turns out that the batch size, learning rate, and UTD (update-to-data) ratio that give the most efficient and scalable deep RL have predictable relationships. Check out the analysis in new work by Oleg Rybkin & collaborators:

Sergey Levine

43,464 views • 1 year ago

So we did a bunch of projects with real-world reinforcement learning, but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT), a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real-world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)

Abhishek Gupta

13,591 views • 1 year ago

What do we really need for social navigation?🤖 Deep RL? No❌ Imitation learning? No❌ LLMs? No❌ 📊Turns out for over 80% of the time, classical methods >> learning-based solutions! In new work, we rethink social navigation. Thread🧶

Rohan Chandra

14,066 views • 2 years ago

Learned visuomotor policies are notoriously fragile: they break with changes in conditions like lighting, clutter, or object variations, among other things. In Yunchu @ CoRL2025's latest work, we asked whether we could get these policies to be robust and generalizable with a clever choice of visual representation! The argument we made was that we want a visual representation that specifically adapts to be sufficient, yet minimal, for the task at hand. We thought about it from the perspective of flexible, keypoint-based representations. The key question becomes: how do we choose a sufficient, task-specific, yet minimal set of keypoints as a representation for policy learning? Yunchu proposes a neat way of automatically selecting task-relevant keypoints using a standard supervised learning objective, and using this for robust policy learning. This is largely under the same assumptions as behavior cloning, but with huge gains in robustness. Let’s understand how 🧵 (1/8)

Abhishek Gupta

11,355 views • 1 year ago

If you have a policy that uses diffusion/flow (e.g. diffusion VLA), you can run RL where the actor chooses the noise, which is then denoised by the policy to produce an action. This method, which we call diffusion steering (DSRL), leads to a remarkably efficient RL method! 🧵👇

Sergey Levine

152,824 views • 1 year ago

How can we get VLMs to move their eyes, and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Gabriel Sarch

76,548 views • 1 year ago

Eric Jang weighs in on RL vs. supervised learning for humanoid robot manipulation tasks.

The Humanoid Hub

10,656 views • 11 months ago

Ever wondered if we can model motion as a language? Can we tokenize this new language? Is it useful? Turns out, tremendously! 🚀 In our latest #NeurIPS2024 paper on QueST: Self-Supervised Skill Abstractions for Learning Continuous Control, we find that action tokenization matters a lot! We can learn skill encodings by representing temporal action abstractions with a discrete codebook. This enables 2 things: 1. Better Behaviour Cloning: we can better assimilate multi-task data (>9%) over best paper. This is currently the best-in-class BC method! 2. Generalization of this language to represent new tasks in 5-shot transfer to longer-horizon tasks! Check out the thread by Atharva Mete for more details. And check out more details at: Joint work with Atharva Mete, Albert Wilcox, Haotian Xue, Yongxin Chen, Georgia Tech School of Interactive Computing, Machine Learning at Georgia Tech, Robotics@GT, NVIDIA Robotics

Animesh Garg

26,218 views • 1 year ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

431,287 views • 3 months ago

She looks like a rl barbie in this movie imo

scarlettxmee

19,361 views • 6 months ago

Goal-conditioned RL (GCRL) is great - unsupervised, can use data (in offline mode), flexibility to define tasks at test time. But can we run GCRL on *language data*?? In our new work we show that language GCRL enables sophisticated test-time reasoning for interactive tasks! 🧵👇

Sergey Levine

18,782 views • 1 year ago

(1/N) Will this be the BERT/GPT moment for 3D vision? Finally, unsupervised pre-training for 3D works. Led by Qitao Zhao, we present E-RayZer, a fully self-supervised 3D reconstruction model that: 🔥 Matches or surpasses supervised methods like VGGT 👀 Learns transferable 3D representations, outperforming CroCo, VideoMAE, and DINO 📈 Scales with more unlabeled data. A new recipe for scalable 3D foundation models.

Hanwen Jiang

58,009 views • 6 months ago

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! The recipe to achieve this is incredibly simple. 🧵 1/N

Qiyang (Colin) Li

48,231 views • 11 months ago

Imma be honest. Kinda amazed we found a billboard company willing to do this. Chip in if you can. If we raise a bit more, we can run tomorrow as well.

Claude Taylor

63,940 views • 5 months ago

We open-sourced QeRL (Quantization-enhanced Reinforcement Learning)! 🧠 4-bit quantized RL training 💪 Train a 32B LLM on a single H100 GPU ⚙️ 1.7× faster overall training 🎯 Accuracy on par with bfloat16-level accuracy 🔥 Supports NVFP4 quantization format. Moreover, we show that quantization helps exploration in RL training. Paper: Code: #NVIDIA #AIResearch #ReinforcementLearning #Quantization #LLM #EfficientAI

Yukang Chen

69,747 views • 8 months ago

"I guess in times of adversity you build a bit of resilience. As tough as this is, I'm really optimistic what we can build post this." It looks like it will be a fifth season ending injury for the Bombers 💔

7AFL

97,434 views • 11 months ago

To solve AGI, we must first solve Geoguessr. For that I built vlm-gym, a simple RL gym written from scratch in JAX for Qwen3VL-4B (released yesterday), added Geospot, an RL environment for geolocation, and learned that VLMs can learn how to geoguess. More:

Surya

140,348 views • 8 months ago