🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a Teacher (discriminator) that tells us where our policy went wrong and helps it adapt to new envs and tasks.
24,396 views • 3 years ago • via X (Twitter)
5 Comments

Work was led by Yuying @tttoaster_ when she was interning in our lab at UCSD, collaborating with @anna_macalus on the robot work. Please check arXiv: Full video introduction:

Nice work! Using VLMs unlocks automated feedback, which is often quite expensive for humans to produce. For another style of this same method (language feedback from CLIP) but in the offline setting, check out our work DIAL:

Thanks for sharing. Yes, definitely quite relevant! Very interesting idea to apply this to the offline setting. I think the shared core idea is that we are all using VLMs to provide some sort of reward signal.

Nice, looks very relevant to Reincarnating RL:

Thank you for making the connection. Yes, I think we have a bit of the flavor of progressive learning. The focus has been on VLMs for supervision, but extending more in the progressive learning direction can actually be quite an interesting ... hmm

