Behavioral Foundation Models (BFMs) trained with RL are secretly more powerful than we think. BFMs directly output a policy believed to be near-optimal given any reward function. Our new work shows that they can actually do much better:

44,182 views • 1 year ago • via X (Twitter)

8 comments

Harshit Sikchi • 1 year ago

2. BFMs learn generalizable representations that allow an embodied agent to act near-optimally for any reward function by providing a mapping from a reward to the corresponding near-optimal policy. They do this using unsupervised pretraining algorithms such as PSM, FB, and HILP.
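
To make the reward-to-policy mapping concrete, here is a minimal sketch of the zero-shot inference step used by forward-backward (FB) style methods: the task latent is estimated as the reward-weighted average of backward embeddings, z_r ≈ mean_i r(s_i) B(s_i), and the pretrained policy is then conditioned on z_r. Everything named here (`infer_task_latent`, `toy_backward_embed`, the 2-D toy states) is a hypothetical stand-in for illustration, not code from the paper; rescaling z to norm sqrt(d) is a common convention that is assumed rather than confirmed by this thread.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def infer_task_latent(
    backward_embed: Callable[[Sequence[float]], Vector],
    states: Sequence[Sequence[float]],
    rewards: Sequence[float],
    normalize: bool = True,
) -> Vector:
    """Estimate z_r as the mean of r(s_i) * B(s_i) over reward-labeled states."""
    if not states or len(states) != len(rewards):
        raise ValueError("need a non-empty, equal-length list of states and rewards")
    dim = len(backward_embed(states[0]))
    z = [0.0] * dim
    for state, reward in zip(states, rewards):
        b = backward_embed(state)
        for k in range(dim):
            z[k] += reward * b[k]
    z = [value / len(states) for value in z]
    if normalize:
        # Assumed convention: rescale z so that its norm equals sqrt(dim).
        norm = math.sqrt(sum(value * value for value in z))
        if norm > 0.0:
            scale = math.sqrt(dim) / norm
            z = [value * scale for value in z]
    return z


def toy_backward_embed(state: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a learned backward network B(s)."""
    x, y = state
    return [x, y]


# Toy task: the reward is the x-coordinate of the state.
states = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
rewards = [s[0] for s in states]
z = infer_task_latent(toy_backward_embed, states, rewards)
```

In this toy setup the inferred latent points along the x-axis, the direction that the reward favors. A real BFM would pass `z` to its pretrained latent-conditioned policy instead of inspecting it directly.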

Harshit Sikchi • 1 year ago

3. However, unlike language and vision, RL has mostly operated in a tabula rasa fashion. We don’t have RL pretraining methods that can be fine-tuned rapidly for any task. Most RL methods unlearn when we start fine-tuning, an issue often attributed to miscalibration of the value function.

Harshit Sikchi • 1 year ago

4. Our first finding is striking: In the space of learned behaviors, the unsupervised RL pretraining based on successor features discovers behaviors that are much better than the ones that are output zero-shot. Below is a thorough evaluation on a number of environments and tasks:

Harshit Sikchi • 1 year ago

5. Based on these findings, we present ways to rapidly fine-tune the zero-shot policy output by BFMs to improve performance on any downstream task. The algorithms are general, simple, task-agnostic, and performant. The key idea is: Search in the latent space of behaviors.
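
As a rough illustration of what "search in the latent space of behaviors" can look like, the sketch below starts from a zero-shot latent and refines it with a small cross-entropy-style search driven only by episode returns. This is a generic stand-in chosen for brevity, not the algorithm from the paper; `latent_search`, `toy_return`, and all hyperparameters are hypothetical, and `toy_return` replaces what would be a real environment rollout of the policy conditioned on z.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def latent_search(
    evaluate_return: Callable[[Vector], float],
    z_init: Vector,
    iterations: int = 10,
    population: int = 8,
    n_elite: int = 2,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Cross-entropy-style search over the latent z, starting at the zero-shot latent."""
    rng = random.Random(seed)
    mean = list(z_init)
    best_z, best_ret = list(z_init), evaluate_return(z_init)
    for _ in range(iterations):
        # Sample candidate latents around the current search mean.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        # One episode (here: one function call) per candidate.
        scored = sorted(
            ((evaluate_return(z), z) for z in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        elites = [z for _, z in scored[:n_elite]]
        # Move the search mean to the average of the elite latents.
        mean = [sum(column) / n_elite for column in zip(*elites)]
        if scored[0][0] > best_ret:
            best_ret, best_z = scored[0]
        sigma *= 0.9  # shrink the search radius over time
    return best_z, best_ret


def toy_return(z: Vector) -> float:
    """Hypothetical stand-in for rolling out the policy conditioned on z."""
    target = [1.0, -0.5]
    return -sum((a - b) ** 2 for a, b in zip(z, target))


z_zero_shot = [0.0, 0.0]
z_best, ret_best = latent_search(toy_return, z_zero_shot)
```

With these defaults the search uses 1 + 10 × 8 = 81 evaluations. Because only the low-dimensional latent is searched and the pretrained networks stay frozen, the returned latent is never worse than the zero-shot starting point on the evaluated episodes.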

Harshit Sikchi • 1 year ago

6. Our proposed algorithms can adapt in tens of episodes to achieve much better behaviors. Below is an example of search in the latent space of behaviors, showing how the policy evolves during adaptation.

Harshit Sikchi • 1 year ago

7. This work was done during my internship at FAIR with wonderful collaborators A. Tirinzoni, A. Touati, @YingchenX, A. Kanervisto, @scottniekum, and @yayitsamyzhang, spearheaded by @alelazaric and @teopir. Paper (at RLC 2025):

Taylor W. Killian • 1 year ago

This is great-looking work, Harshit! Congratulations! I'm looking forward to catching up at @RL_Conference later this summer!

Harshit Sikchi • 1 year ago

@RL_Conference Looking forward to chatting!
