If you have a policy that uses diffusion/flow (e.g. a diffusion VLA), you can run RL where the actor chooses the noise, which the policy then denoises to produce an action. This approach, which we call diffusion steering (DSRL), turns out to be remarkably efficient! 🧵👇
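To make the "actor chooses the noise" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `frozen_flow_policy` is a toy stand-in for a pre-trained flow policy (a hand-written velocity field, not a learned network), and `steering_actor` is a hypothetical fixed noise choice standing in for a learned actor. It is not the DSRL implementation.

```python
import numpy as np

def frozen_flow_policy(state, noise, steps=10):
    """Toy stand-in for a frozen, pre-trained flow policy (hypothetical).

    Starts from `noise` and takes Euler steps along a fixed velocity field
    that pulls the sample part of the way toward a state-dependent target.
    """
    target = np.tanh(state)                 # toy "demonstrated" action
    action = np.array(noise, dtype=float)   # denoising starts at the noise
    dt = 1.0 / steps
    for _ in range(steps):
        action = action + dt * 0.5 * (target - action)
    return action

def prior_noise(state, rng):
    # Standard behavior: draw the initial noise from the Gaussian prior.
    return rng.standard_normal(state.shape)

def steering_actor(state, rng):
    # Hypothetical steering actor: picks the noise instead of sampling it.
    return np.full(state.shape, 1.5)

rng = np.random.default_rng(0)
state = np.array([0.2, -0.4])
a_prior = frozen_flow_policy(state, prior_noise(state, rng))
a_steered = frozen_flow_policy(state, steering_actor(state, rng))
```

The policy weights never change here; only the starting noise differs between `a_prior` and `a_steered`, which is the lever the RL actor gets to pull.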

152,824 views • 1 year ago • via X (Twitter)

9 comments

Sergey Levine · 1 year ago

DSRL trains an actor and Q-function, treating the diffusion noise as the action space. Because samples from the noise prior map to reasonable actions for the policy, DSRL essentially explores "inside" the set of reasonable pre-trained behaviors, making it extremely efficient.
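A rough sketch of learning a Q-function over the noise space, under heavy simplifying assumptions: a single state (so effectively a bandit), a one-dimensional noise, a discretized grid of noise candidates with a tabular Q, and an epsilon-greedy "actor" in place of a learned network. The decoder `frozen_policy`, the reward `sparse_reward`, and all hyperparameters are hypothetical and chosen only for illustration. Exploration draws noise from the Gaussian prior, mirroring the point that prior samples map to reasonable actions.

```python
import numpy as np

def frozen_policy(noise):
    """Toy frozen decoder (hypothetical): maps noise to a bounded action."""
    return 0.5 * np.tanh(noise)

def sparse_reward(action, goal=0.3, tol=0.05):
    # Sparse 0/1 signal, like a person pressing a button on success.
    return 1.0 if abs(action - goal) < tol else 0.0

def train_noise_q(trials=300, eps=0.5, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(-2.0, 2.0, 21)   # discretized noise "actions"
    q = np.zeros_like(candidates)             # tabular Q over noise values
    for _ in range(trials):
        if rng.random() < eps:
            # Explore: sample from the noise prior, snap to the nearest candidate.
            w = rng.standard_normal()
            idx = int(np.argmin(np.abs(candidates - w)))
        else:
            # Exploit: pick the noise the Q-function currently rates highest.
            idx = int(np.argmax(q))
        r = sparse_reward(frozen_policy(candidates[idx]))
        q[idx] += lr * (r - q[idx])            # move Q toward the observed reward
    return candidates, q

candidates, q = train_noise_q()
best_noise = candidates[int(np.argmax(q))]
best_action = frozen_policy(best_noise)
```

Because every candidate noise decodes to an action inside the frozen policy's range, exploration never leaves the set of behaviors the policy can already produce. A full method would replace the grid with a continuous actor and critic conditioned on the state.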

Sergey Levine · 1 year ago

DSRL learns essentially in real time, with good results in as little as 50 trials (it's so efficient that a person can literally sit in front of the robot and push a button to assign sparse rewards).

Sergey Levine · 1 year ago

This was a really fun collaboration led by @ajwagenmaker. Project website with paper: To find out more, check out his thread here:

ahad · 1 year ago

Would a supervised learning version of this work, where the noise distribution is a parameter that is also optimized along with the policy weights?

ahad · 1 year ago

How long would it take to get that first sparse reward with this method?

Himanshu Kumar · 1 year ago

Controlling the noise instead of the action itself is a surprisingly effective approach.

Andres Franco · 1 year ago

This is pretty amazing, and the visualization made everything so easy to understand 😅

Ran Cheng · 1 year ago

Will making the initial noise distribution a learnable parameter reduce randomness and thus make the model more prone to overfitting?

Joanne Mercado · 1 year ago

😅🥹
