Loading video...

Video Failed to Load

Go Home

🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary, dynamic environments. But, how to scale value-based RL with more data/compute is unclear... Not anymore: presenting scaling laws for value-based RL 🧵⬇️

37,301 views • 1 year ago •via X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos