Loading video...
Video Failed to Load
🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary, dynamic environments. But, how to scale value-based RL with more data/compute is unclear... Not anymore: presenting scaling laws for value-based RL 🧵⬇️
37,301 views • 1 year ago •via X (Twitter)
0 Comments
No comments available
Comments from the original post will appear here
