🚨 Current scalable RL algos train a policy w/o a value function, which is limiting for learning in open-ended, non-stationary, dynamic environments. But how to scale value-based RL with more data/compute is unclear... Not anymore: presenting scaling laws for value-based RL 🧵⬇️
