We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.
63,395 views • 1 month ago • via X (Twitter)
