New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test."

But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

