New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test."

But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
102,351 views • 7 months ago • via X (Twitter)

