New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test."

But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

102,351 views • 7 months ago • via X (Twitter)

