John Yang

@jyangballin · 5,817 subscribers

CS PhD @Stanford. Created @SWEbench (multi-lingual/modal); SWE-agent; SWE-smith; InterCode; CodeClash; ProgramBench 🆕

Shorts

New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

102,240 views