adarsh

@adarsh_exe • 7,236 subscribers

founder / co-ceo @mercor_ai, prev @harvard

Videos

Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with Cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't. OpenAI GPT 5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1.

212,522 views • 3 months ago
