Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with Cognition. It measures whether AI models can perform complex, real-world software engineering work: shipping systems that work and debugging them when they don't. OpenAI...

