
Zixuan Li
@ZixuanLi_ • 13,989 subscribers
Lead https://t.co/KepyJGAtVs @Zai_org.
Videos

GLM-5.2 delivers a substantial leap in app development, which is also a demanding long-horizon task.

Results:
- GLM-5.1: 21/70
- GLM-5.2: 48/70
- Claude Fable 5: 56/70

That's more than a twofold improvement from GLM-5.1 to GLM-5.2.

These results come from an internal benchmark of 35 challenging mobile development tasks, each run twice for a total of 70 trials. We measured task completion, defined as core features working without major issues.
Zixuan Li • 115,274 views • 19 hours ago
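
The arithmetic behind the reported scores can be checked with a short sketch. It uses only the numbers stated in the post; the names `TRIALS`, `completed`, and `completion_rate` are illustrative and not part of any benchmark tooling.

```python
# Trial count as described in the post: 35 tasks, each run twice.
TRIALS = 35 * 2

# Completed trials per model, as reported in the post.
completed = {"GLM-5.1": 21, "GLM-5.2": 48, "Claude Fable 5": 56}


def completion_rate(passed: int, total: int = TRIALS) -> float:
    """Fraction of trials in which the task was completed."""
    return passed / total


rates = {model: completion_rate(n) for model, n in completed.items()}

# Ratio of GLM-5.2 completions to GLM-5.1 completions.
improvement = completed["GLM-5.2"] / completed["GLM-5.1"]

for model, rate in rates.items():
    print(f"{model}: {rate:.1%}")
print(f"GLM-5.2 vs GLM-5.1: {improvement:.2f}x")
```

The ratio comes out slightly above 2, which matches the "more than a twofold improvement" claim.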