
Rob Haisfield
@RobertHaisfield • 9,608 subscribers
cofounder of @websim_ai, imagining new internets with our users. GenAI, TfT, BeSci, HCI, UX. Ex-Tana, Edge & Node, Spark Wave

Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a puzzle game by Zachtronics. Ironically, Claude Opus 4.8 performed poorly, being beaten by GPT-5.5, Gemini 3.5 Flash, and GLM 5.2. Claude Fable 5 crushed them all.
Rob Haisfield • 409,448 views • 2 days ago

o1 became obsolete in websim the moment o3-mini-high came out. It's faster than o1, and often a more powerful coder than Claude 3.5 Sonnet. It gives consistently high-quality outputs at less than a seventh of o1's cost. With o3-mini-high, I was able to make a 3D falling sand sim on the surface of a globe. Sandglobe is reasonably performant and works on desktop and mobile. It features multiple elements that interact with each other, like lava and sand becoming glass, and water flowing to fill space.
Rob Haisfield • 24,444 views • 1 year ago