the thing that makes Opus 4.5 special is you can vibe code forever without it losing the plot. i vibe coded an iOS app this weekend that any frontier model could build in one shot. but the special thing about Opus is i kept adding features and fixing bugs...

85,253 views • 7 months ago • via X (Twitter)

Related Videos

BREAKING NEWS: Anthropic just dropped Claude Opus 4.5!! It is by FAR the best coding model I've ever used. We've been testing it internally at Every 📧 for the last few days, and it is an absolute paradigm shift for any kind of coding task.

It extends the horizon of what you can vibe code
The current generation of new models (Anthropic's Sonnet 4.5, Google's Gemini 3, or OpenAI's Codex Max 5.1) can all competently build a minimum viable product in one shot, or fix a highly technical bug autonomously. But eventually, if you kept pushing them to vibe code more, they'd start to trip over their own feet: the code would be convoluted and contradictory, and you'd get stuck in endless bugs. We have not found that limit yet with Opus 4.5; it seems to be able to vibe code forever.

Takes working in parallel to a whole new level
Because it's far better at planning and coding, it can work with more autonomy, meaning you can do more in parallel without breaking anything. Kieran Klaassen worked on 11 different projects in six hours and had good results on all of them.

Great at design iteration
Opus 4.5 is incredibly skilled at iterating through a design autonomously using an MCP server like Playwright. Previous models would lose the thread after a few cycles, or say a design was done when it wasn't. Opus 4.5 is incredible at autonomously iterating until a design is pixel perfect.

We have a full 4,000-word vibe check on Every 📧 right now with everything we tested:

Dan Shipper 📧

272,434 views • 7 months ago

I just compared Claude Code vs Codex vs Cursor CLI.

The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them run for 30 minutes to see what they came up with.

Claude Code with Opus 4.1
Even though I told it to set up the app in the existing project folder, it tried to create a new directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked AI-generated and was pretty basic, but it worked, and it was done in a fraction of the time the others took.
Total tokens used: 33k

Codex with GPT-5
At the end of the 30 minutes I just could not get Codex to produce a working app. It got stuck in a loop of failing to set up Tailwind 4, and despite many, MANY attempts, I ended up with a "failed to compile" error.
Total tokens used: 102k

Cursor Agent with GPT-5
This was by far the slowest agent, and a couple of times I thought it had got stuck in a loop and was close to hitting Ctrl+C to cancel it. The TUI is really nice though, especially how it shows diffs, and it did eventually build a working app (after one or two minor errors that needed fixing). The demo was interactive and had a very minimal design that looked bare but also a lot less like an "AI-generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it used 5.5x as many tokens to get there. It's still cheaper than Opus on a direct comparison, but not when you factor in a Claude Code Max subscription.
Total tokens used: 188k

I'll be able to do a proper comparison and record some videos when I'm back from holiday, but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product.
It will be interesting to see how Cursor evolve their CLI with commands and subagents, though, because I think with GPT-5 they have a real shot at competing with Claude Code if they can optimise output to get similar quality with fewer tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)

Ian Nuttall

194,949 views • 10 months ago