When using GPT-5.5, it is instantly noticeable how much more powerful it is. In Codex, I gave it a very complex prompt to create London Toy Railway with landmarks and seasons - it did an excellent job in one shot. In the second half of the video you see... GPT-5.4 - it was also not bad, but very clearly worse. GPT-5.5's generation is far more ambitious, coherent and with fewer errors. This is obviously a toy example, but I've used it on much more complex real tasks, including a complex app migration and a new hard workflow - it has been working away for many hours without getting stumped. I'm getting more and more addicted to this stuff with every model release.

Peter Gostev

11,994 subscribers

262,105 views • 1 month ago • via X (Twitter)

Arts · Education · Science and Technology

Related videos

Hello GPT-5.5! Our best model yet is here. It’s smarter, faster, and gets more done. I had the chance to capture first reactions from developers building with it: it writes better code, understands more complex tasks, and keeps going. Can’t wait to see what you build with it!

Romain Huet

16,218 views • 1 month ago

GPT-5.6 vs GPT-5.5 on my custom spaceship prompt. I gave both models the exact same custom prompt. This is also the same prompt I previously gave to Fable 5. For context, GPT-5.6 Pro worked for 87 minutes, while GPT-5.5 Extra High worked for 34 minutes and 42 seconds. As I’ve said before, based on great authority GPT-5.6 will be an incremental/solid improvement over GPT-5.5, not a “Fable killer.” My rough expectation has been that it would trade blows with Fable 5 on some benchmarks, maybe win around half depending on the category, but not clearly surpass it overall. And again, Fable 5 will have bigger model smell, but this was expected. After testing this coding output, that view feels pretty accurate. GPT-5.6 is clearly better than GPT-5.5 in several visual areas. The lighting, shading, chairs, object details, and exterior of the spaceship looked noticeably stronger. The scene was also easier to test. I do want to give GPT-5.5 credit though. It built out the rooms much, much better and the planets looked better than GPT-5.6’s. It was also interesting that both GPT-5.5 and GPT-5.6 produced better-looking planets than Fable 5 in this specific test. The downside with GPT-5.5 was stability. The game was much glitchier and harder to test compared to GPT-5.6. But when it comes to the core of the demo, which is the spaceship itself, Fable 5 still beat both models pretty comfortably. GPT-5.6 is impressive, but from this test, it looks exactly like what I expected, which was a meaningful incremental improvement over GPT-5.5, at least for indie game demos, but not something that replaces Fable 5. In collaboration with Chetaslua

Chris

170,370 views • 1 day ago

The Information reporter Stephanie Palazzolo on OpenAI's GPT-5: a major step up for coding and what it means for competitors. "GPT-5 is a step up on a lot of different domains...one area that really stood out to [sources] was with coding." "Not only is GPT-5 better on more academic...tasks, but it's also better on the more practical programming tasks...working with very large and complex code bases." "If GPT-5 is going to be significantly better at these more practical everyday programming tasks, that could prove to be bad news for Anthropic." Watch the full episode on

The Information

52,067 views • 10 months ago

This is actually cool - I tried the same prompt for the new Interactive Playwright skill in Codex & GPT-5.4 xHigh - the one above is with the skill and the one below is without. What the skill does is use the computer-use capability of GPT-5.4 to look at and navigate the UI. This never worked for me before, but with GPT-5.4 this is the first time I can actually see a massive difference. You can see how the first scene is much more coherent, higher fidelity and complete. The one below is missing a lot of elements and isn't as rich in detail. I'll keep using it for any UI work now.

Peter Gostev

256,979 views • 3 months ago

Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

OpenAI

13,125,279 views • 1 month ago

This is now a much more accurate representation of the space, and it still holds that GPT-5.4 is a bit more explorative. It's now based on the Kamada-Kawai layout instead of the default spring layout, and there are many more nodes and edges than before (still need to optimize it tho), but it's not close to all nodes or edges in the neighborhood, still only a tiny fraction. The fuzzball is very dense.

Lisan al Gaib

19,040 views • 3 months ago

Databricks is excited to partner with OpenAI on GPT-5.5, their latest frontier model. GPT-5.5 will be available in Unity AI Gateway on launch. You can use it with coding tools such as Codex, or to power your enterprise agents. GPT-5.5 is state-of-the-art on many benchmarks including OfficeQA Pro, our benchmark for evaluating grounded reasoning on enterprise tasks. We are partnering with OpenAI to co-launch on Databricks. Hear more from our co-founder Patrick Wendell and OpenAI CRO Denise Holland Dresser on GPT-5.5 in Databricks.

Databricks

12,668 views • 1 month ago

I just compared Claude Code vs Codex vs Cursor CLI.

The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them go for 30 minutes to see what they came up with.

Claude Code with Opus 4.1: Even though I told it to set up the app in the existing project folder, it tried to create a directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked like AI and was pretty basic, but it worked and it was done in a fraction of the time of the others. Total tokens used: 33k

Codex with GPT-5: At the end of the 30 minutes I just could not get Codex to produce a working app. It got stuck in a loop of not being able to set up Tailwind 4 and despite many, MANY, attempts, I ended up with a "failed to compile" error. Total tokens used: 102k

Cursor Agent with GPT-5: This was the slowest agent by far and a couple of times I actually thought it got stuck in a loop and was close to Ctrl+C'ing to cancel it. The TUI is really nice though, especially how it shows diffs, and it did eventually build a working app (after one or two slight errors that needed fixing). The demo was interactive and it had a very minimal design that looked bare but also a lot less like an "AI generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it did use 5.5x as many tokens to get there. Still cheaper than Opus on a direct comparison but not when you factor in a Claude Code Max subscription. Total tokens: 188k

I'll be able to do a proper comparison and record some videos when I'm back from holiday but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product. It will be interesting to see how Cursor evolve their CLI though with commands and subagents, because I think with GPT-5 they have a real shot at providing competition for Claude Code if they can optimise output to get similar quality with fewer tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)

Ian Nuttall

194,931 views • 10 months ago

Had early access to GPT-5.4 and Pro. They are very good. One fun illustration of progress, this is the same prompt I used in GPT-4 below (making a 3D space inspired by Piranesi) now in GPT-5.4 Pro. There were no errors, made in a single prompt plus one to "make it better."

Ethan Mollick

188,306 views • 3 months ago

Codex is unusable. I generated one image with GPT Image 2.0, gave it to Codex, and used one simple prompt: “Create me a web game from this image.” You give it a simple image like this, and suddenly you realize you can build:
• a voxel map
• an editor
• tile animations
• a playable browser prototype
This is getting ridiculous.

Givros

83,052 views • 1 month ago

It works!! This is fully designed by GPT-5.5 and GPT-5.5-Pro in ForgeCAD!

Ruben Kostandyan

35,378 views • 29 days ago

The value produced by models is getting so much better so fast that old hardware is actually getting *more* expensive to rent. 3 years ago, the best model you could run on an H100 chip was GPT-4. Now, you can run GPT-5.4 on it, which is smaller and cheaper to run while producing much more valuable tokens. w. Dylan Patel

Dwarkesh Patel

77,908 views • 3 months ago

Did my (very not scientific) F-Zero test for Anthropic's new Fable model. It didn't quite one shot below, I had to give a few more prompts like "give it more of a sense of speed." But still impressive! Quote tweeting what GPT 5.5 made from 2 months ago.

Peter Yang

271,428 views • 11 days ago

Claude Code with Sonnet 4.5 is actually incredible. I gave it a prompt for a super complex app, and it one-shot the entire thing. In this video I walk you through how to use Claude Code to build a prompt library app you can start using immediately (no coding experience required):

Alex Finn

40,955 views • 8 months ago

Composer 2.5 gets the BridgeMind stamp of approval. It doesn't have the design quality of GPT 5.5 or Opus 4.7. But for fast iteration and backend bug fixes, it delivers. Elon reposted it this morning. SpaceX is likely acquiring Cursor. This model is about to get a lot more attention. Full review and real vibe coding workflow test below.

BridgeMind

12,671 views • 1 month ago

I claimed to be a Tesla employee and asked GPT-5.5-Pro to build the next version of Tesla Optimus Fingers in ForgeCAD. It created a significantly more advanced model than when I simply asked it to create a robot finger model. An experiment thread.

Ruben Kostandyan

195,734 views • 1 month ago

BREAKING: GPT-5.5 "Spud" is out and it is a BEAST

We've been testing it at Every 📧 for the last 3 weeks on everything from coding, to writing, to knowledge work. Here's our day 0 vibe check:

- It's a step change in coding AND it's easy to talk to. It's fast and friendly and quickly became my daily driver. But it's also a coding powerhouse, a really rare combination.
- It scored 62/100 on our Senior Engineer benchmark. Opus 4.7 scored only a 33/100. (But GPT-5.5 performed best when using an Opus 4.7 plan). Naveen Naidu used over 900 million tokens during testing, and it let him ship production features for Monologue at both high speed and quality.
- It has serious conceptual clarity. It can hold a complex plan in its head over hours of work, without getting distracted by existing code. This makes it the first model that we've tested that can perform well on complex refactors requiring deleting and reimagining a substantial existing codebase.
- It's a very good writer. This is the first OpenAI model in about a year that got our writers at Every 📧 to switch away from Claude. 5.5 has Katie Parrott's seal of approval, not an easy task. Its writing feels more organic and it's better at mimicking a writing style without going overboard.
- It's great for agentic knowledge-work. This is the first OpenAI model that manages to be both a stellar senior engineer AND that can be used for everything from spreadsheets to research. It's crazy fast, and it's amazing inside of the Codex desktop app, and got much of our team to switch away from Claude Code and Cowork during the testing period.

However, it's not a perfect model.

- 5.5 still loses to Opus 4.7 on plan quality. Its plans are extremely readable but Opus has better attention to detail and sharper insight.
- 5.5 still loses to Opus 4.7 by a bit on front-end and full-stack product work. Kieran Klaassen found that it wasn't quite as good when full-stack thinking and design are involved. And it's not great writing Ruby.
- 5.5 is a great vibe coder but if you're vibe coding without a plan it's worse than Opus. Mike Taylor found that Opus is better at reading in between the lines on underspecified vibe-coding tasks.

Overall GPT-5.5 is a massive achievement from OpenAI and it deserves a serious look as your daily driver. Read our full vibe check on Every 📧 here:

Dan Shipper 📧

130,191 views • 1 month ago

At Perplexity, GPT-5.5 in Codex helped build an internal tool in under an hour. In Perplexity Computer workflows, GPT-5.5 used 56% fewer tokens on the same complex tasks, creating faster feedback loops for users.

OpenAI Developers

68,320 views • 1 month ago

GPT-5.4 is my new default model for OpenClaw. For the last 2 months, Opus was the only model I felt could really unlock OpenClaw for serious agent tasks. GPT-5.4 is the first one that got close enough in quality to change that. In practice, it feels ~60% faster (73 tok/s vs 46), costs about half as much, and gives ~5x longer context. That combination makes OpenClaw much more snappy as a default. I tested it side by side against Opus 4.6 on deep research + slide generation, and the outputs are inspectable too. Full video below:

Quinn Leng

131,551 views • 3 months ago
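
The "~60% faster (73 tok/s vs 46)" figure in the post above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the two throughput numbers come from the post, while the 10,000-token response length and the helper names are hypothetical, chosen purely for illustration.

```python
# Throughput figures quoted in the post, in tokens per second.
GPT_TOK_PER_S = 73.0
OPUS_TOK_PER_S = 46.0


def percent_faster(new_rate: float, old_rate: float) -> float:
    """Relative throughput gain of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


def seconds_for(tokens: int, rate: float) -> float:
    """Time to emit `tokens` tokens at a steady `rate` (tokens per second)."""
    return tokens / rate


speedup = percent_faster(GPT_TOK_PER_S, OPUS_TOK_PER_S)
print(f"{speedup:.1f}% faster")  # prints "58.7% faster"

# Hypothetical response length, not taken from the post.
EXAMPLE_TOKENS = 10_000
print(f"{seconds_for(EXAMPLE_TOKENS, GPT_TOK_PER_S):.0f} s vs "
      f"{seconds_for(EXAMPLE_TOKENS, OPUS_TOK_PER_S):.0f} s")
```

Under these assumptions the quoted rates work out to roughly a 59% throughput gain, which is consistent with the post's "~60%" rounding.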

Ex-OpenAI chief research officer Bob McGrew says each AI model generation uses 100x more compute and because it takes time to build new data centers it appears like nothing is happening from the outside, while inside companies the view is very different; and before we see GPT-5 we will see a 10x lift with a half-generation release

Tsarathustra

51,730 views • 1 year ago