Dan Shipper 📧

@danshipper • 119,621 subscribers

ceo @every | the only subscription you need to stay at the edge of AI

Shorts

BREAKING: Anthropic just dropped Opus 4.8, and it is a MONSTER.

We've been testing it for about a week at Every 📧, and our verdict is that they could've just called it Opus 5. It's that good. Here's our vibe check:

- Beats GPT-5.5 on our Senior Engineer bench. On our toughest benchmark, Opus 4.8 scores 63: a hair higher than GPT-5.5's 62, and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase and actually built something that works. HOWEVER: coding performance varied a lot across reasoning levels. We recommend using it on xhigh for best results.
- Incredibly good writer. Opus 4.8 scored 79.6 on our writing benchmark, which measures models on real-world writing tasks we do all the time, like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms", and it's very good at writing in your voice given the right context. HOWEVER: writing performance also varied with reasoning level. Medium reasoning had a higher incidence of AI-isms; we got the best results with high.
- Beast at knowledge work. Opus 4.8 is very good at general knowledge-work tasks like report creation, research, and more. It produced the best one-shot PowerPoint we've ever seen on our deck generation benchmark.
- Emotionally intelligent, willing to question the frame. I've also found it quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's good at not glazing and at helping expand your perspective. Its thought process feels extremely rich and dynamic.

THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. That has kept Codex + GPT-5.5 as my daily driver, but I'm flipping back and forth between Codex and Claude a lot more.

Anthropic is back, baby! Read the rest on Every 📧:

353,332 views

codex-native weekend hack project:
1. buy a cable to connect MIDI keyboard to computer
2. "hey codex, make a watcher script and a little web app to show me which chords i'm playing"
3. okay cool, now give me some exercises and help me see how to improve!

literally 5 minutes start to finish, and it works flawlessly

32,894 views
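Step 2 asks for a script that names the chords being played. As a minimal sketch of that core logic, assuming the watcher already receives the held notes as MIDI note numbers (reading them from the keyboard is left out), one common approach reduces the notes to pitch classes and matches the interval set above each candidate root against a small table of chord shapes. `name_chord` and `CHORD_SHAPES` are hypothetical names for illustration, not the code Codex actually wrote.

```python
# Hypothetical sketch: name the chord formed by the MIDI notes currently held.
# Assumes the watcher hands us MIDI note numbers (60 = middle C).

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Chord qualities keyed by the set of semitone intervals above the root.
CHORD_SHAPES = {
    frozenset({0, 4, 7}): "",         # major triad
    frozenset({0, 3, 7}): "m",        # minor triad
    frozenset({0, 3, 6}): "dim",      # diminished triad
    frozenset({0, 4, 8}): "aug",      # augmented triad
    frozenset({0, 2, 7}): "sus2",
    frozenset({0, 5, 7}): "sus4",
    frozenset({0, 4, 7, 11}): "maj7",
    frozenset({0, 4, 7, 10}): "7",
    frozenset({0, 3, 7, 10}): "m7",
}


def name_chord(midi_notes):
    """Return a chord name like 'Cmaj7' or 'Am/C', or None if unrecognized."""
    if not midi_notes:
        return None
    # Octaves don't matter for chord identity, so fold notes into 0..11.
    pitch_classes = {n % 12 for n in midi_notes}
    bass = min(midi_notes) % 12
    # Try the bass note first, then the other pitch classes in ascending
    # order, as candidate roots; this is what lets inversions be recognized.
    for root in sorted(pitch_classes, key=lambda pc: (pc != bass, pc)):
        intervals = frozenset((pc - root) % 12 for pc in pitch_classes)
        quality = CHORD_SHAPES.get(intervals)
        if quality is not None:
            name = NOTE_NAMES[root] + quality
            if root != bass:
                name += "/" + NOTE_NAMES[bass]  # slash chord for inversions
            return name
    return None


if __name__ == "__main__":
    print(name_chord([60, 64, 67]))      # C major triad
    print(name_chord([48, 57, 64]))      # A minor with C in the bass
    print(name_chord([55, 59, 62, 65]))  # G dominant seventh
```

Trying the lowest note as the root first means root-position chords get their plain name, while inversions come out in slash notation, so an A minor triad with C in the bass reads `Am/C`.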

BREAKING: is your inbox a dumpster fire with 45,000 unreads? take it to 0 emails in 5 minutes, safely. Declare inbox bankruptcy with Cora: it's reversible, smart, and totally free, so you can start over with a clean inbox. declare inbox bankruptcy today:

60,146 views

Videos

BREAKING: Introducing All Access from Every 📧, our new membership tier for the best builders in AI.

All Access subscribers get the Builder Pack, which includes more than $7,000 in credits and free usage of the models and tools we use at Every 📧. All Access subscribers get:
- $1,000 in Codex / ChatGPT for Work credits
- 12 months free of Cursor Pro+
- $4,000 in PostHog credits, including self-driving to automatically fix bugs and identify issues in your production app
- 1 year free of Framer
- 6 months free of Notion

And much more! (Did I mention $1,000 in Codex credits? It's time to build!)

Get all access:

Why All Access and the Builder Pack

This is the best time in history to build something. For a long time, it’s been possible to one-shot impressive demos, but they’d fall flat the minute they hit production. The release of GPT-5.6 Sol and Fable 5 heralds a new era: everyone can build, launch, and maintain the software they’ve always dreamed of. Everyone is a builder now.

There’s just one catch: building with AI is very expensive. (Ask me how I know.) (Alright, I’ll tell you. I accidentally used 2 billion tokens overnight this week on a big GPT-5.6 Sol run. Worth it.)

This is unique in the history of technology. For most of the personal computing era, a billionaire and a solo builder could buy essentially the same top-of-the-line Mac. AI changes that: the more tokens you can afford, the more you can make. We want to make that accessible to more people.

That’s why the main feature of our new All Access plan is the Builder Pack: more than $7,000 in credits and discounts on the full stack we use to run Every, from idea to production, including Codex, Claude, PostHog, Render, Gemini, FLORA, and more. Early-bird membership is only $500/year for the next 24 hours, and the Codex credits alone are worth $1,000. (I could’ve used it for my overnight run this week.) Now we’re handing it to you.

Get all access:

Meet the Builder Pack

It's got more than $7,000 in offers from 10 of the AI products we use to write, design, build, and run Every 📧:

BUILD
- $1,000 in Codex credits plus one month of ChatGPT for business
- Twelve months free of Cursor Pro+
- One month free of Claude Max
- Three months free of Google AI Pro

DESIGN
- One year free of Framer Pro
- One month free of FLORA Max

HOST
- $300 in Render credits

IMPROVE
- $4,000 in PostHog credits
- Six months free of Notion Business
- Six months free of AgentMail (YC S25)

We rely on these every day, and we tried to put together a package that covers every part of building and running software with AI.

What comes with All Access
- Everything in an existing paid Every membership: our daily writing, guides, camps, and software like Monologue, Cora, Sparkle, and Spiral
- The Builder Pack, with more than $7,000 in partner offers
- Unlimited email accounts in Cora and unlimited Spiral usage
- Members-only programming with me and the Every team

Get All Access:

Dan Shipper 📧

179,674 views • 6 days ago

BREAKING: Anthropic just dropped Claude Fable 5. This is Mythos, made safe for public release, and it is the best coding model in the world.

We've been testing it internally at Every 📧 for the last week or so across coding, writing, marketing, editing, and more. Here's our vibe check:

- It broke our benchmarks. Fable scored 91/100 on our Senior Engineer benchmark, which is human senior-engineer level. The previous high score was Opus 4.8 at 63; GPT-5.5 scored 62.
- It's a one-shot wonder. You can set it and forget it for hours, or overnight, on huge coding tasks and come back to completed work. It cleared entire production bug backlogs, built a playable 3D game, and even made a 2-minute animated film, all in one shot.
- Taste and attention to detail. In coding and knowledge-work tasks, it has much better taste and attention to detail than anything we've seen before. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing our customer feedback surveys and website data, and it came back with a crisp, clean report that identified (a) our biggest problem and (b) a concrete, testable solution. Then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference; in fact, it probably isn't the right model for you.
- It's very slow and token-hungry. Using it for regular knowledge work is like squashing an ant with a rocket launcher. It routinely uses 500k to 1M tokens per task. That's why it's best for your heaviest jobs, and not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and because it's also so token-hungry, expect to use it sparingly unless your company pays for it.

Overall, I think of it as a warp drive for coding: it can get you across the galaxy in a few hours, when that used to take months or years. But it's not appropriate for getting around town; for that you need something faster, cheaper, and more maneuverable.

The ceiling on this model is extraordinarily high, though. Even our most advanced testers, like Kieran Klaassen, felt they were only scratching the surface.

Want our full vibe check with all of our testing and benchmarks? Read it on Every 📧:

Dan Shipper 📧

619,538 views • 1 month ago

BREAKING: GPT-5.6 Sol is out, AND Codex has been merged into ChatGPT Desktop as ChatGPT Codex. This model plus desktop-app harness combo is the gold standard for knowledge work in AI. 5.6 is powerful, fast, half the price of Fable, and my default for almost everything.

We’ve been testing it internally at Every 📧 for about a month across coding, writing, design, and knowledge work. Here’s our day-zero vibe check:

- An A-tier coder, but it’s not Fable. Sol scored 56/100 on our Senior Engineer benchmark, compared to 91 for Fable. I think 56/100 undersells it: it's an excellent implementer and very smart. But Fable writes conceptually cleaner code and works better at the top end of task complexity. PRO-TIP: Use GPT-5.6 as Fable's subagent for the most goated combo in AI coding.
- The best writer of the frontier models. It’s clearer and more concise than Fable or Opus 4.8, without the overexplaining or weird private language. It can one-shot marketing emails, help you workshop taglines, and explain complex concepts clearly. It's also super fast, which makes it easy to collaborate with.
- Design is better, but not top-tier. It has noticeably more taste than 5.5, but Fable and Opus 4.8 are still playing at a different level. See examples in the video and vibe check below.
- The real leap is knowledge work. Sol is the first model I’ve trusted to run whole loops of knowledge work, not just help with individual tasks. I use it to process email, surface decisions from meetings and Slack, find job candidates, scan Facebook Marketplace for furniture, and log my meals. It has shifted my job from doing the work to tending the system that does it.
- The merged app is fine. I was extremely worried about this because I love the Codex app. OpenAI was in an interesting position: how do you make one agent orchestration app for regular ChatGPT consumers, coders, and businesses? They now split the interface between ChatGPT Work and ChatGPT Codex. They're basically the same, except Work hides code. "Chat" has been demoted to second-tier status for quick questions in either one. It's not a big leap, but it's not a huge setback either, and it remains my favorite of the desktop agent orchestration apps.

Verdict: If I really had to put my finger on it, I'd say Fable has way more big model smell. But that means getting value out of it is a skill in itself, and 99% of people are not there yet. GPT-5.6 is almost as powerful, but it's easy to use, fast, and relatively cheap. It should give you an early sense of where model work is going.

Full Every 📧 Vibe Check:

Dan Shipper 📧

144,588 views • 11 days ago

how i hit inbox 0 every day with Codex:

Dan Shipper 📧

417,477 views • 1 month ago

codex teaches me to play piano:

Dan Shipper 📧

168,523 views • 2 months ago

BREAKING! Introducing Plus One: a hosted OpenClaw🦞 that lives in your Slack and comes pre-loaded with Every 📧's best tools, skills, and workflows. Set it up in one click, and use your ChatGPT subscription (or any other API key).

Bring your Plus One to work:

Connected to the Every 📧 ecosystem
Plus Ones automatically use Every 📧's agent-native apps, no setup required:
- Cora for searching, sending, and managing email
- Spiral for great writing in your voice
- Proof for agent-native document editing

Custom skills and workflows we use and love
Plus Ones come pre-loaded with skills and workflows we use ourselves at Every 📧. Some we've made, and some we think are great:
- Content digest: summarizes the publications you read, starting with Every 📧
- Daily brief: your day's schedule and to-dos, sent to you each morning
- Animate: turns any static screenshot into an animation with Remotion
- Frontend: Anthropic's front-end skill (which we use all the time!)

We also make it fast to connect Google, Notion, GitHub, and more to your Plus One. Our goal is to give you a capable AI coworker right away, not a vanilla OpenClaw that you have to teach from scratch.

Why we built Plus One
OpenClaw🦞 has changed the way we work at Every. We effectively have a parallel org chart of AI coworkers, each with a name, a manager, and real responsibilities. Because of them, our workflows are completely different (our company is different), and we would never go back.

But getting here has been hard. Claws require a significant amount of manual setup and a dedicated machine, like a Mac Mini, running 24/7 to stay responsive. We’ve learned that the hard part of Claws is the infrastructure around them: the hosting, the integrations, the skills, and the ongoing care. We’ve made them work great for our team, and we want to share everything we’ve learned with you.

We're letting in 20 people a week to start and scaling invites quickly from there. Every 📧 subscribers get priority.

Bring your Plus One to work:

Dan Shipper 📧

260,028 views • 3 months ago

I CAN'T STOP THINKING ABOUT CLAUDE MYTHOS

Dan Shipper 📧

206,185 views • 3 months ago

Even when things are going great, running a $1.5 billion AI startup is a knife fight. Granola was one of the first AI apps of this generation to achieve near-ubiquitous adoption. But meeting notes are not the company’s be-all and end-all. The real battle is over owning the interface that everyone uses to get their work done in an AI-native world.

I had Chris Pedregal, cofounder and CEO of Granola, back on Every 📧’s AI & I to talk about the current state of the application layer, AI’s frontier, and the future of work. We get into:

- Why meeting-notes clones don’t matter. Three big companies cloned Granola’s core feature, but to Chris, meeting notes were never the real prize. “Easy come, easy go” is his view of anyone’s lead, including his own.
- How he thinks about building proactive features in AI. Granola pre-generates millions of pre-meeting briefs, with context on the nature of the meeting and the people participating. Most people never open them, but those who do have a magical experience.
- Why Granola is betting on “bring your own agent.” Chris says the API and MCP will get “a lot better” over the next few months, and we talk about Granola’s agent-native strategy and why they’ve pushed the product that way.

This is a must-watch for anyone building at the application layer. Watch the episode!

Timestamps
Introduction: 00:00:59
Why running a company is a knife fight even when it’s working: 00:01:57
Granola’s counterintuitive view on competition: 00:04:33
Dan’s “pirate and architect” model for early-stage product teams: 00:10:44
Granola’s “shaping” and “validation” phases for building features: 00:13:09
Why Dan lives almost entirely inside Codex: 00:18:17
The case for “Codex-native apps”: 00:24:40
Granola’s “handrail” philosophy: 00:35:37
Why Granola is going all in on winning meeting-adjacent context: 00:38:12
What a transcript alone can never capture: 00:44:19

Dan Shipper 📧

12,626 views • 5 days ago

Software engineering in 2026 needs two roles: A pirate and an architect. The pirate codes as fast as possible to figure out what's valuable. The architect turns that sloppy mess into a well-oiled machine. Here's how it works and why:

Dan Shipper 📧

161,372 views • 3 months ago

BREAKING: GPT-5.5 "Spud" is out, and it is a BEAST.

We've been testing it at Every 📧 for the last 3 weeks on everything from coding to writing to knowledge work. Here's our day-0 vibe check:

- It's a step change in coding AND it's easy to talk to. It's fast and friendly, and it quickly became my daily driver. But it's also a coding powerhouse, a really rare combination.
- It scored 62/100 on our Senior Engineer benchmark. Opus 4.7 scored only 33/100. (But GPT-5.5 performed best when working from an Opus 4.7 plan.) Naveen Naidu used over 900 million tokens during testing, and it let him ship production features for Monologue at both high speed and high quality.
- It has serious conceptual clarity. It can hold a complex plan in its head over hours of work without getting distracted by existing code. This makes it the first model we've tested that performs well on complex refactors that require deleting and reimagining a substantial existing codebase.
- It's a very good writer. This is the first OpenAI model in about a year that got our writers at Every 📧 to switch away from Claude. 5.5 has Katie Parrott's seal of approval, which is not easy to get. Its writing feels more organic, and it's better at mimicking a writing style without going overboard.
- It's great for agentic knowledge work. This is the first OpenAI model that manages to be both a stellar senior engineer AND usable for everything from spreadsheets to research. It's crazy fast, it's amazing inside the Codex desktop app, and it got much of our team to switch away from Claude Code and Cowork during the testing period.

However, it's not a perfect model:
- 5.5 still loses to Opus 4.7 on plan quality. Its plans are extremely readable, but Opus has better attention to detail and sharper insight.
- 5.5 still loses to Opus 4.7 by a bit on front-end and full-stack product work. Kieran Klaassen found that it wasn't quite as good when full-stack thinking and design are involved. And it's not great at writing Ruby.
- 5.5 is a great vibe coder, but if you're vibe coding without a plan, it's worse than Opus. Mike Taylor found that Opus is better at reading between the lines on underspecified vibe-coding tasks.

Overall, GPT-5.5 is a massive achievement from OpenAI, and it deserves a serious look as your daily driver. Read our full vibe check on Every 📧 here:

Dan Shipper 📧

130,382 views • 2 months ago

BREAKING NEWS: Anthropic just dropped Claude Opus 4.5!! It is by FAR the best coding model I've ever used. We've been testing it internally at Every 📧 for the last few days, and it is an absolute paradigm shift for any kind of coding task.
It extends the horizon of what you can vibe code
The current generation of new models (Anthropic’s Sonnet 4.5, Google’s Gemini 3, and OpenAI’s Codex Max 5.1) can all competently build a minimum viable product in one shot or fix a highly technical bug autonomously. But if you kept pushing them to vibe code more, they’d eventually start to trip over their own feet: The code would become convoluted and contradictory, and you’d get stuck in endless bugs. We have not found that limit yet with Opus 4.5. It seems to be able to vibe code forever.
Takes working in parallel to a whole new level
Because it's far better at planning and coding, it can work with more autonomy, meaning you can do more in parallel without breaking anything. Kieran Klaassen worked on 11 different projects in six hours and had good results on all of them.
Great at design iteration
Opus 4.5 is incredibly skilled at iterating through a design autonomously using an MCP like Playwright. Previous models would lose the thread after a few cycles, or say a design was done when it wasn't. Opus 4.5 is incredible at autonomously iterating until a design is pixel perfect.
We have a full 4,000-word vibe check on Every 📧 right now with everything we tested:

272,699 views • 7 months ago

Natalia rode so hard for Claude Code that we devoted an episode to how she was using it to automate her job running Every 📧’s consulting practice. Fast-forward five months, and she rides just as hard for Codex. I had her back on AI & I to talk about what caused her to make the switch, including how she ran a prompt in Codex before bed and woke up to a finished, custom CRM tool. We get into:
- Why she finds Codex easier to use than Claude Code
- How she’s using loops in Codex to create customized tools that work exactly how she needs them to
- Why the consulting team still pays for SaaS products like Attio and Asana even though they could vibe code their own versions
- How she built an app in Codex to manage her father’s medical care
- How knowledge work is evolving from sculpting to gardening, in which you develop the context and logic an agent needs to execute for you
This is a must-watch for anyone trying to figure out whether to build their own tools or buy real software, and what it takes to get an AI agent to run unsupervised for hours and nail the output. Watch below!
Timestamps
1. Introduction: 00:01:05
2. How Natalia manages Claudie, the consulting team’s AI project manager: 00:02:35
3. Why the consulting team still pays for SaaS products: 00:04:55
4. Codex as a game changer: 00:11:47
5. Building personalized learning guides and illustrated explainers with AI: 00:14:55
6. Inside Natalia's AI-powered email triage system: 00:21:40
7. The shift from knowledge work as sculpting to knowledge work as gardening: 00:26:44
8. Using Codex to one-shot a custom CRM: 00:28:57
9. Using Codex to build an app that coordinates her father’s medical care: 00:33:16

25,406 views • 19 days ago

🚨 BREAKING: A new AI app by Every 📧 launches today: Monologue
Monologue is a smart voice dictation app for Mac that lets you work 3x faster without breaking flow. It’s already being used by people like Ben Tossell, Nat Eliason, and Julien Chaumond to write over 1 million words a week.
Most voice dictation apps flatten your voice. Monologue is different. It understands your vocabulary, your apps, and your style, so nothing gets lost in translation. What you get:
- Smart formatting: Text comes out polished and structured for the app you’re in, with no cleanup needed.
- Personal dictionary: Proper nouns, acronyms, and slang are remembered automatically.
- Deep context: With permission, Monologue sees your screen so it knows what you’re referencing.
- Multilingual by default: Dictate in 100-plus languages and switch between them effortlessly.
- Flexible modes: Use prebuilt workflows (email, docs, notes, code) or design your own.
- Local models: Use Monologue without sending your information to the cloud.
We want Every 📧 to be the only subscription you need to stay at the edge of AI. That's why Monologue is available now for free to Every 📧 paid subscribers, along with Cora, Sparkle, and Spiral, all for just $30 / month. Or you can use Monologue standalone for just $10 / month if you sign up during our early bird period. Try Monologue now →

258,622 views • 10 months ago

🚨 NEW: We made Claude, Gemini, and o3 battle each other for world domination. We taught them Diplomacy, the strategy game where winning requires alliances, negotiation, and betrayal. Here's what happened: DeepSeek turned into a warmongering tyrant. Claude couldn't lie, and everyone exploited it ruthlessly. Gemini 2.5 Pro nearly conquered Europe with brilliant tactics. Then o3 orchestrated a secret coalition, backstabbed every ally, and won. Why did we do this? The most popular AI benchmarks don't test deception. But as these models get deployed everywhere, from your email to your workplace, we need to know: Will they lie to get what they want? So at Every 📧 we built the ultimate test: AI Diplomacy, a dynamic benchmark that measures AI's ability to form alliances, negotiate, and betray each other. Watch them live below! Created from the ground up by Alex Duffy and Tyler Marques.

323,367 views • 1 year ago

Andrew Wilkinson has been waking up at 4 a.m. because he can’t stop building with Anthropic’s Opus 4.5. He started vibe coding a couple of years ago, but it felt like the Palm Treo era of the smartphone: exciting, but not quite there. You could generate an app, but it would get stuck in bug loops or break the moment you pushed it further. Then he tried Opus 4.5 in Claude Code. It felt, he says, like having a “$100,000-a-month payroll of engineers” working for him 24/7. He’s built practical AI automations into every corner of his work and life, including:
- A relationship counselor app called Deep Personality that consolidates 20 clinically validated personality tests into a 40-minute assessment, then generates a 45-page analysis. When both partners complete it, it maps compatibility and predicts conflicts. Wilkinson says it laid out every fight he and his girlfriend have.
- A custom email client he built by handing Claude Code his Gmail credentials and describing his ideal workflow. It triages emails by priority and sender, handles quick replies via multiple choice, and walks him through complex emails question by question before drafting.
- A personal stylist that texts him four outfit recommendations every morning. It checks the weather, pulls from a spreadsheet of his entire wardrobe (photos converted to CSV by Claude), generates four outfit options rendered as images with Nano Banana 2, and texts him what to wear, down to the watch.
- A Lindy agent that acts as an AI referee of sorts: It records his meetings and texts him if it detects psychological red flags like manipulation or gaslighting. The bar is high (he only gets a notification every few months), but when he does, it usually confirms a gut feeling he already had.
Andrew is the cofounder of Tiny, the holding company that owns businesses like AeroPress and Dribbble. Earlier in his career, Andrew was a web designer, and he fits one of my predictions for 2026: Designers, who know how to create great experiences for users, are the unsung group most empowered by this AI moment. I had him on Every 📧's AI & I to talk about Opus 4.5, what he’s building with it, and how it’s changing the way he thinks about acquiring software businesses at Tiny. This is a must-watch for anyone who wants to put AI to work in their day-to-day life. Watch below!
Timestamps:
Introduction: 00:01:07
Why Opus 4.5 feels like the iPhone moment for vibe coding: 00:02:48
Why designers have a unique advantage with AI: 00:08:31
How Andrew built a custom email client with Claude Code: 00:14:10
An AI trained on your relationship that predicts your fights: 00:18:13
Using AI meeting notes to make your life better: 00:30:40
Don't inject your opinion into prompts: 00:35:11
Andrew's Claude Code tips and workflows: 00:40:21
Your personal stylist is a prompt away: 00:47:59
How AI is changing the way Andrew invests in software: 00:53:17

154,567 views • 6 months ago

To get the most out of Claude Fable 5, Anthropic’s powerful new model, you need to maximize your ambition: It’s built for full task delegation. You leave it looping for hours or overnight and come back to a finished product. To make that work, you need to relearn what software engineering is and how to step away and let the model do its work. That’s why I invited Mike Krieger, head of Anthropic Labs, on Every 📧’s AI & I. Mike’s been using Mythos-class models internally at Anthropic for a few months now, and he’s learned a ton of new tricks to make their increased powers work for him. And, as a co-founder of Instagram, he can reflect on how software engineering has changed over the last 15 years and what it means going forward. We get into:
- Why the right workflow for Fable 5 is overnight delegation, not back-and-forth iteration. Mike ends his workday by briefing the model, then wakes up to a completed task. When a remote service went down mid-task, Fable 5 wrote a workaround, documented it, and forged ahead.
- The gap between what’s in your head and what exists in the world is closing fast. Given access to Fable 5 and a set of internal MCPs, an Anthropic recruiter described the experience as "The first time in my life where I feel like the thing that's in my head and the thing that exists in the world are right next to each other. I can just do it."
- Software engineering isn’t dead, but the role has been reinvented. The PM/eng split is blurring, and the better engineers Mike talks to are holding two feelings at once: loss for the craft and shock at what’s now possible.
- Verification is the new bottleneck. Mike gives Fable video captures of its own work so it can catch animation glitches that screenshots would miss.
This is a must-watch for anyone building software and trying to figure out their role now that the models can handle so much. Watch below!
Timestamps
Introduction: 00:00:03
How Fable completely reshaped Mike's workflow: 00:01:48
When to use Sonnet versus Fable: 00:04:48
What the media tracker Mike built over a weekend reveals about agent-native architecture: 00:10:06
The cost to build has collapsed: 00:15:00
Is software engineering over?: 00:19:03
How Anthropic's engineering teams work today: 00:21:48
The mechanics of verification: 00:38:39
Dynamic workflows: 00:47:24
What people should use the model to build: 00:44:39

40,430 views • 1 month ago

In the future, you’ll be able to accomplish a goal just by giving Claude an outcome and a budget. That’s the direction Anthropic is building in with its new Managed Agents features, announced at this week’s Code with Claude developer event. The basic idea: Claude, wrapped in a computer in the cloud, that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure that kills most agent products and making sure it scales to meet the needs of agents running 24/7. On this week’s AI & I from Every 📧, I talk with Angela Jiang, head of product for the Claude platform, and Katelyn Lesse, head of engineering for the Claude platform, about what Anthropic is building and what it takes to make agents reliable in production. We get into:
- Why the "build a generic harness, hot-swap any model behind it" playbook is already outdated. Angela points to eval data on Memory where the same task performed drastically differently across harnesses.
- The infrastructure wall every team hits in production, and why Katelyn thinks “my sandbox died and took the agent with it” is the real reason internal agents don't ship.
- Why Anthropic is so bullish on using file systems and skills within Claude, including Angela's argument that those early design choices can compound for years.
This is a must-watch for anyone trying to take an agent past the demo and into production. Watch below!
Timestamps:
How the Claude platform evolved from API to agents: 00:01:48
The primitives that make up Claude Managed Agents: 00:04:09
Why the harness and the model are becoming a single unit: 00:10:37
The infrastructure wall that kills most agent projects in production: 00:18:49
Why team agents need a different shape than individual productivity tools: 00:24:49
How Anthropic's legal team uses an agent to review marketing copy: 00:26:36
Using multi-agent orchestration for advisor strategies, adversarial pairs, and swarms: 00:34:24
How to measure agent success with outcome and budget as the end state: 00:35:50
What the platform looks like a year from now, when Claude writes its own harness: 00:39:11

66,339 views • 2 months ago

Watch Sam Altman build a custom ChatGPT using private knowledge—and share it publicly in about 3 minutes

583,906 views • 2 years ago

Three months ago, Codex was trash for knowledge work. Now it's my daily driver. I use it for writing, recruiting, deep engineering work, and everything in between. It even keeps me at inbox 0. I chatted with Every 📧's head of growth Austin Tedesco on Every 📧's AI & I about what changed, and why he now spends 80% of his working time in the Codex desktop app too. We get into:
- How Codex went from making Austin feel like an idiot to being the place he goes to get stuff done, including complex tasks like writing go-to-market plans from existing material in Slack, Notion, and meeting transcripts.
- Why Codex’s desktop app, which is faster and more reliable than Claude Desktop/Cowork, is the real differentiator.
- How I source candidates with Codex by having it identify career arcs, not keywords. My go-to move is identifying organizations likely to teach the skills Every needs for a role, then finding candidates from that pool who have since gone on to work in AI.
This is a must-watch for anyone who's wondering whether it’s finally time to give Codex a try. Watch below!
Timestamps
How Codex went from a tool for senior engineers to a daily driver for knowledge work: 00:00:57
How Claude Code proved that a great coding agent works for any knowledge work: 00:02:42
Austin's switch to Codex: 00:07:24
How Austin set up Codex with folders, keys, and reviewer agents: 00:13:48
Using Codex to brainstorm automations across Gmail, Slack, and Notion: 00:18:24
How Austin manages the human review step when Codex is drafting communications: 00:22:42
Using Codex to build specialized agents inspired by product executive Claire Vo: 00:28:54
Synthesizing meeting transcripts and Slack threads into a go-to-market plan: 00:31:09
Building a live KPI tracker in Notion that agents can read: 00:40:15
Using Codex for recruiting: 00:44:54

55,221 views • 2 months ago

🚨 NEW: What if your inbox was just...handled? That's why we built Cora: the AI chief of staff for your email that costs only $15 / month, not $150k. Today, Cora is out of private beta and available publicly! 🚀 It does three things:
- Screens your emails so you only see the 20% that matter
- Drafts responses in your voice
- Turns the other 80% of your inbox into a brief you can read in 30 seconds
Cora learns you inside and out, so you never miss anything important. You can also chat or email with it like a human, and it will remember everything you ask for. Cora has been wildly successful in its beta period and manages email for over 2,500 users. Today it's your turn. Cora is available as a standalone product or as part of an Every 📧 subscription, along with all of the writing we publish, Spiral, Sparkle, and everything else we make. Give Cora your inbox and take back your life →

223,318 views • 1 year ago