Introducing /visual-plan - a skill to generate rich, visual plans for Claude Code and Codex. Plan mode in Claude Code is incredible. But I always find my eyes glazing over when it gives me this huge markdown essay in my terminal. I found I can make much better visual... plans with reusable components. So I made a skill called `/visual-plan`. It generates plans as MDX with visual, interactive components. Diagrams, interactive API specs, schema design changes, annotated code, and even pan and zoomable wireframes. So for any UI work, you can look at a wireframe first, comment on it, iterate, and then have the agent work. I’ve found this to be a much more intuitive interface for reasoning about what the agent is doing. It’s somewhat inspired by that popular post about how HTML is better than Markdown. But HTML can be slow and verbose to write. And it doesn’t look good checked into a repo. This has really made me feel like humans and engineering are entering a new abstraction phase, where we reason about things at the plan level. As long as the plan is good, agents are getting more and more reliable at executing on it. Almost to the degree that we trust the C compiler to compile to assembly reliably. Plans are the new intermediate representation. I also made a skill for the reverse of this, called `/visual-recap`. After the agent works, it gives you a recap of everything it did. Same idea: wireframes, interactive API specs and diffs, schemas, annotated code, etc. So now when you’re reviewing what the agent did for you, or looking at a pull request of somebody else’s code, you can see a visual recap instead of just reading a wall of text. It’s all free and open source. You can find it on my GitHub. Will link to it in the reply because we all know how dumb these algorithms are with links.

Steve (Builder.io)

130,602 subscribers

118,354 views • 4 days ago • via X (Twitter)

Education, Science & Technology

Related videos

For those of you tired of yet another AI-based IDE: Check out Traycer. It's a Visual Studio Code extension (free forever for anyone working on open-source repositories), and it does something different from everyone else. You write a high-level prompt outlining the changes you want to make to your codebase, and the extension generates a plan for you before you touch the code. (It reminds me of GitHub Copilot Workspaces.) You can review the plan, understand exactly the changes the extension will make to your code, and make any modifications. When you are ready, you execute the plan, and Traycer stages all of the changes so you have a new opportunity to review everything. Only then does Traycer apply the changes. This is ideal for large codebases where you'd rather study the changes the AI plans to make before they make it into your code. Look at the video I recorded to see how it works. Thanks to the Traycer team for collaborating with me on this post!

Santiago

83,805 views • 1 year ago

My favorite way of interacting with Claude Code is to have it generate static HTML files as outputs (reports, explorations, code structure, mockups etc.) I wanted to iterate on the file by commenting in browser and having Claude update the output live. So, I built this Claude Skill👇 How it works: - Install Claude Code skill (ask it to clone repo) - Build an HTML page for anything (e.g. research coding agents and generate HTML report) - Ask it to make the page interactive That's it. CC will launch a localhost server and allow you to then leave comments on the page itself and once it updates, will give you a tour of changes. It's like Google Docs kind of comments/iteration but for HTML pages.

Paras Chopra

213,802 views • 25 days ago

Your Plan SUCKS! 👀 Inspired by the legend Andrej Karpathy himself, I created a new skill: LLM Council. Create better plans, by committee. Supports: Codex CLI, Gemini CLI, Claude Code, and OpenCode. Call the skill with the feature that you'd like to build. It will ask you some clarifying questions, and then launch up to four parallel planning agents to create a detailed plan. Once the plans are in, all plans are anonymized and "The Judge" will critique and choose the best plan *OR* the best parts from all of them. Finally, it will output a final plan, which you can review and refine. I even created a nice UI for you to review and refine your plans. This skill has been tested on Linux only, but should work on other platforms. Please report any bugs! Links in the comments.

am.will

68,983 views • 5 months ago
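
The council flow described in the LLM Council post above (parallel planners, anonymized plans, a judge that picks a winner) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `run_council`, the planner callables, and the judge callable are all hypothetical names, and the planners here are plain Python functions standing in for real CLI agents.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical stand-ins: a planner maps a feature request to plan text,
# and a judge maps anonymized plans to one final plan.
Planner = Callable[[str], str]
Judge = Callable[[Dict[str, str]], str]


def run_council(
    feature: str,
    planners: Dict[str, Planner],
    judge: Judge,
    max_planners: int = 4,
    seed: Optional[int] = None,
) -> str:
    """Run planners in parallel, hide who wrote what, and let the judge decide."""
    chosen = list(planners.values())[:max_planners]
    if not chosen:
        raise ValueError("at least one planner is required")

    # Each planner drafts its plan concurrently.
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        plans = list(pool.map(lambda plan_fn: plan_fn(feature), chosen))

    # Anonymize: shuffle the order and replace planner names with neutral labels.
    random.Random(seed).shuffle(plans)
    anonymized = {f"Plan {chr(ord('A') + i)}": plan for i, plan in enumerate(plans)}

    return judge(anonymized)


if __name__ == "__main__":
    demo_planners = {
        "agent_one": lambda f: f"1. Add a toggle for {f}.",
        "agent_two": lambda f: f"1. Audit styles. 2. Add a toggle for {f}. 3. Test.",
    }
    # Toy judge (an assumption for the demo): prefer the most detailed plan.
    print(run_council("dark mode", demo_planners, lambda p: max(p.values(), key=len)))
```

The judge only ever sees labels such as `Plan A`, so it cannot favor a plan because of which agent produced it.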

I made the world's best clipboard for your AI agents. It's called Bluey, and it's a dumb pointer that you can use to give 10x better context to your agent. You can point at your screen, talk to it, and it will generate a really good annotated screenshot that you can then give to your AI agent. You can use it while designing your website. You can use it while explaining something on your screen. It understands you, so you can be really vague about things like "move this" or modify the color "here", and it will understand what you are pointing at. I'm loving using it. I built it for myself because I was tired of taking screenshots all the time and attaching them to my Codex or Claude Code sessions. Now I can just point, speak or type, and drag and drop it anywhere. I am giving free access to it in the comments, and I would love for you to try it out. Join the Discord; I am sharing the download link there. By the way, everything stays 100 percent local. No screenshots are shared anywhere!

Milind S

11,871 views • 17 days ago

'I’ve always loved working with visual artists and following on from the competition that we ran with Stability AI around the release of i/o, we are setting up 5050 to encourage the creation of more of these collaborations. So if you have any type of visual work that you would like to try out alongside my music in a video, this is a place where you can explore it. Also, it didn’t seem fair that the music creators get all the revenue on platforms such as YouTube, so we’ve called it 5050 as we look to have a fairer share between the visual and music creators.' - pg Visit for more information & to watch the Olive Tree video by Oranguerillatan

Peter Gabriel

44,298 views • 1 year ago

Claude released a new design feature that's great for iOS app design. So I extracted the code and skills and turned it into a skill that can be used by any agent. Since Codex has comments and web preview built in... it works basically the same. (Oh and you can use claude 4.7 if you think it's better at design)

Riley Brown

86,540 views • 2 months ago

Bash is all you need! Which is why I'm introducing my holiday project: just-bash. just-bash is a pretty complete implementation of bash in TypeScript designed to be used as a bash tool by AI agents. Because it turns out agents love exploring data via shell scripts, even beyond coding. It comes with grep, sed, awk and the 99th percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution. In the package - A bash-tool for AI SDK - A binary for use by yourself or your coding agents - An overlay filesystem to feed files to your agent securely - A Vercel Sandbox compatible API, so you can quickly upgrade to a real VM if you need to run binaries - An example AI agent that explores the just-bash code base using just-bash - I imported the Oils shell bash compatibility suite and just-bash passes a very good chunk What is interesting about this codebase: It was essentially entirely written by Opus 4.5. Coding agents love bash and they are good at reproducing it. They are also great at textbook recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅. This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support and it's fast and secure (caveats apply). It doesn't have write access to your computer and the filesystem is given a root that the agent cannot escape from. Find it at Related: Our recent blog post how we migrated our data analysis agent to bash tools and achieved incredible quality improvements The video shows the example agent investigating the just-bash code base

Malte Ubl

124,713 views • 5 months ago
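
The just-bash post above mentions a filesystem "given a root that the agent cannot escape from". One common way to get that property is to normalize every requested path against a virtual `/` before mapping it under the real root, so `..` can never climb above it. The sketch below illustrates that general idea in Python; it is not just-bash's implementation (which is TypeScript), `to_real_path` is a hypothetical helper, and the check is purely lexical, so it does not account for symlinks.

```python
import posixpath


def to_real_path(root: str, virtual_path: str, cwd: str = "/") -> str:
    """Map a path as seen by the agent onto a location under `root`.

    `root`, `virtual_path`, and `cwd` are illustrative parameters, not part
    of any real just-bash API.
    """
    # Resolve relative paths against the agent's virtual working directory;
    # an absolute virtual_path replaces cwd entirely.
    absolute = posixpath.join("/", cwd, virtual_path)

    # Normalizing an absolute path drops any ".." that would climb above "/".
    normalized = posixpath.normpath(absolute)

    # Re-anchor the cleaned path under the real root.
    relative = normalized.lstrip("/")
    return posixpath.join(root, relative) if relative else root


if __name__ == "__main__":
    for requested in ["notes.txt", "../../etc/passwd", "/etc/passwd"]:
        print(requested, "->", to_real_path("/sandbox", requested))
```

Every result stays under `/sandbox`, including requests that try to traverse upward or name an absolute host path.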

I built a Visual Studio Code extension to turn Visual Studio Code into a custom learning environment for my Machine Learning class. Attached, you'll see a 2-minute video of how it works. I haven't tried this with my students yet (the February cohort will be the first to do so), but I think it will be huge. I wish more people would offer a learning experience like this. My next cohort starts in February and you can join at You'll get lifetime access to the best Machine Learning engineering cohort online.

Santiago

82,704 views • 1 year ago

Ever seen a fresh (20x) Claude Max account's 5-hour usage allowance get drained in ~14 minutes? Feast your eyes on my bizarre life now with this screen recording of a recent live work session, something I've gotten at least 100 requests for over the past month. Maybe you can understand now why I need so many accounts and how I can work on so many different projects. You can also see the truth of what I was saying recently about how, once your plan is done and the beads made and polished, it's mostly just machine-tending the swarm that doesn't require much thought. Lots of just telling it to get the next bead and work on it, to review code, to re-read AGENTS dot md after a compaction, etc. And you can see how I use gemini-cli for code review. I give Google a lot of crap for the harness being broken and the capacity overloads, but when it works, it's actually really good for this code review use case. I don't usually let it write new code, though, because I think Opus and 5.2 do a better job. Also, sorry the recording is a bit blurry; I have a 5K resolution monitor and screen recordings usually are hard to watch from it. And btw, this really wasn't that normal of a session for me, it was more frenetic than usual, because I don't want to dox myself or my clients by accident. Hence all the ceaseless terminal tab swirling. I usually do more planning work while this stuff is going on, but I wanted to minimize the chances of leaking important information. That's also why I didn't refresh the Gemini login in the WezTerm window, which killed me, trust me. It's the reason I hate doing these screen recordings in the first place; it kills my productivity. Anyway, hope you liked it. I will also post to YouTube, see reply for link. Thanks for watching.

Jeffrey Emanuel

86,013 views • 5 months ago

Nothing beats open-source! We now have a fully open-source coding agent that can compete with any proprietary solution out there: • 500+ available models • It doesn't train on your code • It doesn't charge a penny for tokens • It's open. You can change it if you want. Kilo Code is the #1 app on OpenRouter right now, and the community is releasing new features on GitHub at rocket speed (about 20 releases in the last 30 days). I installed it in Visual Studio Code, but you can install it in pretty much any IDE. I need to spend more time with it, but here is a very interesting design choice: Most coding agents support two modes: an "agentic" mode for writing code and an "ask" mode for talking to the model without making any changes. Kilo Code supports 5 different modes: 1. Architect mode - Research, design, and planning 2. Debug mode - Problem solving and troubleshooting 3. Code mode - To write code. 4. Ask mode - To talk to the model. 5. Orchestrator mode - Combine every agent to solve problems. On paper, I really like this. I'll need more time to try out the specialized agents and see whether niching becomes an advantage. Here is the link to the website: Thanks to Kilo for partnering with me on this post.

Santiago

73,062 views • 7 months ago

New Andrej Karpathy interview Says AI agent failures stem from user skill, not model capability. Poor instructions cause errors. He suggests delegating 20-minute macro actions like coding and research to parallel agents and reviewing their work. --- "I think everything, like so many things, even if they don't work, I think to a large extent you feel like it's a skill issue. It's not that the capability is not there; it's that you just haven't found a way to string together what's available. Like, I didn't give good enough instructions to the agents in the file, or whatever it may be. I don't have a nice enough memory tool that I put in there, or something like that. So, it all kind of feels like a skill issue when it doesn't work to some extent. You want to see how you can parallelize them, and you want to be a 'Pierce tender,' basically. Pierce famously has a funny photo where he's in front of lots of these Codex agents behind the monitor. They all take about 20 minutes if you run them correctly and use high effort. You have multiple—you know, 10 or 20—pull requests checked out. It's just like you can do much larger macro actions. It's not just, 'Here's a line of code, here's a new function.' It's like, 'Here's a new functionality, delegate it to agent one. Here's a new functionality that's not going to interfere with the other one, give it to agent two.' Then, you try to review their work as best as you can, depending on how much you care about that code. You look for these macro actions that you can manipulate your software repository by. Another agent is doing some research, another agent is writing code, another one is coming up with a plan for some new implementation. Everything just happens in these macro actions over your repository. You're just trying to become really good at it and develop a muscle memory for it. It's very rewarding when it actually works, but it's also a new thing to learn. Hence, the psychosis." 
--- From No Priors YT channel (link in comment)

Rohan Paul

23,090 views • 3 months ago

This is how I build everything now. Before the agents touch a line of code: "before you do any building make 10 mock ups in html of how we can make it look". I then pick a path and ask for more mockups until I feel happy with the feel. The only way to work with these things.

0xSero

25,695 views • 3 months ago

I asked Garry Tan how to use meta prompting to get better at AI: "My partners at YC Jared Friedman and Pete Koomen showed me how to do this. You can take almost anything that you do all the time and just drop it into a context window. And then say, “Here’s a bunch of inputs and outputs." And maybe you also add a bunch of notes. And then you tell it, “Write me a prompt that can act as an agent that takes this input and makes this output over here.” You can do this for almost any type of knowledge work. And you can even introspect. "What are things you notice that I did to convert this from the input to the output?”. And then you can just start using the prompt. Initially, it’s going to suck. Because it’s just not that smart yet. But what’s funny is now, I also use it to Iterate my writing. You can be very direct, "I would never say that", "Don’t say it like this", or "Oh, you used the long word there, use the short word". Just speak to it conversationally. And then when you're happy with the output, you can use that new output to make a new prompt. "Based on this conversation, give me a better initial prompt that incorporates all the things we talked about." And you can do this with literally everything. And in theory, there’s so much it applies to that people do day-to-day. You could use it for tweets. You could use it for editing podcasts. You can use it for pretty much everything. I have a folder of prompts that I use all the time. My YouTube prompt is on v27 or something. I'll go through this process with all the different max models. I'll use GPT 5.2 Pro. I’ll use Grok. I'll use Claude. Then, I’ll take all the outputs from all the models and put them into Claude and say "Here’s my prompt, here’s the output from four LLMs, including yourself. Rate each response and tell me what the pros and cons of each approach are." And I usually say "give it to me in numbered form". And then you can agree with one, disagree with two, tell it three is this or that. 
And then after that, you say given all of this, synthesize it."

The Peel

51,632 views • 3 months ago

If you watch this ~50 minute screen recording closely (yeah, I know, it's long; there are also some times when my computer was very slow and laggy, just skip past that part. And at one point I had to run and get my 9-month-old a new bottle and left it on a boring screen, sorry!), I believe you can see real signs of the kind of runaway, recursive AI self-improvement that people have been warning of for a while (Mr. Kurzweil most notably and prophetically). Why do I say that? What's different now? Well, there's a reason my set of agent coding tooling is called the Flywheel. These tools all mutually self-reinforce each other. And they all flow directly into my ntm tool (short for "named_tmux_manager"), which acts as a sort of integration point and nerve center for the tools (this is becoming more true by the minute as I'm now seriously working on ntm). Now, ntm was something I started making to automate some aspects of my workflow, but it was the kind of thing where, until it was perfect, it sort of just slowed me down. So I didn't actually use it even though I kept working on it and trying to improve it, and suggested to users that they try it in my tutorials. Well anyway, I finally got around to "dogfooding" ntm last night, and now it's going to get very dramatically better at an alarming rate. Some of that is from applying my "idea wizard" prompt to generate more useful features and building that stuff out and addressing obvious pain points I encountered during my newfound usage of the tool. But a lot comes from my realization that, once again, ntm's true utility is not as a tool for ME, but for an agent. That is, ntm lets one instance of Claude Code or Codex act as, well, me, do the things that I had been doing manually. Do I wish I had started using ntm earlier? 
No, for two big reasons: 1) Doing it manually helped me build up my intuition massively, which directly led me down the path of creating useful prompt strategies and workflows; these often began as ad-hoc prompts that I realized could be generalized and made more versatile/universal. Lesson: don't prematurely automate until you have an intimate, intuitive feel for your "core value-add loop." Otherwise you'll have a fully automated system quickly that efficiently and automatically does a stupid or otherwise sub-optimal thing. 2) My eyes have been opened to the beauty and power of Skills. I'm not talking about your garden-variety skills that are just a simple markdown file. I'm talking about true tour-de-force directories of perfectly structured and organized files that are filled with good information, insights, workflows, etc., but presented in a way that is highly optimized for consumption by AI agents, with extreme attention paid to things like perfect progressive disclosure, token density, agent-ergonomics, agent-intuitiveness, etc. And also Skills that go way beyond markdown files, with full integration into Claude Code where it makes sense via hooks, sub-agents, and even Python scripts. These kinds of skills are a qualitative difference in expressive power and usefulness and a total game changer. They are also effectively composable, creating almost an algebra of skills that let you use them together in powerful ways. I'm working on a subscription service website and CLI tool now to share what I've learned here most effectively, stay tuned for that in the coming days. Anyway, I now know what to make and how to make it. So, getting back to that screen recording, what does it show that makes me claim recursive self-improvement is here? If you keep your eye on the upper left tmux pane, that's the "controller" agent. 
It is using ntm to control all the other panes which are also running Claude Code (but ntm fully supports other agent types like Codex and Gemini-CLI, and it's trivially easy to mix and match them if you wanted to have, say, 8 CCs and 6 Codexes for writing the code and 3 Gemini-CLIs for reviewing code.) Now, there's nothing that crazy about this much so far. But where it starts to get very cool is that as the session continues and we encounter real-world problems, things like my ridiculously overloaded computer that keeps hanging for long periods, Claude Code instances that crash and get into a frozen, unresponsive state, it can learn from that. And you can see it using my skill writing skill to refine its ntm vibe coding skill in real time. And then take that skill and refine it to be more intuitive for itself. Or use my cass tool skill to search all the session histories to look for problems that came up and strategize how to solve them. The most useful part was when, towards the end of the session, I told it to reflect on all the things we had done and problems we encountered. One way it can usefully leverage those reflections is by improving its ntm vibe coding skill to make it cover more edge cases and exigencies. But the other, more fundamental, way is for it to conceive of and design the optimal new features and functionality for ntm itself so that the tool embodies those lessons in a first-class way. This offloads cognition from its brain onto its tooling, just like how a person can lean on spellcheck or a calculator. It codifies correct, effective reasoning at the tool level, where it's more reliable and robust and repeatable. And btw, did you notice what code base it was working on the whole time? It was none other than ntm itself! So as it worked on its own tool, it had reflections and ideas about how to further improve the tool. 
Now, it could have just as easily gotten those insights and ideas while using ntm to work on a different project, but the fact that it was working on itself is almost gloriously meta and recursive. So by the end, after learning from tending to a big group of agent workers (btw, I have previously emphasized doing everything in a really distributed/decentralized way, where each fungible agent gets identical marching orders that tell it to use my bv tool to find the optimal bead to work on. This does work very well, but occasionally results in some contention and overlap from thundering herd, or at least wastes time/tokens/communication in avoiding that before the agents waste time duplicating work. But in this new ntm-oriented workflow, I was able to have the controller agent in the upper left use bv itself and then optimally parcel out the instructions to each agent so that we could know for sure that there's no overlap), I ended up with a ton of new beads for new features, which I had it optimize and polish a few times. Now I can swap to a new Claude Max account and have the swarm implement all those new features! It should only take a couple passes like the one shown in the screen recording to get everything implemented. Then we can rinse and repeat, having the agent read through the full session histories of each agent and its experience from its own session in sending ntm commands and seeing how they worked out in practice, to come up with the next batch of changes to both its ntm vibe coding skill AND to the ntm tool itself. Do you see how rapidly this turns into Skynet? My mistake earlier was in focusing on making myself a "faster horse" as Henry Ford used to joke about customers wanting before he showed them what they should really want (a Model T). That is, something that would make my experience nicer while doing this agent swarm based development workflow. 
But the obvious lesson is that you should make all your tooling agent-first because the agents are just better at this stuff. You can still watch, and of course I did add a ridiculous number of very nice human-centric features to ntm that you'll be seeing in the next day or two, but those are really kind of "for fun" to make us humans feel better about the process. All the real value-add is happening "by agents, for agents." PS: Towards the end, you can see me switch to my Mac and tell Claude to improve the skill that I made earlier today for taking the mkv screen recording files from OBS Studio and muxing them into MP4 files for sharing, while downloading songs from YouTube to serve as the background music. I made it so it can also grab the thumbnails and generate little song credit cards that show up in the lower right corner. This worked perfectly the first time! I'll include some screenshots in a response post showing how that worked, but it was awesome to witness. Skills are POWERFUL. I'll also post a link to this video on YouTube if you prefer to watch it there.

Jeffrey Emanuel

25,483 views • 5 months ago
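
The post above contrasts two ways of handing out work: every agent picking its own bead (which can cause thundering-herd overlap) versus a single controller parceling beads out so that no two agents get the same one. A minimal sketch of the second idea follows, assuming the controller already has a ranked list of bead ids. The function name `parcel_out`, the ids, and the round-robin rule are illustrative assumptions; this is not how ntm or bv actually work internally.

```python
def parcel_out(ranked_beads: list[str], agents: list[str]) -> dict[str, list[str]]:
    """Give each bead to exactly one agent, highest-ranked beads first."""
    if not agents:
        raise ValueError("need at least one agent")
    if len(set(ranked_beads)) != len(ranked_beads):
        raise ValueError("bead ids must be unique")
    plan: dict[str, list[str]] = {agent: [] for agent in agents}
    # Round-robin: bead i goes to agent i mod N, so assignments never overlap.
    for i, bead in enumerate(ranked_beads):
        plan[agents[i % len(agents)]].append(bead)
    return plan


# Hypothetical ids, purely for illustration.
plan = parcel_out(["bd-1", "bd-2", "bd-3", "bd-4", "bd-5"], ["cc-1", "cc-2"])
# plan == {"cc-1": ["bd-1", "bd-3", "bd-5"], "cc-2": ["bd-2", "bd-4"]}
```

Because one process decides every assignment, the agents do not need to spend tokens coordinating with each other to avoid duplicate work.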

The #1 problem with coding agents right now: Ask them to solve one problem, and they will make 10 other changes you didn't want. This happens to me every day. It happens to everyone I talk to as well. We have a solution for this now. The team at Augment Code released a "Task List" feature for their coding assistant that solves this problem. Augment Code is partnering with me on this post. In case you haven't used them before:
• Augment Code is a fully-fledged coding assistant
• Their specialty is large projects
• Fastest code indexing I've seen
• Has a free forever community edition
Now, you can ask their coding agent to generate a Task List before doing anything. This will give you a plan you can review, edit, and augment if you need to. You can export this plan, load it in a different session, or even share it across projects. It makes a huge difference: The task list constrains the agent so you won't get any "unintended" changes anymore. It also puts you in control of everything the agent does. Check the video to see the agent working through a task list. You can also try this 100% free. (By the way, they also have support for remote agents. You can basically have those agents write your code while you are sleeping.)

Santiago

41,738 views • 10 months ago

New Tools for a New Era. Coding agents like Claude and Cursor have dramatically reduced the time it takes to go from idea to functional software. But the experience of designing and refining with them sucks. One reason for this is that while the terminal is an incredible tool for communicating direction with language, it is a terrible tool for defining and exploring visual and interactive objects. Here is one idea for how we might fix a small part of that. In the old world, when you wanted to create a transition or animation in your app, you would type some code, refresh your local server, and click to run your animation. It probably wasn't right, because after all no one can know what 'cubic-bezier(0.3, 0.05, 0.45, 1)' really feels like when you read it. You need to see it. Feel it. Interact with it in a real world context. So you'd edit some values, save, refresh, and keep guessing and checking until it felt right. Today, you can write a quick, single-use tool that's a visual studio for designing animations. You can then configure some components and containers common in all apps, and explore different animations in real time, adjusting key properties, and getting it just right. Then, you can copy a highly detailed prompt (or export a skill containing all your animations) that captures your intent and direction with perfect clarity. Paste this into your terminal and your agent instantly implements it everywhere. To me, this is an improvement over the old world, and a better way to work in today's. I'm extremely excited to see the ways in which our ability to rapidly create software will shape how we design software tomorrow. Feedback, ideas, and critiques welcome!

joshpuckett

81,788 views • 5 months ago
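
To make the point about 'cubic-bezier(0.3, 0.05, 0.45, 1)' concrete, here is a small sketch of what such a timing function computes. It follows the CSS definition (a cubic Bezier curve from (0, 0) to (1, 1) with control points (x1, y1) and (x2, y2)), but the solver is a plain bisection chosen for readability, not what browsers necessarily use. The names `cubic_bezier`, `axis`, and `ease` are made up for this example.

```python
def cubic_bezier(x1: float, y1: float, x2: float, y2: float):
    """Return an easing function for CSS cubic-bezier(x1, y1, x2, y2)."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("x1 and x2 must lie in [0, 1]")

    def axis(t: float, p1: float, p2: float) -> float:
        # Cubic Bezier along one axis with endpoints fixed at 0 and 1.
        u = 1.0 - t
        return 3 * u * u * t * p1 + 3 * u * t * t * p2 + t ** 3

    def ease(progress: float) -> float:
        """Map time progress in [0, 1] to animation progress."""
        progress = min(max(progress, 0.0), 1.0)
        lo, hi = 0.0, 1.0
        for _ in range(50):  # bisection: x(t) is non-decreasing on [0, 1]
            mid = (lo + hi) / 2
            if axis(mid, x1, x2) < progress:
                lo = mid
            else:
                hi = mid
        return axis((lo + hi) / 2, y1, y2)

    return ease


ease = cubic_bezier(0.3, 0.05, 0.45, 1)
samples = [round(ease(i / 10), 3) for i in range(11)]  # eased values at 0%, 10%, ... 100%
```

Printing `samples` gives a column of numbers, which illustrates the post's argument: the values describe the curve, but you still have to watch the motion to know how it feels.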

Right now, I have a few dozen tools to automate my life. Every single thing that I do more than once every week is now a tool: either a Claude Code skill, a scheduled workflow, or an application.
• I spend the time doing this once.
• I schedule it.
• I forget about it (or at least, try to)
CREAO is one of the platforms I've used extensively for building some of these automations. They are partnering with me on this post. They are one of the only platforms where you can go from a conversation to a scheduled agent that quickly. This is how it works:
1. You describe the problem you want to solve
2. CREAO's agent builds the logic and executes it
3. You can iterate with the agent to improve the solution
4. When you are done, you turn that solution into a mini app
5. You can schedule that app to run any time you need it
Something important: These agents are deterministic. They don't use an LLM to generate answers, so they will always return the same output given the same input. This is critical: when you're automating something that runs every Monday at 9 am, you need it to return the same result every time, not a "creative" answer.

Santiago

13,708 views • 2 months ago
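
The "same output given the same input" property in the post above is easiest to see in a function that takes everything it depends on as an argument. Below is a minimal sketch for the "every Monday at 9 am" example. It is not CREAO's implementation; `next_monday_9am` is a hypothetical helper, and it assumes naive (timezone-free) datetimes for simplicity.

```python
from datetime import datetime, timedelta


def next_monday_9am(now: datetime) -> datetime:
    """Return the first Monday 09:00 strictly after `now`."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(0 - now.weekday()) % 7)  # Monday is weekday 0
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# 3 January 2024 was a Wednesday, so the next run is Monday 8 January at 09:00.
run_at = next_monday_9am(datetime(2024, 1, 3, 12, 0))
```

Because the function receives `now` as a parameter instead of reading the system clock, calling it twice with the same value always returns the same result, which is what makes a scheduled job like this predictable and easy to test.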

I just shipped Design Feedback Agent using Replit Agent without writing a single line of code. You can use this agent to get constructive feedback on your designs: website, mobile app, email newsletter, etc. Check out the video demo, and try it for yourself. My biggest takeaway as I continue to experiment with these tools is how good they are getting at building complex flows and how precise they are at taking design feedback and applying it accurately from a text prompt. I'm excited to build even more complex products to see how far I can push these tools. Let me know what you think of this agent and how you would improve it.

Levin Stanley

17,762 views • 1 year ago